fix: allow co-authored test files in validate_mutation_boundary (#163) #170
Merged
…on_boundary (#163)

validate_mutation_boundary flagged co-authored test files (test_*.py, *_test.*, *.spec.*, conftest.py, anything under tests/) as "outside affected_files", emitting a scope warning that hard-stopped the first validate_step 2.4 on essentially every subtask under a test-alongside policy. The decomposer lists only production files in affected_files, so the accompanying test was always "unexpected".

Fix (validator side, deterministic): partition out-of-scope paths by test convention. Test-convention paths are surfaced as `allowed_test_files` and excluded from `unexpected` (they stay in `actual`, keeping the false-progress check honest); real production scope leaks are still reported. This makes the check independent of decomposer description wording.

Reuses the existing _is_test_path helper, extended to also recognize pytest conftest.py at any depth.

Edited the templates_src .jinja single source and re-rendered all generated trees (make render-templates; check-render byte-clean).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_every_configured_hook_execs_via_shebang runs every configured hook past its MAP_INVOKED_BY guard to assert exec-ability (+x, shebang, no 126/127). map-memory-finalize.py, exec'd that way, calls `claude -p` to finalize a dirty scratch that the loop's earlier memory hooks accumulate in the shared project.

On a machine where the `claude` CLI is installed and authenticated, that subprocess runs up to its default 50s budget, past the probe's 20s cap, so the test timed out locally. It passes on CI, which has no `claude` CLI, so the subprocess fails fast there. Production finalize is unchanged: it correctly gets 50s under the 60s SessionStart budget.

Bound MAP_MEMORY_FINALIZE_TIMEOUT=5 in the probe's env so the exec check never blocks on a real finalization. Test-only; no production behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
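The bounding described in this commit can be illustrated with a minimal sketch. It is not the repository's test: `probe_hook` and its parameters are hypothetical names, and only the `MAP_MEMORY_FINALIZE_TIMEOUT=5` override, the 20s probe cap, and the 126/127 check come from the message above. How the real test gets past the `MAP_INVOKED_BY` guard is not modelled here.

```python
import os
import subprocess


def probe_hook(hook_path: str, cap_seconds: float = 20.0) -> int:
    """Exec a hook directly (via its shebang) and return its exit code.

    Hypothetical helper: assumes the hook reads MAP_MEMORY_FINALIZE_TIMEOUT
    from its environment, as the commit message describes.
    """
    env = dict(os.environ)
    # Keep any real finalization well under the probe's own cap.
    env["MAP_MEMORY_FINALIZE_TIMEOUT"] = "5"

    # No shell: a missing +x bit or a bad shebang raises OSError here.
    # subprocess.TimeoutExpired is raised if the hook exceeds cap_seconds.
    result = subprocess.run(
        [hook_path],
        env=env,
        capture_output=True,
        timeout=cap_seconds,
        check=False,
    )

    # 126 / 127 are the shell conventions for "not executable" / "not found".
    if result.returncode in (126, 127):
        raise AssertionError(f"{hook_path} is not exec-able: {result.returncode}")
    return result.returncode
```

Because the override lives only in the probe's environment, a production run that does not set the variable would keep its default budget.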
What & why
Closes #163.
`validate_mutation_boundary` distinguished a subtask's scope by exact match against the blueprint's `affected_files`. The `task-decomposer` lists only production modules there, so a co-authored test file (`test_foo.py` beside `foo.py`, or `tests/test_foo.py` in the common src/+tests split) was always "outside affected_files". Under a test-alongside policy (validation = pytest green) that produced a spurious Scope warning that hard-stopped the first `validate_step 2.4` on essentially every subtask, forcing a redundant second call.

The fix (validator side, deterministic)
Partition out-of-scope paths by test convention:

- Test-convention paths (`test_*.*`, `*_test.*`, `*.spec.*`, `*.test.*`, anything under a `tests/` / `test/` / `__tests__/` segment, and pytest `conftest.py`) are implied by the test-alongside policy. They are surfaced in a new `allowed_test_files` report key and excluded from `unexpected`. They remain in `actual`, so reality (and the false-progress check) stays honest.
- Real production scope leaks are still reported in `unexpected` (`warning` / strict `violation`) exactly as before.

This makes the check independent of decomposer description wording (no magic prose needed), and reuses the existing `_is_test_path` helper, extended to also recognize `conftest.py` at any depth (a coherent improvement for its other caller, the cross-subtask regression-risk signal).

Per the repo's single-source invariant, the change was made in `src/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinja` and propagated via `make render-templates`; `make check-render` is byte-clean.

Behavior
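| Case | Before | After |
| --- | --- | --- |
| `pipeline/test_foo.py` (co-authored test) | `warning` → hard-stop | `clean` (in `allowed_test_files`) |
| `tests/test_foo.py`, `tests/conftest.py` | `warning` | `clean` |
| `b.py` (real source leak) | `warning` | `warning` (unchanged) |
| `MAP_STRICT_SCOPE=1` | `violation` | `clean` |

Tests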
`tests/test_map_step_runner.py`:

- Co-authored test: `clean`, in `allowed_test_files`, still in `actual`
- `src/` + `tests/` tree (incl. `conftest.py`) → `clean`
- Real source leak: still `unexpected`, test partitioned into `allowed_test_files` (the fix does not mask genuine leaks)
- Co-authored test: not a `violation` even under `MAP_STRICT_SCOPE=1`
- `_is_test_path` recognizes `conftest.py` / `python/pipeline/conftest.py`, still rejects `contest.py`

Second commit (test-only, unrelated pre-existing flake)
`make check` surfaced a pre-existing, local-only failure in `test_every_configured_hook_execs_via_shebang`: run past its `MAP_INVOKED_BY` guard, `map-memory-finalize.py` invokes `claude -p` (up to its default 50s budget) on a machine where the `claude` CLI is installed and authenticated, blowing past the probe's 20s cap. It's green on CI (no `claude` CLI → fast fail). Per the repo's "fix every surfaced error" rule it's bounded here via `MAP_MEMORY_FINALIZE_TIMEOUT=5` in the probe env. Production finalize is unchanged (it gets 50s under the 60s SessionStart budget).

Gate
`make lint` ✅ · `make check-render` ✅ (generated trees match `templates_src`)
`pytest` → 2282 passed, 3 skipped

🤖 Generated with Claude Code
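For reviewers who want the partition rule from "The fix" and the outcomes under "Behavior" in executable form, here is a minimal sketch. It is an illustration, not the code in `map_step_runner.py.jinja`: `is_test_path`, `partition_scope`, the `strict` flag, and the `status` key are hypothetical names, and the glob list simply mirrors the conventions listed in the description. Only the `actual`, `unexpected`, and `allowed_test_files` keys are taken from the PR text.

```python
import fnmatch
from pathlib import PurePosixPath

# Conventions listed in the PR description (assumed exhaustive for this sketch).
TEST_NAME_GLOBS = ("test_*.*", "*_test.*", "*.spec.*", "*.test.*")
TEST_DIR_SEGMENTS = {"tests", "test", "__tests__"}


def is_test_path(path: str) -> bool:
    """Return True if the path follows one of the test conventions."""
    p = PurePosixPath(path)
    if p.name == "conftest.py":  # pytest conftest at any depth
        return True
    if any(seg in TEST_DIR_SEGMENTS for seg in p.parts[:-1]):
        return True
    return any(fnmatch.fnmatchcase(p.name, g) for g in TEST_NAME_GLOBS)


def partition_scope(actual, affected_files, strict=False):
    """Split changed paths outside affected_files into tests and real leaks."""
    affected = set(affected_files)
    outside = [p for p in actual if p not in affected]
    allowed = sorted(p for p in outside if is_test_path(p))
    unexpected = sorted(p for p in outside if not is_test_path(p))

    if not unexpected:
        status = "clean"
    else:
        # strict stands in for MAP_STRICT_SCOPE=1
        status = "violation" if strict else "warning"

    return {
        "actual": sorted(actual),  # test files stay visible here
        "allowed_test_files": allowed,
        "unexpected": unexpected,
        "status": status,
    }


report = partition_scope(
    actual=["pipeline/foo.py", "pipeline/test_foo.py", "b.py"],
    affected_files=["pipeline/foo.py"],
)
print(report["status"], report["allowed_test_files"], report["unexpected"])
```

In this sketch the test file never influences `status`, so a strict run with only co-authored tests outside `affected_files` comes back `clean`, while a genuine leak such as `b.py` is still reported.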