fix(mutation-boundary): diff committed subtask against its parent (#162) #171
Merged
azalio merged 1 commit on Jun 11, 2026
`validate_step 2.4` auto-runs `validate_mutation_boundary`, which compared the working tree against the subtask's declared `affected_files`. In the documented per-subtask close order (commit → `record_subtask_result --commit-sha` → `validate_step 2.4`), the working tree is clean and `last_subtask_commit_sha` already points at the subtask's OWN commit. The diff against that commit is therefore empty, so `actual=[]` while `expected` is non-empty, tripping the false-progress guard with "MONITOR is closing ST-XXX but NO files changed" on every committed subtask and forcing a redundant second `validate_step 2.4` call.

Fix: extract `_resolve_subtask_diff_base`, which re-bases onto the subtask commit's parent when the auto-resolved base equals the commit recorded for THIS subtask, so the committed work shows up as `actual`. The parent is probed first so a root commit safely keeps the commit itself. The same base-ref resolution (and the same latent bug) is shared by `_current_subtask_changed_files`, now also routed through the helper. The check is now commit-order-agnostic: it yields the same `actual` whether run before or after the per-subtask commit.

Tests:
- unit: committed subtask -> `actual==['a.py']`, base_ref re-based to `<sha>^`
- unit (negative guard): no recorded commit + clean tree -> `actual==[]` so genuine false-progress still fires
- orchestrator: documented commit→record→validate flow passes 2.4 on the FIRST call (no redundant second call), `progress_feedback_subtasks` untouched

Negative-proofed: disabling the re-base reproduces the exact #162 message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
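The base resolution described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual helper: the function name `resolve_diff_base_sketch`, its parameters, and the `has_parent` callback are all hypothetical stand-ins for whatever the real `_resolve_subtask_diff_base` does internally.

```python
from typing import Callable, Optional


def resolve_diff_base_sketch(
    last_subtask_commit_sha: Optional[str],
    head_sha: Optional[str],
    subtask_commit_sha: Optional[str],
    has_parent: Callable[[str], bool],
) -> Optional[str]:
    """Pick the ref to diff the working tree against (hypothetical sketch).

    All parameter names are assumptions made for illustration only.
    """
    # Auto-resolve as before: recorded SHA, else HEAD, else None (fresh repo).
    base = last_subtask_commit_sha or head_sha
    if base is None:
        return None

    # If the base is the commit recorded for THIS subtask, diffing against it
    # would be empty on a clean tree, so step back to its parent.
    if subtask_commit_sha is not None and base == subtask_commit_sha:
        # Probe the parent first: a root commit has none, so keep the commit.
        if has_parent(base):
            return f"{base}^"

    return base
```

In a real repository, `has_parent` might be backed by something like `git rev-parse --verify --quiet <sha>^`, though how the project actually probes the parent is not stated in this PR.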
Summary
Fixes #162: `validate_step 2.4` false-progress reject on every committed subtask in `/map-efficient`.

`validate_step 2.4` auto-runs `validate_mutation_boundary`, which diffs the working tree against the subtask's declared `affected_files`. But the documented per-subtask close order is commit → `record_subtask_result --commit-sha` → `validate_step 2.4`, and `record_subtask_result --commit-sha` advances `last_subtask_commit_sha` to the subtask's own commit. By the time `validate_step 2.4` runs, the working tree is clean and `git diff <own-commit>` is empty → `actual=[]` while `expected` is non-empty → the false-progress guard fires with "MONITOR is closing ST-XXX but NO files changed".

The operator then had to call `validate_step 2.4` a second time to get past the once-per-subtask nudge. Per the issue, this happened on every subtask of a 25-subtask plan.

Fix
Implements the issue's suggested fix #1 (diff the recorded subtask commit against its parent). Extracted a shared `_resolve_subtask_diff_base` helper that:

- resolves the base as `last_subtask_commit_sha` → `HEAD` → `None` (fresh repo), as before;
- re-bases onto the subtask commit's parent (`<sha>^`) when the auto-resolved base equals the commit recorded for this subtask, so the committed work shows up in the diff;
- probes the parent first, so a root commit safely keeps the commit itself.

The same base-ref resolution, and the same latent bug, lived in `_current_subtask_changed_files` (used by cross-subtask regression detection); it's now routed through the same helper, fixing the bug class everywhere (per the "fix every instance" rule).

Net effect: the boundary check is now commit-order-agnostic, computing the same `actual` whether run before or after the per-subtask commit.

Tests
- unit: committed subtask → `actual == ['a.py']`, status `clean`, `base_ref == '<sha>^'`
- unit (negative guard): no recorded commit + clean tree → `actual == []` so genuine false-progress still fires
- orchestrator: the documented commit → record → `validate_step 2.4` flow passes on the first call; `progress_feedback_subtasks` is not consumed

Negative-proofed: disabling the re-base makes the orchestrator test reproduce the exact #162 message, then re-render restores green.
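The scenarios these tests cover can be modelled without git at all. The sketch below is a toy model, not the project's test code: it treats a commit as a plain dict snapshot, and the names `changed_files`, `false_progress`, and the sample file contents are all hypothetical. It only illustrates why diffing against the subtask's own commit yields an empty `actual`, while diffing against its parent does not.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # path -> file content (toy stand-in for a commit)


def changed_files(base: Optional[Snapshot], worktree: Snapshot) -> List[str]:
    """Paths whose content differs between the base snapshot and the tree."""
    base = base or {}
    paths = set(base) | set(worktree)
    return sorted(p for p in paths if base.get(p) != worktree.get(p))


def false_progress(expected: List[str], actual: List[str]) -> bool:
    """Toy guard: files were declared for the subtask but nothing changed."""
    return bool(expected) and not actual


parent: Snapshot = {"README": "v1"}
subtask_commit: Snapshot = {"README": "v1", "a.py": "print('hi')"}
expected = ["a.py"]

# Before the commit: base is the previous commit, work is still uncommitted.
before = changed_files(parent, subtask_commit)

# After the commit, old behaviour: base is the subtask's own commit and the
# tree is clean, so nothing appears to have changed.
buggy = changed_files(subtask_commit, subtask_commit)

# After the commit, re-based: base is the parent of the subtask's own commit.
rebased = changed_files(parent, subtask_commit)

# Negative guard: no recorded commit and a clean tree, so nothing was done.
idle = changed_files(parent, parent)
```

Here `before` and `rebased` hold the same file list, which is the commit-order-agnostic property, while `buggy` and `idle` are both empty; only the re-base lets the guard tell a committed subtask apart from genuine false progress.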
Validation
- `make check` green: `ruff` ✓, `mypy` ✓, `pyright` 0/0/0, hook lint ✓, 2285 passed / 3 skipped
- `make check-render` ✓ (generated trees match `templates_src`; change made in the `.jinja` single source and re-rendered)

Closes #162
🤖 Generated with Claude Code