What happened
PR #652 removed the enhancement category and the feature-request action, and consolidated type/feature label handling. The review agent examined only the 4 files in the diff and approved. The human reviewer caught stale references to type/feature in pre-triage.sh (lines 30, 36), shim-workflow-call.yaml (line 52), and fullsend.yaml (line 59) -- all outside the diff. The human also identified that the triaged label was missing from pre-triage.sh's label-strip list, a behavioral issue only visible by reading a file not in the PR.
The review agent ran 5 times (once per force-push) and never expanded its search scope beyond the changed files. Each review produced nearly identical Info-level findings.
What could go better
When a PR removes or renames a concept (enum value, label name, action type, function), the review agent should grep the codebase for remaining references to the old name. This is a high-confidence finding -- the pattern is clear and the fix is mechanical. The human explicitly called this out in cycle 1 (requesting changes because type/feature references remained), yet the agent's subsequent 4 reviews still never searched beyond the diff.
This is a recurring theme: issue #871 (code agent internal consistency) and #870 (review agent missing medium findings) both point to agents not checking cross-file consistency. The review agent's prompt or skill likely needs an explicit step for rename/removal PRs.
Uncertainty: I could not access the JSONL session artifacts (403 error, related to #834), so I cannot confirm exactly what tool calls the agent made. My analysis is based on the review comments it posted, which show no evidence of codebase-wide searches.
Proposed change
Add a step to the review agent's skill or prompt (likely in the code-review skill or the agents/review.md definition in the .fullsend scaffold) that detects when a PR removes or renames identifiers (enum values, labels, function names, config keys). When it detects such a change, the agent should grep the full codebase for remaining references to the old terms and flag any hits as medium-severity findings.
Concretely, add a "staleness check" phase after the diff analysis: if the diff removes a string literal, enum value, or label name, run a repo-wide search for that string and report any remaining occurrences outside the changed files.
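As a rough illustration of the staleness check, the sketch below is one possible shape for it, not the review agent's actual implementation. The function names `terms_removed_by_diff` and `find_stale_references` are hypothetical, and the sketch assumes the agent already has the unified diff text, a list of candidate terms (labels, enum values, config keys), and the list of changed file paths.

```python
from pathlib import Path


def terms_removed_by_diff(diff_text, candidate_terms):
    """Return candidate terms that appear on removed lines but on no added line.

    Simplification: any line starting with '---' or '+++' is treated as a
    file header, so a removed line whose content starts with '--' is skipped.
    """
    removed, added = [], []
    for line in diff_text.splitlines():
        if line.startswith("---") or line.startswith("+++"):
            continue  # file headers, not content
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
    return [
        term
        for term in candidate_terms
        if any(term in l for l in removed) and not any(term in l for l in added)
    ]


def find_stale_references(repo_root, removed_terms, changed_files):
    """Return (relative_path, line_number, term) for hits outside the diff."""
    root = Path(repo_root)
    changed = {Path(p).as_posix() for p in changed_files}
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel_parts = path.relative_to(root).parts
        if ".git" in rel_parts:
            continue  # skip git metadata
        rel = "/".join(rel_parts)
        if rel in changed:
            continue  # only report files the PR did not touch
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for term in removed_terms:
                if term in line:
                    hits.append((rel, lineno, term))
    return hits
```

For a PR like #652, the agent would pass a term such as type/feature along with the changed file paths, and report each returned hit as a medium-severity finding. Plain substring matching is a deliberate simplification here; a real check might prefer word-boundary matching or `git grep` to respect ignore rules.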
Validation criteria
On the next 3 PRs that rename or remove a concept (enum, label, config key), the review agent should identify stale references in files outside the diff. Verify by comparing the agent's review comments against a manual grep for the removed terms. The agent should flag at least the same references a human reviewer would catch.
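The comparison against a manual grep could be done by hand, but a small helper makes the pass/fail condition explicit. This is only a sketch: `missed_references` is a hypothetical name, and it assumes both the manual grep results and the agent's flagged locations have been reduced to (path, line) pairs.

```python
def missed_references(manual_hits, agent_flagged):
    """Return manual-grep hits (path, line) that the agent did not flag."""
    return sorted(set(manual_hits) - set(agent_flagged))


# Baseline taken from the stale references the human reviewer found on #652.
manual = [
    ("pre-triage.sh", 30),
    ("pre-triage.sh", 36),
    ("shim-workflow-call.yaml", 52),
    ("fullsend.yaml", 59),
]
# The agent's reviews on #652 flagged none of these locations.
missed = missed_references(manual, [])
```

A PR meets the criterion when the returned list is empty, meaning the agent flagged every location the manual grep found.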
Generated by retro agent from #652