Problem

The issue body drafted by the scan loop is the only spec the fix loop has. If the calibration criteria (`flag`, `ignore`) are off, or if the draft/review steps produce a vague issue, the fix agent works from a bad spec — and there's no feedback until the groom loop closes the issue weeks later or the PR is reviewed.
This is a silent quality decay path: miscalibrated scan → vague issue → poor fix → stalled PR → human cleanup. Each step amplifies the original error.
Failure modes today
- `flag` criteria are too broad → findings are noise → triage clusters junk → draft writes vague issues → fix loop attempts unfixable work
- `ignore` criteria miss known patterns → scan re-files issues the team already decided to accept → the issue accumulates the `agent-fix-stalled` label each cycle
- Review step approves a technically valid but vague issue → fix agent guesses intent → PR diverges from intent
What's needed
Short term — better review gate:
The scan review step (`prompts/scan/review.md`) should explicitly reject issues whose "Definition of Done" is ambiguous or not independently verifiable by a future agent with no prior context. The current four rules catch format problems but not semantic vagueness.
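A minimal sketch of how that could read, assuming the existing four rules are a numbered list in `prompts/scan/review.md`; the rule number and wording here are illustrative, not the actual prompt:

```markdown
5. Reject if the "Definition of Done" is not independently verifiable. A future
   agent with no prior context must be able to read the issue in isolation and
   decide pass/fail for every criterion. Reject criteria that lean on unstated
   context ("fix the flaky test") or subjective judgment ("improve", "clean up")
   without a concrete, observable outcome.
```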
Medium term — fix loop issue feedback:
After each fix loop run (success or failure), record a structured verdict on issue quality: was the issue body sufficient to implement from? Feed this signal back into the scan calibration docs as evidence of what makes a good vs. bad issue for this project.
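A sketch of what that structured verdict could look like, assuming an append-only JSONL log that the calibration docs are later updated from; the field names, outcome values, and file path are all assumptions for illustration:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class IssueQualityVerdict:
    issue_number: int
    fix_outcome: str            # hypothetical values: "merged", "stalled", "abandoned"
    spec_sufficient: bool       # was the issue body enough to implement from?
    missing_context: list[str]  # what the fix agent had to guess or rediscover
    recorded_at: str

def record_verdict(verdict: IssueQualityVerdict, path: str = "issue_verdicts.jsonl") -> None:
    """Append one verdict per fix loop run, success or failure."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(verdict)) + "\n")

record_verdict(IssueQualityVerdict(
    issue_number=123,
    fix_outcome="stalled",
    spec_sufficient=False,
    missing_context=["which config file the issue meant"],
    recorded_at=datetime.now(timezone.utc).isoformat(),
))
```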
Long term — calibration validation:
Add a command to audit how often each `ignore` pattern fires and how often `flag` criteria produce actionable vs. stale issues. Surface patterns that suggest the calibration is drifting.
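A rough sketch of that audit, assuming the scan loop logs one JSONL event per finding with the pattern or criterion that handled it and the eventual issue outcome; every field and file name here is an assumption:

```python
import json
from collections import Counter

def audit(scan_log_path: str = "scan_log.jsonl") -> None:
    ignore_hits: Counter[str] = Counter()
    flag_outcomes: Counter[str] = Counter()
    with open(scan_log_path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            if event["action"] == "ignored":
                ignore_hits[event["pattern"]] += 1
            elif event["action"] == "flagged":
                # hypothetical outcome values: "merged", "closed-stale", "open"
                flag_outcomes[event.get("outcome", "open")] += 1

    print("ignore pattern hits:")
    for pattern, count in ignore_hits.most_common():
        print(f"  {count:4d}  {pattern}")
    print("flagged issue outcomes:", dict(flag_outcomes))

    # Drift signals: ignore patterns that never fire are dead weight; a high
    # ratio of closed-stale to merged issues suggests flag criteria are too broad.
    stale, merged = flag_outcomes["closed-stale"], flag_outcomes["merged"]
    if merged and stale / merged > 1.0:
        print("warning: more stale than merged issues; flag criteria may be too broad")

if __name__ == "__main__":
    audit()
```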
Definition of Done (short term)
- `review.md` includes an explicit criterion: "Definition of Done must be independently verifiable by an agent with no prior context"
- Review rejections include a `rejection_reason` field to make feedback more actionable

Out of Scope