
scan: poor-quality issue body is a bad spec that silently degrades fix quality #53

Problem

The issue body drafted by the scan loop is the only spec the fix loop has. If the flag or ignore criteria are miscalibrated, or if the draft and review steps produce a vague issue, the fix agent works from a bad spec, and there is no feedback until the groom loop closes the issue weeks later or the PR is reviewed.

This is a silent quality decay path: miscalibrated scan → vague issue → poor fix → stalled PR → human cleanup. Each step amplifies the original error.

Failure modes today

  • flag criteria are too broad → findings are noise → triage clusters junk → the draft step writes vague issues → the fix loop attempts unfixable work
  • ignore criteria miss known patterns → scan re-files issues the team has already decided to accept → the issue accumulates the agent-fix-stalled label each cycle
  • the review step approves a technically valid but vague issue → the fix agent guesses intent → the PR diverges from what was actually wanted

What's needed

Short term — better review gate:
The scan review step (prompts/scan/review.md) should explicitly reject issues where the "Definition of Done" is ambiguous or not independently verifiable by a future agent with no prior context. The current 4 rules catch format problems but not semantic vagueness.
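
As a sketch of the short-term change, the added rule in prompts/scan/review.md might read something like the following (the wording and the rule number are assumptions, not the actual prompt text):

```markdown
5. **Definition of Done is independently verifiable.** Reject the issue if an
   agent with no prior context could not determine, from the issue body alone,
   whether each DoD item is satisfied. Items like "improve X" or "clean up Y"
   with no observable outcome are grounds for rejection.
```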

Medium term — fix loop issue feedback:
After each fix loop run (success or failure), record a structured verdict on issue quality: was the issue body sufficient to implement from? Feed this signal back into the scan calibration docs as evidence of what makes a good vs. bad issue for this project.
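
A minimal sketch of such a verdict record, assuming a Python codebase (the type and field names below are hypothetical):

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class IssueQualityVerdict:
    """Recorded after every fix loop run, success or failure."""
    issue_number: int
    fix_outcome: str            # e.g. "merged", "stalled", "abandoned"
    spec_sufficient: bool       # could the agent implement from the body alone?
    missing_context: list[str]  # what the agent had to guess at

# Example: a stalled fix traced back to a vague spec.
verdict = IssueQualityVerdict(
    issue_number=53,
    fix_outcome="stalled",
    spec_sufficient=False,
    missing_context=["which prompt file owns the review rules"],
)
print(json.dumps(asdict(verdict), indent=2))  # evidence for the calibration docs
```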

Long term — calibration validation:
Add a command to audit how often each ignore pattern fires and how often flag criteria produce actionable vs. stale issues. Surface patterns that suggest the calibration is drifting.
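
A rough sketch of the audit logic, assuming scan runs log which ignore pattern matched each skipped finding and that each filed issue ends in a terminal state (all names below are hypothetical):

```python
from collections import Counter

# Hypothetical inputs: pattern names loaded from the calibration docs, plus
# per-run logs of which pattern fired and how each filed issue ended up.
IGNORE_PATTERNS = ["vendored-code", "known-flaky-test"]
ignore_hits = Counter({p: 0 for p in IGNORE_PATTERNS})
issue_outcomes = {"missing-error-handling": {"actionable": 2, "stale": 7}}

def drift_report(stale_threshold: float = 0.5) -> dict:
    """Surface calibration entries whose recent behavior suggests drift."""
    unused = [p for p, n in ignore_hits.items() if n == 0]  # dead weight?
    noisy = [
        c for c, o in issue_outcomes.items()
        if o["stale"] > stale_threshold * (o["actionable"] + o["stale"])
    ]
    return {"unused_ignore_patterns": unused, "noisy_flag_criteria": noisy}

print(drift_report())
```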

Definition of Done (short term)

  • review.md includes an explicit criterion: "Definition of Done must be independently verifiable by an agent with no prior context"
  • At least one test case in the prompt eval harness (see #50, "prompts: no evaluation harness — regressions discovered only at runtime") covers a vague DoD being rejected
  • The review step's structured output includes a rejection_reason field to make feedback actionable (see the sketch after this list)
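
For illustration, the review step's structured output might then look like this (every field other than rejection_reason is an assumption about the schema):

```json
{
  "verdict": "reject",
  "rules_violated": ["dod-not-independently-verifiable"],
  "rejection_reason": "DoD says 'make the error message clearer'; a future agent has no observable way to confirm completion."
}
```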

Out of Scope

  • Automated calibration self-tuning
  • Rewriting existing open issues retroactively

Labels: enhancement (New feature or request), scope:fix (Fix loop, implement, review, PR opening), scope:scan (Scan loop, triage, draft, review pipeline)
