Skip to content

fix loop: multi-round implement→review degrades because feedback is relayed through JSON, not shared context #54

@ooloth

Description

@ooloth

Problem

The fix loop's implement→review cycle relays reviewer feedback through a coordinator-constructed JSON blob, not through a shared conversation. This means:

  1. The implement agent on round N+1 sees only the feedback string the coordinator extracted from the review output — not the full review reasoning, the parts of the diff the reviewer found problematic, or the history of what was tried before.
  2. Round-by-round context accumulates in the coordinator as Python data, but is summarized rather than preserved verbatim when passed back to the implement agent.
  3. If a reviewer rejects for a subtle reason ("this fix is correct but changes more than necessary"), the implement agent on the next round has no way to see what "more than necessary" means in the specific diff context.

In practice: early rounds are productive; later rounds degrade into the implement agent repeating the same fix with minor variations, which the coordinator detects as "no progress" and escalates.

Root cause

Fresh context per step is the right architecture for independent steps (find, triage, draft). It's a poor fit for tightly-coupled, iterative steps (implement→review) where each round builds on the last.

Options

Option A: Richer feedback injection
Pass the full previous diff and the verbatim review output into the implement step prompt on each round, not just the extracted feedback string. This stays within the fresh-context architecture but gives the implement agent enough raw material to reason from.

Option B: Shared implement+review session
Run implement and review in a single multi-turn session where the reviewer's output is appended to the session before the implement agent continues. Abandons fresh-context-per-step for this specific loop.

Option A is simpler and preserves the architecture. Option B produces better multi-round behavior but increases complexity.

Definition of Done

  • Implement step prompt (round 2+) includes: the full previous diff, the verbatim review output, and the extracted feedback string
  • Test: mock reviewer rejects with a detailed reason; assert the next implement invocation receives the full review output in its prompt, not just the extracted string
  • Document the decision in docs/decisions/ (Option A vs B choice and rationale)

Out of Scope

  • Persisting context across separate fix loop runs (not rounds)
  • Shared session for steps other than implement→review

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestscope:agentClaude subprocess wrapper, transcript, timeoutscope:fixFix loop, implement, review, PR opening

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions