RLCR: 23-round large-plan session — scope friction, stagnation false positives, and emergent architecture gaps #53

@zevorn


Context

A 23-round RLCR session on a large systems project (30 tasks across 7 sequential phases, explicit dependency graph). The loop produced 37 commits, ~13.7K lines of new code, and 900+ passing tests before the circuit breaker terminated it after 3 consecutive "stalled" verdicts. 13/30 tasks were completed and reviewer-verified. The session revealed systemic friction in scope management, stagnation detection, ordering autonomy, and emergent architectural work.

Observations

What worked well

  1. Reviewer caught critical bugs through execution-level verification: The reviewer ran actual builds and test suites, catching 4 critical bugs that the implementer missed — a state synchronization regression causing infinite loops, a protection mechanism parameter mismatch, a condition variable lost-wakeup race, and a device access crash in generated code. These high-value catches justify the RLCR overhead.

  2. Goal tracker prevented task amnesia: All 30 original tasks remained tracked throughout 23 rounds — none "disappeared" from Active/Completed/Deferred. The immutable acceptance criteria section prevented goal drift at the top level.

  3. Evidence-based verification held the implementer accountable: The reviewer caught claims like "output visible" when the binary actually crashed with a fatal signal. Summary-only claims were consistently challenged with execution evidence.
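
The kind of execution-level check described in items 1 and 3 can be sketched as follows. This is a minimal illustration, not the reviewer's actual tooling: `run_with_evidence` is a hypothetical helper, and the signal detection relies on the POSIX convention that `subprocess` reports a negative return code when the child is killed by a signal.

```python
import signal
import subprocess


def run_with_evidence(cmd):
    """Run `cmd` and return (ok, evidence) derived from the exit status.

    Hypothetical helper: the verdict comes from how the process ended,
    not from whatever it managed to print before ending.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        # POSIX: a return code of -N means the child was killed by signal N.
        name = signal.Signals(-proc.returncode).name
        return False, f"killed by {name}"
    if proc.returncode != 0:
        return False, f"exit code {proc.returncode}"
    return True, proc.stdout.strip()
```

A binary that prints its expected output and then dies on a fatal signal is reported as a failure here, which is the case a summary-only claim such as "output visible" would miss.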

Friction points

  1. Scope mismatch — no concept of loop milestones: The reviewer treated every pending task as an "unjustified deferral" (reported as high as 13 in one review) and refused COMPLETE until all 30 tasks were done. Even after Rounds 0-8 completed 18 tasks (Phases 1-4), the reviewer still blocked with "3 ACs not met, 12 tasks remaining." A 30-task plan is structurally infeasible for a single RLCR session with 42 max iterations.

  2. Stagnation detector conflates topic recurrence with no progress: Rounds 14-16 and 20-22 were marked STALLED because "the same topic keeps appearing," but each round produced substantial new code (one round added 517 lines of infrastructure). The circuit breaker triggered on Round 23 despite that round implementing the most architecturally significant change in the session. Topic recurrence ≠ stagnation when code delta is high.

  3. Reviewer directives create ordering deadlocks: The reviewer demanded task A (execution manager state) before task B (device access dispatch), but task B was the actual technical blocker — all execution crashed without it, making task A meaningless. This created a 10-round friction loop where the implementer chose the technically necessary path and the reviewer marked it as "deferring the mainline." Neither side could resolve the disagreement within the current protocol.

  4. Close-out claims drift on both sides: The implementer repeatedly over-claimed ("all 3 blocking issues resolved" when only 2 were fixed). The reviewer repeatedly raised the completeness bar between rounds (from "add interrupt routing" to "routing must be level-triggered with source-level resample on complete"). Neither side had a stable per-task "done" definition.

  5. Plans missed emergent architectural work: Three cross-cutting architectural concerns — not present in any of the 30 original tasks — consumed ~12 of 23 rounds. The plan's dependency graph was correct for known tasks but couldn't predict emergent work. One task required an execution bridge abstraction (3 rounds), another required device access in generated code (2 rounds), and a third required an ownership model redesign (7 rounds).

  6. Task granularity was wildly uneven: Task sizes ranged from a 20-line single-struct addition, through an 84-file mechanical rename, to a task that expanded into 5 sub-problems over 12 rounds. The reviewer evaluated all tasks equally, penalizing the underscoped large task for 12 consecutive rounds of "not done."

  7. Dependency graph couldn't capture cross-phase emergent dependencies: The plan declared a Phase 5 → Phase 7 dependency, but the real dependency chain included 2 unlisted intermediate tasks discovered during implementation. The linear phase model caused the reviewer to repeatedly ask "why isn't Phase 7 done?" when the actual blockers were architectural gaps not in the plan.
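
To make friction point 2 concrete, here is a minimal sketch of a stagnation check that looks at code delta as well as topic recurrence. Everything in it is an assumption for illustration: `RoundResult`, `classify_round`, `circuit_breaker_tripped`, and the 100-line threshold are hypothetical and do not describe how the current detector is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundResult:
    topic: str          # dominant topic of the round (assumed to be given)
    commits: int        # commits produced in the round
    lines_changed: int  # insertions + deletions in the round


def classify_round(previous_topics, result, min_delta=100):
    """Return "STALLED" or "ADVANCED" for one round.

    Assumed rule: a round with no commits is a stall; a repeated topic
    is a stall only when the code delta is below `min_delta`.
    """
    if result.commits == 0:
        return "STALLED"
    if result.topic in previous_topics and result.lines_changed < min_delta:
        return "STALLED"
    return "ADVANCED"


def circuit_breaker_tripped(verdicts, limit=3):
    """True when the last `limit` verdicts are all STALLED."""
    return len(verdicts) >= limit and all(v == "STALLED" for v in verdicts[-limit:])
```

Under this rule a round that revisits an earlier topic but adds 517 lines counts as ADVANCED, so it would not contribute to the three-in-a-row count that trips the breaker.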

Suggested Improvements

| # | Suggestion | Mechanism |
|---|------------|-----------|
| 1 | Allow plans to define explicit loop milestones | Plans can partition ACs into "this loop" vs "future loops." The reviewer evaluates against the current milestone, not the full plan. |
| 2 | Track code delta alongside topic recurrence in stagnation detection | No-commit rounds are stalls; high-delta rounds on the same topic are iterative refinement. Add a "deep iteration" mode for complex subsystem work that relaxes the topic-recurrence heuristic. |
| 3 | Allow implementer to override reviewer ordering with rationale | Round contracts can include a [rationale] field to explicitly disagree with reviewer ordering. The reviewer evaluates the rationale rather than auto-rejecting deviations. Alternatively, reviewer ordering preferences become [recommended] rather than [required]. |
| 4 | Define per-task acceptance tests at plan creation time | Each task gets binary completion criteria (e.g., specific test names that must pass). The reviewer cannot raise new requirements after acceptance criteria are set; new findings become new tasks. |
| 5 | Add "architecture risk" section to plans | Plans should list areas where the implementation approach is uncertain. These become explicit "spike" tasks. Support mid-loop plan amendments for adding newly discovered tasks without the reviewer treating them as scope creep. |
| 6 | Classify tasks by estimated complexity | Plans tag tasks as [S] (hours), [M] (day), [L] (multi-day), [XL] (architecture-level). XL tasks must decompose before the loop starts. The reviewer expects XL tasks to evolve and doesn't penalize intermediate completion states. |
| 7 | Support dynamic dependency insertion | When an architectural discovery creates a new implicit task, it should be formally added to the dependency graph rather than absorbed into an existing task's ever-expanding scope. |
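
Suggestions 1, 6, and 7 could share one plan data model. The sketch below is only one possible shape, assuming Python 3.9+; `Task`, `Plan`, `insert_blocker`, and the field names are hypothetical and are not part of any existing RLCR plan format.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    size: str = "M"      # one of "S", "M", "L", "XL" (suggestion 6)
    milestone: int = 1   # loop milestone this task belongs to (suggestion 1)
    depends_on: set[str] = field(default_factory=set)
    done: bool = False


class Plan:
    def __init__(self, tasks: list[Task]) -> None:
        self.tasks = {t.task_id: t for t in tasks}

    def milestone_complete(self, milestone: int) -> bool:
        """Evaluate completion against one milestone, not the full plan."""
        scoped = [t for t in self.tasks.values() if t.milestone == milestone]
        return bool(scoped) and all(t.done for t in scoped)

    def insert_blocker(self, new_task: Task, blocked_id: str) -> None:
        """Add a newly discovered task as a dependency of `blocked_id` (suggestion 7)."""
        if new_task.task_id in self.tasks:
            raise ValueError(f"duplicate task id: {new_task.task_id}")
        if blocked_id not in self.tasks:
            raise KeyError(blocked_id)
        self.tasks[new_task.task_id] = new_task
        self.tasks[blocked_id].depends_on.add(new_task.task_id)

    def ready(self) -> list[str]:
        """Unfinished tasks whose dependencies are all done."""
        return sorted(
            t.task_id
            for t in self.tasks.values()
            if not t.done and all(self.tasks[d].done for d in t.depends_on)
        )

    def needs_decomposition(self) -> list[str]:
        """XL tasks that should be split before the loop starts."""
        return sorted(t.task_id for t in self.tasks.values() if t.size == "XL")
```

With this shape, an emergent task such as an execution bridge becomes its own node that blocks the later phase, so a question like "why isn't Phase 7 done?" can be answered from the graph instead of from an existing task's growing scope.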

Quantitative Summary

| Metric | Value |
|--------|-------|
| Total rounds | 23 |
| Exit reason | Circuit breaker (3 consecutive STALLED verdicts) |
| ADVANCED verdicts | 17 |
| STALLED verdicts | 5 |
| Tasks completed (reviewer verified) | 13/30 |
| Commits | 37 |
| Lines inserted | ~13,700 |
| Lines deleted | ~6,300 |
| Tests at end | 900+ passing |
| Critical bugs caught by reviewer | 4 |
| Rounds consumed by emergent architecture | ~12/23 (52%) |
| Ordering deadlock rounds | ~10 |
| RLCR artifacts produced | 24 reviews + 24 summaries + 20 contracts = 1.3 MB |
