Problem
The continuous improvement loop (--loop) works but has several rough edges that limit its effectiveness over long runs:
- Stuck detection is too coarse — only triggers after 3 consecutive reverts in the same FEEC category. Three reverts across different categories don't trigger it, even though you're clearly stuck
- FEEC heuristic doesn't learn — uses static keyword matching to categorize hypotheses. Doesn't incorporate outcome history (e.g., "FIX hypotheses succeed 80% of the time here, EXPLORE only 40%")
- Backlog is passive — items are plain text with no scoring, prioritization, or dependency tracking. Loop mode ignores the backlog entirely
- ACE playbooks are injected once at agent startup — no mid-cycle adaptation. If Builder violates playbook rules 3 times in a row, CEO doesn't know until the cycle ends
- Cycle state expires after 24 hours — legitimate long cycles get abandoned. No cost tracking across cycles
- No loop-level telemetry — can't see keep rate, cost efficiency, or hypothesis completion rate across cycles
What's needed
Smarter stuck detection
- Trigger on N consecutive reverts regardless of category (the current same-category requirement is too narrow)
- Add velocity metric: if revert rate > 60% over last 10 experiments, pause and suggest course correction
- Make stuck signals visible in the CEO prompt so it can adapt (see the sketch after this list)
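
A minimal sketch of what category-agnostic stuck detection plus the velocity check could look like. This is illustrative, not existing code: the `StuckDetector` name, the verdict-stream shape, and the default thresholds are all assumptions.

```python
from collections import deque

class StuckDetector:
    """Hypothetical sketch: counts reverts across ALL FEEC categories."""

    def __init__(self, stuck_after: int = 3, window: int = 10, max_revert_rate: float = 0.6):
        self.stuck_after = stuck_after            # N consecutive reverts, any category
        self.max_revert_rate = max_revert_rate    # velocity threshold over the window
        self.consecutive_reverts = 0
        self.recent = deque(maxlen=window)        # True = experiment was reverted

    def record(self, reverted: bool) -> str | None:
        """Record one experiment verdict; return a stuck signal or None."""
        self.recent.append(reverted)
        self.consecutive_reverts = self.consecutive_reverts + 1 if reverted else 0

        if self.consecutive_reverts >= self.stuck_after:
            return f"stuck: {self.consecutive_reverts} consecutive reverts (any category)"
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if rate > self.max_revert_rate:
                return f"stuck: revert rate {rate:.0%} over last {len(self.recent)} experiments"
        return None
```

The returned signal string could be appended to the CEO prompt so the next planning step sees it.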
FEEC learning
- Track which FEEC categories actually succeed for this project
- Weight recent experiments more heavily (temporal decay)
- Feed success rates back into strategy: "FIX has 75% keep rate, EXPLORE has 40%" → soft bias toward what works
- Don't disable categories; just reorder them by learned effectiveness (see the sketch after this list)
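
One way the temporal decay could be implemented, sketched below. The one-week half-life and the `(category, kept, timestamp)` history format are assumptions, not the project's actual schema.

```python
import math
import time

HALF_LIFE_S = 7 * 24 * 3600  # assumed: an experiment loses half its weight per week

def decayed_keep_rate(history: list[tuple[str, bool, float]], category: str) -> float:
    """Keep rate for one FEEC category, with recent experiments weighted more."""
    now = time.time()
    num = den = 0.0
    for cat, kept, ts in history:
        if cat != category:
            continue
        w = math.exp(-math.log(2) * (now - ts) / HALF_LIFE_S)
        num += w * kept
        den += w
    return num / den if den else 0.5  # no data yet: neutral prior

def rank_categories(history, categories):
    """Reorder (never disable) FEEC categories by learned effectiveness."""
    return sorted(categories, key=lambda c: decayed_keep_rate(history, c), reverse=True)
```

The ranked order can then surface in the CEO prompt ("FIX has 75% keep rate, EXPLORE 40%") as a soft bias; every category stays available to the planner.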
Active backlog management
- Store metadata with backlog items: category, estimated scope, tried count, dependencies
- In loop mode, process 1-2 backlog items per cycle alongside auto-detected work
- Graduate completed items to an archive; mark items that have been tried 3x and reverted 3x as "needs research first" (see the sketch after this list)
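
A sketch of the item metadata and the per-cycle selection; the field names and the `BacklogItem` shape are hypothetical, not an existing schema.

```python
from dataclasses import dataclass, field

@dataclass
class BacklogItem:
    title: str
    category: str                        # FEEC category
    scope: str = "small"                 # estimated scope: small / medium / large
    tried: int = 0                       # attempts so far
    reverted: int = 0                    # attempts that ended in a revert
    depends_on: list[str] = field(default_factory=list)
    status: str = "open"                 # open / done / needs-research

def pick_for_cycle(backlog: list[BacklogItem], done: set[str], n: int = 2) -> list[BacklogItem]:
    """Pick up to n unblocked items; park repeat offenders for research."""
    ready = []
    for item in backlog:
        if item.tried >= 3 and item.reverted >= 3:
            item.status = "needs-research"    # stop burning cycles on it
            continue
        if item.status == "open" and all(d in done for d in item.depends_on):
            ready.append(item)
    return ready[:n]
```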
Tighter feedback loops
- After each agent completes, check alignment with playbook bullets in real time
- Mid-cycle cost tracking: if cost_spent > 70% of budget, skip EXPLORE and prioritize FIX only
- Replace the 24h cycle TTL with a task-based TTL: the cycle lives until all planned hypotheses have verdicts (see the sketch after this list)
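
The budget gate and task-based TTL might look like this; the function names, the hypothesis dict shape, and the category list argument are assumptions.

```python
BUDGET_GATE = 0.7  # past 70% of budget, stop exploring

def allowed_categories(cost_spent: float, budget: float, all_cats: list[str]) -> list[str]:
    """Mid-cycle gate: over the threshold, only FIX hypotheses may run."""
    if budget > 0 and cost_spent / budget > BUDGET_GATE:
        return ["FIX"]
    return all_cats

def cycle_expired(hypotheses: list[dict]) -> bool:
    """Task-based TTL: the cycle ends only when every planned hypothesis
    has a verdict, instead of after a fixed 24 hours."""
    return all(h.get("verdict") is not None for h in hypotheses)
```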
Loop telemetry
- Per-cycle metrics: hypotheses planned/completed, keep rate, avg score delta, cost, duration
- Rolling health indicators on the dashboard
- Cost efficiency metric: score_delta per dollar spent (see the sketch after this list)
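
A sketch of the per-cycle record these metrics could live in; field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CycleMetrics:
    hypotheses_planned: int
    hypotheses_completed: int
    kept: int                 # experiments that survived (were not reverted)
    score_delta: float        # net quality-score change this cycle
    cost_usd: float
    duration_s: float

    @property
    def keep_rate(self) -> float:
        return self.kept / self.hypotheses_completed if self.hypotheses_completed else 0.0

    @property
    def cost_efficiency(self) -> float:
        """score_delta per dollar spent: the headline efficiency number."""
        return self.score_delta / self.cost_usd if self.cost_usd else 0.0
```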