Auto-improve loop refinement and enhancement #208

@akashgit

Problem

The continuous improvement loop (--loop) works but has several rough edges that limit its effectiveness over long runs:

  1. Stuck detection is too coarse: it only triggers after 3 consecutive reverts in the same FEEC category. Three reverts across different categories don't trigger it, even though the loop is clearly stuck
  2. FEEC heuristic doesn't learn — uses static keyword matching to categorize hypotheses. Doesn't incorporate outcome history (e.g., "FIX hypotheses succeed 80% of the time here, EXPLORE only 40%")
  3. Backlog is passive — items are plain text with no scoring, prioritization, or dependency tracking. Loop mode ignores the backlog entirely
  4. ACE playbooks are injected once at agent startup — no mid-cycle adaptation. If Builder violates playbook rules 3 times in a row, CEO doesn't know until the cycle ends
  5. Cycle state expires after 24 hours — legitimate long cycles get abandoned. No cost tracking across cycles
  6. No loop-level telemetry — can't see keep rate, cost efficiency, or hypothesis completion rate across cycles

What's needed

Smarter stuck detection

  • Trigger on N consecutive reverts regardless of category (the current same-category requirement is too narrow)
  • Add velocity metric: if revert rate > 60% over last 10 experiments, pause and suggest course correction
  • Make stuck signals visible in CEO prompt so it can adapt
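
A minimal sketch of the two stuck signals described above, assuming each experiment ends with a verdict string of either "keep" or "revert". The `StuckDetector` class, its field names, and the default of N = 3 are hypothetical and not taken from the existing codebase; the 60% threshold and 10-experiment window come from the bullets above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StuckDetector:
    # All names and defaults here are illustrative assumptions.
    max_consecutive_reverts: int = 3   # "N" in the proposal; 3 is assumed
    window: int = 10                   # experiments considered for velocity
    max_revert_rate: float = 0.6       # pause when the rate exceeds this
    _recent: deque = field(default_factory=deque, init=False)
    _consecutive: int = field(default=0, init=False)

    def record(self, verdict: str) -> None:
        """Record one experiment outcome, ignoring its FEEC category."""
        reverted = verdict == "revert"
        self._consecutive = self._consecutive + 1 if reverted else 0
        self._recent.append(reverted)
        if len(self._recent) > self.window:
            self._recent.popleft()

    def signals(self) -> list[str]:
        """Return human-readable stuck signals, e.g. for the CEO prompt."""
        out = []
        if self._consecutive >= self.max_consecutive_reverts:
            out.append(f"{self._consecutive} consecutive reverts")
        # Only judge velocity once a full window of experiments exists.
        if len(self._recent) == self.window:
            rate = sum(self._recent) / self.window
            if rate > self.max_revert_rate:
                out.append(
                    f"revert rate {rate:.0%} over last {self.window} experiments"
                )
        return out
```

The list returned by `signals()` could be appended to the CEO prompt as-is; an empty list means neither condition fired.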

FEEC learning

  • Track which FEEC categories actually succeed for this project
  • Weight recent experiments more heavily (temporal decay)
  • Feed success rates back into strategy: "FIX has 75% keep rate, EXPLORE has 40%" → soft bias toward what works
  • Don't disable categories, just reorder by learned effectiveness
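
One way the learned ordering could work, sketched under assumptions: experiment history is a list of `(category, kept)` pairs ordered oldest first, and recency is modelled with a simple exponential decay. The function names, the decay factor, and the 0.5 prior for categories with no history are all hypothetical choices, not part of the current implementation.

```python
from collections import defaultdict


def decayed_keep_rates(history, decay=0.8):
    """Compute a recency-weighted keep rate per category.

    history: list of (category, kept) tuples, oldest first.
    The newest experiment has weight 1, the one before it `decay`, and so on.
    """
    kept_weight = defaultdict(float)
    total_weight = defaultdict(float)
    n = len(history)
    for i, (category, kept) in enumerate(history):
        weight = decay ** (n - 1 - i)
        total_weight[category] += weight
        if kept:
            kept_weight[category] += weight
    return {c: kept_weight[c] / total_weight[c] for c in total_weight}


def rank_categories(categories, rates, prior=0.5):
    """Reorder categories by learned keep rate without dropping any.

    Categories with no history fall back to `prior` (an assumed neutral value).
    """
    return sorted(categories, key=lambda c: rates.get(c, prior), reverse=True)
```

Because `rank_categories` only sorts, every category stays available; the learned rates act as a soft bias rather than a filter.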

Active backlog management

  • Store metadata with backlog items: category, estimated scope, tried count, dependencies
  • In loop mode, process 1-2 backlog items per cycle alongside auto-detected work
  • Graduate completed items to archive. Mark items that have been tried and reverted 3 times as "needs research first"
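
A rough sketch of what a backlog item with metadata might look like, along with selection and graduation logic. The `BacklogItem` fields, the status strings other than "needs research first", and both helper functions are assumptions made for illustration; dependencies are referenced by title purely to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class BacklogItem:
    # Field names are hypothetical; they mirror the metadata listed above.
    title: str
    category: str
    estimated_scope: str
    tried_count: int = 0
    reverted_count: int = 0
    depends_on: list[str] = field(default_factory=list)
    status: str = "open"  # "open", "archived", or "needs research first"


def record_outcome(item: BacklogItem, kept: bool, max_reverts: int = 3) -> None:
    """Update an item after an attempt: archive on keep, flag after 3 reverts."""
    item.tried_count += 1
    if kept:
        item.status = "archived"
        return
    item.reverted_count += 1
    if item.reverted_count >= max_reverts:
        item.status = "needs research first"


def pick_for_cycle(backlog: list[BacklogItem], limit: int = 2) -> list[BacklogItem]:
    """Pick up to `limit` open items whose dependencies are all archived."""
    done = {item.title for item in backlog if item.status == "archived"}
    ready = [
        item
        for item in backlog
        if item.status == "open" and all(dep in done for dep in item.depends_on)
    ]
    # Prefer items that have been attempted least; ties keep backlog order.
    ready.sort(key=lambda item: item.tried_count)
    return ready[:limit]
```

Sorting by `tried_count` is only one possible priority; a real scoring function could also take category keep rates and estimated scope into account.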

Tighter feedback loops

  • After each agent completes, check its output against the playbook bullets in real time
  • Mid-cycle cost tracking: if cost_spent > 70% of budget, skip EXPLORE and prioritize FIX only
  • Replace 24h cycle TTL with task-based TTL (cycle lives until all planned hypotheses have verdicts)
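
The budget gate and the task-based TTL could be sketched roughly as follows. Both function names and the shape of the `verdicts` mapping (hypothesis id to verdict, with `None` meaning "no verdict yet") are assumptions; only `cost_spent`, the 70% threshold, and the FIX/EXPLORE category names come from the text above.

```python
def allowed_categories(categories, cost_spent, budget, threshold=0.7):
    """Restrict work to FIX once spending passes `threshold` of the budget.

    Below the threshold (or with no budget set) every category is allowed.
    """
    if budget > 0 and cost_spent > threshold * budget:
        return [c for c in categories if c == "FIX"]
    return list(categories)


def cycle_is_live(verdicts):
    """Task-based TTL: a cycle stays alive while any hypothesis lacks a verdict.

    verdicts: dict mapping hypothesis id -> verdict string, or None if pending.
    """
    return any(v is None for v in verdicts.values())
```

With this shape, a cycle that takes longer than 24 hours is kept as long as `cycle_is_live` returns `True`, and it expires as soon as the last planned hypothesis receives a verdict.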

Loop telemetry

  • Per-cycle metrics: hypotheses planned/completed, keep rate, avg score delta, cost, duration
  • Rolling health indicators on the dashboard
  • Cost efficiency metric: score_delta per dollar spent
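
As an illustration of the per-cycle metrics, the sketch below derives the rates from raw counts. The `CycleMetrics` class and its field names are hypothetical, and it assumes that cost is tracked in dollars and that a total score delta per cycle is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleMetrics:
    # Raw per-cycle counts; all field names are illustrative assumptions.
    planned: int
    completed: int
    kept: int
    total_score_delta: float
    cost_usd: float
    duration_s: float

    @property
    def completion_rate(self) -> float:
        """Share of planned hypotheses that reached a verdict."""
        return self.completed / self.planned if self.planned else 0.0

    @property
    def keep_rate(self) -> float:
        """Share of completed hypotheses that were kept."""
        return self.kept / self.completed if self.completed else 0.0

    @property
    def avg_score_delta(self) -> float:
        """Average score change per completed hypothesis."""
        return self.total_score_delta / self.completed if self.completed else 0.0

    @property
    def score_delta_per_dollar(self) -> Optional[float]:
        """Cost efficiency; None when no cost was recorded."""
        return self.total_score_delta / self.cost_usd if self.cost_usd else None
```

Rolling health indicators for the dashboard could then be computed by averaging these properties over the most recent cycles.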
