Problem
The continuous improvement loop (--loop) works but has several rough edges that limit its effectiveness over long runs:
- Stuck detection is too coarse — only triggers after 3 consecutive reverts in the same FEEC category. Three reverts across different categories don't trigger it, even though you're clearly stuck
- FEEC heuristic doesn't learn — uses static keyword matching to categorize hypotheses. Doesn't incorporate outcome history (e.g., "FIX hypotheses succeed 80% of the time here, EXPLORE only 40%")
- Backlog is passive — items are plain text with no scoring, prioritization, or dependency tracking. Loop mode ignores the backlog entirely
- ACE playbooks are injected once at agent startup — no mid-cycle adaptation. If Builder violates playbook rules 3 times in a row, CEO doesn't know until the cycle ends
- Cycle state expires after 24 hours — legitimate long cycles get abandoned. No cost tracking across cycles
- No loop-level telemetry — can't see keep rate, cost efficiency, or hypothesis completion rate across cycles
What's needed
Smarter stuck detection
- Trigger on N consecutive reverts regardless of category (the current same-category requirement is too narrow)
- Add velocity metric: if revert rate > 60% over last 10 experiments, pause and suggest course correction
- Make stuck signals visible in the CEO prompt so it can adapt (see the sketch after this list)
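
A minimal sketch of what category-agnostic stuck detection plus the velocity check could look like. This is illustrative, not existing code: the `StuckDetector` name, the verdict-stream shape, and the default thresholds are all assumptions.

```python
from collections import deque

class StuckDetector:
    """Hypothetical sketch: counts reverts across ALL FEEC categories."""

    def __init__(self, stuck_after: int = 3, window: int = 10, max_revert_rate: float = 0.6):
        self.stuck_after = stuck_after            # N consecutive reverts, any category
        self.max_revert_rate = max_revert_rate    # velocity threshold over the window
        self.consecutive_reverts = 0
        self.recent = deque(maxlen=window)        # True = experiment was reverted

    def record(self, reverted: bool) -> str | None:
        """Record one experiment verdict; return a stuck signal or None."""
        self.recent.append(reverted)
        self.consecutive_reverts = self.consecutive_reverts + 1 if reverted else 0

        if self.consecutive_reverts >= self.stuck_after:
            return f"stuck: {self.consecutive_reverts} consecutive reverts (any category)"
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if rate > self.max_revert_rate:
                return f"stuck: revert rate {rate:.0%} over last {len(self.recent)} experiments"
        return None
```

The returned signal string could be appended to the CEO prompt so the next planning step sees it.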
FEEC learning
- Track which FEEC categories actually succeed for this project
- Weight recent experiments more heavily (temporal decay)
- Feed success rates back into strategy: "FIX has 75% keep rate, EXPLORE has 40%" → soft bias toward what works
- Don't disable categories; just reorder them by learned effectiveness (see the sketch after this list)
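
One way the temporal decay could be implemented, sketched below. The one-week half-life and the `(category, kept, timestamp)` history format are assumptions, not the project's actual schema.

```python
import math
import time

HALF_LIFE_S = 7 * 24 * 3600  # assumed: an experiment loses half its weight per week

def decayed_keep_rate(history: list[tuple[str, bool, float]], category: str) -> float:
    """Keep rate for one FEEC category, with recent experiments weighted more."""
    now = time.time()
    num = den = 0.0
    for cat, kept, ts in history:
        if cat != category:
            continue
        w = math.exp(-math.log(2) * (now - ts) / HALF_LIFE_S)
        num += w * kept
        den += w
    return num / den if den else 0.5  # no data yet: neutral prior

def rank_categories(history, categories):
    """Reorder (never disable) FEEC categories by learned effectiveness."""
    return sorted(categories, key=lambda c: decayed_keep_rate(history, c), reverse=True)
```

The ranked order can then surface in the CEO prompt ("FIX has 75% keep rate, EXPLORE 40%") as a soft bias; every category stays available to the planner.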
Active backlog management
- Store metadata with backlog items: category, estimated scope, tried count, dependencies
- In loop mode, process 1-2 backlog items per cycle alongside auto-detected work
- Graduate completed items to an archive; mark items that have been tried 3x and reverted 3x as "needs research first" (see the sketch after this list)
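
A sketch of the item metadata and the per-cycle selection; the field names and the `BacklogItem` shape are hypothetical, not an existing schema.

```python
from dataclasses import dataclass, field

@dataclass
class BacklogItem:
    title: str
    category: str                        # FEEC category
    scope: str = "small"                 # estimated scope: small / medium / large
    tried: int = 0                       # attempts so far
    reverted: int = 0                    # attempts that ended in a revert
    depends_on: list[str] = field(default_factory=list)
    status: str = "open"                 # open / done / needs-research

def pick_for_cycle(backlog: list[BacklogItem], done: set[str], n: int = 2) -> list[BacklogItem]:
    """Pick up to n unblocked items; park repeat offenders for research."""
    ready = []
    for item in backlog:
        if item.tried >= 3 and item.reverted >= 3:
            item.status = "needs-research"    # stop burning cycles on it
            continue
        if item.status == "open" and all(d in done for d in item.depends_on):
            ready.append(item)
    return ready[:n]
```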
Tighter feedback loops
- After each agent completes, check alignment with playbook bullets in real time
- Mid-cycle cost tracking: if cost_spent > 70% of budget, skip EXPLORE and prioritize FIX only
- Replace the 24h cycle TTL with a task-based TTL: the cycle lives until all planned hypotheses have verdicts (see the sketch after this list)
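
The budget gate and task-based TTL might look like this; the function names, the hypothesis dict shape, and the category list argument are assumptions.

```python
BUDGET_GATE = 0.7  # past 70% of budget, stop exploring

def allowed_categories(cost_spent: float, budget: float, all_cats: list[str]) -> list[str]:
    """Mid-cycle gate: over the threshold, only FIX hypotheses may run."""
    if budget > 0 and cost_spent / budget > BUDGET_GATE:
        return ["FIX"]
    return all_cats

def cycle_expired(hypotheses: list[dict]) -> bool:
    """Task-based TTL: the cycle ends only when every planned hypothesis
    has a verdict, instead of after a fixed 24 hours."""
    return all(h.get("verdict") is not None for h in hypotheses)
```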
Loop telemetry
- Per-cycle metrics: hypotheses planned/completed, keep rate, avg score delta, cost, duration
- Rolling health indicators on the dashboard
- Cost efficiency metric: score_delta per dollar spent (see the sketch after this list)
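
A sketch of the per-cycle record these metrics could live in; field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CycleMetrics:
    hypotheses_planned: int
    hypotheses_completed: int
    kept: int                 # experiments that survived (were not reverted)
    score_delta: float        # net quality-score change this cycle
    cost_usd: float
    duration_s: float

    @property
    def keep_rate(self) -> float:
        return self.kept / self.hypotheses_completed if self.hypotheses_completed else 0.0

    @property
    def cost_efficiency(self) -> float:
        """score_delta per dollar spent: the headline efficiency number."""
        return self.score_delta / self.cost_usd if self.cost_usd else 0.0
```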