Problem
Today all agents use the same model/runner regardless of task complexity. The Researcher gets the same model as the Archivist. There's no cost tracking per agent, no latency analysis, and no feedback loop from experiment outcomes to routing decisions.
Current state:
- Model is globally set via env var or `--runner` flag — all agents get the same one
- Bob runner tracks invocation counts and duration, but Claude runner tracks nothing
- No token usage or cost attribution per agent role
- No correlation between model choice and experiment success
- FEEC strategy doesn't consider cost or time — just keyword-based priority
What's needed
The CEO should learn over time how to route work to minimize cost and latency while maintaining accuracy:
- Telemetry layer — track tokens, cost, latency, and outcome per (agent role, task type, model) tuple. Extend the existing events.jsonl infrastructure
- Routing statistics — build up a table: for each (role, hypothesis category), which model has the best success rate, cost, and latency?
- Smart routing decisions — before spawning an agent, the CEO (or a dedicated router) selects a model based on:
  - Task complexity (scope of hypothesis, number of files likely affected)
  - Budget remaining in the cycle
  - Historical success rate for this role + task type + model combo
  - Latency constraints (is this blocking the critical path?)
- Graceful degradation — when approaching budget ceiling, automatically downshift to cheaper models for non-critical roles (Archivist, Researcher on simple lookups)
- ACE integration — feed routing statistics into playbook evolution so routing preferences are learned cross-project
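As a concrete starting point for the telemetry layer, each agent invocation could append one record to the existing `events.jsonl`. This is a sketch only: the field names (`tokens_in`, `cost_usd`, `latency_s`, and so on) and the `agent_invocation` event kind are assumptions for illustration, not the actual schema.

```python
import json
import time
from pathlib import Path

def log_routing_event(path, role, task_type, model, tokens_in, tokens_out,
                      cost_usd, latency_s, outcome):
    """Append one telemetry record per agent invocation to events.jsonl.

    Field names below are hypothetical -- the real schema would extend
    whatever events.jsonl already contains.
    """
    event = {
        "ts": time.time(),
        "kind": "agent_invocation",   # assumed event kind for routing telemetry
        "role": role,                 # e.g. "builder", "researcher"
        "task_type": task_type,       # e.g. "simple_bugfix", "deep_analysis"
        "model": model,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "cost_usd": cost_usd,
        "latency_s": latency_s,
        "outcome": outcome,           # "success" | "failure"
    }
    with Path(path).open("a") as f:
        f.write(json.dumps(event) + "\n")
```

Keeping the record keyed by (role, task_type, model) is what makes the routing-statistics table below derivable from raw events.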
This could be implemented as routing logic in `runner.py`, or as a dedicated Router agent that the CEO consults before each spawn.
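Whichever home it lands in, the decision itself might look like the sketch below. The model tiers, the 0.8 success-rate bar, the budget floor, and the shape of the `stats` table are all assumptions chosen to illustrate the bullet points above, not a proposed final policy.

```python
# Hypothetical model tiers -- names are placeholders, not a fixed mapping.
CHEAP, MID, EXPENSIVE = "haiku", "sonnet", "opus"

def choose_model(role, task_type, stats, budget_remaining, complexity,
                 budget_floor=0.2, min_samples=5):
    """Pick a model for (role, task_type) using historical stats.

    stats maps (role, task_type, model) -> {"success_rate": float, "n": int},
    budget_remaining is the fraction of the cycle budget left (0.0-1.0).
    """
    # Graceful degradation: near the budget ceiling, non-critical roles
    # downshift to the cheapest tier regardless of history.
    if budget_remaining < budget_floor and role in ("archivist", "researcher"):
        return CHEAP
    # Prefer the cheapest model whose historical success rate clears a bar,
    # but only once there is enough evidence (min_samples invocations).
    for model in (CHEAP, MID, EXPENSIVE):
        entry = stats.get((role, task_type, model))
        if entry and entry["n"] >= min_samples and entry["success_rate"] >= 0.8:
            return model
    # No track record yet: fall back to routing by task complexity alone.
    return EXPENSIVE if complexity == "high" else MID
```

The cheapest-first scan is the part that lets routing improve over time: as telemetry accumulates, cheap models earn tasks they demonstrably handle well, and expensive models remain the default only where evidence is thin.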
Example routing decisions
| Role | Task Type | Route To | Why |
|------|-----------|----------|-----|
| Builder | Simple bugfix | Haiku/Sonnet | Low complexity, high success rate on cheap models |
| Builder | Complex feature | Opus | Needs deep reasoning, worth the cost |
| Reviewer | Guard check | Opus | Precision matters, false negatives are expensive |
| Archivist | Record keeping | Haiku | Append-only, minimal reasoning needed |
| Researcher | Deep analysis | Opus | Thorough research pays off |
| Researcher | Quick lookup | Sonnet | Simple web search, fast turnaround |
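A table like the one above would not be hand-maintained; it could be derived by aggregating logged events into per-(role, task type, model) statistics. The sketch below assumes each `events.jsonl` line carries the illustrative fields named earlier (`kind`, `role`, `task_type`, `model`, `cost_usd`, `latency_s`, `outcome`).

```python
import json
from collections import defaultdict

def build_routing_stats(events_path):
    """Aggregate events.jsonl into per-(role, task_type, model) statistics."""
    acc = defaultdict(lambda: {"n": 0, "successes": 0,
                               "cost": 0.0, "latency": 0.0})
    with open(events_path) as f:
        for line in f:
            e = json.loads(line)
            if e.get("kind") != "agent_invocation":  # assumed event kind
                continue
            a = acc[(e["role"], e["task_type"], e["model"])]
            a["n"] += 1
            a["successes"] += e["outcome"] == "success"
            a["cost"] += e["cost_usd"]
            a["latency"] += e["latency_s"]
    return {
        key: {
            "success_rate": a["successes"] / a["n"],
            "avg_cost_usd": a["cost"] / a["n"],
            "avg_latency_s": a["latency"] / a["n"],
            "n": a["n"],
        }
        for key, a in acc.items()
    }
```

The output shape matches the `stats` table a router would consult, and the same aggregates are what ACE integration would feed into playbook evolution.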