Problem
Role system prompts in src/runtime/roles.zig are currently static and hand-tuned. There's no systematic way to evaluate whether a prompt is optimal for its role, or to evolve prompts over time based on actual task performance.
Proposal
Implement an evolutionary prompt optimization loop inspired by quality-diversity (QD) search methods, specifically the Rainbow Teaming approach (arXiv:2402.16822).
Core idea
Cast prompt optimization as a quality-diversity problem:
- Quality: task success rate, code correctness, fix accuracy
- Diversity: coverage across different task types, codebase patterns, failure modes
Use open-ended search to generate prompt variants that are both effective and diverse, maintaining a MAP-Elites style archive of best prompts per niche.
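As a rough illustration of the archive described above, here is a minimal Python sketch. It is not the project's implementation (which would live in Zig); the `Niche` tuple, the `Elite` and `Archive` names, and the `offer` method are all hypothetical. It shows only the core MAP-Elites rule: each niche keeps whichever prompt has scored highest so far.

```python
from dataclasses import dataclass, field

# Hypothetical niche descriptor: (task category, codebase size bucket).
Niche = tuple[str, str]


@dataclass
class Elite:
    prompt: str
    quality: float  # assumed to be something like task success rate in [0, 1]


@dataclass
class Archive:
    cells: dict[Niche, Elite] = field(default_factory=dict)

    def offer(self, niche: Niche, prompt: str, quality: float) -> bool:
        """Store the prompt if its niche is empty or it beats the current elite."""
        current = self.cells.get(niche)
        if current is None or quality > current.quality:
            self.cells[niche] = Elite(prompt, quality)
            return True
        return False

    def coverage(self, total_niches: int) -> float:
        """Fraction of niches that currently hold an elite."""
        return len(self.cells) / total_niches
```

Quality decides which prompt wins inside a cell, while the diversity dimensions decide which cell a prompt competes in, so a strong bug-fix prompt never evicts a strong review prompt.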
Implementation sketch
- Prompt genome: each role's system prompt is a "genome" that can be mutated
- Fitness function: run the role on a benchmark task suite, measure success metrics
- Diversity dimensions: task category (bug fix, review, search), codebase size, language features used
- Selection: MAP-Elites archive keeps the highest-quality prompt in each cell, where cells are defined by the diversity dimensions
- Mutation operators: LLM-guided rewriting — "make this prompt better at X while keeping Y"
- Evolution loop: generate variants → evaluate on benchmarks → archive best → repeat
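The steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not a definitive design: `evolve`, `toy_mutate`, and `toy_evaluate` are hypothetical names, the mutation stand-in only appends text where the real operator would call an LLM, and the fitness stand-in is a toy heuristic where the real one would run the benchmark task suite.

```python
import random
from typing import Callable

Niche = tuple[str, str]                    # hypothetical: (task category, codebase size)
Archive = dict[Niche, tuple[str, float]]   # niche -> (best prompt, its quality)


def evolve(
    seed_prompt: str,
    niches: list[Niche],
    mutate: Callable[[str, Niche], str],
    evaluate: Callable[[str, Niche], float],
    iterations: int,
    rng: random.Random,
) -> Archive:
    """Generate variants, evaluate them, and archive the best prompt per niche."""
    archive: Archive = {}
    for _ in range(iterations):
        target = rng.choice(niches)
        # Parent: a random elite if the archive is non-empty, else the seed prompt.
        parent = rng.choice(list(archive.values()))[0] if archive else seed_prompt
        child = mutate(parent, target)
        score = evaluate(child, target)
        incumbent = archive.get(target)
        if incumbent is None or score > incumbent[1]:
            archive[target] = (child, score)
    return archive


def toy_mutate(prompt: str, niche: Niche) -> str:
    # Stand-in for LLM-guided rewriting ("make this prompt better at X").
    return f"{prompt} Focus on {niche[0]} tasks in {niche[1]} codebases."


def toy_evaluate(prompt: str, niche: Niche) -> float:
    # Stand-in for benchmark runs: reward niche mentions, penalize length.
    return prompt.count(niche[0]) - 0.001 * len(prompt)
```

Calling `evolve(seed, niches, toy_mutate, toy_evaluate, 40, random.Random(0))` returns a dictionary with at most one entry per niche. Passing the mutation and fitness functions as parameters keeps the loop independent of any particular model or benchmark harness.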
Integration with grid
The evolved prompts feed back into the grid system:
- grid.zig currently maps role → model tier
- Extend to map role → (model tier, prompt variant ID)
- Store winning prompts in .devswarm/evolved_prompts/ per project
- Fall back to built-in defaults when no evolved prompts exist
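The lookup and fallback behaviour could look something like the Python sketch below. Everything beyond the `.devswarm/evolved_prompts/` directory is an assumption made for illustration: the `<role>/<variant_id>.txt` file layout, the `GRID` and `DEFAULT_PROMPTS` tables, the tier name, and the `resolve` function are hypothetical and do not reflect the actual contents of grid.zig or roles.zig.

```python
from pathlib import Path
from typing import Optional

# Hypothetical built-in defaults; the real prompts live in src/runtime/roles.zig.
DEFAULT_PROMPTS = {"reviewer": "You are a careful code reviewer."}

# Hypothetical extended grid: role -> (model tier, prompt variant ID or None).
GRID: dict[str, tuple[str, Optional[str]]] = {"reviewer": ("standard", "v3")}


def evolved_prompt_path(project_root: Path, role: str, variant_id: str) -> Path:
    # Assumed layout: .devswarm/evolved_prompts/<role>/<variant_id>.txt
    return project_root / ".devswarm" / "evolved_prompts" / role / f"{variant_id}.txt"


def resolve(project_root: Path, role: str) -> tuple[str, str]:
    """Return (model tier, system prompt), preferring an evolved prompt if present."""
    tier, variant_id = GRID[role]
    if variant_id is not None:
        path = evolved_prompt_path(project_root, role, variant_id)
        if path.is_file():
            return tier, path.read_text(encoding="utf-8")
    # No evolved prompt stored for this project: use the built-in default.
    return tier, DEFAULT_PROMPTS[role]
```

Because the fallback is checked per lookup, a project with no `.devswarm/evolved_prompts/` directory behaves exactly as it does today.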
Connects to
Why P0
The swarm now has 12 roles with structured prompts (#352). The next leverage point is making those prompts self-improving rather than hand-tuned. This is the foundation for the entire adaptive agent system.