feat: evolutionary prompt optimization — Rainbow Teaming QD loop for role system prompts #353

@justrach

Problem

Role system prompts in src/runtime/roles.zig are currently static and hand-tuned. There's no systematic way to evaluate whether a prompt is optimal for its role, or to evolve prompts over time based on actual task performance.

Proposal

Implement an evolutionary prompt optimization loop inspired by quality-diversity (QD) search methods, specifically the Rainbow Teaming approach (arXiv:2402.16822).

Core idea

Cast prompt optimization as a quality-diversity problem:

  • Quality: task success rate, code correctness, fix accuracy
  • Diversity: coverage across different task types, codebase patterns, failure modes

Use open-ended search to generate prompt variants that are both effective and diverse, maintaining a MAP-Elites style archive of best prompts per niche.

Implementation sketch

  1. Prompt genome: each role's system prompt is a "genome" that can be mutated
  2. Fitness function: run the role on a benchmark task suite, measure success metrics
  3. Diversity dimensions: task category (bug fix, review, search), codebase size, language features used
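  4. Selection: MAP-Elites archive keeps the highest-fitness prompt in each cell of the diversity grid (cells are indexed by the diversity dimensions; quality decides which prompt occupies a cell)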
  5. Mutation operators: LLM-guided rewriting — "make this prompt better at X while keeping Y"
  6. Evolution loop: generate variants → evaluate on benchmarks → archive best → repeat
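
The steps above can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: it is written in Python for brevity rather than Zig, and `map_elites`, `Elite`, the `Cell` descriptor shape, and the `mutate` / `evaluate` callbacks are all hypothetical names. In practice `mutate` would wrap an LLM rewriting call and `evaluate` would run the benchmark task suite.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical diversity descriptor: (task_category, codebase_size).
Cell = Tuple[str, str]


@dataclass
class Elite:
    prompt: str
    fitness: float


def map_elites(
    seed_prompt: str,
    cells: List[Cell],
    mutate: Callable[[str, Cell], str],      # assumed: LLM-guided rewrite toward a cell
    evaluate: Callable[[str, Cell], float],  # assumed: benchmark score, higher is better
    iterations: int,
    rng: random.Random,
) -> Dict[Cell, Elite]:
    """Keep the best-scoring prompt found so far for each diversity cell."""
    archive: Dict[Cell, Elite] = {}
    for _ in range(iterations):
        # Pick the niche this variant should target.
        target = rng.choice(cells)
        # Pick a parent: a random elite, or the seed prompt while the archive is empty.
        if archive:
            parent = rng.choice(list(archive.values())).prompt
        else:
            parent = seed_prompt
        child = mutate(parent, target)
        score = evaluate(child, target)
        # Replace the incumbent only if the cell is empty or the child scores higher.
        incumbent = archive.get(target)
        if incumbent is None or score > incumbent.fitness:
            archive[target] = Elite(child, score)
    return archive
```

Sampling a target cell first and then mutating toward it loosely follows the Rainbow Teaming setup; a plain MAP-Elites variant would instead mutate freely and compute the child's cell afterwards.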

Integration with grid

The evolved prompts feed back into the grid system:

  • grid.zig currently maps role → model tier
  • Extend to map role → (model tier, prompt variant ID)
  • Store winning prompts in .devswarm/evolved_prompts/ per project
  • Fall back to built-in defaults when no evolved prompts exist
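
The lookup-with-fallback behaviour could look something like the sketch below. It is written in Python for illustration only; the actual change would live in grid.zig. The `resolve_prompt` function, the `BUILTIN_PROMPTS` table, and the `<role>/<variant_id>.txt` file layout under `.devswarm/evolved_prompts/` are assumptions, not something this issue specifies.

```python
from pathlib import Path
from typing import Optional

# Placeholder for the built-in defaults that live in src/runtime/roles.zig.
BUILTIN_PROMPTS = {"reviewer": "You are a code reviewer."}


def resolve_prompt(project_root: Path, role: str, variant_id: Optional[str]) -> str:
    """Return the evolved prompt for (role, variant_id), else the built-in default."""
    if variant_id is not None:
        # Assumed layout: .devswarm/evolved_prompts/<role>/<variant_id>.txt
        candidate = (
            project_root / ".devswarm" / "evolved_prompts" / role / f"{variant_id}.txt"
        )
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    # No variant requested, or the file is missing: use the built-in prompt.
    return BUILTIN_PROMPTS[role]
```

Treating a missing file the same as "no variant configured" keeps projects without evolved prompts working unchanged.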

Connects to

Why P0

The swarm now has 12 roles with structured prompts (#352). The next leverage point is making those prompts self-improving rather than hand-tuned. This is the foundation for the entire adaptive agent system.
