The slow-path agent is CUCo's performance optimization engine. Starting from the fast-path baseline (generation 0), it runs an island-based evolutionary search with LLM-driven mutation to discover high-performance compute-communication kernels.
Module: cuco/core/runner.py
Class: EvolutionRunner
For each generation:
1. Select parent from population (fitness-weighted)
2. Sample archive inspirations + top-k programs
3. Assemble prompt (task msg + parent + inspirations + meta-recommendations)
4. LLM generates mutation (diff / full rewrite / crossover)
5. Apply patch to parent code
6. Novelty check (reject near-duplicates)
7. Submit for evaluation (build → run → score)
8. Store result in database (including failures)
9. Periodically: meta-summarization
10. Periodically: island migration
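The per-generation loop above can be sketched in miniature. This is a hypothetical, in-memory stand-in for EvolutionRunner: every helper name is illustrative, the "mutation" is a random tweak standing in for the LLM, and the novelty check compares exact strings instead of embeddings.

```python
import random

def fitness_weighted_choice(population, rng):
    # Step 1: selection probability proportional to fitness
    weights = [max(p["score"], 1e-9) for p in population]
    return rng.choices(population, weights=weights, k=1)[0]

def mutate(parent_code, rng):
    # Steps 3-5: stand-in for prompt assembly + LLM mutation + patch apply
    return parent_code + f"\n// tweak {rng.randint(0, 999)}"

def is_novel(code, population):
    # Step 6: the real system compares code embeddings, not exact strings
    return all(code != p["code"] for p in population)

def evaluate(code):
    # Step 7: toy score; the real system compiles, runs, and benchmarks
    return float(len(code) % 7)

def run_generation(population, rng):
    parent = fitness_weighted_choice(population, rng)
    child_code = mutate(parent["code"], rng)
    if not is_novel(child_code, population):
        return None                      # rejected near-duplicate
    child = {"code": child_code, "score": evaluate(child_code)}
    population.append(child)             # step 8: store, even failures
    return child

rng = random.Random(0)
population = [{"code": "kernel_v0()", "score": 1.0}]
child = run_generation(population, rng)
```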
Evolution is typically split into two phases:
Phase 1 — Explore (first 40% of budget):
- 70% full rewrites, 15% diffs, 15% crossover
- Higher temperature (0.2, 0.5, 0.8)
- Goal: discover structurally diverse architectures (multi-stream, fused kernel, split put/wait, warp specialization)
Phase 2 — Exploit (remaining 60%):
- 60% full rewrites, 25% diffs, 15% crossover
- Lower temperature (0.0, 0.2, 0.5)
- Goal: refine the best architectures found during exploration
The phase split is controlled by explore_fraction in run_evo.py. The key insight: without initial diversity from the explore phase, the search converges slowly to a lower-quality local optimum.
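The schedule can be sketched as follows. EXPLORE_FRACTION mirrors the role of explore_fraction, and the probability tables restate the percentages listed above; the constant and dict names themselves are illustrative, not actual config keys.

```python
import random

EXPLORE_FRACTION = 0.4
PHASES = {
    "explore": {"probs": {"full": 0.70, "diff": 0.15, "cross": 0.15},
                "temps": [0.2, 0.5, 0.8]},
    "exploit": {"probs": {"full": 0.60, "diff": 0.25, "cross": 0.15},
                "temps": [0.0, 0.2, 0.5]},
}

def phase_for(gen, num_generations):
    """Explore for the first 40% of the budget, then exploit."""
    return "explore" if gen < EXPLORE_FRACTION * num_generations else "exploit"

def sample_mutation(gen, num_generations, rng):
    """Draw a mutation type and sampling temperature for this generation."""
    cfg = PHASES[phase_for(gen, num_generations)]
    kinds, weights = zip(*cfg["probs"].items())
    return rng.choices(kinds, weights=weights, k=1)[0], rng.choice(cfg["temps"])
```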
The LLM acts as a variation operator — not an open-ended code generator. Its output is structurally bounded by the EVOLVE-BLOCK markers.
Localized SEARCH/REPLACE edits within evolve blocks. The LLM proposes specific code changes:
```
<<<< SEARCH
gin.put(world, r, recvwin, offset, sendwin, offset, size, ncclGin_SignalInc(0));
==== REPLACE
gin.put(world, r, recvwin, offset, sendwin, offset, chunk_size, ncclGin_SignalInc(0));
gin.put(world, r, recvwin, offset + chunk_size, sendwin, offset + chunk_size,
        size - chunk_size, ncclGin_SignalInc(1));
>>>>
```
Best for: fine-grained parameter tuning, adding synchronization, reordering operations.
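A minimal applier for this diff format might look like the sketch below. It assumes the exact <<<< SEARCH / ==== REPLACE / >>>> delimiters shown above; the real patch applier may be more tolerant of whitespace and formatting drift.

```python
import re

# Matches one SEARCH/REPLACE hunk; DOTALL lets the bodies span multiple lines.
DIFF_RE = re.compile(r"<<<< SEARCH\n(.*?)\n==== REPLACE\n(.*?)\n>>>>", re.DOTALL)

def apply_diff(code, diff):
    """Apply each hunk exactly once to the parent code."""
    for search, replace in DIFF_RE.findall(diff):
        if search not in code:
            raise ValueError("SEARCH block not found in parent code")
        code = code.replace(search, replace, 1)
    return code
```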
Complete replacement of all code within EVOLVE-BLOCK markers. The LLM generates an entirely new implementation while preserving the frozen interface.
Best for: architectural changes (sequential → pipelined, single-kernel → multi-stream).
Synthesis from multiple archive programs. The LLM receives 2-3 high-performing candidates and combines their best aspects.
Best for: combining complementary strategies (e.g., one program's stream topology with another's synchronization pattern).
Module: cuco/database/parents.py
Three strategies are available:
Default. Programs are ranked by fitness, and selection probability follows a power law distribution. The exploitation_alpha parameter controls selection pressure:
- alpha = 0 → uniform random (maximum exploration)
- alpha = 1 → strong bias toward top programs (maximum exploitation)
Uses a sigmoid-based weighting over the fitness distribution. The parent_selection_lambda parameter controls sharpness — higher values concentrate selection on the best programs.
Maintains the num_beams best programs and selects parents only from this beam. This is the most exploitative strategy.
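As an illustration, the default power-law strategy could be realized with a rank-based weighting such as (r + 1) ** -alpha. This is one plausible form consistent with the alpha semantics above (0 = uniform, larger = more elitist); the implementation's exact formula may differ.

```python
import random

def power_law_select(population, alpha, rng):
    """Rank by fitness (rank 0 = best), weight rank r by (r + 1) ** -alpha."""
    ranked = sorted(population, key=lambda p: p["score"], reverse=True)
    weights = [(r + 1) ** -alpha for r in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]
```

With alpha = 0 every rank gets weight 1 (uniform sampling); as alpha grows, probability mass concentrates on the top-ranked program.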
Module: cuco/database/inspirations.py
Each mutation prompt includes context from other successful programs:
Drawn from a MAP-Elites diversity archive that maintains structurally distinct high-performing solutions. The archive ensures the LLM sees programs outside the current lineage, analogous to crossover across distant population members.
Configuration:
- archive_size — maximum archive capacity
- num_archive_inspirations — how many to include per prompt
- elite_selection_ratio — proportion of archive slots reserved for fitness elites
The highest-scoring programs overall, regardless of structural diversity. Provides the LLM with clear performance targets.
Configuration:
- num_top_k_inspirations — how many to include per prompt
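Combining the two inspiration sources might look like the following sketch; the function and parameter names are illustrative, not the actual API of cuco/database/inspirations.py.

```python
import random

def sample_inspirations(archive, population, num_archive, num_top_k, rng):
    """Mix diversity-archive samples with the global top-k, de-duplicated."""
    from_archive = rng.sample(archive, min(num_archive, len(archive)))
    top_k = sorted(population, key=lambda p: p["score"], reverse=True)[:num_top_k]
    seen, out = set(), []
    for prog in from_archive + top_k:     # archive picks first, then elites
        if prog["id"] not in seen:
            seen.add(prog["id"])
            out.append(prog)
    return out
```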
Module: cuco/core/summarizer.py
Class: MetaSummarizer
Every meta_rec_interval generations, the meta-summarizer runs a three-step LLM pipeline:
- Summarize: Digest the most recent batch of candidates — their scores, mutation types, architectural choices, and evaluation feedback.
- Update scratchpad: Maintain a persistent global scratchpad tracking which strategies have been attempted, which succeeded, and which failed.
- Recommend: Produce a ranked list of concrete optimization directions for the next generation.
These recommendations are injected into subsequent mutation prompts, creating a closed-loop meta-learning signal. Early recommendations may suggest exploring different fusion levels; later recommendations — informed by observing that multi-stream overlap consistently outperforms full fusion for a particular workload — redirect effort toward refining that strategy.
Output is stored in:
- meta_memory.json — persistent state (summaries, scratchpad, recommendations)
- meta_N.txt — human-readable snapshot at generation N
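The three-step pipeline can be sketched with the LLM calls stubbed out. Only the meta_memory.json layout follows the description above; the prompts and function signature are illustrative.

```python
import json

def meta_summarize(batch, scratchpad, llm, gen, path="meta_memory.json"):
    """Three-step pipeline: summarize -> update scratchpad -> recommend."""
    summary = llm(f"Summarize these {len(batch)} candidates: "
                  "scores, mutation types, architecture, feedback")
    scratchpad = llm(f"Update strategy scratchpad.\nOld: {scratchpad}\n"
                     f"New summary: {summary}")
    recommendations = llm("Rank concrete optimization directions given:\n"
                          f"{scratchpad}")
    state = {"generation": gen, "summary": summary,
             "scratchpad": scratchpad, "recommendations": recommendations}
    with open(path, "w") as f:            # persisted across generations
        json.dump(state, f, indent=2)
    return recommendations
```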
Module: cuco/core/novelty_judge.py
Class: NoveltyJudge
To prevent population collapse, candidates are checked for novelty before evaluation:
- Embedding similarity: The candidate's code is embedded (via EmbeddingClient), and cosine similarity is computed against all existing database entries. If similarity exceeds code_embed_sim_threshold (default: 0.995), the candidate is rejected.
- LLM novelty assessment (optional): An LLM judges whether the candidate introduces meaningful structural differences.
Rejected candidates are resampled up to max_novelty_attempts times.
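A sketch of the embedding gate and the resampling loop, assuming embeddings are plain float vectors (the real system obtains them via EmbeddingClient, and the helper names here are hypothetical):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_novel(candidate_emb, db_embeddings, threshold=0.995):
    """Reject if the candidate is too close to any existing entry."""
    return all(cosine(candidate_emb, e) < threshold for e in db_embeddings)

def sample_novel(generate, db_embeddings, max_attempts=3):
    """Resample up to max_novelty_attempts times, then accept the last draw."""
    cand = None
    for _ in range(max_attempts):
        cand = generate()
        if is_novel(cand["embedding"], db_embeddings):
            return cand
    return cand  # budget exhausted: accept the near-duplicate
```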
Module: cuco/database/islands.py
The search uses multiple independent islands to maintain diversity:
Each program belongs to exactly one island. Islands can have different:
- Seed programs (init_program_paths_per_island)
- Task system messages (task_sys_msg_per_island)
- Communication APIs (island_api_types: e.g., island 0 = LSA, island 1 = GIN)
Every migration_interval generations, top-performing programs are copied between islands:
- Elitist migration: Copy the best migration_rate fraction of each island to randomly selected targets
- Directional migration: Follow a configured migration_graph (e.g., LSA island → hybrid island ← GIN island)
Migration cross-pollinates successful patterns without collapsing diversity.
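Elitist migration reduces to a short sketch (names illustrative; the real logic lives in cuco/database/islands.py):

```python
import random

def migrate(islands, migration_rate, rng):
    """Copy each island's best migration_rate fraction to a random other island."""
    updates = []
    for idx, programs in enumerate(islands):
        n = max(1, int(len(programs) * migration_rate))
        elites = sorted(programs, key=lambda p: p["score"], reverse=True)[:n]
        target = rng.choice([t for t in range(len(islands)) if t != idx])
        updates.append((target, [dict(p) for p in elites]))  # copy, don't move
    for target, elites in updates:        # apply after scanning all islands
        islands[target].extend(elites)
```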
Every candidate passes through a three-level cascade:
| Level | Check | Cost | On Failure |
|---|---|---|---|
| L1 | Compile (nvcc) | Seconds | Store with score 0, feed compiler errors to next mutation |
| L2 | Run + verify (mpirun) | Seconds-minutes | Store with score 0, feed runtime errors |
| L3 | Benchmark (best of N runs) | Seconds-minutes | Store with measured score |
Failed candidates are retained in the database with their diagnostics. They serve as negative examples that inform future mutations — a form of explicit negative selection absent from classical evolutionary methods.
At every cascade level, an LLM feedback agent receives the candidate code and evaluation outcome and generates a concise diagnostic. This feedback is stored with the candidate and injected when its lineage is later selected as a parent.
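The cascade itself reduces to a short sketch once the build, run, and benchmark steps are stubbed out; compile_fn, run_fn, and bench_fn stand in for the real nvcc, mpirun, and benchmarking invocations.

```python
def evaluate_cascade(code, compile_fn, run_fn, bench_fn):
    """Three-level cascade: each level only runs if the previous one passed."""
    ok, err = compile_fn(code)             # L1: compile
    if not ok:
        return {"score": 0.0, "feedback": f"compile error: {err}"}
    ok, err = run_fn(code)                 # L2: run + verify
    if not ok:
        return {"score": 0.0, "feedback": f"runtime error: {err}"}
    score = bench_fn(code)                 # L3: best-of-N benchmark
    return {"score": score, "feedback": "ok"}
```

Note that failures still produce a stored result (score 0 plus diagnostics), matching the table above.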
Module: cuco/database/dbase.py
Class: ProgramDatabase
All evaluated candidates — including failures — are persisted to an SQLite database. The database serves two roles:
- Candidate pool: Source for parent selection, archive sampling, and inspiration retrieval across all islands.
- Knowledge base: Backing store for the meta-summarizer, which queries historical results to distill cross-generation patterns.
Each candidate's code embedding enables:
- Novelty filtering: Reject near-duplicates before evaluation
- Nearest-neighbor lookup: Surface structurally similar programs and their feedback
- Clustering: Group candidates by architectural similarity for visualization
The programs table stores:
| Column | Type | Description |
|---|---|---|
id | TEXT | Unique identifier |
code | TEXT | Full source code |
generation | INTEGER | Generation number |
island_idx | INTEGER | Island assignment |
parent_id | TEXT | Parent program ID |
combined_score | REAL | Fitness score |
correct | INTEGER | 0 or 1 |
public_metrics | TEXT | JSON timing data |
text_feedback | TEXT | LLM feedback |
embedding | BLOB | Code embedding vector |
code_diff | TEXT | Mutation diff from parent |
in_archive | INTEGER | Whether in MAP-Elites archive |
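A minimal sketch of this schema (a subset of the columns above) and a typical parent-selection query, using an in-memory SQLite database for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE programs (
    id TEXT PRIMARY KEY, code TEXT, generation INTEGER, island_idx INTEGER,
    parent_id TEXT, combined_score REAL, correct INTEGER, text_feedback TEXT)""")
rows = [("p0", "kernel_v0()", 0, 0, None, 1.2, 1, "baseline"),
        ("p1", "kernel_v1()", 1, 0, "p0", 0.0, 0, "compile error"),
        ("p2", "kernel_v2()", 1, 1, "p0", 1.9, 1, "overlap improved")]
conn.executemany("INSERT INTO programs VALUES (?,?,?,?,?,?,?,?)", rows)

# Failures (correct = 0) remain queryable as negative examples; parent
# selection here just takes the best verified program.
best = conn.execute("""SELECT id, combined_score FROM programs
                       WHERE correct = 1
                       ORDER BY combined_score DESC LIMIT 1""").fetchone()
```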
Key EvolutionConfig parameters for the slow-path agent:
| Parameter | Default | Description |
|---|---|---|
num_generations | 10 | Total generation budget |
patch_types | ["diff"] | Available mutation forms |
patch_type_probs | [1.0] | Sampling probabilities |
llm_models | ["azure-gpt-4.1-mini"] | LLM models for mutation |
llm_kwargs | {} | Temperature, max_tokens, etc. |
meta_rec_interval | None | Generations between meta-summaries |
max_novelty_attempts | 3 | Resamples before accepting a duplicate |
code_embed_sim_threshold | 1.0 | Cosine similarity rejection threshold |
use_text_feedback | False | Include LLM feedback in prompts |
embedding_model | None | Model for code embeddings |
Key DatabaseConfig parameters:
| Parameter | Default | Description |
|---|---|---|
num_islands | 4 | Number of independent islands |
archive_size | 100 | MAP-Elites archive capacity |
migration_interval | 10 | Generations between migrations |
parent_selection_strategy | "power_law" | Selection algorithm |
See Configuration Reference for the complete list.