CUCo uses Python dataclasses for configuration. There are four main config objects: `EvolutionConfig`, `TransformConfig`, `DatabaseConfig`, and `JobConfig`. Configuration can be specified either in Python (via `run_evo.py`) or via Hydra YAML (via `cuco_launch`).
## Module: `cuco/core/runner.py`

Controls the slow-path evolutionary search.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task_sys_msg` | `Optional[str]` | `None` | System message describing the optimization task, constraints, API knowledge, and hardware context. This is the primary prompt customization point. |
| `language` | `str` | `"python"` | Source language. Set to `"cuda"` for CUDA kernels. Affects prompt templates. |
| `init_program_path` | `Optional[str]` | `None` | Path to the seed program file. |
| `results_dir` | `Optional[str]` | `None` | Directory for evolution artifacts (generations, database, meta files). |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `patch_types` | `List[str]` | `["diff"]` | Available mutation forms: `"diff"`, `"full"`, `"cross"`. |
| `patch_type_probs` | `List[float]` | `[1.0]` | Sampling probabilities for each patch type. Must sum to 1.0. |
| `max_patch_resamples` | `int` | `3` | Times to retry patch generation if the novelty check rejects it. |
| `max_patch_attempts` | `int` | `5` | Times to retry if patch application fails (malformed diff, etc.). |
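As an illustration, enabling all three mutation forms could look like the following sketch (the weights here are arbitrary examples; the field names come from the table above):

```python
# Sketch: enabling all three mutation forms with explicit sampling weights.
# The probabilities are positional (one per entry in patch_types) and must
# sum to 1.0; a tolerance guards against floating-point rounding.
patch_types = ["diff", "full", "cross"]
patch_type_probs = [0.7, 0.2, 0.1]

assert len(patch_types) == len(patch_type_probs)
assert abs(sum(patch_type_probs) - 1.0) < 1e-9
```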
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `num_generations` | `int` | `10` | Total generation budget for this run. |
| `max_parallel_jobs` | `int` | `2` | Maximum concurrent evaluation jobs. |
| `job_type` | `str` | `"local"` | Execution backend: `"local"`, `"slurm_docker"`, or `"slurm_conda"`. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `llm_models` | `List[str]` | `["azure-gpt-4.1-mini"]` | LLM models for mutation. Multiple models enable dynamic selection. |
| `llm_dynamic_selection` | `Optional[str]` | `None` | Model selection strategy: `None` (round-robin) or `"ucb"` (bandit). |
| `llm_kwargs` | `dict` | `{}` | Extra kwargs: `temperatures` (list), `max_tokens` (int), `reasoning_effort` (str). |
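For illustration, an `llm_kwargs` dict using the documented keys might look like this (the values are arbitrary examples, not defaults):

```python
# Sketch: extra kwargs forwarded to the mutation LLM calls.
# Keys are the ones documented above; the values are illustrative.
llm_kwargs = {
    "temperatures": [0.4, 0.8],    # list of sampling temperatures
    "max_tokens": 16384,           # cap on generated tokens
    "reasoning_effort": "medium",  # for models that support it
}
```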
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `meta_rec_interval` | `Optional[int]` | `None` | Generations between meta-summaries. `None` disables meta-summarization. |
| `meta_llm_models` | `Optional[List[str]]` | `None` | LLM models for meta-summarization. Falls back to `llm_models`. |
| `meta_llm_kwargs` | `dict` | `{}` | LLM kwargs for meta-summarization. |
| `meta_max_recommendations` | `int` | `5` | Number of optimization recommendations per summary. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_novelty_attempts` | `int` | `3` | Resamples before accepting a near-duplicate. |
| `code_embed_sim_threshold` | `float` | `1.0` | Cosine-similarity threshold above which a candidate is rejected. Lower values are stricter (0.995 is typical). |
| `embedding_model` | `Optional[str]` | `None` | Embedding model name. `None` disables embedding-based novelty. |
| `novelty_llm_models` | `Optional[List[str]]` | `None` | LLM models for novelty assessment. |
| `use_text_feedback` | `bool` | `False` | Include LLM text feedback in mutation prompts. |
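To make the threshold semantics concrete, here is a minimal sketch of embedding-based novelty rejection, assuming candidates whose cosine similarity to any existing program meets the threshold are rejected. The helper names are hypothetical, not CUCo's actual API:

```python
import math

def cosine_sim(a, b):
    # Plain cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def passes_novelty(candidate_emb, existing_embs, threshold=0.995):
    # Reject the candidate if it is too similar to any existing embedding.
    # A lower threshold rejects more candidates, i.e. is stricter.
    return all(cosine_sim(candidate_emb, e) < threshold for e in existing_embs)
```

With the default threshold of 1.0, only exact (or numerically indistinguishable) duplicates are rejected, which is why a tighter value like 0.995 is typical in practice.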
### Pre-Transform (Fast-Path)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `pre_transform_enabled` | `bool` | `False` | Run the fast-path before evolution. |
| `pre_transform_pipeline_steps` | `List[str]` | `["analyze", "host_to_device", "evolve_markers", "warmup"]` | Pipeline steps to run. |
| `pre_transform_two_stage` | `bool` | `True` | Split into infrastructure + replacement stages. |
| `pre_transform_max_iterations` | `int` | `20` | Max iterations (single-stage mode). |
| `pre_transform_stage_a_max_iterations` | `int` | `5` | Max iterations for the infrastructure stage. |
| `pre_transform_stage_b_max_iterations` | `int` | `10` | Max iterations for the replacement stage. |
| `pre_transform_rewrite_model` | `Optional[str]` | `None` | LLM for code generation. Falls back to the first `llm_models` entry. |
| `pre_transform_judge_model` | `str` | `""` | LLM for judge feedback. Empty = same as the rewriter. |
| `pre_transform_reference_code_path` | `Optional[str]` | `None` | Path to a reference device-side example. |
| `pre_transform_nccl_api_docs` | `str` | `""` | NCCL API docs string for the rewriter. |
| `pre_transform_agent` | `bool` | `False` | Use a Claude Code agent for the transformation. |
| `pre_transform_agent_model` | `str` | `"opus"` | Claude model alias for agent mode. |
| `pre_transform_warmup_model` | `Optional[str]` | `None` | LLM for warmup injection. |
| `pre_transform_marker_model` | `Optional[str]` | `None` | LLM for evolve-block annotation. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task_sys_msg_per_island` | `Optional[Dict[int, str]]` | `None` | Different task prompts per island. |
| `init_program_paths_per_island` | `Optional[Dict[int, str]]` | `None` | Different seed programs per island. |
| `reference_code_per_island` | `Optional[Dict[int, str]]` | `None` | Different reference code per island. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `pre_transform_dual` | `bool` | `False` | Run parallel LSA + GIN transformations. |
| `pre_transform_lsa_reference_code_path` | `Optional[str]` | `None` | Reference code for the LSA transformation. |
| `pre_transform_lsa_nccl_api_docs` | `str` | `""` | NCCL docs for the LSA transformation. |
| `pre_transform_lsa_island_idx` | `int` | `0` | Island index for the LSA seed. |
| `pre_transform_gin_island_idx` | `int` | `1` | Island index for the GIN seed. |
## Module: `cuco/transform/transformer.py`

Controls the fast-path host-to-device transformation.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `rewrite_model` | `str` | `"bedrock/us.anthropic.claude-sonnet-4-6"` | LLM for code generation. |
| `judge_model` | `str` | `""` | LLM for judge feedback. Empty = same as the rewriter. |
| `rewrite_max_tokens` | `int` | `32768` | Max output tokens for rewrites. |
| `judge_max_tokens` | `int` | `2048` | Max output tokens for the judge. |
| `rewrite_temperature` | `float` | `0.0` | Temperature for code generation. |
| `judge_temperature` | `float` | `0.0` | Temperature for the judge. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `nvcc_path` | `str` | `"/usr/local/cuda-13.1/bin/nvcc"` | Path to the nvcc compiler. |
| `nccl_include` | `str` | (NCCL include dir) | Path to NCCL headers. |
| `nccl_static_lib` | `str` | (NCCL static lib) | Path to `libnccl_static.a`. |
| `cuda_lib64` | `str` | (CUDA lib64 dir) | Path to CUDA runtime libraries. |
| `mpi_include` | `str` | (MPI include dir) | Path to MPI headers. |
| `mpi_lib` | `str` | (MPI lib dir) | Path to MPI libraries. |
| `binary_name` | `str` | `"cuda_program"` | Output binary name. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `num_mpi_ranks` | `int` | `4` | Number of MPI processes. |
| `run_timeout` | `int` | `120` | Timeout in seconds for each run. |
| `cuda_visible_devices` | `str` | `"0,1,2,3"` | GPU visibility mask. |
| `hostfile` | `str` | `""` | Path to an MPI hostfile. Empty = local only. |
| `mpirun_extra_args` | `tuple` | `()` | Extra mpirun arguments (e.g., `("--map-by", "node")`). |
| `run_env_vars` | `dict` | `{}` | Extra environment variables passed via `-x`. |
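As a rough sketch of how these run settings could translate into an `mpirun` invocation (this helper is hypothetical, not CUCo's actual launcher code):

```python
def build_mpirun_cmd(binary="./cuda_program", num_mpi_ranks=4, hostfile="",
                     mpirun_extra_args=(), run_env_vars=None):
    """Hypothetical sketch: assemble an mpirun command line from the
    run settings documented above."""
    cmd = ["mpirun", "-np", str(num_mpi_ranks)]
    if hostfile:  # empty string means local-only execution
        cmd += ["--hostfile", hostfile]
    cmd += list(mpirun_extra_args)  # e.g. ("--map-by", "node")
    for key, value in (run_env_vars or {}).items():
        cmd += ["-x", f"{key}={value}"]  # env vars forwarded via -x
    return cmd + [binary]
```

For example, `build_mpirun_cmd(run_env_vars={"NCCL_DEBUG": "INFO"})` yields `mpirun -np 4 -x NCCL_DEBUG=INFO ./cuda_program`.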
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_iterations` | `int` | `5` | Max iterations (single-stage mode). |
| `verification_pass_str` | `str` | `"Verification: PASS"` | Expected output for correctness. |
| `api_type` | `str` | `"gin"` | Target API: `"gin"` or `"lsa"`. |
| `two_stage` | `bool` | `True` | Split into infrastructure + replacement stages. |
| `stage_a_max_iterations` | `int` | `5` | Max iterations for Stage A. |
| `stage_b_max_iterations` | `int` | `10` | Max iterations for Stage B. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `reference_code` | `str` | `""` | Working device-side example code. |
| `nccl_api_docs` | `str` | `""` | NCCL API documentation string. |
## Module: `cuco/database/dbase.py`

Controls the candidate database, islands, and selection strategies.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `db_path` | `str` | `"evolution_db.sqlite"` | SQLite database filename. |
| `num_islands` | `int` | `4` | Number of independent islands. |
| `archive_size` | `int` | `100` | MAP-Elites archive capacity. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `elite_selection_ratio` | `float` | `0.3` | Proportion of the archive reserved for fitness elites. |
| `num_archive_inspirations` | `int` | `5` | Archive programs per mutation prompt. |
| `num_top_k_inspirations` | `int` | `2` | Top-k programs per mutation prompt. |
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `migration_interval` | `int` | `10` | Generations between migrations. |
| `migration_rate` | `float` | `0.1` | Fraction of the island population to migrate. |
| `island_elitism` | `bool` | `True` | Keep the best programs on their home islands. |
| `enforce_island_separation` | `bool` | `True` | Enforce full island separation for inspirations. |
| `island_api_types` | `Optional[Dict[int, str]]` | `None` | Per-island API types (e.g., `{0: "lsa", 1: "gin"}`). |
| `migration_graph` | `Optional[Dict[int, List[int]]]` | `None` | Directional migration (e.g., `{0: [2], 1: [2]}`). |
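A sketch of a heterogeneous setup using the two dict-valued fields above (the values are illustrative, following the examples in the table):

```python
# Sketch: island 0 evolves LSA code, island 1 evolves GIN code, and both
# migrate candidates only into island 2, which acts as a mixing island.
island_api_types = {0: "lsa", 1: "gin"}
migration_graph = {0: [2], 1: [2]}
```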
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `parent_selection_strategy` | `str` | `"power_law"` | Strategy: `"power_law"`, `"weighted"`, or `"beam_search"`. |
| `exploitation_alpha` | `float` | `1.0` | Power-law exponent. 0 = uniform, 1 = strong bias toward top-ranked programs. |
| `exploitation_ratio` | `float` | `0.2` | Probability of picking from the archive vs. the population. |
| `parent_selection_lambda` | `float` | `10.0` | Sigmoid sharpness for weighted selection. |
| `num_beams` | `int` | `5` | Beam width for beam-search selection. |
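For intuition, here is a minimal sketch of power-law parent selection over a fitness-ranked population. This is an illustration of the general idea, not CUCo's implementation:

```python
import random

def power_law_pick(population_size, exploitation_alpha=1.0):
    """Pick a parent index from a fitness-sorted population (rank 0 = best).
    alpha = 0 gives uniform sampling; larger alpha biases toward top ranks."""
    weights = [(rank + 1) ** -exploitation_alpha
               for rank in range(population_size)]
    return random.choices(range(population_size), weights=weights, k=1)[0]
```

With a very large `exploitation_alpha` the selection collapses onto the best-ranked program, while `exploitation_alpha = 0` samples every rank with equal probability.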
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `embedding_model` | `str` | `"text-embedding-3-small"` | Model for code embeddings. |
## Module: `cuco/launch/scheduler.py`

Controls how evaluation jobs are executed.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `eval_program_path` | `str` | `""` | Path to `evaluate.py`. |
| `extra_cmd_args` | `str` | `""` | Extra CLI arguments for `evaluate.py`. |
For local execution via subprocess:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `time` | `str` | `""` | Timeout (e.g., `"01:00:00"`). Empty = no timeout. |
| `conda_env` | `str` | `""` | Conda environment name. Empty = use the current env. |
For Slurm execution with Docker:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `image` | `str` | `""` | Docker image name. |
| `image_tar_path` | `str` | `""` | Path to a Docker image tar (for offline nodes). |
| `docker_flags` | `str` | `""` | Extra `docker run` flags. |
| `partition` | `str` | `""` | Slurm partition. |
| `time` | `str` | `"01:00:00"` | Slurm time limit. |
| `cpus` | `int` | `4` | CPUs per task. |
| `gpus` | `int` | `1` | GPUs per task. |
| `mem` | `str` | `"32G"` | Memory per task. |
For Slurm execution with Conda:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `conda_env` | `str` | `""` | Conda environment name. |
| `modules` | `List[str]` | `[]` | Environment modules to load. |
| `partition` | `str` | `""` | Slurm partition. |
| `time` | `str` | `"01:00:00"` | Slurm time limit. |
| `cpus` | `int` | `4` | CPUs per task. |
| `gpus` | `int` | `1` | GPUs per task. |
| `mem` | `str` | `"32G"` | Memory per task. |
CUCo reads credentials from a `.env` file in the repository root:
| Variable | Provider | Description |
| --- | --- | --- |
| `AWS_ACCESS_KEY_ID` | Bedrock | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | Bedrock | AWS secret key |
| `AWS_REGION_NAME` | Bedrock | AWS region (e.g., `us-east-1`) |
| `OPENAI_API_KEY` | OpenAI | OpenAI API key |
| `AZURE_OPENAI_API_KEY` | Azure | Azure OpenAI API key |
| `AZURE_API_VERSION` | Azure | Azure API version |
| `AZURE_API_ENDPOINT` | Azure | Azure endpoint URL |
| `DEEPSEEK_API_KEY` | DeepSeek | DeepSeek API key |
| `GEMINI_API_KEY` | Gemini | Google Gemini API key |
| `BEDROCK_BASE_URL` | Bedrock OpenAI | Bedrock OpenAI-compatible base URL |
| `BEDROCK_API_KEY` | Bedrock OpenAI | Bedrock API key for the OpenAI-compatible endpoint |
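A minimal `.env` for a Bedrock-only setup might look like this (the values are placeholders, not real credentials):

```
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION_NAME=us-east-1
```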
See LLM Backends for provider-specific setup.
When using `cuco_launch` (the Bash entry point), configuration is loaded from YAML files in a `configs/` directory via Hydra:

```shell
cuco_launch database=my_db evolution=my_evo
```

This looks for `configs/database/my_db.yaml` and `configs/evolution/my_evo.yaml`. The YAML structure mirrors the dataclass fields.
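Since the YAML mirrors the dataclass fields, a `configs/database/my_db.yaml` overriding a few `DatabaseConfig` values might look like this hypothetical fragment (field names taken from the tables above):

```yaml
num_islands: 4
archive_size: 100
parent_selection_strategy: power_law
exploitation_alpha: 1.0
```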
When using `run_evo.py` directly (as in the workloads), configuration is assembled in Python; no YAML files are needed. This is the recommended approach for workload-specific setups.
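A minimal Python-side setup might look like the following sketch. The import paths and constructor arguments are assumptions inferred from the module and parameter tables on this page, not verified API; treat this as a configuration fragment rather than runnable code:

```python
# Hypothetical sketch -- module paths, class names, and fields assumed
# from the configuration tables above.
from cuco.core.runner import EvolutionConfig
from cuco.database.dbase import DatabaseConfig
from cuco.launch.scheduler import JobConfig

evo_cfg = EvolutionConfig(
    task_sys_msg="Optimize this NCCL collective for 4 GPUs.",
    language="cuda",
    init_program_path="workloads/allreduce/seed.cu",
    num_generations=10,
    job_type="local",
)
db_cfg = DatabaseConfig(num_islands=4, archive_size=100)
job_cfg = JobConfig(eval_program_path="workloads/allreduce/evaluate.py")
```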