Cambrian

A self-reproducing code factory. Cambrian reads a specification, calls an LLM, and produces a complete working codebase — including a new instance of itself capable of doing the same thing.

Status

M2 Stage 1 ready. Bayesian Optimization loop is operational, crash-safe, and fully specced. 316 tests collected/passing.

What's done:

- Phase 0: Supervisor, Test Rig, Docker image, gen-0 validated end-to-end
- M1: Gen-1 ran 44 minutes, 10 total generations, 5 promoted (gen-4, gen-6, gen-8, gen-9, gen-10), all at 100% test pass rate. 474,834 cumulative tokens.
- Pre-M2 hardening: 16-bead code review (specs, Python, Docker). Path traversal fix, field naming unification, Docker non-root user, 316 tests across multiple test files, all passing.
- Verification layers: Three-layer anti-cheating model specced and Layer 1 (FROZEN spec acceptance vectors) implemented. Dual-blind examiner and adversarial red-team specced for M2 Tiers 1–2.
- M2 Stage 1: Bayesian Optimization loop operational. Grammar-constrained spec mutations, mini-campaign screening, 15-dimension fitness vector, campaign runner, spec diff tooling all implemented and running.
- Baseline campaign: 8/10 viable (80%) on gens 39–48 with unmodified spec.
- Phenotypic distiller: AST-based post-campaign analysis that diffs viable vs failed and top-ranked vs bottom-ranked gens, automatically proposes spec amendments from observed code patterns (the Baldwin Effect: phenotypic excellence feeding back into the genome).
- Adversarial round-two hardening (2026-04-09): structlog lint gate AST bug fix, expanded printf-format regex, baseline contract guards. Spec exemplar propagation: enhanced system prompt now lives in CAMBRIAN-SPEC-005 §3.5 so Gen-2+ inherits it.
- Security hardening (2026-04-09): ANTHROPIC_API_KEY stripped from Test Rig containers; NetworkMode: none enforced for Test Rig and baseline containers.
- Post-review hardening (2026-04-09e): reverse-run executes against an isolated workspace copy (historical promoted artifact not mounted directly), container hardening adds SecurityOpt=["no-new-privileges:true"] and CapDrop=["ALL"], generation-store append_only semantics are now truly non-overwriting, BO resume skips duplicate base-spec evaluation, and runtime system prompt is extracted from CAMBRIAN-SPEC-005 with tested fallback behavior.
- Spec remediation R1+R2 + follow-up alignment (2026-04-09): 8 spec/code mismatches fixed; dead screen_mutation() removed; BO crash recovery (bo-observations.jsonl reload), generation auto-detection from /versions, GET /versions?campaign-id= filter, best-spec-meta.json output schema alignment. Specs now at CAMBRIAN-SPEC-005 v0.18.0 / BOOTSTRAP-SPEC-002 v0.14.0.
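
The phenotypic distiller entry in the list above describes an AST-based diff between viable and failed generations. The following is a minimal sketch of that idea, not the project's actual analysis: it assumes each generation can be reduced to one Python source string, and it compares nothing more than average AST node-type counts. The names `node_profile`, `mean_profile`, and `differential` are hypothetical.

```python
import ast
from collections import Counter


def node_profile(source: str) -> Counter[str]:
    """Count AST node types in one source string (a crude phenotype summary)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def mean_profile(sources: list[str]) -> dict[str, float]:
    """Average node-type counts over a non-empty group of sources."""
    total: Counter[str] = Counter()
    for src in sources:
        total.update(node_profile(src))
    return {name: count / len(sources) for name, count in total.items()}


def differential(viable: list[str], failed: list[str]) -> dict[str, float]:
    """Mean count in viable sources minus mean count in failed sources.

    Positive values mark constructs that are more common in viable code.
    """
    v, f = mean_profile(viable), mean_profile(failed)
    return {name: v.get(name, 0.0) - f.get(name, 0.0) for name in sorted(set(v) | set(f))}
```

A positive entry for `Try`, for instance, would suggest that viable generations use explicit error handling more often, which is the kind of observation a spec amendment could be drafted from.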

Next: Run full M2 campaigns (20 BO iterations) with distiller-informed mutations to determine whether spec mutations improve viability rate over the 80% baseline.

The Idea

Code is disposable. The specification is the genome.

An LLM-powered organism ("Prime") reads a spec and regenerates the entire codebase from scratch each generation. No diffs, no accumulated cruft, no path dependence. A mechanical test rig — no LLM involved — decides if the result is viable. If it is, the offspring replaces the parent. If not, it's discarded.

This came from Loom, which tried source-code-level self-modification in ClojureScript. Loom proved the pipeline works (72 generations, 1 autonomous promotion) and that editing existing code is the wrong abstraction. Cambrian applies that lesson: evolve the genotype (spec), regenerate the phenotype (code) from scratch.

Architecture

Three components (M2 note: Prime is a logical role; during campaigns the Supervisor invokes prime_runner instead of a long‑running Prime service):

- Prime: The organism. Reads the spec, calls an LLM, produces a complete codebase, asks the Supervisor to verify it. Contains its own source, its spec, and its running process.
- Supervisor: Host infrastructure. Manages Docker containers, tracks generation history, executes promote/rollback. In M2, orchestrates dual-blind and red-team verification. Not part of the organism; it persists across generations.
- Test Rig: Mechanical verification. Builds the artifact, runs tests, starts the process, checks health contracts and FROZEN spec acceptance vectors. Returns a binary viability verdict. No LLM involved.

  ┌───────────────────────────────┐
  │ Prime (logical role)          │
  │ - LLM code synthesis          │
  │ - writes artifact + manifest  │
  └──────────────┬────────────────┘
                 │ in M2: prime_runner
                 │
                 ▼
  ┌───────────────────────────────┐
  │ Supervisor (host)             │
  │ - tracks generations          │
  │ - spawns Test Rig containers  │
  │ - promote / rollback          │
  └──────────────┬────────────────┘
                 │ spawn
                 ▼
  ┌───────────────────────────────┐
  │ Test Rig (container)          │
  │ - build / test / start        │
  │ - health + spec vectors       │
  │ - writes viability report     │
  └───────────────────────────────┘

Architecture Diagrams (M2 Stage 1, current)

These diagrams supersede older M1-era mental models that assumed a long-running Prime service/container as the active orchestrator.

[Figure: High-level architecture]

[Figure: Generation sequence]

Repos

Repo	Purpose
cambrian	Specs, Supervisor, Test Rig, Docker, lab journal
cambrian-artifacts	Generated artifacts (gen-0, gen-1, ...) and generation history

Milestones

M1: Reproduce. ✓ Prime reads a spec, generates a working codebase, passes the test rig. The generated Prime can do the same. Completed 2026-03-29: 5 viable offspring, 474k tokens.
Pre-M2 Hardening. ✓ 3-phase code review, 87 integration tests, anti-cheating verification layers specced. Completed 2026-03-30.
M2: Self-modify. 🔄 In progress. Prime mutates its own spec and tests whether the mutation produces fitter offspring. BO loop operational, crash-safe (observations persisted), auto-detects generation numbers, and avoids duplicate base-spec evaluation on resume. Three verification layers prevent cheating: FROZEN spec vectors (implemented), dual-blind examiner, adversarial red-team. Specs at CAMBRIAN-SPEC-005 v0.18.0 / BOOTSTRAP-SPEC-002 v0.14.0.

Tech Stack

Everything is Python 3.14 for M1 (free-threaded build deferred to M2).

Component	Key Libraries
Async I/O	`aiohttp`, `aiodocker`, `asyncio`
Validation	`pydantic` v2 (all I/O boundaries)
Logging	`structlog` (JSON in containers, key-value in dev)
Type checking	`pyright` strict mode
Tooling	`uv`, `ruff`, `pytest` + `pytest-asyncio` + `pytest-aiohttp`

Project Structure

spec/
  CAMBRIAN-SPEC-005.md     — Genome spec (what Prime is — consumed by LLM)
  BOOTSTRAP-SPEC-002.md    — Bootstrap spec (Supervisor, Test Rig, Docker)
  SPEC-STYLE-GUIDE.md      — How to write specs
  archive/                 — Superseded specs (historical reference only)
supervisor/                — Host-side Supervisor (aiohttp server)
test-rig/                  — Mechanical verification pipeline
tests/                     — Integration tests (spec compliance, security, lifecycle)
scripts/                   — Campaign runners, BO loop entry point, analysis tools
docker/                    — Dockerfile and build script for cambrian-base
lab-journal/               — Discussion and decision logs

Quick Start

# Clone both repos side by side
git clone https://github.com/lispmeister/cambrian.git
git clone https://github.com/lispmeister/cambrian-artifacts.git

# Create .env with your API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > cambrian/.env

# Build Docker base image
cd cambrian
./docker/build.sh

# Start Supervisor (terminal 1)
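# set -a exports everything .env defines, so the Supervisor process can read the key
set -a; source .env; set +a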
uv run python -m supervisor.supervisor

# Run M2 BO loop (terminal 2)
# CAMBRIAN_START_GENERATION defaults to "auto" — the loop queries /versions
# and picks up where the last run left off. Crash recovery is automatic:
# bo-observations.jsonl in the artifacts root is reloaded on restart.
set -a && source .env && set +a && \
  CAMBRIAN_BO_BUDGET=20 \
  CAMBRIAN_CAMPAIGN_LENGTH=5 \
  CAMBRIAN_MINI_CAMPAIGN_N=2 \
  CAMBRIAN_BO_INITIAL_POINTS=5 \
  CAMBRIAN_ESCALATION_MODEL=claude-sonnet-4-6 \
  uv run python scripts/run_m2.py

The BO loop runs until the budget is exhausted and writes best-spec.md if any viable spec is found. For a quick smoke test, use CAMBRIAN_BO_BUDGET=5 CAMBRIAN_CAMPAIGN_LENGTH=2 CAMBRIAN_MINI_CAMPAIGN_N=1 CAMBRIAN_BO_INITIAL_POINTS=3.

Generation lifecycle (what happens each run)

1. Prime (via prime_runner in M2) reads the spec and generates a full artifact + manifest.json.
2. Supervisor records the attempt and spawns a Test Rig container with the artifact mounted at /workspace and an isolated /output for the report.
3. Test Rig runs: manifest → build → test → start → health + spec vectors; writes /output/viability-report.json.
4. Supervisor ingests the report, updates generations.json, then promotes or rolls back.
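
The spawn step above, combined with the container hardening listed under Status, can be sketched as a container-create config. The dict uses Docker Engine API field names (`HostConfig`, `Binds`, `NetworkMode`, `SecurityOpt`, `CapDrop`); the function `build_test_rig_config`, its parameters, and the pass-through treatment of the remaining environment are assumptions made for illustration, not the Supervisor's actual code.

```python
from pathlib import Path

# Variables that must never reach the Test Rig container.
STRIPPED_ENV = {"ANTHROPIC_API_KEY"}


def build_test_rig_config(artifact_dir: Path, output_dir: Path, env: dict[str, str]) -> dict:
    """Return a hardened container-create config for one Test Rig run."""
    safe_env = [f"{key}={value}" for key, value in sorted(env.items()) if key not in STRIPPED_ENV]
    return {
        "Image": "cambrian-base:latest",
        "Env": safe_env,
        "HostConfig": {
            # Artifact under test and a separate directory for the report.
            "Binds": [f"{artifact_dir}:/workspace", f"{output_dir}:/output"],
            "NetworkMode": "none",                      # no network access
            "SecurityOpt": ["no-new-privileges:true"],  # block privilege escalation
            "CapDrop": ["ALL"],                         # drop every Linux capability
        },
    }
```

Building the config as plain data keeps the security-relevant settings in one place where a test can assert on them before any container is started.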

Generational run notes

- The canonical run history lives in ../cambrian-artifacts/generations.json.
- Always start at the next unused generation number (last entry + 1).
- If generations.json is missing or empty, start at generation 1.
- M2 campaigns create artifacts under ../cambrian-artifacts/campaigns/<campaign-id>/gen-<N>/ but still increment the global generation counter.
- For base-spec self-replication confidence runs, use `uv run python scripts/run_gen0_campaign.py --generations 1 --model claude-sonnet-4-6`. It writes artifacts and a summary.json under ../cambrian-artifacts/gen-0-campaigns/<campaign-id>/.
- After a campaign, run the phenotypic distiller to identify spec improvement opportunities: `uv run python scripts/distill_campaign.py ../cambrian-artifacts/gen-0-campaigns/<campaign-id>`. It outputs a differential analysis report with proposed spec amendments.
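
The numbering rule above (last entry + 1, or 1 when the history is missing or empty) is small enough to sketch directly. The schema of generations.json is not described here, so this assumes a JSON array of records that each carry an integer `generation` field; both that field name and the function `next_generation` are hypothetical.

```python
import json
from pathlib import Path


def next_generation(path: Path) -> int:
    """Return the next unused generation number for a generations.json file."""
    if not path.exists():
        return 1
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return 1
    records = json.loads(text)
    if not records:
        return 1
    # Assumed schema: a list of records ordered by time, each with "generation".
    return int(records[-1]["generation"]) + 1
```

Usage would look like `next_generation(Path("../cambrian-artifacts/generations.json"))`.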

M2 objective and approach (approachable summary)

Objective: make the spec itself evolvable. In M2 we mutate the spec (the genome) and test whether those mutations produce fitter offspring.

How we do it: a Bayesian Optimization loop proposes spec mutations, runs short campaigns, and scores them with a fitness vector derived from the Test Rig. Winning specs are kept; poor specs are rejected.

Lab journal and experimental data

The lab journal (lab-journal/) is the project’s honest narrative: decisions, failures, fixes, and experimental outcomes are logged chronologically so history can’t be rewritten.
Experimental data is captured mechanically in ../cambrian-artifacts/generations.json. Each generation attempt (success or failure) is recorded with its viability report. This file is the source of truth for run history and generation numbering.
M2 execution planning docs:
- docs/M2-STAGE1-SUCCESS-CHECKLIST.md
- docs/templates/m2-stage1-results-template.md
- scripts/summarize_m2_results.py (auto-generates campaign/aggregate tables from generations.json)

See CLAUDE.md for development conventions and issue tracking workflow.

Running M2 with Claude Code

The Supervisor and M2 loop can be orchestrated by Claude Code. Paste this at the start of a session in the cambrian/ directory:

We're working on the Cambrian project — a self-reproducing code factory in M2.

Key concepts:
- Prime is the organism: reads a spec (CAMBRIAN-SPEC-005.md), calls an LLM, generates a complete working codebase each generation.
- The Supervisor (supervisor/supervisor.py) is host infrastructure: manages Docker containers, tracks generation history, handles promote/rollback via HTTP API at port 8400.
- The Test Rig is a mechanical verifier: builds the artifact, runs tests, starts the process, checks health contracts. No LLM involved.
- M2 runs via scripts/run_m2.py — a Bayesian Optimization loop that mutates the spec and tests whether mutations improve viability rate.
- cambrian-artifacts/ (sibling repo) holds generated artifacts and generation history.

Environment:
- ANTHROPIC_API_KEY is in .env — load with: source .env
- Never use pip install directly; use uv
- Supervisor starts with: uv run python -m supervisor.supervisor
- Docker base image: cambrian-base:latest (rebuild with ./docker/build.sh after any test-rig changes)
- Default model: claude-sonnet-4-6; only use claude-opus-4-6 if asked
- Check bd ready for available work before starting anything new

License

MIT
