Autonomous research orchestration framework. Give it a research goal, and it designs experiment campaigns, runs them, analyzes results, and documents novel discoveries.
Autolab builds on the autoresearch paradigm pioneered by Andrej Karpathy. Where autoresearch excels at optimizing a single metric in a tight loop, Autolab extends the idea to orchestrate research programs — multi-question investigations with campaign-based experiment design, literature-informed hypothesis generation, and rigorous discovery documentation.
```shell
pip install autolab
```

```shell
# Initialize a research project
autolab init "Optimize transformer inference latency on consumer GPUs"

# Run a campaign
autolab run campaigns/000_example.yaml

# Check status
autolab status

# Query results
autolab results --metric throughput --top 5

# Start autonomous research loop (requires API key)
export ANTHROPIC_API_KEY=sk-...
autolab loop --backend anthropic --max-iterations 10
```

```
Research Directive (human)
        |
        v
Goal Decomposition --> Research Questions
        |
        v
Campaign Design --> YAML parameter grids
        |
        v
Experiment Execution --> Local, SSH, Docker, SLURM
        |
        v
Results Analysis --> SQLite DB, trend detection
        |
        v
Discovery Documentation --> DISCOVERIES.md with prior art verification
        |
        v
Repeat (with escape strategies when stuck)
```
Each research iteration follows the cycle:

- Orient — read the journal, query the DB, check progress
- Hypothesize — form a testable question, write it down first
- Design — create a campaign YAML targeting the hypothesis
- Execute — `autolab run campaigns/name.yaml`
- Analyze — query results, compare to baselines
- Document — update the journal, add discoveries
- Commit — git commit the iteration
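The cycle above can be sketched in a few lines of Python. Everything here is illustrative: the function names, the toy "experiment", and the diminishing-returns formula are assumptions for the sketch, not Autolab's actual API.

```python
import itertools
import re

def design_grid(grid: dict) -> list[dict]:
    """Expand a campaign grid into one config per experiment (Cartesian product)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]

def run_experiment(config: dict) -> str:
    """Stand-in for a real runner: a toy model where larger batches give diminishing gains."""
    throughput = 100 * config["batch_size"] ** 0.5
    return f"Throughput: {throughput:.1f}"

def collect_metric(log: str, pattern: str) -> float:
    """Scrape a metric out of experiment output, as a campaign's collect patterns do."""
    match = re.search(pattern, log)
    if match is None:
        raise ValueError("metric not found in log")
    return float(match.group(1))

# Design -> Execute -> Analyze for a small batch-size grid.
grid = {"batch_size": [8, 16, 32, 64, 128]}
configs = design_grid(grid)                      # 5 experiments
results = {c["batch_size"]: collect_metric(run_experiment(c), r"Throughput: ([\d.]+)")
           for c in configs}
best = max(results, key=results.get)
print(len(configs), best)
```

The Analyze step would normally compare `results` against stored baselines before the Document and Commit steps.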
```yaml
version: 1
name: batch_size_sweep
hypothesis: "Batch sizes >32 show diminishing throughput gains"
question: q1
moonshot: false

runner:
  backend: local            # local | ssh
  command: "python train.py --batch-size {batch_size} --lr {lr}"
  working_dir: ./experiments
  timeout_seconds: 3600

defaults:
  lr: 0.001

grid:
  batch_size: [8, 16, 32, 64, 128]   # 5 experiments (Cartesian product)

metrics:
  primary: throughput
  direction: maximize
  collect:
    - name: throughput
      pattern: "Throughput: ([\\d.]+)"
    - name: loss
      pattern: "Loss: ([\\d.]+)"

stopping:
  window: 3
  threshold: 0.05
  max_failures: 3
```

Autolab is LLM-agnostic. The research loop can be driven by:
| Backend | How | Best for |
|---|---|---|
| Claude Code | Plugin with Ralph Wiggum loop | Richest: skills, hooks, native tools |
| Anthropic API | `autolab loop --backend anthropic` | Direct API, any Claude model |
| OpenAI API | `autolab loop --backend openai` | GPT-4o, o1, o3 |
| OpenAI-compatible | `autolab loop --backend openai-compatible --base-url ...` | Ollama, vLLM, Together |
Install the plugin for the richest experience:

- `/research-loop` — Start an autonomous research marathon
- `/campaign create` — Design a new campaign
- `/campaign run` — Run a campaign
- `/status` — Check progress
- `/literature` — Search prior art
- `/discover` — Document a finding
The plugin includes skills for research loop protocol, campaign design, and discovery writing, plus a Ralph Wiggum stop hook for long-running autonomous sessions.
Autolab enforces a configurable moonshot quota: by default, 50% of campaigns should be moonshots — experiments that challenge fundamental assumptions rather than incrementally tweaking parameters. This prevents convergence on local optima.
```yaml
# autolab.yaml
strategy:
  moonshot_ratio: 0.5   # 50% default
  enforce: soft         # soft | hard
```

When the agent hasn't improved for 3+ consecutive iterations, Autolab triggers escape strategies:
- Literature search — find untried approaches
- Devil's advocate — argue against current approach, try the opposite
- Random perturbation — explore far outside tested parameter ranges
- New question — pivot to a completely different angle
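A controller combining the moonshot quota with stuck detection might look like the following sketch. All names and thresholds here are illustrative, not Autolab's internals:

```python
import random

# Hypothetical strategy names mirroring the four escape strategies above.
ESCAPE_STRATEGIES = ["literature_search", "devils_advocate",
                     "random_perturbation", "new_question"]

def next_move(history: list[bool], moonshots: int, total: int,
              moonshot_ratio: float = 0.5, stuck_after: int = 3) -> str:
    """history[i] is True if iteration i improved the primary metric."""
    # Stuck: 3+ consecutive iterations without improvement -> pick an escape strategy.
    if len(history) >= stuck_after and not any(history[-stuck_after:]):
        return random.choice(ESCAPE_STRATEGIES)
    # Below the moonshot quota -> the next campaign should be a moonshot.
    if total > 0 and moonshots / total < moonshot_ratio:
        return "moonshot_campaign"
    return "incremental_campaign"

print(next_move([True, False, False, False], moonshots=2, total=4))  # an escape strategy
print(next_move([True, True], moonshots=0, total=2))
```

With `enforce: soft`, the quota check would presumably be a suggestion to the agent rather than a hard override.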
Discoveries made with Autolab include an attribution line in DISCOVERIES.md:

```
Discovered with Autolab — autonomous research orchestration
```

See ATTRIBUTION.md for publication guidelines.
```
autolab/
├── src/autolab/
│   ├── core/           Campaign engine, research loop, plan, scheduler
│   ├── agents/         LLM backends (Anthropic, OpenAI, compatible APIs)
│   ├── runners/        Experiment execution (local, SSH)
│   ├── metrics/        Collectors, SQLite DB, trend analysis
│   ├── intelligence/   Literature search, escape strategies, discovery management
│   ├── state/          Project state, journal, git integration
│   └── scaffold/       Project initialization templates
├── plugin/             Claude Code plugin (commands, skills, hooks)
├── examples/
│   ├── ml-optimization/        Hyperparameter tuning
│   ├── distributed-inference/  Pipeline parallelism optimization
│   └── algorithm-design/       Sorting algorithm comparison
└── tests/              118 tests
```
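The metrics layer stores results in SQLite. As a rough sketch of the kind of query behind `autolab results --metric throughput --top 5`, here is a self-contained example; the schema, table, and column names are invented for illustration and are not Autolab's actual database layout:

```python
import sqlite3

# Toy in-memory stand-in for the results DB (invented schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (experiment TEXT, metric TEXT, value REAL)")
db.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(f"batch_size={b}", "throughput", v)
     for b, v in [(8, 282.8), (16, 400.0), (32, 565.7), (64, 800.0), (128, 1131.4)]],
)

# Top-5 experiments by a chosen metric, highest first.
top5 = db.execute(
    "SELECT experiment, value FROM results "
    "WHERE metric = ? ORDER BY value DESC LIMIT 5",
    ("throughput",),
).fetchall()
print(top5[0])  # ('batch_size=128', 1131.4)
```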
```shell
cd examples/ml-optimization
PYTHONPATH=../../src python3 -m autolab.cli run campaigns/001_lr_sweep.yaml
```

```shell
cd examples/distributed-inference
PYTHONPATH=../../src python3 -m autolab.cli run campaigns/001_stage_count.yaml
```

```shell
cd examples/algorithm-design
PYTHONPATH=../../src python3 -m autolab.cli run campaigns/001_algorithm_comparison.yaml
```

```shell
git clone https://github.com/t8/autolab.git
cd autolab
pip install -e ".[dev]"
pytest tests/ -v
```

Autolab stands on the shoulders of:
- Andrej Karpathy — whose autoresearch concept proved that LLMs can autonomously run meaningful experiments. Autolab extends this from single-metric optimization to multi-hypothesis research programs.
- Geoffrey Huntley — who created the Ralph Wiggum loop technique for keeping Claude Code running autonomously. Autolab's research marathon mode is built on this pattern.
Apache 2.0. See LICENSE and ATTRIBUTION.md.
