A Claude Code skill that runs autonomous improvement loops on any codebase. Inspired by Andrej Karpathy's autoresearch.
The idea: give Claude a measurable objective, point it at your code, and let it experiment autonomously — proposing changes, testing them, keeping what works, discarding what doesn't, and repeating forever until you stop it.
/autoresearch "maximize pytest pass rate" "pytest -q 2>&1 | tail -1" "src/"
┌─────────────────────────────────────────────────────┐
│                  AUTORESEARCH LOOP                  │
│                                                     │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐      │
│  │  Propose  │──→│ Implement │──→│  Commit   │      │
│  │  (think)  │   │  (edit)   │   │   (git)   │      │
│  └───────────┘   └───────────┘   └───────────┘      │
│        ↑                             │              │
│        │                             ↓              │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐      │
│  │  Log to   │←──│ Evaluate  │←──│    Run    │      │
│  │results.tsv│   │ (compare) │   │ (measure) │      │
│  └───────────┘   └───────────┘   └───────────┘      │
│        │                                            │
│        ↓                                            │
│   keep?  → advance branch                           │
│   worse? → git reset --hard HEAD~1                  │
│   crash? → fix or skip                              │
│                                                     │
│       REPEAT FOREVER (never ask, never stop)        │
└─────────────────────────────────────────────────────┘
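The keep/discard core of the loop can be sketched as plain shell. This is an illustration, not the skill's actual implementation: the metrics below are simulated stand-ins for real measure-command output, and the git operations are noted in comments rather than executed.

```shell
#!/bin/sh
# Sketch of the evaluate/keep/discard step, assuming the measure
# command prints a single number and lower is better.
best=0.998                       # baseline metric
log=$(mktemp)
for metric in 0.993 1.010 0.961 0.871; do   # simulated experiment results
  if awk "BEGIN { exit !($metric < $best) }"; then
    verdict=keep; best=$metric   # keep: advance the branch
  else
    verdict=discard              # discard: would run git reset --hard HEAD~1
  fi
  printf '%s\t%s\n' "$metric" "$verdict" >> "$log"
done
echo "best=$best"
cat "$log"
```

With the simulated metrics above, only the 1.010 experiment is discarded and the best metric ends at 0.871.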
Copy the skill to your Claude Code skills directory:
# User-level (all projects)
mkdir -p ~/.claude/skills/autoresearch
cp .claude/skills/autoresearch/SKILL.md ~/.claude/skills/autoresearch/SKILL.md
# Or project-level (current project only, after cloning this repo)
mkdir -p /path/to/your/project/.claude/skills/autoresearch
cp .claude/skills/autoresearch/SKILL.md /path/to/your/project/.claude/skills/autoresearch/
Or one-liner from GitHub:
mkdir -p ~/.claude/skills/autoresearch && \
curl -sL https://raw.githubusercontent.com/labclaw/autoresearch-skill/main/.claude/skills/autoresearch/SKILL.md \
  -o ~/.claude/skills/autoresearch/SKILL.md
Then invoke the skill:
/autoresearch
Claude will ask you for:
- Objective — what to optimize (e.g., "minimize val_bpb", "maximize test pass rate")
- Measure command — how to measure it (e.g., pytest -q | tail -1)
- Edit scope — which files Claude can modify (e.g., src/model.py)
- Timeout — max time per experiment (default: 5 min)
/autoresearch "minimize bundle size" "npm run build 2>&1 | grep 'Total size'" "src/**/*.ts"
- Claude creates an autoresearch/<tag> branch
- Runs the baseline and records the starting metric
- Asks for confirmation once, then goes fully autonomous
- Experiments indefinitely — you can leave, sleep, come back
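The setup steps above amount to something like the following sketch. The branch tag format and the results.tsv columns are assumptions for illustration, not the skill's specification, and a fixed number stands in for the measure command's output (the sketch runs in a throwaway repo):

```shell
#!/bin/sh
# Sketch: create the experiment branch and record a baseline row.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.email=ar@example.com -c user.name=ar \
  commit -q --allow-empty -m "init"
tag=$(date +%b%d | tr '[:upper:]' '[:lower:]')   # e.g. mar26
git checkout -q -b "autoresearch/$tag"
baseline=0.998                # stands in for the user's measure command
printf 'commit\tmetric\tnote\n' > results.tsv
printf '%s\t%s\tbaseline\n' "$(git rev-parse --short HEAD)" "$baseline" >> results.tsv
cat results.tsv
```

Every later experiment appends a row to the same file, so the branch plus results.tsv is a complete record of the run.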
When you return, you get:
=== Autoresearch Summary ===
Branch: autoresearch/mar26
Experiments run: 47
Kept: 12
Discarded: 31
Crashed: 4
Baseline metric: 0.998
Best metric: 0.871 (improvement: -12.7%)
Top improvements:
1. a1b2c3d — increase LR to 0.04 — 0.993
2. b2c3d4e — add cosine schedule — 0.961
...
/autoresearch "minimize val_bpb" "uv run train.py > run.log 2>&1 && grep '^val_bpb:' run.log" "train.py"
/autoresearch "maximize test pass count" "pytest --tb=no -q 2>&1 | tail -1" "src/ tests/"
/autoresearch "minimize bundle size bytes" "npm run build 2>&1 | grep -oP 'Total: \K[\d.]+'" "webpack.config.js src/"
/autoresearch "minimize ruff warnings" "ruff check src/ 2>&1 | tail -1" "src/"
/autoresearch "maximize extraction accuracy" "python benchmark.py 2>&1 | grep accuracy" "src/pipeline/"
Directly from Karpathy's autoresearch:
| Principle | Rule |
|---|---|
| Autonomy | Never stop. Never ask. The human may be asleep. |
| Simplicity | Simpler code at equal metric = keep. Complexity for tiny gain = discard. |
| Git safety | One commit per experiment. Revert cleanly. Branch never corrupted. |
| Audit trail | Every experiment logged to results.tsv. No exceptions. |
| Scope | Only edit declared files. Never touch evaluation harness. |
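The audit-trail rule only requires that every experiment lands in results.tsv; one possible row shape is a tab-separated line of commit, metric, verdict, and description. The helper and the second example row below are illustrative (the first row's hash and description come from the summary example above):

```shell
#!/bin/sh
# Append one audit row per experiment to results.tsv.
# Column order is an assumption, not part of the skill's spec.
: > results.tsv   # start fresh for this demo
log_experiment() {
  printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" >> results.tsv
}

log_experiment a1b2c3d 0.993 keep    "increase LR to 0.04"
log_experiment c3d4e5f 1.012 discard "remove dropout"
cat results.tsv
```

A flat TSV keeps the log greppable and diff-friendly, which matters when an unattended run accumulates dozens of rows.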
| Karpathy's autoresearch | This skill |
|---|---|
| Python script + program.md | Claude Code skill (SKILL.md) |
| Only edits train.py | User-defined edit scope |
| Only val_bpb metric | Any measurable objective |
| 5-min GPU training | User-defined timeout + command |
| ML architecture only | Any code changes |
| Single GPU required | Any compute environment |
| Claude Code / Cursor required | Claude Code only |
The loop structure is identical: propose → implement → commit → run → evaluate → keep/discard → repeat.
This repo serves as a template for creating Claude Code skills. The key file is .claude/skills/autoresearch/SKILL.md:
---
name: your-skill-name
description: |
What it does and when to trigger it.
argument-hint: "[arg1] [arg2]"
allowed-tools:
- Bash
- Read
- Edit
---
# Your skill instructions here
Claude follows these instructions when the skill is invoked.
Use $ARGUMENTS for user input.
Use ```bash blocks for dynamic context injection.
See the Claude Code docs for the full skill specification.
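For instance, a skill body might combine both features like this (the objective placeholder and the git context block are illustrative, not the actual autoresearch SKILL.md):

````markdown
# Optimize: $ARGUMENTS

Before proposing an experiment, gather context:

```bash
git log --oneline -5 && git status --short
```
````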
MIT