autoresearch-skill

A Claude Code skill that runs autonomous improvement loops on any codebase. Inspired by Andrej Karpathy's autoresearch.

The idea: give Claude a measurable objective, point it at your code, and let it experiment autonomously — proposing changes, testing them, keeping what works, discarding what doesn't, and repeating forever until you stop it.

/autoresearch "maximize pytest pass rate" "pytest -q 2>&1 | tail -1" "src/"

How it works

┌─────────────────────────────────────────────────────┐
│                AUTORESEARCH LOOP                     │
│                                                      │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│  │ Propose  │──→│ Implement│──→│  Commit  │        │
│  │ (think)  │   │ (edit)   │   │  (git)   │        │
│  └──────────┘   └──────────┘   └──────────┘        │
│       ↑                              │               │
│       │                              ↓               │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│  │  Log to  │←──│ Evaluate │←──│   Run    │        │
│  │results.tsv   │(compare) │   │(measure) │        │
│  └──────────┘   └──────────┘   └──────────┘        │
│       │                                              │
│       ↓                                              │
│  keep? → advance branch                             │
│  worse? → git reset --hard HEAD~1                   │
│  crash? → fix or skip                               │
│                                                      │
│  REPEAT FOREVER (never ask, never stop)             │
└─────────────────────────────────────────────────────┘

Install

Copy the skill to your Claude Code skills directory:

# User-level (all projects)
mkdir -p ~/.claude/skills/autoresearch
cp .claude/skills/autoresearch/SKILL.md ~/.claude/skills/autoresearch/SKILL.md

# Or project-level (current project only, after cloning this repo)
mkdir -p /path/to/your/project/.claude/skills/autoresearch
cp .claude/skills/autoresearch/SKILL.md /path/to/your/project/.claude/skills/autoresearch/

Or one-liner from GitHub:

mkdir -p ~/.claude/skills/autoresearch && \
curl -sL https://raw.githubusercontent.com/labclaw/autoresearch-skill/main/.claude/skills/autoresearch/SKILL.md \
  -o ~/.claude/skills/autoresearch/SKILL.md

Usage

Interactive setup

/autoresearch

Claude will ask you for:

Objective — what to optimize (e.g., "minimize val_bpb", "maximize test pass rate")
Measure command — how to measure it (e.g., pytest -q | tail -1)
Edit scope — which files Claude can modify (e.g., src/model.py)
Timeout — max time per experiment (default: 5 min)

One-shot

/autoresearch "minimize bundle size" "npm run build 2>&1 | grep 'Total size'" "src/**/*.ts"

What happens

Claude creates a autoresearch/<tag> branch
Runs the baseline and records the starting metric
Asks for confirmation once, then goes fully autonomous
Experiments indefinitely — you can leave, sleep, come back

When you return, you get:

=== Autoresearch Summary ===
Branch: autoresearch/mar26
Experiments run: 47
Kept: 12
Discarded: 31
Crashed: 4
Baseline metric: 0.998
Best metric: 0.871 (improvement: -12.7%)

Top improvements:
1. a1b2c3d — increase LR to 0.04 — 0.993
2. b2c3d4e — add cosine schedule — 0.961
...

Examples

ML training (Karpathy-style)

/autoresearch "minimize val_bpb" "uv run train.py > run.log 2>&1 && grep '^val_bpb:' run.log" "train.py"

Test coverage

/autoresearch "maximize test pass count" "pytest --tb=no -q 2>&1 | tail -1" "src/ tests/"

Build size

/autoresearch "minimize bundle size bytes" "npm run build 2>&1 | grep -oP 'Total: \K[\d.]+'" "webpack.config.js src/"

Code quality

/autoresearch "minimize ruff warnings" "ruff check src/ 2>&1 | tail -1" "src/"

API accuracy

/autoresearch "maximize extraction accuracy" "python benchmark.py 2>&1 | grep accuracy" "src/pipeline/"

Design principles

Directly from Karpathy's autoresearch:

Principle	Rule
Autonomy	Never stop. Never ask. The human may be asleep.
Simplicity	Simpler code at equal metric = keep. Complexity for tiny gain = discard.
Git safety	One commit per experiment. Revert cleanly. Branch never corrupted.
Audit trail	Every experiment logged to `results.tsv`. No exceptions.
Scope	Only edit declared files. Never touch evaluation harness.

How this differs from Karpathy's autoresearch

Karpathy's autoresearch	This skill
Python script + `program.md`	Claude Code skill (SKILL.md)
Only edits `train.py`	User-defined edit scope
Only `val_bpb` metric	Any measurable objective
5-min GPU training	User-defined timeout + command
ML architecture only	Any code changes
Single GPU required	Any compute environment
Claude Code / Cursor required	Claude Code only

The loop structure is identical: propose -> commit -> run -> evaluate -> keep/discard -> repeat.

Creating your own skill (template)

This repo serves as a template for creating Claude Code skills. The key file is .claude/skills/autoresearch/SKILL.md:

---
name: your-skill-name
description: |
  What it does and when to trigger it.
argument-hint: "[arg1] [arg2]"
allowed-tools:
  - Bash
  - Read
  - Edit
---

# Your skill instructions here

Claude follows these instructions when the skill is invoked.
Use $ARGUMENTS for user input.
Use ```bash blocks for dynamic context injection.

See the Claude Code docs for the full skill specification.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.claude/skills/autoresearch		.claude/skills/autoresearch
.github/workflows		.github/workflows
examples		examples
references		references
tests		tests
.claudeignore		.claudeignore
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

autoresearch-skill

How it works

Install

Usage

Interactive setup

One-shot

What happens

Examples

ML training (Karpathy-style)

Test coverage

Build size

Code quality

API accuracy

Design principles

How this differs from Karpathy's autoresearch

Creating your own skill (template)

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

autoresearch-skill

How it works

Install

Usage

Interactive setup

One-shot

What happens

Examples

ML training (Karpathy-style)

Test coverage

Build size

Code quality

API accuracy

Design principles

How this differs from Karpathy's autoresearch

Creating your own skill (template)

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages