Forked from karpathy/autoresearch to support an autonomous research swarm that runs experiments in parallel in the cloud.
The idea: give an AI agent a small but real LLM training setup and a "skillified" description of how to access cloud resources to run experiments in parallel. The main research agent can experiment autonomously, exploring many directions in parallel via a swarm of subagents. The swarm syncs their results into a unified experiment log stored in a Deeplake managed table. Additionally, the subagents record memories of their experiments for future reference and analysis by the main research agent and/or other agents, also stored in Deeplake.
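As a rough sketch of what one synced row might look like (the field names below are hypothetical illustrations, not the repo's actual Deeplake table schema), each subagent could serialize a record like this before writing it to the shared experiment log:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ExperimentRecord:
    """Hypothetical shape of one row in the shared experiment log."""
    run_id: str          # unique id for the subagent's run
    commit: str          # git commit of the train.py variant tested
    hypothesis: str      # what the experiment was trying to show
    val_bpb: float       # validation bits per byte (lower is better)
    wall_clock_s: float  # training time actually used
    notes: str           # free-form memory for later agents to analyze

    def to_json(self) -> str:
        return json.dumps(asdict(self))

record = ExperimentRecord(
    run_id="run-001",
    commit="abc1234",
    hypothesis="wider MLP improves bpb at fixed budget",
    val_bpb=1.042,
    wall_clock_s=300.0,
    notes="gain appears only after warmup; try a longer schedule",
)
payload = record.to_json()
```

Keeping both the quantitative result (`val_bpb`) and a free-form `notes` field in the same row is what lets the main research agent later mine the log for promising directions, not just rank runs.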
The repo is deliberately kept small and follows the philosophy of the original autoresearch:
- prepare.py — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Usually not modified by the agents.
- train.py — the single file the agent edits. Contains the full GPT model and optimizer (Muon + AdamW). More details are at karpathy/nanochat. This file is edited and iterated on by the agent.
- modal_train.py — contains the Modal image, dependencies, and the deployment script for training on a cloud GPU (e.g. H100) and gathering training logs. This file is edited only during "preparation", to change some configurations.
- claude/agents/experiment-worker.md — the subagent contract, describing the task and tools available to the subagent. (Can also be used as a standalone reference for other agentic coding tools.)
- orchestrator.md — the main research agent contract, describing the task and tools available to the autonomous research agent.
Following the original autoresearch design, training runs for a fixed 5-minute time budget (wall clock, excluding startup/compilation), regardless of the details of your compute. The metric is val_bpb (validation bits per byte) — lower is better, and vocab-size-independent so architectural changes are fairly compared. Each subagent gets a separate machine to run the training on and can inspect the results of their own experiment.
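Bits per byte can be derived from the mean cross-entropy loss (in nats per token) that a training loop typically reports. A minimal sketch of the conversion (function and argument names here are illustrative, not the repo's actual API):

```python
import math

def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean next-token cross-entropy (nats/token) to bits/byte.

    Total information = (loss in bits/token) * n_tokens; normalizing by the
    raw byte count of the evaluation text makes the metric independent of
    the tokenizer's vocabulary size, so architectural and tokenizer changes
    can be compared fairly.
    """
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * n_tokens / n_bytes
```

For example, a loss of ln(2) nats/token on text that averages one token per byte gives exactly 1.0 bpb; a better tokenizer that covers the same bytes with fewer tokens lowers bpb even at the same per-token loss.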
Requirements: Python 3.10+, uv, Modal and Deeplake accounts and API keys.
# 1. Install uv project manager (if you don't already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Install dependencies - local dependencies only include Deeplake and Modal SDKs, as well as auxiliary dependencies for the visualization code.
uv sync
# 3. Setup Deeplake environment, take a token from your Deeplake account
export DEEPLAKE_API_KEY=<your-deeplake-api-key>
# 4. Setup Modal environment, run and authenticate with your Modal account
uv run modal setup
uv run modal environment create "autoresearch"
If the above commands all succeed, your setup is working and you can go into autonomous research mode.
Simply spin up Claude or a similar agentic coding tool, then prompt it:
Hi have a look at orchestrator.md and let's kick off a new experiment! First, let's complete the setup together and do a baseline run.
With this, the agent will work with you to identify any missing steps in the setup, get the baseline metrics, and be ready to start exploring new directions autonomously. After the initial setup, the Modal image will be built with a volume containing the training data and the tokenizer produced by prepare.py. This way, each run has minimal startup overhead and typically finishes within 5-7 minutes.
Alternatively, you can also run the non-interactive version with something like:
claude -p "Hi, have a look at orchestrator.md. Our goal is to optimize the training code. Adhere to the instructions in the file and start working." \
--model claude-opus-4-6 \
--dangerously-skip-permissions \
--permission-mode bypassPermissions \
--output-format stream-json \
--include-partial-messages \
--verbose \
2>&1 | tee -a dev/autoresearch_logs.txt

- Single file to modify. The agent only touches train.py. This keeps the scope manageable and the diffs reviewable, both manually and by the main researcher agent.
- Running experiments in parallel. The subagents run experiments in parallel on their own machines and sync their results into the Deeplake table. The main researcher agent can inspect the results and integrate the best surviving commit back onto main. Each subagent can work in its own git worktree, modify train.py, and use the code to trigger a training run on a cloud GPU via Modal.
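The worktree-per-subagent setup can be sketched with plain git commands driven from Python (the helper name, branch naming, and directory layout below are illustrative assumptions, not the repo's actual tooling):

```python
import subprocess
from pathlib import Path

def add_worktree(repo: str, branch: str) -> Path:
    """Create an isolated worktree on a new branch so a subagent can edit
    train.py without touching the main checkout or other subagents' work."""
    wt = Path(repo).resolve().parent / f"wt-{branch}"
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, str(wt)],
        check=True,
    )
    return wt

# From inside its worktree, the subagent edits train.py and launches a
# remote run, e.g. with `uv run modal run modal_train.py`.
```

Because each worktree is a full checkout sharing one object store, the main researcher agent can later diff and merge the best surviving branch back onto main without copying repositories around.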
Currently, the training code will run exclusively remotely on a cloud GPU, making this accessible to pretty much any computer with an agentic coding tool and Python installed.
In modal_train.py, you can change the GPU type to support other compute platforms, and/or add packages into the environment by modifying the snippet:
app = modal.App("autoresearch")
vol = modal.Volume.from_name("autoresearch-data", create_if_missing=True)
VOLUME_PATH = "/data/autoresearch"
GPU_TYPE = "H100"
image = (
modal.Image.debian_slim(python_version="3.12")
.pip_install(
"torch==2.9.1",
extra_index_url="https://download.pytorch.org/whl/cu128",
)
.pip_install(
"kernels>=0.11.7",
"numpy",
"pandas",
"pyarrow",
"requests",
"rustbpe",
"tiktoken",
"matplotlib",
)
.add_local_file("prepare.py", "/app/prepare.py")
.add_local_file("train.py", "/app/train.py")
)

MIT
