Forked from karpathy/autoresearch to support an autonomous research swarm that runs experiments in parallel in the cloud.
The idea: give an AI agent a small but real LLM training setup and a "skillified" description of how to access cloud resources to run experiments in parallel. The main research agent can experiment autonomously, exploring many directions in parallel via a swarm of subagents. The swarm syncs their results into a unified experiment log stored in a Deeplake managed table. Additionally, the subagents record memories of their experiments for future reference and analysis by the main research agent and/or other agents, also stored in Deeplake.
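As a rough sketch of what one synced row might look like (the field names below are hypothetical illustrations, not the repo's actual Deeplake table schema), each subagent could serialize a record like this before writing it to the shared experiment log:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ExperimentRecord:
    """Hypothetical shape of one row in the shared experiment log."""
    run_id: str          # unique id for the subagent's run
    commit: str          # git commit of the train.py variant tested
    hypothesis: str      # what the experiment was trying to show
    val_bpb: float       # validation bits per byte (lower is better)
    wall_clock_s: float  # training time actually used
    notes: str           # free-form memory for later agents to analyze

    def to_json(self) -> str:
        return json.dumps(asdict(self))

record = ExperimentRecord(
    run_id="run-001",
    commit="abc1234",
    hypothesis="wider MLP improves bpb at fixed budget",
    val_bpb=1.042,
    wall_clock_s=300.0,
    notes="gain appears only after warmup; try a longer schedule",
)
payload = record.to_json()
```

Keeping both the quantitative result (`val_bpb`) and a free-form `notes` field in the same row is what lets the main research agent later mine the log for promising directions, not just rank runs.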
The repo is deliberately kept small and follows the philosophy of the original autoresearch:
- prepare.py — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Usually not modified by the agents.
- train.py — the single file the agent edits. Contains the full GPT model and optimizer (Muon + AdamW). More details are at karpathy/nanochat. This file is edited and iterated on by the agent.
- modal_train.py — contains the Modal image, dependencies, and the deployment script for training on a cloud GPU (e.g. H100) and gathering training logs. This file is edited only during "preparation", to change some configurations.
- claude/agents/experiment-worker.md — the subagent contract, describing the task and tools available to the subagent. (Can also be used as a standalone reference for other agentic coding tools.)
- orchestrator.md — the main research agent contract, describing the task and tools available to the autonomous research agent.
Following the original autoresearch design, training runs for a fixed 5-minute time budget (wall clock, excluding startup/compilation), regardless of the details of your compute. The metric is val_bpb (validation bits per byte) — lower is better, and vocab-size-independent so architectural changes are fairly compared. Each subagent gets a separate machine to run the training on and can inspect the results of their own experiment.
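Bits per byte can be derived from the mean cross-entropy loss (in nats per token) that a training loop typically reports. A minimal sketch of the conversion (function and argument names here are illustrative, not the repo's actual API):

```python
import math

def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean next-token cross-entropy (nats/token) to bits/byte.

    Total information = (loss in bits/token) * n_tokens; normalizing by the
    raw byte count of the evaluation text makes the metric independent of
    the tokenizer's vocabulary size, so architectural and tokenizer changes
    can be compared fairly.
    """
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * n_tokens / n_bytes
```

For example, a loss of ln(2) nats/token on text that averages one token per byte gives exactly 1.0 bpb; a better tokenizer that covers the same bytes with fewer tokens lowers bpb even at the same per-token loss.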
Requirements: Python 3.10+, uv, Modal and Deeplake accounts and API keys.
# 1. Install uv project manager (if you don't already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Install dependencies - local dependencies only include Deeplake and Modal SDKs, as well as auxiliary dependencies for the visualization code.
uv sync
# 3. Setup Deeplake environment, take a token from your Deeplake account
export DEEPLAKE_API_KEY=<your-deeplake-api-key>
# 4. Setup Modal environment, run and authenticate with your Modal account
uv run modal setup
uv run modal environment create "autoresearch"
If the above commands all succeed, your setup is working and you can go into autonomous research mode.
Simply spin up Claude or a similar agentic coding tool, then prompt it:
Hi have a look at orchestrator.md and let's kick off a new experiment! First, let's complete the setup together and do a baseline run.
With this, the agent will work with you to identify any missing steps in the setup, get the baseline metrics, and be ready to start exploring new directions autonomously. After the initial setup, the Modal image will be built with a volume containing the training data and the tokenizer produced by prepare.py. This way, each run has minimal startup overhead and typically finishes within 5-7 minutes.
Alternatively, you can also run the non-interactive version with something like:
claude -p "Hi, have a look at orchestrator.md. Our goal is to optimize the training code. Adhere to the instructions in the file and start working." \
--model claude-opus-4-6 \
--dangerously-skip-permissions \
--permission-mode bypassPermissions \
--output-format stream-json \
--include-partial-messages \
--verbose \
2>&1 | tee -a dev/autoresearch_logs.txt

- Single file to modify. The agent only touches train.py. This keeps the scope manageable and the diffs reviewable, both manually and by the main researcher agent.
- Running experiments in parallel. The subagents run experiments in parallel on their own machines and sync their results into the Deeplake table. The main researcher agent can inspect the results and integrate the best surviving commit back onto main. Each subagent can work in its own git worktree, modify train.py, and use the code to trigger a training run on a cloud GPU via Modal.
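The worktree-per-subagent setup can be sketched with plain git commands driven from Python (the helper name, branch naming, and directory layout below are illustrative assumptions, not the repo's actual tooling):

```python
import subprocess
from pathlib import Path

def add_worktree(repo: str, branch: str) -> Path:
    """Create an isolated worktree on a new branch so a subagent can edit
    train.py without touching the main checkout or other subagents' work."""
    wt = Path(repo).resolve().parent / f"wt-{branch}"
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, str(wt)],
        check=True,
    )
    return wt

# From inside its worktree, the subagent edits train.py and launches a
# remote run, e.g. with `uv run modal run modal_train.py`.
```

Because each worktree is a full checkout sharing one object store, the main researcher agent can later diff and merge the best surviving branch back onto main without copying repositories around.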
Currently, the training code will run exclusively remotely on a cloud GPU, making this accessible to pretty much any computer with an agentic coding tool and Python installed.
In modal_train.py, you can change the GPU type to support other compute platforms, and/or add packages into the environment by modifying the snippet:
app = modal.App("autoresearch")
vol = modal.Volume.from_name("autoresearch-data", create_if_missing=True)
VOLUME_PATH = "/data/autoresearch"
GPU_TYPE = "H100"
image = (
modal.Image.debian_slim(python_version="3.12")
.pip_install(
"torch==2.9.1",
extra_index_url="https://download.pytorch.org/whl/cu128",
)
.pip_install(
"kernels>=0.11.7",
"numpy",
"pandas",
"pyarrow",
"requests",
"rustbpe",
"tiktoken",
"matplotlib",
)
.add_local_file("prepare.py", "/app/prepare.py")
.add_local_file("train.py", "/app/train.py")
)

MIT
