A modular platform for building, experimenting with, evaluating, and deploying AI agents — especially software engineering agents.
AgentLab provides interchangeable components (LLMs, context managers, loops, tools, sandboxes, prompts) that compose into agents. Run experiments across architectures, evaluate on SWE benchmarks, and compare performance across runs.
pip install -e ".[dev]"

Create a YAML config in agents/:
name: coding_agent
llm: openai
loop: react
context: sliding
tools:
- filesystem
- shell
sandbox: local

lab run coding_agent
lab run coding_agent --task bug_fix_1

lab experiment run experiment.yaml
lab experiment results <experiment_id>

lab eval coding_agent --task bug_fix_1

lab compare <run_id_1> <run_id_2>

lab replay <run_id>

To trace LLM calls and group each agent run under an OpenInference AGENT span in Arize Phoenix, use a self-hosted Phoenix instance (self-hosting guide), then:
- Install the optional extra:
  pip install -e ".[phoenix]"
- Set environment variables (see example.env):
  - AGENTLAB_PHOENIX_TRACING=1
  - PHOENIX_COLLECTOR_ENDPOINT: your Phoenix OTLP endpoint (often http://localhost:6006 for local Phoenix)
- For a self-hosted instance, run phoenix serve to start the Phoenix server.
Tracing is initialized when you run lab ui, or when you run lab run, lab experiment run, or lab eval from the CLI. Open the Phoenix UI to browse traces.
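
As a rough illustration of how the two environment variables above might be interpreted, here is a minimal sketch using only the standard library. The `TracingSettings` type and `read_tracing_settings` function are hypothetical names for illustration, not AgentLab's actual API, and treating only the value `1` as "enabled" is an assumption based on the documented setting.

```python
import os
from typing import Mapping, NamedTuple, Optional


class TracingSettings(NamedTuple):
    """Hypothetical container for the two documented variables."""
    enabled: bool
    endpoint: Optional[str]


def read_tracing_settings(env: Mapping[str, str] = os.environ) -> TracingSettings:
    """Read the tracing flag and collector endpoint from an environment mapping.

    Assumption: tracing counts as enabled only when the flag is exactly "1".
    """
    enabled = env.get("AGENTLAB_PHOENIX_TRACING", "").strip() == "1"
    # An empty or missing endpoint is reported as None.
    endpoint = env.get("PHOENIX_COLLECTOR_ENDPOINT", "").strip() or None
    return TracingSettings(enabled=enabled, endpoint=endpoint)


# Example with an explicit mapping instead of the real process environment.
settings = read_tracing_settings(
    {
        "AGENTLAB_PHOENIX_TRACING": "1",
        "PHOENIX_COLLECTOR_ENDPOINT": "http://localhost:6006",
    }
)
print(settings)
```

Passing the environment as a mapping keeps the function easy to exercise without mutating `os.environ`.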
AgentLab includes a full web UI for browsing agents, runs, tasks, experiments, and comparisons.
From the project root (with your virtualenv activated):
lab ui

Then open http://127.0.0.1:8000 in your browser.
FastAPI serves both the JSON API under /api/* and, if built, the static UI at /.
In one terminal, start the API server:
lab ui

In another terminal, run the Vite dev server from the ui/ directory:
cd ui
npm install # first time
npm run dev

Visit http://127.0.0.1:5173. The dev server proxies /api/* requests to http://127.0.0.1:8000.
To build the React app into static assets:
cd ui
npm run build

This outputs to ui/dist/. On the next lab ui run, FastAPI will automatically serve ui/dist/ at /.
Agent = LLM + Loop Controller + Context Manager + Tools + Sandbox + Prompts
Components are swappable modules registered in a global registry. Agents are defined as YAML configs that reference component names.
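
To make the registry idea concrete, here is a minimal sketch of how named components could be registered and then resolved from a config. Every name in it (`register`, `resolve`, `build_agent`, the placeholder classes) is hypothetical and does not reflect AgentLab's real interfaces; the config dict mirrors the YAML example shown earlier, and prompts are left out because that example does not list any.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical global registry: (kind, name) -> factory producing a component.
_REGISTRY: Dict[Tuple[str, str], Callable[[], object]] = {}


def register(kind: str, name: str):
    """Decorator that records a component factory under (kind, name)."""
    def decorator(factory):
        _REGISTRY[(kind, name)] = factory
        return factory
    return decorator


def resolve(kind: str, name: str) -> object:
    """Instantiate a registered component, failing loudly on unknown names."""
    try:
        factory = _REGISTRY[(kind, name)]
    except KeyError:
        raise KeyError(f"no {kind} component registered as {name!r}") from None
    return factory()


# Placeholder components; real ones would carry behaviour.
@register("llm", "openai")
class OpenAILLM:
    pass


@register("loop", "react")
class ReactLoop:
    pass


@register("context", "sliding")
class SlidingContext:
    pass


@register("tool", "filesystem")
class FilesystemTool:
    pass


@register("tool", "shell")
class ShellTool:
    pass


@register("sandbox", "local")
class LocalSandbox:
    pass


@dataclass
class Agent:
    name: str
    llm: object
    loop: object
    context: object
    sandbox: object
    tools: List[object] = field(default_factory=list)


def build_agent(config: dict) -> Agent:
    """Swap each component name in the config for a registered instance."""
    return Agent(
        name=config["name"],
        llm=resolve("llm", config["llm"]),
        loop=resolve("loop", config["loop"]),
        context=resolve("context", config["context"]),
        sandbox=resolve("sandbox", config["sandbox"]),
        tools=[resolve("tool", t) for t in config.get("tools", [])],
    )


# Dict equivalent of the YAML config from the quick-start example.
config = {
    "name": "coding_agent",
    "llm": "openai",
    "loop": "react",
    "context": "sliding",
    "tools": ["filesystem", "shell"],
    "sandbox": "local",
}
agent = build_agent(config)
print(agent.name, [type(t).__name__ for t in agent.tools])
```

Keying the registry on both kind and name lets the same name (for example, local) be reused across component kinds without collisions.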
agentlab/
  cli/          CLI entry points
  core/         Agent, Component interfaces, Registry
  components/   Built-in component implementations
  runtime/      Agent runner + trace recorder
  experiment/   Experiment engine + comparisons
  evaluation/   SWE harness, metrics, validators
  storage/      File-based run/experiment store
  models/       Pydantic data models
License: MIT