AgentLab

A modular platform for building, experimenting with, evaluating, and deploying AI agents, especially software engineering agents.

Overview

AgentLab provides interchangeable components (LLMs, context managers, loops, tools, sandboxes, prompts) that compose into agents. Run experiments across architectures, evaluate on SWE benchmarks, and compare performance across runs.

Installation

pip install -e ".[dev]"

Quick Start

Define an agent

Create a YAML config in agents/:

name: coding_agent
llm: openai
loop: react
context: sliding
tools:
  - filesystem
  - shell
sandbox: local
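
To make the shape of this config concrete, here is a minimal Python sketch of how such a file might be validated once it has been parsed into a dictionary. The names AgentConfig and parse_agent_config, and the choice of which keys are required, are assumptions made for illustration; they are not taken from AgentLab's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Assumption: every scalar key in the example config is treated as required.
REQUIRED_KEYS = ("name", "llm", "loop", "context", "sandbox")


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical in-memory form of an agent YAML config."""
    name: str
    llm: str
    loop: str
    context: str
    sandbox: str
    tools: List[str] = field(default_factory=list)


def parse_agent_config(raw: Dict[str, Any]) -> AgentConfig:
    """Validate a parsed config mapping and return an AgentConfig."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    tools = raw.get("tools") or []
    if not all(isinstance(tool, str) for tool in tools):
        raise ValueError("tools must be a list of component names")
    scalars = {key: str(raw[key]) for key in REQUIRED_KEYS}
    return AgentConfig(tools=list(tools), **scalars)


# The dictionary a YAML parser would produce for the config above.
raw_config = {
    "name": "coding_agent",
    "llm": "openai",
    "loop": "react",
    "context": "sliding",
    "tools": ["filesystem", "shell"],
    "sandbox": "local",
}
config = parse_agent_config(raw_config)
```

Each value is only a component name at this stage; resolving names to implementations is a separate step.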

Run an agent

lab run coding_agent
lab run coding_agent --task bug_fix_1

Run an experiment

lab experiment run experiment.yaml
lab experiment results <experiment_id>

Evaluate an agent

lab eval coding_agent --task bug_fix_1

Compare runs

lab compare <run_id_1> <run_id_2>

Inspect traces

lab replay <run_id>

Observability (Phoenix)

To trace LLM calls and group each agent run under an OpenInference AGENT span in Arize Phoenix, use a self-hosted Phoenix instance (see the Phoenix self-hosting guide), then:

1. Install the optional extra: pip install -e ".[phoenix]"
2. Set environment variables (see example.env):
   - AGENTLAB_PHOENIX_TRACING=1
   - PHOENIX_COLLECTOR_ENDPOINT: your Phoenix OTLP endpoint (often http://localhost:6006 for local Phoenix)
3. For a self-hosted instance, run phoenix serve to start the Phoenix server.

Tracing is initialized when you run lab ui, or when you run lab run, lab experiment run, or lab eval from the CLI. Open the Phoenix UI to browse traces.
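
The two environment variables can be read as an on/off switch plus a destination. The sketch below illustrates that reading in Python. The function name phoenix_endpoint, the rule that only the value "1" enables tracing, and the fallback to the local endpoint are assumptions, not AgentLab's actual behavior.

```python
import os
from typing import Mapping, Optional

# Local Phoenix endpoint mentioned above; using it as a fallback is an assumption.
DEFAULT_ENDPOINT = "http://localhost:6006"


def phoenix_endpoint(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the collector endpoint if tracing is enabled, else None."""
    env = os.environ if env is None else env
    # Assumption: tracing is enabled only by the exact value "1".
    if env.get("AGENTLAB_PHOENIX_TRACING", "").strip() != "1":
        return None
    endpoint = env.get("PHOENIX_COLLECTOR_ENDPOINT", "").strip()
    return endpoint or DEFAULT_ENDPOINT


if __name__ == "__main__":
    target = phoenix_endpoint()
    print(f"tracing to {target}" if target else "tracing disabled")
```

Passing a mapping instead of reading os.environ directly keeps the check easy to exercise without touching the real environment.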

Web UI

AgentLab includes a full web UI for browsing agents, runs, tasks, experiments, and comparisons.

Start the UI (server mode)

From the project root (with your virtualenv activated):

lab ui

Then open http://127.0.0.1:8000 in your browser.
FastAPI serves both the JSON API under /api/* and, if built, the static UI at /.

Develop the UI (hot reload)

In one terminal, start the API server:

lab ui

In another terminal, run the Vite dev server from the ui/ directory:

cd ui
npm install        # first time
npm run dev

Visit http://127.0.0.1:5173. The dev server proxies /api/* requests to http://127.0.0.1:8000.

Build the UI for production

To build the React app into static assets:

cd ui
npm run build

This outputs to ui/dist/. On the next lab ui run, FastAPI will automatically serve ui/dist/ at /.

Architecture

Agent = LLM + Loop Controller + Context Manager + Tools + Sandbox + Prompts

Components are swappable modules registered in a global registry. Agents are defined as YAML configs that reference component names.
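
As a rough illustration of the registry idea, the Python sketch below registers components under a (kind, name) key and resolves a config's names into instances. Everything in it (register, create, build_agent, and the placeholder classes) is hypothetical and only shows the general pattern, not AgentLab's actual interfaces.

```python
from typing import Callable, Dict, Tuple

# Global registry: (component kind, component name) -> zero-argument factory.
_REGISTRY: Dict[Tuple[str, str], Callable[[], object]] = {}


def register(kind: str, name: str):
    """Decorator that registers a class or factory under (kind, name)."""
    def decorator(factory):
        key = (kind, name)
        if key in _REGISTRY:
            raise ValueError(f"{kind} component {name!r} is already registered")
        _REGISTRY[key] = factory
        return factory
    return decorator


def create(kind: str, name: str) -> object:
    """Instantiate the component registered under (kind, name)."""
    try:
        factory = _REGISTRY[(kind, name)]
    except KeyError:
        raise KeyError(f"no {kind} component named {name!r}") from None
    return factory()


@register("loop", "react")
class ReactLoop:
    """Placeholder loop controller (hypothetical)."""


@register("context", "sliding")
class SlidingContext:
    """Placeholder context manager (hypothetical)."""


def build_agent(config: Dict[str, str]) -> Dict[str, object]:
    """Resolve each component name in a config to an instance."""
    return {kind: create(kind, name) for kind, name in config.items()}


agent = build_agent({"loop": "react", "context": "sliding"})
```

Because a config carries only names, swapping a component means changing one string rather than editing agent code.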

Project Structure

agentlab/
  cli/          CLI entry points
  core/         Agent, Component interfaces, Registry
  components/   Built-in component implementations
  runtime/      Agent runner + trace recorder
  experiment/   Experiment engine + comparisons
  evaluation/   SWE harness, metrics, validators
  storage/      File-based run/experiment store
  models/       Pydantic data models

License

MIT
