Agent Patterns Lab

Agent Patterns Lab is a local learning project for comparing classic agent patterns side by side. It uses Python, LangChain model abstractions, LangGraph execution graphs, externalized prompts, and a small Gradio chat page.

Repository: github.com/Sud0x67/agent-patterns-lab

Related docs:

Why Agent Patterns

Complex agents are hard to keep reliable with one long prompt. Planning, tool use, observations, critique, retries, and final synthesis become mixed together, which makes failures hard to locate.

Agent patterns make the control flow explicit:

More reliable: each stage has a defined job.
Easier to debug: traces show whether the failure happened in planning, tool execution, evaluation, or synthesis.
Easier to reuse: keep the control flow, swap prompts and tools.
Easier to evaluate: compare patterns on the same cases with latency, token, completion, and success metrics.

Simple Q&A does not need a pattern. Patterns are for managing multi-stage task complexity.

Screenshots

[Screenshot: local chat page with the pattern dropdown]

[Screenshot: trace view after a ReAct run]

Pattern Taxonomy

This lab currently implements eight patterns. A useful way to read them is as two families:

Family	Patterns	What they emphasize
Tool-execution patterns	ReAct, Plan & Solve, LLM Compiler, REWOO	How the agent decides, schedules, and consumes tool calls.
Reasoning-enhancement patterns	Reflection, Reflexion, LATS, Self-Discovery	How the agent improves, evaluates, searches, or structures its own answer.

Pattern Comparison

Pattern	Paper	Command	Main loop	Tool use	Best fit	Tradeoff
ReAct	arXiv:2210.03629	`react`	Reason -> action -> observation -> repeat	Interleaved	Interactive tasks where each observation changes the next step	Simple and transparent, but can be slower or loop-prone.
Plan & Solve	arXiv:2305.04091	`plan-solve`	Plan all steps, solve each step, synthesize	Optional during steps	Multi-step tasks with a stable problem shape	Easier to inspect than free-form CoT, but early plans can be wrong.
LLM Compiler	arXiv:2312.04511	`llm-compiler`	Compile a task DAG, execute ready tool tasks, join	Structured DAG	Independent or partially dependent tool calls	Good for parallelism, but depends on valid structured plans.
REWOO	arXiv:2305.18323	`rewoo`	Plan evidence slots, execute tools, solve from evidence	Batched after planning	Retrieval/calculation tasks where observations can be separated from reasoning	Reduces repeated context, but the first plan must name useful evidence.
Reflection	arXiv:2303.17651	`reflection`	Draft -> critique -> revise	Usually none	Writing, answer quality, self-checking, cleanup	Cheap and easy, but critique quality is model-dependent.
Reflexion	arXiv:2303.11366	`reflexion`	Attempt -> evaluate -> verbal lesson -> retry	Optional	Tasks with a clear feedback signal or repeated attempts	Can improve after failures, but needs useful evaluator feedback.
LATS	arXiv:2310.04406	`lats`	Select node with UCB, expand, evaluate, backpropagate, synthesize	Optional in full versions; this lab focuses on candidate search	Harder tasks where exploring alternatives helps	Stronger search behavior, but higher latency and token cost.
Self-Discovery	arXiv:2402.03620	`self-discovery`	Select modules, adapt modules, build structure, fill structure	Usually none	Complex reasoning where choosing the right reasoning scaffold matters	Produces reusable task structure, but is inference-heavy.
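
To make the "Main loop" column concrete, here is a minimal sketch of the ReAct row (reason -> action -> observation -> repeat) in plain Python. It is not the lab's LangGraph runner: `react_loop`, `scripted_model`, the step dictionary keys, and the toy `calculator` are all hypothetical names chosen for illustration, and the model is a scripted stand-in so the example runs without an API key.

```python
import ast
import operator
from typing import Callable

# Operators the toy calculator accepts (assumption: basic arithmetic only).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> str:
    """Toy calculator tool: evaluates +, -, *, / over numeric literals."""
    def ev(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return str(ev(ast.parse(expression, mode="eval").body))


def react_loop(model: Callable[[str], dict], task: str,
               tools: dict, max_steps: int = 5):
    """Reason -> action -> observation -> repeat, capped to avoid endless loops."""
    scratchpad = ""
    trace = []
    for _ in range(max_steps):
        step = model(f"Task: {task}\n{scratchpad}")
        trace.append(step)
        if "final" in step:  # the model decided it can answer
            return step["final"], trace
        observation = tools[step["tool"]](step["input"])
        trace.append({"observation": observation})
        scratchpad += (f"Action: {step['tool']}({step['input']})\n"
                       f"Observation: {observation}\n")
    return "stopped: step limit reached", trace


def scripted_model(prompt: str) -> dict:
    """Stand-in for an LLM: call the tool once, then answer from the observation."""
    if "Observation:" not in prompt:
        return {"thought": "I should compute this.",
                "tool": "calculator", "input": "18 * 24"}
    observed = prompt.rsplit("Observation: ", 1)[1].strip()
    return {"thought": "I have the result.", "final": f"18 * 24 = {observed}"}


answer, trace = react_loop(scripted_model, "What is 18 * 24?",
                           {"calculator": calculator})
print(answer)
```

The step cap is the simplest guard against the loop-prone behavior noted in the Tradeoff column; the trace list is what a debugging view would render.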

Setup

Install uv if needed:

curl -LsSf https://astral.sh/uv/install.sh | sh

Open a new shell after installation. If uv is still not found, add its install directory to PATH in your current shell:

export PATH="$HOME/.local/bin:$PATH"

Sync dependencies:

uv sync

Configure any OpenAI-compatible model endpoint:

export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="your-model-name"
# Optional for non-default OpenAI-compatible endpoints:
export OPENAI_BASE_URL="https://your-compatible-endpoint.example/v1"

The project intentionally uses the OpenAI-compatible API shape only. The code does not maintain vendor-specific routing registries.
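
As a rough illustration of how these three variables could be read in Python, the sketch below collects them into a plain dictionary. `load_endpoint_config` and the dictionary keys are hypothetical and are not taken from the project's llm.py; the only facts assumed from this README are the variable names and that OPENAI_BASE_URL is optional.

```python
import os
from typing import Mapping, Optional


def load_endpoint_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect OpenAI-compatible endpoint settings from environment variables."""
    env = os.environ if env is None else env
    required = ("OPENAI_API_KEY", "OPENAI_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    config = {"api_key": env["OPENAI_API_KEY"], "model": env["OPENAI_MODEL"]}
    # The base URL is only needed for non-default endpoints, so it stays optional.
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        config["base_url"] = base_url
    return config


example = load_endpoint_config({
    "OPENAI_API_KEY": "your-api-key",
    "OPENAI_MODEL": "your-model-name",
})
print(sorted(example))
```

Failing fast on a missing key or model name gives a clearer error than a rejected request later in a run.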

Command-Line Usage

List patterns:

uv run python main.py patterns

Run one prompt:

uv run python main.py run \
  --pattern react \
  --input "What is 18 * 24? Use a tool if useful." \
  --show-trace

Run a different pattern:

uv run python main.py run \
  --pattern self-discovery \
  --input "Find the bug: total = 0; for n in nums: total = n; return total" \
  --show-trace

Available pattern names:

react
plan-solve
reflection
reflexion
llm-compiler
rewoo
lats
self-discovery

Manual Page Testing

Start the local chat page:

uv run python main.py chat --pattern react --port 7860

Then open:

http://127.0.0.1:7860

The --pattern option only sets the default selection. The page includes a Pattern dropdown, so you can switch between patterns without restarting the server.

Useful manual prompts:

Use the available calculator tool if useful: compute 27 * (14 + 6) - 35.

Revise this vague sentence into one clear README sentence:
Agent thing does stuff with tools and thoughts.

Explore three ways to make the LATS demo more educational, then select the best one.

The chat page streams a visible learning trace while the pattern runs. It shows step-level events such as plans, tool calls, observations, candidate scores, reflections, and final synthesis. This is an observable pattern trace, not hidden chain-of-thought. When the endpoint returns token usage, model-call trace events include token counts.

Prompt Editing

Prompts are externalized in:

prompts/{pattern}/{step}/system.md
prompts/{pattern}/{step}/user.md

Templates use $variable placeholders, for example $task, $tools, and $scratchpad. This keeps JSON examples easy to edit because {} characters do not need escaping.
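
The $variable syntax matches Python's standard `string.Template`, so a minimal sketch of the rendering step might look like the following. Whether the project uses `string.Template` itself is an assumption, and the template text here is invented for illustration; only the placeholder names $task, $tools, and $scratchpad come from this README.

```python
from string import Template

# Hypothetical user prompt; the JSON example contains literal braces.
user_template = Template(
    "Task: $task\n"
    "Tools: $tools\n"
    'Reply with JSON like {"tool": "calculator", "input": "2 + 2"}\n'
    "Scratchpad: $scratchpad"
)

rendered = user_template.substitute(
    task="What is 18 * 24?",
    tools="calculator, current_time, search_knowledge_base",
    scratchpad="",
)
print(rendered)
```

With `str.format`, the same JSON line would need doubled braces. With `string.Template`, the only character that needs escaping is a literal dollar sign, written as `$$`.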

Run with a custom prompt directory:

uv run python main.py chat \
  --pattern react \
  --prompt-dir prompts

Real-Model Evaluation

The fixed mini dataset lives at evals/mini.jsonl. It has eight cases split into two suites:

Suite	Cases	Intended comparison
`tool_execution`	4	ReAct, Plan & Solve, LLM Compiler, and REWOO on calculator / local knowledge-base tasks.
`reasoning_enhancement`	4	Reflection, Reflexion, LATS, and Self-Discovery on revision, self-checking, tradeoff, and exploration tasks.

Run the full matrix on a real OpenAI-compatible endpoint:

export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="your-model-name"
export OPENAI_BASE_URL="https://your-compatible-endpoint.example/v1"

uv run python main.py eval \
  --patterns all \
  --concurrency 4 \
  --case-timeout 240 \
  --request-timeout 120

Run only the tool-execution family:

uv run python main.py eval \
  --patterns react,plan-solve,llm-compiler,rewoo \
  --concurrency 4 \
  --case-timeout 240 \
  --request-timeout 120

Run only the reasoning-enhancement family:

uv run python main.py eval \
  --patterns reflection,reflexion,lats,self-discovery \
  --concurrency 4 \
  --case-timeout 240 \
  --request-timeout 120

Each run writes:

eval_runs/YYYYMMDD-HHMMSS/
  summary.md
  summary.csv
  raw_results.jsonl
  traces/
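
The layout above can be sketched with `pathlib` and a timestamp format string. `run_dir_layout` is a hypothetical helper, not the project's code; the artifact names and the YYYYMMDD-HHMMSS pattern are the ones listed above.

```python
from datetime import datetime
from pathlib import Path


def run_dir_layout(root: Path, started: datetime) -> dict:
    """Return the paths one evaluation run would write under root."""
    run_dir = root / started.strftime("%Y%m%d-%H%M%S")
    return {
        "run_dir": run_dir,
        "summary_md": run_dir / "summary.md",
        "summary_csv": run_dir / "summary.csv",
        "raw_results": run_dir / "raw_results.jsonl",
        "traces": run_dir / "traces",
    }


layout = run_dir_layout(Path("eval_runs"), datetime(2026, 5, 22, 18, 29, 54))
print(layout["run_dir"].name)
```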

240s Evaluation Results

The results below come from the one preserved 240s qwen3.6-plus run in this repository:

Dataset: evals/mini.jsonl
Patterns: 8
Model: qwen3.6-plus
Runs: 64
Concurrency: 4
Case timeout: 240s
Request timeout: 120s

Model-level summary:

Model	Success	Completed	Runs	Success Rate	Completion Rate	Avg Latency	Avg Tokens	Timeouts
qwen3.6-plus	52	64	64	81.25%	100.00%	67110 ms	4407	0

Pattern-level summary:

Pattern	Success	Completed	Runs	Success Rate	Avg Latency	Avg Tokens	Tool Calls
self-discovery	8	8	8	100.00%	145836 ms	9426	0
rewoo	7	8	8	87.50%	46090 ms	2852	0
llm-compiler	7	8	8	87.50%	61214 ms	3701	0
plan-solve	7	8	8	87.50%	81307 ms	5588	0
react	6	8	8	75.00%	21755 ms	1814	10
lats	6	8	8	75.00%	67694 ms	4738	0
reflection	6	8	8	75.00%	69842 ms	4334	0
reflexion	5	8	8	62.50%	43146 ms	2807	0
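
As a consistency check, the model-level summary can be recomputed from the pattern-level rows. The numbers below are copied from the table above; the script itself is an illustration and not part of the project.

```python
# (pattern, success, runs, avg_latency_ms, avg_tokens), copied from the table.
rows = [
    ("self-discovery", 8, 8, 145836, 9426),
    ("rewoo", 7, 8, 46090, 2852),
    ("llm-compiler", 7, 8, 61214, 3701),
    ("plan-solve", 7, 8, 81307, 5588),
    ("react", 6, 8, 21755, 1814),
    ("lats", 6, 8, 67694, 4738),
    ("reflection", 6, 8, 69842, 4334),
    ("reflexion", 5, 8, 43146, 2807),
]

total_success = sum(r[1] for r in rows)
total_runs = sum(r[2] for r in rows)
success_rate = 100 * total_success / total_runs
# Weight each per-pattern average by its run count to get the overall mean.
mean_latency_ms = sum(r[3] * r[2] for r in rows) / total_runs
mean_tokens = sum(r[4] * r[2] for r in rows) / total_runs

print(total_success, total_runs, success_rate, mean_latency_ms, mean_tokens)
```

This gives 52 successes out of 64 runs (81.25%), a mean latency of 67110.5 ms, and a mean of 4407.5 tokens, which agrees with the model-level row up to rounding of the per-pattern averages.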

Preserved artifacts:

eval_runs/qwen3.6-plus-concurrent/20260522-182954/summary.md
eval_runs/qwen3.6-plus-concurrent/20260522-182954/summary.csv
eval_runs/qwen3.6-plus-concurrent/20260522-182954/raw_results.jsonl
eval_runs/qwen3.6-plus-concurrent/20260522-182954/traces/

The scorer is intentionally small and deterministic: it checks expected strings and records trace metrics such as success, latency, model events, tool calls, token usage, warnings, and timeouts. Treat the benchmark as a learning probe, not as a general leaderboard.
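
A deterministic expected-string check of this kind can be sketched in a few lines. The details are assumptions made for illustration: `score_answer` is a hypothetical name, and requiring every expected string with case-insensitive matching is one plausible rule, not necessarily the one the project's scorer applies.

```python
def score_answer(answer: str, expected: list) -> bool:
    """Return True when every expected string occurs in the answer.

    Assumption: matching is case-insensitive and all strings are required.
    """
    normalized = answer.casefold()
    return all(item.casefold() in normalized for item in expected)


print(score_answer("27 * (14 + 6) - 35 = 505", ["505"]))
print(score_answer("The result is roughly 500.", ["505"]))
```

A string check like this is cheap and repeatable, but it can pass a wrong answer that happens to contain the expected text, which is one reason to read the results as a learning probe.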

Project Layout

agent_patterns_lab/
  cli.py                 # Typer commands
  ui.py                  # Gradio chat page
  llm.py                 # LangChain ChatOpenAI factory
  tools.py               # LangChain tool registry and demo tools
  schema.py              # RunResult, TraceEvent, token aggregation
  patterns/              # One LangGraph runner per pattern
docs/
  assets/                # README screenshots
  agent-patterns-lab-tech-share.zh-CN.md
tests/                   # Fake-model tests for every pattern
prompts/                 # Editable system/user prompts for every pattern step
evals/mini.jsonl         # Fixed mini E2E dataset
eval_runs/               # Preserved benchmark artifacts
main.py                  # Local command entry point
pyproject.toml           # uv-managed dependencies

Implementation Notes

Each pattern is implemented as a small LangGraph StateGraph. The goal is to make the control flow visible rather than to hide it behind a larger agent framework.

The LATS implementation uses a real search tree: each iteration selects a leaf with UCB, expands candidate child nodes, evaluates them, backpropagates scores, and synthesizes the final answer from the best path.
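
The selection and backpropagation steps can be sketched with the standard UCB1 formula, mean value plus c * sqrt(ln(parent visits) / node visits). This is a generic illustration, not the lab's code: the `Node` fields, the exploration constant 1.4, and the rule that unvisited nodes are tried first are assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    text: str
    parent: Optional[Node] = None
    visits: int = 0
    total_value: float = 0.0
    children: List[Node] = field(default_factory=list)

    def add_child(self, text: str) -> Node:
        child = Node(text=text, parent=self)
        self.children.append(child)
        return child


def ucb(node: Node, c: float = 1.4) -> float:
    """UCB1 score of a non-root node; unvisited nodes are explored first."""
    if node.visits == 0:
        return math.inf
    exploit = node.total_value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select_leaf(root: Node) -> Node:
    """Walk down from the root, taking the child with the highest UCB score."""
    node = root
    while node.children:
        node = max(node.children, key=ucb)
    return node


def backpropagate(node: Optional[Node], value: float) -> None:
    """Add an evaluation score to the node and to every ancestor."""
    while node is not None:
        node.visits += 1
        node.total_value += value
        node = node.parent


root = Node("task")
a = root.add_child("candidate A")
b = root.add_child("candidate B")
backpropagate(a, 0.9)  # evaluator liked A
backpropagate(b, 0.2)  # evaluator disliked B
print(select_leaf(root).text)
```

After both candidates have been evaluated once, the exploration terms are equal, so the higher-scoring candidate A is selected for the next expansion.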

Self-Discovery follows a four-step flow: select relevant reasoning modules, adapt them to the task, operationalize them into a JSON reasoning structure, then fill that structure to answer the concrete task.

The default tools are deliberately small:

calculator
current_time
search_knowledge_base

Add real tools in agent_patterns_lab/tools.py or pass a custom ToolRegistry when creating a runner.

Test

uv run pytest

The tests use a scripted fake chat model, so they do not require an API key.
