Agent Patterns Lab is a local learning project for comparing classic agent patterns side by side. It uses Python, LangChain model abstractions, LangGraph execution graphs, externalized prompts, and a small Gradio chat page.
Repository: github.com/Sud0x67/agent-patterns-lab
Related docs:
Complex agents are hard to keep reliable with one long prompt. Planning, tool use, observations, critique, retries, and final synthesis become mixed together, which makes failures hard to locate.
Agent patterns make the control flow explicit:
- More reliable: each stage has a defined job.
- Easier to debug: traces show whether the failure happened in planning, tool execution, evaluation, or synthesis.
- Easier to reuse: keep the control flow, swap prompts and tools.
- Easier to evaluate: compare patterns on the same cases with latency, token, completion, and success metrics.
Simple Q&A does not need a pattern. Patterns are for managing multi-stage task complexity.
Local chat page with the pattern dropdown:
Trace view after a ReAct run:
This lab currently implements eight patterns. A useful way to read them is as two families:
| Family | Patterns | What they emphasize |
|---|---|---|
| Tool-execution patterns | ReAct, Plan & Solve, LLM Compiler, REWOO | How the agent decides, schedules, and consumes tool calls. |
| Reasoning-enhancement patterns | Reflection, Reflexion, LATS, Self-Discovery | How the agent improves, evaluates, searches, or structures its own answer. |
| Pattern | Paper | Command | Main loop | Tool use | Best fit | Tradeoff |
|---|---|---|---|---|---|---|
| ReAct | arXiv:2210.03629 | react | Reason -> action -> observation -> repeat | Interleaved | Interactive tasks where each observation changes the next step | Simple and transparent, but can be slower or loop-prone. |
| Plan & Solve | arXiv:2305.04091 | plan-solve | Plan all steps, solve each step, synthesize | Optional during steps | Multi-step tasks with a stable problem shape | Easier to inspect than free-form CoT, but early plans can be wrong. |
| LLM Compiler | arXiv:2312.04511 | llm-compiler | Compile a task DAG, execute ready tool tasks, join | Structured DAG | Independent or partially dependent tool calls | Good for parallelism, but depends on valid structured plans. |
| REWOO | arXiv:2305.18323 | rewoo | Plan evidence slots, execute tools, solve from evidence | Batched after planning | Retrieval/calculation tasks where observations can be separated from reasoning | Reduces repeated context, but the first plan must name useful evidence. |
| Reflection | arXiv:2303.17651 | reflection | Draft -> critique -> revise | Usually none | Writing, answer quality, self-checking, cleanup | Cheap and easy, but critique quality is model-dependent. |
| Reflexion | arXiv:2303.11366 | reflexion | Attempt -> evaluate -> verbal lesson -> retry | Optional | Tasks with a clear feedback signal or repeated attempts | Can improve after failures, but needs useful evaluator feedback. |
| LATS | arXiv:2310.04406 | lats | Select node with UCB, expand, evaluate, backpropagate, synthesize | Optional in full versions; this lab focuses on candidate search | Harder tasks where exploring alternatives helps | Stronger search behavior, but higher latency and token cost. |
| Self-Discovery | arXiv:2402.03620 | self-discovery | Select modules, adapt modules, build structure, fill structure | Usually none | Complex reasoning where choosing the right reasoning scaffold matters | Produces reusable task structure, but is inference-heavy. |
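The LLM Compiler row describes executing "ready" tool tasks from a DAG. The lab's runner and its planner output format are not shown here, so the following is only a standard-library sketch of that scheduling idea, assuming a hypothetical task format of `task_id -> (tool_function, [dependency_ids])`. It uses Python's `graphlib.TopologicalSorter` to release every task whose dependencies have finished and runs each ready batch in parallel on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical task format: task id -> (tool function, ids of tasks it depends on).
# Dependency results are passed to the tool positionally, in the listed order.
Task = tuple[Callable[..., object], list[str]]


def run_dag(tasks: dict[str, Task], max_workers: int = 4) -> dict[str, object]:
    """Run each batch of ready tasks in parallel until the whole DAG is done."""
    sorter = TopologicalSorter({tid: deps for tid, (_, deps) in tasks.items()})
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    results: dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # tasks whose dependencies have all finished
            futures = {
                tid: pool.submit(tasks[tid][0], *(results[d] for d in tasks[tid][1]))
                for tid in ready
            }
            for tid, future in futures.items():
                results[tid] = future.result()
                sorter.done(tid)  # may unlock dependent tasks for the next batch
    return results


if __name__ == "__main__":
    plan = {
        "sum": (lambda: 14 + 6, []),
        "factor": (lambda: 27, []),
        "final": (lambda f, s: f * s - 35, ["factor", "sum"]),
    }
    print(run_dag(plan)["final"])  # 505
```

This waits for a whole batch before releasing the next one, which is simpler than dispatching each dependent task the moment its inputs arrive; the sketch trades some parallelism for readability.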
Install uv if needed:
curl -LsSf https://astral.sh/uv/install.sh | sh
Open a new shell after installation. If uv is still not found, add the install
directory to your current shell:
export PATH="$HOME/.local/bin:$PATH"
Sync dependencies:
uv sync
Configure any OpenAI-compatible model endpoint:
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="your-model-name"
# Optional for non-default OpenAI-compatible endpoints:
export OPENAI_BASE_URL="https://your-compatible-endpoint.example/v1"
The project intentionally uses the OpenAI-compatible API shape only. The code does not maintain vendor-specific routing registries.
List patterns:
uv run python main.py patterns
Run one prompt:
uv run python main.py run \
--pattern react \
--input "What is 18 * 24? Use a tool if useful." \
--show-trace
Run a different pattern:
uv run python main.py run \
--pattern self-discovery \
--input "Find the bug: total = 0; for n in nums: total = n; return total" \
--show-trace
Common pattern names:
react
plan-solve
reflection
reflexion
llm-compiler
rewoo
lats
self-discovery
Start the local chat page:
uv run python main.py chat --pattern react --port 7860
Then open:
http://127.0.0.1:7860
The --pattern option only sets the default selection. The page includes a
Pattern dropdown, so you can switch between patterns without restarting the
server.
Useful manual prompts:
Use the available calculator tool if useful: compute 27 * (14 + 6) - 35.
Revise this vague sentence into one clear README sentence:
Agent thing does stuff with tools and thoughts.
Explore three ways to make the LATS demo more educational, then select the best one.
The chat page streams a visible learning trace while the pattern runs. It shows step-level events such as plans, tool calls, observations, candidate scores, reflections, and final synthesis. This is an observable pattern trace, not hidden chain-of-thought. When the endpoint returns token usage, model-call trace events include token counts.
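The layout lists `schema.py` as holding `TraceEvent` and token aggregation, but its fields are not described here. The sketch below uses a hypothetical `TraceEvent` shape to show the aggregation idea: only model-call events carry token counts, and those counts may be missing when the endpoint reports no usage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEvent:
    # Hypothetical fields for illustration; the lab's schema.TraceEvent may differ.
    kind: str                            # e.g. "plan", "tool_call", "model_call"
    message: str
    input_tokens: Optional[int] = None   # None when the endpoint reports no usage
    output_tokens: Optional[int] = None


def total_tokens(events: list[TraceEvent]) -> int:
    """Sum token counts over model-call events, treating missing usage as zero."""
    return sum(
        (event.input_tokens or 0) + (event.output_tokens or 0)
        for event in events
        if event.kind == "model_call"
    )
```

Treating missing usage as zero keeps the total well defined, at the cost of undercounting on endpoints that never report usage.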
Prompts are externalized in:
prompts/{pattern}/{step}/system.md
prompts/{pattern}/{step}/user.md
Templates use $variable placeholders, for example $task, $tools, and
$scratchpad. This keeps JSON examples easy to edit because {} characters do
not need escaping.
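Python's standard `string.Template` behaves the way the paragraph above describes: `$name` placeholders are substituted and `{}` is plain text. Whether the lab's prompt loader uses `string.Template` is an assumption here, and the template body below is a made-up example rather than one of the repository's prompt files.

```python
from string import Template

# Hypothetical user.md body. The $-placeholders are substituted, while the
# literal JSON braces are left untouched because Template ignores {}.
USER_TEMPLATE = Template(
    "Task: $task\n"
    "Tools: $tools\n"
    'Reply with JSON like {"action": "calculator", "input": "2 + 2"}\n'
    "$scratchpad"
)

prompt = USER_TEMPLATE.substitute(
    task="What is 18 * 24?",
    tools="calculator, current_time, search_knowledge_base",
    scratchpad="",
)
print(prompt)
```

With `str.format`, the same JSON example would need doubled braces (`{{` and `}}`). With `Template`, the only character that needs escaping is a literal `$`, written as `$$`.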
Run with a custom prompt directory:
uv run python main.py chat \
--pattern react \
--prompt-dir prompts
The fixed mini dataset lives at evals/mini.jsonl. It has eight cases split
into two suites:
| Suite | Cases | Intended comparison |
|---|---|---|
| tool_execution | 4 | ReAct, Plan & Solve, LLM Compiler, and REWOO on calculator / local knowledge-base tasks. |
| reasoning_enhancement | 4 | Reflection, Reflexion, LATS, and Self-Discovery on revision, self-checking, tradeoff, and exploration tasks. |
Run the full matrix on a real OpenAI-compatible endpoint:
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="your-model-name"
export OPENAI_BASE_URL="https://your-compatible-endpoint.example/v1"
uv run python main.py eval \
--patterns all \
--concurrency 4 \
--case-timeout 240 \
--request-timeout 120
Run only the tool-execution family:
uv run python main.py eval \
--patterns react,plan-solve,llm-compiler,rewoo \
--concurrency 4 \
--case-timeout 240 \
--request-timeout 120
Run only the reasoning-enhancement family:
uv run python main.py eval \
--patterns reflection,reflexion,lats,self-discovery \
--concurrency 4 \
--case-timeout 240 \
--request-timeout 120
Each run writes:
eval_runs/YYYYMMDD-HHMMSS/
summary.md
summary.csv
raw_results.jsonl
traces/
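The `--concurrency` and `--case-timeout` flags suggest a bounded worker pool with a per-case deadline. The lab's eval code is not shown, so this is only an asyncio sketch of that idea with a hypothetical `run_case` coroutine; the real implementation might use threads instead. `--request-timeout` presumably bounds each individual model request and is not modeled here.

```python
import asyncio
from typing import Awaitable, Callable


async def run_cases(
    cases: list[str],
    run_case: Callable[[str], Awaitable[str]],
    concurrency: int = 4,
    case_timeout: float = 240.0,
) -> dict[str, str]:
    """Run all cases with at most `concurrency` in flight and a per-case time limit."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(case: str) -> tuple[str, str]:
        async with semaphore:  # the timeout clock starts only once a slot is free
            try:
                return case, await asyncio.wait_for(run_case(case), timeout=case_timeout)
            except asyncio.TimeoutError:
                return case, "TIMEOUT"

    # gather preserves input order, so results line up with `cases`.
    return dict(await asyncio.gather(*(guarded(c) for c in cases)))


if __name__ == "__main__":
    async def fake_case(case: str) -> str:
        await asyncio.sleep(0.5 if case == "slow" else 0.01)
        return f"done:{case}"

    print(asyncio.run(run_cases(["a", "slow", "b"], fake_case, concurrency=2, case_timeout=0.1)))
    # {'a': 'done:a', 'slow': 'TIMEOUT', 'b': 'done:b'}
```

Starting the timeout inside the semaphore means time spent waiting in the queue does not count against a case, so raising concurrency does not cause spurious timeouts.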
The tables below cover only the qwen3.6-plus run with a 240s case timeout that is preserved in this
repository:
- Dataset: evals/mini.jsonl
- Patterns: 8
- Model: qwen3.6-plus
- Runs: 64
- Concurrency: 4
- Case timeout: 240s
- Request timeout: 120s
Model-level summary:
| Model | Success | Completed | Runs | Success Rate | Completion Rate | Avg Latency | Avg Tokens | Timeouts |
|---|---|---|---|---|---|---|---|---|
| qwen3.6-plus | 52 | 64 | 64 | 81.25% | 100.00% | 67110 ms | 4407 | 0 |
Pattern-level summary:
| Pattern | Success | Completed | Runs | Success Rate | Avg Latency | Avg Tokens | Tool Calls | Timeouts |
|---|---|---|---|---|---|---|---|---|
| self-discovery | 8 | 8 | 8 | 100.00% | 145836 ms | 9426 | 0 | 0 |
| rewoo | 7 | 8 | 8 | 87.50% | 46090 ms | 2852 | 0 | 0 |
| llm-compiler | 7 | 8 | 8 | 87.50% | 61214 ms | 3701 | 0 | 0 |
| plan-solve | 7 | 8 | 8 | 87.50% | 81307 ms | 5588 | 0 | 0 |
| react | 6 | 8 | 8 | 75.00% | 21755 ms | 1814 | 10 | 0 |
| lats | 6 | 8 | 8 | 75.00% | 67694 ms | 4738 | 0 | 0 |
| reflection | 6 | 8 | 8 | 75.00% | 69842 ms | 4334 | 0 | 0 |
| reflexion | 5 | 8 | 8 | 62.50% | 43146 ms | 2807 | 0 | 0 |
Preserved artifacts:
eval_runs/qwen3.6-plus-concurrent/20260522-182954/summary.md
eval_runs/qwen3.6-plus-concurrent/20260522-182954/summary.csv
eval_runs/qwen3.6-plus-concurrent/20260522-182954/raw_results.jsonl
eval_runs/qwen3.6-plus-concurrent/20260522-182954/traces/
The scorer is intentionally small and deterministic: it checks expected strings and records trace metrics such as success, latency, model events, tool calls, token usage, warnings, and timeouts. Treat the benchmark as a learning probe, not as a general leaderboard.
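A deterministic expected-string scorer can be very small. The sketch below is illustrative only: the function names are hypothetical, and the case-insensitive, all-strings-must-appear rule is an assumption, since the text above only says the scorer "checks expected strings".

```python
def score_case(answer: str, expected: list[str]) -> bool:
    """Hypothetical rule: pass only if every expected string appears (case-insensitive)."""
    lowered = answer.lower()
    return all(s.lower() in lowered for s in expected)


def success_rate(results: list[bool]) -> float:
    """Percentage of passing runs, as reported in the summary tables."""
    return 100.0 * sum(results) / len(results) if results else 0.0
```

For example, 52 successes out of 64 runs gives `success_rate([True] * 52 + [False] * 12) == 81.25`, matching the model-level summary. Substring checks like this can pass an answer that mentions the right number for the wrong reason, which is one reason the benchmark is best read as a learning probe.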
agent_patterns_lab/
  cli.py # Typer commands
  ui.py # Gradio chat page
  llm.py # LangChain ChatOpenAI factory
  tools.py # LangChain tool registry and demo tools
  schema.py # RunResult, TraceEvent, token aggregation
  patterns/ # One LangGraph runner per pattern
docs/
  assets/ # README screenshots
  agent-patterns-lab-tech-share.zh-CN.md
tests/ # Fake-model tests for every pattern
prompts/ # Editable system/user prompts for every pattern step
evals/mini.jsonl # Fixed mini E2E dataset
eval_runs/ # Preserved benchmark artifacts
main.py # Local command entry point
pyproject.toml # uv-managed dependencies
Each pattern is implemented as a small LangGraph StateGraph. The goal is to
make the control flow visible rather than to hide it behind a larger agent
framework.
The LATS implementation uses a real search tree: each iteration selects a leaf with UCB, expands candidate child nodes, evaluates them, backpropagates scores, and synthesizes the final answer from the best path.
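The selection and backpropagation steps of that loop can be sketched with the standard UCB1 score: average value plus an exploration bonus that shrinks as a node is visited more. This is not the lab's runner. The `Node` class is hypothetical, the exploration constant `c = 1.41` (about √2) is an arbitrary choice, and the LLM-driven expand and evaluate steps are left out.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    answer: str
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # sum of evaluation scores backpropagated through this node

    def ucb(self, c: float = 1.41) -> float:
        if self.visits == 0:
            return math.inf  # always try an unvisited node first
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def select_leaf(root: Node) -> Node:
    """Descend from the root, always following the child with the highest UCB."""
    node = root
    while node.children:
        node = max(node.children, key=Node.ucb)
    return node


def backpropagate(node: Optional[Node], score: float) -> None:
    """Add an evaluation score to a node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.value += score
        node = node.parent
```

A parent's visit count is never smaller than its child's, because backpropagation always updates both. That keeps the `log(parent.visits)` term non-negative.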
Self-Discovery follows a four-step flow: select relevant reasoning modules, adapt them to the task, operationalize them into a JSON reasoning structure, then fill that structure to answer the concrete task.
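Those four steps form a linear chain of model calls. The sketch below shows that chain, assuming a hypothetical `llm` callable that maps a prompt string to a reply string. The prompt wording is invented for illustration and does not come from the lab's `prompts/self-discovery` files.

```python
import json
from typing import Callable

# Hypothetical stand-in for a chat model: prompt in, reply text out.
LLM = Callable[[str], str]


def self_discover(llm: LLM, task: str, modules: list[str]) -> str:
    # 1. Select: pick reasoning modules relevant to this task.
    selected = llm(f"Select reasoning modules useful for this task.\nTask: {task}\nModules: {modules}")
    # 2. Adapt: rephrase the selected modules for this specific task.
    adapted = llm(f"Adapt the selected modules to the task.\nTask: {task}\nSelected: {selected}")
    # 3. Operationalize: turn the adapted modules into a JSON reasoning structure.
    structure = json.loads(llm(f"Write a JSON reasoning structure.\nTask: {task}\nAdapted: {adapted}"))
    # 4. Solve: fill in the structure to answer the concrete task.
    return llm(f"Fill in this structure to solve the task.\nTask: {task}\nStructure: {json.dumps(structure)}")
```

Parsing the structure with `json.loads` makes a malformed step-3 reply fail immediately instead of silently reaching the final step.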
The default tools are deliberately small:
calculator
current_time
search_knowledge_base
Add real tools in agent_patterns_lab/tools.py or pass a custom ToolRegistry
when creating a runner.
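A new tool is usually a small, well-scoped function. The lab's `calculator` implementation and the `ToolRegistry` API are not described here, so the following is a hypothetical standard-library calculator that evaluates arithmetic by walking Python's `ast` instead of calling `eval()`. To expose a function like this to LangChain, you could wrap it with the `@tool` decorator from `langchain_core.tools`. How the result would then be registered with the lab is not covered here.

```python
import ast
import operator

# Operators this hypothetical calculator accepts; everything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression such as '27 * (14 + 6) - 35'."""

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return visit(ast.parse(expression, mode="eval"))
```

Exponentiation is deliberately left out so that a model cannot request an enormous power and stall the process.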
uv run pytest
The tests use a scripted fake chat model, so they do not require an API key.
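A scripted fake model returns canned replies in a fixed order, so a pattern's control flow can be tested without a network call. The class below is a hypothetical, dependency-free illustration of the idea, not the lab's test helper. LangChain also ships fake chat models such as `FakeListChatModel` in `langchain_core.language_models.fake_chat_models`, and the lab's tests may be built on something like that instead.

```python
from collections.abc import Iterable


class ScriptedChatModel:
    """Hypothetical test double: returns canned replies in order and records prompts."""

    def __init__(self, replies: Iterable[str]) -> None:
        self._replies = iter(replies)
        self.prompts: list[str] = []

    def invoke(self, prompt: str) -> str:
        self.prompts.append(prompt)
        try:
            return next(self._replies)
        except StopIteration:
            raise AssertionError("the pattern made more model calls than scripted") from None


def test_reflection_makes_three_calls() -> None:
    model = ScriptedChatModel(["draft", "critique: too vague", "revised answer"])
    # Stand-in for a pattern runner: draft -> critique -> revise.
    draft = model.invoke("Write a draft.")
    critique = model.invoke(f"Critique this draft: {draft}")
    final = model.invoke(f"Revise '{draft}' using: {critique}")
    assert final == "revised answer"
    assert len(model.prompts) == 3
```

Failing loudly on an unexpected extra call turns an infinite loop in a pattern into a clear test failure.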

