Agent Patterns Lab

Agent Patterns Lab is a local learning project for comparing classic agent patterns side by side. It uses Python, LangChain model abstractions, LangGraph execution graphs, externalized prompts, and a small Gradio chat page.

Repository: github.com/Sud0x67/agent-patterns-lab

Why Agent Patterns

Complex agents are hard to keep reliable with one long prompt. Planning, tool use, observations, critique, retries, and final synthesis become mixed together, which makes failures hard to locate.

Agent patterns make the control flow explicit:

  • More reliable: each stage has a defined job.
  • Easier to debug: traces show whether the failure happened in planning, tool execution, evaluation, or synthesis.
  • Easier to reuse: keep the control flow, swap prompts and tools.
  • Easier to evaluate: compare patterns on the same cases with latency, token, completion, and success metrics.

Simple Q&A does not need a pattern. Patterns are for managing multi-stage task complexity.

Screenshots

Local chat page with the pattern dropdown:

Agent Patterns Lab chat page

Trace view after a ReAct run:

Agent Patterns Lab trace view

Pattern Taxonomy

This lab currently implements eight patterns. A useful way to read them is as two families:

| Family | Patterns | What they emphasize |
| --- | --- | --- |
| Tool-execution patterns | ReAct, Plan & Solve, LLM Compiler, REWOO | How the agent decides, schedules, and consumes tool calls. |
| Reasoning-enhancement patterns | Reflection, Reflexion, LATS, Self-Discovery | How the agent improves, evaluates, searches, or structures its own answer. |

Pattern Comparison

| Pattern | Paper | Command | Main loop | Tool use | Best fit | Tradeoff |
| --- | --- | --- | --- | --- | --- | --- |
| ReAct | arXiv:2210.03629 | react | Reason -> action -> observation -> repeat | Interleaved | Interactive tasks where each observation changes the next step | Simple and transparent, but can be slower or loop-prone. |
| Plan & Solve | arXiv:2305.04091 | plan-solve | Plan all steps, solve each step, synthesize | Optional during steps | Multi-step tasks with a stable problem shape | Easier to inspect than free-form CoT, but early plans can be wrong. |
| LLM Compiler | arXiv:2312.04511 | llm-compiler | Compile a task DAG, execute ready tool tasks, join | Structured DAG | Independent or partially dependent tool calls | Good for parallelism, but depends on valid structured plans. |
| REWOO | arXiv:2305.18323 | rewoo | Plan evidence slots, execute tools, solve from evidence | Batched after planning | Retrieval/calculation tasks where observations can be separated from reasoning | Reduces repeated context, but the first plan must name useful evidence. |
| Reflection | arXiv:2303.17651 | reflection | Draft -> critique -> revise | Usually none | Writing, answer quality, self-checking, cleanup | Cheap and easy, but critique quality is model-dependent. |
| Reflexion | arXiv:2303.11366 | reflexion | Attempt -> evaluate -> verbal lesson -> retry | Optional | Tasks with a clear feedback signal or repeated attempts | Can improve after failures, but needs useful evaluator feedback. |
| LATS | arXiv:2310.04406 | lats | Select node with UCB, expand, evaluate, backpropagate, synthesize | Optional in full versions; this lab focuses on candidate search | Harder tasks where exploring alternatives helps | Stronger search behavior, but higher latency and token cost. |
| Self-Discovery | arXiv:2402.03620 | self-discovery | Select modules, adapt modules, build structure, fill structure | Usually none | Complex reasoning where choosing the right reasoning scaffold matters | Produces reusable task structure, but is inference-heavy. |
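
The "execute ready tool tasks" loop in the LLM Compiler row can be pictured with a minimal sketch. It uses the standard-library graphlib module rather than the lab's LangGraph runner, and the task names and lambda "tools" are hypothetical stand-ins. Tasks with no unmet dependencies are released together as a wave, which is where parallelism would come from in a real scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical task DAG: each task maps to the set of tasks it depends on.
dag = {
    "area": set(),
    "perimeter": set(),
    "ratio": {"area", "perimeter"},
}

# Hypothetical tools keyed by task name; each one receives the results so far.
tools = {
    "area": lambda results: 6 * 4,
    "perimeter": lambda results: 2 * (6 + 4),
    "ratio": lambda results: results["area"] / results["perimeter"],
}


def run_dag(dag, tools):
    """Run tasks wave by wave; a wave holds every task whose dependencies are done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    results, waves = {}, []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the order stable
        waves.append(ready)
        for name in ready:  # a real scheduler could run these concurrently
            results[name] = tools[name](results)
            sorter.done(name)
    return results, waves


results, waves = run_dag(dag, tools)
print(waves)  # [['area', 'perimeter'], ['ratio']]
```

The two independent tasks land in the first wave and the join task runs only after both finish.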

Setup

Install uv if needed:

curl -LsSf https://astral.sh/uv/install.sh | sh

Open a new shell after installation. If uv is still not found, add the install directory to your current shell:

export PATH="$HOME/.local/bin:$PATH"

Sync dependencies:

uv sync

Configure any OpenAI-compatible model endpoint:

export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="your-model-name"
# Optional for non-default OpenAI-compatible endpoints:
export OPENAI_BASE_URL="https://your-compatible-endpoint.example/v1"

The project intentionally uses the OpenAI-compatible API shape only. The code does not maintain vendor-specific routing registries.
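
As a rough illustration of what "OpenAI-compatible shape only" can mean in code, the sketch below gathers the three variables into one keyword dictionary. The function name load_endpoint_config is hypothetical, and this README does not show how agent_patterns_lab/llm.py actually reads its settings.

```python
import os


def load_endpoint_config(env=None):
    """Collect OpenAI-compatible settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    model = env.get("OPENAI_MODEL")
    if not api_key or not model:
        raise RuntimeError("Set OPENAI_API_KEY and OPENAI_MODEL first.")
    config = {"api_key": api_key, "model": model}
    # The base URL is optional; omit it to use the client's default endpoint.
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        config["base_url"] = base_url
    return config


example = load_endpoint_config(
    {"OPENAI_API_KEY": "your-api-key", "OPENAI_MODEL": "your-model-name"}
)
print(example)
```

A dictionary like this could be passed as keyword arguments to a LangChain ChatOpenAI constructor, which accepts model, api_key, and base_url. No vendor-specific branching is needed because only the endpoint URL changes.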

Command-Line Usage

List patterns:

uv run python main.py patterns

Run one prompt:

uv run python main.py run \
  --pattern react \
  --input "What is 18 * 24? Use a tool if useful." \
  --show-trace

Run a different pattern:

uv run python main.py run \
  --pattern self-discovery \
  --input "Find the bug: total = 0; for n in nums: total = n; return total" \
  --show-trace

Common pattern names:

react
plan-solve
reflection
reflexion
llm-compiler
rewoo
lats
self-discovery

Manual Page Testing

Start the local chat page:

uv run python main.py chat --pattern react --port 7860

Then open:

http://127.0.0.1:7860

The --pattern option only sets the default selection. The page includes a Pattern dropdown, so you can switch between patterns without restarting the server.

Useful manual prompts:

  • Use the available calculator tool if useful: compute 27 * (14 + 6) - 35.
  • Revise this vague sentence into one clear README sentence: Agent thing does stuff with tools and thoughts.
  • Explore three ways to make the LATS demo more educational, then select the best one.

The chat page streams a visible learning trace while the pattern runs. It shows step-level events such as plans, tool calls, observations, candidate scores, reflections, and final synthesis. This is an observable pattern trace, not hidden chain-of-thought. When the endpoint returns token usage, model-call trace events include token counts.

Prompt Editing

Prompts are externalized in:

prompts/{pattern}/{step}/system.md
prompts/{pattern}/{step}/user.md

Templates use $variable placeholders, for example $task, $tools, and $scratchpad. This keeps JSON examples easy to edit because {} characters do not need escaping.
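
The $variable syntax matches Python's standard string.Template. Whether the lab's prompt loader uses that class is an assumption, and the prompt text below is a made-up example, but the sketch shows why literal JSON braces need no escaping with this style of placeholder.

```python
from string import Template

# Hypothetical user prompt for one step; the JSON example keeps its literal braces.
user_template = Template(
    "Task: $task\n"
    "Tools: $tools\n"
    "Scratchpad: $scratchpad\n"
    'Reply with JSON like {"action": "calculator", "input": "2 + 2"}'
)

prompt = user_template.substitute(
    task="What is 18 * 24?",
    tools="calculator, current_time",
    scratchpad="(empty)",
)
print(prompt)
```

With string.Template, substitute raises KeyError when a placeholder is missing, and a literal dollar sign in a prompt is written as $$.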

Run with a custom prompt directory:

uv run python main.py chat \
  --pattern react \
  --prompt-dir prompts

Real-Model Evaluation

The fixed mini dataset lives at evals/mini.jsonl. It has eight cases split into two suites:

| Suite | Cases | Intended comparison |
| --- | --- | --- |
| tool_execution | 4 | ReAct, Plan & Solve, LLM Compiler, and REWOO on calculator / local knowledge-base tasks. |
| reasoning_enhancement | 4 | Reflection, Reflexion, LATS, and Self-Discovery on revision, self-checking, tradeoff, and exploration tasks. |

Run the full matrix on a real OpenAI-compatible endpoint:

export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="your-model-name"
export OPENAI_BASE_URL="https://your-compatible-endpoint.example/v1"

uv run python main.py eval \
  --patterns all \
  --concurrency 4 \
  --case-timeout 240 \
  --request-timeout 120

Run only the tool-execution family:

uv run python main.py eval \
  --patterns react,plan-solve,llm-compiler,rewoo \
  --concurrency 4 \
  --case-timeout 240 \
  --request-timeout 120

Run only the reasoning-enhancement family:

uv run python main.py eval \
  --patterns reflection,reflexion,lats,self-discovery \
  --concurrency 4 \
  --case-timeout 240 \
  --request-timeout 120

Each run writes:

eval_runs/YYYYMMDD-HHMMSS/
  summary.md
  summary.csv
  raw_results.jsonl
  traces/

240s Evaluation Results

The tables below cover only the preserved 240s qwen3.6-plus run in this repository:

  • Dataset: evals/mini.jsonl
  • Patterns: 8
  • Model: qwen3.6-plus
  • Runs: 64
  • Concurrency: 4
  • Case timeout: 240s
  • Request timeout: 120s

Model-level summary:

| Model | Success | Completed | Runs | Success Rate | Completion Rate | Avg Latency | Avg Tokens | Timeouts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3.6-plus | 52 | 64 | 64 | 81.25% | 100.00% | 67110 ms | 4407 | 0 |

Pattern-level summary:

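| Pattern | Success | Completed | Runs | Success Rate | Avg Latency | Avg Tokens | Tool Calls | Timeouts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| self-discovery | 8 | 8 | 8 | 100.00% | 145836 ms | 9426 | 0 | 0 |
| rewoo | 7 | 8 | 8 | 87.50% | 46090 ms | 2852 | 0 | 0 |
| llm-compiler | 7 | 8 | 8 | 87.50% | 61214 ms | 3701 | 0 | 0 |
| plan-solve | 7 | 8 | 8 | 87.50% | 81307 ms | 5588 | 0 | 0 |
| react | 6 | 8 | 8 | 75.00% | 21755 ms | 1814 | 10 | 0 |
| lats | 6 | 8 | 8 | 75.00% | 67694 ms | 4738 | 0 | 0 |
| reflection | 6 | 8 | 8 | 75.00% | 69842 ms | 4334 | 0 | 0 |
| reflexion | 5 | 8 | 8 | 62.50% | 43146 ms | 2807 | 0 | 0 |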
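
The model-level row can be cross-checked against the per-pattern rows with a few lines of arithmetic. This is only a consistency check on the numbers printed in the table above; it does not read the preserved artifacts.

```python
# Per-pattern rows copied from the table above:
# (successes, runs, average latency in ms, average tokens).
rows = {
    "self-discovery": (8, 8, 145836, 9426),
    "rewoo": (7, 8, 46090, 2852),
    "llm-compiler": (7, 8, 61214, 3701),
    "plan-solve": (7, 8, 81307, 5588),
    "react": (6, 8, 21755, 1814),
    "lats": (6, 8, 67694, 4738),
    "reflection": (6, 8, 69842, 4334),
    "reflexion": (5, 8, 43146, 2807),
}

total_success = sum(r[0] for r in rows.values())
total_runs = sum(r[1] for r in rows.values())
success_rate = 100 * total_success / total_runs

# Every pattern has the same number of runs, so a plain mean of the
# per-pattern averages equals the run-weighted mean.
mean_latency = sum(r[2] for r in rows.values()) / len(rows)
mean_tokens = sum(r[3] for r in rows.values()) / len(rows)

print(total_success, total_runs, f"{success_rate:.2f}%")  # 52 64 81.25%
print(mean_latency, mean_tokens)  # 67110.5 4407.5
```

The recomputed means agree with the reported 67110 ms and 4407 tokens to within rounding of the per-pattern averages.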

Preserved artifacts:

  • eval_runs/qwen3.6-plus-concurrent/20260522-182954/summary.md
  • eval_runs/qwen3.6-plus-concurrent/20260522-182954/summary.csv
  • eval_runs/qwen3.6-plus-concurrent/20260522-182954/raw_results.jsonl
  • eval_runs/qwen3.6-plus-concurrent/20260522-182954/traces/

The scorer is intentionally small and deterministic: it checks expected strings and records trace metrics such as success, latency, model events, tool calls, token usage, warnings, and timeouts. Treat the benchmark as a learning probe, not as a general leaderboard.
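
A deterministic expected-string check can be very small. The sketch below is one plausible shape for it, not the lab's scorer. The case fields, the case-insensitive comparison, and the rule that every expected string must appear are all assumptions.

```python
def score_output(output, expected_strings):
    """Return True when every expected string appears in the output.

    Case-insensitive matching is an assumption made for this sketch.
    """
    text = output.lower()
    return all(expected.lower() in text for expected in expected_strings)


# Hypothetical case record; the real evals/mini.jsonl schema is not shown here.
case = {"input": "What is 18 * 24?", "expected": ["432"]}

print(score_output("18 * 24 = 432", case["expected"]))          # True
print(score_output("The answer is about 430.", case["expected"]))  # False
```

A check like this is cheap and repeatable, but it cannot judge answer quality, which is one reason to read the results as a learning probe.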

Project Layout

agent_patterns_lab/
  cli.py                 # Typer commands
  ui.py                  # Gradio chat page
  llm.py                 # LangChain ChatOpenAI factory
  tools.py               # LangChain tool registry and demo tools
  schema.py              # RunResult, TraceEvent, token aggregation
  patterns/              # One LangGraph runner per pattern
docs/
  assets/                # README screenshots
  agent-patterns-lab-tech-share.zh-CN.md
tests/                   # Fake-model tests for every pattern
prompts/                 # Editable system/user prompts for every pattern step
evals/mini.jsonl         # Fixed mini E2E dataset
eval_runs/               # Preserved benchmark artifacts
main.py                  # Local command entry point
pyproject.toml           # uv-managed dependencies

Implementation Notes

Each pattern is implemented as a small LangGraph StateGraph. The goal is to make the control flow visible rather than to hide it behind a larger agent framework.
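
To show what "visible control flow" looks like without pulling in LangGraph, here is a plain-Python sketch of the Reflection loop (draft -> critique -> revise) as three named stages over a shared state dictionary. It is not the lab's runner: ScriptedModel, the stage functions, and the state keys are hypothetical, and the scripted replies stand in for real model calls.

```python
class ScriptedModel:
    """Returns pre-written replies in order, standing in for a chat model."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.calls = []

    def invoke(self, prompt):
        self.calls.append(prompt)
        return self._replies.pop(0)


def draft(state, model):
    state["draft"] = model.invoke(f"Draft an answer: {state['task']}")
    return state


def critique(state, model):
    state["critique"] = model.invoke(f"Critique this draft: {state['draft']}")
    return state


def revise(state, model):
    state["final"] = model.invoke(
        f"Revise the draft '{state['draft']}' using: {state['critique']}"
    )
    return state


# The control flow is data: a fixed, inspectable list of named stages.
STAGES = [("draft", draft), ("critique", critique), ("revise", revise)]


def run_reflection(task, model):
    state = {"task": task, "trace": []}
    for name, stage in STAGES:
        state = stage(state, model)
        state["trace"].append(name)  # record which stage just ran
    return state


model = ScriptedModel(["Agents use tools.", "Too vague.", "Agents call tools to act."])
state = run_reflection("Explain agents in one sentence.", model)
print(state["trace"], state["final"])
```

Because each stage has a name and writes to its own key, a failure can be traced to the stage that produced it.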

The LATS implementation uses a real search tree: each iteration selects a leaf with UCB, expands candidate child nodes, evaluates them, backpropagates scores, and synthesizes the final answer from the best path.
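
The selection and backpropagation steps can be sketched with the textbook UCB1 score. The exploration constant, node fields, and function names are assumptions; this README does not state the lab's exact formula. Expansion and evaluation, which would call the model, are left out.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    label: str
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0

    def add_child(self, label):
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def ucb(node, c=1.4):
    """UCB1: mean score plus an exploration bonus; unvisited nodes go first."""
    if node.visits == 0:
        return math.inf
    mean = node.total_score / node.visits
    return mean + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def select_leaf(root):
    """Walk down from the root, always taking the child with the highest UCB."""
    node = root
    while node.children:
        node = max(node.children, key=ucb)
    return node


def backpropagate(node, score):
    """Add the evaluation score to the node and every ancestor."""
    while node is not None:
        node.visits += 1
        node.total_score += score
        node = node.parent


root = Node("root")
a, b = root.add_child("a"), root.add_child("b")
backpropagate(a, 0.9)  # pretend the evaluator scored candidate a at 0.9
backpropagate(b, 0.2)  # and candidate b at 0.2
first_pick = select_leaf(root)   # the higher-scoring visited leaf
c_node = a.add_child("c")
second_pick = select_leaf(root)  # the new unvisited child under that leaf
print(first_pick.label, second_pick.label)
```

The exploration bonus shrinks as a node is visited more often, so weaker branches still get revisited occasionally instead of being abandoned after one low score.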

Self-Discovery follows a four-step flow: select relevant reasoning modules, adapt them to the task, operationalize them into a JSON reasoning structure, then fill that structure to answer the concrete task.

The default tools are deliberately small:

  • calculator
  • current_time
  • search_knowledge_base

Add real tools in agent_patterns_lab/tools.py or pass a custom ToolRegistry when creating a runner.
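
As an illustration of how small a demo tool can be, the sketch below evaluates basic arithmetic by walking the parsed expression instead of calling eval(). It is not the lab's calculator, and the plain dictionary is a hypothetical stand-in for ToolRegistry, whose interface this README does not show.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculator(expression):
    """Evaluate +, -, *, / arithmetic with parentheses, without using eval()."""

    def walk(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, and other operators are rejected.
        raise ValueError(f"Unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical registry: a plain dict from tool name to callable.
TOOLS = {"calculator": calculator}

print(TOOLS["calculator"]("27 * (14 + 6) - 35"))  # 505
print(TOOLS["calculator"]("18 * 24"))             # 432
```

Rejecting anything outside a small whitelist of node types keeps model-written tool input from executing arbitrary code.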

Test

uv run pytest

The tests use a scripted fake chat model, so they do not require an API key.
