Contract testing for multi-agent handoffs.
Catch schema drift, broken handoffs, and tool-output regressions in your test suite — not in production. Pytest-native, zero-dependency core, with thin adapters for OpenAI, Anthropic, LangChain, LangGraph, and CrewAI.
Vendor Onboarding example — a multi-agent approval workflow where an intake agent extracts a structured vendor packet and hands it to downstream security, finance, and approval agents. Declare the packet shape once as a contract:
VENDOR_PACKET_SCHEMA = {
"vendor_name": str,
"data_access": {
"contains_customer_pii": bool,
"data_categories": [str],
"storage_region": str,
"retention_days": int,
},
"compliance": {
"soc2_available": bool,
"dpa_required": bool,
"subprocessors": [str],
},
}
security.assert_handoff_matches(schema=VENDOR_PACKET_SCHEMA)
When the intake agent's tool drifts (say contains_customer_pii is renamed to handles_personal_data), the contract fails at the very next boundary, before the security review proceeds on incomplete data:
FAILED test_vendor_onboarding_security_review
AssertionError: handoff field 'data_access.contains_customer_pii': missing from data
AGENT STACK TRACE — security
─────────────────────────────────────────────────────────────
parent: intake (a1f2…)
handoff_context = {
"vendor_name": "ClearVoice AI",
"data_access": {
"handles_personal_data": true, ← drift
"data_categories": ["call_audio", "transcripts"],
"storage_region": "us-east-1",
"retention_days": 365
},
...
}
Turn 0 assess_security_risk(vendor_name="ClearVoice AI")
↳ {"risk": "blocked: missing PII flag"}
─────────────────────────────────────────────────────────────
Read the full walkthrough on the docs site: reagent-ai.mintlify.app/examples/vendor-onboarding
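To make the failure above concrete, here is a minimal sketch of how a nested dict-of-types schema like VENDOR_PACKET_SCHEMA could be checked against a handoff payload. It is an illustration only, not reagent-flow's implementation: `find_schema_violations` is a hypothetical helper, and the convention that a one-element list such as `[str]` means "list of str" is assumed from the schema shown earlier.

```python
from typing import Any


def find_schema_violations(data: Any, schema: Any, path: str = "") -> list[str]:
    """Hypothetical helper: one message per missing or mistyped field."""
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"{path or '<root>'}: expected dict, got {type(data).__name__}"]
        problems: list[str] = []
        for key, sub_schema in schema.items():
            child = f"{path}.{key}" if path else key
            if key not in data:
                problems.append(f"{child}: missing from data")
            else:
                problems.extend(find_schema_violations(data[key], sub_schema, child))
        return problems  # extra keys in data are deliberately ignored
    if isinstance(schema, list):
        # Assumed convention: [str] means "a list whose items are all str".
        if not isinstance(data, list):
            return [f"{path}: expected list, got {type(data).__name__}"]
        problems = []
        for i, item in enumerate(data):
            problems.extend(find_schema_violations(item, schema[0], f"{path}[{i}]"))
        return problems
    # Leaf: schema is a plain type such as str, int, or bool.
    if schema is int and isinstance(data, bool):
        return [f"{path}: expected int, got bool"]  # bool subclasses int in Python
    if not isinstance(data, schema):
        return [f"{path}: expected {schema.__name__}, got {type(data).__name__}"]
    return []


SCHEMA = {
    "vendor_name": str,
    "data_access": {"contains_customer_pii": bool, "data_categories": [str]},
}
drifted = {
    "vendor_name": "ClearVoice AI",
    "data_access": {"handles_personal_data": True, "data_categories": ["call_audio"]},
}
violations = find_schema_violations(drifted, SCHEMA)
print(violations)
```

Reporting the dotted path to each missing field keeps the failure message pointed at the exact drifted key, even several levels deep.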
| Adjacent tool / approach | What it does | Where reagent-flow is different |
|---|---|---|
| Pydantic AI / structured outputs | Validates a single LLM call's output shape. | Validates the data passed between agents, across multiple sessions. |
| Guardrails / runtime guards | Blocks bad output at runtime, in production. | Catches it in your test suite, before the PR merges. |
| LangSmith / Langfuse / observability | Records traces for post-hoc inspection. | Records and asserts — your CI fails on drift. |
| LLM evals | Scores model output quality on a dataset. | Asserts deterministic structural contracts on every test run. |
| pytest-mock for agents | Mocks tool calls so tests don't hit live LLMs. | Captures real or mock traces and asserts on their shape. |
Use reagent-flow when you have:
- Multi-agent or multi-step pipelines passing structured data between sessions
- A pytest suite where you want CI to fail on handoff drift before merge
- Tool outputs whose shape your downstream agents silently depend on
Reach for something else when:
- You only need to validate a single LLM call's output → use Pydantic directly
- You need to block bad output at runtime in production → use a guardrails library
- You need accuracy or quality scoring on a dataset → use an evals framework
The core library plus five framework adapters, each a separate installable package:
| Package | Version | Purpose | Docs |
|---|---|---|---|
| reagent-flow | 0.5.0 | Core: sessions, traces, assertions, golden baselines | Concepts |
| reagent-flow-openai | 0.2.0 | OpenAI Python SDK adapter | OpenAI |
| reagent-flow-anthropic | 0.2.0 | Anthropic Python SDK adapter | Anthropic |
| reagent-flow-langchain | 0.2.0 | LangChain callback handler | LangChain |
| reagent-flow-langgraph | 0.2.0 | LangGraph callback (extends LangChain) | LangGraph |
| reagent-flow-crewai | 0.2.0 | CrewAI tool wrapper | CrewAI |
Runnable examples under examples/:
- langgraph_demo/: three-agent LangGraph pipeline (Gatherer → Assessor → Decider) that runs end-to-end and demonstrates a broken handoff being caught at the assessor boundary.
- manual_logging/: minimal refund flow using explicit log_llm_call / log_tool_result, no framework adapter required.
uv add reagent-flow # core, zero runtime deps
uv add reagent-flow-openai # +OpenAI
uv add reagent-flow-anthropic # +Anthropic
uv add reagent-flow-langchain # +LangChain
uv add reagent-flow-langgraph # +LangGraph
uv add reagent-flow-crewai # +CrewAI
Python 3.10+. Each adapter depends only on its respective framework.
Next: write your first contract test in 5 minutes → reagent-ai.mintlify.app/quickstart
Current release: reagent-flow 0.5.0, adapters 0.2.0. Stability: alpha.
Stable today (full reference on the docs site):
- Handoff contracts, tool-output contracts, context preservation
- Flow, count, and ordering assertions
- Nested schemas: typed lists, list-of-dicts, optional Pydantic BaseModel support
- Golden-baseline diffs with ignore_fields
- Token and cost guards with per-model pricing
- Agent Stack Traces attached to every failed assertion
- Five framework adapters with automatic tool-result capture
- pytest plugin: fixtures, marker, CLI flags
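As a rough illustration of the golden-baseline idea in the list above, the sketch below compares a stored trace against a fresh one and skips volatile fields. It is not the library's API: `diff_against_golden` is a hypothetical function, the trace dicts are made up, and matching `ignore_fields` by key name at any depth is an assumption about how such an option might behave.

```python
from typing import Any


def diff_against_golden(
    golden: Any,
    current: Any,
    ignore_fields: frozenset[str] = frozenset(),
    path: str = "",
) -> list[str]:
    """Hypothetical helper: list structural differences between two traces."""
    if isinstance(golden, dict) and isinstance(current, dict):
        changes: list[str] = []
        for key in sorted(set(golden) | set(current)):
            if key in ignore_fields:
                continue  # assumed: ignored by key name, at any nesting depth
            child = f"{path}.{key}" if path else key
            if key not in current:
                changes.append(f"{child}: removed")
            elif key not in golden:
                changes.append(f"{child}: added")
            else:
                changes.extend(
                    diff_against_golden(golden[key], current[key], ignore_fields, child)
                )
        return changes
    if isinstance(golden, list) and isinstance(current, list):
        changes = []
        if len(golden) != len(current):
            changes.append(f"{path}: length {len(golden)} -> {len(current)}")
        for i, (g, c) in enumerate(zip(golden, current)):
            changes.extend(diff_against_golden(g, c, ignore_fields, f"{path}[{i}]"))
        return changes
    if golden != current:
        return [f"{path}: {golden!r} -> {current!r}"]
    return []


# Made-up traces: the argument name drifted and latency changed between runs.
golden = {"tool": "assess_security_risk", "args": {"vendor_name": "ClearVoice AI"}, "latency_ms": 120}
current = {"tool": "assess_security_risk", "args": {"vendor": "ClearVoice AI"}, "latency_ms": 340}
changes = diff_against_golden(golden, current, ignore_fields=frozenset({"latency_ms"}))
print(changes)
```

Ignoring fields such as latency or timestamps keeps the diff focused on structural drift, so a baseline test fails only when the shape of the trace changes.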
Planned next:
- Built-in trace redaction framework (for traces that may carry PII or secrets)
- Additional adapters as the community requests them
Versioning: while on 0.x, minor versions may include breaking changes. 1.0 will lock the public assertion API. See CHANGELOG.md for what shipped when.
Questions, ideas, war stories about multi-agent handoffs going wrong — all welcome.
- GitHub Discussions — Q&A, design conversations, show-and-tell
- GitHub Issues — bug reports and feature requests
- CONTRIBUTING.md: dev setup, conventions, the 90% coverage gate
- CODE_OF_CONDUCT.md: Contributor Covenant 2.0
Looking to contribute? Start with the good first issue label.
Requires uv.
git clone https://github.com/re-agent-ai/reagent-flow.git
cd reagent-flow && uv sync
uv run pytest packages/ -v
uv run ruff check packages/ examples/ && uv run ruff format --check packages/ examples/
uv run mypy packages/reagent-flow/src/reagent_flow/ --strict
For architecture notes and contribution guidelines, see ARCHITECTURE.md and CONTRIBUTING.md.
Traces are plain JSON containing the full tool-call arguments and results from your agent runs — this may include sensitive data: API keys, user PII, database contents, anything your tools touch. Review before committing and add .reagent/ to .gitignore unless you're confident the contents are synthetic. A built-in redaction framework is on the roadmap.
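Until built-in redaction ships, a small pre-commit pass over trace JSON is one way to reduce the risk. The sketch below is a heuristic under stated assumptions, not part of reagent-flow: `redact`, the key-name pattern, the email pattern, and the example trace layout are all hypothetical, and a real trace file may be structured differently.

```python
import json
import re
from typing import Any

# Heuristic patterns (assumptions); tune them to what your tools actually emit.
SENSITIVE_KEY = re.compile(r"(api[_-]?key|token|secret|password|authorization)", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(value: Any, key: str = "") -> Any:
    """Hypothetical helper: return a copy with secret-like strings masked."""
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        # List items inherit the key of the list that contains them.
        return [redact(item, key) for item in value]
    if isinstance(value, str):
        if SENSITIVE_KEY.search(key):
            return "[REDACTED]"
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    return value  # numbers, booleans, and None pass through unchanged


# Made-up trace fragment for illustration only.
trace = {
    "tool": "lookup_vendor",
    "args": {"api_key": "sk-live-123", "contact": "ops@example.com"},
    "result": {"notes": ["escalate to jane.doe@example.com"], "retention_days": 365},
}
clean = redact(trace)
print(json.dumps(clean, indent=2))
```

Pattern-based masking will miss anything it has no pattern for, so it complements manual review and the .gitignore entry rather than replacing them.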
For vulnerability disclosure see SECURITY.md.