reagent-flow

Contract testing for multi-agent handoffs.

Catch schema drift, broken handoffs, and tool-output regressions in your test suite — not in production. Pytest-native, zero-dependency core, with thin adapters for OpenAI, Anthropic, LangChain, LangGraph, and CrewAI.

Documentation · Quickstart · Vendor Onboarding showcase · Discussions · Changelog

What it looks like

Vendor Onboarding example — a multi-agent approval workflow where an intake agent extracts a structured vendor packet and hands it to downstream security, finance, and approval agents. Declare the packet shape once as a contract:

VENDOR_PACKET_SCHEMA = {
    "vendor_name": str,
    "data_access": {
        "contains_customer_pii": bool,
        "data_categories": [str],
        "storage_region": str,
        "retention_days": int,
    },
    "compliance": {
        "soc2_available": bool,
        "dpa_required": bool,
        "subprocessors": [str],
    },
}

security.assert_handoff_matches(schema=VENDOR_PACKET_SCHEMA)

When the intake agent's tool drifts (say contains_customer_pii is renamed to handles_personal_data), the contract fails at the very next boundary, before the security review proceeds on incomplete data:

FAILED test_vendor_onboarding_security_review
  AssertionError: handoff field 'data_access.contains_customer_pii': missing from data

  AGENT STACK TRACE — security
  ─────────────────────────────────────────────────────────────
  parent: intake (a1f2…)
  handoff_context = {
      "vendor_name": "ClearVoice AI",
      "data_access": {
          "handles_personal_data": true,   ← drift
          "data_categories": ["call_audio", "transcripts"],
          "storage_region": "us-east-1",
          "retention_days": 365
      },
      ...
  }

  Turn 0  assess_security_risk(vendor_name="ClearVoice AI")
       ↳ {"risk": "blocked: missing PII flag"}
  ─────────────────────────────────────────────────────────────
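
To make the failure above concrete, here is a minimal sketch of the kind of recursive check a nested-schema contract implies. It is not reagent-flow's implementation: find_violations and the message wording are hypothetical, and the schema convention (dicts for nesting, a one-element list for typed lists, Python types as leaves) is inferred from VENDOR_PACKET_SCHEMA.

```python
from typing import Any


def find_violations(data: Any, schema: Any, path: str = "") -> list[str]:
    """Return one message per place where `data` does not match `schema`.

    Hypothetical helper for illustration; not part of reagent-flow.
    """
    here = path or "<root>"
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"{here}: expected dict, got {type(data).__name__}"]
        problems: list[str] = []
        for key, sub_schema in schema.items():
            child = f"{path}.{key}" if path else key
            if key not in data:
                problems.append(f"{child}: missing from data")
            else:
                problems.extend(find_violations(data[key], sub_schema, child))
        return problems  # extra keys in `data` are deliberately ignored
    if isinstance(schema, list):
        # Assumes a one-element list such as [str], meaning "list of str".
        if not isinstance(data, list):
            return [f"{here}: expected list, got {type(data).__name__}"]
        problems = []
        for index, item in enumerate(data):
            problems.extend(find_violations(item, schema[0], f"{path}[{index}]"))
        return problems
    # Leaf: a plain Python type. bool is a subclass of int, so reject it
    # explicitly unless the schema asks for bool.
    type_ok = isinstance(data, schema) and not (
        isinstance(data, bool) and schema is not bool
    )
    if not type_ok:
        return [f"{here}: expected {schema.__name__}, got {type(data).__name__}"]
    return []


SCHEMA = {
    "vendor_name": str,
    "data_access": {"contains_customer_pii": bool, "data_categories": [str]},
}
drifted = {
    "vendor_name": "ClearVoice AI",
    "data_access": {
        "handles_personal_data": True,  # renamed field, as in the trace above
        "data_categories": ["call_audio", "transcripts"],
    },
}
violations = find_violations(drifted, SCHEMA)
print(violations)
```

The renamed key shows up as a missing field rather than an unexpected one, because this sketch only checks what the schema declares. That matches the "missing from data" message in the trace, though the real library may treat extra fields differently.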

Read the full walkthrough on the docs site: reagent-ai.mintlify.app/examples/vendor-onboarding

Where reagent-flow fits

Adjacent tool / approach	What it does	Where reagent-flow is different
Pydantic AI / structured outputs	Validates a single LLM call's output shape.	Validates the data passed between agents, across multiple sessions.
Guardrails / runtime guards	Blocks bad output at runtime, in production.	Catches it in your test suite, before the PR merges.
LangSmith / Langfuse / observability	Records traces for post-hoc inspection.	Records and asserts — your CI fails on drift.
LLM evals	Scores model output quality on a dataset.	Asserts deterministic structural contracts on every test run.
pytest-mock for agents	Mocks tool calls so tests don't hit live LLMs.	Captures real or mock traces and asserts on their shape.

Use reagent-flow when you have:

Multi-agent or multi-step pipelines passing structured data between sessions
A pytest suite where you want CI to fail on handoff drift before merge
Tool outputs whose shape your downstream agents silently depend on

Reach for something else when:

You only need to validate a single LLM call's output → use Pydantic directly
You need to block bad output at runtime in production → use a guardrails library
You need accuracy or quality scoring on a dataset → use an evals framework

What's in this monorepo

The core library plus five framework adapters, each a separate installable package:

Package	Version	Purpose	Docs
`reagent-flow`	0.5.0	Core: sessions, traces, assertions, golden baselines	Concepts
`reagent-flow-openai`	0.2.0	OpenAI Python SDK adapter	OpenAI
`reagent-flow-anthropic`	0.2.0	Anthropic Python SDK adapter	Anthropic
`reagent-flow-langchain`	0.2.0	LangChain callback handler	LangChain
`reagent-flow-langgraph`	0.2.0	LangGraph callback (extends LangChain)	LangGraph
`reagent-flow-crewai`	0.2.0	CrewAI tool wrapper	CrewAI

Runnable examples under examples/:

langgraph_demo/ — three-agent LangGraph pipeline (Gatherer → Assessor → Decider) that runs end-to-end and demonstrates a broken handoff being caught at the assessor boundary.
manual_logging/ — minimal refund flow using explicit log_llm_call / log_tool_result, no framework adapter required.

Install

uv add reagent-flow                # core, zero runtime deps
uv add reagent-flow-openai         # +OpenAI
uv add reagent-flow-anthropic      # +Anthropic
uv add reagent-flow-langchain      # +LangChain
uv add reagent-flow-langgraph      # +LangGraph
uv add reagent-flow-crewai         # +CrewAI

Python 3.10+. Each adapter depends only on its respective framework.

Next: write your first contract test in 5 minutes → reagent-ai.mintlify.app/quickstart

Status & roadmap

Current release: reagent-flow 0.5.0, adapters 0.2.0. Stability: alpha.

Stable today (full reference on the docs site):

Handoff contracts, tool-output contracts, context preservation
Flow, count, and ordering assertions
Nested schemas — typed lists, list-of-dicts, optional Pydantic BaseModel support
Golden-baseline diffs with ignore_fields
Token and cost guards with per-model pricing
Agent Stack Traces attached to every failed assertion
Five framework adapters with automatic tool-result capture
pytest plugin: fixtures, marker, CLI flags
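
As an illustration of what a golden-baseline diff with ignore_fields involves, the sketch below compares a stored trace with a fresh one and skips fields that are expected to change between runs. It assumes traces are nested dicts and lists. diff_against_golden, the trace layout, and the assess_finance_risk tool name are hypothetical and do not reflect reagent-flow's actual API or trace format.

```python
from typing import Any


def diff_against_golden(
    golden: Any,
    actual: Any,
    ignore_fields: frozenset[str] = frozenset(),
    path: str = "",
) -> list[str]:
    """List the paths where `actual` differs from `golden`.

    Keys named in `ignore_fields` are skipped at every nesting level.
    Hypothetical helper for illustration; not part of reagent-flow.
    """
    if isinstance(golden, dict) and isinstance(actual, dict):
        diffs: list[str] = []
        for key in sorted(golden.keys() | actual.keys()):
            if key in ignore_fields:
                continue
            child = f"{path}.{key}" if path else key
            if key not in actual:
                diffs.append(f"{child}: removed")
            elif key not in golden:
                diffs.append(f"{child}: added")
            else:
                diffs.extend(
                    diff_against_golden(golden[key], actual[key], ignore_fields, child)
                )
        return diffs
    if isinstance(golden, list) and isinstance(actual, list):
        if len(golden) != len(actual):
            return [f"{path or '<root>'}: length {len(golden)} -> {len(actual)}"]
        diffs = []
        for index, (g_item, a_item) in enumerate(zip(golden, actual)):
            diffs.extend(
                diff_against_golden(g_item, a_item, ignore_fields, f"{path}[{index}]")
            )
        return diffs
    if golden != actual:
        return [f"{path or '<root>'}: {golden!r} -> {actual!r}"]
    return []


# Made-up traces: the session id changes every run, the tool call should not.
golden_trace = {
    "session_id": "a1",
    "turns": [{"tool": "assess_security_risk", "args": {"vendor_name": "ClearVoice AI"}}],
}
new_trace = {
    "session_id": "b2",
    "turns": [{"tool": "assess_finance_risk", "args": {"vendor_name": "ClearVoice AI"}}],
}
diffs = diff_against_golden(golden_trace, new_trace, frozenset({"session_id"}))
print(diffs)
```

Ignoring by key name at every depth keeps the sketch short. A path-based ignore list would be more precise if the same key name is volatile in one place and meaningful in another.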

Planned next:

Built-in trace redaction framework (for traces that may carry PII or secrets)
Additional adapters as the community requests them

Versioning: while on 0.x, minor versions may include breaking changes. 1.0 will lock the public assertion API. See CHANGELOG.md for what shipped when.

Community

Questions, ideas, war stories about multi-agent handoffs going wrong — all welcome.

GitHub Discussions — Q&A, design conversations, show-and-tell
GitHub Issues — bug reports and feature requests
CONTRIBUTING.md — dev setup, conventions, the 90 % coverage gate
CODE_OF_CONDUCT.md — Contributor Covenant 2.0

Looking to contribute? Start with the good first issue label.

Development

Requires uv.

git clone https://github.com/re-agent-ai/reagent-flow.git
cd reagent-flow && uv sync
uv run pytest packages/ -v
uv run ruff check packages/ examples/ && uv run ruff format --check packages/ examples/
uv run mypy packages/reagent-flow/src/reagent_flow/ --strict

For architecture notes and contribution guidelines see ARCHITECTURE.md and CONTRIBUTING.md.

Security & privacy

Traces are plain JSON containing the full tool-call arguments and results from your agent runs, which may include sensitive data: API keys, user PII, database contents, anything your tools touch. Review traces before committing them, and add .reagent/ to .gitignore unless you're confident the contents are synthetic. A built-in redaction framework is on the roadmap.
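
Until built-in redaction exists, one stopgap is to scrub a trace yourself before it leaves your machine. The sketch below is a hand-rolled example and not a reagent-flow feature. The trace layout, the SENSITIVE_KEYS set, the sk- key pattern, and the lookup_customer tool name are all assumptions to adapt to your own tools.

```python
import json
import re
from typing import Any

# Assumed key names worth masking outright; extend for your own tools.
SENSITIVE_KEYS = {"api_key", "authorization", "password", "email"}
# Assumed secret format: tokens that start with "sk-".
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")
MASK = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of `value` with sensitive keys and secret-like strings masked."""
    if isinstance(value, dict):
        return {
            key: MASK if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return SECRET_PATTERN.sub(MASK, value)
    return value


# Made-up trace fragment for illustration.
trace = {
    "tool": "lookup_customer",
    "args": {"email": "jane@example.com", "api_key": "sk-test-1234567890"},
    "result": {"note": "used key sk-test-1234567890 for lookup", "rows": 2},
}
clean = redact(trace)
print(json.dumps(clean, indent=2))
```

Key-based masking catches structured fields, and the regex pass catches secrets that were interpolated into free text. Neither is exhaustive, so a manual review before committing is still worthwhile.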

For vulnerability disclosure see SECURITY.md.

License

MIT · github.com/re-agent-ai/reagent-flow
