A formal framework for governing autonomous AI agents through explicit resource constraints and temporal boundaries.
Agent Contracts transforms autonomous AI agents from unbounded explorers into bounded optimizers by introducing formal contracts that specify:
- 🎯 Resource Budgets - Tokens, API calls, compute time, and costs
- ⏱️ Temporal Constraints - Deadlines, duration limits, and lifecycle boundaries
- 📊 Success Criteria - Measurable conditions for contract fulfillment
- 🔄 Lifecycle Management - Clear states from activation to termination
Current agentic AI systems face critical challenges:
- Unbounded Resource Consumption - Agents can consume unpredictable amounts of tokens, API calls, and compute time
- Unclear Lifecycles - No explicit termination criteria, leading to resource leaks
- Difficult Governance - Hard to audit, ensure compliance, and attribute costs
- Coordination Complexity - Multi-agent systems lack formal resource allocation mechanisms
Agent Contracts provide a mathematical framework that enables:
- Predictable Costs - Explicit resource budgets prevent runaway consumption
- Formal Verification - Contract states and constraints are machine-verifiable
- Time-Resource Tradeoffs - Strategic optimization between speed and economy
- Multi-Agent Coordination - Hierarchical contracts and resource markets
```python
from agent_contracts import Contract, ContractedLLM, ResourceConstraints, ContractMode

# Define a contract with resource budgets
contract = Contract(
    id="research-task",
    name="Research Assistant",
    mode=ContractMode.BALANCED,  # Optimize for quality-cost-time balance
    resources=ResourceConstraints(
        tokens=10000,
        api_calls=50,
        cost_usd=1.0,
    ),
)

# Execute LLM calls within contract constraints
with ContractedLLM(contract) as llm:
    response = llm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize recent AI papers"}],
    )

# The contract automatically enforces:
# ✅ Token budget limits
# ✅ API call tracking
# ✅ Cost monitoring
# ✅ Violations trigger warnings or stops
```

Fine-grained control over individual tool usage:
```python
from agent_contracts import Contract, ResourceConstraints

contract = Contract(
    id="research-agent",
    name="Research Agent",
    resources=ResourceConstraints(
        tokens=10000,
        tool_invocations=20,  # Total limit across all tools
        per_tool_limits={
            "web_search": 5,  # Max 5 web searches
            "code_exec": 3,   # Max 3 code executions
            # Other tools limited only by the aggregate
        },
    ),
)
```

Add custom governance logic that runs before every constraint check:
```python
from agent_contracts import (
    Contract, ContractedLLM, CheckContext, HookResult,
    EnforcementAction, ResourceConstraints,
)

# Define a hook that blocks off-topic requests
def topic_guard(ctx: CheckContext) -> HookResult:
    messages = ctx.metadata.get("messages", [])
    if any("off-topic" in str(m) for m in messages):
        return HookResult(
            allow=False,
            reason="Request outside allowed domain",
            action=EnforcementAction.HARD_STOP,
        )
    return HookResult()  # allow by default

contract = Contract(
    id="guarded-agent",
    resources=ResourceConstraints(tokens=10000, cost_usd=1.0),
)

with ContractedLLM(contract) as llm:
    llm.enforcer.add_pre_check_hook(topic_guard)
    # Hooks fire automatically on every LLM call
    # Works with all integrations: LiteLLM, LangGraph, Google ADK, Claude SDK
```

For complex workflows with cycles and multi-agent coordination:
```python
from langgraph.graph import StateGraph, END
from agent_contracts import Contract, ResourceConstraints
from agent_contracts.integrations.langgraph import ContractedGraph

# Build a complex graph with a validation cycle
workflow = StateGraph(AgentState)
workflow.add_node("research", research_agent)
workflow.add_node("validate", validate_agent)
workflow.add_conditional_edges(
    "validate",
    should_retry,
    {True: "research", False: END},  # Can loop!
)
app = workflow.compile()

# Wrap with a contract to prevent runaway loops
contract = Contract(
    id="research-workflow",
    resources=ResourceConstraints(
        tokens=50000,
        api_calls=25,  # Limit iterations!
        cost_usd=2.0,
    ),
)
contracted_workflow = ContractedGraph(contract=contract, graph=app)
result = contracted_workflow.invoke({"query": "Research topic"})

# Budget enforced across ALL nodes and cycles:
# ✅ Prevents infinite loops
# ✅ Multi-agent budget sharing
# ✅ Real-time violation detection
# ✅ Cumulative tracking across the entire graph
```

For Google ADK-based agents and multi-agent hierarchies:
```python
from google.adk.agents import LlmAgent
from agent_contracts import Contract, ResourceConstraints
from agent_contracts.integrations.google_adk import ContractedAdkAgent

# Create a multi-agent hierarchy
researcher = LlmAgent(
    name="researcher",
    model="gemini-2.0-flash",
    instruction="You research topics thoroughly.",
)
summarizer = LlmAgent(
    name="summarizer",
    model="gemini-2.0-flash",
    instruction="You create concise summaries.",
)
coordinator = LlmAgent(
    name="coordinator",
    model="gemini-2.0-flash",
    instruction="You coordinate research and summarization.",
    sub_agents=[researcher, summarizer],
)

# A single budget for the ENTIRE multi-agent system
contract = Contract(
    id="research-system",
    resources=ResourceConstraints(
        tokens=50000,  # For ALL agents combined
        api_calls=25,
        cost_usd=2.0,
    ),
)
contracted_system = ContractedAdkAgent(contract=contract, agent=coordinator)
result = contracted_system.run(
    user_id="user1",
    session_id="session1",
    message="Research and summarize quantum computing",
)

# Budget enforced across ALL agents in the hierarchy:
# ✅ Detailed token tracking (prompt/response/thinking/cached)
# ✅ Multi-turn conversation protection
# ✅ Multi-agent coordination governance
# ✅ Tool execution monitoring
```

Choose the mode that fits your requirements:
```python
from agent_contracts import Contract, ContractMode, ResourceConstraints

# URGENT mode: minimize time, accept higher costs
contract = Contract(
    mode=ContractMode.URGENT,
    resources=ResourceConstraints(tokens=10000),
)
# → 50% faster execution, 20% more tokens

# BALANCED mode: optimize the quality-cost-time tradeoff
contract = Contract(
    mode=ContractMode.BALANCED,
    resources=ResourceConstraints(tokens=10000),
)
# → Standard execution with a quality focus

# ECONOMICAL mode: minimize costs, accept longer runtime
contract = Contract(
    mode=ContractMode.ECONOMICAL,
    resources=ResourceConstraints(tokens=10000),
)
# → 60% fewer tokens, 50% longer execution
```

- Whitepaper - Complete theoretical framework with mathematical foundations
- Pre-Execution Hooks - Custom governance hooks and behavioral monitor design
- Examples - Coming soon: Practical implementation examples
- Researchers: Read the Formal Framework and Future Directions
- Engineers: Check Implementation Architecture and Use Cases
- Product Managers: Start with the Introduction and Use Cases
An Agent Contract C = (I, O, S, R, T, Φ, Ψ) includes:
- I: Input specification
- O: Output specification
- S: Skills (tools, capabilities)
- R: Resource constraints
- T: Temporal constraints
- Φ: Success criteria
- Ψ: Termination conditions
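As a rough illustration of how the tuple's components fit together, it can be sketched as a plain dataclass. This is a hypothetical sketch, not the library's actual `Contract` class; all names and fields below are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the contract tuple C = (I, O, S, R, T, Φ, Ψ);
# the real library's Contract/ResourceConstraints classes differ.
@dataclass
class ContractTuple:
    inputs: dict            # I: input specification
    outputs: dict           # O: output specification
    skills: list            # S: tools and capabilities
    resources: dict         # R: resource constraints (tokens, cost, ...)
    temporal: dict          # T: deadlines and duration limits
    success: Callable[[dict], bool]      # Φ: success criteria
    termination: Callable[[dict], bool]  # Ψ: termination conditions

c = ContractTuple(
    inputs={"query": "str"},
    outputs={"summary": "str"},
    skills=["web_search"],
    resources={"tokens": 10000, "cost_usd": 1.0},
    temporal={"deadline_s": 300},
    success=lambda state: state.get("summary") is not None,
    termination=lambda state: state.get("tokens_used", 0) >= 10000,
)
```

Note that Φ and Ψ are modeled as predicates over execution state, which is what makes contract fulfillment machine-checkable.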
Agents can optimize along multiple dimensions:
| Mode | Time | Resources | Quality |
|---|---|---|---|
| Urgent | Low ⚡ | High 💰 | 85% |
| Balanced | Medium ⏱️ | Medium 💵 | 95% |
| Economical | High 🐢 | Low 💸 | 90% |
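The table's tradeoffs can be read as multipliers on a baseline plan. The sketch below is purely illustrative: the multipliers are taken from the mode descriptions earlier in this README (URGENT ≈ 50% faster with 20% more tokens, ECONOMICAL ≈ 50% slower with 60% fewer tokens), and `plan` is a hypothetical helper, not a library function.

```python
# Hypothetical multipliers derived from the mode descriptions above.
MODE_MULTIPLIERS = {
    "urgent":     {"time": 0.5, "tokens": 1.2},
    "balanced":   {"time": 1.0, "tokens": 1.0},
    "economical": {"time": 1.5, "tokens": 0.4},
}

def plan(mode: str, base_tokens: int, base_seconds: float) -> dict:
    """Scale a baseline execution plan by the chosen mode's multipliers."""
    m = MODE_MULTIPLIERS[mode]
    return {
        "tokens": round(base_tokens * m["tokens"]),
        "seconds": base_seconds * m["time"],
    }

print(plan("economical", base_tokens=10000, base_seconds=60.0))
# → {'tokens': 4000, 'seconds': 90.0}
```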
```
DRAFTED → ACTIVE → {FULFILLED, VIOLATED, EXPIRED, TERMINATED}
```
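A minimal sketch of this state machine, assuming the transitions shown in the diagram (illustrative only; the library's own state handling may differ):

```python
from enum import Enum, auto

class ContractState(Enum):
    DRAFTED = auto()
    ACTIVE = auto()
    FULFILLED = auto()
    VIOLATED = auto()
    EXPIRED = auto()
    TERMINATED = auto()

# Legal transitions per the lifecycle diagram; the four end states are terminal.
TRANSITIONS = {
    ContractState.DRAFTED: {ContractState.ACTIVE},
    ContractState.ACTIVE: {
        ContractState.FULFILLED, ContractState.VIOLATED,
        ContractState.EXPIRED, ContractState.TERMINATED,
    },
}

def transition(current: ContractState, target: ContractState) -> ContractState:
    """Move to `target` if the lifecycle allows it, otherwise raise."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target

state = transition(ContractState.DRAFTED, ContractState.ACTIVE)
state = transition(state, ContractState.FULFILLED)  # terminal
```

Encoding the legal transitions explicitly is what makes lifecycle violations (e.g. reactivating a terminated contract) detectable at runtime.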
Agent Contracts supports the agentskills.io open standard for defining reusable agent behaviors:
```python
from agent_contracts import SkillSpec, Capabilities, Contract

# Define a rich skill with full instructions
code_review = SkillSpec(
    name="code-reviewer",
    description="Review code for best practices, security issues, and test coverage.",
    instructions="""
    ## Instructions
    1. Read the target files
    2. Check for common issues:
       - Error handling
       - Security vulnerabilities
       - Test coverage
    3. Provide detailed feedback
    """,
    allowed_tools=["Read", "Grep", "Glob"],
    version="1.0.0",
)

# Use in capabilities (mix strings and SkillSpec)
contract = Contract(
    id="review-task",
    name="Code Review",
    capabilities=Capabilities(
        skills=[code_review, "simple-skill"],  # Both types work
        tools=["web_search"],
    ),
)

# Access skills programmatically
skill = contract.capabilities.get_skill("code-reviewer")
print(skill.instructions)
```

Features:
- ✅ Compatible with Microsoft, OpenAI, Cursor, and other adopters
- ✅ SKILL.md import/export (`to_skill_md()`, `from_skill_md()`)
- ✅ Progressive disclosure (metadata vs full instructions)
- ✅ Backward compatible (string skills still work)
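To make the progressive-disclosure idea concrete, here is a hypothetical parser that splits a SKILL.md-style document into lightweight metadata (the frontmatter) and the full instructions (the markdown body). This is an illustrative sketch, not the library's `from_skill_md()` implementation, and it assumes a simple `key: value` frontmatter rather than full YAML.

```python
def split_skill_md(text: str) -> tuple[dict, str]:
    """Split a SKILL.md-style document into (metadata, instructions).

    Illustrative only: assumes a `key: value` frontmatter block delimited
    by '---' lines; real SKILL.md parsing should use a YAML library.
    """
    parts = text.split("---\n")
    if len(parts) < 3:
        return {}, text  # no frontmatter: the whole document is instructions
    metadata = {}
    for line in parts[1].splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    instructions = "---\n".join(parts[2:]).strip()
    return metadata, instructions

doc = """---
name: code-reviewer
description: Review code for security issues.
---
## Instructions
1. Read the target files
"""
meta, body = split_skill_md(doc)
```

The split matters for token efficiency: an agent can scan many skills' metadata cheaply and load full instructions only for the skill it selects.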
🎉 Ready for Release (November 2025)
Current Version: 0.3.0 Status: Production-ready, validated, documented
Phase 1: Core Framework ✅ Complete
- ✅ Contract data structures (C = I, O, S, R, T, Φ, Ψ)
- ✅ Resource monitoring and enforcement
- ✅ Token counting and cost tracking
- ✅ LiteLLM integration wrapper
- ✅ 145 tests, 96% coverage
- ✅ Live demo with Gemini 2.0 Flash
Phase 2A: Strategic Optimization ✅ Complete
- ✅ Contract modes (URGENT, BALANCED, ECONOMICAL)
- ✅ Budget-aware prompt generation
- ✅ Strategic planning utilities
- ✅ Quality-cost-time Pareto benchmark
- ✅ 209 core tests passing
Phase 2B: Governance & Benchmarks ✅ Complete
- ✅ Multi-step research benchmark (research agent with quality evaluation)
- ✅ Budget violation policy testing (100% enforcement validation)
- ✅ Cost governance validation (organizational policy compliance)
- ✅ Variance reduction analysis (N=20 validation, temperature=0 effect discovered)
- ✅ Quality metrics framework (3-phase validation study, CV=5.2%)
- ✅ LangChain 1.0+ integration (governance & compliance)
- ✅ Pre-commit hooks and code quality infrastructure
LangGraph Integration ✅ Complete (Premium Feature)
- ✅ ContractedGraph for complex multi-agent workflows
- ✅ Cumulative budget tracking across ALL nodes and cycles
- ✅ Loop/retry protection (prevents runaway costs)
- ✅ Multi-agent budget sharing
- ✅ 27 comprehensive tests, 85% coverage
- ✅ Real-world demos (validation cycles, parallel agents)
Google ADK Integration ✅ Complete
- ✅ ContractedAdkAgent for Google ADK agents
- ✅ Detailed token tracking (prompt, response, thinking, cached)
- ✅ Multi-turn conversation protection
- ✅ Multi-agent hierarchy governance
- ✅ Tool execution monitoring
- ✅ 11 comprehensive tests, 90% coverage
- ✅ Real-world demos (multi-turn, multi-agent)
Claude Agent SDK Integration ✅ Complete
- ✅ ContractedClaudeAgent with hook-based enforcement
- ✅ Exact token tracking from AssistantMessage.usage
- ✅ Per-tool limits and temporal enforcement via PreToolUse hooks
- ✅ Audit trail via PostToolUse hooks
- ✅ Full SDK passthrough (tools, MCP, subagents, skills, permissions)
- ✅ Dual API: async `aexecute()` and sync `execute()`
- ✅ 33 comprehensive tests
Pre-Execution Hooks ✅ Complete
- ✅ User-defined pre/post-check hooks on ContractEnforcer
- ✅ `CheckContext`, `HookResult`, `CheckHook` types for custom policy governance
- ✅ Integration metadata pass-through (all 5 integrations)
- ✅ Hook actions: WARN, THROTTLE (informational) and SOFT_STOP, HARD_STOP (blocking)
- ✅ Post-check hooks are observational (cannot block)
- ✅ Backward compatible — existing code works unchanged
Evaluation Pipelines ✅ Complete
- ✅ Research Pipeline: Multi-agent report generation (25 topics)
- ✅ Code Review Pipeline: Coder↔Reviewer loop (175 LiveCodeBench problems)
- ✅ CONTRACTED vs UNCONTRACTED comparison framework
- ✅ Conservation law enforcement in multi-agent delegation
- ✅ Iteration limits prevent runaway agent loops
Total: 646+ tests, 81%+ coverage
Agent Contracts are designed for:
- Production AI Systems - Cost control and SLA compliance
- Complex Multi-Agent Workflows ⭐ - LangGraph loops, retries, validation cycles
- Enterprise Deployments - Governance, audit trails, and compliance
- Claude Agent SDK - Govern Claude agents with per-tool limits and audit trails
- Google ADK Applications - Multi-turn conversations and multi-agent hierarchies
- LangChain Applications - Simple chains with budget enforcement
- Research - Studying optimal agent behavior under constraints
LangChain (simple chains):
- 3-10 LLM calls per execution
- Budget risk: LOW to MODERATE
- Value: Governance, compliance, multi-call protection
LangGraph (complex workflows) ⭐:
- 30+ LLM calls per execution (cycles, retries, parallel agents)
- Budget risk: VERY HIGH (can spiral to $10+ without limits!)
- Value: Loop protection, multi-agent coordination, cumulative tracking
- This is the killer feature for production deployments
Claude Agent SDK (agentic coding & file/web/terminal):
- 10-100+ tool calls per session (Read, Edit, Bash, WebSearch, subagents)
- Budget risk: HIGH (open-ended agents with many tools can spiral)
- Value: Per-tool limits, temporal enforcement, audit trail, hook-based governance
- Ideal for: Claude-powered agents, coding assistants, research agents
Google ADK (multi-turn & multi-agent):
- 10-50+ LLM calls per conversation (turns, agent coordination, tool use)
- Budget risk: HIGH (multi-agent hierarchies can explode costs)
- Value: Multi-turn protection, hierarchical governance, detailed token tracking
- Ideal for: Google Cloud deployments, Gemini-based agents, conversational AI
```
agent-contracts/
├── src/agent_contracts/           # Core package
│   ├── core/
│   │   ├── contract.py            # Contract data structures
│   │   ├── monitor.py             # Resource monitoring
│   │   ├── enforcement.py         # Constraint enforcement
│   │   ├── tokens.py              # Token counting
│   │   ├── planning.py            # Strategic planning
│   │   └── prompts.py             # Budget-aware prompts
│   └── integrations/
│       ├── litellm_wrapper.py     # LiteLLM integration
│       ├── langchain.py           # LangChain integration
│       ├── langgraph.py           # LangGraph integration ⭐
│       ├── google_adk.py          # Google ADK integration
│       └── claude_agent_sdk.py    # Claude Agent SDK integration
├── tests/                         # 247+ tests, 94%+ coverage
│   ├── core/                      # Core module tests (209 tests)
│   └── integrations/              # Integration tests (38 tests)
├── benchmarks/                    # Live demonstrations & benchmarks
│   ├── demo_phase1.py             # Phase 1 interactive demo
│   ├── strategic/                 # Strategic optimization benchmarks
│   ├── research_agent/            # Multi-step research benchmark
│   ├── governance/                # Policy & governance tests
│   ├── langchain/                 # LangChain demos
│   ├── langgraph/                 # LangGraph demos (multi-agent)
│   └── google_adk/                # Google ADK demos (multi-turn, multi-agent)
├── evaluation/                    # Experimental evaluations
│   ├── research_pipeline/         # Multi-agent research experiment
│   └── code_review_pipeline/      # Coder↔Reviewer experiment
├── docs/
│   ├── whitepaper.md              # Complete theoretical framework
│   └── testing-strategy.md        # Testing & validation plan
├── pyproject.toml                 # Package configuration
└── README.md                      # This file
```
```bash
# Install from PyPI
pip install ai-agent-contracts

# Or with uv
uv add ai-agent-contracts
```

The package is importable as `agent_contracts`:

```python
from agent_contracts import Contract, ResourceConstraints
```

For development (from source):
```bash
git clone https://github.com/flyersworder/agent-contracts.git
cd agent-contracts
uv sync --dev
```

Requirements: Python ≥ 3.12
Optional dependencies:
- `litellm` - For LLM integration (automatically installed)
- `langchain` - For LangChain integration (`uv sync --extra langchain`)
- `langgraph` - For LangGraph integration ⭐ (`uv sync --extra langgraph`)
- `google-adk` - For Google ADK integration (`uv sync --extra google-adk`)
- `claude-agent-sdk` - For Claude Agent SDK integration (`uv sync --extra claude-agent-sdk`)
- `matplotlib` - For visualization benchmarks (`pip install matplotlib`)
This project uses uv for dependency management. To set up the development environment:
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/flyersworder/agent-contracts.git
cd agent-contracts

# Install dependencies (including dev dependencies)
uv sync --dev

# Install pre-commit hooks
uv run pre-commit install
```

This project uses several tools to maintain code quality:
- Ruff: Fast Python linter and formatter (replaces black, isort, flake8)
- mypy: Static type checker
- pre-commit: Git hooks for automated checks
Pre-commit hooks will automatically run on every commit. To manually run all checks:
```bash
# Run all pre-commit hooks
uv run pre-commit run --all-files

# Run specific tools
uv run ruff check .   # Linting
uv run ruff format .  # Formatting
uv run mypy .         # Type checking
```

```bash
# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=agent_contracts --cov-report=html
```

- `docs/` - Documentation (whitepaper, testing strategy)
- `src/` - Source code
- `tests/` - Test suite
- `pyproject.toml` - Project configuration and dependencies
- `uv.lock` - Locked dependencies for reproducibility
This is an evolving framework. We welcome contributions in:
- Reference implementations (Python, TypeScript)
- Integration with existing frameworks (LangChain, AutoGPT, etc.)
- Practical examples and tutorials
- Empirical studies and benchmarks
This project is licensed under CC BY 4.0.
Qing Ye (with assistance from Claude, Anthropic)
If you use this framework in your research, please cite:
```bibtex
@techreport{ye2025agentcontracts,
  title={Agent Contracts: A Resource-Bounded Optimization Framework for Autonomous AI Systems},
  author={Ye, Qing},
  year={2025},
  month={October}
}
```

- 📖 Read the Whitepaper
- 🎯 Browse Documentation
- 💬 Open an Issue for questions or discussions
Version: 0.3.0 | Last Updated: March 28, 2026 | Status: Production Ready ⭐