A formal framework for governing autonomous AI agents through explicit resource constraints and temporal boundaries.
Agent Contracts transforms autonomous AI agents from unbounded explorers into bounded optimizers by introducing formal contracts that specify:
- 🎯 Resource Budgets - Tokens, API calls, compute time, and costs
- ⏱️ Temporal Constraints - Deadlines, duration limits, and lifecycle boundaries
- 📊 Success Criteria - Measurable conditions for contract fulfillment
- 🔄 Lifecycle Management - Clear states from activation to termination
Current agentic AI systems face critical challenges:
- Unbounded Resource Consumption - Agents can consume unpredictable amounts of tokens, API calls, and compute time
- Unclear Lifecycles - No explicit termination criteria, leading to resource leaks
- Difficult Governance - Hard to audit, ensure compliance, and attribute costs
- Coordination Complexity - Multi-agent systems lack formal resource allocation mechanisms
Agent Contracts provide a mathematical framework that enables:
- Predictable Costs - Explicit resource budgets prevent runaway consumption
- Formal Verification - Contract states and constraints are machine-verifiable
- Time-Resource Tradeoffs - Strategic optimization between speed and economy
- Multi-Agent Coordination - Hierarchical contracts and resource markets
```python
from agent_contracts import Contract, ContractedLLM, ResourceConstraints, ContractMode

# Define a contract with resource budgets
contract = Contract(
    id="research-task",
    name="Research Assistant",
    mode=ContractMode.BALANCED,  # Optimize for quality-cost-time balance
    resources=ResourceConstraints(
        tokens=10000,
        api_calls=50,
        cost_usd=1.0,
    ),
)

# Execute LLM calls within contract constraints
with ContractedLLM(contract) as llm:
    response = llm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize recent AI papers"}],
    )

# The contract automatically enforces:
# ✅ Token budget limits
# ✅ API call tracking
# ✅ Cost monitoring
# ✅ Violations trigger warnings or stops
```

Fine-grained control over individual tool usage:
```python
from agent_contracts import Contract, ResourceConstraints

contract = Contract(
    id="research-agent",
    name="Research Agent",
    resources=ResourceConstraints(
        tokens=10000,
        tool_invocations=20,  # Total limit across all tools
        per_tool_limits={
            "web_search": 5,  # Max 5 web searches
            "code_exec": 3,   # Max 3 code executions
            # Other tools limited only by the aggregate
        },
    ),
)
```

Add custom governance logic that runs before every constraint check:
```python
from agent_contracts import (
    Contract, ContractedLLM, CheckContext, HookResult,
    EnforcementAction, ResourceConstraints,
)

# Define a hook that blocks off-topic requests
def topic_guard(ctx: CheckContext) -> HookResult:
    messages = ctx.metadata.get("messages", [])
    if any("off-topic" in str(m) for m in messages):
        return HookResult(
            allow=False,
            reason="Request outside allowed domain",
            action=EnforcementAction.HARD_STOP,
        )
    return HookResult()  # allow by default

contract = Contract(
    id="guarded-agent",
    resources=ResourceConstraints(tokens=10000, cost_usd=1.0),
)

with ContractedLLM(contract) as llm:
    llm.enforcer.add_pre_check_hook(topic_guard)
    # Hooks fire automatically on every LLM call
    # Works with all integrations: LiteLLM, LangGraph, Google ADK, Claude SDK
```

For complex workflows with cycles and multi-agent coordination:
```python
from langgraph.graph import StateGraph, END
from agent_contracts import Contract, ResourceConstraints
from agent_contracts.integrations.langgraph import ContractedGraph

# Build a complex graph with a validation cycle
workflow = StateGraph(AgentState)
workflow.add_node("research", research_agent)
workflow.add_node("validate", validate_agent)
workflow.add_conditional_edges(
    "validate",
    should_retry,
    {True: "research", False: END},  # Can loop!
)
app = workflow.compile()

# Wrap with a contract to prevent runaway loops
contract = Contract(
    id="research-workflow",
    resources=ResourceConstraints(
        tokens=50000,
        api_calls=25,  # Limit iterations!
        cost_usd=2.0,
    ),
)
contracted_workflow = ContractedGraph(contract=contract, graph=app)
result = contracted_workflow.invoke({"query": "Research topic"})

# Budget enforced across ALL nodes and cycles:
# ✅ Prevents infinite loops
# ✅ Multi-agent budget sharing
# ✅ Real-time violation detection
# ✅ Cumulative tracking across the entire graph
```

For Google ADK-based agents and multi-agent hierarchies:
```python
from google.adk.agents import LlmAgent
from agent_contracts import Contract, ResourceConstraints
from agent_contracts.integrations.google_adk import ContractedAdkAgent

# Create a multi-agent hierarchy
researcher = LlmAgent(
    name="researcher",
    model="gemini-2.0-flash",
    instruction="You research topics thoroughly.",
)
summarizer = LlmAgent(
    name="summarizer",
    model="gemini-2.0-flash",
    instruction="You create concise summaries.",
)
coordinator = LlmAgent(
    name="coordinator",
    model="gemini-2.0-flash",
    instruction="You coordinate research and summarization.",
    sub_agents=[researcher, summarizer],
)

# A single budget for the ENTIRE multi-agent system
contract = Contract(
    id="research-system",
    resources=ResourceConstraints(
        tokens=50000,  # For ALL agents combined
        api_calls=25,
        cost_usd=2.0,
    ),
)
contracted_system = ContractedAdkAgent(contract=contract, agent=coordinator)
result = contracted_system.run(
    user_id="user1",
    session_id="session1",
    message="Research and summarize quantum computing",
)

# Budget enforced across ALL agents in the hierarchy:
# ✅ Detailed token tracking (prompt/response/thinking/cached)
# ✅ Multi-turn conversation protection
# ✅ Multi-agent coordination governance
# ✅ Tool execution monitoring
```

Choose the mode that fits your requirements:
```python
from agent_contracts import Contract, ContractMode, ResourceConstraints

# URGENT mode: minimize time, accept higher costs
contract = Contract(
    mode=ContractMode.URGENT,
    resources=ResourceConstraints(tokens=10000),
)
# → 50% faster execution, 20% more tokens

# BALANCED mode: optimize the quality-cost-time tradeoff
contract = Contract(
    mode=ContractMode.BALANCED,
    resources=ResourceConstraints(tokens=10000),
)
# → Standard execution with a quality focus

# ECONOMICAL mode: minimize costs, accept longer runtime
contract = Contract(
    mode=ContractMode.ECONOMICAL,
    resources=ResourceConstraints(tokens=10000),
)
# → 60% fewer tokens, 50% longer execution
```

- Whitepaper - Complete theoretical framework with mathematical foundations
- Pre-Execution Hooks - Custom governance hooks and behavioral monitor design
- Examples - Coming soon: Practical implementation examples
- Researchers: Read the Formal Framework and Future Directions
- Engineers: Check Implementation Architecture and Use Cases
- Product Managers: Start with the Introduction and Use Cases
An Agent Contract C = (I, O, S, R, T, Φ, Ψ) includes:
- I: Input specification
- O: Output specification
- S: Skills (tools, capabilities)
- R: Resource constraints
- T: Temporal constraints
- Φ: Success criteria
- Ψ: Termination conditions
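As a rough illustration of how the tuple's components fit together, it can be sketched as a plain dataclass. This is a hypothetical sketch, not the library's actual `Contract` class; all names and fields below are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the contract tuple C = (I, O, S, R, T, Φ, Ψ);
# the real library's Contract/ResourceConstraints classes differ.
@dataclass
class ContractTuple:
    inputs: dict            # I: input specification
    outputs: dict           # O: output specification
    skills: list            # S: tools and capabilities
    resources: dict         # R: resource constraints (tokens, cost, ...)
    temporal: dict          # T: deadlines and duration limits
    success: Callable[[dict], bool]      # Φ: success criteria
    termination: Callable[[dict], bool]  # Ψ: termination conditions

c = ContractTuple(
    inputs={"query": "str"},
    outputs={"summary": "str"},
    skills=["web_search"],
    resources={"tokens": 10000, "cost_usd": 1.0},
    temporal={"deadline_s": 300},
    success=lambda state: state.get("summary") is not None,
    termination=lambda state: state.get("tokens_used", 0) >= 10000,
)
```

Note that Φ and Ψ are modeled as predicates over execution state, which is what makes contract fulfillment machine-checkable.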
Agents can optimize along multiple dimensions:
| Mode | Time | Resources | Quality |
|---|---|---|---|
| Urgent | Low ⚡ | High 💰 | 85% |
| Balanced | Medium ⏱️ | Medium 💵 | 95% |
| Economical | High 🐢 | Low 💸 | 90% |
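The table's tradeoffs can be read as multipliers on a baseline plan. The sketch below is purely illustrative: the multipliers are taken from the mode descriptions earlier in this README (URGENT ≈ 50% faster with 20% more tokens, ECONOMICAL ≈ 50% slower with 60% fewer tokens), and `plan` is a hypothetical helper, not a library function.

```python
# Hypothetical multipliers derived from the mode descriptions above.
MODE_MULTIPLIERS = {
    "urgent":     {"time": 0.5, "tokens": 1.2},
    "balanced":   {"time": 1.0, "tokens": 1.0},
    "economical": {"time": 1.5, "tokens": 0.4},
}

def plan(mode: str, base_tokens: int, base_seconds: float) -> dict:
    """Scale a baseline execution plan by the chosen mode's multipliers."""
    m = MODE_MULTIPLIERS[mode]
    return {
        "tokens": round(base_tokens * m["tokens"]),
        "seconds": base_seconds * m["time"],
    }

print(plan("economical", base_tokens=10000, base_seconds=60.0))
# → {'tokens': 4000, 'seconds': 90.0}
```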
```
DRAFTED → ACTIVE → {FULFILLED, VIOLATED, EXPIRED, TERMINATED}
```
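A minimal sketch of this state machine, assuming the transitions shown in the diagram (illustrative only; the library's own state handling may differ):

```python
from enum import Enum, auto

class ContractState(Enum):
    DRAFTED = auto()
    ACTIVE = auto()
    FULFILLED = auto()
    VIOLATED = auto()
    EXPIRED = auto()
    TERMINATED = auto()

# Legal transitions per the lifecycle diagram; the four end states are terminal.
TRANSITIONS = {
    ContractState.DRAFTED: {ContractState.ACTIVE},
    ContractState.ACTIVE: {
        ContractState.FULFILLED, ContractState.VIOLATED,
        ContractState.EXPIRED, ContractState.TERMINATED,
    },
}

def transition(current: ContractState, target: ContractState) -> ContractState:
    """Move to `target` if the lifecycle allows it, otherwise raise."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target

state = transition(ContractState.DRAFTED, ContractState.ACTIVE)
state = transition(state, ContractState.FULFILLED)  # terminal
```

Encoding the legal transitions explicitly is what makes lifecycle violations (e.g. reactivating a terminated contract) detectable at runtime.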
Agent Contracts supports the agentskills.io open standard for defining reusable agent behaviors:
```python
from agent_contracts import SkillSpec, Capabilities, Contract

# Define a rich skill with full instructions
code_review = SkillSpec(
    name="code-reviewer",
    description="Review code for best practices, security issues, and test coverage.",
    instructions="""
    ## Instructions
    1. Read the target files
    2. Check for common issues:
       - Error handling
       - Security vulnerabilities
       - Test coverage
    3. Provide detailed feedback
    """,
    allowed_tools=["Read", "Grep", "Glob"],
    version="1.0.0",
)

# Use in capabilities (mix strings and SkillSpec)
contract = Contract(
    id="review-task",
    name="Code Review",
    capabilities=Capabilities(
        skills=[code_review, "simple-skill"],  # Both types work
        tools=["web_search"],
    ),
)

# Access skills programmatically
skill = contract.capabilities.get_skill("code-reviewer")
print(skill.instructions)
```

Features:
- ✅ Compatible with Microsoft, OpenAI, Cursor, and other adopters
- ✅ SKILL.md import/export (`to_skill_md()`, `from_skill_md()`)
- ✅ Progressive disclosure (metadata vs full instructions)
- ✅ Backward compatible (string skills still work)
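To make the progressive-disclosure idea concrete, here is a hypothetical parser that splits a SKILL.md-style document into lightweight metadata (the frontmatter) and the full instructions (the markdown body). This is an illustrative sketch, not the library's `from_skill_md()` implementation, and it assumes a simple `key: value` frontmatter rather than full YAML.

```python
def split_skill_md(text: str) -> tuple[dict, str]:
    """Split a SKILL.md-style document into (metadata, instructions).

    Illustrative only: assumes a `key: value` frontmatter block delimited
    by '---' lines; real SKILL.md parsing should use a YAML library.
    """
    parts = text.split("---\n")
    if len(parts) < 3:
        return {}, text  # no frontmatter: the whole document is instructions
    metadata = {}
    for line in parts[1].splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    instructions = "---\n".join(parts[2:]).strip()
    return metadata, instructions

doc = """---
name: code-reviewer
description: Review code for security issues.
---
## Instructions
1. Read the target files
"""
meta, body = split_skill_md(doc)
```

The split matters for token efficiency: an agent can scan many skills' metadata cheaply and load full instructions only for the skill it selects.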
🎉 Ready for Release (November 2025)
Current Version: 0.3.0 Status: Production-ready, validated, documented
Phase 1: Core Framework ✅ Complete
- ✅ Contract data structures (C = I, O, S, R, T, Φ, Ψ)
- ✅ Resource monitoring and enforcement
- ✅ Token counting and cost tracking
- ✅ LiteLLM integration wrapper
- ✅ 145 tests, 96% coverage
- ✅ Live demo with Gemini 2.0 Flash
Phase 2A: Strategic Optimization ✅ Complete
- ✅ Contract modes (URGENT, BALANCED, ECONOMICAL)
- ✅ Budget-aware prompt generation
- ✅ Strategic planning utilities
- ✅ Quality-cost-time Pareto benchmark
- ✅ 209 core tests passing
Phase 2B: Governance & Benchmarks ✅ Complete
- ✅ Multi-step research benchmark (research agent with quality evaluation)
- ✅ Budget violation policy testing (100% enforcement validation)
- ✅ Cost governance validation (organizational policy compliance)
- ✅ Variance reduction analysis (N=20 validation, temperature=0 effect discovered)
- ✅ Quality metrics framework (3-phase validation study, CV=5.2%)
- ✅ LangChain 1.0+ integration (governance & compliance)
- ✅ Pre-commit hooks and code quality infrastructure
LangGraph Integration ✅ Complete (Premium Feature)
- ✅ ContractedGraph for complex multi-agent workflows
- ✅ Cumulative budget tracking across ALL nodes and cycles
- ✅ Loop/retry protection (prevents runaway costs)
- ✅ Multi-agent budget sharing
- ✅ 27 comprehensive tests, 85% coverage
- ✅ Real-world demos (validation cycles, parallel agents)
Google ADK Integration ✅ Complete
- ✅ ContractedAdkAgent for Google ADK agents
- ✅ Detailed token tracking (prompt, response, thinking, cached)
- ✅ Multi-turn conversation protection
- ✅ Multi-agent hierarchy governance
- ✅ Tool execution monitoring
- ✅ 11 comprehensive tests, 90% coverage
- ✅ Real-world demos (multi-turn, multi-agent)
Claude Agent SDK Integration ✅ Complete
- ✅ ContractedClaudeAgent with hook-based enforcement
- ✅ Exact token tracking from AssistantMessage.usage
- ✅ Per-tool limits and temporal enforcement via PreToolUse hooks
- ✅ Audit trail via PostToolUse hooks
- ✅ Full SDK passthrough (tools, MCP, subagents, skills, permissions)
- ✅ Dual API: async `aexecute()` and sync `execute()`
- ✅ 33 comprehensive tests
Pre-Execution Hooks ✅ Complete
- ✅ User-defined pre/post-check hooks on ContractEnforcer
- ✅ `CheckContext`, `HookResult`, `CheckHook` types for custom policy governance
- ✅ Integration metadata pass-through (all 5 integrations)
- ✅ Hook actions: WARN, THROTTLE (informational) and SOFT_STOP, HARD_STOP (blocking)
- ✅ Post-check hooks are observational (cannot block)
- ✅ Backward compatible — existing code works unchanged
Evaluation Pipelines ✅ Complete
- ✅ Research Pipeline: Multi-agent report generation (25 topics)
- ✅ Code Review Pipeline: Coder↔Reviewer loop (175 LiveCodeBench problems)
- ✅ CONTRACTED vs UNCONTRACTED comparison framework
- ✅ Conservation law enforcement in multi-agent delegation
- ✅ Iteration limits prevent runaway agent loops
Total: 646+ tests, 81%+ coverage
Agent Contracts are designed for:
- Production AI Systems - Cost control and SLA compliance
- Complex Multi-Agent Workflows ⭐ - LangGraph loops, retries, validation cycles
- Enterprise Deployments - Governance, audit trails, and compliance
- Claude Agent SDK - Govern Claude agents with per-tool limits and audit trails
- Google ADK Applications - Multi-turn conversations and multi-agent hierarchies
- LangChain Applications - Simple chains with budget enforcement
- Research - Studying optimal agent behavior under constraints
LangChain (simple chains):
- 3-10 LLM calls per execution
- Budget risk: LOW to MODERATE
- Value: Governance, compliance, multi-call protection
LangGraph (complex workflows) ⭐:
- 30+ LLM calls per execution (cycles, retries, parallel agents)
- Budget risk: VERY HIGH (can spiral to $10+ without limits!)
- Value: Loop protection, multi-agent coordination, cumulative tracking
- This is the killer feature for production deployments
Claude Agent SDK (agentic coding & file/web/terminal):
- 10-100+ tool calls per session (Read, Edit, Bash, WebSearch, subagents)
- Budget risk: HIGH (open-ended agents with many tools can spiral)
- Value: Per-tool limits, temporal enforcement, audit trail, hook-based governance
- Ideal for: Claude-powered agents, coding assistants, research agents
Google ADK (multi-turn & multi-agent):
- 10-50+ LLM calls per conversation (turns, agent coordination, tool use)
- Budget risk: HIGH (multi-agent hierarchies can explode costs)
- Value: Multi-turn protection, hierarchical governance, detailed token tracking
- Ideal for: Google Cloud deployments, Gemini-based agents, conversational AI
```
agent-contracts/
├── src/agent_contracts/           # Core package
│   ├── core/
│   │   ├── contract.py            # Contract data structures
│   │   ├── monitor.py             # Resource monitoring
│   │   ├── enforcement.py         # Constraint enforcement
│   │   ├── tokens.py              # Token counting
│   │   ├── planning.py            # Strategic planning
│   │   └── prompts.py             # Budget-aware prompts
│   └── integrations/
│       ├── litellm_wrapper.py     # LiteLLM integration
│       ├── langchain.py           # LangChain integration
│       ├── langgraph.py           # LangGraph integration ⭐
│       ├── google_adk.py          # Google ADK integration
│       └── claude_agent_sdk.py    # Claude Agent SDK integration
├── tests/                         # 247+ tests, 94%+ coverage
│   ├── core/                      # Core module tests (209 tests)
│   └── integrations/              # Integration tests (38 tests)
├── benchmarks/                    # Live demonstrations & benchmarks
│   ├── demo_phase1.py             # Phase 1 interactive demo
│   ├── strategic/                 # Strategic optimization benchmarks
│   ├── research_agent/            # Multi-step research benchmark
│   ├── governance/                # Policy & governance tests
│   ├── langchain/                 # LangChain demos
│   ├── langgraph/                 # LangGraph demos (multi-agent)
│   └── google_adk/                # Google ADK demos (multi-turn, multi-agent)
├── evaluation/                    # Experimental evaluations
│   ├── research_pipeline/         # Multi-agent research experiment
│   └── code_review_pipeline/      # Coder↔Reviewer experiment
├── docs/
│   ├── whitepaper.md              # Complete theoretical framework
│   └── testing-strategy.md        # Testing & validation plan
├── pyproject.toml                 # Package configuration
└── README.md                      # This file
```
```bash
# Install from PyPI
pip install ai-agent-contracts

# Or with uv
uv add ai-agent-contracts
```

The package is importable as `agent_contracts`:

```python
from agent_contracts import Contract, ResourceConstraints
```

For development (from source):
```bash
git clone https://github.com/flyersworder/agent-contracts.git
cd agent-contracts
uv sync --dev
```

Requirements: Python ≥ 3.12
Optional dependencies:
- `litellm` - For LLM integration (automatically installed)
- `langchain` - For LangChain integration (`uv sync --extra langchain`)
- `langgraph` - For LangGraph integration ⭐ (`uv sync --extra langgraph`)
- `google-adk` - For Google ADK integration (`uv sync --extra google-adk`)
- `claude-agent-sdk` - For Claude Agent SDK integration (`uv sync --extra claude-agent-sdk`)
- `matplotlib` - For visualization benchmarks (`pip install matplotlib`)
This project uses uv for dependency management. To set up the development environment:
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/flyersworder/agent-contracts.git
cd agent-contracts

# Install dependencies (including dev dependencies)
uv sync --dev

# Install pre-commit hooks
uv run pre-commit install
```

This project uses several tools to maintain code quality:
- Ruff: Fast Python linter and formatter (replaces black, isort, flake8)
- mypy: Static type checker
- pre-commit: Git hooks for automated checks
Pre-commit hooks will automatically run on every commit. To manually run all checks:
```bash
# Run all pre-commit hooks
uv run pre-commit run --all-files

# Run specific tools
uv run ruff check .   # Linting
uv run ruff format .  # Formatting
uv run mypy .         # Type checking
```

```bash
# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=agent_contracts --cov-report=html
```

- `docs/` - Documentation (whitepaper, testing strategy)
- `src/` - Source code
- `tests/` - Test suite
- `pyproject.toml` - Project configuration and dependencies
- `uv.lock` - Locked dependencies for reproducibility
This is an evolving framework. We welcome contributions in:
- Reference implementations (Python, TypeScript)
- Integration with existing frameworks (LangChain, AutoGPT, etc.)
- Practical examples and tutorials
- Empirical studies and benchmarks
This project is licensed under CC BY 4.0.
Qing Ye (with assistance from Claude, Anthropic)
If you use this framework in your research, please cite:
```bibtex
@techreport{ye2025agentcontracts,
  title={Agent Contracts: A Resource-Bounded Optimization Framework for Autonomous AI Systems},
  author={Ye, Qing},
  year={2025},
  month={October}
}
```

- 📖 Read the Whitepaper
- 🎯 Browse Documentation
- 💬 Open an Issue for questions or discussions
Version: 0.3.0 | Last Updated: March 28, 2026 | Status: Production Ready ⭐