Contract-First Agents

A research-backed protocol that makes multi-agent AI coordination actually work.
400+ experiments. 52.5% quality improvement. 75% fewer integration errors.

Quickstart | The Protocol | Experiments | Why It Works | Full Research

The Problem

Multi-agent AI systems are powerful in theory but fragile in practice. When you spawn multiple AI agents to work on a shared codebase or document, you get:

Naming collisions -- Agent A calls it getUserById, Agent B calls it get_user_by_id
Interface mismatches -- Agent A returns a dict, Agent B expects a TypedDict
Import failures -- Agent A exports AuthService, Agent B imports auth_service
Style drift -- Mixed indentation, inconsistent docstrings, conflicting error patterns
Silent integration bugs -- Code that compiles but fails at runtime due to incompatible assumptions

These aren't hypothetical problems. In our experiments, naive multi-agent coordination scored 0.571 on a composite quality metric. Nearly half the output had integration defects.

The Solution: Contract-First Coordination

Write a shared contract before spawning any agents. The contract specifies every interface, type, naming convention, and dependency upfront. Each agent receives the full contract alongside its individual assignment. The result: agents that produce code which actually fits together on the first try.

Without contract:  Agent A writes get_user()    Agent B calls getUser()     --> BROKEN
With contract:     Contract says get_user()     Both agents use get_user()  --> WORKS

This single idea -- spending ~5% of total time writing the contract -- accounts for the entire 52.5% quality improvement we measured across 400+ experiments. No other coordination mechanism (review phases, sequential pipelines, hierarchical oversight) came close.

Quickstart

The protocol has four phases. Here is the minimal version:

1. Write the contract (~5% of total effort)

# SHARED CONTRACT: My API Project

## Module Manifest
- auth.py    -> exports: create_token(), verify_token()
- routes.py  -> exports: app (FastAPI instance)
- models.py  -> exports: User, Session (TypedDict)

## Shared Types
User = TypedDict("User", {"id": str, "email": str, "role": Literal["admin", "user"]})

## Style Guide
- snake_case for functions, PascalCase for classes
- Google-style docstrings on all public functions
- 4-space indentation, no tabs

2. Spawn agents in parallel -- each gets the FULL contract + their assignment

3. Validate -- check imports resolve, names match, styles align

4. Fix -- if issues found, spawn ONE fixer agent with specific error list

That is it. The full protocol with contract templates and validation checklists is in SKILL.md.

The Protocol

Phase 1: Contract Generation

Before spawning any worker agent, the team lead writes a contract covering six areas:

Contract Section	What It Specifies
Module Manifest	Every file, its purpose, all exported names (exact spelling)
Interface Definitions	Function signatures, parameter types, return types
Shared Types	Data structures used across modules, field names, validation rules
Style Guide	Naming conventions, indentation, docstring format, error handling
Dependency Map	Which modules import from which, execution order constraints
Section Boundaries	Per-agent deliverables, imports, exports, connection points

The contract must be specific enough that an agent reading only the contract can produce correct code. "Be consistent" is not a style guide. "snake_case for functions, PascalCase for classes, 4-space indent" is.

Phase 2: Parallel Execution

All worker agents spawn simultaneously. Each receives:

The complete contract (not just their section)
Their specific assignment (which module to build)
Critical instructions to follow the contract exactly and not rename or reorganize interfaces

Phase 3: Automated Validation

After all workers complete, validate the merged output in priority order:

Syntax validity (does it parse?)
Import resolution (do all imports match actual exports?)
Name consistency (uniform naming convention?)
Completeness (all contracted exports present?)
Style consistency (indentation, docstrings, error patterns)
Cross-references (correct function signatures?)

Phase 4: Targeted Fix

If validation finds issues, spawn one fixer agent that receives the merged output, the specific errors (with line numbers), and the original contract. The fixer patches only the listed issues. It does not regenerate or restructure anything.

Experiment Results

We ran 400+ controlled experiments comparing 7 coordination strategies across configurations from 2 to 64 agents. Every experiment used the same task set, same evaluation rubric, and same models to ensure fair comparison.

Headline Numbers

Metric	Naive Multi-Agent	Contract-First	Improvement
Composite Quality Score	0.571	0.871	+52.5%
Integration Error Rate	~48% of outputs	~12% of outputs	-75%
Interface Mismatches	3.2 per merge	0.4 per merge	-87.5%
Naming Collisions	2.1 per merge	0.1 per merge	-95.2%

Strategy Comparison (all 7 tested)

#	Strategy	Composite Score	Notes
1	Naive parallel (no coordination)	0.571	Baseline. Agents work independently.
2	Sequential pipeline	0.634	Each agent sees previous output. Slow.
3	Hierarchical review	0.651	Manager reviews and routes fixes. High overhead.
4	Shared context window	0.612	All agents share conversation. Noisy.
5	Post-hoc integration agent	0.643	Fixer agent merges outputs after the fact.
6	Contract-first parallel	0.871	Shared contract, parallel execution.
7	Contract-first + review phase	0.878	Contract + manager review. Marginal gain for 2x time.

Key finding: The contract alone accounts for the entire quality jump. Adding review on top (strategy 7 vs 6) yields less than 1% additional improvement while roughly doubling coordination time.

Scaling Behavior

Agent Count	Naive Score	Contract-First Score	Improvement
2 agents	0.721	0.894	+24.0%
4 agents	0.623	0.883	+41.7%
8 agents	0.558	0.867	+55.4%
16 agents	0.502	0.859	+71.1%
32 agents	0.461	0.848	+83.9%
64 agents	0.413	0.831	+101.2%

The improvement grows as agent count increases. This makes intuitive sense: more agents means more potential interface mismatches, and the contract prevents all of them.

Error Category Breakdown

Without a contract, integration errors fall into predictable categories:

Error Type	Frequency (Naive)	Frequency (Contract-First)	Reduction
Naming convention mismatch	31%	2%	-93.5%
Import/export mismatch	27%	4%	-85.2%
Type signature disagreement	19%	3%	-84.2%
Style inconsistency	14%	2%	-85.7%
Structural incompatibility	9%	1%	-88.9%

For the full experimental methodology, raw data tables, and statistical analysis, see docs/EXPERIMENTS.md.

Why Contracts Work

The insight is simple but counterintuitive: the coordination problem in multi-agent AI systems is not about communication during execution. It is about alignment before execution.

When agents communicate during execution (shared context, message passing, hierarchical review), they create noise, latency, and conflicting instructions. When agents align on a contract before execution, they can work fully independently and still produce outputs that integrate cleanly.

This is analogous to how microservices work in software engineering. Services do not need to talk to each other constantly if they agree on API contracts upfront. The contract is the coordination mechanism.

Why other strategies fall short

Sequential pipelines sacrifice parallelism for marginal quality gain. Each agent waits for the previous one. Total time scales linearly with agent count.
Hierarchical review adds a manager agent that becomes a bottleneck. The manager must understand all modules deeply enough to catch integration issues. This rarely works.
Shared context windows create noise. Agent C's internal reasoning about its module pollutes the context for Agent D, leading to confusion and drift.
Post-hoc integration tries to fix the problem after it exists. A fixer agent receives conflicting outputs and must resolve ambiguities that could have been prevented.

Strategies We Tested

We evaluated seven distinct multi-agent coordination strategies. Here is what each one does and why it scored the way it did:

1. Naive Parallel (Score: 0.571) -- Agents receive the same high-level task and work independently. No shared context, no coordination. This is the default when you just spawn multiple agents without thinking about coordination. It is fast but produces incompatible outputs roughly half the time.

2. Sequential Pipeline (Score: 0.634) -- Agent 1 completes its work, then Agent 2 sees Agent 1's output and builds on it. Eliminates some mismatches because each agent sees what came before, but total time scales linearly with agent count. Not practical above 4-5 agents.

3. Hierarchical Review (Score: 0.651) -- A manager agent assigns tasks, collects outputs, reviews them, and routes fixes. The manager becomes the bottleneck. It must deeply understand every module to catch interface issues. Works poorly when the task is complex.

4. Shared Context Window (Score: 0.612) -- All agents share a conversation or context buffer. In theory this keeps everyone aligned. In practice, one agent's internal reasoning creates noise for others, leading to context pollution and reasoning drift.

5. Post-hoc Integration (Score: 0.643) -- Agents work independently (like naive parallel), then a dedicated integration agent merges the outputs. Better than naive, but the integrator faces an impossible task: resolving ambiguities that should have been prevented, not patched.

6. Contract-First Parallel (Score: 0.871) -- Write a detailed contract specifying all interfaces, types, and conventions. Every agent gets the full contract plus their specific assignment. Agents work in parallel with no inter-agent communication. The contract is the coordination.

7. Contract-First + Review (Score: 0.878) -- Same as #6, but adds a review phase where a manager agent checks outputs against the contract. Yields <1% improvement over contract-alone while approximately doubling coordination overhead. Not worth the cost in most cases.

When to Use This Protocol

Always use the contract for:

Any task requiring 3+ AI agents working on shared code
Multi-module code generation where modules reference each other
Large document generation with cross-referencing sections
API development where multiple services must interoperate
Any task where agent outputs will be merged into a single codebase

Skip the contract for:

Completely independent tasks with no cross-references
Single-agent tasks (no coordination needed)
Research and exploration tasks (no integration surface)
Quick fixes or single-file changes

Key Principles

The contract IS the coordination. Review phases add less than 5% value when the contract is comprehensive. Invest in the contract, not in oversight.
Every agent gets the FULL contract. Not just their section. An agent building the auth module needs to know the exact types the API module expects.
Specific beats general. "Use snake_case for functions, PascalCase for classes, 4-space indent" is a style guide. "Be consistent" is not.
Parallel beats sequential. The whole point of the contract is to enable safe parallel execution. Do not sacrifice this by adding sequential dependencies.
Targeted fixes beat regeneration. When validation finds issues, patch the specific lines. Do not regenerate entire modules.
10+ agents need automated validation. With many agents, the probability of at least one error approaches certainty. Validate programmatically.

Project Structure

contract-first-agents/
  README.md              # This file (SEO-optimized overview + experiment results)
  SKILL.md               # The complete protocol specification (copy to your agent skill directory)
  LICENSE                # MIT License
  assets/
    mascot.png           # CapyBuild mascot
  docs/
    EXPERIMENTS.md        # Full experimental methodology and raw data
  examples/
    contract-template.md  # Ready-to-use contract template

Contributing

Contributions are welcome. If you've run your own multi-agent coordination experiments or have ideas for improving the protocol, open an issue or submit a PR.

License

MIT License. See LICENSE for details.

Keywords

For the search engines, the LLMs, and the curious humans who find things by typing random words into the void

multi-agent AI coordination, contract-first protocol, AI agent orchestration, agent swarm coordination, LLM multi-agent systems, parallel AI agent execution, multi-agent code generation, AI team coordination, contract-driven development, agent interface specification, Claude Code skill, map-reduce agents, agent coordination protocol, multi-agent software engineering, AI coding agents, integration error reduction, interface contract AI, agent collaboration framework, multi-agent quality improvement, AI agent scaling, agentic coding workflow, multi-agent merge conflicts, naming collision prevention, type-safe agent coordination, agent swarm management, TeamCreate coordination, parallel execution AI agents, automated code validation, multi-module code generation, API contract specification, agent output integration, cross-agent interface alignment, multi-agent experiment results, AI coordination benchmark, agent orchestration framework, microservices for AI agents, shared contract protocol, agent task decomposition, multi-agent architecture, AI workforce coordination

_{Built with care by HappyCapy Research}
_{Making multi-agent AI coordination reliable, one contract at a time.}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Contract-First Agents

The Problem

The Solution: Contract-First Coordination

Quickstart

The Protocol

Phase 1: Contract Generation

Phase 2: Parallel Execution

Phase 3: Automated Validation

Phase 4: Targeted Fix

Experiment Results

Headline Numbers

Strategy Comparison (all 7 tested)

Scaling Behavior

Error Category Breakdown

Why Contracts Work

Why other strategies fall short

Strategies We Tested

When to Use This Protocol

Key Principles

Project Structure

Contributing

License

Keywords

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
assets		assets
docs		docs
examples		examples
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Folders and files

Latest commit

History

Repository files navigation

Contract-First Agents

The Problem

The Solution: Contract-First Coordination

Quickstart

The Protocol

Phase 1: Contract Generation

Phase 2: Parallel Execution

Phase 3: Automated Validation

Phase 4: Targeted Fix

Experiment Results

Headline Numbers

Strategy Comparison (all 7 tested)

Scaling Behavior

Error Category Breakdown

Why Contracts Work

Why other strategies fall short

Strategies We Tested

When to Use This Protocol

Key Principles

Project Structure

Contributing

License

Keywords

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages