A framework for humans who orchestrate multiple AI coding agents.
Structured handoff · Trust calibration · Continuous improvement
| Category | Tools |
|---|---|
| ⌨️ CLI | Claude Code (Anthropic) · Codex CLI (OpenAI) · Gemini CLI · Kimi Code (Moonshot AI) |
| 🖥️ IDE | Cursor (AI IDE) · Windsurf (Codeium) · Antigravity (Google DeepMind) · GitHub Copilot Chat |
| 🔓 Open Source | Aider · OpenCode · Continue |
| ➕ Other | Any tool that reads/writes files |
Conductor is file-protocol based — any AI coding tool that can read and write files is compatible.
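Because the protocol is just files, even a plain script can participate. A minimal sketch of the idea, where the helper names and entry fields are illustrative (not part of Conductor's API):

```python
from datetime import date
from pathlib import Path

HANDOFF = Path("HANDOFF.md")

def read_handoff() -> str:
    """Session start: load whatever context the previous agent left."""
    return HANDOFF.read_text(encoding="utf-8") if HANDOFF.exists() else ""

def append_handoff(done: str, decisions: str, pitfall: str, next_step: str) -> None:
    """Session end: append a structured entry any other tool can read."""
    entry = (
        f"\n## {date.today().isoformat()}\n"
        f"- done: {done}\n"
        f"- decisions: {decisions}\n"
        f"- pitfall: {pitfall}\n"
        f"- next: {next_step}\n"
    )
    with HANDOFF.open("a", encoding="utf-8") as f:
        f.write(entry)
```

Any agent that can run `read_handoff()` at session start and `append_handoff(...)` at session end is speaking the protocol.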
You're running Claude Code in one terminal, Cursor in another, maybe Codex for review — all on different projects. By end of day:
- 🤯 You can't remember what each agent decided
- 🔁 New sessions repeat mistakes from yesterday
- 📂 HANDOFF notes, devlogs, and error logs are scattered everywhere
- 🤔 You don't know which agent to trust for which task
Conductor brings order to this chaos.
We identified 12 dimensions that matter when a human works with multiple AI agents:
| # | Dimension | What It Covers |
|---|---|---|
| 1 | Handoff Management | Passing context between sessions without loss |
| 2 | Knowledge Capture | Recording decisions, errors, and QA pairs |
| 3 | Trust Calibration | Knowing when to verify vs. trust the agent |
| 4 | Cognitive Load | Managing your mental bandwidth across agents |
| 5 | Prompt Quality | Improving how you communicate with agents |
| 6 | Agent Profiling | Tracking each agent's strengths and weaknesses |
| 7 | Tool Selection | Picking the right agent for the right task |
| 8 | Feedback Loops | Turning errors into prevention rules |
| 9 | Attention Allocation | Deciding which project needs you right now |
| 10 | Disagreement Resolution | Handling conflicting agent advice |
| 11 | Cross-Agent Consistency | Keeping agents aligned on decisions |
| 12 | Energy Modeling | Adjusting oversight based on your fatigue |
Read more: docs/
| Template | Purpose |
|---|---|
| HANDOFF.md | Session-end context handoff |
| CLAUDE.md | Agent rules with handoff protocol |
| ERROR_BOOK.md | AI mistake tracker for trust calibration |
| TRUST_PROFILE.md | Agent reliability scorecard |
| DESIGN.md | UI design system for consistent agent output |
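For instance, an ERROR_BOOK.md entry might record a mistake in the same terse style as a handoff. The field layout below is illustrative (the exact schema is up to you), reusing the bcrypt pitfall from this README:

```markdown
## 2026-04-09 · claude-code · backend
- mistake: assumed bcrypt 5.x kept the 4.x default rounds
- impact: existing password hashes failed verification
- prevention: pin bcrypt and add a round-trip hash test
```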
```
$ conductor status

🎵 Conductor · Project Status

┌─────────────┬────────────┬────────┬──────────────────────┐
│ Project     │ Last Active│ Status │ Next Step            │
├─────────────┼────────────┼────────┼──────────────────────┤
│ wenyuan     │ 6h ago     │ ✅     │ Refactor README      │
│ network-opt │ 2h ago     │ ✅     │ VPS setup            │
│ conductor   │ 30m ago    │ ✅     │ Write tests          │
│ old-project │ 3d ago     │ 🔴     │ Archive or continue? │
└─────────────┴────────────┴────────┴──────────────────────┘

📅 2026-04-09 │ 4 projects │ 5 decisions │ 12 files Δ
```

```
$ conductor digest ./my-project

🎵 Conductor · Project Digest

📁 my-project │ 📅 2026-04-01 → 2026-04-09 │ 🔄 5 sessions

📋 Decisions Made
┌────────────┬──────────────────────────────┐
│ 2026-04-09 │ Chose JWT over sessions      │
│ 2026-04-08 │ Python 3.9+ compatibility    │
└────────────┴──────────────────────────────┘

⚠️ Errors & Pitfalls
┌────────────┬──────────────────────────────┐
│ 2026-04-09 │ bcrypt 5.x broke hashes      │
└────────────┴──────────────────────────────┘
```

```
$ conductor memory add "Uses FastAPI + PostgreSQL" -t fact -g backend
✅ Memory #1 added (fact)

$ conductor memory search "FastAPI"
🔍 matches: #1 [fact] Uses FastAPI + PostgreSQL
```

- Copy `templates/HANDOFF.md.template` to your project as `HANDOFF.md`
- Copy the handoff protocol from `templates/CLAUDE.md.template` into your project's `CLAUDE.md`
- Tell your AI: "Read HANDOFF.md before starting. Update it before ending."
```
pip install conductor-ai
conductor init ./my-project
conductor status
```

Every session ends with a structured handoff:
```markdown
## 2026-04-09
- done: Implemented user auth module
- decisions: Chose JWT over sessions (stateless, scales better)
- pitfall: bcrypt 5.x changed default rounds — broke existing hashes
- next: Add password reset flow
```

500 token max. If you can't summarize it, you don't understand it.
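The 500-token cap can be checked mechanically. A rough sketch: the budget comes from the rule above, while the 4-characters-per-token ratio is only a common approximation for English prose, not an exact count:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def check_handoff(entry: str, budget: int = 500) -> bool:
    """Return True if the handoff entry fits the token budget."""
    used = estimate_tokens(entry)
    if used > budget:
        print(f"Handoff too long: ~{used} tokens (budget {budget}). Summarize harder.")
        return False
    return True
```

A real implementation would use the target model's own tokenizer, but a cheap heuristic is enough to catch handoffs that have drifted into essays.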
Don't blindly trust or distrust your AI. Calibrate per domain:
| Layer | Method |
|---|---|
| L1 | Verify outcomes — does the code run? |
| L2 | Cross-verify — have another agent review |
| L3 | Progressive trust — try on one file first |
| L4 | Demand explanation — ask WHY, not just WHAT |
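Per-domain calibration can start as a running success rate fed by your error book. A hypothetical sketch (the class, thresholds, and layer mapping are illustrative, not a Conductor API):

```python
from collections import defaultdict

class TrustProfile:
    """Track verify-vs-trust per (agent, domain) from observed outcomes."""

    def __init__(self):
        # (agent, domain) -> [successes, total attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, agent: str, domain: str, ok: bool) -> None:
        s = self.stats[(agent, domain)]
        s[0] += int(ok)
        s[1] += 1

    def trust(self, agent: str, domain: str) -> float:
        ok, total = self.stats[(agent, domain)]
        return ok / total if total else 0.0  # unknown pairs start untrusted

    def verification_layer(self, agent: str, domain: str) -> str:
        """Map trust to the L1-L4 methods above (cutoffs are arbitrary)."""
        t = self.trust(agent, domain)
        if t >= 0.9:
            return "L1: verify outcomes only"
        if t >= 0.7:
            return "L3: progressive trust"
        return "L2+L4: cross-verify and demand explanation"
```

The point is not the exact numbers but that the decision "how hard do I check this?" is driven by data rather than vibes.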
Not every task needs a full planning cycle:
| Size | Time | Process |
|---|---|---|
| S | < 30min | Just do it → test → commit → HANDOFF |
| M | 1-3h | Brief plan → execute → HANDOFF |
| L | > 3h | Brainstorm → plan → TDD → review → HANDOFF |
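The sizing rule above can be encoded directly, e.g. so a wrapper script can print the right checklist. The helper is hypothetical; the hour cutoffs come from the table:

```python
def process_for(estimated_hours: float) -> list[str]:
    """Return the process steps for a task of the given estimated size."""
    if estimated_hours < 0.5:   # S: under 30 minutes
        return ["do it", "test", "commit", "HANDOFF"]
    if estimated_hours <= 3:    # M: 1-3 hours
        return ["brief plan", "execute", "HANDOFF"]
    # L: over 3 hours
    return ["brainstorm", "plan", "TDD", "review", "HANDOFF"]
```

Every path ends in HANDOFF: the handoff is the one step that never gets skipped, regardless of task size.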
| vs. | Difference |
|---|---|
| CrewAI / LangGraph | They orchestrate agent-to-agent. We orchestrate human-to-agents. |
| OpenSpec | OpenSpec manages specs within one session. We manage across sessions and agents. |
| CLAUDE.md alone | CLAUDE.md is one file. We're a complete methodology + tools. |
| Nothing | You're losing decisions, repeating mistakes, and burning context window tokens. |
- v0.1 — Methodology docs + templates + `conductor status`
- v0.2 — `conductor digest` — extract decisions/errors from project history
- v0.3 — `conductor retro` — interactive post-session agent review
- v0.4 — `conductor memory` — persistent cross-session knowledge store
"If you just drive the AI to work and walk away, you'll never know what you don't know. The original sin of AI-assisted development is not reviewing, not reflecting, not improving."
Conductor is built on three principles:
- Structure over ceremony — Lightweight protocols that actually get followed, not heavy processes that get skipped.
- Observe, then trust — Build trust through data (error books, trust profiles), not assumptions.
- The human improves too — It's not just about making AI better. It's about making you better at working with AI.
Contributions welcome! Please read the methodology docs first to understand the philosophy.