This walks through a real feature e2e: JWT authentication with a login endpoint. The goal is to make the mental model concrete.
The context window is the fundamental constraint. A single agent can't hold an entire project in memory. This harness solves that by:
- Decomposing large work into small, self-contained tasks (beads)
- Persisting state outside context windows — in beads and git, not in any agent's memory
- Spawning focused subagents that each get a clean context window with just enough information to do one thing
Every agent in this system only knows what it was explicitly given at spawn time. The harness — commands, beads, worktrees — is what stitches them together.
CLAUDE.md — loaded at the start of every agent session. The permanent briefing.
# CLAUDE.md
## Project
Node.js REST API. TypeScript, Express, Prisma, PostgreSQL.
## Quality gates (must pass before any PR)
- `npm test`
- `npm run lint`
...etc
## Conventions
- All endpoints in `src/routes/`
...etc
## Never do this
(empty for now — grows as mistakes happen)
This is a feedback loop. When an agent makes a mistake, you add it here so every future agent reads it before starting work.AGENTS.md — loaded automatically via a SessionStart hook configured in settings.json. Every agent session starts knowing what other agents exist and how they interact, without the coordinator needing to pass this manually.
Model guidance: The coordinator runs on Opus 4.6 (orchestration decisions require stronger reasoning). Implementer subagents run on Sonnet 4.6 (implementation work is well-scoped enough for a faster model).
Use this when you're not sure exactly what needs to be built, or the feature touches multiple files.
What runs: .claude/commands/plan.md
This file is a router. Its only job: receive your description, and tell Claude to load .claude/skills/planner/SKILL.md. The command itself has no logic.
What runs next: planner/SKILL.md loads into the agent's context. Four phases, strictly ordered.
Planner Phase 1 — Explore
The planner reads your actual codebase before proposing anything. It's looking for: how are other routes structured? What does the existing user model look like? What shared types exist? What would this touch?
This is ground truth collection. The planner is forbidden from writing code or filing issues here.
Planner Phase 2 — Discuss
The planner surfaces its findings and asks you questions before deciding anything. Things like: refresh tokens or access token only? Cookie or Authorization header? Should login return a user object or just a token?
You answer. It iterates. When you've reached consensus, it writes the agreed plan and pauses for your approval. Nothing gets filed until you say yes.
Planner Phase 3 — File issues
You approve. The planner creates beads issues:
epic: bd-a3f8 "JWT Authentication"
└── bd-a3f8.1 "Create users table and Prisma schema"
└── bd-a3f8.2 "POST /login endpoint" ← depends on bd-a3f8.1
└── bd-a3f8.3 "JWT middleware for protected routes" ← depends on bd-a3f8.2
Each task is self-contained — the implementer that runs it spawns in a fresh context window with no memory of this conversation. So every task includes: what to build, why, which files to touch, and what done looks like.
Beads stores these as JSONL in .beads/ — versioned with git, surviving across sessions.
Planner Phase 4 — Plan review
The planner spawns a subagent:
Task(
prompt: "ROLE: Plan Reviewer
SKILL: read .claude/skills/reviewer-plan/SKILL.md
EPIC: bd-a3f8"
)
Fresh context window. reviewer-plan/SKILL.md loads. This agent reads the filed beads issues and the codebase and checks: do tasks follow existing patterns? Are dependencies in the right order? Are there gaps? Is each task self-contained enough for a future implementer?
Returns APPROVED or CHANGES NEEDED with specific task-level feedback. Planner iterates if needed. Once approved, planning is done.
$ bd ready
bd-a3f8.1 Create users table and Prisma schema
Only bd-a3f8.1 shows because the other two are blocked by dependencies. Beads surfaces only what can actually run right now — you never have to track the dependency graph yourself.
You pass the epic ID. The coordinator handles sequencing of all subtasks.
What runs: .claude/commands/work.md
Router. Runs bd show bd-a3f8 --json and bd list --parent bd-a3f8 --json to fetch the epic and all subtasks from beads. Then: load coordinator/SKILL.md.
What runs next: coordinator/SKILL.md (running on Opus 4.6)
The coordinator reads all three tasks and their dependencies. Before anything else, it:
- Installs dependencies — runs
npm install(or equivalent) in the main repo. This must happen before spawning any subagents so they have a working environment. - Analyzes file overlap — checks which files each task will touch. Tasks that share files must run sequentially. Tasks with no overlap can run in parallel. Since this chain has strict dependencies, all three run sequentially.
- Creates one worktree and one branch for the entire work unit:
git worktree add ../myapp-jwt-authentication -b feature/jwt-authenticationOne branch. All three subtasks will accumulate commits here. One PR at the end. No merges between subtasks.
Coordinator — Task bd-a3f8.1 (users table)
Spawn implementer subagent (Sonnet 4.6):
Task(
subagent_type: "general-purpose",
model: "sonnet",
prompt: "ROLE: Implementer
SKILL: read .claude/skills/implementer/SKILL.md
WORKTREE: ../myapp-jwt-authentication
TASK: bd-a3f8.1"
)
New agent. Clean context window. It wakes up knowing only these four things. implementer/SKILL.md loads. Five phases, strictly ordered:
- Phase 1 — Write failing tests. Before touching the schema, write tests that describe the desired behavior. They fail because nothing exists yet. No exceptions, ever.
- Phase 2 — Implement. Create the Prisma schema and migration. Make the tests pass.
- Phase 3 — Verify. Run all quality gate commands from
CLAUDE.md. Zero errors required. - Phase 4 — Coverage review. Run
git diff --name-onlyto see what changed. Map changes to tests. Find gaps (error cases, edge cases). Write missing tests. Rerun quality gates. - Phase 5 — Summary. Output structured result: what changed, what was tested, any concerns. Return to coordinator.
Spawn 3 reviewers in parallel (skipped for single-file or config-only changes):
Task → SKILL: reviewer-correctness/SKILL.md ─┐
Task → SKILL: reviewer-tests/SKILL.md ├─ simultaneous
Task → SKILL: reviewer-architecture/SKILL.md ─┘
Three fresh context windows. Each reads its SKILL.md and the git diff since the last commit:
- Correctness: is the logic right? Any edge cases missed?
- Tests: are the tests actually testing behavior, or just line coverage?
- Architecture: does this fit existing patterns? Structural concerns?
Coordinator processes review findings:
- Trivial issues (typos, minor naming) → coordinator fixes directly, commits to the same branch
- Non-trivial issues (bugs, missing tests, duplication) → coordinator files a beads issue, spawns an implementer to fix it, closes the issue when done
No PR yet. bd-a3f8.1 work is committed to feature/jwt-authentication. The coordinator moves on to the next task in the same worktree.
Coordinator — Tasks bd-a3f8.2 and bd-a3f8.3
Same cycle for each remaining task: claim → spawn implementer → run reviewers → process findings. The coordinator already knows the full task list and dependency order from the initial bd list --parent call — it doesn't poll bd ready between tasks.
Since all three tasks share the same worktree (../myapp-jwt-authentication), each implementer picks up where the previous one left off. Commits accumulate on feature/jwt-authentication.
After all three tasks are complete, the coordinator runs quality gates one final time, pushes the branch, and creates a single PR covering the entire epic.
Separate command, separate agent. merge-queue/SKILL.md loads. Eight steps:
- Scan — check CI status on
main(failing main blocks everything), list all open PRs - Categorize — each PR is: ready to merge, awaiting review, CI pending, CI failing, or needs rebase
- Decide merge order — when multiple PRs are ready, prioritize by impact and conflict risk
- Choose strategy per PR — inspect commit quality: squash (messy/WIP commits), merge commit (clean distinct commits worth preserving), or rebase cleanup (mixed quality)
- Merge — execute the chosen strategy, close beads issues referenced in the PR body, remove worktree, delete feature branch, pull main
- Rebase — PRs behind main get rebased; trivial conflicts resolved inline, non-trivial filed as issues
- Handle CI failures — fetch failure logs, file a beads issue with details, never rerun (test failures are real)
- Report — summary of all PR states with beads issue IDs for any work that needs
/work
The coordinator does not own CI or merging. The merge agent does. This is an explicit boundary.
Use this when you already know exactly what needs to be built and don't need the planning ceremony.
/work "add rate limiting to the login endpoint"
The coordinator sees this isn't a beads ID. It runs:
bd create "add rate limiting to the login endpoint" -t feature --jsonCreates a bead on the fly, then runs the full cycle — installs dependencies, creates worktree, spawns implementer, runs reviewers (unless it's a single-file change), creates PR. Same output as the full path, just without the planning ceremony.
The two paths compared:
Full ceremony (big/uncertain):
/plan → discuss → beads filed → reviewer-plan → /work <epic-id>
Lightweight (small/clear):
/work "description" → coordinator creates bead inline → implements
/plan "JWT authentication"
→ plan.md (router)
→ planner/SKILL.md (Opus 4.6)
Phase 1: explore codebase
Phase 2: discuss with you → AskUserQuestion
Phase 3: file beads issues (bd-a3f8.1, .2, .3 with dependencies)
Phase 4: Task → reviewer-plan/SKILL.md → APPROVED
/work bd-a3f8 (epic ID — coordinator handles all subtasks)
→ work.md (router, fetches epic + subtasks from beads)
→ coordinator/SKILL.md (Opus 4.6)
install dependencies
create ONE worktree + branch: feature/jwt-authentication
analyze file overlap → run tasks sequentially (chain dependency)
For each task (in dependency order, same worktree):
Task → implementer/SKILL.md (Sonnet 4.6)
write failing tests → implement → verify → coverage review → summary
Task → reviewer-correctness/SKILL.md ─┐
Task → reviewer-tests/SKILL.md ├─ parallel
Task → reviewer-architecture/SKILL.md ─┘
fix trivial findings, file issues for non-trivial
after all tasks: push branch, open ONE PR
/merge
→ merge-queue/SKILL.md
scan PRs → categorize → choose strategy → merge/rebase → report
Every box is either a file loaded into a context window, or a fresh subagent spawned via the Task tool. Nothing is implicit. Every agent's behavior is determined entirely by which SKILL.md it was told to read at the moment it was spawned.