bmad-code-org · pbean · Jun 21, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -12,7 +12,7 @@
       "name": "bauto",
       "source": "./src/automator/data/skills",
       "description": "Automation-mode skills driven by the bmad-auto orchestrator: unattended dev (bmad-auto-dev), adversarial review (bmad-auto-review), and deferred-work sweep triage (bmad-auto-sweep)",
-      "version": "0.6.3",
+      "version": "0.6.4",
       "author": {
         "name": "pinkyd"
       },

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,30 @@ All notable changes to `bmad-auto` are documented here. The format is based on
 [Semantic Versioning](https://semver.org/spec/v2.0.0.html). While the project is pre-1.0,
 breaking changes may land in a minor release.
 
+## [0.6.4] — 2026-06-21
+
+### Fixed
+
+- **Copilot token usage now records (was always 0).** Copilot writes its token totals only in
+  the trailing `session.shutdown` events line, ~1s after `agentStop`, so usage was sampled before
+  it landed. `read_usage` now polls the transcript for a short grace period, set by a new
+  per-profile `usage_grace_s` (8s for `copilot`; 0 elsewhere, meaning read once).
+- **Copilot multi-turn reviews no longer stall.** `agentStop` fires per response turn, so a
+  parallel-subagent review ends several turns and used to trip the global `stop_without_result_nudges`
+  default of 1. A new per-adapter floor (5 for `copilot`) is overridable per stage via `[adapter.review]`.
+
+### Added
+
+- **`[adapter] usage_grace_s` / `stop_without_result_nudges`** (base + per-stage
+  `[adapter.dev|review|triage]`), editable in the settings TUI. Unset = inherit the CLI profile's
+  shipped default.
+
+### Changed
+
+- **Copilot docs.** Recommend pinning a capable model, since the free GPT-5 mini default silently
+  skips steps in multi-step dev/review, and clarify that the supported target is the Copilot
+  **CLI** binary, not the VS Code extension.
+
 ## [0.6.3] — 2026-06-21
 
 ### Fixed
@@ -467,6 +491,7 @@ enforced in CI.
   implementation phase, driven by a Python control loop with hook-based session transport and
   resumable on-disk run state.
 
+[0.6.4]: https://github.com/bmad-code-org/bmad-auto/releases/tag/v0.6.4
 [0.6.3]: https://github.com/bmad-code-org/bmad-auto/releases/tag/v0.6.3
 [0.6.2]: https://github.com/bmad-code-org/bmad-auto/releases/tag/v0.6.2
 [0.6.1]: https://github.com/bmad-code-org/bmad-auto/releases/tag/v0.6.1
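
Note on the `usage_grace_s` fix described in the 0.6.4 entry: the sketch below shows one way to poll a transcript until a trailing totals line appears or the grace period runs out. It assumes a JSONL events file. The function names, the `type` and `total_tokens` fields, and the poll interval are hypothetical and are not taken from the bmad-auto source; only `usage_grace_s` and the `session.shutdown` event name come from the changelog.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def parse_shutdown_totals(text: str) -> Optional[int]:
    """Hypothetical parser: return totals from the last session.shutdown line, if any."""
    for line in reversed(text.splitlines()):
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or half-written lines
        if isinstance(event, dict) and event.get("type") == "session.shutdown":
            # "total_tokens" is an assumed field name, not Copilot's real schema.
            return int(event.get("total_tokens", 0))
    return None


def read_usage_with_grace(
    transcript: Path,
    parse_totals: Callable[[str], Optional[int]],
    usage_grace_s: float = 0.0,
    poll_interval_s: float = 0.25,
) -> int:
    """Re-read the transcript until totals appear or the grace period expires.

    With usage_grace_s == 0 the transcript is read exactly once.
    """
    deadline = time.monotonic() + usage_grace_s
    while True:
        text = transcript.read_text(encoding="utf-8") if transcript.exists() else ""
        totals = parse_totals(text)
        if totals is not None:
            return totals
        if time.monotonic() >= deadline:
            return 0  # nothing landed in time; report zero usage
        time.sleep(poll_interval_s)
```

Under these assumptions, a `copilot` profile would call `read_usage_with_grace(path, parse_shutdown_totals, usage_grace_s=8)`, while profiles whose totals are already present at stop time keep the default of 0 and pay no extra latency.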

diff --git a/README.md b/README.md
@@ -439,12 +439,14 @@ Each run drives its agents inside a dedicated tmux session, `bmad-auto-<run-id>`
 
 One generic driver (`adapters/generic_tmux.py`) runs any coding CLI that fits the tmux-injection + hook-signal transport; everything CLI-specific lives in a declarative **profile** (`adapters/profile.py`). Built-in profiles ship as TOML in `automator/data/profiles/`:
 
-| Profile   | Status                    | Notes                                                                                                                                                                                                                                                                |
-| --------- | ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `claude`  | supported                 | reference implementation                                                                                                                                                                                                                                             |
-| `codex`   | supported, E2E-verified   | Codex ≥ 0.139. No slash expansion in the initial prompt — the profile renders `$skill-name` mentions (plus a "use subagents as needed" nudge) instead. No SessionEnd hook; window-death fallback covers crashes.                                                     |
-| `gemini`  | supported, E2E-verified   | Gemini CLI ≥ 0.46 (hooks on by default since then). Launches with `-i` to stay interactive; `AfterAgent` maps to canonical Stop. Usage parser validated against real chat logs.                                                                                      |
-| `copilot` | bundled, pending live E2E | GitHub Copilot CLI ≥ 2026-02. Launches with `-i` to stay interactive; VS Code-compatible PascalCase `Stop` hook (snake_case payloads); `--allow-all-tools` for unattended runs. No `usage_parser` yet — run `probe-adapter` to capture its token schema (see below). |
+| Profile   | Status                  | Notes                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| --------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `claude`  | supported               | reference implementation                                                                                                                                                                                                                                                                                                                                                                                                            |
+| `codex`   | supported, E2E-verified | Codex ≥ 0.139. No slash expansion in the initial prompt — the profile renders `$skill-name` mentions (plus a "use subagents as needed" nudge) instead. No SessionEnd hook; window-death fallback covers crashes.                                                                                                                                                                                                                    |
+| `gemini`  | supported, E2E-verified | Gemini CLI ≥ 0.46 (hooks on by default since then). Launches with `-i` to stay interactive; `AfterAgent` maps to canonical Stop. Usage parser validated against real chat logs.                                                                                                                                                                                                                                                     |
+| `copilot` | supported, E2E-verified | GitHub Copilot **CLI** (the `copilot` binary, GA ≥ 2026-02) — _not_ the VS Code extension. Launches with `-i` to stay interactive; turn-end is `agentStop` (per response turn); `--allow-all-tools` for unattended runs. `copilot-events` usage parser reads token totals from the trailing `session.shutdown` line, so the profile waits a short grace (`usage_grace_s = 8`) before tallying. **Pin a capable model** (see below). |
+
+**Copilot: pin a capable model.** Copilot's free default (GPT-5 mini) is unreliable for the multi-step dev/review skills: it silently skips steps mid-workflow and fails the story. Set a capable model in policy, e.g. `[adapter] model = "claude-sonnet-4-6"` (passed through as `--model`), for end-to-end reliability. Because Copilot fires `agentStop` per response turn, a thorough multi-turn review needs more than one nudge to finish; the profile ships `stop_without_result_nudges = 5`, and you can tune it per stage (e.g. `[adapter.review] stop_without_result_nudges = …`). Both `usage_grace_s` and `stop_without_result_nudges` are editable in the settings TUI under `[adapter]`.
 
 **On budgets:** agentic sessions are dominated by cache reads (80–90%+ of raw tokens), which every supported vendor bills at ~0.1x base input. The `max_tokens_per_story` check therefore uses a cost-weighted total — cache reads count at `limits.cache_read_weight` (default 0.1) — while displayed totals stay raw. Set the weight to 1.0 to budget raw tokens.
 

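Note on the "On budgets" paragraph in the README hunk above: the cost-weighted check can be illustrated with a small sketch. The function names are hypothetical, and two details are assumptions rather than documented behavior: that every non-cache-read token counts at full weight, and that the budget trips on a strict "greater than" comparison. Only `cache_read_weight` (default 0.1) and `max_tokens_per_story` come from the README text.

```python
def cost_weighted_total(
    fresh_tokens: int,
    cache_read_tokens: int,
    cache_read_weight: float = 0.1,
) -> float:
    """Budget figure: cache reads are discounted, all other tokens count in full."""
    return fresh_tokens + cache_read_weight * cache_read_tokens


def exceeds_budget(
    fresh_tokens: int,
    cache_read_tokens: int,
    max_tokens_per_story: int,
    cache_read_weight: float = 0.1,
) -> bool:
    """True when the weighted total is over the per-story budget (assumed strict >)."""
    weighted = cost_weighted_total(fresh_tokens, cache_read_tokens, cache_read_weight)
    return weighted > max_tokens_per_story


# A cache-heavy session: 50k fresh tokens plus 900k cache reads (950k raw).
session = {"fresh_tokens": 50_000, "cache_read_tokens": 900_000}
within = not exceeds_budget(**session, max_tokens_per_story=200_000)
raw_over = exceeds_budget(**session, max_tokens_per_story=200_000, cache_read_weight=1.0)
```

In this example the session stays within a 200,000-token budget at the default weight but exceeds it when the weight is set to 1.0, which matches the README's description of raw-token budgeting.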
diff --git a/docs/FEATURES.md b/docs/FEATURES.md
@@ -112,8 +112,7 @@ See [README.md](../README.md) for the narrative overview and [setup-guide.md](se
 ### Multi-CLI / multi-agent support
 
 - Generic tmux adapter drives any CLI fitting the tmux-injection + hook-signal transport; CLI specifics live in declarative TOML profiles.
-- Supported, E2E-verified: `claude` (reference), `codex` (≥ 0.139), `gemini` (≥ 0.46).
-- Bundled but pending live E2E verification: `copilot` (GitHub Copilot CLI ≥ 2026-02; VS Code-compatible `Stop` hook, `-i` interactive launch, `--allow-all-tools`).
+- Supported, E2E-verified: `claude` (reference), `codex` (≥ 0.139), `gemini` (≥ 0.46), `copilot` (GitHub Copilot CLI ≥ 2026-02 — the `copilot` binary, not the VS Code extension; `agentStop` turn-end, `-i` interactive launch, `--allow-all-tools`; pin a capable model — the free GPT-5 mini default is unreliable for multi-step skills).
 - Per-stage CLI/model overrides: run dev on one CLI/model, review on another (`[adapter.dev]`, `[adapter.review]`, `[adapter.triage]`).
 - Add a CLI without touching Python: drop a TOML profile in `.automator/profiles/<name>.toml` (binary, prompt template, bypass flags, hook dialect, native→canonical event map).
 - `bmad-auto probe-adapter` collects + sanitizes the data needed to finalize/add a profile (hook payload shape, transcript location/format, token schema): a zero-launch scan by default, opt-in `--probe` for live capture. See the [adapter authoring guide](adapter-authoring-guide.md).
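
Note on the per-stage overrides mentioned in this hunk and in the 0.6.4 changelog entry: a minimal sketch of how a knob such as `stop_without_result_nudges` might be resolved. The lookup order (per-stage table, then base `[adapter]`, then the profile's shipped default) is an assumption inferred from "Unset = inherit the CLI profile's shipped default"; `resolve_knob` and the dict layout are hypothetical, chosen to mirror how nested TOML tables parse into nested dicts.

```python
from typing import Any, Mapping

# Shipped defaults for the copilot profile, as stated in the changelog.
COPILOT_DEFAULTS = {"usage_grace_s": 8, "stop_without_result_nudges": 5}


def resolve_knob(
    name: str,
    stage: str,
    policy: Mapping[str, Any],
    profile_defaults: Mapping[str, Any],
) -> Any:
    """Return the most specific setting: stage override, base value, then profile default."""
    adapter = policy.get("adapter", {})
    stage_value = adapter.get(stage, {}).get(name)
    if stage_value is not None:
        return stage_value
    base_value = adapter.get(name)
    if base_value is not None:
        return base_value
    return profile_defaults[name]


# Equivalent to a policy with [adapter] model = "claude-sonnet-4-6"
# and [adapter.review] stop_without_result_nudges = 7 (the 7 is illustrative).
policy = {
    "adapter": {
        "model": "claude-sonnet-4-6",
        "review": {"stop_without_result_nudges": 7},
    }
}

review_nudges = resolve_knob("stop_without_result_nudges", "review", policy, COPILOT_DEFAULTS)
dev_nudges = resolve_knob("stop_without_result_nudges", "dev", policy, COPILOT_DEFAULTS)
```

Here the review stage picks up its own override while the dev stage falls through to the profile default, so a stricter or looser review setting does not affect the other stages.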

diff --git a/docs/images/dashboard.png b/docs/images/dashboard.png