Skip to content

Release 0.7.0#11

Merged
pbean merged 24 commits into
mainfrom
release/0.7.0
Jun 25, 2026
Merged

Release 0.7.0#11
pbean merged 24 commits into
mainfrom
release/0.7.0

Conversation

@pbean

@pbean pbean commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Release 0.7.0

Minor release (breaking, pre-1.0): retires two bundled skills and reshapes how the orchestrator drives development.

Highlights

  • Retired the bmad-auto-dev fork — the orchestrator now drives the upstream bmad-dev-auto skill unmodified as its sole dev primitive. No fork to keep in sync.
  • Review is now a re-invocation of bmad-dev-auto on the done spec (a fresh step-04 review pass), not a separate skill.
  • Retired the bmad-auto-review skill — its adversarial review is covered by bmad-dev-auto's inline step-04; the two hunters are now always-required base skills. bmad-auto init installs three bundled skills (bmad-auto-resolve, bmad-auto-sweep, bmad-auto-setup); bmad-dev-auto is a hard upstream prerequisite.
  • Skill-recommended review (review.trigger recommended/always) gated on followup_review_recommended.
  • Pluggable terminal-multiplexer seam (groundwork for native Windows) + POSIX portability hardening & CI guard — Linux/macOS/WSL behavior byte-identical; no native-Windows backend ships yet.
  • New adapter & profile authoring guide with the complete CLIProfile/HookSpec field reference.
  • Escalation re-drive fixes (manual-rollback loop, recovery-notice wording, stale blocked spec).

Full details in CHANGELOG.md under [0.7.0].

Release mechanics

  • Version stamped to 0.7.0 across all files (sync_version.py --check green).
  • Screenshots + demo GIF regenerated (TUI changed since v0.6.4).
  • trunk check (full, no filter) → no issues.
  • On merge to main, the Release workflow auto-creates the v0.7.0 tag and GitHub release from the CHANGELOG.

Post-review fixes

Addressed CodeRabbit's review (commit 11411ff): 12 of 13 inline findings fixed — _result_json wait-polling, sprint-status no-op guard, TUI liveness degradation, squash-commit HEAD restore, taskkill System32 resolution, macOS cp --reflink gating + non-linux extra rename, plus doc reconciliations — each with targeted tests. One finding skipped (probe.pipe_pane is already contract-tolerant). See the resolution comment for the full breakdown.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added follow-up review policy with recommended vs always, bounded by limits.max_review_cycles.
    • Introduced a terminal-multiplexer abstraction layer (tmux backend) for session/window/pane control.
    • Updated orchestration to drive upstream bmad-dev-auto and refreshed the bundled skill set (now bmad-auto-resolve, bmad-auto-sweep, bmad-auto-setup).
  • Bug Fixes

    • Improved legacy result.json synthesis, workflow contract enforcement, escalation replay/state handling, and sprint-status advancement.
  • Documentation

    • Refreshed README and multiple guides (adapter/TOML profiles, setup, TUI settings, roadmap, portability).
  • Chores

    • Bumped version to 0.7.0, updated marketplace/module metadata, and expanded .gitignore for build output.

pbean and others added 19 commits June 22, 2026 13:30
Replace the bmad-quick-dev fork (interactive workflow + an automation-mode.md
decision table that existed only to override that interactivity) with a
first-class machine-first skill: lean steps (resolve/plan/implement/review/
finalize), the orchestrator contract stated up front, no greeting/menus/HALTs.
Preserves epic-context compilation, previous-story continuity, and the inline
three-layer adversarial review: with review.enabled=false the dev session runs
that inline triple-review itself before finalizing to done (a judge that did the
planning is a better-informed judge); with review.enabled=true the orchestrator
runs a separate fresh-context review session instead. Mirrors the upstream draft
bmad-code-org/BMAD-METHOD#2498 (renamed bmad-dev-auto -> bmad-auto-dev to match
the orchestrator's /bmad-auto-dev invocation). No engine change required.

Also make the dev result.json `workflow` an enforced contract instead of a
documented-but-ignored string: verify_dev/verify_dev_bundle reject a mismatch
against verify.DEV_WORKFLOW ("auto-dev"), and the skill emits "auto-dev" rather
than the misleading legacy "quick-dev". Review's "code-review" is documented as
informational by design (verify_review is purely disk-derived and is never
handed the result.json).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ordering

Remediate review comments on PR #9:

- verify_dev: the sprint-status gate accepted both 'review' and 'done'
  regardless of review_enabled, so a skip-review run that left the sprint
  at 'review' could pass dev verification. Make the expected sprint state
  conditional (mirrors the spec-status `expected`) and fix the hardcoded
  error text. Add a regression test + a docstring. (Augment, medium)
- step-05-finalize.md / SKILL.md: "result.json is your last action" + "End
  the turn" contradicted the On Complete hook that runs afterward. Reword so
  result.json is the last *artifact* and the on_complete hook is an explicit
  numbered terminal step before ending the turn. (Augment, low)
- skills/README.md: intro claimed all skills are upstream forks; bmad-auto-dev
  is standalone and resolve/sweep are automator-native. Soften the sentence
  to match the component table. (Augment, low)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- conftest.py: bundle_dev_effect docstring referenced a nonexistent
  step-04-finalize; the terminal finalize step is step-05-finalize. (Augment)
- CHANGELOG: "a judge that did the planning" -> "who" (person referent). (CodeRabbit)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eam #2500

Fold three refinements from bmad-code-org/BMAD-METHOD#2500 into the skill:

- Spec Boundaries gain a frozen "Block If" tier enumerating decisions that
  cannot be made unattended. Triggering one in step-03 (implement) or step-04
  (review) escalates CRITICAL (type: block-if); step-02 populates it during
  planning. Reuses the existing CRITICAL pause path — no engine change.
- sync-sprint-status now writes a machine-readable `sprint-sync-skipped`
  frontmatter warning when a story key is absent (was log-only).
- compile-epic-context uses a deterministic fallback sentence when planning
  artifacts are missing, and drops "Edit freely" from the regenerated banner.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Integrate Alex Verhovsky's generic bmad-dev-auto skill (BMAD-METHOD #2500) as a
dev primitive the orchestrator drives, selected via policy.dev.skill
(= "bmad-auto-dev" default | "bmad-dev-auto"). The skill self-finalizes its spec
on disk; the orchestrator synthesizes the result, owns sprint-status and the
deferred-work ledger, and verifies on real on-disk state.

- devcontract.py: synthesize_result / parse_auto_run_result / find_result_artifact
  + reset_spec_status; maps blocked -> CRITICAL, forges workflow:auto-dev.
- sprintstatus.advance(): never-regress line editor over STATUS_ORDER, epic-lift.
- GenericDevAdapter: synthesizes the result from the on-disk spec (skill writes no
  result.json), _stop_nudges = 0; cli wires it for the generic dev role.
- engine: _generic_dev / _dev_review_enabled / _post_dev_state_sync (sprint +
  ledger single-writer), B6 spec reset+resume repair prompt.
- sweep: generic-path bundles (orchestrator marks the ledger), flat deferred-work
  normalizer in parse_legacy.

Validated end-to-end against real claude sessions (parity vs legacy, blocked ->
CRITICAL escalation, B6 repair). Full suite green, trunk clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…auto (0.6.5)

Retire the bundled bmad-auto-dev fork entirely; the orchestrator now drives the
upstream bmad-dev-auto skill unmodified as its sole dev primitive. All
automator-specific concerns (escalation, result synthesis, sprint/ledger
bookkeeping, repair-resume) stay orchestrator-side. bmad-auto-review stays
bundled and runs by default, gated on review.enabled.

- DEV_SKILLS={bmad-dev-auto}; the generic dev path is now the only path —
  GenericDevAdapter synthesizes result.json from the spec the skill leaves on
  disk; engine/sweep/cli always take the generic branch.
- Removed bmad-auto-dev from MODULE_SKILLS and deleted the bundled fork dir;
  relocated deferred-work-format.md into bmad-auto-review/ (a sibling dependency
  of the review and sweep skills, not dev).
- Added install.missing_base_skills() preflight wired into validate +
  run/sweep/resume: requires the upstream bmad-dev-auto skill, plus the review
  hunters (bmad-review-adversarial-general, bmad-review-edge-case-hunter) when
  review is enabled — fails loudly with remediation instead of stalling on an
  Unknown command.
- Worktree provisioning copies the non-bundled upstream skills from the main
  repo's installed tree, since they are no longer in the wheel.
- Bumped 0.6.4 -> 0.6.5 (sync_version.py) and updated docs + CHANGELOG. Full
  suite (1008) green, trunk clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mended

Consume the new `followup_review_recommended` spec frontmatter signal from
bmad-dev-auto (BMAD-METHOD #2505). The skill self-reviews inline every story
and flags an independent follow-up review only when its review-driven changes
were significant; the orchestrator now gates its separate bmad-auto-review pass
on that signal.

- devcontract.synthesize_result: carry the flag into the result dict on done.
- model.StoryTask: persist followup_review_recommended (legacy state -> False).
- engine: capture the flag post-dev; _review_and_commit skips the review loop
  (journalling review-not-recommended, validating gates + committing) unless
  recommended.
- policy.ReviewPolicy.trigger ("recommended" default | "always") + validation +
  REVIEW_TRIGGER_MODES; exposed in the TUI via core.toml and policy.toml.
- Cap reuses limits.max_review_cycles (default 3) as the oscillation guard;
  no new knob.

Tests: dev-session fixtures emit the flag so the review-flow suite still runs
review under the new default; added devcontract/model/policy/settings-schema/
engine coverage. Docs + CHANGELOG updated. Full suite (1018) green, trunk clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…in-place

Re-arming a resolved escalation requests a clean rebuild against the corrected
spec; in non-worktree runs that means resetting the in-place tree to the story
baseline. With the default scm.rollback_on_failure=false the orchestrator paused
for a manual reset but never cleared baseline_commit, so following the notice
(git reset --hard, then resume) re-paused on the next resume — an endless loop.

- verify.attempt_dirty: true only when a safe_rollback would change anything
  (tracked diff vs baseline + run-created untracked beyond the snapshot).
- _rollback_or_pause no-ops when the tree is already at baseline, so a clean tree
  (including one the operator just reset) proceeds to the re-drive, and an
  escalation that left no changes never triggers a spurious prompt.
- Manual-recovery notice no longer claims the story "failed"; it states the real
  cause (resolved-escalation rebuild vs. stopped attempt) via a new
  StoryTask.rearmed flag set by runs.rearm_escalation.

Tests: attempt_dirty unit coverage, a clean-skip test, a wording test, and an
end-to-end resolve->resume regression proving the loop is gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…flips blocked status

verify_dev only set task.spec_file after every check passed, so a dev session
that HALTed with a `blocked` spec (the common escalation case) left spec_file
unset. rearm_escalation then had no spec path to flip to `ready-for-dev`, so the
resolved-escalation re-drive re-HALTed on the still-`blocked` frontmatter — a
second loop after the manual-rollback one.

The orchestrator now captures the spec the dev session produced (the synthesized
result names it even on a HALT — devcontract.synthesize_result) at the escalate
and defer decision points, before _escalate/_defer persist state. So re-arm
flips the status and the re-drive proceeds into implementation, and a deferred
story's spec is stashed as intended.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…m (Phase 1)

Introduce the transport-axis seam so a future non-POSIX backend (native-Windows
"psmux") can replace tmux without the adapters shelling out directly. WSL is
Linux, so this is the seam, not a Windows backend; tmux stays the only backend.

- adapters/multiplexer.py: TerminalMultiplexer ABC (full op contract) + cached
  get_multiplexer() factory.
- adapters/tmux_backend.py: TmuxMultiplexer, the only file allowed to invoke
  tmux. Implements the generic adapter's op subset + the parked-window sh -c
  trailer (new_parked_window, ready for Phase 2); remaining ops stubbed.
- adapters/generic_tmux.py -> generic.py: GenericAdapter routes every tmux body
  through self.mux; GenericTmuxAdapter alias kept for import stability.
- cli._make_adapters: share one get_multiplexer() across roles.
- tests: retarget monkeypatches to the backend; add test_multiplexer.py proving
  the adapter drives only the mux with no tmux on PATH.

Behavior-preserving on Linux; runs.py/tui/launch.py/probe.py/tui/data.py
deferred to Phase 2. Full suite green (1028), trunk clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iplexer seam (Phase 2)

Quarantine every tmux invocation behind TmuxMultiplexer: runs.py, tui/launch.py,
probe.py, and tui/data.py now drive sessions/windows/clients through
get_multiplexer() instead of shelling out to tmux directly. After this phase no
file outside adapters/tmux_backend.py issues a raw tmux invocation
(shutil.which("tmux") presence checks aside). Behavior-preserving on Linux.

- tmux_backend: implement the Phase-2 contract surface (kill_session,
  list_sessions, session_options, list_windows, window_alive, select_window,
  set/unset/show_window_option, attach_target_argv, current_*, detach_client,
  switch_client); new_session geometry is now optional (the control session
  takes tmux's default size).
- multiplexer: add MultiplexerError (TmuxError subclasses it, so call sites
  catch the seam type without importing the backend) and unset_window_option.
- runs/launch/probe/data: delegate; the sh -c parked-window trailer moves into
  new_parked_window. tmux backend errors surface as LaunchError at the seam.
- tests: patch the tmux subprocess/which seam on tmux_backend; StubMux gains
  unset_window_option.
- docs: finish the adapter-authoring-guide multiplexer seam section (the
  transport contract a psmux backend author implements).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…latform_util (Phase 3)

Add platform_util.py with portable process primitives so a future
native-Windows multiplexer backend can slot in without re-auditing call
sites: terminate_pid() (SIGTERM on POSIX, taskkill degrade on Windows),
detach_kwargs() (start_new_session vs CREATE_NEW_PROCESS_GROUP), and a
guarded SIGKILL constant. Linux/WSL behavior is unchanged.

Route runs.stop_run's engine-pid kill through terminate_pid; swap the
git-diff --no-index /dev/null literal in verify.py for os.devnull; add a
portability ack to the shutil.which-guarded notify-send in gates.py.

Unity-plugin POSIX-isms (the SIGKILL/start_new_session call sites that
detach_kwargs and SIGKILL are built for) are deferred to Phase 4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uards (Phase 4)

Quarantine the Unity plugin's POSIX-isms behind sys.platform branches while
keeping every Linux/WSL fast path byte-identical (none of the Windows branches
are exercised yet — no native-Windows backend ships this pass).

- unity_teardown.py: keep the /proc scan on Linux; add a psutil-based discovery
  path for non-Linux (lazy import of the optional `windows` extra with a clear
  error if missing). SIGKILL guarded via getattr; SIGTERM/SIGKILL routed through
  terminate/hard-kill helpers (taskkill on Windows); liveness uses
  psutil.pid_exists on Windows where os.kill(pid, 0) is destructive.
- unity_cleanup.py: keep /tmp/<company>/<product> on Linux (Unity's
  temporaryCachePath); derive from tempfile.gettempdir() elsewhere, with a note
  that native Windows Unity uses a different cache root (documented follow-up).
- unity_setup.py: cp -a --reflink fast path kept on POSIX, else
  shutil.copytree(dirs_exist_ok=True); symlink fallback wrapped in
  try/except OSError with a real-dir fallback + log; detach via a
  start_new_session/CREATE_NEW_PROCESS_GROUP helper.
- pyproject.toml: add [project.optional-dependencies] windows = ["psutil>=7.2.2"]
  (current release); relock uv.lock.
- docs: platform-behavior notes in plugin-authoring-guide.md and
  game-engine-plugin-guide.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se 5)

Add tests/test_portability_guard.py — an AST/byte scan of src/automator that
runs in the existing pytest matrix and fails if a hard POSIX dependency sneaks
back in after the multiplexer-seam + portability work (Phases 1-4):

- no ["tmux", ...] argv outside adapters/tmux_backend.py (the quarantine);
- no bare /tmp, /proc, /dev/null literal outside the platform-guarded Unity
  files (subpath- and shell-string-aware so 2>/dev/null and ~/.gemini/tmp/...
  don't false-positive);
- no bare signal.SIGKILL attribute access (must go through the getattr guard);
- no start_new_session=True outside the two detach helpers;
- shell=True only in verify.py and plugins/bus.py.

Each sanctioned exception (outside the wholesale tmux quarantine) now carries a
`portability:` ack on its line, which the guard verifies is present — so
exceptions stay deliberate. Adds those acks to the six existing exception sites.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final documentation sweep for the POSIX-decoupling work. Phases 1-5
(seam, portability fixes, Unity, CI guard) landed; this aligns the
prose with what shipped.

- README: reframe requirements to "tmux backend today, multiplexer-
  pluggable, WSL supported"; fix generic_tmux.py -> generic.py.
- setup-guide: add a platform-prerequisites section (tmux required for
  the current backend, WSL = supported Linux target, native Windows is
  a future backend).
- FEATURES: describe the CLI vs transport axes; tmux is one backend
  behind a pluggable seam.
- tui-guide: note tmux is the only multiplexer backend today; WSL works.
- ROADMAP: mark the multiplexer seam + portability hardening done
  (v0.6.5); native-Windows (psmux) backend is the remaining open item.
- CHANGELOG: terse 0.6.5 entry for the seam + portability hardening.

Verify: 1034 tests pass; full trunk check clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mad-auto-review

bmad-dev-auto now commits its own work each iteration (BMAD-METHOD#2506) and
routes a `done` spec to a fresh step-04 review pass (#2508). The orchestrator's
follow-up review is now just `/bmad-dev-auto <done spec>` — no separate review
skill — and the review session synthesizes its result from the spec like the
dev session does.

- verify.finalize_commit: squash the skill's per-iteration commits + the
  orchestrator's sprint/ledger writes back onto the pre-dev baseline into one
  story commit; pre_commit/post_commit hooks + message template stay authoritative
- _review_and_commit converges on status + followup_review_recommended (was the
  bmad-auto-review `clean` field); review role gets a spec-synthesizing
  GenericDevAdapter
- retire bmad-auto-review: delete the skill, relocate its deferred-work-format.md
  into bmad-auto-sweep, fold the two always-needed review hunters into
  DEV_BASE_SKILLS, scrub all references (install/marketplace/module/docs)
- tests: review_effect/bundle_review_effect emit the synthesized shape;
  finalize_commit + review-adapter-wiring regression tests; sync dogfood
  bmad-dev-auto copies

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion gaps

Close the remaining doc gaps after the dev/review skill retirement and the
POSIX-decoupling sweep:

- Fix the stale FEATURES line that still claimed the orchestrator drives "its
  own forks of the dev/review skills" — it drives upstream bmad-dev-auto
  unmodified and re-invokes it on the done spec for review.
- Expand the adapter authoring guide (retitled "Authoring CLI adapters &
  profiles") with a complete CLIProfile/HookSpec field reference and a
  "Writing a new adapter class" section for non-tmux transports (CodingCLIAdapter
  ABC, the SessionSpec/Handle/Result contracts, required/optional methods).
- Link the three authoring guides from the README Documentation index and
  repoint the "Other coding CLIs" profile-fields claim at the guide as the single
  canonical schema home.
- Correct the setup-guide uninstall block ("three bundled skills", not five) and
  note dev/review are retired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure prettier reformat of the new Profile/HookSpec reference tables — no
content change. trunk fmt produced it during release prep; release.py only
stages version files + CHANGELOG, so it lands here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@augmentcode

augmentcode Bot commented Jun 25, 2026

Copy link
Copy Markdown

This pull request is abnormally large and would use a significant amount of tokens to review. If you still wish to review it, comment "augment review" and we will review it.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@pbean, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 26 minutes and 28 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dee0c4c0-4c09-4ad9-b559-b0df44824788

📥 Commits

Reviewing files that changed from the base of the PR and between 38380c5 and 759e9a1.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • docs/adapter-authoring-guide.md
  • src/automator/adapters/generic.py
  • src/automator/adapters/profile.py
  • src/automator/data/profiles/copilot.toml
  • src/automator/platform_util.py
  • tests/test_generic_tmux.py
  • tests/test_platform_util.py
  • tests/test_profile.py

Walkthrough

This PR moves terminal control behind a multiplexer seam, switches orchestration to upstream bmad-dev-auto, adds result synthesis and sprint-state tracking, and updates the bundled skill set, docs, and tests to match the new workflow.

Changes

Upstream bmad-dev-auto Migration

Layer / File(s) Summary
Multiplexer contract and backend
src/automator/adapters/multiplexer.py, src/automator/adapters/tmux_backend.py, src/automator/adapters/generic.py, src/automator/probe.py, src/automator/runs.py, src/automator/tui/data.py, src/automator/tui/launch.py, src/automator/platform_util.py, src/automator/data/plugins/unity/*
TerminalMultiplexer and TmuxMultiplexer centralize tmux-backed session, window, client, and pane operations behind the seam, and the callers move onto those APIs.
Platform portability
src/automator/data/plugins/unity/*, docs/game-engine-plugin-guide.md, docs/plugin-authoring-guide.md, docs/tui-guide.md, docs/ROADMAP.md, tests/test_portability_guard.py
Unity helper scripts and portability docs add guarded Windows behavior for process discovery, detachment, and temp paths, and the guard test blocks stray POSIX-only constructs.
Dev contract and state
src/automator/{devcontract.py,policy.py,model.py,sprintstatus.py,deferredwork.py,install.py,verify.py}, src/automator/data/settings/core.toml, tests/conftest.py, tests/{test_devcontract.py,test_policy.py,test_model.py,test_sprintstatus_advance.py,test_deferredwork.py,test_install.py,test_settings_schema.py,test_verify.py}
The new dev contract, policy, and persisted state fields add auto-dev result synthesis, follow-up review triggers, sprint advancement, legacy deferred-work parsing, upstream base-skill provisioning, and workflow validation.
Verification and orchestration flow
src/automator/{engine.py,sweep.py,escalation.py,runs.py,cli.py,verify.py}, tests/conftest.py, tests/{test_engine.py,test_engine_worktree.py,test_sweep.py,test_verify.py,test_cli.py}, README.md, docs/{FEATURES.md,ROADMAP.md,tui-guide.md}, src/automator/data/profiles/copilot.toml
Verification, engine, and sweep now use auto-dev workflows and follow-up-review recommendations to drive retry, resume, sprint updates, and commit finalization; the review-trigger docs and profile comments match the new semantics.
Release docs and skill catalog
.claude-plugin/marketplace.json, .gitignore, CHANGELOG.md, README.md, docs/{FEATURES.md,setup-guide.md}, module.yaml, pyproject.toml, src/automator/__init__.py, scripts/gen_screenshots.py, src/automator/data/skills/{README.md,bmad-auto-resolve/SKILL.md,bmad-auto-setup/SKILL.md,bmad-auto-setup/assets/module.yaml,bmad-auto-sweep/deferred-work-format.md,bmad-auto-sweep/migration-mode.md}
Version 0.7.0 metadata, changelog, setup docs, and skill catalogs are rewritten to describe the new bundled bmad-auto-resolve/bmad-auto-sweep set.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • bmad-code-org/bmad-auto#3: This PR also changes the same engine recovery and rearm path in src/automator/engine.py.
  • bmad-code-org/bmad-auto#6: This PR is tied to the earlier probe/session control work that is now routed through the multiplexer seam.

Poem

I'm a bunny with a relay and a grin,
hopping through bmad-dev-auto from end to begin.
The ledger is tidy, the windows align,
review comes back round with a bright little sign.
🐇🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 46.95% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title matches the release-focused changeset and clearly identifies the version being shipped.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch release/0.7.0

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/setup-guide.md (1)

36-42: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Update the installer count.

The BMAD-method installer no longer copies five bundled skill dirs; the wheel now ships only bmad-auto-resolve, bmad-auto-sweep, and bmad-auto-setup (src/automator/install.py). Please fix this count so the setup docs don't tell users to expect retired dev/review bundles.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/setup-guide.md` around lines 36 - 42, The setup guide still says the
BMAD-method installer copies five bundled skill directories, but the installer
behavior now only includes the remaining skill bundles. Update the installer
count and wording in the setup docs to match the current bundle set referenced
by src/automator/install.py, and remove any mention of retired dev/review
packages so the description stays aligned with what the installer actually
copies.
src/automator/runs.py (1)

233-241: 🩺 Stability & Availability | 🟡 Minor

Refactor liveness probe to use a portable helper.

The wait loop in src/automator/runs.py currently uses os.kill(pid, 0) to check process existence. While signal 0 is non-destructive on Windows, os.kill semantics for existence checks vary by OS and Python version (raising PermissionError or ProcessLookupError). Encapsulate this logic in platform_util.pid_alive() to ensure consistent, portable behavior across the codebase.

Current implementation
# src/automator/runs.py:239-242
        while time.monotonic() < deadline:
            try:
                os.kill(pid, 0)
            except OSError:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/automator/runs.py` around lines 233 - 241, The liveness check in
runs.wait-for-termination still uses os.kill(pid, 0) directly, which should be
replaced with the shared portable helper. Update the wait loop in the process
termination flow to call platform_util.pid_alive() instead of probing with
os.kill, and keep the existing fallback behavior when the helper reports the pid
is gone. Use the existing terminate_pid and _STOP_WAIT_S logic as-is, only
swapping the existence check to the helper so process liveness is handled
consistently across platforms.
🧹 Nitpick comments (2)
tests/test_devcontract.py (1)

145-170: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a direct fallback-artifact test.

find_result_artifact() has special behavior for bmad-dev-auto-result-*.md files without an Auto Run Result heading; the unit tests currently only cover heading-based discovery and ignoring non-heading files.

Suggested test
+def test_find_artifact_accepts_no_spec_fallback_prefix(tmp_path):
+    fallback = tmp_path / "bmad-dev-auto-result-unclear-1234.md"
+    fallback.write_text(
+        "---\nstatus: blocked\n---\n\n# BMad Dev Auto Result\n\n"
+        "Status: blocked\nBlocking condition: unclear intent\n",
+        encoding="utf-8",
+    )
+    assert devcontract.find_result_artifact(tmp_path, since_ns=0) == fallback
+
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_devcontract.py` around lines 145 - 170, Add a focused unit test
for find_result_artifact that covers the fallback bmad-dev-auto-result-*.md
path: create a matching artifact file without the Auto Run Result heading, set
its mtime so it is the newest candidate, and assert
devcontract.find_result_artifact returns it. Place the new test alongside the
existing test_find_artifact_* cases and use the same tmp_path/_spec patterns so
the fallback behavior in find_result_artifact is exercised directly.
tests/test_policy.py (1)

18-18: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optional: silence RUF043 on pytest.raises(match=...).

The match= strings contain unescaped regex metacharacters (.), which Ruff flags (RUF043). The current patterns still match (. is any-char), so this is cosmetic — escape the dot or mark the strings raw if you want the lint clean, e.g. match=r"dev\.skill".

Also applies to: 21-21, 36-36

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_policy.py` at line 18, The pytest.raises match patterns in
test_policy.py use unescaped dots, which triggers RUF043 even though the
assertions still work; update the affected match strings in the policy tests to
use escaped dots or raw regex strings, and locate the changes in the
pytest.raises calls that reference policy.PolicyError and the dev.skill pattern.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Around line 18-23: The CHANGELOG entry for the review flow is inconsistent
about the control path, so reconcile the wording in the bullet that mentions
`followup_review_recommended` and the later review-trigger section to describe a
single mechanism. Update the release note text around `bmad-dev-auto`,
`review.enabled`, and `review.trigger` so it matches the actual behavior and
does not claim both that the flag is no longer used and that it still gates the
flow.

In `@docs/adapter-authoring-guide.md`:
- Around line 71-72: The documentation text contains a typo where “aliveness” is
used instead of the actual contract method name window_alive, which can confuse
adapter authors. Update the wording in the surrounding explanation to refer
consistently to window_alive and keep the liveness description aligned with the
list-windows membership behavior mentioned in this section.

In `@docs/FEATURES.md`:
- Around line 50-54: The capability matrix entry for “Fresh-context adversarial
review” is inconsistent with the rewritten review section. Update the matrix
text in FEATURES.md so it matches the current review flow described by the
review-related bullets and the follow-up review behavior, keeping the layer
count and wording aligned with the Blind Hunter / Edge Case Hunter model used
elsewhere in the doc.

In `@docs/ROADMAP.md`:
- Line 12: The status line in ROADMAP is attributing the multiplexer seam and
portability hardening to the wrong release, so update the foundation version
stamp to match the release that actually introduces those changes. Adjust the
text in the ROADMAP entry so the provenance note aligns with the 0.7.0 release,
and verify the wording around the seam/hardening mention stays consistent with
the version naming used elsewhere.

In `@pyproject.toml`:
- Around line 21-24: The non-Linux teardown path in unity_teardown.py uses
psutil on macOS as well as Windows, but pyproject.toml only exposes it under the
windows extra, so update the dependency extras to a broader non-Linux scope (or
add it to both windows and macos) and make sure the install guidance matches
that new extra. Also adjust the fallback/import error messaging in
unity_teardown.py so the suggested pip install command reflects the actual extra
name used for _lingering_pids_psutil and the non-Linux branch.

In `@src/automator/adapters/generic.py`:
- Around line 291-298: The GenericDevAdapter._result_json method is ignoring the
wait flag and returning too early when the result artifact is not yet flushed to
disk. Update _result_json to mirror the polling logic used in _read_result: when
wait=True, keep retrying devcontract.find_result_artifact with a short sleep
loop until the artifact appears or the deadline is reached, then synthesize the
result; when wait=False, preserve the current immediate return behavior. Keep
the change localized to _result_json and use the existing handle.launched_ns,
spec, and story_key flow.

In `@src/automator/data/plugins/unity/unity_setup.py`:
- Around line 207-222: The fast-copy branch in _copy_library currently treats
any non-Windows system with cp installed as safe for _cp_library_tree, but
macOS’s BSD cp does not support --reflink. Update the platform check around
_copy_library and _cp_library_tree so the cp reflink path is only used on Linux
or when GNU coreutils is explicitly available; otherwise fall back to
_copytree_library or another macOS-safe clone/deep-copy path.

In `@src/automator/data/skills/bmad-auto-setup/SKILL.md`:
- Around line 74-76: Clarify the handoff description in SKILL.md where bmad-auto
orchestrator and bmad-dev-auto are mentioned: the current wording implies
bmad-dev-auto itself performs the follow-up review pass, but the new flow
re-invokes it on the done spec as a separate pass. Update the sentence around
the bmad-auto orchestrator and bmad-dev-auto references so it explicitly says
the orchestrator starts a second bmad-dev-auto run for follow-up review rather
than attributing that pass to the upstream primitive itself.

In `@src/automator/platform_util.py`:
- Around line 31-33: The Windows branch in platform_util’s process-termination
logic resolves taskkill by name, which can be hijacked via PATH or the working
directory. Update the subprocess.run call in the Windows-specific kill path to
invoke the System32 taskkill binary explicitly, keeping the existing win32 check
and taskkill behavior otherwise unchanged.

In `@src/automator/probe.py`:
- Around line 416-417: The probe window handoff in AutomatorProbe still lets
MultiplexerError escape from mux.pipe_pane, which breaks the intended race
tolerance. Update the code around the pipe_pane call in AutomatorProbe to catch
backend failures and fall back to the normal log-less probing path when a window
exits instantly, using the mux.pipe_pane and related probe-window logic as the
fix point.

In `@src/automator/sprintstatus.py`:
- Around line 200-217: The status update flow in story_status()/advance() can
report success even when _set_mapping_value() did not actually rewrite the story
entry, especially for keys that match STORY_RE but are not writable. Make
advance() track whether the specific story_key mapping was changed first, and
only return target when that rewrite succeeded; otherwise preserve the original
status or signal no-op. Keep the epic and last_updated updates gated by the same
changed state in the story-status path so the function does not claim success
unless the story line was actually updated.

In `@src/automator/tui/data.py`:
- Around line 81-90: The `_session_liveness` helper only treats `OSError` as a
failure path, so confirm whether `mux.has_session(session_name(run_id))` can
raise `MultiplexerError` and, if so, catch that exception (or its base) as well
so transient backend failures still return "unknown" instead of escaping through
`liveness()` and `discover_runs()`. Also verify `session_name` is imported and
available in this module since it is used by `_session_liveness`.

In `@src/automator/verify.py`:
- Around line 802-814: The squash-commit flow in verify.py should preserve the
original branch state if the commit step fails: in the helper that runs git
add/reset/commit, capture the current HEAD before the soft reset, and if the
`git commit` call in that sequence fails, restore HEAD back to that saved ref
before raising `GitError`. Use the existing `_git` helper and the commit path
around the `git reset --soft` and `git commit` calls to make the rollback happen
consistently.

---

Outside diff comments:
In `@docs/setup-guide.md`:
- Around line 36-42: The setup guide still says the BMAD-method installer copies
five bundled skill directories, but the installer behavior now only includes the
remaining skill bundles. Update the installer count and wording in the setup
docs to match the current bundle set referenced by src/automator/install.py, and
remove any mention of retired dev/review packages so the description stays
aligned with what the installer actually copies.

In `@src/automator/runs.py`:
- Around line 233-241: The liveness check in runs.wait-for-termination still
uses os.kill(pid, 0) directly, which should be replaced with the shared portable
helper. Update the wait loop in the process termination flow to call
platform_util.pid_alive() instead of probing with os.kill, and keep the existing
fallback behavior when the helper reports the pid is gone. Use the existing
terminate_pid and _STOP_WAIT_S logic as-is, only swapping the existence check to
the helper so process liveness is handled consistently across platforms.

---

Nitpick comments:
In `@tests/test_devcontract.py`:
- Around line 145-170: Add a focused unit test for find_result_artifact that
covers the fallback bmad-dev-auto-result-*.md path: create a matching artifact
file without the Auto Run Result heading, set its mtime so it is the newest
candidate, and assert devcontract.find_result_artifact returns it. Place the new
test alongside the existing test_find_artifact_* cases and use the same
tmp_path/_spec patterns so the fallback behavior in find_result_artifact is
exercised directly.

In `@tests/test_policy.py`:
- Line 18: The pytest.raises match patterns in test_policy.py use unescaped
dots, which triggers RUF043 even though the assertions still work; update the
affected match strings in the policy tests to use escaped dots or raw regex
strings, and locate the changes in the pytest.raises calls that reference
policy.PolicyError and the dev.skill pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5b5141ba-b10b-498f-bf9d-5e6114b9ffce

📥 Commits

Reviewing files that changed from the base of the PR and between d90628c and 67826ba.

⛔ Files ignored due to path filters (12)
  • docs/images/dashboard.png is excluded by !**/*.png
  • docs/images/dashboard.svg is excluded by !**/*.svg
  • docs/images/demo.gif is excluded by !**/*.gif
  • docs/images/settings-scm.png is excluded by !**/*.png
  • docs/images/settings-scm.svg is excluded by !**/*.svg
  • docs/images/settings.svg is excluded by !**/*.svg
  • docs/images/start-run-modal.png is excluded by !**/*.png
  • docs/images/start-run-modal.svg is excluded by !**/*.svg
  • docs/images/sweep-decision.png is excluded by !**/*.png
  • docs/images/sweep-decision.svg is excluded by !**/*.svg
  • src/automator/data/skills/bmad-auto-setup/assets/module-help.csv is excluded by !**/*.csv
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (88)
  • .claude-plugin/marketplace.json
  • .gitignore
  • CHANGELOG.md
  • README.md
  • docs/FEATURES.md
  • docs/ROADMAP.md
  • docs/adapter-authoring-guide.md
  • docs/game-engine-plugin-guide.md
  • docs/plugin-authoring-guide.md
  • docs/setup-guide.md
  • docs/tui-guide.md
  • module.yaml
  • pyproject.toml
  • scripts/gen_screenshots.py
  • src/automator/__init__.py
  • src/automator/adapters/generic.py
  • src/automator/adapters/multiplexer.py
  • src/automator/adapters/tmux_backend.py
  • src/automator/cli.py
  • src/automator/data/plugins/unity/unity_cleanup.py
  • src/automator/data/plugins/unity/unity_setup.py
  • src/automator/data/plugins/unity/unity_teardown.py
  • src/automator/data/profiles/copilot.toml
  • src/automator/data/settings/core.toml
  • src/automator/data/skills/README.md
  • src/automator/data/skills/bmad-auto-dev/SKILL.md
  • src/automator/data/skills/bmad-auto-dev/automation-mode.md
  • src/automator/data/skills/bmad-auto-dev/compile-epic-context.md
  • src/automator/data/skills/bmad-auto-dev/customize.toml
  • src/automator/data/skills/bmad-auto-dev/spec-template.md
  • src/automator/data/skills/bmad-auto-dev/step-01-clarify-and-route.md
  • src/automator/data/skills/bmad-auto-dev/step-02-plan.md
  • src/automator/data/skills/bmad-auto-dev/step-03-implement.md
  • src/automator/data/skills/bmad-auto-dev/step-04-review.md
  • src/automator/data/skills/bmad-auto-dev/step-05-present.md
  • src/automator/data/skills/bmad-auto-dev/step-auto-finalize.md
  • src/automator/data/skills/bmad-auto-dev/step-oneshot.md
  • src/automator/data/skills/bmad-auto-dev/sync-sprint-status.md
  • src/automator/data/skills/bmad-auto-resolve/SKILL.md
  • src/automator/data/skills/bmad-auto-review/SKILL.md
  • src/automator/data/skills/bmad-auto-review/automation-mode.md
  • src/automator/data/skills/bmad-auto-review/customize.toml
  • src/automator/data/skills/bmad-auto-review/steps/step-01-gather-context.md
  • src/automator/data/skills/bmad-auto-review/steps/step-02-review.md
  • src/automator/data/skills/bmad-auto-review/steps/step-03-triage.md
  • src/automator/data/skills/bmad-auto-review/steps/step-04-present.md
  • src/automator/data/skills/bmad-auto-setup/SKILL.md
  • src/automator/data/skills/bmad-auto-setup/assets/module.yaml
  • src/automator/data/skills/bmad-auto-sweep/deferred-work-format.md
  • src/automator/data/skills/bmad-auto-sweep/migration-mode.md
  • src/automator/deferredwork.py
  • src/automator/devcontract.py
  • src/automator/engine.py
  • src/automator/escalation.py
  • src/automator/gates.py
  • src/automator/install.py
  • src/automator/model.py
  • src/automator/platform_util.py
  • src/automator/plugins/bus.py
  • src/automator/policy.py
  • src/automator/probe.py
  • src/automator/runs.py
  • src/automator/sprintstatus.py
  • src/automator/sweep.py
  • src/automator/tui/data.py
  • src/automator/tui/launch.py
  • src/automator/verify.py
  • src/automator/workspace.py
  • tests/conftest.py
  • tests/test_cli.py
  • tests/test_deferredwork.py
  • tests/test_devcontract.py
  • tests/test_engine.py
  • tests/test_engine_worktree.py
  • tests/test_generic_tmux.py
  • tests/test_install.py
  • tests/test_model.py
  • tests/test_multiplexer.py
  • tests/test_policy.py
  • tests/test_portability_guard.py
  • tests/test_profile.py
  • tests/test_runs.py
  • tests/test_settings_schema.py
  • tests/test_sprintstatus_advance.py
  • tests/test_sweep.py
  • tests/test_tui_data.py
  • tests/test_tui_launch.py
  • tests/test_verify.py
💤 Files with no reviewable changes (20)
  • src/automator/data/skills/bmad-auto-dev/step-03-implement.md
  • src/automator/data/skills/bmad-auto-review/steps/step-03-triage.md
  • src/automator/data/skills/bmad-auto-review/steps/step-02-review.md
  • src/automator/data/skills/bmad-auto-dev/SKILL.md
  • src/automator/data/skills/bmad-auto-dev/step-auto-finalize.md
  • src/automator/data/skills/bmad-auto-dev/step-01-clarify-and-route.md
  • src/automator/data/skills/bmad-auto-dev/spec-template.md
  • src/automator/data/skills/bmad-auto-dev/compile-epic-context.md
  • src/automator/data/skills/bmad-auto-review/automation-mode.md
  • src/automator/data/skills/bmad-auto-dev/step-04-review.md
  • src/automator/data/skills/bmad-auto-review/SKILL.md
  • src/automator/data/skills/bmad-auto-dev/step-02-plan.md
  • src/automator/data/skills/bmad-auto-review/customize.toml
  • src/automator/data/skills/bmad-auto-dev/step-oneshot.md
  • src/automator/data/skills/bmad-auto-dev/customize.toml
  • src/automator/data/skills/bmad-auto-dev/sync-sprint-status.md
  • src/automator/data/skills/bmad-auto-dev/step-05-present.md
  • src/automator/data/skills/bmad-auto-review/steps/step-04-present.md
  • src/automator/data/skills/bmad-auto-dev/automation-mode.md
  • src/automator/data/skills/bmad-auto-review/steps/step-01-gather-context.md

Comment thread CHANGELOG.md Outdated
Comment thread docs/adapter-authoring-guide.md Outdated
Comment thread docs/FEATURES.md Outdated
Comment thread docs/ROADMAP.md Outdated
Comment thread pyproject.toml Outdated
Comment thread src/automator/platform_util.py Outdated
Comment thread src/automator/probe.py
Comment on lines 416 to +417
# pipe-pane may race a window that dies instantly; tolerate failure.
self._tmux("pipe-pane", "-t", window_id, "-o", f"cat >> {shlex.quote(str(log_file))}")
self.mux.pipe_pane(window_id, log_file)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Catch pipe_pane backend failures.

Line 417 can still raise MultiplexerError, despite the comment saying this race is tolerated. Wrap it so instantly-exiting probe windows degrade to normal log-less probing instead of crashing.

Proposed fix
         except MultiplexerError:
             return None
         # pipe-pane may race a window that dies instantly; tolerate failure.
-        self.mux.pipe_pane(window_id, log_file)
+        try:
+            self.mux.pipe_pane(window_id, log_file)
+        except MultiplexerError:
+            pass
         return window_id
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# pipe-pane may race a window that dies instantly; tolerate failure.
self._tmux("pipe-pane", "-t", window_id, "-o", f"cat >> {shlex.quote(str(log_file))}")
self.mux.pipe_pane(window_id, log_file)
# pipe-pane may race a window that dies instantly; tolerate failure.
try:
self.mux.pipe_pane(window_id, log_file)
except MultiplexerError:
pass
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/automator/probe.py` around lines 416 - 417, The probe window handoff in
AutomatorProbe still lets MultiplexerError escape from mux.pipe_pane, which
breaks the intended race tolerance. Update the code around the pipe_pane call in
AutomatorProbe to catch backend failures and fall back to the normal log-less
probing path when a window exits instantly, using the mux.pipe_pane and related
probe-window logic as the fix point.

Comment thread src/automator/sprintstatus.py
Comment thread src/automator/tui/data.py
Comment thread src/automator/verify.py
Validated all 13 inline findings against the tree; 12 valid, 1 skipped
(probe pipe_pane — the seam contract + tmux backend already tolerate the
window-death race, so re-wrapping is dead code).

Correctness / stability:
- generic: GenericDevAdapter._result_json honors `wait` — polls for the
  spec artifact instead of reading once, so a slow flush isn't a false stall.
- sprintstatus: advance() returns the unchanged status (not target) when the
  story line couldn't be rewritten (quoted/block keys YAML resolves but the
  line-regex can't), and leaves the file untouched.
- tui/data + tmux_backend: liveness degrades to "unknown" on a backend
  failure — has_session wraps timeout/OSError into TmuxError and
  _session_liveness catches MultiplexerError — instead of crashing the poll.
- verify: finalize_commit restores HEAD if the squash commit fails, so a
  rejecting hook can't drop the skill commit chain.

Security:
- Resolve taskkill from %SystemRoot%\System32 (platform_util + unity_teardown
  x2) so PATH/CWD can't hijack it.

macOS portability:
- unity_setup: gate the cp --reflink fast path to Linux (BSD cp rejects
  --reflink, which silently forced cold Unity imports on macOS).
- Rename the `windows` extra to `non-linux` (psutil is needed on macOS too);
  update the error message, docs, and uv.lock.

Docs:
- Reconcile the CHANGELOG review-trigger wording; fix the aliveness ->
  window_alive typo; FEATURES matrix 3 -> 2 review layers; correct three
  stale 0.6.5 -> 0.7.0 version stamps; clarify the follow-up-review handoff
  in bmad-auto-setup SKILL.md.

Adds targeted tests for each behavioral fix. Full suite green (1042);
trunk clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pbean

pbean commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed CodeRabbit review (commit 11411ff)

Validated all 13 inline findings against the tree — 12 addressed, 1 skipped. Full suite green (1042 tests), trunk check clean.

Fixed

Correctness / stability

  • generic.pyGenericDevAdapter._result_json now honors wait: it polls for the spec artifact (mirroring the base _await_result) instead of reading once, so a slow disk flush no longer reads as a stall.
  • sprintstatus.pyadvance() returns the current status (not target) when the story line can't be rewritten (quoted/block keys that YAML resolves but the line-regex can't), leaving the file untouched.
  • tui/data.py + tmux_backend.py — liveness degrades to "unknown" on a backend failure instead of crashing the TUI poll: has_session wraps timeout/OSError into TmuxError, and _session_liveness now catches MultiplexerError.
  • verify.pyfinalize_commit captures HEAD and restores it if the squash git commit fails (e.g. a rejecting pre-commit hook), so the skill commit chain isn't dropped to baseline.

Security

  • Resolve taskkill from %SystemRoot%\System32 at all three sites (platform_util.py + the two unity_teardown.py twins) so a PATH/CWD plant can't hijack it.

macOS portability

  • unity_setup.py — gate the cp --reflink fast path to Linux only; BSD/macOS cp rejects --reflink, which was silently forcing a cold Unity import on macOS. Off Linux it now deep-copies via shutil.
  • Rename the windows optional-dependency extra → non-linux (psutil is needed on macOS too); updated the error message, docs, and uv.lock.

Docs

  • Reconcile the CHANGELOG review-trigger wording (dropped the contradictory "review.trigger unchanged").
  • alivenesswindow_alive (adapter-authoring-guide).
  • FEATURES capability matrix: review "3 parallel layers" → "2 parallel layers (Blind Hunter / Edge Case Hunter)".
  • Three stale 0.6.5 version stamps → 0.7.0 (ROADMAP, FEATURES, setup-guide) — no 0.6.5 ever shipped; the seam/hardening lands here.
  • Clarify the follow-up-review handoff in bmad-auto-setup SKILL.md (the orchestrator re-invokes bmad-dev-auto; the primitive doesn't run the review pass itself).

Each behavioral fix has a targeted test.

Skipped (1)

  • probe.py:417 — "pipe_pane can still raise MultiplexerError": the seam contract documents pipe_pane as tolerant of the window having already died, and TmuxMultiplexer.pipe_pane already catches TmuxError and returns — it never raises. The premise contradicts the contract and the only backend, so wrapping it again would be dead code.

The docstring-coverage pre-merge warning is left as-is — an advisory threshold, not a correctness issue.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/automator/data/skills/bmad-auto-setup/SKILL.md`:
- Line 74: Update the setup description in SKILL.md so it no longer says the
orchestrator spawns “Claude Code sessions”; in the bmad-auto orchestrator
wording, replace that with adapter-neutral language such as “coding CLI
sessions” or “adapter sessions” to cover claude, codex, and gemini consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6f61f86c-fcc6-476b-99f6-e83a88209f5e

📥 Commits

Reviewing files that changed from the base of the PR and between 67826ba and 11411ff.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • CHANGELOG.md
  • docs/FEATURES.md
  • docs/ROADMAP.md
  • docs/adapter-authoring-guide.md
  • docs/game-engine-plugin-guide.md
  • docs/setup-guide.md
  • pyproject.toml
  • src/automator/adapters/generic.py
  • src/automator/adapters/tmux_backend.py
  • src/automator/data/plugins/unity/unity_setup.py
  • src/automator/data/plugins/unity/unity_teardown.py
  • src/automator/data/skills/bmad-auto-setup/SKILL.md
  • src/automator/platform_util.py
  • src/automator/sprintstatus.py
  • src/automator/tui/data.py
  • src/automator/verify.py
  • tests/test_generic_tmux.py
  • tests/test_sprintstatus_advance.py
  • tests/test_tui_data.py
  • tests/test_verify.py
✅ Files skipped from review due to trivial changes (4)
  • CHANGELOG.md
  • docs/ROADMAP.md
  • docs/game-engine-plugin-guide.md
  • docs/adapter-authoring-guide.md
🚧 Files skipped from review as they are similar to previous changes (11)
  • tests/test_tui_data.py
  • src/automator/sprintstatus.py
  • tests/test_sprintstatus_advance.py
  • src/automator/tui/data.py
  • src/automator/data/plugins/unity/unity_setup.py
  • docs/setup-guide.md
  • tests/test_generic_tmux.py
  • docs/FEATURES.md
  • src/automator/adapters/generic.py
  • tests/test_verify.py
  • src/automator/verify.py

Comment thread src/automator/data/skills/bmad-auto-setup/SKILL.md Outdated
@pbean

pbean commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

augment review

@augmentcode

augmentcode Bot commented Jun 25, 2026

Copy link
Copy Markdown
🤖 Augment PR Summary

Summary: Release 0.7.0 — a major reshaping of how the orchestrator drives development and review.

Key changes:

  • Retires the bundled bmad-auto-dev and bmad-auto-review skills; the orchestrator now drives the upstream bmad-dev-auto skill unmodified as its sole dev primitive, with a new devcontract.py module translating its on-disk spec into the legacy result.json contract
  • Review is now a re-invocation of bmad-dev-auto on the done spec (a fresh step-04 pass), gated by a new review.trigger knob (recommended vs always)
  • Introduces a pluggable TerminalMultiplexer ABC (with TmuxMultiplexer as the sole backend) — all tmux subprocess calls quarantined into tmux_backend.py, enabling a future native-Windows backend
  • POSIX portability hardening: platform-guarded helpers for SIGKILL/taskkill, detach kwargs, /proc/psutil fallbacks in Unity plugin, plus a CI portability guard (test_portability_guard.py)
  • Skill-committed squash: verify.finalize_commit collapses the skill's per-iteration commits plus orchestrator bookkeeping into one commit per story
  • Bug fixes for escalation re-drive loops (clean-tree no-op, recovery wording, stale blocked spec recording)

Technical notes: The bmad-dev-auto skill writes no result.json — GenericDevAdapter synthesizes it from the spec's frontmatter + ## Auto Run Result section. Sprint-status is now orchestrator-written (sprintstatus.advance), not skill-written. Non-bundled upstream skills (dev + review hunters) are preflight-checked via install.missing_base_skills.

🤖 Was this summary useful? React with 👍 or 👎

@augmentcode augmentcode Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread README.md Outdated
**Gate modes:** `none` runs everything unattended; `per-epic` (default) pauses at epic boundaries; `per-story-spec-approval` pauses after each spec is written so you approve it before implementation is reviewed.

**Review:** `[review].enabled = false` drops the separate fresh-context review session; the dev pass instead runs `bmad-quick-dev`'s own internal triple-review (Blind Hunter / Edge Case Hunter / Acceptance Auditor) and finalizes the story straight to `done` — one session per story instead of two, verify commands still gating the commit. Governs deferred-work sweeps too.
**Review:** `[review].enabled = false` drops the separate fresh-context review session; the dev pass instead runs `bmad-quick-dev`'s own internal triple-review (Blind Hunter / Edge Case Hunter / Acceptance Auditor) and finalizes the story straight to `done` — one session per story instead of two, verify commands still gating the commit. Governs deferred-work sweeps too. When review is enabled, `[review].trigger` decides _when_ that separate pass runs: `recommended` (default) only when the `bmad-dev-auto` session flags `followup_review_recommended` — it already triple-reviews inline and recommends an independent pass only when its changes were significant; `always` runs it every story. The follow-up loop is bounded by `limits.max_review_cycles` (default 3), which caps oscillation.

@augmentcode augmentcode Bot Jun 25, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This modified line still references bmad-quick-dev and the three-layer review ("Blind Hunter / Edge Case Hunter / Acceptance Auditor"), but the PR retires bmad-quick-dev in favor of bmad-dev-auto and drops the Acceptance Auditor layer — the rest of the PR (FEATURES.md, CHANGELOG, SKILL.md) consistently references two layers (Blind Hunter + Edge Case Hunter). Consider updating this sentence to match.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Reconcile docs with the 0.7.0 behavior change and close a real Windows
portability bug surfaced by the automated reviews:

- README: review fallback runs bmad-dev-auto's two-layer review (Blind
  Hunter / Edge Case Hunter), not bmad-quick-dev's triple-review.
- setup SKILL.md: adapter-neutral "coding CLI sessions" (claude/codex/gemini).
- setup-guide: installer copies three skill dirs, not five.
- platform_util.pid_alive(): os.kill(pid, 0) is destructive on Windows
  (TerminateProcess); route the three unguarded liveness checks (runs.py x2,
  tui/data.py) through it, and extend test_portability_guard to police bare
  os.kill(_, 0) so it can't regress.
- test_devcontract: cover the no-spec fallback-artifact (bmad-dev-auto-result-*)
  discovery path.

Skipped as invalid: probe.py pipe_pane (already tolerant per the multiplexer
contract); RUF043 nitpick (not selected, lint green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pbean

pbean commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the remaining CodeRabbit + Augment findings in 38380c5 (prior batch was 11411ff):

Fixed

  • README.md — review fallback now describes bmad-dev-auto's two-layer review (Blind Hunter / Edge Case Hunter), dropping the stale bmad-quick-dev triple-review wording. (Augment)
  • bmad-auto-setup/SKILL.md — adapter-neutral "coding CLI sessions through the selected adapter(s)" instead of "Claude Code sessions". (CodeRabbit)
  • docs/setup-guide.md — installer copies three skill dirs, not five. (CodeRabbit, outside-diff)
  • os.kill(pid, 0) liveness checks — this is destructive on Windows (maps to TerminateProcess). Added platform_util.pid_alive(), routed all three unguarded sites (runs.py ×2, tui/data.py) through it, and extended test_portability_guard.py to police bare os.kill(_, 0) so it can't regress. (CodeRabbit flagged runs.py:233-241; same bug existed at two more sites.)
  • test_devcontract.py — added coverage for the no-spec fallback-artifact (bmad-dev-auto-result-*) discovery path. (CodeRabbit nitpick)

Skipped (invalid / no-op)

  • probe.py:417 pipe_pane — the backend already swallows TmuxError (a MultiplexerError subclass) and the multiplexer contract mandates tolerance, so the proposed try/except is dead code.
  • test_policy.py:18 RUF043 — not selected in the ruff/trunk config; lint is green.

Full pytest and trunk check pass.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/automator/platform_util.py`:
- Around line 47-50: Reject non-positive PIDs before any process probe or
termination in pid_alive() and terminate_pid() within platform_util.py. Add an
early guard for pid <= 0 so os.kill() and any platform-specific termination path
are never reached with 0 or negative values, and make both methods fail fast for
invalid PIDs. Use the existing pid_alive, terminate_pid, and os.kill call sites
as the places to update.

In `@src/automator/runs.py`:
- Around line 47-49: The current write_pid() implementation stores only
os.getpid(), which can be reused by another process and cause engine_alive() and
stop_run() to act on the wrong process. Update the PID_FILE format to include a
unique engine identity token in addition to the numeric PID, and have
write_pid(), engine_alive(), and stop_run() read and verify that token before
treating the run as active or sending a stop signal. Use a stable per-run marker
such as a UUID or process start timestamp, and reject stale PID files whose
identity does not match the current engine instance.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8934d8a8-e1c5-4e64-a15f-52a3955d5661

📥 Commits

Reviewing files that changed from the base of the PR and between 11411ff and 38380c5.

📒 Files selected for processing (9)
  • README.md
  • docs/setup-guide.md
  • src/automator/data/plugins/unity/unity_teardown.py
  • src/automator/data/skills/bmad-auto-setup/SKILL.md
  • src/automator/platform_util.py
  • src/automator/runs.py
  • src/automator/tui/data.py
  • tests/test_devcontract.py
  • tests/test_portability_guard.py
✅ Files skipped from review due to trivial changes (2)
  • docs/setup-guide.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/test_devcontract.py
  • tests/test_portability_guard.py
  • src/automator/data/plugins/unity/unity_teardown.py

Comment thread src/automator/platform_util.py
Comment thread src/automator/runs.py
Comment on lines +47 to 49
"""Record the engine process pid. Never deleted: a stale pid that
``pid_alive`` reports as gone is the signal that a run was interrupted."""
(run_dir / PID_FILE).write_text(str(os.getpid()), encoding="utf-8")

@coderabbitai coderabbitai Bot Jun 25, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check whether engine.pid has any companion identity/start-time/heartbeat validation.
rg -n -C3 'engine\.pid|PID_FILE|write_pid|read_pid|pid_alive|terminate_pid' src tests

Repository: bmad-code-org/bmad-auto

Length of output: 20505


🏁 Script executed:

#!/bin/bash
# Verify if platform_util.pid_alive or terminate_pid has any token verification
cat src/automator/platform_util.py

Repository: bmad-code-org/bmad-auto

Length of output: 4165


🏁 Script executed:

#!/bin/bash
# Suggestion: Modify write_pid to store (pid + start_time) or UUID
# Modify read_pid to return the tuple/string
# Modify engine_alive/stop_run to verify the identity before acting
# This prevents false positives from PID reuse

# Example improvement in src/automator/runs.py:
# 1. On write, include `time.time()` or `uuid4()`.
# 2. On read, parse both.
# 3. Before `terminate_pid`, verify the running process's start time or check a mapped session.

Repository: bmad-code-org/bmad-auto

Length of output: 161


Don't treat a bare PID as engine identity.

write_pid() currently stores only the numeric ID. Because operating systems recycle process IDs, a terminated engine's PID can be reassigned to an unrelated process. If this happens, engine_alive() will incorrectly report the run as active, and stop_run() may terminate a process that has nothing to do with the project.

Persist a unique marker (such as a random UUID, the process start time, or a combined tuple) inside engine.pid. Before relying on the engine_alive check or executing stop_run(), read and verify that this token still matches the current engine's identity rather than trusting the numeric ID alone.

See current snippet
def write_pid(run_dir: Path) -> None:
    """Record the engine process pid. Never deleted: a stale pid that
    ``pid_alive`` reports as gone is the signal that a run was interrupted."""
    (run_dir / PID_FILE).write_text(str(os.getpid()), encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/automator/runs.py` around lines 47 - 49, The current write_pid()
implementation stores only os.getpid(), which can be reused by another process
and cause engine_alive() and stop_run() to act on the wrong process. Update the
PID_FILE format to include a unique engine identity token in addition to the
numeric PID, and have write_pid(), engine_alive(), and stop_run() read and
verify that token before treating the run as active or sending a stop signal.
Use a stable per-run marker such as a UUID or process start timestamp, and
reject stale PID files whose identity does not match the current engine
instance.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern. The pid <= 0 aspect is fixed in 6492cdf. The full process-identity verification (PID-reuse defense) is a pre-existing, multi-file change that re-introduces an acked /proc read and changes read_pid's signature across engine_alive/stop_run/tui + migration tolerance — out of scope for this release PR. Tracked as #12.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbean, thanks for the clarification. Agreed — the broader PID-reuse defense is a legitimate follow-up, but it sounds too cross-cutting for this release PR given the engine.pid format, read_pid() signature, migration tolerance, and TUI touchpoints. Good call to keep the narrower pid <= 0 fix here and track the full identity hardening separately in #12.

I won’t push for that larger change on this PR.

CodeRabbit (Major) on #11: a corrupt engine.pid read as 0 or negative would
make os.kill target a process group — for 0, the orchestrator's own group — so
terminate_pid could signal unintended processes. Guard pid <= 0 at the top of
both pid_alive() (-> False) and terminate_pid() (-> no-op) before os.kill is
reached. Adds tests/test_platform_util.py asserting os.kill is never called for
non-positive pids.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pbean

pbean commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Went through the latest CodeRabbit re-review (2 actionable, both Major, on the code the prior commit touched):

Fixed — 6492cdf

  • platform_util.py — reject non-positive PIDs in pid_alive() (→ False) and terminate_pid() (→ no-op) before os.kill is reached. A corrupt engine.pid read as 0 would otherwise make os.kill(0, …) target the orchestrator's own process group. Added tests/test_platform_util.py asserting os.kill is never invoked for pid <= 0.

Deferred — #12

  • "Don't treat a bare PID as engine identity" (PID reuse). Valid, but pre-existing (this PR only refactored the existence check) and the correct fix — persist a process start-time token, verify it in engine_alive/stop_run/tui — re-introduces an acked /proc read and changes read_pid's signature across ~6–8 files plus legacy-format tolerance. Out of scope for a release PR; tracked in engine.pid: verify process start-time identity to defend against PID reuse #12 with the full design. The destructive path is also bounded (same-UID, SIGTERM) and low-probability.

Full pytest (1047 passing) and trunk check (clean) green.

pbean and others added 2 commits June 25, 2026 09:20
…at the dev stage

Copilot fires agentStop for every subagent turn too — with an empty
transcriptPath and a tool-use session id, not the main session's turn-end.
With dev decoupled to bmad-dev-auto (which implements via subagents), that
premature Stop reached the dev stage, where 0 nudges made wait_for_completion
declare an outright stall before the skill wrote its terminal spec (same root
cause as the 0.6.4 review stall, now one stage earlier).

Add a per-profile subagent_stop_without_transcript flag (true for copilot):
a Stop carrying no transcript is treated as a subagent stop and skipped, so
the main session's real turn-end drives completion — and usage tallying is
restored, since that Stop carries the transcript.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document the genuinely new fixes made during the #11 review cycle that
hardened pre-existing behavior rather than just polishing the PR's own new
code: guard non-positive PIDs in pid_alive/terminate_pid, and route the
remaining os.kill(pid, 0) liveness checks through pid_alive (destructive on
Windows). The Copilot subagent agentStop fix is already recorded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pbean pbean merged commit dc8eaec into main Jun 25, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant