fix(story-automator): compute story test counts from JUnit truth by Pawel-N-pl · Pull Request #42 · bmad-code-org/bmad-automator

Pawel-N-pl · 2026-06-07T12:49:57Z

What

New test-counts helper + orchestration gate that computes a story's test
counts (tests/failures/errors/skipped, +assertions for PHPUnit) from a JUnit
report and writes a single machine-owned ### Test Counts block — mirroring the
Phase-1 File List reconcile.

Why

Second half of doc-drift (#40): the dev agent transcribes counts from memory and
they drift. Phase 1 fixed the File List; this makes counts machine-computed too.

Fix

core/junit.py: stack-agnostic JUnit parse (sums direct <testsuite>
children to avoid phpunit double-count; assertions nullable).
test-counts --repo --story [--since EPOCH] [--write]. Capture-first floor:
- Tier 1: a JUnit artifact fresh since dev-start (--since) → parse it.
- Tier 2: else if test.command configured → re-run it, parse what it emits.
- Tier 3: else → skip with a logged reason (File List reconcile still runs).
New validated top-level test {command, junitPath} policy block; read via
load_policy_unresolved (merge+validate, no sibling-skill resolution).
{junit}/{story} placeholders. One command = one prod-like invocation.
Wired into step-03-execute.md next to the Phase-1 reconcile-story --write.
data/prompts/dev.md: nudge to emit JUnit + keep counts out of prose
(Tier-1 hit-rate only; Tier 2 is the correctness floor).

Notes

Counts live in ONE machine-owned section. dev.json (the LLM session-parse
schema) is deliberately left untouched so no count is ever LLM-sourced.
Stacked on fix(story-automator): reconcile story File List against git before review #41 (Phase 1); base retargets to main after fix(story-automator): reconcile story File List against git before review #41 merges.

Tests

tests/test_test_counts.py (19): JUnit parse edge cases, Tier 1/2/3 selection,
--since staleness, {story} placeholder, --write idempotency, error paths.
Full suite green: env -u AI_AGENT npm run verify (440 tests).

Part of #40

coderabbitai · 2026-06-07T12:50:04Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ac693f1-94cb-437e-8a87-4235bc4b3ba5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/dev-step-test-counts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Pawel-N-pl · 2026-06-07T12:50:51Z

@coderabbitai review

coderabbitai · 2026-06-07T12:50:56Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

augmentcode · 2026-06-07T12:53:41Z

🤖 Augment PR Summary

Summary: Adds a machine-computed “Test Counts” reconciliation step to the story automator, deriving counts directly from JUnit output to eliminate doc drift.

Changes:

Introduced a stack-agnostic JUnit parser (core/junit.py) that sums direct <testsuite> children (avoids PHPUnit double-counting) and optionally captures assertions.
Added a new test-counts CLI helper that selects counts via a tiered strategy: fresh artifact since dev start → rerun configured test command → skip with reason.
Added a validated top-level test policy block (command, junitPath) and a load_policy_unresolved reader to avoid requiring sibling-skill resolution.
Wired test-counts --since ... --write into step-03-execute.md alongside the existing File List reconcile.
Updated the dev prompt to prefer emitting JUnit and to keep counts out of story prose.
Added unit tests covering parsing edge cases, tier selection, staleness gating, and idempotent writes.

Technical Notes: The helper writes a single machine-owned ### Test Counts block in the story file; test counts are intentionally not sourced from LLM session parsing.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 2 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-06-07T12:53:42Z

+        elif arg == "--since" and idx + 1 < len(args):
+            try:
+                since = float(args[idx + 1])
+            except ValueError:


skills/bmad-story-automator/src/story_automator/commands/basic.py:438 — --since parse failures are silently treated as since=None, which can unintentionally bypass staleness gating and let a stale JUnit artifact be treated as fresh. That could reintroduce the count drift this PR is trying to prevent when callers pass a malformed timestamp.

Severity: medium

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

augmentcode · 2026-06-07T12:53:42Z

+    if fresh:  # Tier 1: trust the artifact emitted by this dev run
+        source = "capture"
+    elif command:  # Tier 2: deterministic floor — re-run and parse what it emits
+        resolved = command.replace("{junit}", str(junit_path)).replace("{story}", story_file.stem)


skills/bmad-story-automator/src/story_automator/commands/basic.py:472 — The {junit}/{story} substitutions are inserted into a bash -c string without any escaping, so paths containing spaces or shell metacharacters can cause the Tier-2 rerun command to misbehave (and then look like test_artifact_not_emitted). This can show up even with otherwise-correct config depending on where the repo lives on disk.

Severity: low

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

…tions Address Augment review on #42: - reject non-numeric --since instead of silently dropping the staleness gate (a stale artifact could otherwise pass as a fresh capture and re-drift) - shlex.quote the {junit}/{story} substitutions so paths with spaces or shell metacharacters don't break the Tier-2 rerun Part of #40

Pawel-N-pl · 2026-06-07T13:00:04Z

Thanks @augmentcode — both addressed in cc4935f:

--since validation (medium): a non-numeric --since now returns since_invalid (exit 1) instead of silently falling back to since=None. Dropping the staleness gate could let a stale JUnit pass as a fresh capture and re-introduce the very drift this PR prevents.
Unescaped substitutions (low): {junit}/{story} are now shlex.quote-d before going into bash -c, so paths with spaces/metacharacters no longer break the Tier-2 rerun. Placeholders must be left unquoted in the command template.

Both covered by new tests in tests/test_test_counts.py (test_invalid_since_is_an_error, test_rerun_handles_spaces_in_junit_path). Full suite green: env -u AI_AGENT npm run verify (442 tests).

Part of #40

…tions Address Augment review on #42: - reject non-numeric --since instead of silently dropping the staleness gate (a stale artifact could otherwise pass as a fresh capture and re-drift) - shlex.quote the {junit}/{story} substitutions so paths with spaces or shell metacharacters don't break the Tier-2 rerun Part of #40

Pawel-N-pl · 2026-06-07T15:37:42Z

Force-pushed (rebase): rebased onto the repositioned #41 and moved the test-counts sync to done-close (step-03b-execute-finish), right next to the File List reconcile — same rationale as #41 (auto re-runs the suite after dev, so a dev-close count is stale by review/commit time).

No logic change to test-counts itself; only its invocation point moved. --since "$dev_started" (still set in step-03 §B) keeps trusting the final artifact left by the latest run for the story — at done-close that's the post-auto suite, i.e. the count that actually gets committed. Stacked on #41.

bma-d

Thanks for the work here. I'm requesting changes because the new test-counts path can still record incorrect machine-owned counts and accepts a few config/path shapes that should fail closed.

bma-d · 2026-06-12T07:20:03Z

+    return counts
+
+
+def _int_attr(value: str | None) -> int:


This treats present-but-invalid numeric attributes as 0, and negative values pass through unchanged. Since these counts become the machine-owned story record, malformed JUnit output can produce fabricated totals instead of failing closed.

Repro: parsing <testsuite tests="abc" failures="1" errors="0" skipped="0" assertions="x"/> returns tests: 0, and tests="-1" returns -1.

Suggested fix: require present count attributes to be non-negative integers, raise ValueError on malformed or negative values, and add tests for non-numeric, empty, and negative attrs.

bma-d · 2026-06-12T07:20:03Z

    runtime = _expect_optional_dict(policy, "runtime")
    _expect_optional_nested_dict(runtime, "merge", "runtime")
    parser_runtime_config(policy)
+    test = _expect_optional_dict(policy, "test")


The new test policy block only type-checks command and junitPath when those exact keys are present. Unknown keys are accepted, so a typo like junit_path silently becomes junitPath: "" and disables JUnit capture instead of failing the policy.

Suggested fix: reject unknown keys inside test, then add a policy test for typo keys returning policy_invalid.

bma-d · 2026-06-12T07:20:03Z

+    if not junit_rel:  # Tier 3: nothing configured — File List reconcile still ran independently
+        write_json({"ok": True, "skipped": True, "reason": "test_not_configured", "test_counts": None, "wrote": False})
+        return 0
+    junit_path = Path(repo) / junit_rel.replace("{story}", story_file.stem)


junitPath is joined directly to repo without rejecting absolute paths or parent traversal. In Python, an absolute junitPath discards the repo prefix, and ../outside.xml can escape the project, so the rerun path can read/write outside the repo.

Suggested fix: require a repo-relative path, resolve it, check it stays within the project root, and add tests for absolute and .. paths.

bma-d · 2026-06-12T07:20:03Z

+    return "\n".join(new_lines) + suffix
+
+
+def cmd_test_counts(args: list[str]) -> int:


This adds the whole test-counts command flow to commands/basic.py, which is now 538 lines. The file already owns stop hooks, git/story helpers, session listing, file-list reconciliation, and now JUnit policy/rerun/story-section writing, so it is becoming a catch-all command module.

Suggested fix: move cmd_test_counts and its render/section helpers into a dedicated command module, keeping only the CLI registration in cli.py.

fix(story-automator): address maintainer review on bmad-code-org#38/bmad-code-org#39/bmad-code-org#41/bmad-code-org#42

…tions Address Augment review on bmad-code-org#42: - reject non-numeric --since instead of silently dropping the staleness gate (a stale artifact could otherwise pass as a fresh capture and re-drift) - shlex.quote the {junit}/{story} substitutions so paths with spaces or shell metacharacters don't break the Tier-2 rerun Part of bmad-code-org#40 (cherry picked from commit 7b50066)

Pawel-N-pl requested a review from bma-d as a code owner June 7, 2026 12:49

augmentcode Bot reviewed Jun 7, 2026

View reviewed changes

ashigit added 2 commits June 7, 2026 17:35

fix(story-automator): compute story test counts from JUnit truth

675ce7c

Part of #40

Pawel-N-pl force-pushed the fix/dev-step-test-counts branch from cc4935f to 7b50066 Compare June 7, 2026 15:37

This was referenced Jun 7, 2026

fix(story-automator): reconcile story File List against git before review #41

Open

feat(story-automator): story-structure completeness gate before done (AI-4.2 / TD-16) Pawel-N-pl/bmad-automator#3

Merged

bma-d requested changes Jun 12, 2026

View reviewed changes

Pawel-N-pl mentioned this pull request Jun 12, 2026

fix(story-automator): address maintainer review on #38/#39/#41/#42 Pawel-N-pl/bmad-automator#5

Merged

Pawel-N-pl added a commit to Pawel-N-pl/bmad-automator that referenced this pull request Jun 12, 2026

Merge pull request #5 from Pawel-N-pl/fix/brian-review-followups

cd7b494

fix(story-automator): address maintainer review on bmad-code-org#38/bmad-code-org#39/bmad-code-org#41/bmad-code-org#42

		return "\n".join(new_lines) + suffix


		def cmd_test_counts(args: list[str]) -> int:

Uh oh!

Conversation

Pawel-N-pl commented Jun 7, 2026

What

Why

Fix

Notes

Tests

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Pawel-N-pl commented Jun 7, 2026

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

augmentcode Bot commented Jun 7, 2026

Uh oh!

augmentcode Bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode Bot Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

augmentcode Bot Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Pawel-N-pl commented Jun 7, 2026

Uh oh!

Pawel-N-pl commented Jun 7, 2026

Uh oh!

bma-d left a comment

Choose a reason for hiding this comment

Uh oh!

bma-d Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

bma-d Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

bma-d Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

bma-d Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading

augmentcode Bot Jun 7, 2026 •

edited

Loading

augmentcode Bot Jun 7, 2026 •

edited

Loading