Skip to content

fix(story-automator): compute story test counts from JUnit truth#42

Open
Pawel-N-pl wants to merge 2 commits into
fix/dev-step-filelist-reconcilefrom
fix/dev-step-test-counts
Open

fix(story-automator): compute story test counts from JUnit truth#42
Pawel-N-pl wants to merge 2 commits into
fix/dev-step-filelist-reconcilefrom
fix/dev-step-test-counts

Conversation

@Pawel-N-pl

Copy link
Copy Markdown

What

New test-counts helper + orchestration gate that computes a story's test
counts (tests/failures/errors/skipped, +assertions for PHPUnit) from a JUnit
report and writes a single machine-owned ### Test Counts block — mirroring the
Phase-1 File List reconcile.

Why

Second half of doc-drift (#40): the dev agent transcribes counts from memory and
they drift. Phase 1 fixed the File List; this makes counts machine-computed too.

Fix

  • core/junit.py: stack-agnostic JUnit parse (sums direct <testsuite>
    children to avoid phpunit double-count; assertions nullable).
  • test-counts --repo --story [--since EPOCH] [--write]. Capture-first floor:
    • Tier 1: a JUnit artifact fresh since dev-start (--since) → parse it.
    • Tier 2: else if test.command configured → re-run it, parse what it emits.
    • Tier 3: else → skip with a logged reason (File List reconcile still runs).
  • New validated top-level test {command, junitPath} policy block; read via
    load_policy_unresolved (merge+validate, no sibling-skill resolution).
    {junit}/{story} placeholders. One command = one prod-like invocation.
  • Wired into step-03-execute.md next to the Phase-1 reconcile-story --write.
  • data/prompts/dev.md: nudge to emit JUnit + keep counts out of prose
    (Tier-1 hit-rate only; Tier 2 is the correctness floor).

Notes

Tests

tests/test_test_counts.py (19): JUnit parse edge cases, Tier 1/2/3 selection,
--since staleness, {story} placeholder, --write idempotency, error paths.
Full suite green: env -u AI_AGENT npm run verify (440 tests).

Part of #40

@Pawel-N-pl Pawel-N-pl requested a review from bma-d as a code owner June 7, 2026 12:49
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ac693f1-94cb-437e-8a87-4235bc4b3ba5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/dev-step-test-counts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Pawel-N-pl

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@augmentcode

augmentcode Bot commented Jun 7, 2026

Copy link
Copy Markdown
🤖 Augment PR Summary

Summary: Adds a machine-computed “Test Counts” reconciliation step to the story automator, deriving counts directly from JUnit output to eliminate doc drift.

Changes:

  • Introduced a stack-agnostic JUnit parser (core/junit.py) that sums direct <testsuite> children (avoids PHPUnit double-counting) and optionally captures assertions.
  • Added a new test-counts CLI helper that selects counts via a tiered strategy: fresh artifact since dev start → rerun configured test command → skip with reason.
  • Added a validated top-level test policy block (command, junitPath) and a load_policy_unresolved reader to avoid requiring sibling-skill resolution.
  • Wired test-counts --since ... --write into step-03-execute.md alongside the existing File List reconcile.
  • Updated the dev prompt to prefer emitting JUnit and to keep counts out of story prose.
  • Added unit tests covering parsing edge cases, tier selection, staleness gating, and idempotent writes.

Technical Notes: The helper writes a single machine-owned ### Test Counts block in the story file; test counts are intentionally not sourced from LLM session parsing.

🤖 Was this summary useful? React with 👍 or 👎

@augmentcode augmentcode Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

elif arg == "--since" and idx + 1 < len(args):
try:
since = float(args[idx + 1])
except ValueError:

@augmentcode augmentcode Bot Jun 7, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skills/bmad-story-automator/src/story_automator/commands/basic.py:438 — --since parse failures are silently treated as since=None, which can unintentionally bypass staleness gating and let a stale JUnit artifact be treated as fresh. That could reintroduce the count drift this PR is trying to prevent when callers pass a malformed timestamp.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

if fresh: # Tier 1: trust the artifact emitted by this dev run
source = "capture"
elif command: # Tier 2: deterministic floor — re-run and parse what it emits
resolved = command.replace("{junit}", str(junit_path)).replace("{story}", story_file.stem)

@augmentcode augmentcode Bot Jun 7, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skills/bmad-story-automator/src/story_automator/commands/basic.py:472 — The {junit}/{story} substitutions are inserted into a bash -c string without any escaping, so paths containing spaces or shell metacharacters can cause the Tier-2 rerun command to misbehave (and then look like test_artifact_not_emitted). This can show up even with otherwise-correct config depending on where the repo lives on disk.

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Pawel-N-pl pushed a commit that referenced this pull request Jun 7, 2026
…tions

Address Augment review on #42:
- reject non-numeric --since instead of silently dropping the staleness gate
  (a stale artifact could otherwise pass as a fresh capture and re-drift)
- shlex.quote the {junit}/{story} substitutions so paths with spaces or shell
  metacharacters don't break the Tier-2 rerun

Part of #40
@Pawel-N-pl

Copy link
Copy Markdown
Author

Thanks @augmentcode — both addressed in cc4935f:

  • --since validation (medium): a non-numeric --since now returns since_invalid (exit 1) instead of silently falling back to since=None. Dropping the staleness gate could let a stale JUnit pass as a fresh capture and re-introduce the very drift this PR prevents.
  • Unescaped substitutions (low): {junit}/{story} are now shlex.quote-d before going into bash -c, so paths with spaces/metacharacters no longer break the Tier-2 rerun. Placeholders must be left unquoted in the command template.

Both covered by new tests in tests/test_test_counts.py (test_invalid_since_is_an_error, test_rerun_handles_spaces_in_junit_path). Full suite green: env -u AI_AGENT npm run verify (442 tests).

ashigit added 2 commits June 7, 2026 17:35
…tions

Address Augment review on #42:
- reject non-numeric --since instead of silently dropping the staleness gate
  (a stale artifact could otherwise pass as a fresh capture and re-drift)
- shlex.quote the {junit}/{story} substitutions so paths with spaces or shell
  metacharacters don't break the Tier-2 rerun

Part of #40
@Pawel-N-pl Pawel-N-pl force-pushed the fix/dev-step-test-counts branch from cc4935f to 7b50066 Compare June 7, 2026 15:37
@Pawel-N-pl

Copy link
Copy Markdown
Author

Force-pushed (rebase): rebased onto the repositioned #41 and moved the test-counts sync to done-close (step-03b-execute-finish), right next to the File List reconcile — same rationale as #41 (auto re-runs the suite after dev, so a dev-close count is stale by review/commit time).

No logic change to test-counts itself; only its invocation point moved. --since "$dev_started" (still set in step-03 §B) keeps trusting the final artifact left by the latest run for the story — at done-close that's the post-auto suite, i.e. the count that actually gets committed. Stacked on #41.

@bma-d bma-d left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work here. I'm requesting changes because the new test-counts path can still record incorrect machine-owned counts and accepts a few config/path shapes that should fail closed.

return counts


def _int_attr(value: str | None) -> int:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This treats present-but-invalid numeric attributes as 0, and negative values pass through unchanged. Since these counts become the machine-owned story record, malformed JUnit output can produce fabricated totals instead of failing closed.

Repro: parsing <testsuite tests="abc" failures="1" errors="0" skipped="0" assertions="x"/> returns tests: 0, and tests="-1" returns -1.

Suggested fix: require present count attributes to be non-negative integers, raise ValueError on malformed or negative values, and add tests for non-numeric, empty, and negative attrs.

runtime = _expect_optional_dict(policy, "runtime")
_expect_optional_nested_dict(runtime, "merge", "runtime")
parser_runtime_config(policy)
test = _expect_optional_dict(policy, "test")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test policy block only type-checks command and junitPath when those exact keys are present. Unknown keys are accepted, so a typo like junit_path silently becomes junitPath: "" and disables JUnit capture instead of failing the policy.

Suggested fix: reject unknown keys inside test, then add a policy test for typo keys returning policy_invalid.

if not junit_rel: # Tier 3: nothing configured — File List reconcile still ran independently
write_json({"ok": True, "skipped": True, "reason": "test_not_configured", "test_counts": None, "wrote": False})
return 0
junit_path = Path(repo) / junit_rel.replace("{story}", story_file.stem)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

junitPath is joined directly to repo without rejecting absolute paths or parent traversal. In Python, an absolute junitPath discards the repo prefix, and ../outside.xml can escape the project, so the rerun path can read/write outside the repo.

Suggested fix: require a repo-relative path, resolve it, check it stays within the project root, and add tests for absolute and .. paths.

return "\n".join(new_lines) + suffix


def cmd_test_counts(args: list[str]) -> int:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds the whole test-counts command flow to commands/basic.py, which is now 538 lines. The file already owns stop hooks, git/story helpers, session listing, file-list reconciliation, and now JUnit policy/rerun/story-section writing, so it is becoming a catch-all command module.

Suggested fix: move cmd_test_counts and its render/section helpers into a dedicated command module, keeping only the CLI registration in cli.py.

Pawel-N-pl added a commit to Pawel-N-pl/bmad-automator that referenced this pull request Jun 12, 2026
mira5557373 pushed a commit to mira5557373/bmad-automator that referenced this pull request Jun 23, 2026
…tions

Address Augment review on bmad-code-org#42:
- reject non-numeric --since instead of silently dropping the staleness gate
  (a stale artifact could otherwise pass as a fresh capture and re-drift)
- shlex.quote the {junit}/{story} substitutions so paths with spaces or shell
  metacharacters don't break the Tier-2 rerun

Part of bmad-code-org#40

(cherry picked from commit 7b50066)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants