fix(story-automator): compute story test counts from JUnit truth#42
fix(story-automator): compute story test counts from JUnit truth#42Pawel-N-pl wants to merge 2 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
🤖 Augment PR SummarySummary: Adds a machine-computed “Test Counts” reconciliation step to the story automator, deriving counts directly from JUnit output to eliminate doc drift. Changes:
Technical Notes: The helper writes a single machine-owned 🤖 Was this summary useful? React with 👍 or 👎 |
| elif arg == "--since" and idx + 1 < len(args): | ||
| try: | ||
| since = float(args[idx + 1]) | ||
| except ValueError: |
There was a problem hiding this comment.
skills/bmad-story-automator/src/story_automator/commands/basic.py:438 — --since parse failures are silently treated as since=None, which can unintentionally bypass staleness gating and let a stale JUnit artifact be treated as fresh. That could reintroduce the count drift this PR is trying to prevent when callers pass a malformed timestamp.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| if fresh: # Tier 1: trust the artifact emitted by this dev run | ||
| source = "capture" | ||
| elif command: # Tier 2: deterministic floor — re-run and parse what it emits | ||
| resolved = command.replace("{junit}", str(junit_path)).replace("{story}", story_file.stem) |
There was a problem hiding this comment.
skills/bmad-story-automator/src/story_automator/commands/basic.py:472 — The {junit}/{story} substitutions are inserted into a bash -c string without any escaping, so paths containing spaces or shell metacharacters can cause the Tier-2 rerun command to misbehave (and then look like test_artifact_not_emitted). This can show up even with otherwise-correct config depending on where the repo lives on disk.
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
…tions Address Augment review on #42: - reject non-numeric --since instead of silently dropping the staleness gate (a stale artifact could otherwise pass as a fresh capture and re-drift) - shlex.quote the {junit}/{story} substitutions so paths with spaces or shell metacharacters don't break the Tier-2 rerun Part of #40
|
Thanks @augmentcode — both addressed in cc4935f:
Both covered by new tests in |
…tions Address Augment review on #42: - reject non-numeric --since instead of silently dropping the staleness gate (a stale artifact could otherwise pass as a fresh capture and re-drift) - shlex.quote the {junit}/{story} substitutions so paths with spaces or shell metacharacters don't break the Tier-2 rerun Part of #40
cc4935f to
7b50066
Compare
|
Force-pushed (rebase): rebased onto the repositioned #41 and moved the test-counts sync to done-close ( No logic change to |
bma-d
left a comment
There was a problem hiding this comment.
Thanks for the work here. I'm requesting changes because the new test-counts path can still record incorrect machine-owned counts and accepts a few config/path shapes that should fail closed.
| return counts | ||
|
|
||
|
|
||
| def _int_attr(value: str | None) -> int: |
There was a problem hiding this comment.
This treats present-but-invalid numeric attributes as 0, and negative values pass through unchanged. Since these counts become the machine-owned story record, malformed JUnit output can produce fabricated totals instead of failing closed.
Repro: parsing <testsuite tests="abc" failures="1" errors="0" skipped="0" assertions="x"/> returns tests: 0, and tests="-1" returns -1.
Suggested fix: require present count attributes to be non-negative integers, raise ValueError on malformed or negative values, and add tests for non-numeric, empty, and negative attrs.
| runtime = _expect_optional_dict(policy, "runtime") | ||
| _expect_optional_nested_dict(runtime, "merge", "runtime") | ||
| parser_runtime_config(policy) | ||
| test = _expect_optional_dict(policy, "test") |
There was a problem hiding this comment.
The new test policy block only type-checks command and junitPath when those exact keys are present. Unknown keys are accepted, so a typo like junit_path silently becomes junitPath: "" and disables JUnit capture instead of failing the policy.
Suggested fix: reject unknown keys inside test, then add a policy test for typo keys returning policy_invalid.
| if not junit_rel: # Tier 3: nothing configured — File List reconcile still ran independently | ||
| write_json({"ok": True, "skipped": True, "reason": "test_not_configured", "test_counts": None, "wrote": False}) | ||
| return 0 | ||
| junit_path = Path(repo) / junit_rel.replace("{story}", story_file.stem) |
There was a problem hiding this comment.
junitPath is joined directly to repo without rejecting absolute paths or parent traversal. In Python, an absolute junitPath discards the repo prefix, and ../outside.xml can escape the project, so the rerun path can read/write outside the repo.
Suggested fix: require a repo-relative path, resolve it, check it stays within the project root, and add tests for absolute and .. paths.
| return "\n".join(new_lines) + suffix | ||
|
|
||
|
|
||
| def cmd_test_counts(args: list[str]) -> int: |
There was a problem hiding this comment.
This adds the whole test-counts command flow to commands/basic.py, which is now 538 lines. The file already owns stop hooks, git/story helpers, session listing, file-list reconciliation, and now JUnit policy/rerun/story-section writing, so it is becoming a catch-all command module.
Suggested fix: move cmd_test_counts and its render/section helpers into a dedicated command module, keeping only the CLI registration in cli.py.
fix(story-automator): address maintainer review on bmad-code-org#38/bmad-code-org#39/bmad-code-org#41/bmad-code-org#42
…tions Address Augment review on bmad-code-org#42: - reject non-numeric --since instead of silently dropping the staleness gate (a stale artifact could otherwise pass as a fresh capture and re-drift) - shlex.quote the {junit}/{story} substitutions so paths with spaces or shell metacharacters don't break the Tier-2 rerun Part of bmad-code-org#40 (cherry picked from commit 7b50066)
What
New
test-countshelper + orchestration gate that computes a story's testcounts (tests/failures/errors/skipped, +assertions for PHPUnit) from a JUnit
report and writes a single machine-owned
### Test Countsblock — mirroring thePhase-1 File List reconcile.
Why
Second half of doc-drift (#40): the dev agent transcribes counts from memory and
they drift. Phase 1 fixed the File List; this makes counts machine-computed too.
Fix
core/junit.py: stack-agnostic JUnit parse (sums direct<testsuite>children to avoid phpunit double-count;
assertionsnullable).test-counts --repo --story [--since EPOCH] [--write]. Capture-first floor:--since) → parse it.test.commandconfigured → re-run it, parse what it emits.test {command, junitPath}policy block; read viaload_policy_unresolved(merge+validate, no sibling-skill resolution).{junit}/{story}placeholders. One command = one prod-like invocation.step-03-execute.mdnext to the Phase-1reconcile-story --write.data/prompts/dev.md: nudge to emit JUnit + keep counts out of prose(Tier-1 hit-rate only; Tier 2 is the correctness floor).
Notes
dev.json(the LLM session-parseschema) is deliberately left untouched so no count is ever LLM-sourced.
mainafter fix(story-automator): reconcile story File List against git before review #41 merges.Tests
tests/test_test_counts.py(19): JUnit parse edge cases, Tier 1/2/3 selection,--sincestaleness,{story}placeholder,--writeidempotency, error paths.Full suite green:
env -u AI_AGENT npm run verify(440 tests).Part of #40