Skip to content

kaizen: AgentRunnerの失敗分類について、command_missing/timeout/rate_limitedの未網羅ケースを追加しました。 (#76)#86

Merged
s-hiraoku merged 5 commits into
mainfrom
kaizen/issue-76-add-missing-test-coverage-for-command-missing-ti
Jul 5, 2026
Merged

kaizen: AgentRunnerの失敗分類について、command_missing/timeout/rate_limitedの未網羅ケースを追加しました。 (#76)#86
s-hiraoku merged 5 commits into
mainfrom
kaizen/issue-76-add-missing-test-coverage-for-command-missing-ti

Conversation

@s-hiraoku

@s-hiraoku s-hiraoku commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

Closes #76

Summary

AgentRunnerの失敗分類について、command_missing/timeout/rate_limitedの未網羅ケースを追加しました。
fallbackOnが対象分類を含む場合のフォールバック成功と、含まない場合の停止を検証しています。

Builder notes

検証: npm test / npm run validate:json / 共有skillファイル存在確認がすべて成功。保護パスの変更なし。

Provider evidence:

  • codex: exitCode=0, status=selected, failureClass=none, fallbackReason=none, payloadSource=last-message
    Selected backend: codex
    Final payload source: last-message

Verification

  • npm test
  • npm run validate:json
  • test -f skills/gh-link-issue-pr/SKILL.md && test -f skills/kaizen-bug-router/SKILL.md && test -f skills/pr-guardian/SKILL.md

Verifier

verifier: open_pr_with_warning
summary: Open PR with warning and 2 should_fix item(s); risk is medium.
notes: evidence_grade=reported
risk=medium
confidence=60
should_fix=Verification output contains a non-blocking risk signal.; Verification output contains a non-blocking risk signal.

Kaizen risk policy

Verifier cleared PR with warning: Open PR with warning and 2 should_fix item(s); risk is medium.

Changed files: 1
Changed lines: 143

Summary by CodeRabbit

  • Tests
    • Expanded fallback coverage for failed command execution, timeouts, and rate-limiting scenarios.
    • Added validation for both allowed fallback handling and blocked failure outcomes, improving confidence in CLI behavior under common failure conditions.

@coderabbitai

coderabbitai Bot commented Jul 5, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@s-hiraoku, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 53 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Repository: kaizen-agents-org/coderabbit/.coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b4b8ac38-689d-4ae9-a639-fc5f9de5562c

📥 Commits

Reviewing files that changed from the base of the PR and between fdcdbcb and aba2d84.

📒 Files selected for processing (1)
  • test/agents/AgentRunner.test.ts
📝 Walkthrough

Walkthrough

This PR adds test coverage for three previously untested failure classification categories (command_missing, timeout, rate_limited) in the builder-agent test suite, including a shared fixture helper and CLI tests validating fallback behavior based on fallbackOn configuration.

Changes

Failure Classification Fallback Tests

Layer / File(s) Summary
Failure case matrix and fixture runner
test/builder-agent.test.js
Introduces failureClassificationCases covering command_missing (ENOENT, "command not found"), timeout (two raw variants), and rate_limited (429, rate limit reached, too many requests, quota exceeded), each with expected evidence regexes. Adds runFailureClassificationFixture to set up a temp workspace, optionally omit the provider command, run the CLI with a Hermes provider and fallbackOn list, and return parsed build-result.json, stdout, and any thrown error.
CLI tests validating classification-driven fallback
test/builder-agent.test.js
Adds tests iterating over failureClassificationCases to verify fixed outcome with fallback+selection evidence when fallbackOn includes the failure class, versus blocked outcome with non-zero exit and stopped evidence when fallbackOn is empty.

Estimated code review effort: 2 (Simple) | ~10 minutes

Possibly related PRs

  • kaizen-agents-org/builder-agent#30: Both PRs modify builder-agent fallback tests around fallbackOn-gated behavior for classified provider failures and recorded fallback/evidence fields.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title matches the main change: adding missing AgentRunner failure-classification test coverage for command_missing, timeout, and rate_limited cases.
Linked Issues check ✅ Passed The added tests cover the requested command_missing, timeout, and rate_limited variants and verify fallbackOn behavior in both fallback and stop cases.
Out of Scope Changes check ✅ Passed The changes stay within the test file and focus on the linked classification coverage without introducing unrelated code or public API changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kaizen/issue-76-add-missing-test-coverage-for-command-missing-ti

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 371d2fc43d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread test/builder-agent.test.js Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/builder-agent.test.js`:
- Around line 1072-1102: The two `it()` blocks in `builder-agent.test.js` repeat
the same `failureClassificationCases` loop and most assertions, so factor the
shared scaffolding into a helper near `runFailureClassificationFixture` or the
test block. Keep the helper focused on the common evidence checks and pass in
the expected outcome so the fallback-on and fallback-off cases only differ by
their specific assertions.
- Around line 1427-1431: The temp directory created in
runFailureClassificationFixture is never removed, so each fixture run leaks
files. Add cleanup for the mkdtemp-created dir in
runFailureClassificationFixture, ideally using a try/finally around the fixture
setup and execution so the directory is always deleted even if the test fails.
Use the existing runFailureClassificationFixture helper and its
dir/binDir/resultPath setup to locate the cleanup point.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: kaizen-agents-org/coderabbit/.coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 3c64e066-b193-4bea-b6c3-3729f195d549

📥 Commits

Reviewing files that changed from the base of the PR and between 185c9c6 and fdcdbcb.

📒 Files selected for processing (1)
  • test/builder-agent.test.js

Comment thread test/builder-agent.test.js Outdated
Comment thread test/builder-agent.test.js Outdated
s-hiraoku added 2 commits July 5, 2026 13:08
…issing-test-coverage-for-command-missing-ti

# Conflicts:
#	test/builder-agent.test.js
@s-hiraoku

Copy link
Copy Markdown
Contributor Author

pr-guardian follow-up completed.

  • Resolved the merge conflict by merging current main into this branch and moving the Add missing test coverage for command_missing/timeout/rate_limited failure classification #76 fallback classification coverage into the split TypeScript test suite.
  • Addressed the outstanding CodeRabbit threads: shared evidence assertions are factored through assertClassifiedProviderEvidence, and the fixture temp directory is cleaned up with try/finally + rm.
  • Verified locally with npm test and npm run validate:json.
  • GitHub checks are passing: CI test and CodeRabbit.
  • Review threads are resolved; PR state is CLEAN / MERGEABLE.

@s-hiraoku s-hiraoku merged commit 84a2967 into main Jul 5, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add missing test coverage for command_missing/timeout/rate_limited failure classification

1 participant