feat(core): report inconclusive status when all tests have execution errors#941
Merged
…errors

When all eval tests fail due to execution errors (e.g., misconfigured model, network failures), the run now reports INCONCLUSIVE instead of a misleading PASS/FAIL verdict.

- Exit code 2 for all-execution-error runs (distinct from exit 1 for threshold failures)
- CLI shows yellow INCONCLUSIVE verdict with clear messaging
- JUnit XML uses executionStatus to classify <error> vs <failure>, preventing double-counting of execution errors as failures

Closes #894

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying agentv with
| Latest commit: | c73afe6 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://7f811a05.agentv.pages.dev |
| Branch Preview URL: | https://feat-894-inconclusive-status.agentv.pages.dev |
Make the UAT requirement more prominent with a blocking warning, clearer red/green definitions, and explicit dependency on completing UAT before proceeding to later checklist steps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Manual Red/Green UAT

Scenario: Run eval with a nonexistent Azure deployment (

Red (main): misleading PASS with exit code 0

All tests errored, but the run reports PASS with exit code 0, so CI would treat this as success.

Green (this branch): clear INCONCLUSIVE with exit code 2

The run now reports INCONCLUSIVE with a distinct exit code 2, so CI can differentiate between threshold failure (exit 1), execution errors (exit 2), and success (exit 0).
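The three-way exit-code split described in this UAT can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `TestResult` shape, the `"execution_error"` status value, the `threshold` parameter, and the function names `computeVerdict` and `exitCodeFor` are all hypothetical. It also assumes that in a mixed run, errored tests are simply excluded from the score, which the PR text does not specify.

```typescript
// Hypothetical types; the real agentv result types may differ.
type ExecutionStatus = "ok" | "execution_error";

interface TestResult {
  name: string;
  executionStatus: ExecutionStatus;
  score: number; // 0..1, only meaningful when executionStatus is "ok"
}

type Verdict = "PASS" | "FAIL" | "INCONCLUSIVE";

function computeVerdict(results: TestResult[], threshold: number): Verdict {
  // every() is true for an empty array, so require at least one test.
  const allErrored =
    results.length > 0 &&
    results.every((r) => r.executionStatus === "execution_error");
  if (allErrored) {
    // No test was actually evaluated, so neither PASS nor FAIL is meaningful.
    return "INCONCLUSIVE";
  }

  // Assumption: errored tests are left out of the mean in mixed runs.
  const evaluated = results.filter((r) => r.executionStatus === "ok");
  const total = evaluated.reduce((sum, r) => sum + r.score, 0);
  const mean = total / Math.max(evaluated.length, 1);
  return mean >= threshold ? "PASS" : "FAIL";
}

function exitCodeFor(verdict: Verdict): number {
  switch (verdict) {
    case "PASS":
      return 0; // success
    case "FAIL":
      return 1; // threshold failure
    case "INCONCLUSIVE":
      return 2; // all tests had execution errors
  }
}
```

With this shape, a CI step can treat exit 2 as "fix the configuration and retry" rather than as a quality regression.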
Summary
When all eval tests fail due to execution errors (e.g., misconfigured model, network failures, auth errors), the run now reports a distinct INCONCLUSIVE status instead of a misleading PASS/FAIL verdict.
- RESULT: INCONCLUSIVE (all N test(s) had execution errors — no evaluation was performed)
- JUnit XML uses executionStatus to classify <error> vs <failure> elements, preventing execution errors from being double-counted as failures

Test plan

- … statistics-inconclusive.test.ts)
- … executionStatus

Closes #894
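To illustrate the JUnit point above, here is a minimal sketch of classifying each test case by its execution status so that an execution error is emitted as `<error>` and never also counted as a `<failure>`. The `JUnitCase` shape, the `"execution_error"` value, and the helper names are hypothetical and are not taken from the agentv source; only the `<testsuite>`, `<testcase>`, `<error>`, and `<failure>` element names and the `tests`, `errors`, and `failures` attributes follow the common JUnit XML convention.

```typescript
// Hypothetical input shape for illustration only.
interface JUnitCase {
  name: string;
  executionStatus: "ok" | "execution_error";
  passed: boolean;
  message?: string;
}

// Escape the characters that are unsafe inside XML attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderTestCase(c: JUnitCase): string {
  const open = `<testcase name="${escapeXml(c.name)}">`;
  const msg = escapeXml(c.message ?? "");
  // Check executionStatus first: an errored test is an <error>, not a <failure>.
  if (c.executionStatus === "execution_error") {
    return `${open}<error message="${msg}"/></testcase>`;
  }
  if (!c.passed) {
    return `${open}<failure message="${msg}"/></testcase>`;
  }
  return `${open}</testcase>`;
}

function renderSuite(name: string, cases: JUnitCase[]): string {
  const isError = (c: JUnitCase) => c.executionStatus === "execution_error";
  const errors = cases.filter(isError).length;
  // Failures exclude errored cases, so no case is counted in both buckets.
  const failures = cases.filter((c) => !isError(c) && !c.passed).length;
  const body = cases.map(renderTestCase).join("");
  return (
    `<testsuite name="${escapeXml(name)}" tests="${cases.length}" ` +
    `errors="${errors}" failures="${failures}">${body}</testsuite>`
  );
}
```

Checking the execution status before the pass/fail flag is what keeps the two counters disjoint, even when an errored test also carries `passed: false`.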
🤖 Generated with Claude Code