
feat(core): report inconclusive status when all tests have execution errors #941

Merged
christso merged 2 commits into main from feat/894-inconclusive-status on Apr 5, 2026

christso commented Apr 5, 2026

Summary

When all eval tests fail due to execution errors (e.g., misconfigured model, network failures, auth errors), the run now reports a distinct INCONCLUSIVE status instead of a misleading PASS/FAIL verdict.

  • Exit code 2 for all-execution-error runs (distinct from exit 1 for threshold failures), so CI workflows can tell the two cases apart
  • Yellow INCONCLUSIVE verdict in CLI output: RESULT: INCONCLUSIVE (all N test(s) had execution errors — no evaluation was performed)
  • JUnit XML uses executionStatus to classify <error> vs <failure> elements, preventing execution errors from being double-counted as failures
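
The verdict and exit-code rules above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: only the `executionStatus` field name, the three exit codes, and the INCONCLUSIVE message come from the description. The `TestResult` and `RunSummary` shapes, the status values, the `summarizeRun` name, and the PASS/FAIL rule for runs that are not all-error are assumptions.

```typescript
// Assumed status values; the PR text only names the `executionStatus` field.
type ExecutionStatus = "ok" | "execution_error";

// Hypothetical result shape for illustration.
interface TestResult {
  id: string;
  score: number;
  executionStatus: ExecutionStatus;
}

interface RunSummary {
  verdict: "PASS" | "FAIL" | "INCONCLUSIVE";
  exitCode: 0 | 1 | 2;
  message: string;
}

function summarizeRun(results: TestResult[], threshold: number): RunSummary {
  const scored = results.filter((r) => r.executionStatus === "ok");
  const errorCount = results.length - scored.length;

  // Every test hit an execution error: nothing was evaluated,
  // so neither PASS nor FAIL would be meaningful.
  if (results.length > 0 && errorCount === results.length) {
    return {
      verdict: "INCONCLUSIVE",
      exitCode: 2,
      message: `RESULT: INCONCLUSIVE (all ${results.length} test(s) had execution errors — no evaluation was performed)`,
    };
  }

  // Assumed stand-in rule: PASS only if every scored test meets the threshold.
  const passed = scored.filter((r) => r.score >= threshold).length;
  const ok = passed === scored.length;
  return {
    verdict: ok ? "PASS" : "FAIL",
    exitCode: ok ? 0 : 1,
    message: `RESULT: ${ok ? "PASS" : "FAIL"} (${passed}/${scored.length} scored >= ${threshold})`,
  };
}
```

With two results that both carry `executionStatus: "execution_error"`, this sketch returns the INCONCLUSIVE verdict with exit code 2 instead of falling through to a vacuous `0/0` PASS.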

Test plan

  • New unit tests for INCONCLUSIVE verdict formatting (statistics-inconclusive.test.ts)
  • New unit tests for JUnit error/failure classification by executionStatus
  • Existing tests updated and passing
  • Typecheck passes
  • Lint passes
  • Manual red/green UAT with a misconfigured model target
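
For orientation, the JUnit classification exercised by these tests might look like the sketch below. It is a minimal illustration under stated assumptions, not the reporter shipped in this PR: the `executionStatus` field name is from the description, while the status values, the `TestResult` shape, the threshold rule, and the helper names (`escapeXml`, `toJUnitTestcase`, `toJUnitSuite`) are hypothetical.

```typescript
// Assumed status values; only the field name `executionStatus` is from the PR.
type ExecutionStatus = "ok" | "execution_error";

// Hypothetical result shape for illustration.
interface TestResult {
  id: string;
  score: number;
  executionStatus: ExecutionStatus;
  error?: string;
}

// Escape characters that are unsafe inside XML attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toJUnitTestcase(r: TestResult, threshold: number): string {
  const name = escapeXml(r.id);
  // Execution errors become <error>, never <failure>, so they are counted once.
  if (r.executionStatus === "execution_error") {
    const msg = escapeXml(r.error ?? "execution error");
    return `<testcase name="${name}"><error message="${msg}"/></testcase>`;
  }
  // A test that ran but scored below the threshold is a genuine <failure>.
  if (r.score < threshold) {
    return `<testcase name="${name}"><failure message="score ${r.score} below threshold ${threshold}"/></testcase>`;
  }
  return `<testcase name="${name}"/>`;
}

function toJUnitSuite(suite: string, results: TestResult[], threshold: number): string {
  const errors = results.filter((r) => r.executionStatus === "execution_error").length;
  const failures = results.filter(
    (r) => r.executionStatus === "ok" && r.score < threshold,
  ).length;
  const cases = results.map((r) => toJUnitTestcase(r, threshold));
  return [
    `<testsuite name="${escapeXml(suite)}" tests="${results.length}" failures="${failures}" errors="${errors}">`,
    ...cases,
    `</testsuite>`,
  ].join("\n");
}
```

Keeping the `errors` and `failures` counters disjoint is what prevents an errored test from also showing up as a failure in CI dashboards.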

Closes #894

🤖 Generated with Claude Code

…errors

When all eval tests fail due to execution errors (e.g., misconfigured
model, network failures), the run now reports INCONCLUSIVE instead of
a misleading PASS/FAIL verdict.

- Exit code 2 for all-execution-error runs (distinct from exit 1 for
  threshold failures)
- CLI shows yellow INCONCLUSIVE verdict with clear messaging
- JUnit XML uses executionStatus to classify <error> vs <failure>,
  preventing double-counting of execution errors as failures

Closes #894

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Apr 5, 2026

Deploying agentv with Cloudflare Pages

Latest commit: c73afe6
Status: ✅  Deploy successful!
Preview URL: https://7f811a05.agentv.pages.dev
Branch Preview URL: https://feat-894-inconclusive-status.agentv.pages.dev


Make the UAT requirement more prominent with a blocking warning,
clearer red/green definitions, and explicit dependency on completing
UAT before proceeding to later checklist steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

christso commented Apr 5, 2026

Manual Red/Green UAT

Scenario: Run eval with a nonexistent Azure deployment (AZURE_DEPLOYMENT_NAME=nonexistent-model-xyz-404) so all tests produce execution errors.

Red (main) — misleading PASS with exit code 0

2/2   ❌ test-1 | azure | ERROR: The API deployment for this resource does not exist...
2/2   ❌ test-2 | azure | ERROR: The API deployment for this resource does not exist...

RESULT: PASS  (0/0 scored >= 0.8, mean: 0.000)

EXIT CODE: 0

All tests errored, but the run reports PASS with exit code 0 — CI would treat this as success.

Green (this branch) — clear INCONCLUSIVE with exit code 2

2/2   ❌ test-1 | azure | ERROR: The API deployment for this resource does not exist...
2/2   ❌ test-2 | azure | ERROR: The API deployment for this resource does not exist...

RESULT: INCONCLUSIVE  (all 2 test(s) had execution errors — no evaluation was performed)

EXIT CODE: 2

Now reports INCONCLUSIVE with distinct exit code 2, so CI can differentiate between threshold failure (exit 1), execution errors (exit 2), and success (exit 0).
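
A CI wrapper could branch on these three exit codes along the following lines. This is a hedged sketch: the mapping of 0, 1, and 2 comes from this comment, while the `RunOutcome` type, the function names, the treatment of other codes as `"unknown"`, and the retry policy are hypothetical illustrations rather than anything the tool provides.

```typescript
// Hypothetical outcome labels for a CI script; not part of the tool's API.
type RunOutcome = "success" | "threshold_failure" | "inconclusive" | "unknown";

// Map the process exit code to an outcome, following the codes described above.
function interpretExitCode(code: number): RunOutcome {
  switch (code) {
    case 0:
      return "success";
    case 1:
      return "threshold_failure"; // tests ran, scores fell below the threshold
    case 2:
      return "inconclusive"; // all tests had execution errors
    default:
      return "unknown"; // assumption: any other code is treated as unexpected
  }
}

// Assumed CI policy: only an inconclusive run is worth retrying, since it
// usually points at configuration or infrastructure rather than quality.
function shouldRetry(code: number): boolean {
  return interpretExitCode(code) === "inconclusive";
}
```

A pipeline might, for example, fail the build on `threshold_failure` but flag `inconclusive` runs for an infrastructure check such as a missing deployment or expired credentials.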

christso marked this pull request as ready for review April 5, 2026 07:41
christso merged commit f22a2ad into main Apr 5, 2026
4 checks passed
christso deleted the feat/894-inconclusive-status branch April 5, 2026 07:49
