feat(core): report inconclusive status when all tests have execution errors by christso · Pull Request #941 · EntityProcess/agentv

christso · 2026-04-05T07:22:06Z

Summary

When all eval tests fail due to execution errors (e.g., misconfigured model, network failures, auth errors), the run now reports a distinct INCONCLUSIVE status instead of a misleading PASS/FAIL verdict.

- Exit code 2 for all-execution-error runs (distinct from exit 1 for threshold failures), so CI workflows can tell the two apart
- Yellow INCONCLUSIVE verdict in CLI output: RESULT: INCONCLUSIVE (all N test(s) had execution errors — no evaluation was performed)
- JUnit XML uses executionStatus to classify <error> vs <failure> elements, preventing execution errors from being double-counted as failures
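
A minimal sketch of the verdict and exit-code rule described above. The type names, the `executionStatus` values, and the `summarize` helper are hypothetical and chosen for illustration; the PR's actual implementation may differ. It also assumes that a run passes when every scored test meets the threshold, which this PR does not spell out.

```typescript
// Hypothetical names throughout; only the exit codes (0, 1, 2) and the
// INCONCLUSIVE verdict come from the PR description.
type ExecutionStatus = "ok" | "execution_error";

interface TestResult {
  id: string;
  executionStatus: ExecutionStatus;
  score: number; // only meaningful when executionStatus is "ok"
}

type Verdict = "PASS" | "FAIL" | "INCONCLUSIVE";

interface RunSummary {
  verdict: Verdict;
  exitCode: 0 | 1 | 2;
}

function summarize(results: TestResult[], threshold: number): RunSummary {
  const scored = results.filter((r) => r.executionStatus === "ok");

  // Every test errored before evaluation: nothing was scored, so neither
  // PASS nor FAIL is a meaningful verdict.
  if (results.length > 0 && scored.length === 0) {
    return { verdict: "INCONCLUSIVE", exitCode: 2 };
  }

  // Assumption: the verdict is based on scored tests only.
  const allPass = scored.every((r) => r.score >= threshold);
  return allPass
    ? { verdict: "PASS", exitCode: 0 }
    : { verdict: "FAIL", exitCode: 1 };
}
```

Checking for the all-errors case before computing the pass rate is what keeps an empty set of scored tests from being read as a vacuous pass.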

Test plan

- New unit tests for INCONCLUSIVE verdict formatting (statistics-inconclusive.test.ts)
- New unit tests for JUnit error/failure classification by executionStatus
- Existing tests updated and passing
- Typecheck passes
- Lint passes
- Manual red/green UAT with a misconfigured model target
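
The JUnit classification covered by the test plan could look roughly like the sketch below. The `CaseResult` shape, the `executionStatus` values, and the `toJUnit` function are assumptions made for illustration, not the project's actual writer. The `tests`, `failures`, and `errors` attributes and the `<error>`/`<failure>` child elements follow the common JUnit XML convention.

```typescript
// Hypothetical types; only the use of executionStatus to choose between
// <error> and <failure> is taken from the PR description.
type ExecutionStatus = "ok" | "execution_error";

interface CaseResult {
  name: string;
  executionStatus: ExecutionStatus;
  passed: boolean; // only meaningful when executionStatus is "ok"
  message?: string;
}

function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toJUnit(suiteName: string, cases: CaseResult[]): string {
  // An execution error counts once, as an error, and never as a failure.
  const errors = cases.filter((c) => c.executionStatus === "execution_error").length;
  const failures = cases.filter((c) => c.executionStatus === "ok" && !c.passed).length;

  const body = cases.map((c) => {
    const name = escapeXml(c.name);
    const msg = escapeXml(c.message ?? "");
    if (c.executionStatus === "execution_error") {
      return `  <testcase name="${name}"><error message="${msg}"/></testcase>`;
    }
    if (!c.passed) {
      return `  <testcase name="${name}"><failure message="${msg}"/></testcase>`;
    }
    return `  <testcase name="${name}"/>`;
  });

  const open =
    `<testsuite name="${escapeXml(suiteName)}" tests="${cases.length}" ` +
    `failures="${failures}" errors="${errors}">`;
  return [open, ...body, "</testsuite>"].join("\n");
}
```

Checking `executionStatus` before `passed` is the design point here: an errored test typically also has a failing score, so testing `passed` first would report it as a failure.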

Closes #894

🤖 Generated with Claude Code

…errors

When all eval tests fail due to execution errors (e.g., misconfigured model, network failures), the run now reports INCONCLUSIVE instead of a misleading PASS/FAIL verdict.

- Exit code 2 for all-execution-error runs (distinct from exit 1 for threshold failures)
- CLI shows yellow INCONCLUSIVE verdict with clear messaging
- JUnit XML uses executionStatus to classify <error> vs <failure>, preventing double-counting of execution errors as failures

Closes #894

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-04-05T07:22:52Z

Deploying agentv with Cloudflare Pages

Latest commit:	`c73afe6`
Status:	✅ Deploy successful!
Preview URL:	https://7f811a05.agentv.pages.dev
Branch Preview URL:	https://feat-894-inconclusive-status.agentv.pages.dev


Make the UAT requirement more prominent with a blocking warning, clearer red/green definitions, and explicit dependency on completing UAT before proceeding to later checklist steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

christso · 2026-04-05T07:41:23Z

Manual Red/Green UAT

Scenario: Run eval with a nonexistent Azure deployment (AZURE_DEPLOYMENT_NAME=nonexistent-model-xyz-404) so all tests produce execution errors.

Red (main) — misleading PASS with exit code 0

2/2   ❌ test-1 | azure | ERROR: The API deployment for this resource does not exist...
2/2   ❌ test-2 | azure | ERROR: The API deployment for this resource does not exist...

RESULT: PASS  (0/0 scored >= 0.8, mean: 0.000)

EXIT CODE: 0

All tests errored, but the run reports PASS with exit code 0 — CI would treat this as success.

Green (this branch) — clear INCONCLUSIVE with exit code 2

2/2   ❌ test-1 | azure | ERROR: The API deployment for this resource does not exist...
2/2   ❌ test-2 | azure | ERROR: The API deployment for this resource does not exist...

RESULT: INCONCLUSIVE  (all 2 test(s) had execution errors — no evaluation was performed)

EXIT CODE: 2

Now reports INCONCLUSIVE with distinct exit code 2, so CI can differentiate between threshold failure (exit 1), execution errors (exit 2), and success (exit 0).
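
On the CI side, a wrapper script could branch on the three exit codes along these lines. This is a sketch only: `classifyExitCode` and `shouldRetry` are hypothetical helpers, and retrying an inconclusive run is an illustrative policy choice rather than anything this PR prescribes.

```typescript
// Exit-code meanings (0, 1, 2) come from the PR; everything else is assumed.
type RunOutcome = "success" | "threshold_failure" | "inconclusive" | "unknown";

function classifyExitCode(code: number): RunOutcome {
  switch (code) {
    case 0:
      return "success"; // evaluation ran and met the threshold
    case 1:
      return "threshold_failure"; // evaluation ran but scores fell short
    case 2:
      return "inconclusive"; // every test hit an execution error
    default:
      return "unknown"; // not one of the codes described in this PR
  }
}

// Illustrative policy: only an inconclusive run is worth retrying, since a
// threshold failure reflects real scores and a retry would not fix it.
function shouldRetry(code: number): boolean {
  return classifyExitCode(code) === "inconclusive";
}
```

A pipeline might, for example, mark the job as failed on `threshold_failure` but flag `inconclusive` as an infrastructure problem to investigate.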

christso marked this pull request as ready for review April 5, 2026 07:41

christso merged commit f22a2ad into main Apr 5, 2026
4 checks passed

christso deleted the feat/894-inconclusive-status branch April 5, 2026 07:49

christso merged 2 commits into main from feat/894-inconclusive-status
