fix(stats): include execution errors in pass/total denominator #999

Merged
christso merged 1 commit into main from fix/998-stat-denominator
Apr 9, 2026


@christso christso commented Apr 9, 2026

Summary

  • Changes the X / Y stat in the final eval summary from passedCount / gradedCount to passedCount / total
  • gradedCount previously excluded execution errors, making results look better than they were
  • Aligns with the Convex Evals convention that inspired AgentV's design: denominator = all tests actually attempted
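A minimal sketch of the counting change, assuming a hypothetical `TestResult` shape and `countResults` helper (neither is taken from AgentV's code; only `passedCount`, `gradedCount`, `total`, and the 0.8 threshold come from this PR):

```typescript
// Hypothetical result shape for illustration; AgentV's real types may differ.
type TestResult =
  | { kind: "graded"; score: number }
  | { kind: "executionError"; message: string };

interface Counts {
  passedCount: number; // graded tests scoring at or above the threshold
  gradedCount: number; // tests that reached the grading stage
  total: number; // every test attempted, including execution errors
}

function countResults(results: TestResult[], threshold = 0.8): Counts {
  let passedCount = 0;
  let gradedCount = 0;
  for (const r of results) {
    if (r.kind === "graded") {
      gradedCount += 1;
      if (r.score >= threshold) passedCount += 1;
    }
  }
  // The old denominator was gradedCount; the new one is total.
  return { passedCount, gradedCount, total: results.length };
}
```

With this shape, an execution error never adds to `passedCount` or `gradedCount`, but it still adds to `total`, so it stays visible in the fraction.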

Before / After

Before: 2 of 23 tests crash before grading → RESULT: PASS (7/21 scored >= 0.8, mean: 0.900)

After: RESULT: PASS (7/23 scored >= 0.8, mean: 0.900)

Execution errors are still reported separately in the detail section — they just no longer disappear from the fraction.
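The before/after arithmetic can be reproduced with a small sketch. `formatFraction` and `passRatePercent` are hypothetical helpers written for this illustration, not AgentV functions; the numbers are the ones quoted above.

```typescript
// Hypothetical helper: renders the parenthesised part of the RESULT line.
function formatFraction(
  passedCount: number,
  denominator: number,
  threshold: number,
  mean: number,
): string {
  return `(${passedCount}/${denominator} scored >= ${threshold}, mean: ${mean.toFixed(3)})`;
}

// Hypothetical helper: pass rate as a percentage with one decimal place.
function passRatePercent(passedCount: number, denominator: number): string {
  return ((100 * passedCount) / denominator).toFixed(1);
}

// 23 tests attempted, 2 crashed before grading, 7 scored >= 0.8.
const before = formatFraction(7, 21, 0.8, 0.9); // denominator = gradedCount
const after = formatFraction(7, 23, 0.8, 0.9); // denominator = total

console.log(before); // (7/21 scored >= 0.8, mean: 0.900)
console.log(after); // (7/23 scored >= 0.8, mean: 0.900)
console.log(passRatePercent(7, 21)); // 33.3
console.log(passRatePercent(7, 23)); // 30.4
```

Excluding the two crashed tests overstates the pass rate by roughly three percentage points in this example.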

Test plan

  • All existing unit tests pass (no assertion changes needed — tests check PASS/FAIL/INCONCLUSIVE verdict, not the exact fraction string)
  • Pre-push hooks green (build, typecheck, lint, test, validate:examples)

Closes #998

Previously the summary showed passedCount/gradedCount where gradedCount
excluded execution errors, making results appear better than they were.
Now shows passedCount/total to match the Convex Evals convention that
inspired this design: the denominator is all tests actually attempted,
not just those that reached the grading stage.

Execution errors are still reported separately in the detail section.

Closes #998
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: 9f99201
Status: ✅  Deploy successful!
Preview URL: https://843cfe2a.agentv.pages.dev
Branch Preview URL: https://fix-998-stat-denominator.agentv.pages.dev

@christso christso merged commit d414916 into main Apr 9, 2026
4 checks passed
@christso christso deleted the fix/998-stat-denominator branch April 9, 2026 04:13
Development

Successfully merging this pull request may close these issues.

fix(stats): align pass/total denominator with Convex Evals convention