feat(compare): add normalized gain metric #1101

Merged
christso merged 3 commits into main from feat/1100-normalized-gain on Apr 15, 2026

Conversation

christso (Collaborator) commented Apr 14, 2026

Summary

Adds Hake's normalized gain (`g`) to `agentv compare` output, measuring improvement relative to remaining headroom.

The metric

```
g = (score_candidate − score_baseline) / (1 − score_baseline)
```

Raw delta (`Δ`) tells you how much scores changed. Normalized gain tells you how much of the available improvement was captured:

| Baseline | Candidate | Δ     | g     | Interpretation                             |
|----------|-----------|-------|-------|--------------------------------------------|
| 0.10     | 0.55      | +0.45 | 0.50  | Captured 50% of remaining headroom         |
| 0.90     | 0.95      | +0.05 | 0.50  | Same proportional gain, despite smaller Δ  |
| 0.50     | 0.25      | −0.25 | −0.50 | Regression: lost 50% of headroom           |
| 1.00     | 1.00      | 0.00  | null  | No headroom, metric undefined              |

Returns `null` when baseline is already 1.0 (perfect score). Null values are excluded from mean computation.
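
In TypeScript terms, a minimal sketch of that behavior (the real `computeNormalizedGain()` lives in `apps/cli/src/commands/compare/index.ts` and may differ in detail; the `meanNormalizedGain` computation below is a guess at the shape behind the summary field of the same name):

```ts
// Minimal sketch: not the actual implementation from
// apps/cli/src/commands/compare/index.ts, which may differ in detail.
function computeNormalizedGain(
  baseline: number,
  candidate: number,
): number | null {
  // A baseline at the ceiling leaves no headroom, so g is undefined.
  if (baseline >= 1) return null;
  return (candidate - baseline) / (1 - baseline);
}

// Guess at the null-skipping mean behind the meanNormalizedGain summary field.
function meanNormalizedGain(gains: Array<number | null>): number | null {
  const defined = gains.filter((g): g is number => g !== null);
  if (defined.length === 0) return null;
  return defined.reduce((sum, g) => sum + g, 0) / defined.length;
}
```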

Where it appears

  • Table output: `g: +0.256` in summary line
  • Matrix pairwise: `g +0.256` alongside existing `Δ`
  • JSON output: `mean_normalized_gain` in summary, `normalized_gain` per matched result

Red/Green E2E

Before (main — no `g`):
```
Summary: 2 wins, 1 loss, 0 ties | Mean Δ: +0.267 | Status: improved
```

After (this branch):
```
Summary: 2 wins, 1 loss, 0 ties | Mean Δ: +0.267 | g: +0.256 | Status: improved
```

JSON output now includes `normalized_gain` per test and `mean_normalized_gain` in summary.
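
For reference, an illustrative sketch of the serialized shape (only the `normalized_gain` and `mean_normalized_gain` keys come from this PR; the interface names and `delta` fields are assumptions):

```ts
// Illustrative only: the keys normalized_gain and mean_normalized_gain are
// from this PR; everything else here is an assumed name.
interface CompareJsonSummary {
  mean_delta: number;                  // assumed name for the existing mean Δ
  mean_normalized_gain: number | null; // new: null when no pair has headroom
}

interface CompareJsonResult {
  delta: number;                       // assumed name for the existing per-test Δ
  normalized_gain: number | null;      // new: null when baseline is 1.0
}
```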

Changes

  • `apps/cli/src/commands/compare/index.ts` — `computeNormalizedGain()`, `normalizedGain` on `MatchedResult`, `meanNormalizedGain` in summary, display in `formatTable` and `formatMatrix`
  • `apps/cli/test/commands/compare/compare.test.ts` — 10 new tests covering all cases
  • `apps/web/src/content/docs/docs/tools/compare.mdx` — docs updated with formula, interpretation table, updated output examples

Test plan

  • 50/50 tests pass (10 new tests added; a sketch of the style follows this list)
  • Typecheck, lint, build all pass (pre-push hook)
  • Red/green CLI e2e verified
  • JSON output verified: `normalized_gain` per test, `mean_normalized_gain` in summary
  • Docs updated
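
A sketch of the style of the new cases, mapped onto the interpretation table above (a vitest-style runner is assumed; the real suite lives in `apps/cli/test/commands/compare/compare.test.ts`):

```ts
// Sketch only: a vitest-style runner is assumed, and computeNormalizedGain
// is re-sketched inline so the example stands alone.
import { describe, expect, it } from "vitest";

function computeNormalizedGain(
  baseline: number,
  candidate: number,
): number | null {
  if (baseline >= 1) return null;
  return (candidate - baseline) / (1 - baseline);
}

describe("computeNormalizedGain", () => {
  it("captures half the remaining headroom", () => {
    expect(computeNormalizedGain(0.1, 0.55)).toBeCloseTo(0.5);
  });

  it("returns null when the baseline is already perfect", () => {
    expect(computeNormalizedGain(1.0, 1.0)).toBeNull();
  });

  it("reports a regression as negative gain", () => {
    expect(computeNormalizedGain(0.5, 0.25)).toBeCloseTo(-0.5);
  });
});
```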

Closes #1100

🤖 Generated with Claude Code

Add Hake's normalized gain (g) to compare output, measuring improvement
relative to remaining headroom rather than raw absolute delta.

Formula: g = (score_candidate − score_baseline) / (1 − score_baseline)

This separates genuine scaffolding improvements from ceiling effects — a +5pp gain
from a 90% baseline (g=0.5) is proportionally much larger than +5pp
from a 10% baseline (g=0.056).

Shown as "Norm. gain" in table output and "g" in matrix pairwise summary.
Available as mean_normalized_gain in JSON output. Returns null when
baseline is 1.0 (perfect score, no headroom).

Closes #1100

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Apr 14, 2026

Deploying agentv with Cloudflare Pages

Latest commit: ea1e60a
Status: ✅ Deploy successful!
Preview URL: https://38a450aa.agentv.pages.dev
Branch Preview URL: https://feat-1100-normalized-gain.agentv.pages.dev

Use 'g' consistently in both table summary and matrix pairwise output,
matching the standard notation from Hake (1998) and SkillsBench paper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add normalized gain (g) to compare docs: formula, interpretation table,
updated table/JSON output examples, and tips section.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
christso merged commit b37834e into main on Apr 15, 2026
4 checks passed
christso deleted the feat/1100-normalized-gain branch on April 15, 2026 at 00:22