feat(studio): cross-model comparison matrix view #984

Merged
christso merged 3 commits into main from feat/981-comparison-view on Apr 8, 2026

Conversation

Collaborator

@christso christso commented Apr 8, 2026

Summary

  • Adds a Compare tab to the Studio UI (both single-project and multi-project views) displaying a matrix of experiments (columns) x targets (rows)
  • Each cell shows pass rate (color-coded: green >80%, yellow 50-80%, red <50%), average score, and pass/fail counts
  • Cells are expandable to reveal per-test-case breakdown with individual scores
  • Best/worst performers per row are highlighted with up/down indicators
  • Backend: new /api/compare and /api/projects/:projectId/compare endpoints that aggregate results by experiment x target
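The color thresholds and best/worst highlighting described above could be sketched as follows. This is a minimal illustration, not the code from this PR: `cellColor` and `bestWorst` are hypothetical names, pass rates are assumed to be fractions in [0, 1], and the choice to skip highlighting when every cell in a row ties is an assumption.

```typescript
// Hypothetical helpers; the PR does not show the actual implementation.
type CellColor = "green" | "yellow" | "red";

// Thresholds follow the PR summary: green > 80%, yellow 50-80%, red < 50%.
// passRate is assumed to be a fraction in [0, 1].
function cellColor(passRate: number): CellColor {
  if (passRate > 0.8) return "green";
  if (passRate >= 0.5) return "yellow";
  return "red";
}

// Returns the column indices of the best and worst pass rate in one row.
// Assumption: rows with fewer than two cells, or where all cells tie,
// get no highlight (null). On ties, the first occurrence wins.
function bestWorst(passRates: number[]): { best: number; worst: number } | null {
  if (passRates.length < 2) return null;
  let best = 0;
  let worst = 0;
  let bestRate = -Infinity;
  let worstRate = Infinity;
  passRates.forEach((rate, i) => {
    if (rate > bestRate) {
      bestRate = rate;
      best = i;
    }
    if (rate < worstRate) {
      worstRate = rate;
      worst = i;
    }
  });
  return bestRate === worstRate ? null : { best, worst };
}
```

Note that a pass rate of exactly 80% falls in the yellow band under this reading of the thresholds.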

Closes #981

Test plan

  • bun run build passes
  • bun run test — all 1901 tests pass
  • bun run lint — no errors
  • Manual verification: run agentv studio with multi-experiment/multi-target results and confirm the Compare tab renders the matrix correctly
  • Verify both unscoped (single-project) and project-scoped views show the Compare tab

🤖 Generated with Claude Code

christso and others added 2 commits April 8, 2026 22:22
Add a Compare tab to the Studio UI that displays a matrix of experiment
(columns) x target (rows) cells. Each cell shows pass rate, average score,
and test counts, color-coded by performance thresholds (green >80%, yellow
50-80%, red <50%). Cells are expandable to show per-test-case breakdown.

Backend: new /api/compare and /api/projects/:projectId/compare endpoints
that group runs by experiment x target and compute pass_rate + avg_score.
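As a rough sketch of the grouping step, the aggregation might look like the following. The `TestResult` and `CellSummary` shapes and the `summarize` function are assumptions made for illustration; only the `pass_rate` and `avg_score` field names come from the commit message, and the real endpoint's response format is not shown in this PR.

```typescript
// Hypothetical input and output shapes; only pass_rate and avg_score
// are named in the commit message.
interface TestResult {
  experiment: string;
  target: string;
  score: number;
  passed: boolean;
}

interface CellSummary {
  experiment: string;
  target: string;
  pass_rate: number;
  avg_score: number;
  passed: number;
  failed: number;
}

interface Totals {
  passed: number;
  total: number;
  scoreSum: number;
}

function summarize(results: TestResult[]): CellSummary[] {
  // target (row) -> experiment (column) -> running totals
  const byTarget = new Map<string, Map<string, Totals>>();
  for (const r of results) {
    let row = byTarget.get(r.target);
    if (!row) {
      row = new Map<string, Totals>();
      byTarget.set(r.target, row);
    }
    let acc = row.get(r.experiment);
    if (!acc) {
      acc = { passed: 0, total: 0, scoreSum: 0 };
      row.set(r.experiment, acc);
    }
    acc.total += 1;
    acc.scoreSum += r.score;
    if (r.passed) acc.passed += 1;
  }

  // Flatten the nested maps into one summary per matrix cell.
  const cells: CellSummary[] = [];
  byTarget.forEach((row, target) => {
    row.forEach((acc, experiment) => {
      cells.push({
        experiment,
        target,
        pass_rate: acc.passed / acc.total,
        avg_score: acc.scoreSum / acc.total,
        passed: acc.passed,
        failed: acc.total - acc.passed,
      });
    });
  });
  return cells;
}
```

Every group holds at least one result by construction, so the divisions never see a zero total.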

Closes #981

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Apr 8, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 87ca80b
Status: ✅  Deploy successful!
Preview URL: https://f560b893.agentv.pages.dev
Branch Preview URL: https://feat-981-comparison-view.agentv.pages.dev


- Use JSON.stringify key to prevent cell collisions
- Cap tests array per cell to prevent unbounded payload
- Deduplicate test results (keep latest per test_id)
- Add aria-expanded to expandable cells
- Thread error state into CompareTab
- Remove type assertion in project compare tab
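The first three fixes in this list can be illustrated with a short sketch. Everything named here is hypothetical: `cellKey`, `dedupeAndCap`, the `timestamp` field used to decide which result is "latest", and the cap of 100 are assumptions, since the commit message gives neither the field names nor the actual limit.

```typescript
// Hypothetical shape; the timestamp field is an assumption used to
// decide which result counts as the latest for a given test_id.
interface TestEntry {
  test_id: string;
  score: number;
  timestamp: number;
}

// Serializing the pair as a JSON array escapes quotes and keeps the two
// parts distinct, so ("a:b", "c") and ("a", "b:c") cannot collide the
// way a plain delimiter join such as `${experiment}:${target}` would.
function cellKey(experiment: string, target: string): string {
  return JSON.stringify([experiment, target]);
}

// Assumed limit; the real cap is not stated in the commit message.
const MAX_TESTS_PER_CELL = 100;

// Keeps only the latest entry per test_id, then truncates to the cap
// so one cell cannot grow the response payload without bound.
function dedupeAndCap(entries: TestEntry[], cap: number = MAX_TESTS_PER_CELL): TestEntry[] {
  const latest = new Map<string, TestEntry>();
  for (const e of entries) {
    const prev = latest.get(e.test_id);
    if (!prev || e.timestamp >= prev.timestamp) latest.set(e.test_id, e);
  }
  return Array.from(latest.values()).slice(0, cap);
}
```

Deduplicating before capping means the cap counts distinct test cases rather than raw result rows.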

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@christso christso marked this pull request as ready for review April 8, 2026 23:01
@christso christso merged commit b70ea3f into main Apr 8, 2026
4 checks passed
@christso christso deleted the feat/981-comparison-view branch April 8, 2026 23:01

Development

Successfully merging this pull request may close these issues.

fix(studio): cross-model comparison view in Studio UI

1 participant