Summary
Follow-up to #805 (merged in #806). The AgentV Studio scaffold is functional with run list, run detail, eval detail (Steps/Output/Task), category breakdown, eval sidebar, and failure reason display. This issue tracks the remaining features needed for full parity with the convex-evals visualizer.
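Research
Reference implementation: `/home/christso/projects/convex-evals/` (visualizer in `visualizer/src/`).
Screenshot Map (target state for each gap)
- `06-eval-output-tab.png`, `16-output-code-file.png` (`.ts` file with syntax highlighting)
- `13-category-view.png`
- `08-experiments-tab.png`
- `09-models-tab.png`
- `10-experiment-detail.png`, `12-experiments-sidebar-detail.png`
- `05-eval-detail.png`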
Current State (what #806 shipped)
Gap 1 (High): File Tree in Output/Task Tabs
What
Convex-evals Output tab shows a collapsible directory tree alongside Monaco Editor. Users click individual files to view them with syntax highlighting. Currently agentv dumps serialized conversation text into a single Monaco panel.
Where to change
- New component: `apps/studio/src/components/FileTree.tsx` — collapsible tree with folder/file icons
- Modify: `apps/studio/src/components/EvalDetail.tsx` — `OutputTab` and `TaskTab` functions
- New API endpoint: `GET /api/runs/:filename/evals/:evalId/files` in `apps/cli/src/commands/results/serve.ts`, which returns the file tree from the eval's artifact directory (`grading/`, `timing/`, `input/`, `output/` folders)
- Reference: convex-evals `visualizer/src/lib/evalComponents.tsx` and screenshot `06-eval-output-tab.png`
Implementation
- Add Hono endpoint that reads the eval's run directory, lists files in `input/`, `output/`, `grading/`, `timing/` subdirectories, and returns a tree structure: `{ name, path, type: "file"|"dir", children? }`
- Create `FileTree` component: collapsible folders, file-type icons (use emoji like convex-evals: 📁 folder, 📘 .ts, 📋 .json, 📜 .log), click-to-select highlighting
- Split Output/Task tab into left panel (FileTree, ~250px) + right panel (MonacoViewer)
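- On file click, fetch the file content via `GET /api/runs/:filename/evals/:evalId/files/:path` and render it in Monaco, auto-detecting the language from the file extension. File paths contain slashes, so `:path` must match multiple segments (e.g. a Hono regexp param such as `:path{.+}`).
- Default selection: `run.log` if it exists, otherwise the first file in the tree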
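A minimal sketch of the data side of Gap 1, assuming the endpoint has already listed relative file paths under `input/`, `output/`, `grading/`, and `timing/` (for example with a recursive directory read). `buildTree`, `languageForPath`, and `defaultSelection` are hypothetical helper names, not existing AgentV code, and the extension table is illustrative (values are Monaco language IDs).

```typescript
// Hypothetical node shape, matching the `{ name, path, type, children? }` structure described above.
type FileNode = {
  name: string;
  path: string; // "/"-separated path relative to the eval's artifact directory
  type: "file" | "dir";
  children?: FileNode[];
};

// Sort folders before files, then alphabetically within each group, recursively.
function sortTree(nodes: FileNode[]): void {
  nodes.sort((a, b) =>
    a.type === b.type ? a.name.localeCompare(b.name) : a.type === "dir" ? -1 : 1,
  );
  for (const node of nodes) {
    if (node.children) sortTree(node.children);
  }
}

// Turn flat relative paths (e.g. "output/src/index.ts") into a nested tree.
function buildTree(paths: string[]): FileNode[] {
  const roots: FileNode[] = [];
  for (const rel of paths) {
    const parts = rel.split("/").filter((part) => part.length > 0);
    let level = roots;
    for (let i = 0; i < parts.length; i++) {
      const name = parts[i];
      const path = parts.slice(0, i + 1).join("/");
      let node = level.find((n) => n.name === name);
      if (!node) {
        // The last segment is a file; every earlier segment is a folder.
        node =
          i === parts.length - 1
            ? { name, path, type: "file" }
            : { name, path, type: "dir", children: [] };
        level.push(node);
      }
      level = node.children ?? [];
    }
  }
  sortTree(roots);
  return roots;
}

// Monaco language IDs by file extension; anything unknown falls back to plain text.
const LANGUAGE_BY_EXTENSION: Record<string, string> = {
  ts: "typescript",
  tsx: "typescript",
  js: "javascript",
  json: "json",
  md: "markdown",
  py: "python",
  yaml: "yaml",
  yml: "yaml",
};

function languageForPath(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  const ext = dot > 0 ? base.slice(dot + 1).toLowerCase() : "";
  return LANGUAGE_BY_EXTENSION[ext] ?? "plaintext";
}

// Depth-first list of files in the same order the tree displays them.
function flattenFiles(nodes: FileNode[], out: FileNode[] = []): FileNode[] {
  for (const n of nodes) {
    if (n.type === "file") out.push(n);
    else flattenFiles(n.children ?? [], out);
  }
  return out;
}

// Prefer run.log when present, otherwise the first file shown in the tree.
function defaultSelection(tree: FileNode[]): string | undefined {
  const files = flattenFiles(tree);
  return (files.find((f) => f.name === "run.log") ?? files[0])?.path;
}
```

The endpoint would return `buildTree(...)` as JSON; the `FileTree` component renders it and passes `languageForPath(selected)` to the Monaco viewer.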
Red/Green Gates
Gap 2 (High): Category Drill-Down Page
What
Convex-evals has a dedicated page per category (e.g., `/experiment/no_guidelines/run/abc/Fundamentals`) showing stat cards and eval list scoped to that category. Currently agentv only filters the flat list via category card clicks on the run detail page.
Where to change
- New route: `apps/studio/src/routes/runs/$runId.category.$category.tsx`
- Modify: `apps/studio/src/components/RunDetail.tsx` (category cards should link to the new route instead of toggling a filter)
- Modify: `apps/studio/src/components/Sidebar.tsx` — on category pages, sidebar should show eval list for that category only
Implementation
- Create route file `runs/$runId.category.$category.tsx` that fetches run data and filters to evals matching the category
- Page shows: category name as heading, stat cards (Total, Passed, Failed, Pass Rate scoped to category), eval table
- Category cards on the run detail page become links to the category route instead of an `onClick` filter toggle
- Sidebar on category page shows evals in that category with pass/fail indicators
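The category page's data step can be sketched as a filter plus a stat computation. The `EvalResult` shape below is an assumption for illustration; the real AgentV result records may name these fields differently.

```typescript
// Hypothetical eval result shape (field names are assumptions).
type EvalResult = { testId: string; category: string; passed: boolean };

type CategoryStats = { total: number; passed: number; failed: number; passRate: number };

// Scope a run's evals to one category and compute the stat cards for that page.
function categoryView(evals: EvalResult[], category: string) {
  const scoped = evals.filter((e) => e.category === category);
  const passed = scoped.filter((e) => e.passed).length;
  const total = scoped.length;
  const stats: CategoryStats = {
    total,
    passed,
    failed: total - passed,
    // Guard against division by zero for an empty or unknown category.
    passRate: total === 0 ? 0 : passed / total,
  };
  return { evals: scoped, stats };
}
```

The route component would call this with the `$category` route param and the run's eval list; the sidebar can reuse `view.evals` for its pass/fail indicators.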
Red/Green Gates
Gap 3 (Medium): Landing Page Tabs — Experiments & Targets
What
Convex-evals landing page has 3 tabs: Recent Runs (default), Experiments, Models. AgentV has only the run list.
Where to change
- Modify: `apps/studio/src/routes/index.tsx` — add tab bar and tab content components
- New components: `apps/studio/src/components/ExperimentsTab.tsx`, `apps/studio/src/components/TargetsTab.tsx`
- New API endpoints: `GET /api/experiments` and `GET /api/targets` in serve.ts, both aggregating across all runs
Implementation
- Add tab bar to landing page: "Recent Runs" | "Experiments" | "Targets" (use same tab styling as eval detail: cyan underline for active)
- Experiments tab: table with columns — Experiment, Runs, Targets, Evals (passed/total), Pass Rate (score bar), Last Run. Group data by `experiment` field across all runs.
- Targets tab: table with columns — Target, Runs, Experiments, Evals (passed/total), Pass Rate (score bar). Group data by `target` field across all runs.
- API endpoints aggregate from existing run index data (no new data sources needed)
- Rows in both tabs should be clickable, navigating to filtered views
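Both tabs can share one grouping function on the server. This is a sketch assuming each run in the existing index exposes `experiment`, `target`, per-run pass/total counts, and an ISO 8601 UTC timestamp; `RunSummary` and `aggregateBy` are hypothetical names.

```typescript
// Hypothetical run summary shape, assumed to be derivable from the existing run index.
type RunSummary = {
  runId: string;
  experiment: string;
  target: string;
  passed: number; // evals passed in this run
  total: number; // evals in this run
  timestamp: string; // ISO 8601 in UTC, e.g. "2025-01-03T00:00:00Z"
};

type GroupRow = {
  name: string;
  runs: number;
  related: string[]; // distinct targets (Experiments tab) or experiments (Targets tab)
  passed: number;
  total: number;
  passRate: number;
  lastRun: string;
};

function aggregateBy(runs: RunSummary[], key: "experiment" | "target"): GroupRow[] {
  const other = key === "experiment" ? "target" : "experiment";
  const groups = new Map<string, { runs: RunSummary[]; related: Set<string> }>();
  for (const run of runs) {
    const group = groups.get(run[key]) ?? { runs: [], related: new Set<string>() };
    group.runs.push(run);
    group.related.add(run[other]);
    groups.set(run[key], group);
  }
  const rows = Array.from(groups.entries()).map(([name, group]) => {
    const passed = group.runs.reduce((sum, r) => sum + r.passed, 0);
    const total = group.runs.reduce((sum, r) => sum + r.total, 0);
    // Same-format UTC ISO strings compare chronologically as plain strings.
    const lastRun = group.runs.reduce((max, r) => (r.timestamp > max ? r.timestamp : max), "");
    return {
      name,
      runs: group.runs.length,
      related: Array.from(group.related).sort(),
      passed,
      total,
      passRate: total === 0 ? 0 : passed / total,
      lastRun,
    };
  });
  // Most recently active group first.
  return rows.sort((a, b) => b.lastRun.localeCompare(a.lastRun));
}
```

`GET /api/experiments` would return `aggregateBy(runs, "experiment")` and `GET /api/targets` would return `aggregateBy(runs, "target")`; the Targets tab can simply ignore `lastRun`. Sorting by most recent activity is a design choice, not something the issue specifies.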
Red/Green Gates
Gap 4 (Medium): Experiment Detail Page
What
Clicking an experiment in the Experiments tab should show a dedicated page with all runs in that experiment.
Where to change
- New route: `apps/studio/src/routes/experiments/$experimentName.tsx`
- Modify: `ExperimentsTab.tsx` — rows link to this new route
Implementation
- Create route that fetches all runs, filters to those matching the experiment name
- Page shows: experiment name as heading, stat cards (Total Runs, Completed, Pass Rate, Targets), run table (same columns as landing but scoped)
- Sidebar shows experiment list with pass rate bars (similar to convex-evals experiment sidebar)
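A sketch of the stat cards for the experiment page. It assumes each run carries a `status` field so "Completed" can be counted; that field and the `ExperimentRun` shape are assumptions, not a confirmed AgentV schema. Here the pass rate counts evals from completed runs only, so in-progress runs do not drag it down.

```typescript
// Hypothetical run record; `status` is an assumed field.
type ExperimentRun = {
  runId: string;
  experiment: string;
  target: string;
  status: "completed" | "running" | "failed";
  passed: number;
  total: number;
};

type ExperimentStats = { totalRuns: number; completed: number; passRate: number; targets: number };

function experimentDetail(runs: ExperimentRun[], experimentName: string) {
  const scoped = runs.filter((r) => r.experiment === experimentName);
  const done = scoped.filter((r) => r.status === "completed");
  // Pass rate over evals in completed runs only.
  const passed = done.reduce((sum, r) => sum + r.passed, 0);
  const total = done.reduce((sum, r) => sum + r.total, 0);
  const stats: ExperimentStats = {
    totalRuns: scoped.length,
    completed: done.length,
    passRate: total === 0 ? 0 : passed / total,
    // Distinct targets exercised by any run in the experiment.
    targets: new Set(scoped.map((r) => r.target)).size,
  };
  return { runs: scoped, stats };
}
```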
Red/Green Gates
Gap 5 (Medium): Breadcrumb Navigation
What
Convex-evals has a full breadcrumb trail: `Home > Experiment > Run > Category > Eval`. Currently we show simple "Run: X / Eval: Y" text.
Where to change
- New component: `apps/studio/src/components/Breadcrumbs.tsx`
- Modify: `apps/studio/src/components/Layout.tsx` — render breadcrumbs above page content
- Use TanStack Router's `useMatches()` or `useRouterState()` to derive breadcrumb segments from the current route
Implementation
- Create a `Breadcrumbs` component that reads the current route matches
- Each segment is a clickable link: Home (/) > Run (timestamp) > Category (name) > Eval (testId)
- Separator: `>` or `/` between segments
- Last segment is non-clickable (current page)
- Styling: gray-400 text, cyan for links, truncate long segments
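The segment logic can stay independent of the router: a pure function takes the route params (which the component would read from the current route matches) and returns labelled segments. This sketch follows the Home > Run > Category > Eval order above; the URL patterns are assumptions based on the proposed `runs/$runId.category.$category.tsx` flat route, and `buildCrumbs` and the 24-character limit are hypothetical.

```typescript
type Crumb = { label: string; to?: string }; // `to` omitted for the current (last) page

// Hypothetical params bag taken from the current route's params.
type CrumbParams = { runId?: string; category?: string; evalId?: string };

const MAX_LABEL = 24;

// Shorten long segments (e.g. timestamp run IDs) to a fixed width with an ellipsis.
function truncate(label: string, max = MAX_LABEL): string {
  return label.length > max ? `${label.slice(0, max - 1)}…` : label;
}

function buildCrumbs({ runId, category, evalId }: CrumbParams): Crumb[] {
  const crumbs: Crumb[] = [{ label: "Home", to: "/" }];
  if (runId) {
    // Assumed route paths; adjust to the actual route tree.
    const runPath = `/runs/${encodeURIComponent(runId)}`;
    crumbs.push({ label: truncate(runId), to: runPath });
    if (category) {
      crumbs.push({ label: truncate(category), to: `${runPath}/category/${encodeURIComponent(category)}` });
    }
    // The eval page is always the deepest level, so it never needs a link.
    if (evalId) crumbs.push({ label: truncate(evalId) });
  }
  // The last segment is the current page and is rendered as plain text.
  const last = crumbs[crumbs.length - 1];
  crumbs[crumbs.length - 1] = { label: last.label };
  return crumbs;
}
```

The `Breadcrumbs` component would map the result to links (cyan) and a final gray-400 span, joined by the chosen separator.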
Red/Green Gates
Low Priority (do last or skip)
Step timing badges
Add duration next to pass/fail checkmarks in assertion steps: "✓ Output contains 'Hello' (0.2s)". Check `durationMs` on assertion entries.
- GREEN: At least one step shows timing in parentheses
- RED: Steps show only checkmark + text
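A small formatter is enough to produce the GREEN-gate label. The `AssertionStep` shape is an assumption built around the `durationMs` field mentioned above; showing very short durations in milliseconds (rather than "0.0s") is a design choice.

```typescript
// Hypothetical assertion entry; only `durationMs` is named in the issue.
type AssertionStep = { text: string; passed: boolean; durationMs?: number };

// Under 100 ms show whole milliseconds; otherwise seconds with one decimal.
function formatDuration(ms: number): string {
  return ms < 100 ? `${Math.round(ms)}ms` : `${(ms / 1000).toFixed(1)}s`;
}

// Checkmark, assertion text, and an optional timing badge in parentheses.
function stepLabel(step: AssertionStep): string {
  const mark = step.passed ? "✓" : "✗";
  const timing = step.durationMs === undefined ? "" : ` (${formatDuration(step.durationMs)})`;
  return `${mark} ${step.text}${timing}`;
}
```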
Run metadata enrichment
Surface `target`, `experiment`, `eval_set` in run list and run detail headers.
- GREEN: Run list table has a "Target" or "Experiment" column
- RED: Run list only shows timestamp-based run IDs
Top navigation bar
Persistent top nav with "AgentV Studio" logo, breadcrumbs, and tab links.
- GREEN: Top bar is visible on all pages with logo and navigation
- RED: No top bar exists (sidebar only)
Pagination
"Load more" button or virtual scrolling for large result sets.
- GREEN: Run list with 50+ entries shows pagination or virtual scroll
- RED: All rows render at once regardless of count
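For the "Load more" option, the list only needs to know how many pages have been revealed. This is a client-side sketch; `loadMoreSlice` and the page size of 25 used in the usage note are assumptions, and virtual scrolling would replace this entirely.

```typescript
type PageState<T> = { visible: T[]; hasMore: boolean };

// Render the first `pagesLoaded * pageSize` rows and report whether a "Load more" button is needed.
function loadMoreSlice<T>(rows: T[], pageSize: number, pagesLoaded: number): PageState<T> {
  const count = Math.max(0, pageSize * pagesLoaded);
  return { visible: rows.slice(0, count), hasMore: rows.length > count };
}
```

With 60 runs and a page size of 25, the first render shows 25 rows plus the button; the third click-through shows all 60 and hides it.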
Implementation Notes
- All studio code lives in `apps/studio/src/`
- Routes use TanStack Router file-based routing in `src/routes/`
- Data fetching uses TanStack Query hooks in `src/lib/api.ts`
- The Hono API in `apps/cli/src/commands/results/serve.ts` may need new endpoints
- Build: `bun --filter @agentv/studio build`
- Test: `bun --filter agentv test` (353 tests)
- Lint: `biome check apps/studio/`
- Dark theme uses Tailwind CSS 4 utilities (bg-gray-950, text-gray-100, etc.)
Verification Protocol
After implementing each gap, run `agentv studio` with test data (use `--dry-run-delay 100` to generate runs from examples/) and use agent-browser to screenshot each screen. Compare side-by-side with convex-evals reference screenshots in `research/findings/convex-evals/screenshots/`.
Non-Goals
Related