feat(results): export remote runs and sync Studio #994
Deploying agentv with
| Latest commit: | ce01810 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://06352432.agentv.pages.dev |
| Branch Preview URL: | https://feat-826-992-results-export.agentv.pages.dev |
E2E Verification Results
All pre-push hooks pass (build, typecheck, lint, test, validate) after the
1. Eval run: no regression, no-op when unconfigured. Run completes and writes artifacts; no remote-export errors (no config → silent no-op). ✅
2. Studio: local runs have `{"runs":[{"filename":"2026-04-09T00-16-30-674Z","source":"local","target":"llm-dry-run",...}]}` ✅
3. Remote status: sensible response when unconfigured. `{"configured":false,"available":false,"repo":"","cache_dir":"","run_count":0}` ✅
4. Results export command ✅
5. All 1953 tests pass (core: 1461, eval: 67, cli: 425) ✅
Code Review Findings & Fixes
One fix applied before marking ready:
Fixed:
Not a bug:
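As a reading aid for the remote status check, the response quoted in step 3 can be typed and interpreted roughly as follows. The field names are copied from that response; the types are inferred from the single sample, and `describeRemoteStatus` is a hypothetical helper, not code from this PR.

```typescript
// Field names come from the unconfigured response quoted in the verification
// comment; the types are inferred from that one sample (an assumption).
interface RemoteStatus {
  configured: boolean;
  available: boolean;
  repo: string;
  cache_dir: string;
  run_count: number;
}

// Hypothetical helper: turns the status into a short human-readable label.
function describeRemoteStatus(status: RemoteStatus): string {
  if (!status.configured) return "not configured";
  if (!status.available) return "configured but unavailable";
  return `${status.run_count} remote run(s) from ${status.repo}`;
}

const sample: RemoteStatus = JSON.parse(
  '{"configured":false,"available":false,"repo":"","cache_dir":"","run_count":0}',
);
console.log(describeRemoteStatus(sample)); // prints "not configured"
```

Checking `configured` before `available` mirrors the verified behaviour that a missing config is a silent no-op rather than an error.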
- Show the 'not configured' subtitle only in All Sources mode, not when user has explicitly selected Local Only or Remote Only
- Dim the Remote Only button when no remote repo is configured, with tooltip
- Show actionable empty state when Remote Only is selected with no config: directs users to add results.export to .agentv/config.yaml
- Accept emptyMessage prop on RunList for context-sensitive empty states
- Document results.export configuration for auto-push to remote git repo
- Add Remote Results section covering config, auth, and Studio sync UI
- Replace studio-runs.png with updated screenshot showing source toolbar
- Add studio-run-detail.png showing run detail with source label and scores
- Add studio-experiments.png showing experiment comparison view
- Update Features list to include remote results and source badges
- Images optimized with pngquant + optipng (60-65% size reduction)
…show date-only timestamp with full on hover
- RunList: status dot (✓/✗) as first column; human run name (target·experiment); coloured pass-rate pill (green/amber/red); drop Score bar + separate Target/Experiment/Mean Score columns; compact 6-column layout
- RunDetail: Category Breakdown replaced with clean table showing pass-rate pill, passed, failed, total per category; status dot on each eval row; ERR badge instead of 0% score on execution errors; remove unused StatusBadge
- Add optional `category` top-level field to eval YAML (overrides directory-derived category); backwards-compatible: existing evals keep directory-based derivation
- Regenerate eval-schema.json to include the new field
- Update studio-runs.png and studio-run-detail.png with new UI showing ✓/✗ status dots, coloured pass-rate pills, human run names, and Category Breakdown table
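The override rule for `category` can be sketched as below. This is a minimal illustration under stated assumptions: `EvalFile`, `resolveCategory`, the `evalsRoot` parameter, and the `"uncategorized"` fallback are all hypothetical, and the exact way the directory-derived category is computed in the repo is not shown in this PR.

```typescript
import * as nodePath from "node:path";

// POSIX path handling keeps the example deterministic across platforms.
const path = nodePath.posix;

// Hypothetical shape: only `category` is named in the commit message.
interface EvalFile {
  filePath: string;
  category?: string;
}

// An explicit top-level `category` wins; otherwise fall back to the
// directory of the eval file relative to an assumed evals root.
function resolveCategory(evalFile: EvalFile, evalsRoot: string): string {
  const explicit = evalFile.category?.trim();
  if (explicit) return explicit;
  const relativeDir = path.relative(evalsRoot, path.dirname(evalFile.filePath));
  // "uncategorized" is an assumed placeholder for files directly in the root.
  return relativeDir === "" ? "uncategorized" : relativeDir;
}
```

Because the field is optional, an eval file without it takes the second branch, which matches the backwards-compatibility note in the commit.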
…un list
- /api/config now returns project_name (basename of cwd) so Studio shows it as a muted subtitle under the Evaluation Runs heading
- Remove per-row Source column from RunList: source is a project-level concern already covered by the All Sources / Local Only / Remote Only toolbar
- Update screenshots
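Deriving `project_name` from the working directory might look like the sketch below. `buildConfigResponse` is a hypothetical name, and the real `/api/config` handler presumably returns other fields that are omitted here.

```typescript
import * as path from "node:path";

// Hypothetical builder: only `project_name` is described in the commit message.
function buildConfigResponse(cwd: string = process.cwd()): { project_name: string } {
  // basename of the working directory, e.g. "/home/me/agentv" gives "agentv".
  return { project_name: path.basename(cwd) };
}

console.log(buildConfigResponse("/home/me/agentv").project_name); // prints "agentv"
```

Taking `cwd` as a parameter with a `process.cwd()` default keeps the function easy to test without changing directories.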
…assed/Failed/Total to run list
- StatsCards: revert pass rate to cyan, passed to emerald, failed to red
- Status dots: restore emerald-400 for pass (only pills use blue gradient)
- RunList: add Passed/Failed/Total columns, derived from pass_rate × test_count
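The Passed/Failed/Total derivation can be sketched as follows. It assumes `pass_rate` is a fraction between 0 and 1 and that rounding is acceptable; `deriveCounts` and `RunCounts` are hypothetical names, not the component's actual code.

```typescript
// Hypothetical result shape for the three RunList columns.
interface RunCounts {
  passed: number;
  failed: number;
  total: number;
}

function deriveCounts(passRate: number, testCount: number): RunCounts {
  // Assumption: pass_rate is a fraction in [0, 1]; clamp defensively.
  const rate = Math.min(1, Math.max(0, passRate));
  // Round because pass_rate × test_count can land slightly off an integer.
  const passed = Math.round(rate * testCount);
  return { passed, failed: testCount - passed, total: testCount };
}

console.log(deriveCounts(0.75, 8)); // passed 6, failed 2, total 8
```

Computing `failed` as `total - passed` keeps the three columns consistent with each other even when the stored pass rate has been rounded.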
… in Experiments; date-only Last Run
- Extract PassRatePill to shared component (RunList, RunDetail, suite route, ExperimentsTab)
- All score/pass-rate displays now use the gradient blue pill (text inside)
- ExperimentsTab: add Evals column (passed/total), date-only Last Run with hover tooltip
- Restore semantic colors for stats bar and status dots (only pills are blue)
…shots with emoji tabs
…ocale date for older
Closes #826
Closes #992
Summary
- `results.export` config support and shared cached git repo utilities for draft-PR artifact export
Verification
- `bun test packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/validation/config-validator.test.ts apps/cli/test/commands/results/serve.test.ts`
- `bunx tsc -p apps/studio/tsconfig.json --noEmit`
- `git diff --check`
Notes
- `bunx tsc -p apps/cli/tsconfig.json --noEmit` is currently noisy due to existing `@agentv/core` declaration-resolution issues in the repo, so it was not a useful signal for this change.