feat(results): export remote runs and sync Studio #994

Merged
christso merged 34 commits into main from feat/826-992-results-export-remote-studio
Apr 9, 2026

@christso
Collaborator

@christso christso commented Apr 9, 2026

Closes #826
Closes #992

Summary

  • add results.export config support and shared cached git repo utilities for draft-PR artifact export
  • auto-export eval and pipeline bench artifacts to a configured remote repo with graceful warnings on auth/access failures
  • extend Studio backend and UI to merge local/remote runs, tag source, expose sync/status endpoints, and support source filtering with local-only fallback
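
As a rough illustration of the merge-and-tag behaviour described in the last bullet, here is a minimal sketch. It is not the code from this PR: `RunMeta`, `TaggedRun`, `SourceFilter`, and `mergeRuns` are hypothetical names, and treating `null` as "remote not configured or unavailable" is an assumed convention. Only the `filename`, `source`, and `target` fields appear in the sample output further down.

```typescript
// Hypothetical shapes; only `filename`, `source`, and `target` are taken from the PR's sample output.
type RunSource = "local" | "remote";

interface RunMeta {
  filename: string;
  target: string;
}

interface TaggedRun extends RunMeta {
  source: RunSource;
}

type SourceFilter = "all" | RunSource;

function mergeRuns(
  local: RunMeta[],
  remote: RunMeta[] | null, // null: remote not configured or unavailable (assumed convention)
  filter: SourceFilter = "all",
): TaggedRun[] {
  // Tag every run with where it came from; a missing remote contributes nothing,
  // which gives the local-only fallback.
  const tagged: TaggedRun[] = [
    ...local.map((r) => ({ ...r, source: "local" as const })),
    ...(remote ?? []).map((r) => ({ ...r, source: "remote" as const })),
  ];
  const visible = filter === "all" ? tagged : tagged.filter((r) => r.source === filter);
  // Newest first: the filenames are timestamp strings in one fixed format,
  // so a plain string comparison orders them chronologically.
  return visible.sort((a, b) => (a.filename < b.filename ? 1 : a.filename > b.filename ? -1 : 0));
}
```

With `remote` set to `null` the function returns only the local runs, each tagged `source: "local"`.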

Verification

  • bun test packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/validation/config-validator.test.ts apps/cli/test/commands/results/serve.test.ts
  • bunx tsc -p apps/studio/tsconfig.json --noEmit
  • git diff --check

Notes

  • bunx tsc -p apps/cli/tsconfig.json --noEmit is currently noisy due to existing @agentv/core declaration-resolution issues in the repo, so it was not a useful signal for this change.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 9, 2026

Deploying agentv with Cloudflare Pages

Latest commit: ce01810
Status: ✅  Deploy successful!
Preview URL: https://06352432.agentv.pages.dev
Branch Preview URL: https://feat-826-992-results-export.agentv.pages.dev

@christso
Collaborator Author

christso commented Apr 9, 2026

E2E Verification Results

All pre-push hooks pass (build, typecheck, lint, test, validate) after the DEFAULT_THRESHOLD fix.

1. Eval run — no regression, no-op when unconfigured

$ bun apps/cli/src/cli.ts eval examples/features/basic/evals/dataset.eval.yaml --dry-run --test-id code-review-javascript
Artifact directory: .agentv/results/runs/default/2026-04-09T00-16-30-674Z
Results written to: .agentv/results/runs/default/2026-04-09T00-16-30-674Z/index.jsonl

Run completes and writes artifacts; no remote-export errors (no config → silent no-op). ✅

2. Studio — local runs have source: 'local'

{"runs":[{"filename":"2026-04-09T00-16-30-674Z","source":"local","target":"llm-dry-run",...}]}

3. Remote status — sensible response when unconfigured

{"configured":false,"available":false,"repo":"","cache_dir":"","run_count":0}
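
For readers consuming this response, a minimal typed sketch follows. The field names are copied from the sample above, while the types are inferred from that single sample. `RemoteStatus`, `isRemoteStatus`, `describeRemote`, and the message strings are hypothetical and not taken from the PR.

```typescript
// Field names come from the sample response; the types are inferred from that one sample.
interface RemoteStatus {
  configured: boolean;
  available: boolean;
  repo: string;
  cache_dir: string;
  run_count: number;
}

// Runtime guard so an unexpected payload is rejected instead of trusted.
function isRemoteStatus(value: unknown): value is RemoteStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["configured"] === "boolean" &&
    typeof v["available"] === "boolean" &&
    typeof v["repo"] === "string" &&
    typeof v["cache_dir"] === "string" &&
    typeof v["run_count"] === "number"
  );
}

// Hypothetical helper: turns a status into a short UI message (wording is illustrative).
function describeRemote(status: RemoteStatus): string {
  if (!status.configured) return "Remote results not configured";
  if (!status.available) return `Remote ${status.repo} is unavailable; showing local runs only`;
  return `${status.run_count} remote run(s) from ${status.repo}`;
}
```

Checking `configured` before `available` mirrors the unconfigured case shown above, where both flags are false and the remaining fields are empty.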

4. Results export command

Exported 1 test(s) to /tmp/export
  code-review-javascript

5. All 1953 tests pass (core: 1461, eval: 67, cli: 425) ✅


Code Review Findings & Fixes

One fix applied before marking ready:

Fixed: pipeline/bench.ts was hardcoding 0.8 for pass/fail in the export summary instead of importing and using DEFAULT_THRESHOLD. Fixed in commit 7221f6a9.
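
The shape of that fix can be sketched as follows. This is an illustration, not the code from the commit: `ScoredResult` and `summarize` are hypothetical names, the constant is declared locally here instead of being imported, its value is assumed to equal the 0.8 that was previously hardcoded, and treating a score equal to the threshold as a pass is also an assumption.

```typescript
// Assumption: the value matches the previously hardcoded 0.8. In the repo the
// constant is imported from shared code instead of being declared here.
const DEFAULT_THRESHOLD = 0.8;

// Hypothetical result shape for illustration.
interface ScoredResult {
  test_id: string;
  score: number;
}

interface ExportSummary {
  total: number;
  passed: number;
  failed: number;
}

// Pass/fail is decided against one shared threshold, so changing the constant
// updates every summary that relies on it.
function summarize(results: ScoredResult[], threshold: number = DEFAULT_THRESHOLD): ExportSummary {
  // Assumption: a score equal to the threshold counts as a pass.
  const passed = results.filter((r) => r.score >= threshold).length;
  return { total: results.length, passed, failed: results.length - passed };
}
```

Keeping the threshold as a parameter with a default lets callers override it without reintroducing a literal.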

Not a bug: pipeline/run.ts intentionally does not auto-export — it's a pre-grading step that writes response.md files, not final index.jsonl results. Auto-export correctly happens at pipeline bench which produces the final JSONL.

@christso christso marked this pull request as ready for review April 9, 2026 00:26
christso added 23 commits April 9, 2026 00:39
- Show the 'not configured' subtitle only in All Sources mode, not when
  user has explicitly selected Local Only or Remote Only
- Dim the Remote Only button when no remote repo is configured, with tooltip
- Show actionable empty state when Remote Only is selected with no config:
  directs users to add results.export to .agentv/config.yaml
- Accept emptyMessage prop on RunList for context-sensitive empty states
- Document results.export configuration for auto-push to remote git repo
- Add Remote Results section covering config, auth, and Studio sync UI
- Replace studio-runs.png with updated screenshot showing source toolbar
- Add studio-run-detail.png showing run detail with source label and scores
- Add studio-experiments.png showing experiment comparison view
- Update Features list to include remote results and source badges
- Images optimized with pngquant + optipng (60-65% size reduction)
- RunList: status dot (✓/✗) as first column; human run name (target·experiment);
  coloured pass-rate pill (green/amber/red); drop Score bar + separate Target/
  Experiment/Mean Score columns; compact 6-column layout
- RunDetail: Category Breakdown replaced with clean table showing pass-rate pill,
  passed, failed, total per category; status dot on each eval row; ERR badge
  instead of 0% score on execution errors; remove unused StatusBadge
- Add optional `category` top-level field to eval YAML (overrides directory-derived
  category); backwards-compatible — existing evals keep directory-based derivation
- Regenerate eval-schema.json to include the new field
- Update studio-runs.png and studio-run-detail.png with new UI showing ✓/✗ status
  dots, coloured pass-rate pills, human run names, and Category Breakdown table
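
The precedence for the optional `category` field can be sketched like this. `EvalDoc` and `resolveCategory` are hypothetical names. How the directory-derived category is computed is not shown in this PR, so it is passed in as a parameter, and treating a blank value as absent is an assumption.

```typescript
// Hypothetical, minimal view of a parsed eval YAML document.
interface EvalDoc {
  category?: string;
}

// An explicit top-level `category` wins; otherwise fall back to the category
// derived from the file's directory (computed elsewhere, not shown in the PR).
function resolveCategory(doc: EvalDoc, directoryCategory: string): string {
  // Assumption: a blank or whitespace-only value is treated as if the field were absent.
  const explicit = doc.category?.trim();
  return explicit ? explicit : directoryCategory;
}
```

Existing evals that omit the field keep their directory-based category, which is the backwards-compatible behaviour described in the bullet.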
…un list

- /api/config now returns project_name (basename of cwd) so Studio shows it
  as a muted subtitle under the Evaluation Runs heading
- Remove per-row Source column from RunList — source is a project-level
  concern already covered by the All Sources / Local Only / Remote Only toolbar
- Update screenshots
christso added 7 commits April 9, 2026 02:28
…assed/Failed/Total to run list

- StatsCards: revert pass rate to cyan, passed to emerald, failed to red
- Status dots: restore emerald-400 for pass (only pills use blue gradient)
- RunList: add Passed/Failed/Total columns, derived from pass_rate × test_count
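
The derivation in the last bullet can be sketched as below. `deriveCounts` and `RunCounts` are hypothetical names, and the rounding and clamping are assumptions added to keep the counts integral and consistent, not details confirmed by the commit.

```typescript
interface RunCounts {
  passed: number;
  failed: number;
  total: number;
}

// Derive the three columns from a pass rate in [0, 1] and a test count.
function deriveCounts(passRate: number, testCount: number): RunCounts {
  const total = Math.max(0, Math.trunc(testCount));
  // Clamp defensively so a malformed rate cannot produce negative counts.
  const rate = Math.min(1, Math.max(0, passRate));
  // Round because the rate is a float (for example 2/3), so the product may not be an exact integer.
  const passed = Math.round(rate * total);
  return { passed, failed: total - passed, total };
}
```

Computing `failed` as `total - passed` guarantees that the two counts always add up to the total.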
… in Experiments; date-only Last Run

- Extract PassRatePill to shared component (RunList, RunDetail, suite route, ExperimentsTab)
- All score/pass-rate displays now use the gradient blue pill (text inside)
- ExperimentsTab: add Evals column (passed/total), date-only Last Run with hover tooltip
- Restore semantic colors for stats bar and status dots (only pills are blue)
@christso christso merged commit 70e8df1 into main Apr 9, 2026
4 checks passed
@christso christso deleted the feat/826-992-results-export-remote-studio branch April 9, 2026 03:39
Development

Successfully merging this pull request may close these issues.

feat(studio): browse eval results from remote git repo
feat: auto-push eval results to configurable git repo