Summary
Add a configuration option to automatically push eval result artifacts to a git repository after each eval run. The agent creates a PR (not auto-merge) so a human still reviews and merges.
Motivation
Currently, eval results live locally in .agentv/results/runs/ and are lost unless manually committed. For reproducibility and historical comparison, results should be pushed automatically to a dedicated repo (e.g., EntityProcess/agentv-evals).
Design
Configuration
In .agentv/config.yaml:
results:
  export:
    repo: EntityProcess/agentv-evals   # GitHub repo path
    path: autopilot-dev/runs           # Directory within the repo
    auto_push: true                    # Enable auto-push after each run
    branch_prefix: eval-results        # Branch naming prefix
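As a rough illustration of how this section might be read once the YAML is loaded into a dictionary, here is a minimal sketch. The names ExportConfig and parse_export_config are hypothetical, and the defaults (auto_push off, branch_prefix of eval-results) and the choice to treat repo and path as required are assumptions, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ExportConfig:
    """Hypothetical container for the results.export section."""
    repo: str                            # GitHub repo path
    path: str                            # directory within the repo
    auto_push: bool = False              # assumed default: export is off
    branch_prefix: str = "eval-results"  # assumed default, taken from the example


def parse_export_config(config: Mapping[str, Any]) -> Optional[ExportConfig]:
    """Return the results.export settings, or None when the section is absent."""
    export = (config.get("results") or {}).get("export")
    if not export:
        return None
    # Assumption: repo and path are required once the section exists.
    missing = [key for key in ("repo", "path") if not export.get(key)]
    if missing:
        raise ValueError(f"results.export is missing: {', '.join(missing)}")
    return ExportConfig(
        repo=str(export["repo"]),
        path=str(export["path"]).strip("/"),
        auto_push=bool(export.get("auto_push", False)),
        branch_prefix=str(export.get("branch_prefix", "eval-results")),
    )
```

Returning None for a missing section keeps the feature opt-in: callers can skip the export step without special-casing older config files.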
Clone / Cache Strategy
The target repo is cloned/fetched to ~/.agentv/cache/results-repo/. Subsequent runs reuse this cached clone (fetch-only). A broader data_dir config option can be added later as a separate concern.
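The clone-or-fetch decision could look something like the sketch below. The function name sync_command is hypothetical, and the HTTPS clone URL is an assumption (the proposal does not say whether git or gh performs the clone).

```python
from pathlib import Path
from typing import List

# Cache location named in the proposal.
CACHE_DIR = Path.home() / ".agentv" / "cache" / "results-repo"


def sync_command(repo: str, cache_dir: Path = CACHE_DIR) -> List[str]:
    """Return the git command that brings the cached clone up to date."""
    if (cache_dir / ".git").is_dir():
        # A cached clone already exists: fetch only, never re-clone.
        return ["git", "-C", str(cache_dir), "fetch", "origin"]
    # First run: clone into the cache directory.
    # Assumption: the repo is reachable over HTTPS on github.com.
    return ["git", "clone", f"https://github.com/{repo}.git", str(cache_dir)]
```

Returning the command instead of running it keeps the decision easy to test; the caller would pass the list to subprocess.run.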
Authentication
The export step assumes the gh and git CLIs are already authenticated. If they are not, show a meaningful error message (e.g., Run 'gh auth login' to authenticate). Do not fail the eval run; warn and skip the export step.
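One way to implement the warn-and-skip behaviour is a pre-flight check that returns a warning string instead of raising. This is a sketch only: export_skip_reason and its injectable parameters are hypothetical names, and it relies on gh auth status exiting with a non-zero code when no account is logged in.

```python
import shutil
import subprocess
from typing import Callable, Optional


def _gh_auth_ok() -> bool:
    """True when `gh auth status` exits with code 0."""
    result = subprocess.run(["gh", "auth", "status"], capture_output=True, text=True)
    return result.returncode == 0


def export_skip_reason(
    which: Callable[[str], Optional[str]] = shutil.which,
    auth_ok: Callable[[], bool] = _gh_auth_ok,
) -> Optional[str]:
    """Return a warning explaining why export is skipped, or None when ready."""
    for tool in ("git", "gh"):
        if which(tool) is None:
            return f"Skipping results export: '{tool}' was not found on PATH."
    if not auth_ok():
        return "Skipping results export: run 'gh auth login' to authenticate."
    return None
```

The caller would print the returned message as a warning and carry on, so the eval run's exit status is unaffected.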
Artifacts
The entire runs/<run-id>/ directory is pushed. No filtering.
Workflow
- After agentv eval run or agentv pipeline completes, if auto_push is enabled:
  - Fetch the cached clone of the target repo (or clone if first run)
  - Create a branch: <branch_prefix>/<experiment>-<eval-file>-<timestamp> (e.g., eval-results/autopilot-dev-ad-explore-2026-03-29T01-15-06)
  - Copy the entire runs/<run-id>/ directory to the configured path
  - Commit with a structured message including eval summary (pass/fail counts, mean score)
  - Push branch and create a draft PR with results summary in the body
- Human reviews and merges the PR
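The branch naming and the git/gh steps above can be sketched as command builders. All function names here are hypothetical, the base branch origin/main is an assumption about the results repo, and copying the run directory (for example with shutil.copytree) is expected to happen between the two helpers.

```python
import re
from typing import List


def branch_name(prefix: str, experiment: str, eval_file: str, timestamp: str) -> str:
    """Build <branch_prefix>/<experiment>-<eval-file>-<timestamp>."""
    # Git ref names cannot contain ':', so anything other than letters,
    # digits, and hyphens in the timestamp becomes a hyphen.
    safe_timestamp = re.sub(r"[^0-9A-Za-z-]", "-", timestamp)
    return f"{prefix}/{experiment}-{eval_file}-{safe_timestamp}"


def create_branch_command(clone_dir: str, branch: str, base: str = "origin/main") -> List[str]:
    """Command that creates the results branch in the cached clone.

    Assumption: the results repo's default branch is main.
    """
    return ["git", "-C", clone_dir, "checkout", "-b", branch, base]


def publish_commands(
    repo: str, clone_dir: str, dest_path: str, branch: str, title: str, body: str
) -> List[List[str]]:
    """Commands to run, in order, after the run directory was copied to dest_path."""
    git = ["git", "-C", clone_dir]
    return [
        git + ["add", "--", dest_path],
        git + ["commit", "-m", title],
        git + ["push", "-u", "origin", branch],
        # --draft keeps the PR unmerged until a human reviews it.
        ["gh", "pr", "create", "--repo", repo, "--head", branch,
         "--draft", "--title", title, "--body", body],
    ]
```

For example, branch_name("eval-results", "autopilot-dev", "ad-explore", "2026-03-29T01:15:06") yields the branch shown in the workflow list.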
PR Granularity
One PR per run invocation. A single agentv eval run or agentv pipeline execution produces one PR that bundles all evals from that run. The PR body contains a summary table per eval.
PR Format
feat(results): ad-explore claude-cli — 3/3 PASS (1.000)
## Results
| Test | Score | Status |
|---|---|---|
| discovers-existing-implementation | 1.000 | PASS |
| finds-all-consumers | 1.000 | PASS |
| structured-summary | 1.000 | PASS |
Run: 2026-03-29T01-15-06-826Z
Target: claude-cli
Eval: evals/autopilot-dev/ad-explore.eval.yaml
For bundled runs (multiple evals), repeat the results table per eval.
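A sketch of how the title and body might be rendered from per-test results follows. CaseResult, pr_title, results_table, and pr_body are hypothetical names; the FAIL label and the choice to repeat the Run/Target/Eval lines under every table are assumptions, since the example above only shows a single passing eval.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaseResult:
    name: str
    score: float
    passed: bool


def pr_title(eval_name: str, target: str, results: List[CaseResult]) -> str:
    """Title with pass count and mean score, following the example format."""
    passed = sum(1 for r in results if r.passed)
    mean = sum(r.score for r in results) / len(results) if results else 0.0
    # \u2014 is the dash used in the example title.
    return f"feat(results): {eval_name} {target} \u2014 {passed}/{len(results)} PASS ({mean:.3f})"


def results_table(results: List[CaseResult]) -> str:
    """Markdown table with one row per test."""
    lines = ["| Test | Score | Status |", "|---|---|---|"]
    for r in results:
        status = "PASS" if r.passed else "FAIL"  # FAIL label is an assumption
        lines.append(f"| {r.name} | {r.score:.3f} | {status} |")
    return "\n".join(lines)


def pr_body(run_id: str, target: str, evals: Dict[str, List[CaseResult]]) -> str:
    """One results section per eval file, in insertion order."""
    sections = []
    for eval_path, results in evals.items():
        sections.append("\n".join([
            "## Results",
            results_table(results),
            f"Run: {run_id}",
            f"Target: {target}",
            f"Eval: {eval_path}",
        ]))
    return "\n\n".join(sections)
```

With a single eval, the output matches the example; with several, each eval gets its own section in the same PR.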
Size Warning
Warn (don't error) if the total artifact size exceeds 10 MB.
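The size check could be a small helper that returns a warning string and never raises. In this sketch, dir_size_bytes and size_warning are hypothetical names, and the threshold is read as 10 MiB (10 * 1024 * 1024 bytes), which is an assumption about what "10MB" means here.

```python
import os
from typing import Optional

# Assumption: the 10 MB threshold is interpreted as 10 MiB.
SIZE_WARN_BYTES = 10 * 1024 * 1024


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def size_warning(root: str, limit: int = SIZE_WARN_BYTES) -> Optional[str]:
    """Return a warning when the directory exceeds limit, else None."""
    size = dir_size_bytes(root)
    if size > limit:
        mib = 1024 * 1024
        return (
            f"Warning: artifacts in {root} total {size / mib:.1f} MB, "
            f"above the {limit / mib:.1f} MB threshold."
        )
    return None
```

Because the function only reports, the export proceeds even when the warning is emitted.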
Acceptance Signals
- .agentv/config.yaml supports a results.export section
- Export runs after both the agentv eval run and agentv pipeline commands
Non-Goals
- data_dir for cache location (separate issue)
Related