feat: auto-push eval results to configurable git repo

## Summary

Add a configuration option to automatically push eval result artifacts to a git repository after each eval run. The agent opens a draft PR and never auto-merges, so a human still reviews and merges the results.

## Motivation

Currently eval results live in `.agentv/results/runs/` locally and are lost unless manually committed. For reproducibility and historical comparison, results should be automatically pushed to a dedicated repo (e.g., `EntityProcess/agentv-evals`).

## Design

### Configuration

In `.agentv/config.yaml`:

```yaml
results:
  export:
    repo: EntityProcess/agentv-evals     # GitHub repo path
    path: autopilot-dev/runs             # Directory within the repo
    auto_push: true                      # Enable auto-push after each run
    branch_prefix: eval-results          # Branch naming prefix
```
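As a rough illustration of how this section might be read and validated, here is a minimal Python sketch. It assumes the YAML has already been loaded into a plain dictionary; `ExportConfig` and `parse_export_config` are hypothetical names, and the choice of required keys (`repo`, `path`) and defaults (`auto_push: false`, `branch_prefix: eval-results`) is an assumption, not something this issue specifies.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ExportConfig:
    """Hypothetical in-memory form of the `results.export` section."""
    repo: str
    path: str
    auto_push: bool = False              # assumed default: export is opt-in
    branch_prefix: str = "eval-results"  # assumed default


def parse_export_config(config: Mapping[str, Any]) -> Optional[ExportConfig]:
    """Return export settings, or None when no `results.export` section exists."""
    export = (config.get("results") or {}).get("export")
    if not export:
        return None
    # Assumption: `repo` and `path` are the only required keys.
    missing = [key for key in ("repo", "path") if not export.get(key)]
    if missing:
        raise ValueError(
            f"results.export is missing required keys: {', '.join(missing)}"
        )
    return ExportConfig(
        repo=str(export["repo"]),
        path=str(export["path"]).strip("/"),
        auto_push=bool(export.get("auto_push", False)),
        branch_prefix=str(export.get("branch_prefix", "eval-results")),
    )
```

Returning `None` for a missing section keeps the feature fully optional for projects that never configure an export repo.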

### Clone / Cache Strategy

The target repo is cloned/fetched to `~/.agentv/cache/results-repo/`. Subsequent runs reuse this cached clone (fetch-only). A broader `data_dir` config option can be added later as a separate concern.
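The clone-or-fetch decision could look roughly like the sketch below. It assumes a directory containing `.git` counts as an existing clone and that `gh repo clone` is used for the first run; `sync_commands` and `sync_results_repo` are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List

# Cache location named in this section.
CACHE_DIR = Path.home() / ".agentv" / "cache" / "results-repo"


def sync_commands(repo: str, cache_dir: Path) -> List[List[str]]:
    """Return the command to run: fetch if a clone exists, otherwise clone."""
    if (cache_dir / ".git").is_dir():
        return [["git", "-C", str(cache_dir), "fetch", "origin"]]
    return [["gh", "repo", "clone", repo, str(cache_dir)]]


def sync_results_repo(repo: str, cache_dir: Path = CACHE_DIR) -> None:
    """Run the clone or fetch; raises CalledProcessError on failure."""
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    for cmd in sync_commands(repo, cache_dir):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the decision easy to unit test without network access.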

### Authentication

The export step assumes the `gh` and `git` CLIs are already authenticated. If they are not, show an actionable error message (e.g., `Run 'gh auth login' to authenticate`). Do not fail the eval run: warn and skip the export step.
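One way to implement the warn-and-skip behavior is a precheck that returns a warning string instead of raising. The sketch below is an assumption about how this could be structured: `export_precheck` is a hypothetical name, and it relies on `gh auth status` exiting non-zero when no account is logged in.

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def export_precheck(
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[[Sequence[str]], int] = default_runner,
) -> Optional[str]:
    """Return a warning message if export should be skipped, else None."""
    for tool in ("git", "gh"):
        if which(tool) is None:
            return f"'{tool}' CLI not found; skipping results export."
    if run(["gh", "auth", "status"]) != 0:
        return (
            "GitHub CLI is not authenticated. "
            "Run 'gh auth login' to authenticate; skipping results export."
        )
    return None
```

The caller would print the returned message as a warning and carry on, so the eval run's exit status stays unaffected.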

### Artifacts

The entire `runs/<run-id>/` directory is pushed. No filtering.

### Workflow

After `agentv eval run` or `agentv pipeline` completes, if `auto_push` is enabled:

1. Fetch the cached clone of the target repo (or clone if first run)
2. Create a branch: `<branch_prefix>/<experiment>-<eval-file>-<timestamp>` (e.g., `eval-results/autopilot-dev-ad-explore-2026-03-29T01-15-06`)
3. Copy the entire `runs/<run-id>/` directory to the configured path
4. Commit with a structured message including eval summary (pass/fail counts, mean score)
5. Push branch and create a draft PR with results summary in the body
6. Human reviews and merges the PR
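The branch-naming step can be sketched as follows. This assumes the `<eval-file>` component is the eval file's base name with its `.eval.yaml` (or similar) suffix removed, and that the timestamp uses second precision with hyphens in place of colons, as in the example above. `eval_slug` and `branch_name` are hypothetical helpers.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def eval_slug(eval_file: str) -> str:
    """Strip directories and the eval suffix (assumed convention)."""
    name = PurePosixPath(eval_file).name
    for suffix in (".eval.yaml", ".eval.yml", ".yaml", ".yml"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def branch_name(prefix: str, experiment: str, eval_file: str, when: datetime) -> str:
    """Build `<branch_prefix>/<experiment>-<eval-file>-<timestamp>`."""
    stamp = when.strftime("%Y-%m-%dT%H-%M-%S")  # colons are not valid in git ref names
    return f"{prefix}/{experiment}-{eval_slug(eval_file)}-{stamp}"


example = branch_name(
    "eval-results",
    "autopilot-dev",
    "evals/autopilot-dev/ad-explore.eval.yaml",
    datetime(2026, 3, 29, 1, 15, 6, tzinfo=timezone.utc),
)
# example == "eval-results/autopilot-dev-ad-explore-2026-03-29T01-15-06"
```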

### PR Granularity

**One PR per run invocation.** A single `agentv eval run` or `agentv pipeline` execution produces one PR that bundles all evals from that run. The PR body contains a summary table per eval.

### PR Format

```
feat(results): ad-explore claude-cli — 3/3 PASS (1.000)

## Results
| Test | Score | Status |
|---|---|---|
| discovers-existing-implementation | 1.000 | PASS |
| finds-all-consumers | 1.000 | PASS |
| structured-summary | 1.000 | PASS |

Run: 2026-03-29T01-15-06-826Z
Target: claude-cli
Eval: evals/autopilot-dev/ad-explore.eval.yaml
```

For bundled runs (multiple evals), repeat the results table per eval.
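A possible way to render the results table and the summary fragment used in the title is sketched below. `CaseResult`, `summary_line`, and `results_table` are hypothetical names; the `FAIL` label for failing tests and the use of an arithmetic mean for the score are assumptions, since the example above only shows passing tests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    """Hypothetical per-test result record."""
    name: str
    score: float
    passed: bool


def summary_line(results: List[CaseResult]) -> str:
    """Render the pass count and mean score, e.g. `3/3 PASS (1.000)`."""
    passed = sum(1 for r in results if r.passed)
    mean = sum(r.score for r in results) / len(results) if results else 0.0
    return f"{passed}/{len(results)} PASS ({mean:.3f})"


def results_table(results: List[CaseResult]) -> str:
    """Render the Markdown results table for one eval."""
    lines = ["| Test | Score | Status |", "|---|---|---|"]
    for r in results:
        status = "PASS" if r.passed else "FAIL"  # assumed label for failures
        lines.append(f"| {r.name} | {r.score:.3f} | {status} |")
    return "\n".join(lines)
```

For a bundled run, `results_table` would be called once per eval and the tables joined under per-eval headings.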

### Size Warning

Warn (don't error) if total artifact size exceeds 10MB.
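A minimal sketch of the size check, assuming "10MB" is interpreted as 10 MiB and that size is the sum of regular file sizes under the run directory. `dir_size_bytes` and `size_warning` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

SIZE_WARN_BYTES = 10 * 1024 * 1024  # assumption: 10MB is read as 10 MiB


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under `root`."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def size_warning(total_bytes: int, limit: int = SIZE_WARN_BYTES) -> Optional[str]:
    """Return a warning string when the limit is exceeded, else None."""
    if total_bytes <= limit:
        return None
    mib = total_bytes / (1024 * 1024)
    return f"Run artifacts total {mib:.1f} MiB, above the 10 MiB warning threshold."
```

The export would still proceed after the warning is printed.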

## Acceptance Signals

- [ ] `.agentv/config.yaml` supports `results.export` section
- [ ] After eval run, artifacts are pushed to configured repo as a draft PR
- [ ] One PR per run invocation (bundles all evals in that run)
- [ ] PR includes structured results summary
- [ ] Human must merge — no auto-merge
- [ ] Works with `agentv eval run` and `agentv pipeline` commands
- [ ] Graceful fallback if repo is not accessible or auth fails (warning, not error)
- [ ] Warn if artifact size exceeds 10MB

## Non-Goals

- Auto-merging PRs (human review required)
- Real-time streaming of results
- Dashboard integration (separate concern — EntityProcess/agentv#563)
- Configurable `data_dir` for cache location (separate issue)

## Related

- EntityProcess/agentv-evals — current manual results repo
- EntityProcess/agentv#563 — Studio eval management platform
- EntityProcess/agentv#801 — artifact structure standardization
