Objective
Add an export adapter that converts agentv results to the SWE-bench submission format, enabling participation in the SWE-bench leaderboard.
Motivation
AgentV's Docker workspace feature (#971) makes it capable of running SWE-bench evaluations. However, the official SWE-bench leaderboard only accepts submissions in its own format (all_preds.jsonl + metadata.yaml + trajs/). A built-in exporter bridges this gap.
Design
Following the "Lightweight Core, Plugin Extensibility" principle, this should be a CLI command or export adapter, not a core feature.
Proposed interface
agentv export --format swe-bench <results-dir> --output <submission-dir>
Output structure
submission/
├── all_preds.jsonl # {instance_id, model_name_or_path, model_patch}
├── metadata.yaml # Agent scaffold description
├── README.md # Auto-generated from config
├── trajs/ # Agent trajectories
│ └── <instance_id>.json # From agentv trace data
└── results.json # Converted from index.jsonl
Mapping (agentv → SWE-bench)
| AgentV Field | SWE-bench Field |
| --- | --- |
| test_id | instance_id |
| unified_diff / output | model_patch |
| target | model_name_or_path |
| trace_summary | trajs/<instance_id>.json |
| score, scores[] | results.json |
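As a rough illustration of the first three rows of the mapping, the sketch below converts result records into all_preds.jsonl lines. It assumes each agentv result is a flat JSON object carrying the field names from the table, and it reads "unified_diff / output" as "use unified_diff, falling back to output". Both are assumptions, not confirmed agentv behavior, and the helper names to_prediction and write_all_preds are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable


def to_prediction(record: dict) -> dict:
    """Map one agentv result record to a SWE-bench prediction entry."""
    # Assumption: prefer unified_diff and fall back to output when it is absent.
    patch = record.get("unified_diff") or record.get("output") or ""
    return {
        "instance_id": record["test_id"],
        "model_name_or_path": record["target"],
        "model_patch": patch,
    }


def write_all_preds(records: Iterable[dict], submission_dir: Path) -> Path:
    """Write one JSON object per line to <submission_dir>/all_preds.jsonl."""
    submission_dir.mkdir(parents=True, exist_ok=True)
    out_path = submission_dir / "all_preds.jsonl"
    with out_path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(to_prediction(record)) + "\n")
    return out_path
```

Keeping the per-record mapping in its own function would let the same logic be reused for the trajs/ and results.json outputs, which need the same instance_id.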
Acceptance Criteria
- agentv export --format swe-bench produces a valid SWE-bench submission directory
- all_preds.jsonl with correct instance_id → model_patch mapping
Non-goals
- Auto-submitting to leaderboard (just generate the format)
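The acceptance criterion on all_preds.jsonl could be checked with a small validator along these lines. This is a minimal sketch: the required keys are taken from the output structure described in this issue, the duplicate-instance_id check is an added assumption, and validate_all_preds is a hypothetical helper, not part of agentv or SWE-bench.

```python
import json
from pathlib import Path

# Keys listed for all_preds.jsonl in the output structure of this issue.
REQUIRED_KEYS = {"instance_id", "model_name_or_path", "model_patch"}


def validate_all_preds(path: Path) -> list[str]:
    """Return a list of problems found in all_preds.jsonl (empty if none)."""
    problems: list[str] = []
    seen: set[str] = set()
    lines = path.read_text(encoding="utf-8").splitlines()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(entry, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"line {lineno}: missing keys {sorted(missing)}")
            continue
        if entry["instance_id"] in seen:
            problems.append(f"line {lineno}: duplicate instance_id {entry['instance_id']}")
        seen.add(entry["instance_id"])
    return problems
```

An empty return value means every line parsed and carried the three expected keys. It does not confirm that each model_patch is a well-formed diff.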