feat(cli): add HuggingFace dataset import command #986
Merged
Add `agentv import huggingface` to import datasets from HuggingFace Hub into AgentV EVAL.yaml format. Supports SWE-bench-style datasets with automatic field mapping (`instance_id` -> test id, `problem_statement` -> input, `FAIL_TO_PASS` -> code-grader assertions, `repo` -> Docker workspace).

The command shells out to a Python script via `uv run` (per repo convention for Python scripts). The script uses inline PEP 723 metadata, so `uv` auto-installs the `datasets` and `pyyaml` dependencies.

Closes #978

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying agentv with Cloudflare Pages

| Latest commit: | aa7072a |
|---|---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://b7f07d5b.agentv.pages.dev |
| Branch Preview URL: | https://feat-978-huggingface-import.agentv.pages.dev |
- Handle missing `uv` with a clear error message
- Surface child process stderr on failure
- Add `PASS_TO_PASS` regression test assertion
- Wrap `datasets` import in `try`/`except ImportError`
- Validate `--limit` is positive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
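The CLI side is TypeScript (it spawns the script with `execFile`), but the error handling from this commit is language-neutral. The following Python sketch shows the same pattern. The names `validate_limit` and `run_import_script` are hypothetical. It rejects a non-positive `--limit`, checks that `uv` is on `PATH` with `shutil.which` before spawning anything, and includes the child's stderr in the error when the script exits non-zero.

```python
from __future__ import annotations

import shutil
import subprocess


def validate_limit(limit: int | None) -> None:
    """Reject non-positive --limit values before spawning anything."""
    if limit is not None and limit <= 0:
        raise ValueError(f"--limit must be a positive integer, got {limit}")


def run_import_script(script: str, args: list[str], uv_bin: str = "uv") -> str:
    """Run `uv run <script> <args...>` and return its stdout.

    Raises a readable error if uv is missing, and surfaces the child's
    stderr if the script exits non-zero.
    """
    if shutil.which(uv_bin) is None:
        raise RuntimeError(
            f"'{uv_bin}' was not found on PATH; install uv to use "
            "`agentv import huggingface`."
        )
    result = subprocess.run(
        [uv_bin, "run", script, *args],
        capture_output=True,  # keep stderr so it can be reported
        text=True,
        check=False,  # inspect returncode ourselves
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"import script failed (exit {result.returncode}):\n"
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Checking for `uv` up front turns an opaque spawn failure into an actionable message. Capturing stderr means a Python traceback from the script (for example, a failed dataset download) reaches the user instead of being swallowed.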
- Replace template literal with string literal (no interpolation needed)
- Fix `execFile` formatting to match Biome style

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`base_commit` is not informational metadata; it's required to reproduce the evaluation environment. SWE-bench builds Docker images with `git reset --hard {base_commit}` and resets test files to this commit before running tests. Place it in `workspace.docker` where it belongs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
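The reproduction steps in this commit message can be sketched as a helper that emits the corresponding shell commands. `reset_commands` and its `test_files` parameter are hypothetical; the message does not say how AgentV derives the list of test files. `shlex.quote` guards against paths containing spaces or shell metacharacters. Restoring test files with `git checkout <commit> -- <paths>` mirrors the behavior the message attributes to SWE-bench.

```python
from __future__ import annotations

import shlex


def reset_commands(base_commit: str, test_files: list[str]) -> list[str]:
    """Shell commands that pin a checkout to base_commit, SWE-bench style.

    1. Hard-reset the working tree to the task's base commit.
    2. Restore the listed test files to their base_commit versions
       before tests run.
    """
    commit = shlex.quote(base_commit)
    commands = [f"git reset --hard {commit}"]
    if test_files:
        quoted = " ".join(shlex.quote(path) for path in test_files)
        commands.append(f"git checkout {commit} -- {quoted}")
    return commands
```

Without `base_commit`, neither command can be generated, which is why it belongs with the Docker workspace configuration rather than in free-form metadata.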
This was referenced Apr 8, 2026
Summary
- `agentv import huggingface` CLI command to import datasets from HuggingFace Hub into AgentV EVAL.yaml format
- `scripts/import-huggingface.py` uses the `datasets` library to load from HuggingFace and converts instances to EVAL.yaml files
- Field mapping: `instance_id` -> test id, `problem_statement` -> input, `FAIL_TO_PASS` -> code-grader assertions, `repo` -> Docker workspace config

Closes #978
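The following hypothetical skeleton shows how the script's argument handling might look; it is for orientation only. The flags `--repo`, `--split`, `--limit`, and `--output` are the ones exercised in the test plan. Their defaults and `required` settings, and the helper names `positive_int`, `parse_args`, `split_spec`, and `load_rows`, are assumptions. The header is PEP 723 inline metadata listing the two dependencies `uv` installs. `load_rows` passes a slice such as `test[:2]` to `datasets.load_dataset`, which returns only the first `limit` rows of the split.

```python
# /// script
# dependencies = ["datasets", "pyyaml"]
# ///
"""Hypothetical skeleton of the import script's argument handling."""
from __future__ import annotations

import argparse
import sys


def positive_int(text: str) -> int:
    """argparse type that rejects zero and negative values."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("--limit must be a positive integer")
    return value


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Import a HuggingFace dataset as EVAL.yaml files")
    parser.add_argument("--repo", required=True,
                        help="dataset repo id, e.g. SWE-bench/SWE-bench_Verified")
    parser.add_argument("--split", default="test")  # assumed default
    parser.add_argument("--limit", type=positive_int, default=None)
    parser.add_argument("--output", required=True,
                        help="directory for generated EVAL.yaml files")
    return parser.parse_args(argv)


def split_spec(split: str, limit: int | None) -> str:
    """Build a datasets split string; 'test[:2]' selects the first 2 rows."""
    return f"{split}[:{limit}]" if limit else split


def load_rows(repo: str, split: str, limit: int | None):
    # Import lazily so a missing dependency produces a clear message.
    try:
        from datasets import load_dataset
    except ImportError:
        sys.exit("error: 'datasets' is not installed; run this script with `uv run`")
    return load_dataset(repo, split=split_spec(split, limit))
```

With the PEP 723 header in place, `uv run scripts/import-huggingface.py --repo SWE-bench/SWE-bench_Verified --split test --limit 2 --output /tmp/test/` resolves `datasets` and `pyyaml` without a separate install step.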
Test plan
- `bun run build` passes
- `bun run test`: all 1901 tests pass
- `bun run typecheck` passes
- `bun run lint` passes
- `bun run validate:examples`: all 55 example eval files valid
- `agentv import huggingface --repo SWE-bench/SWE-bench_Verified --split test --limit 2 --output /tmp/test/` produces valid EVAL.yaml files
- `agentv validate`

Red/Green UAT
Red (before):
`agentv import huggingface` command does not exist; running it produces an error.

Green (after):
🤖 Generated with Claude Code