Objective
The docker-workspace.ts module needs to read base_commit from the eval YAML workspace config and run git reset --hard {base_commit} inside the container before the agent starts work.
Context
PR #986 (agentv import huggingface) now places base_commit in workspace.docker:
workspace:
  docker:
    image: swebench/sweb.eval.django__django:latest
    timeout: 600
    memory: 4g
    base_commit: 4fd3044ca0135da903a70dfb66992293f529ecf1
But docker-workspace.ts (from PR #971) doesn't read this field. Without it, the container repo state is whatever the Docker image was built with, which may not match the specific commit the SWE-bench instance requires.
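A minimal sketch of how the field could be typed and checked once the YAML has been parsed. The `DockerWorkspaceConfig` interface and the `parseBaseCommit` helper are hypothetical names for illustration, not the actual shapes in docker-workspace.ts, and the rule of accepting 7 to 40 hex characters is an assumption about what should count as a usable commit reference.

```typescript
// Hypothetical shape of the workspace.docker block; the real type may differ.
interface DockerWorkspaceConfig {
  image: string;
  timeout?: number;
  memory?: string;
  base_commit?: string;
}

// Assumption: accept abbreviated or full SHA-1 hashes (7 to 40 hex characters).
const COMMIT_SHA = /^[0-9a-f]{7,40}$/i;

// Returns the trimmed commit when the field is present and well formed,
// undefined when the field is absent, and throws on anything else.
function parseBaseCommit(config: DockerWorkspaceConfig): string | undefined {
  if (config.base_commit === undefined) return undefined;
  const commit = config.base_commit.trim();
  if (!COMMIT_SHA.test(commit)) {
    throw new Error(
      `workspace.docker.base_commit is not a commit SHA: ${config.base_commit}`
    );
  }
  return commit;
}
```

Rejecting anything that is not plain hex also keeps branch names and shell metacharacters out of the command that later runs in the container.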
SWE-bench behavior
SWE-bench's harness does:
- Image build time: git reset --hard {base_commit} (baked into the image)
- Eval time: Reset test files to base_commit before running tests: git checkout {base_commit} {modified_files}
For pre-built SWE-bench Docker images, the commit is already baked in. But for custom or rebuilt images, the workspace must ensure the correct checkout.
Implementation
- Add base_commit to the Docker workspace schema in packages/core/src/evaluation/validation/eval-file.schema.ts
- In docker-workspace.ts, after container start, if base_commit is set:
  - Run git reset --hard {base_commit} in the container
  - Verify the checkout succeeded
- Add tests
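The reset-and-verify step could look roughly like the sketch below. It assumes a hypothetical `ContainerExec` hook that runs a command inside the already-started container with the repository as its working directory; how docker-workspace.ts actually executes commands is not shown in this issue, so both `ContainerExec` and `resetToBaseCommit` are illustrative names. Verification here means comparing the output of `git rev-parse HEAD` with the requested commit, which is one possible check and not necessarily the one the module will use.

```typescript
// Result of one command run inside the container.
interface ExecResult {
  exitCode: number;
  stdout: string;
}

// Hypothetical hook: runs argv inside the started container, in the repo directory.
type ContainerExec = (argv: string[]) => Promise<ExecResult>;

async function resetToBaseCommit(
  exec: ContainerExec,
  baseCommit: string | undefined
): Promise<void> {
  // Field not set: keep whatever state the image was built with.
  if (!baseCommit) return;

  const reset = await exec(["git", "reset", "--hard", baseCommit]);
  if (reset.exitCode !== 0) {
    throw new Error(
      `git reset --hard ${baseCommit} failed with exit code ${reset.exitCode}`
    );
  }

  // Verify the checkout: HEAD must resolve to the requested commit.
  // startsWith lets an abbreviated base_commit match the full hash.
  const head = await exec(["git", "rev-parse", "HEAD"]);
  const actual = head.stdout.trim().toLowerCase();
  if (head.exitCode !== 0 || !actual.startsWith(baseCommit.toLowerCase())) {
    throw new Error(
      `HEAD is "${actual}" after reset, expected ${baseCommit}`
    );
  }
}
```

Passing the executor in as a parameter keeps the step testable without Docker: a test can supply a fake `ContainerExec` that records the argv it receives and returns canned results.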
Acceptance criteria
- workspace.docker.base_commit is recognized in eval YAML
- Container repo is reset to base_commit before agent work begins
- agentv validate accepts eval files with the new field