Add SWERLVanilluxSandboxEnv#1739
Conversation
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Code Review
This pull request introduces SWERLVanilluxSandboxEnv, a bash-only sandbox RL environment that mirrors the offline mini-swe-agent 'vanillux' solver harness, along with its configuration, prompt templates, and unit tests. Feedback on the implementation suggests several improvements: normalizing path separators to forward slashes in tar creation for Windows compatibility, recreating the backend on reset instead of modifying its private attributes, catching broader exceptions during reward parsing and lock directory cleanup to prevent crashes, and resolving dead code where SandboxOOMError is caught but not yet raised by the backend.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| for root, dirs, files in os.walk(host_dir): | ||
| rel_root = os.path.relpath(root, host_dir) | ||
| for d in dirs: | ||
| rel_dir = os.path.join(rel_root, d) if rel_root != "." else d | ||
| info = tarfile.TarInfo(name=f"{prefix}/{rel_dir}") | ||
| info.type = tarfile.DIRTYPE | ||
| info.mode = 0o755 | ||
| tar.addfile(info) | ||
| for fname in files: | ||
| src_path = os.path.join(root, fname) | ||
| rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname | ||
| tar.add(src_path, arcname=f"{prefix}/{rel_file}") |
There was a problem hiding this comment.
On Windows hosts, os.path.join and os.path.relpath use backslashes (\\) as path separators. When creating a tar archive for Docker/Linux, path separators must be forward slashes (/). Otherwise, the files will be extracted with backslashes in their names instead of being placed in directories. Normalizing the paths to use forward slashes ensures cross-platform compatibility.
| for root, dirs, files in os.walk(host_dir): | |
| rel_root = os.path.relpath(root, host_dir) | |
| for d in dirs: | |
| rel_dir = os.path.join(rel_root, d) if rel_root != "." else d | |
| info = tarfile.TarInfo(name=f"{prefix}/{rel_dir}") | |
| info.type = tarfile.DIRTYPE | |
| info.mode = 0o755 | |
| tar.addfile(info) | |
| for fname in files: | |
| src_path = os.path.join(root, fname) | |
| rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname | |
| tar.add(src_path, arcname=f"{prefix}/{rel_file}") | |
| for root, dirs, files in os.walk(host_dir): | |
| rel_root = os.path.relpath(root, host_dir) | |
| for d in dirs: | |
| rel_dir = os.path.join(rel_root, d) if rel_root != "." else d | |
| normalized_dir = f"{prefix}/{rel_dir}".replace(os.sep, "/") | |
| info = tarfile.TarInfo(name=normalized_dir) | |
| info.type = tarfile.DIRTYPE | |
| info.mode = 0o755 | |
| tar.addfile(info) | |
| for fname in files: | |
| src_path = os.path.join(root, fname) | |
| rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname | |
| normalized_file = f"{prefix}/{rel_file}".replace(os.sep, "/") | |
| tar.add(src_path, arcname=normalized_file) |
| if ( | ||
| self._backend is not None | ||
| and self._backend_type == "docker" | ||
| and getattr(self._backend, "_docker_host", None) != self._backend_kwargs.get("docker_host") | ||
| ): | ||
| self._backend.close() | ||
| record_phase("close") | ||
| self._backend = None | ||
|
|
||
| if self._backend is not None: | ||
| self._backend.close() | ||
| record_phase("close") | ||
| self._backend._image = self._backend_kwargs.get("image", self._backend._image) | ||
| self._backend.start() | ||
| record_phase("start") | ||
| else: | ||
| bkwargs = dict(self._backend_kwargs) | ||
| bkwargs.setdefault("timeout", self._timeout) | ||
| self._backend = create_backend(self._backend_type, **bkwargs) | ||
| record_phase("create_backend") | ||
| self._backend.start() | ||
| record_phase("start") |
There was a problem hiding this comment.
Reusing the backend object and modifying its private/protected attribute _image directly is a code smell and can lead to AttributeError if other backends are introduced in the future. Since close() is called and a new container is started anyway, it is much cleaner and safer to always recreate the backend on reset.
if self._backend is not None:
self._backend.close()
record_phase("close")
self._backend = None
bkwargs = dict(self._backend_kwargs)
bkwargs.setdefault("timeout", self._timeout)
self._backend = create_backend(self._backend_type, **bkwargs)
record_phase("create_backend")
self._backend.start()
record_phase("start")| except (ValueError, TypeError): | ||
| pass |
There was a problem hiding this comment.
The run_command method in DockerBackend can raise RuntimeError or other Docker API exceptions (e.g., if the container has exited or the daemon is unreachable). Catching only ValueError and TypeError will let these exceptions propagate and crash the step. Catching Exception ensures that _parse_reward is fully robust and safely returns 0.0 as intended.
| except (ValueError, TypeError): | |
| pass | |
| except Exception: | |
| pass |
| with contextlib.suppress(FileNotFoundError): | ||
| os.rmdir(lock_dir) |
There was a problem hiding this comment.
On some filesystems or container environments, os.rmdir might raise other OSError exceptions (such as PermissionError). Suppressing OSError instead of just FileNotFoundError is safer and ensures that lock cleanup issues do not crash the environment setup.
| with contextlib.suppress(FileNotFoundError): | |
| os.rmdir(lock_dir) | |
| with contextlib.suppress(OSError): | |
| os.rmdir(lock_dir) |
| except SandboxOOMError as e: | ||
| logger.warning(f"[{self._task_id}] sandbox OOM: {e}") | ||
| return StepResult( | ||
| result=("Sandbox container was killed by the OOM reaper. Ending episode with reward 0."), | ||
| reward=0.0, | ||
| done=True, | ||
| metadata={"oom_killed": True, "task_id": self._task_id}, | ||
| ) |
There was a problem hiding this comment.
The SandboxOOMError exception is caught here, but it is currently never raised by DockerBackend (the only active backend), making this block dead code. Consider updating DockerBackend.run_command to detect OOM-killed containers (e.g., by checking exit code 137 or inspecting container state) and raising SandboxOOMError accordingly.
A self-contained bash-only sandbox RL environment mirroring the offline mini-swe-agent "vanillux" solver harness: a persistent shell (cwd/env preserved across calls), mini-swe-agent head/tail observation truncation, (exit_code=N) suffixes, prompt templates vendored in vanillux_prompts.yaml, and opt-in append_turns_remaining / tool_call_format_error_feedback. Self-contained off main: inlines its own SUBMIT_MARKER / LAST_STEP_WARNING / timing constants (no swerl_sandbox dependency), adds SandboxOOMError to backends, the optional RLEnvironment.get_tool_call_format_error_message hook, and registration. Includes unit tests using fake backends (no Docker). Co-authored-by: Cursor <cursoragent@cursor.com>
ec413fb to
b28883e
Compare
Summary
SWERLVanilluxSandboxEnv(config_name="swerl_vanillux_sandbox"), a self-contained bash-only sandbox RL environment that mirrors the offline mini-swe-agent "vanillux" solver harness:bashcalls via a wrapper script),(exit_code=N)suffixes on observations,vanillux_prompts.yaml,append_turns_remainingandtool_call_format_error_feedback.echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT.Standalone
mainand does not depend onswerl_sandbox.pyor any other sandbox-env work: it inlines its ownSUBMIT_MARKER/LAST_STEP_WARNING/ timing constants.SandboxOOMErrorinbackends.pyand the optionalRLEnvironment.get_tool_call_format_error_messagehook, plusTOOL_REGISTRY/__init__registration.Test plan
open_instruct/test_swerl_vanillux_sandbox.py(12 tests, fake backends — no Docker): registration, bash-only tool surface, opt-in format-error message, instance/prompt rendering, head/tail truncation, submit-marker → verifier, exit-code suffix, and turns-remaining/submit-warning behavior. All pass locally.ruff+ formatting clean.GPU_TESTS=bypass