Add SWERLVanilluxSandboxEnv by hamishivi · Pull Request #1739 · allenai/open-instruct

hamishivi · 2026-06-25T16:48:08Z

Summary

Add SWERLVanilluxSandboxEnv (config_name="swerl_vanillux_sandbox"), a self-contained bash-only sandbox RL environment that mirrors the offline mini-swe-agent "vanillux" solver harness:
- a persistent shell (cwd/env preserved across bash calls via a wrapper script),
- mini-swe-agent head/tail observation truncation,
- (exit_code=N) suffixes on observations,
- prompt templates vendored from mini-swe-agent v2.2.x in vanillux_prompts.yaml,
- opt-in append_turns_remaining and tool_call_format_error_feedback.
Submit-based evaluation via echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT.

Standalone

Branches off main and does not depend on swerl_sandbox.py or any other sandbox-env work: it inlines its own SUBMIT_MARKER / LAST_STEP_WARNING / timing constants.
Adds the only two small shared pieces it needs: SandboxOOMError in backends.py and the optional RLEnvironment.get_tool_call_format_error_message hook, plus TOOL_REGISTRY / __init__ registration.
Runs on the existing Docker backend.

Test plan

New open_instruct/test_swerl_vanillux_sandbox.py (12 tests, fake backends — no Docker): registration, bash-only tool surface, opt-in format-error message, instance/prompt rendering, head/tail truncation, submit-marker → verifier, exit-code suffix, and turns-remaining/submit-warning behavior. All pass locally.
ruff + formatting clean.

GPU_TESTS=bypass

Co-authored-by: Cursor <cursoragent@cursor.com>

gemini-code-assist

Code Review

This pull request introduces SWERLVanilluxSandboxEnv, a bash-only sandbox RL environment that mirrors the offline mini-swe-agent 'vanillux' solver harness, along with its configuration, prompt templates, and unit tests. Feedback on the implementation suggests several improvements: normalizing path separators to forward slashes in tar creation for Windows compatibility, recreating the backend on reset instead of modifying its private attributes, catching broader exceptions during reward parsing and lock directory cleanup to prevent crashes, and resolving dead code where SandboxOOMError is caught but not yet raised by the backend.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-25T16:50:02Z

+            for root, dirs, files in os.walk(host_dir):
+                rel_root = os.path.relpath(root, host_dir)
+                for d in dirs:
+                    rel_dir = os.path.join(rel_root, d) if rel_root != "." else d
+                    info = tarfile.TarInfo(name=f"{prefix}/{rel_dir}")
+                    info.type = tarfile.DIRTYPE
+                    info.mode = 0o755
+                    tar.addfile(info)
+                for fname in files:
+                    src_path = os.path.join(root, fname)
+                    rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname
+                    tar.add(src_path, arcname=f"{prefix}/{rel_file}")


On Windows hosts, os.path.join and os.path.relpath use backslashes (\\) as path separators. When creating a tar archive for Docker/Linux, path separators must be forward slashes (/). Otherwise, the files will be extracted with backslashes in their names instead of being placed in directories. Normalizing the paths to use forward slashes ensures cross-platform compatibility.

Suggested change

for root, dirs, files in os.walk(host_dir):

rel_root = os.path.relpath(root, host_dir)

for d in dirs:

rel_dir = os.path.join(rel_root, d) if rel_root != "." else d

info = tarfile.TarInfo(name=f"{prefix}/{rel_dir}")

info.type = tarfile.DIRTYPE

info.mode = 0o755

tar.addfile(info)

for fname in files:

src_path = os.path.join(root, fname)

rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname

tar.add(src_path, arcname=f"{prefix}/{rel_file}")

for root, dirs, files in os.walk(host_dir):

rel_root = os.path.relpath(root, host_dir)

for d in dirs:

rel_dir = os.path.join(rel_root, d) if rel_root != "." else d

normalized_dir = f"{prefix}/{rel_dir}".replace(os.sep, "/")

info = tarfile.TarInfo(name=normalized_dir)

info.type = tarfile.DIRTYPE

info.mode = 0o755

tar.addfile(info)

for fname in files:

src_path = os.path.join(root, fname)

rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname

normalized_file = f"{prefix}/{rel_file}".replace(os.sep, "/")

tar.add(src_path, arcname=normalized_file)

gemini-code-assist · 2026-06-25T16:50:03Z

+        if (
+            self._backend is not None
+            and self._backend_type == "docker"
+            and getattr(self._backend, "_docker_host", None) != self._backend_kwargs.get("docker_host")
+        ):
+            self._backend.close()
+            record_phase("close")
+            self._backend = None
+
+        if self._backend is not None:
+            self._backend.close()
+            record_phase("close")
+            self._backend._image = self._backend_kwargs.get("image", self._backend._image)
+            self._backend.start()
+            record_phase("start")
+        else:
+            bkwargs = dict(self._backend_kwargs)
+            bkwargs.setdefault("timeout", self._timeout)
+            self._backend = create_backend(self._backend_type, **bkwargs)
+            record_phase("create_backend")
+            self._backend.start()
+            record_phase("start")


Reusing the backend object and modifying its private/protected attribute _image directly is a code smell and can lead to AttributeError if other backends are introduced in the future. Since close() is called and a new container is started anyway, it is much cleaner and safer to always recreate the backend on reset.

if self._backend is not None: self._backend.close() record_phase("close") self._backend = None bkwargs = dict(self._backend_kwargs) bkwargs.setdefault("timeout", self._timeout) self._backend = create_backend(self._backend_type, **bkwargs) record_phase("create_backend") self._backend.start() record_phase("start")

gemini-code-assist · 2026-06-25T16:50:03Z

+        except (ValueError, TypeError):
+            pass


The run_command method in DockerBackend can raise RuntimeError or other Docker API exceptions (e.g., if the container has exited or the daemon is unreachable). Catching only ValueError and TypeError will let these exceptions propagate and crash the step. Catching Exception ensures that _parse_reward is fully robust and safely returns 0.0 as intended.

Suggested change

except (ValueError, TypeError):

pass

except Exception:

pass

gemini-code-assist · 2026-06-25T16:50:03Z

+                    with contextlib.suppress(FileNotFoundError):
+                        os.rmdir(lock_dir)


On some filesystems or container environments, os.rmdir might raise other OSError exceptions (such as PermissionError). Suppressing OSError instead of just FileNotFoundError is safer and ensures that lock cleanup issues do not crash the environment setup.

Suggested change

with contextlib.suppress(FileNotFoundError):

os.rmdir(lock_dir)

with contextlib.suppress(OSError):

os.rmdir(lock_dir)

gemini-code-assist · 2026-06-25T16:50:03Z

+            except SandboxOOMError as e:
+                logger.warning(f"[{self._task_id}] sandbox OOM: {e}")
+                return StepResult(
+                    result=("Sandbox container was killed by the OOM reaper. Ending episode with reward 0."),
+                    reward=0.0,
+                    done=True,
+                    metadata={"oom_killed": True, "task_id": self._task_id},
+                )


The SandboxOOMError exception is caught here, but it is currently never raised by DockerBackend (the only active backend), making this block dead code. Consider updating DockerBackend.run_command to detect OOM-killed containers (e.g., by checking exit code 137 or inspecting container state) and raising SandboxOOMError accordingly.

A self-contained bash-only sandbox RL environment mirroring the offline mini-swe-agent "vanillux" solver harness: a persistent shell (cwd/env preserved across calls), mini-swe-agent head/tail observation truncation, (exit_code=N) suffixes, prompt templates vendored in vanillux_prompts.yaml, and opt-in append_turns_remaining / tool_call_format_error_feedback. Self-contained off main: inlines its own SUBMIT_MARKER / LAST_STEP_WARNING / timing constants (no swerl_sandbox dependency), adds SandboxOOMError to backends, the optional RLEnvironment.get_tool_call_format_error_message hook, and registration. Includes unit tests using fake backends (no Docker). Co-authored-by: Cursor <cursoragent@cursor.com>

hamishivi added a commit that referenced this pull request Jun 25, 2026

Add CHANGELOG entry for SWERLVanilluxSandboxEnv (#1739)

ec413fb

Co-authored-by: Cursor <cursoragent@cursor.com>

gemini-code-assist Bot reviewed Jun 25, 2026

View reviewed changes

hamishivi mentioned this pull request Jun 25, 2026

Add SWERLSandboxEnv for per-sample Docker tasks with submit-based evaluation #1492

Closed

4 tasks

hamishivi force-pushed the hamishivi/swerl-vanillux-env branch from ec413fb to b28883e Compare June 25, 2026 16:57

hamishivi changed the base branch from swerl-sandbox-env to main June 25, 2026 16:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add SWERLVanilluxSandboxEnv#1739

Add SWERLVanilluxSandboxEnv#1739
hamishivi wants to merge 1 commit into
mainfrom
hamishivi/swerl-vanillux-env

hamishivi commented Jun 25, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		with contextlib.suppress(FileNotFoundError):
		os.rmdir(lock_dir)

Uh oh!

Conversation

hamishivi commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Standalone

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hamishivi commented Jun 25, 2026 •

edited

Loading