Skip to content

Add SWERLVanilluxSandboxEnv#1739

Open
hamishivi wants to merge 1 commit into
mainfrom
hamishivi/swerl-vanillux-env
Open

Add SWERLVanilluxSandboxEnv#1739
hamishivi wants to merge 1 commit into
mainfrom
hamishivi/swerl-vanillux-env

Conversation

@hamishivi

@hamishivi hamishivi commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Add SWERLVanilluxSandboxEnv (config_name="swerl_vanillux_sandbox"), a self-contained bash-only sandbox RL environment that mirrors the offline mini-swe-agent "vanillux" solver harness:
    • a persistent shell (cwd/env preserved across bash calls via a wrapper script),
    • mini-swe-agent head/tail observation truncation,
    • (exit_code=N) suffixes on observations,
    • prompt templates vendored from mini-swe-agent v2.2.x in vanillux_prompts.yaml,
    • opt-in append_turns_remaining and tool_call_format_error_feedback.
  • Submit-based evaluation via echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT.

Standalone

  • Branches off main and does not depend on swerl_sandbox.py or any other sandbox-env work: it inlines its own SUBMIT_MARKER / LAST_STEP_WARNING / timing constants.
  • Adds the only two small shared pieces it needs: SandboxOOMError in backends.py and the optional RLEnvironment.get_tool_call_format_error_message hook, plus TOOL_REGISTRY / __init__ registration.
  • Runs on the existing Docker backend.

Test plan

  • New open_instruct/test_swerl_vanillux_sandbox.py (12 tests, fake backends — no Docker): registration, bash-only tool surface, opt-in format-error message, instance/prompt rendering, head/tail truncation, submit-marker → verifier, exit-code suffix, and turns-remaining/submit-warning behavior. All pass locally.
  • ruff + formatting clean.

GPU_TESTS=bypass

hamishivi added a commit that referenced this pull request Jun 25, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces SWERLVanilluxSandboxEnv, a bash-only sandbox RL environment that mirrors the offline mini-swe-agent 'vanillux' solver harness, along with its configuration, prompt templates, and unit tests. Feedback on the implementation suggests several improvements: normalizing path separators to forward slashes in tar creation for Windows compatibility, recreating the backend on reset instead of modifying its private attributes, catching broader exceptions during reward parsing and lock directory cleanup to prevent crashes, and resolving dead code where SandboxOOMError is caught but not yet raised by the backend.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +383 to +394
for root, dirs, files in os.walk(host_dir):
rel_root = os.path.relpath(root, host_dir)
for d in dirs:
rel_dir = os.path.join(rel_root, d) if rel_root != "." else d
info = tarfile.TarInfo(name=f"{prefix}/{rel_dir}")
info.type = tarfile.DIRTYPE
info.mode = 0o755
tar.addfile(info)
for fname in files:
src_path = os.path.join(root, fname)
rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname
tar.add(src_path, arcname=f"{prefix}/{rel_file}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

On Windows hosts, os.path.join and os.path.relpath use backslashes (\\) as path separators. When creating a tar archive for Docker/Linux, path separators must be forward slashes (/). Otherwise, the files will be extracted with backslashes in their names instead of being placed in directories. Normalizing the paths to use forward slashes ensures cross-platform compatibility.

Suggested change
for root, dirs, files in os.walk(host_dir):
rel_root = os.path.relpath(root, host_dir)
for d in dirs:
rel_dir = os.path.join(rel_root, d) if rel_root != "." else d
info = tarfile.TarInfo(name=f"{prefix}/{rel_dir}")
info.type = tarfile.DIRTYPE
info.mode = 0o755
tar.addfile(info)
for fname in files:
src_path = os.path.join(root, fname)
rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname
tar.add(src_path, arcname=f"{prefix}/{rel_file}")
for root, dirs, files in os.walk(host_dir):
rel_root = os.path.relpath(root, host_dir)
for d in dirs:
rel_dir = os.path.join(rel_root, d) if rel_root != "." else d
normalized_dir = f"{prefix}/{rel_dir}".replace(os.sep, "/")
info = tarfile.TarInfo(name=normalized_dir)
info.type = tarfile.DIRTYPE
info.mode = 0o755
tar.addfile(info)
for fname in files:
src_path = os.path.join(root, fname)
rel_file = os.path.join(rel_root, fname) if rel_root != "." else fname
normalized_file = f"{prefix}/{rel_file}".replace(os.sep, "/")
tar.add(src_path, arcname=normalized_file)

Comment on lines +295 to +316
if (
self._backend is not None
and self._backend_type == "docker"
and getattr(self._backend, "_docker_host", None) != self._backend_kwargs.get("docker_host")
):
self._backend.close()
record_phase("close")
self._backend = None

if self._backend is not None:
self._backend.close()
record_phase("close")
self._backend._image = self._backend_kwargs.get("image", self._backend._image)
self._backend.start()
record_phase("start")
else:
bkwargs = dict(self._backend_kwargs)
bkwargs.setdefault("timeout", self._timeout)
self._backend = create_backend(self._backend_type, **bkwargs)
record_phase("create_backend")
self._backend.start()
record_phase("start")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Reusing the backend object and modifying its private/protected attribute _image directly is a code smell and can lead to AttributeError if other backends are introduced in the future. Since close() is called and a new container is started anyway, it is much cleaner and safer to always recreate the backend on reset.

        if self._backend is not None:
            self._backend.close()
            record_phase("close")
            self._backend = None

        bkwargs = dict(self._backend_kwargs)
        bkwargs.setdefault("timeout", self._timeout)
        self._backend = create_backend(self._backend_type, **bkwargs)
        record_phase("create_backend")
        self._backend.start()
        record_phase("start")

Comment on lines +525 to +526
except (ValueError, TypeError):
pass

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The run_command method in DockerBackend can raise RuntimeError or other Docker API exceptions (e.g., if the container has exited or the daemon is unreachable). Catching only ValueError and TypeError will let these exceptions propagate and crash the step. Catching Exception ensures that _parse_reward is fully robust and safely returns 0.0 as intended.

Suggested change
except (ValueError, TypeError):
pass
except Exception:
pass

Comment on lines +195 to +196
with contextlib.suppress(FileNotFoundError):
os.rmdir(lock_dir)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

On some filesystems or container environments, os.rmdir might raise other OSError exceptions (such as PermissionError). Suppressing OSError instead of just FileNotFoundError is safer and ensures that lock cleanup issues do not crash the environment setup.

Suggested change
with contextlib.suppress(FileNotFoundError):
os.rmdir(lock_dir)
with contextlib.suppress(OSError):
os.rmdir(lock_dir)

Comment on lines +420 to +427
except SandboxOOMError as e:
logger.warning(f"[{self._task_id}] sandbox OOM: {e}")
return StepResult(
result=("Sandbox container was killed by the OOM reaper. Ending episode with reward 0."),
reward=0.0,
done=True,
metadata={"oom_killed": True, "task_id": self._task_id},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The SandboxOOMError exception is caught here, but it is currently never raised by DockerBackend (the only active backend), making this block dead code. Consider updating DockerBackend.run_command to detect OOM-killed containers (e.g., by checking exit code 137 or inspecting container state) and raising SandboxOOMError accordingly.

A self-contained bash-only sandbox RL environment mirroring the offline
mini-swe-agent "vanillux" solver harness: a persistent shell (cwd/env preserved
across calls), mini-swe-agent head/tail observation truncation, (exit_code=N)
suffixes, prompt templates vendored in vanillux_prompts.yaml, and opt-in
append_turns_remaining / tool_call_format_error_feedback.

Self-contained off main: inlines its own SUBMIT_MARKER / LAST_STEP_WARNING /
timing constants (no swerl_sandbox dependency), adds SandboxOOMError to backends,
the optional RLEnvironment.get_tool_call_format_error_message hook, and
registration. Includes unit tests using fake backends (no Docker).

Co-authored-by: Cursor <cursoragent@cursor.com>
@hamishivi hamishivi force-pushed the hamishivi/swerl-vanillux-env branch from ec413fb to b28883e Compare June 25, 2026 16:57
@hamishivi hamishivi changed the base branch from swerl-sandbox-env to main June 25, 2026 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant