
RLMEnv: Simplify constructor and internals#966

Merged
snimu merged 13 commits into main from sebastian/rlm-args-reduction-2026-02-27
Mar 2, 2026
Conversation

@snimu (Contributor) commented Feb 27, 2026

Description

  • Remove 14 unused/niche constructor args that were silently swallowed via **kwargs or had no
    remaining use case (interception_host, interception_port, interception_url, execution_backend,
    context_key, sandbox_start_command, sandbox_client_max_workers, root_tool_serialization,
    stagger/jitter params, etc.)
  • Remove _InterceptionPool singleton and all shared-pool branching — each RLMEnv instance now owns
    its own interception server and tunnel (this reverts a recent change of mine that was poorly
    motivated and insufficiently thought through)
  • Add explicit max_turns: int = 50 constructor param (previously inherited a default of 10 from
    StatefulToolEnv, easily lost via **kwargs)
  • Rename sub_tool_max_turns → sub_llm_max_turns for consistency with max_sub_llm_parallelism and
    the sub_llm_* metric names
  • Hardcode interception_port=0 (OS-assigned) and bind_host="127.0.0.1" — the old configurability only
    mattered for the now-removed pool
  • Update docs and docstring to remove outdated claims

Also adds the sub_llm_max_completion_tokens arg to control the total number of completion tokens across all sub-LLM calls. max_tokens, as set by prime-rl, already controlled the per-sub-LLM-call token count; this new arg controls the rollout-wide budget. The RLM is told this budget in the system prompt and the number of completion tokens used so far in each llm_batch call. Enforcement isn't perfect due to the parallelism of sub-LLM calls, but it works fairly well. Once the budget is reached, sub-LLMs may make no further calls and all calls to llm_batch fail, but the RLM can still finish its work in other ways.

Note: requires small changes to the -rlm environments.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Medium Risk
Medium risk because it makes breaking constructor/API changes and alters interception/tunnel lifecycle and sub-LLM execution behavior (timeouts, batching, and new budget-based early exits). Main failure modes are misconfigured integrations and unexpected skipped llm_batch calls under parallelism.

Overview
Simplifies RLMEnv’s public API and internals by removing a large set of constructor knobs (e.g., interception host/port/url config, context key overrides, stagger/jitter, sandbox client sizing, deprecated backend params) and introducing an explicit max_turns parameter (replacing the prior max_iterations pass-through).

Changes interception behavior by deleting the _InterceptionPool singleton and shared-pool code paths; each RLMEnv instance now starts and owns its own interception server/tunnel, with interception binding/port selection effectively hardcoded (localhost + OS-assigned port).

Adds a new rollout-wide sub-LLM completion-token budget via sub_llm_max_completion_tokens, enforced both before starting llm_batch and during sub-LLM tool loops (with a forced final-answer call), and surfaces budget info in the root system prompt and llm_batch summary output.

Tests and docs are updated accordingly: rename sub_tool_max_turns → sub_llm_max_turns, remove pool tests, fold sandbox tests into test_rlm_env.py, and trim the experimental README's RLMEnv section.

Written by Cursor Bugbot for commit 0722cae.

snimu and others added 4 commits February 27, 2026 13:53
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  sandbox_id,
  cmd,
- timeout=self.env._compute_install_wait_seconds(),
+ timeout=self.env.max_startup_wait_seconds,
Pip install timeout no longer scales with packages

Low Severity

The removed _compute_install_wait_seconds() scaled the pip install timeout based on the number of packages (30s per package, minimum max_startup_wait_seconds). Now using the flat max_startup_wait_seconds (default 120s) means environments with many pip_install_packages (5+) may time out during installation where they previously succeeded.
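
The removed scaling, as described in the comment above, can be sketched as follows (the function name mirrors the removed helper; the exact implementation may have differed):

```python
def compute_install_wait_seconds(num_packages: int,
                                 max_startup_wait_seconds: int = 120) -> int:
    # Old behavior: 30s per pip package, floored at max_startup_wait_seconds.
    return max(max_startup_wait_seconds, 30 * num_packages)


# With few packages, old and new timeouts agree (120s).
# With 5+ packages, the old timeout exceeded the flat 120s default:
#   compute_install_wait_seconds(5) -> 150, but the new flat timeout stays 120.
```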


snimu (Contributor, Author):

If somebody installs that many packages, they know what they're doing and can simply increase max_startup_wait_seconds.

snimu and others added 2 commits March 1, 2026 11:33
The type checker flags 5 unresolved-attribute errors because
_interception_server is typed as InterceptionServer | None.
Use cast() at each access site to narrow the type, since these
code paths only run when interception is active (not gateway mode).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
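
The narrowing pattern described in this commit can be illustrated like so — the class and attribute names are stand-ins for the real ones:

```python
from typing import cast


class InterceptionServer:  # stand-in for the real server class
    def port(self) -> int:
        return 4321


class Env:
    def __init__(self, gateway_mode: bool):
        # Typed as Optional, so every attribute access on it is flagged
        # by the type checker unless the None case is narrowed away.
        self._interception_server: InterceptionServer | None = (
            None if gateway_mode else InterceptionServer()
        )

    def interception_port(self) -> int:
        # This path only runs when interception is active (not gateway mode),
        # so cast() tells the checker the value cannot be None here.
        return cast(InterceptionServer, self._interception_server).port()
```

Note that `cast()` is a runtime no-op: it documents the invariant for the type checker but performs no check, so it relies on the "only runs when interception is active" guarantee holding.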
The validation checks correctly use "heavy" but the error messages
still said "high", which would mislead users into using an invalid value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
snimu and others added 5 commits March 1, 2026 11:48
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a rollout-level completion-token budget shared across all sub-LLM
calls. When set, the environment tracks cumulative sub-LLM completion
tokens and refuses new calls once the budget is reached. The root model
is informed of the budget in its system prompt and in the per-batch
summary printed after each llm_batch() call. None (default) means
unlimited, preserving backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all sandbox backend tests into the main RLM test file and delete
the separate file. No test changes — just consolidation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The budget gate in _run_sub_llm_request only fired before starting a
sub-LLM call. A single call with multiple tool-calling turns could
blow past the budget unchecked. Now _run_sub_llm checks the combined
committed + in-flight completion tokens after each turn and breaks
out of the loop early when the budget is exceeded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
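
The per-turn gate this commit adds can be sketched as a loop that re-checks the combined token count after every turn — the function shape and names are hypothetical:

```python
def run_sub_llm(turn_token_counts: list[int],
                budget_max: int,
                committed: int = 0,
                in_flight: int = 0) -> list[int]:
    """Simulate a multi-turn sub-LLM call under a completion-token budget.

    Previously the budget was only checked before the call started; now the
    combined committed + in-flight total is re-checked after each turn.
    """
    used = committed + in_flight
    completed_turns = []
    for turn_tokens in turn_token_counts:  # tokens produced per tool-calling turn
        completed_turns.append(turn_tokens)
        used += turn_tokens
        if used >= budget_max:
            break  # budget exceeded mid-call: exit the loop early
    return completed_turns
```

With a 70-token budget and turns producing 40 tokens each, the call now stops after the second turn instead of running all turns unchecked.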
…duction-2026-02-27

# Conflicts:
#	verifiers/envs/experimental/cli_agent_env.py

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Move the assistant message append before the token budget check so
the forced final answer path sees a complete conversation, consistent
with the normal max-turns exit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@snimu snimu merged commit 522396c into main Mar 2, 2026
6 checks passed