feat(sandbox): hardened sandbox executor with proactive streaming and resource limits (Closes #326)#408
feat(sandbox): hardened sandbox executor with proactive streaming and resource limits (Closes #326)#408ask-z4ch wants to merge 8 commits into
Conversation
utksh1
left a comment
There was a problem hiding this comment.
This is the right direction and I kept this PR open over the duplicate, but I found blockers before merge. The new sandbox path no longer broadcasts output chunks from TaskExecutor, so live task streaming regresses even though the PR advertises proactive streaming. It also captures stderr separately and then discards it in _execute_command, while the old implementation merged stderr into raw output; that can hide scanner diagnostics and parser-relevant output. Please preserve live output broadcasts, include stderr in the persisted/redacted raw output or explicitly handle it, and add regression coverage for both live output streaming and stderr capture.
utksh1
left a comment
There was a problem hiding this comment.
Re-reviewed latest state. This is now conflicting with current main after the stream/phase executor merges, and the earlier blockers still need to be resolved: preserve live output broadcasts and include stderr in persisted/redacted raw output instead of dropping it. Please rebase on current main, keep the new bounded listener/phase behavior intact, and add regression coverage for live output plus stderr capture.
…ry classification, legacy compat Rewrite sandbox_executor.py per security specification: - Proactive 64KB chunked stream reader for stdout+stderr - Output limit detection stops reading, terminates process - Timeout enforcement via SIGTERM->3s->SIGKILL escalation - Memory exhaustion heuristics: SIGSEGV, MemoryError, RSS>=95% - Platform guard: RLIMIT_AS only on Linux, graceful no-op elsewhere - Cancellation guard ensures subprocess is never orphaned Core Executor Adapter: - _execute_command preserved: (cmd, timeout=None) -> (output, exit_code, violation_reason) - SandboxConfig created from global settings internally - Returns violation_reason string directly for terminated:* mapping Models: - SandboxViolation changed to Exception subclass (was BaseModel) Tests: - 16 tests: legacy timeout compat, signal escalation, memory classification, output truncation, cancellation safety, platform guard, normal completion, config resolution Closes utksh1#326 Signed-off-by: ask-z4ch <229467675+ask-z4ch@users.noreply.github.com>
…out=600) signature Preserve original 3-parameter signature for backward compatibility. task_id is accepted but only used for parameter compatibility; SandboxConfig is created internally from the timeout parameter. Signed-off-by: ask-z4ch <229467675+ask-z4ch@users.noreply.github.com>
…mpatibility - Use upstream's sandbox_timeout / sandbox_memory_mb field names - Add missing sandbox_max_output_bytes, sandbox_allow_network to config.py - Update test mocks to return 3-tuple (output, exit_code, violation_reason) - Resolve rebase conflicts with upstream risk-scoring code Signed-off-by: ask-z4ch <ask-z4ch@users.noreply.github.com>
- Add 10 comprehensive tests covering timeout, memory classification, output boundary precision, cancellation, and legacy compat - Fix flaky exit_code != 0 assertion: exit may be 0 when Python finishes before termination signal arrives; output cap is the correctness criterion Signed-off-by: ask-z4ch <ask-z4ch@users.noreply.github.com>
…signature Signed-off-by: ask-z4ch <ask-z4ch@users.noreply.github.com>
When timeout=None, fall back to settings.sandbox_timeout before passing to SandboxConfig which rejects None. Signed-off-by: ask-z4ch <ask-z4ch@users.noreply.github.com>
…ut/memory precision - Add broadcast_callback to _read_stream and sandbox_execute so output chunks are forwarded to TaskExecutor._broadcast for live streaming - Merge stderr into raw output in _execute_command before returning - Apply timeout via external asyncio.wait_for in _execute_command (legacy-compatible pattern) instead of through SandboxConfig - Add asyncio.Lock around shared byte accounting to prevent stdout/stderr reader races exceeding max_output_bytes - Capture RSS baseline before subprocess and compute delta for memory classification (reliable per-process measurement) - Add 10 comprehensive regression tests covering timeout (external + internal), memory (SIGSEGV, stderr strings, RSS heuristic, exit 137), output (lock race prevention, strict boundary), cancellation, and legacy timeout compatibility (default, explicit, None) All 38 sandbox tests pass. Signed-off-by: Zach <ask.zach@pm.me>
cfadc43 to
1996fc0
Compare
|
All concerns from both review rounds are addressed: Round 1 (live streaming + stderr):
Round 2 (precision + reliability):
Rebased on latest Also @utksh1 , could you please change the PR level from advanced to critical? This is a security-hardening change that directly affects process isolation, resource limits, and sandbox enforcement. |
Signed-off-by: Zach <ask.zach@pm.me>
|
Re-reviewed after the latest push. Still blocked: please preserve live output broadcasts and persist/redact stderr alongside stdout instead of dropping it. Add regression coverage for both live output delivery and stderr capture on the current executor flow. |
Description
The existing subprocess execution path had no sandbox resource limits — runaway plugins could exhaust memory, hang indefinitely, or produce unbounded output. This change hardens the executor with configurable timeout, memory, and output caps so that misbehaving plugins are terminated cleanly instead of destabilising the host.
Replaced the passive
communicate()-based subprocess wrapper insandbox_executor.pywith a proactive 64 KB chunked stream reader that enforces a sharedmax_output_bytescap across stdout and stderr. Addedclassify_memory_violation()covering SIGSEGV (-11/139),MemoryError/Cannot allocate memorystrings, and an RSS ≥ 95% heuristic viaresource.getrusage. Timeout is enforced viaasyncio.wait_forwith a SIGTERM → 3 s grace → SIGKILL escalation. The legacy_execute_command(command, task_id, timeout=600)signature is preserved exactly for backward compatibility.Related Issues
Closes #326
Type of Change
How Has This Been Tested
pytest testing/backend/test_sandbox_executor.py -q— 16 / 16 passpytest testing/backend/test_sandbox_blocking_issues.py -q— 10 / 10 passpytest testing/backend -q -m "not benchmark"— 527 passed, 2 failed (pre-existing trio / aiosqlite teardown issues, unrelated to this change)Checklist
GSSoC 2026 — In compliance with the GSSoC 2026 AI Conduct rules, I disclose that AI tools were used solely as learning and boilerplate aids. The final logic was fully reviewed, tested, and manually adapted to match human styling and clean-code design.