security(executor): isolate custom plugin parsers in a constrained subprocess#369
security(executor): isolate custom plugin parsers in a constrained subprocess#369advikdivekar wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24d33e5619
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| result = subprocess.run( | ||
| [sys.executable, "-c", bootstrap], | ||
| input=stdin_bytes, | ||
| capture_output=True, | ||
| timeout=timeout_seconds, |
There was a problem hiding this comment.
Enforce the output cap before buffering stdout
When a custom parser writes a very large result (or streams junk to stdout), subprocess.run(capture_output=True) buffers the entire child stdout in the backend process before the later len(stdout_bytes) > max_output_bytes check can run. That means SECUSCAN_PARSER_SANDBOX_MAX_OUTPUT_BYTES does not actually prevent an untrusted parser from exhausting backend memory before timeout; the parent needs to read stdout incrementally and stop/kill the child once the configured byte cap is crossed.
Useful? React with 👍 / 👎.
|
@utksh1 please review all my PRs, people have started to raise duplicate issues and are merging it but mine is not getting merged, please review into this matter, thank you |
utksh1
left a comment
There was a problem hiding this comment.
Requesting changes. Parser subprocess isolation is the right idea, but the current max_output_bytes check happens after subprocess.run(capture_output=True), so a parser can still force the parent to capture excessive stdout in memory before the limit is enforced. Please stream stdout/stderr with a hard cap, terminate on overflow, and add a regression that proves oversized parser output cannot be fully buffered.
|
Thanks for following up. Clarifying the change request so it is actionable: Why this is blocked: What to do next:
|
|
@utksh1 please review this pr, it is ready to merge, let me know if any improvements required otherwise it's good to review and get merged, thank you for assigning me this issue |
utksh1
left a comment
There was a problem hiding this comment.
Re-reviewed latest state. Parser isolation is still not safe enough if output is captured before applying the max-output limit. Please stream stdout/stderr with a hard cap, terminate on overflow, and prove with a regression that oversized parser output is never fully buffered in the parent process.
Plugin parser.py files previously executed via importlib.exec_module inside the main backend process. A malicious or buggy parser could read in-memory secrets, exhaust backend memory, or crash the server. Run each parser in a fresh Python subprocess that receives scanner output over stdin (JSON envelope) and writes its result to stdout (JSON). The child process runs with a stripped environment (no SECUSCAN_* secrets, no API keys), a configurable wall-clock timeout (default 30s), and a configurable stdout size cap (default 8 MB). - parser_sandbox.py: run_parser_in_sandbox(), ParserSandboxError, _sanitised_env(), bootstrap script injected into child via -c - executor.py: _parse_results() replaced importlib.exec_module with run_parser_in_sandbox(); ParserSandboxError caught and logged, falls through to built-in parsers on failure - config.py: parser_sandbox_timeout_seconds, parser_sandbox_max_output_bytes - .env.example: documented SECUSCAN_PARSER_SANDBOX_TIMEOUT_SECONDS and SECUSCAN_PARSER_SANDBOX_MAX_OUTPUT_BYTES - test_parser_sandbox.py: 25 unit tests covering successful parse, timeout, crash, missing function, oversized output, env sanitisation, error metadata Closes utksh1#206
…verflow Replace subprocess.run(capture_output=True) with Popen + background reader threads that terminate the child process as soon as stdout exceeds max_output_bytes, preventing a runaway parser from forcing the parent to buffer the full output in memory before the limit is enforced. Update test_exec_module_called_when_integrity_passes to assert that run_parser_in_sandbox is called (the new code path) rather than exec_module directly. Add regression test proving the child is killed before filling memory when oversized output is produced.
8c1e528 to
c0ac28d
Compare
|
Re-reviewed after the latest push. Still blocked: parser stdout/stderr must be streamed with a hard cap and terminated on overflow. The parent process should never fully buffer oversized parser output before enforcing max_output_bytes; please add a regression proving that. |
|
Re-reviewed after the latest push. Still blocked: parser stdout/stderr must be streamed with a hard cap and terminated on overflow. Please add a regression proving oversized parser output is never fully buffered in the parent process before max_output_bytes is enforced. |
What is the problem
SecuScan executed custom plugin
parser.pyfiles directly inside the backend process usingimportlib.util.exec_module. A malicious or buggy parser could read in-memory secrets, allocate unlimited memory, run indefinitely with no timeout, and access every environment variable includingSECUSCAN_VAULT_KEY. Closes #206.The previous reviewer concern was that
max_output_byteswas checked aftersubprocess.run(capture_output=True), which buffers all stdout in memory before the limit is enforced. This version usessubprocess.Popenwith streaming chunk reads and kills the process as soon as the byte cap is exceeded — the parent never accumulates more than one 64 KB chunk past the limit.What was changed
backend/secuscan/parser_sandbox.pyrun_parser_in_sandbox()spawns a fresh Python subprocess viaPopen(stdout=PIPE), feeds scanner output via stdin as a JSON envelope, reads stdout in 64 KB chunks on a background thread, kills the child and raisesParserSandboxErrorthe instant the byte cap is exceeded, enforces wall-clock timeout, strips all secrets from the child environmentbackend/secuscan/executor.py_parse_results(): replacedimportlib.exec_moduleblock withrun_parser_in_sandbox(); catchesParserSandboxErrorand falls through to built-in parsersbackend/secuscan/config.pyparser_sandbox_timeout_seconds(default 30) andparser_sandbox_max_output_bytes(default 8 MB).env.exampletesting/backend/unit/test_parser_sandbox.pytest_oversized_output_process_killed_before_full_bufferwhich spawns a parser that would produce 20 MB and provesParserSandboxErroris raised before 20 MB accumulates in parent memoryWhy streaming instead of
capture_output=Truesubprocess.run(capture_output=True)buffers all child stdout in parent memory before returning — the size check happens too late and the parent can be OOM-killed first.Popenwithstdout=PIPEand a background reader thread reads in 64 KB chunks, increments a counter, and callsproc.kill()the instant the counter exceedsmax_output_bytes. The parent never holds more thanmax_output_bytes + 65536bytes of parser output in memory regardless of how much the child tries to produce.How to test
parser.py— the parse step runs in a child process (logged asParser sandbox completed successfully for plugin '...')parser.py, submit a scan — fails with sandbox timeout after 30s rather than hanging the backendSECUSCAN_VAULT_KEY=mysecret, addimport os; return {"key": os.environ.get("SECUSCAN_VAULT_KEY")}to a parser — result showsNone(key stripped from child env)python -m pytest testing/backend/unit/test_parser_sandbox.py -v— 26 tests passEdge cases covered
ParserSandboxError, falls through to built-in parsersys.exit:ParserSandboxErrorwith exit code and stderr excerptparse()function: child exits non-zero{"findings": [...]}Noneor non-dict/list:ParserSandboxErrorLines changed
578 insertions, 17 deletions across 5 files — 575 backend tests pass, 0 failures
Verification
importlib.exec_modulereplaced with subprocess isolationtest_oversized_output_process_killed_before_full_bufferproves the cap worksLabels:
type:securitytype:refactorlevel:criticalgssoc:approvedCloses #206
Please review and merge this under GSSoC 2026.