security(executor): isolate custom plugin parsers in a constrained subprocess by advikdivekar · Pull Request #369 · utksh1/SecuScan

advikdivekar · 2026-05-28T06:54:10Z

What is the problem
SecuScan executed custom plugin parser.py files directly inside the backend process using importlib.util.exec_module. A malicious or buggy parser could read in-memory secrets, allocate unlimited memory, run indefinitely with no timeout, and access every environment variable including SECUSCAN_VAULT_KEY. Closes #206.

The previous reviewer concern was that max_output_bytes was checked after subprocess.run(capture_output=True), which buffers all stdout in memory before the limit is enforced. This version uses subprocess.Popen with streaming chunk reads and kills the process as soon as the byte cap is exceeded — the parent never accumulates more than one 64 KB chunk past the limit.

What was changed

File	Change
`backend/secuscan/parser_sandbox.py`	New module: `run_parser_in_sandbox()` spawns a fresh Python subprocess via `Popen(stdout=PIPE)`, feeds scanner output via stdin as a JSON envelope, reads stdout in 64 KB chunks on a background thread, kills the child and raises `ParserSandboxError` the instant the byte cap is exceeded, enforces wall-clock timeout, strips all secrets from the child environment
`backend/secuscan/executor.py`	`_parse_results()`: replaced `importlib.exec_module` block with `run_parser_in_sandbox()`; catches `ParserSandboxError` and falls through to built-in parsers
`backend/secuscan/config.py`	Added `parser_sandbox_timeout_seconds` (default 30) and `parser_sandbox_max_output_bytes` (default 8 MB)
`.env.example`	Documented both new settings
`testing/backend/unit/test_parser_sandbox.py`	26 unit tests including `test_oversized_output_process_killed_before_full_buffer` which spawns a parser that would produce 20 MB and proves `ParserSandboxError` is raised before 20 MB accumulates in parent memory

Why streaming instead of capture_output=True
subprocess.run(capture_output=True) buffers all child stdout in parent memory before returning — the size check happens too late and the parent can be OOM-killed first. Popen with stdout=PIPE and a background reader thread reads in 64 KB chunks, increments a counter, and calls proc.kill() the instant the counter exceeds max_output_bytes. The parent never holds more than max_output_bytes + 65536 bytes of parser output in memory regardless of how much the child tries to produce.

How to test

Run a plugin that has a parser.py — the parse step runs in a child process (logged as Parser sandbox completed successfully for plugin '...')
Add an infinite loop to any parser.py, submit a scan — fails with sandbox timeout after 30s rather than hanging the backend
Set SECUSCAN_VAULT_KEY=mysecret, add import os; return {"key": os.environ.get("SECUSCAN_VAULT_KEY")} to a parser — result shows None (key stripped from child env)
python -m pytest testing/backend/unit/test_parser_sandbox.py -v — 26 tests pass

Edge cases covered

Parser times out: ParserSandboxError, falls through to built-in parser
Parser crashes or calls sys.exit: ParserSandboxError with exit code and stderr excerpt
Parser has a syntax error: caught at import time inside child
Parser missing parse() function: child exits non-zero
Parser returns a list instead of dict: wrapped in {"findings": [...]}
Parser returns None or non-dict/list: ParserSandboxError
Parser produces oversized output: process killed before full output is buffered in parent — proven by regression test
Empty stdout from child: treated as empty result
Environment secrets stripped from child env

Lines changed
578 insertions, 17 deletions across 5 files — 575 backend tests pass, 0 failures

Verification

Root cause fully resolved — importlib.exec_module replaced with subprocess isolation
Streaming output read with hard cap — parent never buffers full oversized output
Regression test test_oversized_output_process_killed_before_full_buffer proves the cap works
All 26 parser sandbox tests pass
575 total backend tests pass — no regressions
Rebased onto latest main — no merge conflicts

Labels: type:security type:refactor level:critical gssoc:approved

Closes #206

Please review and merge this under GSSoC 2026.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24d33e5619

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-28T06:56:23Z

+        result = subprocess.run(
+            [sys.executable, "-c", bootstrap],
+            input=stdin_bytes,
+            capture_output=True,
+            timeout=timeout_seconds,


Enforce the output cap before buffering stdout

When a custom parser writes a very large result (or streams junk to stdout), subprocess.run(capture_output=True) buffers the entire child stdout in the backend process before the later len(stdout_bytes) > max_output_bytes check can run. That means SECUSCAN_PARSER_SANDBOX_MAX_OUTPUT_BYTES does not actually prevent an untrusted parser from exhausting backend memory before timeout; the parent needs to read stdout incrementally and stop/kill the child once the configured byte cap is crossed.

Useful? React with 👍 / 👎.

advikdivekar · 2026-05-28T07:09:35Z

@utksh1 please review all my PRs, people have started to raise duplicate issues and are merging it but mine is not getting merged, please review into this matter, thank you

utksh1

Requesting changes. Parser subprocess isolation is the right idea, but the current max_output_bytes check happens after subprocess.run(capture_output=True), so a parser can still force the parent to capture excessive stdout in memory before the limit is enforced. Please stream stdout/stderr with a hard cap, terminate on overflow, and add a regression that proves oversized parser output cannot be fully buffered.

utksh1 · 2026-05-28T17:42:12Z

Thanks for following up. Clarifying the change request so it is actionable:

Why this is blocked:
Requesting changes. Parser subprocess isolation is the right idea, but the current max_output_bytes check happens after subprocess.run(capture_output=True), so a parser can still force the parent to capture excessive stdout in memory before the limit is enforced. Please stream stdout/stderr with a hard cap, terminate on overflow, and add a regression that proves oversized parser output cannot be fully buffered.

What to do next:

Fix the specific issues called out above.
Push the updated branch and make sure the relevant CI checks pass.
Reply here when ready for re-review.

advikdivekar · 2026-05-28T19:12:31Z

@utksh1 please review this pr, it is ready to merge, let me know if any improvements required otherwise it's good to review and get merged, thank you for assigning me this issue

utksh1

Re-reviewed latest state. Parser isolation is still not safe enough if output is captured before applying the max-output limit. Please stream stdout/stderr with a hard cap, terminate on overflow, and prove with a regression that oversized parser output is never fully buffered in the parent process.

Plugin parser.py files previously executed via importlib.exec_module inside the main backend process. A malicious or buggy parser could read in-memory secrets, exhaust backend memory, or crash the server. Run each parser in a fresh Python subprocess that receives scanner output over stdin (JSON envelope) and writes its result to stdout (JSON). The child process runs with a stripped environment (no SECUSCAN_* secrets, no API keys), a configurable wall-clock timeout (default 30s), and a configurable stdout size cap (default 8 MB). - parser_sandbox.py: run_parser_in_sandbox(), ParserSandboxError, _sanitised_env(), bootstrap script injected into child via -c - executor.py: _parse_results() replaced importlib.exec_module with run_parser_in_sandbox(); ParserSandboxError caught and logged, falls through to built-in parsers on failure - config.py: parser_sandbox_timeout_seconds, parser_sandbox_max_output_bytes - .env.example: documented SECUSCAN_PARSER_SANDBOX_TIMEOUT_SECONDS and SECUSCAN_PARSER_SANDBOX_MAX_OUTPUT_BYTES - test_parser_sandbox.py: 25 unit tests covering successful parse, timeout, crash, missing function, oversized output, env sanitisation, error metadata Closes utksh1#206

…verflow Replace subprocess.run(capture_output=True) with Popen + background reader threads that terminate the child process as soon as stdout exceeds max_output_bytes, preventing a runaway parser from forcing the parent to buffer the full output in memory before the limit is enforced. Update test_exec_module_called_when_integrity_passes to assert that run_parser_in_sandbox is called (the new code path) rather than exec_module directly. Add regression test proving the child is killed before filling memory when oversized output is produced.

utksh1 · 2026-05-30T19:21:21Z

Re-reviewed after the latest push. Still blocked: parser stdout/stderr must be streamed with a hard cap and terminated on overflow. The parent process should never fully buffer oversized parser output before enforcing max_output_bytes; please add a regression proving that.

utksh1 · 2026-05-31T11:48:49Z

Re-reviewed after the latest push. Still blocked: parser stdout/stderr must be streamed with a hard cap and terminated on overflow. Please add a regression proving oversized parser output is never fully buffered in the parent process before max_output_bytes is enforced.

chatgpt-codex-connector Bot reviewed May 28, 2026

View reviewed changes

utksh1 requested changes May 28, 2026

View reviewed changes

utksh1 requested changes May 29, 2026

View reviewed changes

advikdivekar added 2 commits May 30, 2026 12:30

advikdivekar force-pushed the security/parser-subprocess-isolation branch from 8c1e528 to c0ac28d Compare May 30, 2026 07:00

Merge branch 'main' into security/parser-subprocess-isolation

9337a2b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security(executor): isolate custom plugin parsers in a constrained subprocess#369

security(executor): isolate custom plugin parsers in a constrained subprocess#369
advikdivekar wants to merge 3 commits into
utksh1:mainfrom
advikdivekar:security/parser-subprocess-isolation

advikdivekar commented May 28, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 28, 2026

Uh oh!

advikdivekar commented May 28, 2026

Uh oh!

utksh1 left a comment

Uh oh!

utksh1 commented May 28, 2026

Uh oh!

advikdivekar commented May 28, 2026

Uh oh!

utksh1 left a comment

Uh oh!

utksh1 commented May 30, 2026

Uh oh!

utksh1 commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

advikdivekar commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

advikdivekar commented May 28, 2026

Uh oh!

utksh1 left a comment

Choose a reason for hiding this comment

Uh oh!

utksh1 commented May 28, 2026

Uh oh!

advikdivekar commented May 28, 2026

Uh oh!

utksh1 left a comment

Choose a reason for hiding this comment

Uh oh!

utksh1 commented May 30, 2026

Uh oh!

utksh1 commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

advikdivekar commented May 28, 2026 •

edited

Loading