Skip to content

security(executor): isolate custom plugin parsers in a constrained subprocess#369

Open
advikdivekar wants to merge 3 commits into
utksh1:mainfrom
advikdivekar:security/parser-subprocess-isolation
Open

security(executor): isolate custom plugin parsers in a constrained subprocess#369
advikdivekar wants to merge 3 commits into
utksh1:mainfrom
advikdivekar:security/parser-subprocess-isolation

Conversation

@advikdivekar
Copy link
Copy Markdown
Contributor

@advikdivekar advikdivekar commented May 28, 2026

What is the problem
SecuScan executed custom plugin parser.py files directly inside the backend process using importlib.util.exec_module. A malicious or buggy parser could read in-memory secrets, allocate unlimited memory, run indefinitely with no timeout, and access every environment variable including SECUSCAN_VAULT_KEY. Closes #206.

The previous reviewer concern was that max_output_bytes was checked after subprocess.run(capture_output=True), which buffers all stdout in memory before the limit is enforced. This version uses subprocess.Popen with streaming chunk reads and kills the process as soon as the byte cap is exceeded — the parent never accumulates more than one 64 KB chunk past the limit.

What was changed

File Change
backend/secuscan/parser_sandbox.py New module: run_parser_in_sandbox() spawns a fresh Python subprocess via Popen(stdout=PIPE), feeds scanner output via stdin as a JSON envelope, reads stdout in 64 KB chunks on a background thread, kills the child and raises ParserSandboxError the instant the byte cap is exceeded, enforces wall-clock timeout, strips all secrets from the child environment
backend/secuscan/executor.py _parse_results(): replaced importlib.exec_module block with run_parser_in_sandbox(); catches ParserSandboxError and falls through to built-in parsers
backend/secuscan/config.py Added parser_sandbox_timeout_seconds (default 30) and parser_sandbox_max_output_bytes (default 8 MB)
.env.example Documented both new settings
testing/backend/unit/test_parser_sandbox.py 26 unit tests including test_oversized_output_process_killed_before_full_buffer which spawns a parser that would produce 20 MB and proves ParserSandboxError is raised before 20 MB accumulates in parent memory

Why streaming instead of capture_output=True
subprocess.run(capture_output=True) buffers all child stdout in parent memory before returning — the size check happens too late and the parent can be OOM-killed first. Popen with stdout=PIPE and a background reader thread reads in 64 KB chunks, increments a counter, and calls proc.kill() the instant the counter exceeds max_output_bytes. The parent never holds more than max_output_bytes + 65536 bytes of parser output in memory regardless of how much the child tries to produce.

How to test

  1. Run a plugin that has a parser.py — the parse step runs in a child process (logged as Parser sandbox completed successfully for plugin '...')
  2. Add an infinite loop to any parser.py, submit a scan — fails with sandbox timeout after 30s rather than hanging the backend
  3. Set SECUSCAN_VAULT_KEY=mysecret, add import os; return {"key": os.environ.get("SECUSCAN_VAULT_KEY")} to a parser — result shows None (key stripped from child env)
  4. python -m pytest testing/backend/unit/test_parser_sandbox.py -v — 26 tests pass

Edge cases covered

  • Parser times out: ParserSandboxError, falls through to built-in parser
  • Parser crashes or calls sys.exit: ParserSandboxError with exit code and stderr excerpt
  • Parser has a syntax error: caught at import time inside child
  • Parser missing parse() function: child exits non-zero
  • Parser returns a list instead of dict: wrapped in {"findings": [...]}
  • Parser returns None or non-dict/list: ParserSandboxError
  • Parser produces oversized output: process killed before full output is buffered in parent — proven by regression test
  • Empty stdout from child: treated as empty result
  • Environment secrets stripped from child env

Lines changed
578 insertions, 17 deletions across 5 files — 575 backend tests pass, 0 failures

Verification

  • Root cause fully resolved — importlib.exec_module replaced with subprocess isolation
  • Streaming output read with hard cap — parent never buffers full oversized output
  • Regression test test_oversized_output_process_killed_before_full_buffer proves the cap works
  • All 26 parser sandbox tests pass
  • 575 total backend tests pass — no regressions
  • Rebased onto latest main — no merge conflicts

Labels: type:security type:refactor level:critical gssoc:approved

Closes #206

Please review and merge this under GSSoC 2026.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24d33e5619

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread backend/secuscan/parser_sandbox.py Outdated
Comment on lines +149 to +153
result = subprocess.run(
[sys.executable, "-c", bootstrap],
input=stdin_bytes,
capture_output=True,
timeout=timeout_seconds,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce the output cap before buffering stdout

When a custom parser writes a very large result (or streams junk to stdout), subprocess.run(capture_output=True) buffers the entire child stdout in the backend process before the later len(stdout_bytes) > max_output_bytes check can run. That means SECUSCAN_PARSER_SANDBOX_MAX_OUTPUT_BYTES does not actually prevent an untrusted parser from exhausting backend memory before timeout; the parent needs to read stdout incrementally and stop/kill the child once the configured byte cap is crossed.

Useful? React with 👍 / 👎.

@advikdivekar
Copy link
Copy Markdown
Contributor Author

@utksh1 please review all my PRs, people have started to raise duplicate issues and are merging it but mine is not getting merged, please review into this matter, thank you

@utksh1 utksh1 added area:backend Backend API, database, or service work area:plugins Scanner plugin metadata, schemas, or plugin runtime work area:security Security-sensitive implementation or tests type:security Security work category bonus label type:testing Testing work category bonus label level:critical 80 pts difficulty label for critical or high-impact PRs labels May 28, 2026
Copy link
Copy Markdown
Owner

@utksh1 utksh1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes. Parser subprocess isolation is the right idea, but the current max_output_bytes check happens after subprocess.run(capture_output=True), so a parser can still force the parent to capture excessive stdout in memory before the limit is enforced. Please stream stdout/stderr with a hard cap, terminate on overflow, and add a regression that proves oversized parser output cannot be fully buffered.

@utksh1
Copy link
Copy Markdown
Owner

utksh1 commented May 28, 2026

Thanks for following up. Clarifying the change request so it is actionable:

Why this is blocked:
Requesting changes. Parser subprocess isolation is the right idea, but the current max_output_bytes check happens after subprocess.run(capture_output=True), so a parser can still force the parent to capture excessive stdout in memory before the limit is enforced. Please stream stdout/stderr with a hard cap, terminate on overflow, and add a regression that proves oversized parser output cannot be fully buffered.

What to do next:

  • Fix the specific issues called out above.
  • Push the updated branch and make sure the relevant CI checks pass.
  • Reply here when ready for re-review.

@advikdivekar
Copy link
Copy Markdown
Contributor Author

@utksh1 please review this pr, it is ready to merge, let me know if any improvements required otherwise it's good to review and get merged, thank you for assigning me this issue

Copy link
Copy Markdown
Owner

@utksh1 utksh1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed latest state. Parser isolation is still not safe enough if output is captured before applying the max-output limit. Please stream stdout/stderr with a hard cap, terminate on overflow, and prove with a regression that oversized parser output is never fully buffered in the parent process.

Plugin parser.py files previously executed via importlib.exec_module inside
the main backend process.  A malicious or buggy parser could read in-memory
secrets, exhaust backend memory, or crash the server.

Run each parser in a fresh Python subprocess that receives scanner output
over stdin (JSON envelope) and writes its result to stdout (JSON).  The child
process runs with a stripped environment (no SECUSCAN_* secrets, no API keys),
a configurable wall-clock timeout (default 30s), and a configurable stdout
size cap (default 8 MB).

- parser_sandbox.py: run_parser_in_sandbox(), ParserSandboxError,
  _sanitised_env(), bootstrap script injected into child via -c
- executor.py: _parse_results() replaced importlib.exec_module with
  run_parser_in_sandbox(); ParserSandboxError caught and logged, falls
  through to built-in parsers on failure
- config.py: parser_sandbox_timeout_seconds, parser_sandbox_max_output_bytes
- .env.example: documented SECUSCAN_PARSER_SANDBOX_TIMEOUT_SECONDS and
  SECUSCAN_PARSER_SANDBOX_MAX_OUTPUT_BYTES
- test_parser_sandbox.py: 25 unit tests covering successful parse, timeout,
  crash, missing function, oversized output, env sanitisation, error metadata

Closes utksh1#206
…verflow

Replace subprocess.run(capture_output=True) with Popen + background reader
threads that terminate the child process as soon as stdout exceeds
max_output_bytes, preventing a runaway parser from forcing the parent to
buffer the full output in memory before the limit is enforced.

Update test_exec_module_called_when_integrity_passes to assert that
run_parser_in_sandbox is called (the new code path) rather than exec_module
directly. Add regression test proving the child is killed before filling
memory when oversized output is produced.
@advikdivekar advikdivekar force-pushed the security/parser-subprocess-isolation branch from 8c1e528 to c0ac28d Compare May 30, 2026 07:00
@utksh1
Copy link
Copy Markdown
Owner

utksh1 commented May 30, 2026

Re-reviewed after the latest push. Still blocked: parser stdout/stderr must be streamed with a hard cap and terminated on overflow. The parent process should never fully buffer oversized parser output before enforcing max_output_bytes; please add a regression proving that.

@utksh1
Copy link
Copy Markdown
Owner

utksh1 commented May 31, 2026

Re-reviewed after the latest push. Still blocked: parser stdout/stderr must be streamed with a hard cap and terminated on overflow. Please add a regression proving oversized parser output is never fully buffered in the parent process before max_output_bytes is enforced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:backend Backend API, database, or service work area:plugins Scanner plugin metadata, schemas, or plugin runtime work area:security Security-sensitive implementation or tests level:critical 80 pts difficulty label for critical or high-impact PRs type:security Security work category bonus label type:testing Testing work category bonus label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SECURITY] Isolate custom plugin parsers in a constrained subprocess

2 participants