Skip to content

πŸ›‘οΈ Sentinel: [CRITICAL] Fix Remote Code Execution vulnerability in PDF generation#217

Open
anchapin wants to merge 2 commits intomainfrom
sentinel/fix-pdf-compilation-rce-4206741460883636495
Open

πŸ›‘οΈ Sentinel: [CRITICAL] Fix Remote Code Execution vulnerability in PDF generation#217
anchapin wants to merge 2 commits intomainfrom
sentinel/fix-pdf-compilation-rce-4206741460883636495

Conversation

@anchapin
Copy link
Copy Markdown
Owner

@anchapin anchapin commented Mar 29, 2026

🚨 Severity: CRITICAL
πŸ’‘ Vulnerability: Remote Code Execution (RCE) via LaTeX compilation. Unfiltered execution of pdflatex and pandoc allows shell commands if input is maliciously crafted. There was also a potential Denial of Service (DoS) due to unbounded execution time of the compilation process.
🎯 Impact: An attacker could craft a malicious LaTeX document (e.g. via an imported resume YAML or template) that executes arbitrary shell commands on the server during PDF generation using \write18. Additionally, compilation without a timeout could lead to infinite loops blocking worker threads.
πŸ”§ Fix: Added the -no-shell-escape flag to the pdflatex command and --pdf-engine-opt=-no-shell-escape to the pandoc command in cli/pdf/converter.py and cli/generators/cover_letter_generator.py. Also added a 30-second timeout on the subprocess.communicate() calls and implemented proper process cleanup (catching subprocess.TimeoutExpired, calling process.kill(), and calling process.communicate() again) to prevent zombie processes.
βœ… Verification: Verified by executing the full pytest suite locally and ensuring that PDF compilation functionality continues to work without regression. Reviewed the security test module to confirm test_pdflatex_timeout and test_pdflatex_arguments behavior aligns with the implemented fixes.


PR created automatically by Jules for task 4206741460883636495 started by @anchapin

Summary by Sourcery

Harden PDF generation against remote code execution and unbounded LaTeX runs by restricting LaTeX engine capabilities and enforcing timeouts on compilation subprocesses.

Bug Fixes:

  • Prevent remote code execution in pdflatex and pandoc-based PDF generation by disabling shell escape in LaTeX engines.
  • Avoid potential denial-of-service from hung LaTeX compilations by enforcing a 30-second timeout and terminating stalled processes.

Enhancements:

  • Improve robustness of PDF generation by ensuring subprocesses are explicitly killed and cleaned up on timeout in both cover letter and generic PDF converters.

Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

πŸ‘‹ Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a πŸ‘€ emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai bot commented Mar 29, 2026

Reviewer's Guide

This PR hardens the PDF generation pipeline by disabling LaTeX shell execution and adding execution timeouts with proper subprocess cleanup for both pdflatex and pandoc-based compilation paths in the cover letter generator and generic PDF converter.

Sequence diagram for hardened cover letter PDF compilation

sequenceDiagram
    participant CoverLetterGenerator
    participant pdflatex_process as pdflatex
    participant pandoc_process as pandoc
    participant FileSystem

    CoverLetterGenerator->>pdflatex_process: Popen(pdflatex -interaction=nonstopmode -no-shell-escape)
    alt pdflatex completes within 30s
        CoverLetterGenerator->>pdflatex_process: communicate(timeout=30)
        pdflatex_process-->>CoverLetterGenerator: stdout, stderr, returncode
        CoverLetterGenerator->>FileSystem: check output_path exists
        alt returncode == 0 or file exists
            CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = True
        else pdflatex failed
            CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = False
        end
    else pdflatex times out
        CoverLetterGenerator->>pdflatex_process: kill()
        CoverLetterGenerator->>pdflatex_process: communicate()
        pdflatex_process-->>CoverLetterGenerator: stdout, stderr
        CoverLetterGenerator-->>CoverLetterGenerator: return False
    end

    alt pdf_created is False and pdflatex did not time out fatally
        CoverLetterGenerator->>pandoc_process: Popen(pandoc --pdf-engine=xelatex --pdf-engine-opt=-no-shell-escape)
        alt pandoc completes within 30s
            CoverLetterGenerator->>pandoc_process: communicate(timeout=30)
            pandoc_process-->>CoverLetterGenerator: stdout, stderr, returncode
            CoverLetterGenerator->>FileSystem: check output_path exists
            alt returncode == 0 or file exists
                CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = True
            else pandoc failed
                CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = False
            end
        else pandoc times out
            CoverLetterGenerator->>pandoc_process: kill()
            CoverLetterGenerator->>pandoc_process: communicate()
            pandoc_process-->>CoverLetterGenerator: stdout, stderr
            CoverLetterGenerator-->>CoverLetterGenerator: return False
        end
    end
Loading

Flow diagram for PDF compilation with timeout and no shell escape

flowchart TD
    A[Start PDF compilation] --> B[Run pdflatex with -interaction=nonstopmode -no-shell-escape]
    B --> C{pdflatex completes within 30s?}
    C -->|No| D[Kill pdflatex process]
    D --> E[communicate again to clean up]
    E --> F[Set pdf_created = False]
    F --> G[Return False]

    C -->|Yes| H{pdflatex returncode == 0 or output_path exists?}
    H -->|Yes| I[Set pdf_created = True]
    I --> J[Return True]
    H -->|No| K[Set pdf_created = False]

    K --> L[Run pandoc with --pdf-engine=xelatex and --pdf-engine-opt=-no-shell-escape]
    L --> M{pandoc completes within 30s?}
    M -->|No| N[Kill pandoc process]
    N --> O[communicate again to clean up]
    O --> P[Set pdf_created = False]
    P --> Q[Return False]

    M -->|Yes| R{pandoc returncode == 0 or output_path exists?}
    R -->|Yes| S[Set pdf_created = True]
    S --> T[Return True]
    R -->|No| U[Set pdf_created = False]
    U --> V[Return False]
Loading

File-Level Changes

Change Details Files
Harden pdflatex invocation in cover letter PDF compilation against RCE and hangs.
  • Add -no-shell-escape flag to the pdflatex command used during cover letter PDF compilation to block shell escapes
  • Wrap process.communicate() with a 30-second timeout to prevent unbounded execution time
  • On subprocess.TimeoutExpired, kill the process, drain stdout/stderr with a second communicate() call, and return False to signal failure
  • Preserve existing success condition based on return code or presence of the expected output PDF
cli/generators/cover_letter_generator.py
Secure pandoc-based fallback PDF compilation in cover letter generator.
  • Extend pandoc invocation to pass --pdf-engine-opt=-no-shell-escape to the xelatex engine, disabling shell escapes
  • Add a 30-second timeout to pandoc process.communicate() similar to the pdflatex path
  • On timeout, kill the pandoc process, perform a cleanup communicate(), and return False
  • Leave success detection based on return code or existence of the output PDF unchanged
cli/generators/cover_letter_generator.py
Apply the same pdflatex hardening and timeout behavior to the shared PDF converter.
  • Add -no-shell-escape to the pdflatex command in the shared _compile_pdflatex helper
  • Wrap communicate() with a 30-second timeout and kill/cleanup/return False on TimeoutExpired
  • Retain the existing success check (return code or output file existence)
cli/pdf/converter.py
Apply the same pandoc hardening and timeout behavior to the shared PDF converter.
  • Extend pandoc invocation to include --pdf-engine-opt=-no-shell-escape for the xelatex engine
  • Add a 30-second timeout to communicate() and perform kill/cleanup/False on TimeoutExpired
  • Keep the existing logic that treats either a zero return code or presence of the output PDF as success
cli/pdf/converter.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The timeout/cleanup pattern for subprocess.Popen is duplicated in four places; consider extracting a small helper (e.g., run_with_timeout(cmd, cwd, timeout=30)) to centralize the logic and keep the behavior consistent across all callers.
  • On timeout or non-zero return code you currently discard stdout/stderr; capturing and logging or surfacing these details would make diagnosing LaTeX/pandoc failures significantly easier.
  • Since subprocess.Popen(...).communicate() never raises CalledProcessError, the except (subprocess.CalledProcessError, FileNotFoundError) blocks could be simplified to just handle FileNotFoundError (or switch to subprocess.run(..., check=True) if you want CalledProcessError semantics).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The timeout/cleanup pattern for `subprocess.Popen` is duplicated in four places; consider extracting a small helper (e.g., `run_with_timeout(cmd, cwd, timeout=30)`) to centralize the logic and keep the behavior consistent across all callers.
- On timeout or non-zero return code you currently discard `stdout`/`stderr`; capturing and logging or surfacing these details would make diagnosing LaTeX/pandoc failures significantly easier.
- Since `subprocess.Popen(...).communicate()` never raises `CalledProcessError`, the `except (subprocess.CalledProcessError, FileNotFoundError)` blocks could be simplified to just handle `FileNotFoundError` (or switch to `subprocess.run(..., check=True)` if you want `CalledProcessError` semantics).

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click πŸ‘ or πŸ‘Ž on each comment and I'll use the feedback to improve your reviews.

Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant