π‘οΈ Sentinel: [CRITICAL] Fix Remote Code Execution vulnerability in PDF generation#217
π‘οΈ Sentinel: [CRITICAL] Fix Remote Code Execution vulnerability in PDF generation#217
Conversation
Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
|
π Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a π emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideThis PR hardens the PDF generation pipeline by disabling LaTeX shell execution and adding execution timeouts with proper subprocess cleanup for both pdflatex and pandoc-based compilation paths in the cover letter generator and generic PDF converter. Sequence diagram for hardened cover letter PDF compilationsequenceDiagram
participant CoverLetterGenerator
participant pdflatex_process as pdflatex
participant pandoc_process as pandoc
participant FileSystem
CoverLetterGenerator->>pdflatex_process: Popen(pdflatex -interaction=nonstopmode -no-shell-escape)
alt pdflatex completes within 30s
CoverLetterGenerator->>pdflatex_process: communicate(timeout=30)
pdflatex_process-->>CoverLetterGenerator: stdout, stderr, returncode
CoverLetterGenerator->>FileSystem: check output_path exists
alt returncode == 0 or file exists
CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = True
else pdflatex failed
CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = False
end
else pdflatex times out
CoverLetterGenerator->>pdflatex_process: kill()
CoverLetterGenerator->>pdflatex_process: communicate()
pdflatex_process-->>CoverLetterGenerator: stdout, stderr
CoverLetterGenerator-->>CoverLetterGenerator: return False
end
alt pdf_created is False and pdflatex did not time out fatally
CoverLetterGenerator->>pandoc_process: Popen(pandoc --pdf-engine=xelatex --pdf-engine-opt=-no-shell-escape)
alt pandoc completes within 30s
CoverLetterGenerator->>pandoc_process: communicate(timeout=30)
pandoc_process-->>CoverLetterGenerator: stdout, stderr, returncode
CoverLetterGenerator->>FileSystem: check output_path exists
alt returncode == 0 or file exists
CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = True
else pandoc failed
CoverLetterGenerator-->>CoverLetterGenerator: pdf_created = False
end
else pandoc times out
CoverLetterGenerator->>pandoc_process: kill()
CoverLetterGenerator->>pandoc_process: communicate()
pandoc_process-->>CoverLetterGenerator: stdout, stderr
CoverLetterGenerator-->>CoverLetterGenerator: return False
end
end
Flow diagram for PDF compilation with timeout and no shell escapeflowchart TD
A[Start PDF compilation] --> B[Run pdflatex with -interaction=nonstopmode -no-shell-escape]
B --> C{pdflatex completes within 30s?}
C -->|No| D[Kill pdflatex process]
D --> E[communicate again to clean up]
E --> F[Set pdf_created = False]
F --> G[Return False]
C -->|Yes| H{pdflatex returncode == 0 or output_path exists?}
H -->|Yes| I[Set pdf_created = True]
I --> J[Return True]
H -->|No| K[Set pdf_created = False]
K --> L[Run pandoc with --pdf-engine=xelatex and --pdf-engine-opt=-no-shell-escape]
L --> M{pandoc completes within 30s?}
M -->|No| N[Kill pandoc process]
N --> O[communicate again to clean up]
O --> P[Set pdf_created = False]
P --> Q[Return False]
M -->|Yes| R{pandoc returncode == 0 or output_path exists?}
R -->|Yes| S[Set pdf_created = True]
S --> T[Return True]
R -->|No| U[Set pdf_created = False]
U --> V[Return False]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The timeout/cleanup pattern for
subprocess.Popenis duplicated in four places; consider extracting a small helper (e.g.,run_with_timeout(cmd, cwd, timeout=30)) to centralize the logic and keep the behavior consistent across all callers. - On timeout or non-zero return code you currently discard
stdout/stderr; capturing and logging or surfacing these details would make diagnosing LaTeX/pandoc failures significantly easier. - Since
subprocess.Popen(...).communicate()never raisesCalledProcessError, theexcept (subprocess.CalledProcessError, FileNotFoundError)blocks could be simplified to just handleFileNotFoundError(or switch tosubprocess.run(..., check=True)if you wantCalledProcessErrorsemantics).
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The timeout/cleanup pattern for `subprocess.Popen` is duplicated in four places; consider extracting a small helper (e.g., `run_with_timeout(cmd, cwd, timeout=30)`) to centralize the logic and keep the behavior consistent across all callers.
- On timeout or non-zero return code you currently discard `stdout`/`stderr`; capturing and logging or surfacing these details would make diagnosing LaTeX/pandoc failures significantly easier.
- Since `subprocess.Popen(...).communicate()` never raises `CalledProcessError`, the `except (subprocess.CalledProcessError, FileNotFoundError)` blocks could be simplified to just handle `FileNotFoundError` (or switch to `subprocess.run(..., check=True)` if you want `CalledProcessError` semantics).Help me be more useful! Please click π or π on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
π¨ Severity: CRITICAL
π‘ Vulnerability: Remote Code Execution (RCE) via LaTeX compilation. Unfiltered execution of
pdflatexandpandocallows shell commands if input is maliciously crafted. There was also a potential Denial of Service (DoS) due to unbounded execution time of the compilation process.π― Impact: An attacker could craft a malicious LaTeX document (e.g. via an imported resume YAML or template) that executes arbitrary shell commands on the server during PDF generation using
\write18. Additionally, compilation without a timeout could lead to infinite loops blocking worker threads.π§ Fix: Added the
-no-shell-escapeflag to thepdflatexcommand and--pdf-engine-opt=-no-shell-escapeto thepandoccommand incli/pdf/converter.pyandcli/generators/cover_letter_generator.py. Also added a 30-second timeout on thesubprocess.communicate()calls and implemented proper process cleanup (catchingsubprocess.TimeoutExpired, callingprocess.kill(), and callingprocess.communicate()again) to prevent zombie processes.β Verification: Verified by executing the full
pytestsuite locally and ensuring that PDF compilation functionality continues to work without regression. Reviewed the security test module to confirmtest_pdflatex_timeoutandtest_pdflatex_argumentsbehavior aligns with the implemented fixes.PR created automatically by Jules for task 4206741460883636495 started by @anchapin
Summary by Sourcery
Harden PDF generation against remote code execution and unbounded LaTeX runs by restricting LaTeX engine capabilities and enforcing timeouts on compilation subprocesses.
Bug Fixes:
Enhancements: