5 changes: 5 additions & 0 deletions .jules/sentinel.md
@@ -7,3 +7,8 @@
**Vulnerability:** The `CoverLetterGenerator` used a standard Jinja2 environment (intended for HTML/XML or plain text) to render LaTeX templates. This allowed malicious user input (or AI hallucinations) containing LaTeX control characters (e.g., `\input{...}`) to be injected directly into the LaTeX source, leading to potential Local File Inclusion (LFI) or other exploits.
**Learning:** Jinja2's `autoescape` can select an escaping strategy by file extension, but it targets HTML/XML only; it does NOT escape LaTeX special characters. Relying on manual filters (like `| latex_escape`) in templates is error-prone and brittle, since forgetting the filter on a single variable is enough to reopen the injection.
**Prevention:** Always use a dedicated Jinja2 environment for LaTeX generation that enforces auto-escaping via a `finalize` hook (e.g., `tex_env.finalize = latex_escape`). This ensures *all* variable output is sanitized by default, providing defense-in-depth even if the template author forgets explicit filters.
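A minimal sketch of that pattern, assuming Jinja2 >= 3.0 (where `finalize` is applied to expression output but not to literal template text); the helper name `latex_escape` and the replacement table are illustrative, not necessarily the repo's actual code:

```python
import jinja2

# LaTeX-significant characters and their escaped forms. Backslash is
# parked in a sentinel first so the braces introduced by later
# replacements are not themselves re-escaped.
_LATEX_REPLACEMENTS = [
    ("&", r"\&"), ("%", r"\%"), ("$", r"\$"), ("#", r"\#"),
    ("_", r"\_"), ("{", r"\{"), ("}", r"\}"),
    ("~", r"\textasciitilde{}"), ("^", r"\textasciicircum{}"),
]

def latex_escape(value):
    s = "" if value is None else str(value)
    s = s.replace("\\", "\x00")  # park backslashes behind a sentinel
    for char, escaped in _LATEX_REPLACEMENTS:
        s = s.replace(char, escaped)
    return s.replace("\x00", r"\textbackslash{}")

# Dedicated environment: finalize runs on every {{ ... }} expression,
# so output is escaped even when the template author forgets a filter.
tex_env = jinja2.Environment(finalize=latex_escape)

template = tex_env.from_string(r"Dear {{ name }},")
rendered = template.render(name=r"Smith & Sons \input{/etc/passwd}")
```

Real LaTeX templates usually also swap Jinja's delimiters (e.g. `<< ... >>`) so template syntax does not collide with LaTeX braces; that is orthogonal to the escaping hook shown here.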

## 2025-02-23 - [Critical] Server-Side Request Forgery in Job Parsing
**Vulnerability:** The `JobParser` fetched user-provided job posting URLs without validating the resolved IP address, allowing an attacker to make the server perform HTTP requests to internal, loopback, or private network addresses (e.g., `http://169.254.169.254`).
**Learning:** Checking the URL string is not sufficient because DNS resolution can point an external-looking hostname to an internal IP address (DNS rebinding / SSRF). Also, following redirects transparently via `requests` bypasses initial URL checks if the redirect points to an internal address.
**Prevention:** Implement safe URL fetching by manually resolving hostnames to IP addresses using `socket.getaddrinfo`, and asserting that they are not restricted IPs using the `ipaddress` module (`ip.is_private`, `ip.is_loopback`, etc.). Turn off automatic redirects (`allow_redirects=False`) and manually validate the new `Location` target for each redirect step.
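The resolution check described above can be sketched as a standalone helper (the function name is illustrative; the PR below implements the same logic inline in `_fetch_url_safe`):

```python
import ipaddress
import socket

def resolves_to_restricted_ip(hostname: str) -> bool:
    """Resolve a hostname and report whether any resulting address is
    private, loopback, link-local, or otherwise reserved."""
    for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
        ip = ipaddress.ip_address(sockaddr[0])
        # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before checking
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return True
    return False
```

Because DNS can change between this check and the actual request (a TOCTOU gap), the PR also disables automatic redirects and re-validates every hop; closing the gap entirely would additionally require pinning the connection to the validated IP.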
71 changes: 66 additions & 5 deletions cli/integrations/job_parser.py
@@ -15,8 +15,11 @@
"""

import hashlib
import ipaddress
import json
import re
import socket
import urllib.parse
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
@@ -221,9 +224,7 @@ def parse_from_url(self, url: str) -> JobDetails:

# Fetch and parse
try:

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
response = requests.get(url, headers=headers, timeout=30)
response = self._fetch_url_safe(url)
response.raise_for_status()

job_details = self._parse_html(response.text)
@@ -234,12 +235,72 @@ def parse_from_url(self, url: str) -> JobDetails:

return job_details

except ImportError:
except ImportError as e:
raise NotImplementedError(
"URL fetching requires 'requests' library. Install with: pip install requests"
) from e
except requests.RequestException as e:
except (RuntimeError, ValueError) as e:
raise RuntimeError(f"Failed to fetch URL: {e}")
except Exception as e:
if requests is not None and isinstance(e, requests.RequestException):
raise RuntimeError(f"Failed to fetch URL: {e}")
raise

def _fetch_url_safe(self, url: str):
"""
Safely fetch a URL with SSRF protection.

Args:
url: URL to fetch

Returns:
requests.Response object

Raises:
ImportError: If requests is not installed
ValueError: If URL is invalid or unsafe
RuntimeError: If too many redirects
"""
if requests is None:
raise ImportError("requests library is required for URL fetching")

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
current_url = url
max_redirects = 5

for _ in range(max_redirects):
parsed = urllib.parse.urlparse(current_url)
if parsed.scheme not in ("http", "https"):
raise ValueError(f"Invalid URL scheme: {parsed.scheme}")

try:
addr_info = socket.getaddrinfo(parsed.hostname, None)
except socket.gaierror:
raise ValueError(f"Could not resolve hostname: {parsed.hostname}")

Comment on lines +271 to +280
suggestion: Include the redirect count in the "Too many redirects" `RuntimeError` for better diagnostics.

Currently all redirect overflows raise the same generic message. Consider including `max_redirects` (and optionally the last `current_url`) in the `RuntimeError` to aid debugging, e.g. `raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")`. This improves diagnostics without changing behavior.

Suggested implementation:

        for _ in range(max_redirects):
            parsed = urllib.parse.urlparse(current_url)
            if parsed.scheme not in ("http", "https"):
                raise ValueError(f"Invalid URL scheme: {parsed.scheme}")

            try:
                addr_info = socket.getaddrinfo(parsed.hostname, None)
            except socket.gaierror:
                raise ValueError(f"Could not resolve hostname: {parsed.hostname}")

            for addr in addr_info:
                ...

        raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")

I assumed there is an existing raise RuntimeError("Too many redirects") after the redirect loop. If the exact message or placement differs, update the SEARCH block accordingly to match the current RuntimeError line that is raised when the redirect loop exits without returning a result, and replace it with the formatted f-string version shown above.

for addr in addr_info:
ip_str = addr[4][0]
ip = ipaddress.ip_address(ip_str)
# Check for IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1)
if hasattr(ip, 'ipv4_mapped') and ip.ipv4_mapped:
ip = ip.ipv4_mapped

if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise ValueError(f"URL resolves to restricted IP: {ip}")

response = requests.get(
current_url, headers=headers, timeout=30, allow_redirects=False
)

if response.is_redirect:
location = response.headers.get("Location")
if not location:
raise ValueError("Redirect without Location header")
current_url = urllib.parse.urljoin(current_url, location)
else:
return response

raise RuntimeError("Too many redirects")

def _parse_html(self, html: str) -> JobDetails:
"""