5 changes: 5 additions & 0 deletions .jules/sentinel.md
@@ -7,3 +7,8 @@
**Vulnerability:** The `CoverLetterGenerator` used a standard Jinja2 environment (intended for HTML/XML or plain text) to render LaTeX templates. This allowed malicious user input (or AI hallucinations) containing LaTeX control characters (e.g., `\input{...}`) to be injected directly into the LaTeX source, leading to potential Local File Inclusion (LFI) or other exploits.
**Learning:** Jinja2's `autoescape` can select an escaping strategy by file extension, but it targets HTML/XML only; it does NOT escape LaTeX special characters. Relying on manual filters (like `| latex_escape`) in templates is error-prone and brittle, since forgetting the filter on a single variable is enough to reopen the injection.
**Prevention:** Always use a dedicated Jinja2 environment for LaTeX generation that enforces auto-escaping via a `finalize` hook (e.g., `tex_env.finalize = latex_escape`). This ensures *all* variable output is sanitized by default, providing defense-in-depth even if the template author forgets explicit filters.
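A minimal sketch of that pattern, assuming Jinja2 >= 3.0 (where `finalize` is applied to expression output but not to literal template text); the helper name `latex_escape` and the replacement table are illustrative, not necessarily the repo's actual code:

```python
import jinja2

# LaTeX-significant characters and their escaped forms. Backslash is
# parked in a sentinel first so the braces introduced by later
# replacements are not themselves re-escaped.
_LATEX_REPLACEMENTS = [
    ("&", r"\&"), ("%", r"\%"), ("$", r"\$"), ("#", r"\#"),
    ("_", r"\_"), ("{", r"\{"), ("}", r"\}"),
    ("~", r"\textasciitilde{}"), ("^", r"\textasciicircum{}"),
]

def latex_escape(value):
    s = "" if value is None else str(value)
    s = s.replace("\\", "\x00")  # park backslashes behind a sentinel
    for char, escaped in _LATEX_REPLACEMENTS:
        s = s.replace(char, escaped)
    return s.replace("\x00", r"\textbackslash{}")

# Dedicated environment: finalize runs on every {{ ... }} expression,
# so output is escaped even when the template author forgets a filter.
tex_env = jinja2.Environment(finalize=latex_escape)

template = tex_env.from_string(r"Dear {{ name }},")
rendered = template.render(name=r"Smith & Sons \input{/etc/passwd}")
```

Real LaTeX templates usually also swap Jinja's delimiters (e.g. `<< ... >>`) so template syntax does not collide with LaTeX braces; that is orthogonal to the escaping hook shown here.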

## 2025-02-23 - [Critical] Server-Side Request Forgery in Job Parsing
**Vulnerability:** The `JobParser` fetched user-provided job posting URLs without validating the resolved IP address, allowing an attacker to make the server perform HTTP requests to internal, loopback, or private network addresses (e.g., `http://169.254.169.254`).
**Learning:** Checking the URL string is not sufficient because DNS resolution can point an external-looking hostname to an internal IP address (DNS rebinding / SSRF). Also, following redirects transparently via `requests` bypasses initial URL checks if the redirect points to an internal address.
**Prevention:** Implement safe URL fetching by manually resolving hostnames to IP addresses using `socket.getaddrinfo`, and asserting that they are not restricted IPs using the `ipaddress` module (`ip.is_private`, `ip.is_loopback`, etc.). Turn off automatic redirects (`allow_redirects=False`) and manually validate the new `Location` target for each redirect step.
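The resolution check described above can be sketched as a standalone helper (the function name is illustrative; the PR below implements the same logic inline in `_fetch_url_safe`):

```python
import ipaddress
import socket

def resolves_to_restricted_ip(hostname: str) -> bool:
    """Resolve a hostname and report whether any resulting address is
    private, loopback, link-local, or otherwise reserved."""
    for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
        ip = ipaddress.ip_address(sockaddr[0])
        # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before checking
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return True
    return False
```

Because DNS can change between this check and the actual request (a TOCTOU gap), the PR also disables automatic redirects and re-validates every hop; closing the gap entirely would additionally require pinning the connection to the validated IP.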
71 changes: 66 additions & 5 deletions cli/integrations/job_parser.py
@@ -15,8 +15,11 @@
"""

import hashlib
import ipaddress
import json
import re
import socket
import urllib.parse
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
@@ -221,9 +224,7 @@ def parse_from_url(self, url: str) -> JobDetails:

# Fetch and parse
try:

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
response = requests.get(url, headers=headers, timeout=30)
response = self._fetch_url_safe(url)
response.raise_for_status()

job_details = self._parse_html(response.text)
@@ -234,12 +235,72 @@ def parse_from_url(self, url: str) -> JobDetails:

return job_details

except ImportError:
except ImportError as e:
raise NotImplementedError(
"URL fetching requires 'requests' library. Install with: pip install requests"
) from e
except requests.RequestException as e:
except (RuntimeError, ValueError) as e:
raise RuntimeError(f"Failed to fetch URL: {e}")
except Exception as e:
if requests is not None and isinstance(e, requests.RequestException):
raise RuntimeError(f"Failed to fetch URL: {e}")
raise

def _fetch_url_safe(self, url: str):
"""
Safely fetch a URL with SSRF protection.

Args:
url: URL to fetch

Returns:
requests.Response object

Raises:
ImportError: If requests is not installed
ValueError: If URL is invalid or unsafe
RuntimeError: If too many redirects
"""
if requests is None:
raise ImportError("requests library is required for URL fetching")

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
current_url = url
max_redirects = 5

for _ in range(max_redirects):
parsed = urllib.parse.urlparse(current_url)
if parsed.scheme not in ("http", "https"):
raise ValueError(f"Invalid URL scheme: {parsed.scheme}")

try:
addr_info = socket.getaddrinfo(parsed.hostname, None)
except socket.gaierror:
raise ValueError(f"Could not resolve hostname: {parsed.hostname}")

Comment on lines +271 to +280
suggestion: Include the redirect count in the "Too many redirects" `RuntimeError` for better diagnostics.

Currently all redirect overflows raise the same generic message. Consider including `max_redirects` (and optionally the last `current_url`) in the `RuntimeError` to aid debugging, e.g. `raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")`. This improves diagnostics without changing behavior.

Suggested implementation:

        for _ in range(max_redirects):
            parsed = urllib.parse.urlparse(current_url)
            if parsed.scheme not in ("http", "https"):
                raise ValueError(f"Invalid URL scheme: {parsed.scheme}")

            try:
                addr_info = socket.getaddrinfo(parsed.hostname, None)
            except socket.gaierror:
                raise ValueError(f"Could not resolve hostname: {parsed.hostname}")

            for addr in addr_info:
                ...

        raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")

I assumed there is an existing raise RuntimeError("Too many redirects") after the redirect loop. If the exact message or placement differs, update the SEARCH block accordingly to match the current RuntimeError line that is raised when the redirect loop exits without returning a result, and replace it with the formatted f-string version shown above.

for addr in addr_info:
ip_str = addr[4][0]
ip = ipaddress.ip_address(ip_str)
# Check for IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1)
if hasattr(ip, 'ipv4_mapped') and ip.ipv4_mapped:
ip = ip.ipv4_mapped

if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise ValueError(f"URL resolves to restricted IP: {ip}")

response = requests.get(
current_url, headers=headers, timeout=30, allow_redirects=False
)

if response.is_redirect:
location = response.headers.get("Location")
if not location:
raise ValueError("Redirect without Location header")
current_url = urllib.parse.urljoin(current_url, location)
else:
return response

raise RuntimeError("Too many redirects")

def _parse_html(self, html: str) -> JobDetails:
"""