🛡️ Sentinel: [CRITICAL] Fix SSRF in JobParser URL Fetching #231
Conversation
The `JobParser`'s URL fetcher allowed retrieving internal and restricted network IPs because it relied on `requests.get` to follow redirects and perform fetches without prior host IP authorization. We implemented `_fetch_url_safe` to securely parse the URL scheme, resolve the IP via `socket.getaddrinfo`, check whether it targets private or loopback address space via `ipaddress`, and safely track redirects, raising errors to protect against SSRF bypasses. Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Reviewer's Guide

Adds SSRF-safe URL fetching to JobParser by routing all HTTP requests through a new `_fetch_url_safe` helper that validates resolved IPs and manually handles redirects, and documents the incident in the Sentinel security journal.

Sequence diagram for JobParser SSRF-safe URL fetching

sequenceDiagram
actor User
participant CLI as CLI_Tool
participant JP as JobParser
participant Safe as _fetch_url_safe
participant DNS as socket_getaddrinfo
participant IP as ipaddress_checks
participant HTTP as requests
User->>CLI: invoke job parsing with URL
CLI->>JP: parse_from_url(url)
JP->>Safe: _fetch_url_safe(url)
loop up to max_redirects
Safe->>Safe: urllib_parse_urlparse(current_url)
Safe->>DNS: getaddrinfo(hostname)
DNS-->>Safe: addr_info list or error
alt DNS resolution fails
DNS-->>Safe: socket_gaierror
Safe-->>JP: raise ValueError
JP-->>CLI: RuntimeError Failed to fetch URL
CLI-->>User: error message
else DNS resolution succeeds
Safe->>IP: validate each resolved ip_address
alt IP is private or loopback or link_local or reserved
IP-->>Safe: restricted IP detected
Safe-->>JP: raise ValueError
JP-->>CLI: RuntimeError Failed to fetch URL
CLI-->>User: error message
else IP is allowed
Safe->>HTTP: requests_get(current_url, allow_redirects False)
HTTP-->>Safe: response
alt response is redirect
Safe->>Safe: read Location header
alt Location missing
Safe-->>JP: raise ValueError
JP-->>CLI: RuntimeError Failed to fetch URL
CLI-->>User: error message
else Location present
Safe->>Safe: urllib_parse_urljoin(current_url, Location)
Safe->>Safe: update current_url
end
else response not redirect
Safe-->>JP: return response
JP->>JP: response_raise_for_status()
JP->>JP: _parse_html(response_text)
JP-->>CLI: JobDetails
CLI-->>User: parsed job details
end
end
end
end
alt exceeded max_redirects
Safe-->>JP: raise RuntimeError Too many redirects
JP-->>CLI: RuntimeError Failed to fetch URL
CLI-->>User: error message
end
Class diagram for JobParser with SSRF-safe URL fetching

classDiagram
class JobParser {
+parse_from_url(url str) JobDetails
-_fetch_url_safe(url str) Response
-_parse_html(html str) JobDetails
}
class RequestsLibrary {
+get(url str, headers dict, timeout int, allow_redirects bool) Response
+RequestException
}
class Response {
+text str
+headers dict
+is_redirect bool
+raise_for_status() void
}
class URLParsing {
+urlparse(url str) ParsedURL
+urljoin(base str, location str) str
}
class ParsedURL {
+scheme str
+hostname str
}
class SocketModule {
+getaddrinfo(hostname str, port int) list
+gaierror
}
class IPAddressModule {
+ip_address(address str) IPAddress
}
class IPAddress {
+ipv4_mapped IPAddress
+is_private bool
+is_loopback bool
+is_link_local bool
+is_reserved bool
}
JobParser ..> RequestsLibrary : uses
JobParser ..> URLParsing : uses
JobParser ..> SocketModule : uses
JobParser ..> IPAddressModule : uses
RequestsLibrary o-- Response
URLParsing o-- ParsedURL
IPAddressModule o-- IPAddress
Hey - I've found 1 issue, and left some high-level feedback:

- In `_fetch_url_safe`, consider explicitly validating `parsed.hostname` (e.g., reject URLs without a hostname such as malformed or relative URLs) before calling `socket.getaddrinfo` to avoid `None` being passed into DNS resolution.
- When checking IP safety, you currently block private/loopback/link-local/reserved; you may also want to explicitly reject multicast and unspecified/broadcast addresses (e.g., using `ip.is_multicast`, `ip.is_unspecified`, and `ip.is_global` checks) to make the IP filtering more comprehensive.
- The error handling in `parse_from_url` mixes broad `Exception` catching with type checks on `requests.RequestException`; simplifying this into a narrower set of exception branches (e.g., letting `_fetch_url_safe` convert `requests` exceptions into your own types) would make the control flow easier to follow and reason about.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_fetch_url_safe`, consider explicitly validating `parsed.hostname` (e.g., reject URLs without a hostname such as malformed or relative URLs) before calling `socket.getaddrinfo` to avoid `None` being passed into DNS resolution.
- When checking IP safety, you currently block private/loopback/link-local/reserved; you may also want to explicitly reject multicast and unspecified/broadcast addresses (e.g., using `ip.is_multicast`, `ip.is_unspecified`, and `ip.is_global` checks) to make the IP filtering more comprehensive.
- The error handling in `parse_from_url` mixes broad `Exception` catching with type checks on `requests.RequestException`; simplifying this into a narrower set of exception branches (e.g., letting `_fetch_url_safe` convert `requests` exceptions into your own types) would make the control flow easier to follow and reason about.
## Individual Comments
### Comment 1
<location path="cli/integrations/job_parser.py" line_range="271-280" />
<code_context>
+ for _ in range(max_redirects):
</code_context>
<issue_to_address>
**suggestion:** Include redirect count in the 'too many redirects' RuntimeError for better diagnostics.
Currently all redirect overflows raise the same generic message. Consider including `max_redirects` (and optionally the last `current_url`) in the `RuntimeError` to aid debugging, e.g. `raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")`. This improves diagnostics without changing behavior.
Suggested implementation:
```python
for _ in range(max_redirects):
parsed = urllib.parse.urlparse(current_url)
if parsed.scheme not in ("http", "https"):
raise ValueError(f"Invalid URL scheme: {parsed.scheme}")
try:
addr_info = socket.getaddrinfo(parsed.hostname, None)
except socket.gaierror:
raise ValueError(f"Could not resolve hostname: {parsed.hostname}")
for addr in addr_info:
```
```python
raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")
```
I assumed there is an existing `raise RuntimeError("Too many redirects")` after the redirect loop. If the exact message or placement differs, update the `SEARCH` block accordingly to match the current `RuntimeError` line that is raised when the redirect loop exits without returning a result, and replace it with the formatted f-string version shown above.
</issue_to_address>

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
    for _ in range(max_redirects):
        parsed = urllib.parse.urlparse(current_url)
        if parsed.scheme not in ("http", "https"):
            raise ValueError(f"Invalid URL scheme: {parsed.scheme}")

        try:
            addr_info = socket.getaddrinfo(parsed.hostname, None)
        except socket.gaierror:
            raise ValueError(f"Could not resolve hostname: {parsed.hostname}")
suggestion: Include redirect count in the 'too many redirects' RuntimeError for better diagnostics.

Currently all redirect overflows raise the same generic message. Consider including `max_redirects` (and optionally the last `current_url`) in the `RuntimeError` to aid debugging, e.g. `raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")`. This improves diagnostics without changing behavior.

Suggested implementation:

    for _ in range(max_redirects):
        parsed = urllib.parse.urlparse(current_url)
        if parsed.scheme not in ("http", "https"):
            raise ValueError(f"Invalid URL scheme: {parsed.scheme}")
        try:
            addr_info = socket.getaddrinfo(parsed.hostname, None)
        except socket.gaierror:
            raise ValueError(f"Could not resolve hostname: {parsed.hostname}")
        for addr in addr_info:

    raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")

I assumed there is an existing `raise RuntimeError("Too many redirects")` after the redirect loop. If the exact message or placement differs, update the SEARCH block accordingly to match the current `RuntimeError` line that is raised when the redirect loop exits without returning a result, and replace it with the formatted f-string version shown above.
🚨 Severity: CRITICAL

🛡️ Vulnerability: Server-Side Request Forgery (SSRF) in the `JobParser.parse_from_url` method.

🎯 Impact: An attacker could craft an HTTP/HTTPS URL that directs the server to query internal services (e.g. the AWS metadata endpoint `http://169.254.169.254` or internal admin endpoints at `localhost`/`127.0.0.1`), leaking potentially highly sensitive application or server secrets, leading to system compromise.

🔧 Fix: Introduced the `_fetch_url_safe` method. We intercept URL calls, resolve the hostname using `socket.getaddrinfo`, and run it through `ipaddress.ip_address(...)` checks to verify it does not fall under `is_private`, `is_loopback`, `is_link_local`, or `is_reserved` spaces. Additionally, we enforce `allow_redirects=False` in `requests.get` to perform the same SSRF validations on any intermediate redirect paths up to 5 hops, preventing redirect bypasses.

✅ Verification: Verified by passing local job-parser integration and tracking tests. Simulated fetching `127.0.0.1` locally via temporary test files to ensure the `RuntimeError` was correctly trapped before any connection was opened.

PR created automatically by Jules for task 14987003762701019456 started by @anchapin
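The verification idea can be approximated standalone. The `is_restricted` helper below is hypothetical; it simply applies the same `ipaddress` predicates the fix relies on, confirming they flag the SSRF targets named in the impact section:

```python
import ipaddress


def is_restricted(ip_text):
    """Mirror the fix's predicate set for a single resolved address."""
    ip = ipaddress.ip_address(ip_text)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


# The AWS metadata endpoint is link-local, so it is caught by is_link_local,
# while loopback and RFC 1918 addresses are caught by is_loopback/is_private.
```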
Summary by Sourcery
Add SSRF-safe URL fetching to JobParser and document the vulnerability and mitigation in the security notes.