
πŸ›‘οΈ Sentinel: [CRITICAL] Fix SSRF in JobParser URL Fetching#231

Open
anchapin wants to merge 1 commit into main from sentinel-fix-job-parser-ssrf-14987003762701019456

Conversation

@anchapin
Owner

@anchapin anchapin commented Apr 5, 2026

🚨 Severity: CRITICAL
πŸ’‘ Vulnerability: Server-Side Request Forgery (SSRF) in the JobParser.parse_from_url method.
🎯 Impact: An attacker could craft an HTTP/HTTPS URL that directs the server to query internal services (e.g. the AWS metadata endpoint http://169.254.169.254 or internal admin endpoints on localhost/127.0.0.1), leaking highly sensitive application or server secrets and potentially leading to full system compromise.
πŸ”§ Fix: Introduced the _fetch_url_safe method. It intercepts each URL fetch, resolves the hostname with socket.getaddrinfo, and runs every resolved address through ipaddress.ip_address(...) checks to verify it does not fall in the is_private, is_loopback, is_link_local, or is_reserved ranges. Additionally, requests.get is called with allow_redirects=False so the same SSRF validation is applied to every intermediate redirect target, up to 5 hops, preventing redirect-based bypasses.
βœ… Verification: Verified by running the local job-parser integration and tracking tests. Simulated fetching 127.0.0.1 locally via temporary test files to confirm the RuntimeError is raised before any connection is opened.


PR created automatically by Jules for task 14987003762701019456 started by @anchapin

Summary by Sourcery

Add SSRF-safe URL fetching to JobParser and document the vulnerability and mitigation in the security notes.

Bug Fixes:

  • Prevent JobParser from fetching URLs that resolve to private, loopback, link-local, or reserved IP addresses, including across HTTP redirects.

Enhancements:

  • Introduce a dedicated _fetch_url_safe helper that enforces URL scheme validation, bounded redirect handling, and stricter error handling for URL fetching in JobParser.

Documentation:

  • Extend sentinel security documentation with details of the JobParser SSRF vulnerability, its root cause, and the applied prevention strategy.

…Parser

The `JobParser`'s URL fetcher could retrieve internal and restricted network IPs because it relied on `requests.get` to follow redirects and perform fetches without first authorizing the resolved host IP.
We implemented `_fetch_url_safe` to validate the URL scheme, resolve the IP via `socket.getaddrinfo`, check for private/loopback targets via `ipaddress`, and follow redirects manually under the same checks, raising errors to protect against SSRF bypasses.

Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
@google-labs-jules
Contributor

πŸ‘‹ Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a πŸ‘€ emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@sourcery-ai

sourcery-ai bot commented Apr 5, 2026

Reviewer's Guide

Adds SSRF-safe URL fetching to JobParser by routing all HTTP requests through a new _fetch_url_safe helper that validates resolved IPs and manually handles redirects, and documents the incident in the Sentinel security journal.

Sequence diagram for JobParser SSRF-safe URL fetching

sequenceDiagram
    actor User
    participant CLI as CLI_Tool
    participant JP as JobParser
    participant Safe as _fetch_url_safe
    participant DNS as socket_getaddrinfo
    participant IP as ipaddress_checks
    participant HTTP as requests

    User->>CLI: invoke job parsing with URL
    CLI->>JP: parse_from_url(url)
    JP->>Safe: _fetch_url_safe(url)

    loop up to max_redirects
        Safe->>Safe: urllib_parse_urlparse(current_url)
        Safe->>DNS: getaddrinfo(hostname)
        DNS-->>Safe: addr_info list or error
        alt DNS resolution fails
            DNS-->>Safe: socket_gaierror
            Safe-->>JP: raise ValueError
            JP-->>CLI: RuntimeError Failed to fetch URL
            CLI-->>User: error message
        else DNS resolution succeeds
            Safe->>IP: validate each resolved ip_address
            alt IP is private or loopback or link_local or reserved
                IP-->>Safe: restricted IP detected
                Safe-->>JP: raise ValueError
                JP-->>CLI: RuntimeError Failed to fetch URL
                CLI-->>User: error message
            else IP is allowed
                Safe->>HTTP: requests_get(current_url, allow_redirects False)
                HTTP-->>Safe: response
                alt response is redirect
                    Safe->>Safe: read Location header
                    alt Location missing
                        Safe-->>JP: raise ValueError
                        JP-->>CLI: RuntimeError Failed to fetch URL
                        CLI-->>User: error message
                    else Location present
                        Safe->>Safe: urllib_parse_urljoin(current_url, Location)
                        Safe->>Safe: update current_url
                    end
                else response not redirect
                    Safe-->>JP: return response
                    JP->>JP: response_raise_for_status()
                    JP->>JP: _parse_html(response_text)
                    JP-->>CLI: JobDetails
                    CLI-->>User: parsed job details
                end
            end
        end
    end

    alt exceeded max_redirects
        Safe-->>JP: raise RuntimeError Too many redirects
        JP-->>CLI: RuntimeError Failed to fetch URL
        CLI-->>User: error message
    end

Class diagram for JobParser with SSRF-safe URL fetching

classDiagram
    class JobParser {
        +parse_from_url(url str) JobDetails
        -_fetch_url_safe(url str) Response
        -_parse_html(html str) JobDetails
    }

    class RequestsLibrary {
        +get(url str, headers dict, timeout int, allow_redirects bool) Response
        +RequestException
    }

    class Response {
        +text str
        +headers dict
        +is_redirect bool
        +raise_for_status() void
    }

    class URLParsing {
        +urlparse(url str) ParsedURL
        +urljoin(base str, location str) str
    }

    class ParsedURL {
        +scheme str
        +hostname str
    }

    class SocketModule {
        +getaddrinfo(hostname str, port int) list
        +gaierror
    }

    class IPAddressModule {
        +ip_address(address str) IPAddress
    }

    class IPAddress {
        +ipv4_mapped IPAddress
        +is_private bool
        +is_loopback bool
        +is_link_local bool
        +is_reserved bool
    }

    JobParser ..> RequestsLibrary : uses
    JobParser ..> URLParsing : uses
    JobParser ..> SocketModule : uses
    JobParser ..> IPAddressModule : uses
    RequestsLibrary o-- Response
    URLParsing o-- ParsedURL
    IPAddressModule o-- IPAddress

File-Level Changes

Change Details Files
Route JobParser URL fetching through a hardened helper that performs SSRF protections and refines error handling.
  • Replace direct requests.get in parse_from_url with a call to self._fetch_url_safe
  • Broaden and reclassify exceptions so ImportError, validation/SSRF errors, and requests-specific failures are surfaced as RuntimeError with clearer messaging while preserving non-requests exceptions
cli/integrations/job_parser.py
Introduce _fetch_url_safe to validate URL schemes, resolve hostnames, block restricted IP ranges, and perform bounded, validated manual redirects.
  • Validate that only http/https schemes are allowed and ensure requests is installed before fetching
  • Resolve hostnames with socket.getaddrinfo and reject addresses that are private, loopback, link-local, reserved, or IPv4-mapped loopback/private IPv6
  • Disable automatic redirects in requests.get and manually follow up to 5 redirects, validating each Location target and failing on missing Location or redirect loops
cli/integrations/job_parser.py
Document the SSRF vulnerability and remediation in the Sentinel security log.
  • Add a new 2025-02-23 Sentinel entry describing the JobParser SSRF issue, its root cause, and the safe URL fetching pattern for prevention
.jules/sentinel.md
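The bounded manual-redirect handling listed above can be sketched like this. It is a hedged illustration with names of my own choosing (the PR's actual helper is `_fetch_url_safe`), and the injectable `get` parameter, defaulting to `requests.get`, is an assumption added for testability:

```python
import urllib.parse


def fetch_following_validated_redirects(url, validate, get=None, max_redirects=5, timeout=10):
    """Fetch `url`, re-validating every redirect target before following it.

    `validate(url)` is assumed to raise on restricted targets (scheme/IP checks).
    """
    if get is None:
        # Deferred import, mirroring the "ensure requests is installed" check.
        import requests
        get = requests.get
    current_url = url
    for _ in range(max_redirects):
        validate(current_url)
        response = get(current_url, allow_redirects=False, timeout=timeout)
        if response.is_redirect:
            location = response.headers.get("Location")
            if not location:
                raise ValueError("Redirect response missing Location header")
            # Resolve relative Location headers against the current URL.
            current_url = urllib.parse.urljoin(current_url, location)
            continue
        return response
    raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")
```

Disabling automatic redirects and re-running validation on each hop is the key design choice: it closes the classic bypass where a public hostname 302-redirects to an internal address.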


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In _fetch_url_safe, consider explicitly validating parsed.hostname (e.g., reject URLs without a hostname such as malformed or relative URLs) before calling socket.getaddrinfo to avoid None being passed into DNS resolution.
  • When checking IP safety, you currently block private/loopback/link-local/reserved; you may also want to explicitly reject multicast and unspecified/broadcast addresses (e.g., using ip.is_multicast, ip.is_unspecified, and ip.is_global checks) to make the IP filtering more comprehensive.
  • The error handling in parse_from_url mixes broad Exception catching with type checks on requests.RequestException; simplifying this into a narrower set of exception branches (e.g., letting _fetch_url_safe convert requests exceptions into your own types) would make the control flow easier to follow and reason about.
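The second point above can be taken further by flipping from a blocklist to an allowlist. A sketch under the assumption that only globally routable, non-multicast unicast targets should ever be fetched (the helper name is hypothetical, not from the PR):

```python
import ipaddress


def is_allowed_target(ip_str):
    """True only for globally routable, non-multicast addresses."""
    ip = ipaddress.ip_address(ip_str)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:10.0.0.1) so IPv4 rules apply.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    # In CPython's ipaddress module, is_global already excludes private,
    # loopback, link-local, unspecified, and reserved ranges; multicast
    # (e.g. 224.0.0.1) can still report is_global, so reject it separately.
    return ip.is_global and not ip.is_multicast
```

An allowlist fails closed: any future special-purpose range that is not explicitly global is rejected by default, rather than slipping past an incomplete blocklist.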
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_fetch_url_safe`, consider explicitly validating `parsed.hostname` (e.g., reject URLs without a hostname such as malformed or relative URLs) before calling `socket.getaddrinfo` to avoid `None` being passed into DNS resolution.
- When checking IP safety, you currently block private/loopback/link-local/reserved; you may also want to explicitly reject multicast and unspecified/broadcast addresses (e.g., using `ip.is_multicast`, `ip.is_unspecified`, and `ip.is_global` checks) to make the IP filtering more comprehensive.
- The error handling in `parse_from_url` mixes broad `Exception` catching with type checks on `requests.RequestException`; simplifying this into a narrower set of exception branches (e.g., letting `_fetch_url_safe` convert `requests` exceptions into your own types) would make the control flow easier to follow and reason about.

## Individual Comments

### Comment 1
<location path="cli/integrations/job_parser.py" line_range="271-280" />
<code_context>
+        for _ in range(max_redirects):
</code_context>
<issue_to_address>
**suggestion:** Include redirect count in the 'too many redirects' RuntimeError for better diagnostics.

Currently all redirect overflows raise the same generic message. Consider including `max_redirects` (and optionally the last `current_url`) in the `RuntimeError` to aid debugging, e.g. `raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")`. This improves diagnostics without changing behavior.

Suggested implementation:

```python
        for _ in range(max_redirects):
            parsed = urllib.parse.urlparse(current_url)
            if parsed.scheme not in ("http", "https"):
                raise ValueError(f"Invalid URL scheme: {parsed.scheme}")

            try:
                addr_info = socket.getaddrinfo(parsed.hostname, None)
            except socket.gaierror:
                raise ValueError(f"Could not resolve hostname: {parsed.hostname}")

            for addr in addr_info:

```

```python
        raise RuntimeError(f"Too many redirects (>{max_redirects}) while fetching {current_url}")

```

I assumed there is an existing `raise RuntimeError("Too many redirects")` after the redirect loop. If the exact message or placement differs, update the `SEARCH` block accordingly to match the current `RuntimeError` line that is raised when the redirect loop exits without returning a result, and replace it with the formatted f-string version shown above.
</issue_to_address>

