🛡️ Sentinel: Fix SSRF via DNS resolution in URL validation#72
🛡️ Sentinel: Fix SSRF via DNS resolution in URL validation#72google-labs-jules[bot] wants to merge 11 commits intomainfrom
Conversation
Modified `validate_folder_url` in `main.py` to resolve domain names using `socket.getaddrinfo`. This prevents SSRF attacks where a malicious domain resolves to a private IP address (e.g., localhost). The check now verifies that all resolved IPs are not private or loopback addresses before allowing the URL. Fail-closed behavior is implemented for DNS resolution failures. Security Journal updated with this finding.
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Pull request overview
This PR fixes an SSRF vulnerability in URL validation by adding DNS resolution checks to prevent attackers from using domain names that resolve to private IP addresses to bypass security controls.
Key Changes:
- Added DNS resolution using
socket.getaddrinfo()to validate that domain names don't resolve to private/loopback IPs - Updated exception handling to treat DNS resolution failures as unsafe (fail-closed approach)
- Documented the vulnerability and prevention approach in the Sentinel security log
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| main.py | Added socket import and DNS resolution logic in validate_folder_url to check if domains resolve to private/loopback IPs |
| .jules/sentinel.md | Documented the SSRF via DNS resolution vulnerability, learning, and prevention measures |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
main.py
Outdated
| if resolved_ip.is_private or resolved_ip.is_loopback: | ||
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") | ||
| return False | ||
| except socket.gaierror: |
There was a problem hiding this comment.
The exception handling only catches socket.gaierror, but socket.getaddrinfo() can also raise other exceptions such as OSError (for network issues) or socket.timeout. While these would be caught by the outer exception handler at line 227, they would result in a generic error message rather than being treated as DNS resolution failures. Consider catching (socket.gaierror, OSError) for more comprehensive error handling.
| except socket.gaierror: | |
| except (socket.gaierror, OSError): |
main.py
Outdated
| for family, kind, proto, canonname, sockaddr in addr_infos: | ||
| ip_str = sockaddr[0] | ||
| resolved_ip = ipaddress.ip_address(ip_str) | ||
| if resolved_ip.is_private or resolved_ip.is_loopback: |
There was a problem hiding this comment.
The security check is incomplete. In addition to is_private and is_loopback, you should also check for other dangerous IP ranges:
is_link_local: Covers IPv6 link-local addresses (fe80::/10) and IPv4 link-local (169.254.0.0/16)is_reserved: Includes reserved IP ranges that shouldn't be accessedis_multicast: Multicast addresses could potentially be abused
Consider updating the condition to: if resolved_ip.is_private or resolved_ip.is_loopback or resolved_ip.is_link_local or resolved_ip.is_reserved or resolved_ip.is_multicast:
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: | ||
| ip_str = sockaddr[0] | ||
| resolved_ip = ipaddress.ip_address(ip_str) | ||
| if resolved_ip.is_private or resolved_ip.is_loopback: | ||
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") | ||
| return False |
There was a problem hiding this comment.
This implementation has a Time-Of-Check-Time-Of-Use (TOCTOU) vulnerability. DNS resolution happens during validation, but the actual HTTP request occurs later. An attacker could exploit DNS rebinding by having their DNS server return a safe IP during validation and then a private IP when the actual request is made moments later. While this is a known limitation of DNS-based SSRF protection, consider documenting this limitation and potentially adding a note about DNS caching or re-validation closer to the actual request time if feasible.
main.py
Outdated
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: | ||
| ip_str = sockaddr[0] | ||
| resolved_ip = ipaddress.ip_address(ip_str) |
There was a problem hiding this comment.
For IPv6 addresses with zone IDs (e.g., fe80::1%eth0), the sockaddr[0] extraction might include the zone identifier, which could cause ipaddress.ip_address() to raise a ValueError. While this would be caught by the outer exception handler, it would result in an unclear error message. Consider stripping the zone identifier if present by using ip_str.split('%')[0] before passing to ipaddress.ip_address().
| resolved_ip = ipaddress.ip_address(ip_str) | |
| # Strip IPv6 zone identifier (e.g., "%eth0") if present | |
| ip_no_zone = ip_str.split('%', 1)[0] | |
| resolved_ip = ipaddress.ip_address(ip_no_zone) |
|
Hi @copilot, please open a new pull request to apply changes based on the comments in this thread. Thank you! |
|
@abhimehro I've opened a new pull request, #85, to work on those changes. Once the pull request is ready, I'll request review from you. |
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check warning
Code scanning / Prospector (reported by Codacy)
Unused variable 'family' (unused-variable)
main.py
Outdated
| ip_str = sockaddr[0] | ||
| resolved_ip = ipaddress.ip_address(ip_str) | ||
| if resolved_ip.is_private or resolved_ip.is_loopback: | ||
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") |
Check warning
Code scanning / Prospector (reported by Codacy)
Use lazy % formatting in logging functions (logging-fstring-interpolation)
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") | ||
| return False | ||
| except socket.gaierror: | ||
| log.warning(f"Could not resolve hostname {sanitize_for_log(hostname)}, treating as unsafe.") |
Check warning
Code scanning / Prospector (reported by Codacy)
Use lazy % formatting in logging functions (logging-fstring-interpolation)
| 1. Parse URLs and check hostnames against `localhost` and private IP ranges using `ipaddress` module. | ||
| 2. Enforce strict length limits on user inputs (e.g., profile IDs) to prevent resource exhaustion or buffer abuse. | ||
|
|
||
| ## 2025-01-24 - [SSRF via DNS Resolution] |
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when references to undefined definitions are found.
| 1. Parse URLs and check hostnames against `localhost` and private IP ranges using `ipaddress` module. | ||
| 2. Enforce strict length limits on user inputs (e.g., profile IDs) to prevent resource exhaustion or buffer abuse. | ||
|
|
||
| ## 2025-01-24 - [SSRF via DNS Resolution] |
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when shortcut reference links are used.
main.py
Outdated
| ip_str = sockaddr[0] | ||
| resolved_ip = ipaddress.ip_address(ip_str) | ||
| if resolved_ip.is_private or resolved_ip.is_loopback: | ||
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") |
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (159/100)
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") | ||
| return False | ||
| except socket.gaierror: | ||
| log.warning(f"Could not resolve hostname {sanitize_for_log(hostname)}, treating as unsafe.") |
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (108/100)
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylint (reported by Codacy)
Unused variable 'canonname'
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylint (reported by Codacy)
Unused variable 'proto'
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylint (reported by Codacy)
Unused variable 'kind'
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylint (reported by Codacy)
Unused variable 'family'
main.py
Outdated
| ip_str = sockaddr[0] | ||
| resolved_ip = ipaddress.ip_address(ip_str) | ||
| if resolved_ip.is_private or resolved_ip.is_loopback: | ||
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (159/100)
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") | ||
| return False | ||
| except socket.gaierror: | ||
| log.warning(f"Could not resolve hostname {sanitize_for_log(hostname)}, treating as unsafe.") |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (108/100)
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Unused variable 'canonname'
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Unused variable 'kind'
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Unused variable 'proto'
| try: | ||
| # Resolve hostname to IPs | ||
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Unused variable 'family'
main.py
Outdated
| ip_str = sockaddr[0] | ||
| resolved_ip = ipaddress.ip_address(ip_str) | ||
| if resolved_ip.is_private or resolved_ip.is_loopback: | ||
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Use lazy % formatting in logging functions
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to private IP {resolved_ip}): {sanitize_for_log(url)}") | ||
| return False | ||
| except socket.gaierror: | ||
| log.warning(f"Could not resolve hostname {sanitize_for_log(hostname)}, treating as unsafe.") |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Use lazy % formatting in logging functions
I have implemented the requested changes from the review thread:
|
Modified `validate_folder_url` in `main.py` to resolve domain names using `socket.getaddrinfo`. This prevents SSRF attacks where a malicious domain resolves to a private or restricted IP address. Key changes: - Added `socket` import and DNS resolution logic. - Validates resolved IPs against private, loopback, link-local, reserved, and multicast ranges. - Implemented secure fail-closed behavior for DNS resolution errors (`socket.gaierror`, `OSError`). - Added handling for IPv6 zone identifiers. - Added code comment acknowledging TOCTOU limitations (DNS rebinding). - Updated security journal.
- Enhanced exception handling to catch OSError in addition to socket.gaierror - Added comprehensive IP range checks (link_local, reserved, multicast) - Strip IPv6 zone identifiers before validation - Document TOCTOU vulnerability limitation in code comments and sentinel.md Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
…copilot/sub-pr-72
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| addr_infos = socket.getaddrinfo(hostname, None) | ||
| for family, kind, proto, canonname, sockaddr in addr_infos: | ||
| ip_str = sockaddr[0] | ||
|
|
||
| # Strip IPv6 zone identifier if present (e.g., fe80::1%eth0) | ||
| if '%' in ip_str: | ||
| ip_str = ip_str.split('%', 1)[0] | ||
|
|
||
| resolved_ip = ipaddress.ip_address(ip_str) | ||
|
|
||
| if (resolved_ip.is_private or | ||
| resolved_ip.is_loopback or | ||
| resolved_ip.is_link_local or | ||
| resolved_ip.is_reserved or | ||
| resolved_ip.is_multicast): | ||
| log.warning(f"Skipping unsafe URL (domain {sanitize_for_log(hostname)} resolves to restricted IP {resolved_ip}): {sanitize_for_log(url)}") | ||
| return False |
There was a problem hiding this comment.
If socket.getaddrinfo returns an empty list (which is theoretically possible in edge cases), the loop will not execute and the function will proceed to return True at line 243, allowing the potentially unsafe URL. While this scenario is unlikely in practice, consider adding an explicit check after the getaddrinfo call to ensure at least one address was resolved, or track whether any iteration occurred to ensure the validation actually ran.
main.py
Outdated
| if '%' in ip_str: | ||
| ip_str = ip_str.split('%', 1)[0] | ||
|
|
||
| resolved_ip = ipaddress.ip_address(ip_str) |
There was a problem hiding this comment.
The exception handling for ipaddress.ip_address(ip_str) on line 226 is not explicitly caught. If the resolved IP address string is malformed or invalid, this could raise a ValueError that would be caught by the outer exception handler on line 239, but it would not be logged with a specific message. Consider adding explicit exception handling around the ipaddress.ip_address() call to log parsing errors more clearly, or ensure that any IP string format issues from socket.getaddrinfo are properly handled.
| resolved_ip = ipaddress.ip_address(ip_str) | |
| try: | |
| resolved_ip = ipaddress.ip_address(ip_str) | |
| except ValueError as ve: | |
| log.warning( | |
| f"Failed to parse resolved IP {sanitize_for_log(ip_str)} for hostname " | |
| f"{sanitize_for_log(hostname)}: {ve}. Treating URL as unsafe." | |
| ) | |
| return False |
Enhance SSRF DNS validation with comprehensive IP checks and IPv6 support
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Superseded: Security improvements already in main. |
Understood. Acknowledging that this work is now superseded and stopping work on this task. |
Implemented DNS resolution check in
validate_folder_urlto prevent SSRF via domain names resolving to private IPs.PR created automatically by Jules for task 5407761294063338854 started by @abhimehro