🛡️ Sentinel: Harden SSRF protection and add security tests #145
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Summary of Changes
Hello @abhimehro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly hardens the SSRF protection mechanism by eliminating a potential TOCTOU vulnerability related to DNS caching and by introducing a robust suite of security tests. The changes ensure that URL validations are always performed immediately before use, preventing malicious DNS record changes from being exploited, and that the system correctly blocks various forms of unauthorized access attempts.
| """Security tests for SSRF protection in validate_folder_url.""" | ||
|
|
||
| def test_rejects_non_https(self): | ||
| assert main.validate_folder_url("http://example.com/foo") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
|
|
||
| def test_rejects_non_https(self): | ||
| assert main.validate_folder_url("http://example.com/foo") is False | ||
| assert main.validate_folder_url("ftp://example.com/foo") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| def test_rejects_non_https(self): | ||
| assert main.validate_folder_url("http://example.com/foo") is False | ||
| assert main.validate_folder_url("ftp://example.com/foo") is False | ||
| assert main.validate_folder_url("file:///etc/passwd") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| def test_accepts_valid_public_ip(self, mock_gai): | ||
| # Mock domain resolving to public IP (8.8.8.8) | ||
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv4("8.8.8.8") | ||
| assert main.validate_folder_url("https://example.com/foo") is True |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
|
|
||
| def test_rejects_localhost_literal(self): | ||
| # Test explicit localhost hostname check | ||
| assert main.validate_folder_url("https://localhost/foo") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| # IP literals are checked directly via ipaddress module, bypassing DNS | ||
| assert main.validate_folder_url("https://192.168.1.1/foo") is False | ||
| assert main.validate_folder_url("https://10.0.0.1/foo") is False | ||
| assert main.validate_folder_url("https://127.0.0.1/foo") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| assert main.validate_folder_url("https://192.168.1.1/foo") is False | ||
| assert main.validate_folder_url("https://10.0.0.1/foo") is False | ||
| assert main.validate_folder_url("https://127.0.0.1/foo") is False | ||
| assert main.validate_folder_url("https://[::1]/foo") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| assert main.validate_folder_url("https://[::1]/foo") is False | ||
|
|
||
| def test_accepts_ip_literal_public(self): | ||
| assert main.validate_folder_url("https://8.8.8.8/foo") is True |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
|
|
||
| def test_accepts_ip_literal_public(self): | ||
| assert main.validate_folder_url("https://8.8.8.8/foo") is True | ||
| assert main.validate_folder_url("https://[2001:4860:4860::8888]/foo") is True |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| def test_dns_resolution_failure_is_safe(self, mock_gai): | ||
| # Ensure that if DNS fails, we default to False (fail closed) | ||
| mock_gai.side_effect = socket.gaierror("Name or service not known") | ||
| assert main.validate_folder_url("https://nonexistent.com/foo") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
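For context on the Bandit notice above (check B101, assert_used): the concern is that `assert` statements are dropped when Python compiles with optimization enabled. The sketch below is only an illustration of that behavior, not code from this PR; it compiles the same hypothetical `check` function with and without optimization. In pytest test files `assert` is the normal idiom, so this notice is commonly treated as informational for test code.

```python
# Illustrative only: shows why Bandit flags `assert` (it vanishes under -O).
# `SOURCE`, `check`, and `load_check` are hypothetical names for this demo.
SOURCE = """
def check(value):
    assert value is False, "expected False"
    return "checked"
"""


def load_check(optimize: int):
    """Compile SOURCE at the given optimization level and return `check`."""
    namespace = {}
    # optimize=0 keeps asserts; optimize=1 mirrors `python -O` and strips them.
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["check"]


normal = load_check(0)
optimized = load_check(1)

try:
    normal(True)  # assertion is evaluated and fails
    normal_raised = False
except AssertionError:
    normal_raised = True

# With asserts stripped, the same call silently passes the check.
optimized_result = optimized(True)
```

This is why security checks in production code paths should use explicit `if`/`raise` (or a boolean return, as `validate_folder_url` does) rather than `assert`.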
Code Review
This pull request effectively hardens the application against Server-Side Request Forgery (SSRF) attacks by removing caching from the URL validation logic, which mitigates a Time-of-Check Time-of-Use (TOCTOU) vulnerability. The changes in main.py are correct and well-commented. The addition of a comprehensive security test suite in tests/test_security.py is a significant improvement, ensuring the SSRF protections are robust and verifiable. My review includes a few suggestions to further improve the new tests by using parametrization, which will enhance their readability and maintainability.
| def test_rejects_non_https(self): | ||
| assert main.validate_folder_url("http://example.com/foo") is False | ||
| assert main.validate_folder_url("ftp://example.com/foo") is False | ||
| assert main.validate_folder_url("file:///etc/passwd") is False |
This test can be made more concise and easier to extend by using pytest.mark.parametrize. This converts multiple assertions into a data-driven test, which is cleaner and provides better failure messages for individual cases.
| def test_rejects_non_https(self): | |
| assert main.validate_folder_url("http://example.com/foo") is False | |
| assert main.validate_folder_url("ftp://example.com/foo") is False | |
| assert main.validate_folder_url("file:///etc/passwd") is False | |
| @pytest.mark.parametrize("url", [ | |
| "http://example.com/foo", | |
| "ftp://example.com/foo", | |
| "file:///etc/passwd", | |
| ]) | |
| def test_rejects_non_https(self, url): | |
| assert main.validate_folder_url(url) is False |
| @patch('socket.getaddrinfo') | ||
| def test_accepts_valid_public_ip(self, mock_gai): | ||
| # Mock domain resolving to public IP (8.8.8.8) | ||
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv4("8.8.8.8") |
For a mock that should always return the same value regardless of arguments, using return_value is simpler and more direct than side_effect with a lambda. This improves readability.
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv4("8.8.8.8") | |
| mock_gai.return_value = mock_getaddrinfo_ipv4("8.8.8.8") |
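To illustrate the distinction this comment draws, here is a small standalone sketch using `unittest.mock.MagicMock`. The helper `fake_addrinfo` is a hypothetical stand-in for the test suite's `mock_getaddrinfo_ipv4`, whose real implementation is not shown in this thread.

```python
import socket
from unittest.mock import MagicMock


def fake_addrinfo(ip):
    """Hypothetical stand-in: one getaddrinfo-style IPv4 result tuple."""
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", (ip, 443))]


# return_value: the same answer for every call, whatever the arguments.
fixed = MagicMock(return_value=fake_addrinfo("8.8.8.8"))

# side_effect with a callable: only needed when the answer depends on the call.
by_host = MagicMock(
    side_effect=lambda host, *args, **kwargs: fake_addrinfo(
        "8.8.8.8" if host == "example.com" else "127.0.0.1"
    )
)

# side_effect with an exception: the mock raises instead of returning.
failing = MagicMock(side_effect=socket.gaierror("Name or service not known"))
```

In short, `return_value` fits the constant-answer tests discussed here, while `side_effect` remains the right tool for the DNS-failure test, which has to raise.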
| @patch('socket.getaddrinfo') | ||
| def test_rejects_private_ip_resolution(self, mock_gai): | ||
| # Mock domain resolving to private IP (192.168.1.1) | ||
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv4("192.168.1.1") | ||
| assert main.validate_folder_url("https://internal.corp/foo") is False | ||
|
|
||
| @patch('socket.getaddrinfo') | ||
| def test_rejects_loopback_ip_resolution(self, mock_gai): | ||
| # Mock domain resolving to loopback IP (127.0.0.1) | ||
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv4("127.0.0.1") | ||
| assert main.validate_folder_url("https://evil.com/foo") is False | ||
|
|
||
| @patch('socket.getaddrinfo') | ||
| def test_rejects_ipv6_loopback_resolution(self, mock_gai): | ||
| # Mock domain resolving to IPv6 loopback (::1) | ||
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv6("::1") | ||
| assert main.validate_folder_url("https://ipv6.local/foo") is False |
The tests for private, loopback, and IPv6 loopback resolutions are very similar. They can be combined into a single, parametrized test. This reduces code duplication and makes the test suite easier to maintain and extend with more non-public IP cases in the future. This suggestion also simplifies the mock setup by using return_value instead of a side_effect with a lambda, which is more direct for this use case.
| @patch('socket.getaddrinfo') | |
| def test_rejects_private_ip_resolution(self, mock_gai): | |
| # Mock domain resolving to private IP (192.168.1.1) | |
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv4("192.168.1.1") | |
| assert main.validate_folder_url("https://internal.corp/foo") is False | |
| @patch('socket.getaddrinfo') | |
| def test_rejects_loopback_ip_resolution(self, mock_gai): | |
| # Mock domain resolving to loopback IP (127.0.0.1) | |
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv4("127.0.0.1") | |
| assert main.validate_folder_url("https://evil.com/foo") is False | |
| @patch('socket.getaddrinfo') | |
| def test_rejects_ipv6_loopback_resolution(self, mock_gai): | |
| # Mock domain resolving to IPv6 loopback (::1) | |
| mock_gai.side_effect = lambda host, *args, **kwargs: mock_getaddrinfo_ipv6("::1") | |
| assert main.validate_folder_url("https://ipv6.local/foo") is False | |
| @pytest.mark.parametrize("ip, mock_fn, url", [ | |
| ("192.168.1.1", mock_getaddrinfo_ipv4, "https://internal.corp/foo"), | |
| ("127.0.0.1", mock_getaddrinfo_ipv4, "https://evil.com/foo"), | |
| ("::1", mock_getaddrinfo_ipv6, "https://ipv6.local/foo"), | |
| ]) | |
| @patch('socket.getaddrinfo') | |
| def test_rejects_non_public_ip_resolution(self, mock_gai, ip, mock_fn, url): | |
| """Tests that domains resolving to private or loopback IPs are rejected.""" | |
| mock_gai.return_value = mock_fn(ip) | |
| assert main.validate_folder_url(url) is False |
| def test_rejects_ip_literal_private(self): | ||
| # IP literals are checked directly via ipaddress module, bypassing DNS | ||
| assert main.validate_folder_url("https://192.168.1.1/foo") is False | ||
| assert main.validate_folder_url("https://10.0.0.1/foo") is False | ||
| assert main.validate_folder_url("https://127.0.0.1/foo") is False | ||
| assert main.validate_folder_url("https://[::1]/foo") is False |
This test can be made more concise and easier to extend by using pytest.mark.parametrize. This converts multiple assertions into a data-driven test, which is cleaner and provides better failure messages for individual cases.
| def test_rejects_ip_literal_private(self): | |
| # IP literals are checked directly via ipaddress module, bypassing DNS | |
| assert main.validate_folder_url("https://192.168.1.1/foo") is False | |
| assert main.validate_folder_url("https://10.0.0.1/foo") is False | |
| assert main.validate_folder_url("https://127.0.0.1/foo") is False | |
| assert main.validate_folder_url("https://[::1]/foo") is False | |
| @pytest.mark.parametrize("url", [ | |
| "https://192.168.1.1/foo", | |
| "https://10.0.0.1/foo", | |
| "https://127.0.0.1/foo", | |
| "https://[::1]/foo", | |
| ]) | |
| def test_rejects_ip_literal_private(self, url): | |
| # IP literals are checked directly via ipaddress module, bypassing DNS | |
| assert main.validate_folder_url(url) is False |
| def test_accepts_ip_literal_public(self): | ||
| assert main.validate_folder_url("https://8.8.8.8/foo") is True | ||
| assert main.validate_folder_url("https://[2001:4860:4860::8888]/foo") is True |
This test can be made more concise and easier to extend by using pytest.mark.parametrize. This converts multiple assertions into a data-driven test.
| def test_accepts_ip_literal_public(self): | |
| assert main.validate_folder_url("https://8.8.8.8/foo") is True | |
| assert main.validate_folder_url("https://[2001:4860:4860::8888]/foo") is True | |
| @pytest.mark.parametrize("url", [ | |
| "https://8.8.8.8/foo", | |
| "https://[2001:4860:4860::8888]/foo", | |
| ]) | |
| def test_accepts_ip_literal_public(self, url): | |
| assert main.validate_folder_url(url) is True |
Pylintpython3 (reported by Codacy) found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Pylint (reported by Codacy) found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Pull request overview
This PR strengthens SSRF (Server-Side Request Forgery) protection by removing the @lru_cache decorator from validate_folder_url to eliminate TOCTOU (Time-of-Check Time-of-Use) vulnerabilities arising from DNS rebinding attacks. The change ensures DNS resolution happens at validation time rather than using potentially stale cached results. Comprehensive security tests are added to verify the SSRF protection mechanisms.
Changes:
- Removed the @lru_cache decorator and import to prevent caching of validation results
- Updated security comments explaining TOCTOU/DNS rebinding risks
- Added comprehensive test suite covering HTTPS enforcement, private/loopback IP rejection (IPv4/IPv6), public IP acceptance, and DNS failure handling
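As a rough picture of the behavior these changes and tests describe, the following is a minimal sketch of an uncached validator. It is not the code in main.py (which is not shown in this thread); the helper `_is_public`, the specific `ipaddress` checks, and the port 443 lookup are assumptions chosen to match the listed test cases.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _is_public(ip_text: str) -> bool:
    """Hypothetical helper: True only for addresses that look publicly routable."""
    ip = ipaddress.ip_address(ip_text)  # raises ValueError if not an IP
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def validate_folder_url(url: str) -> bool:
    """Sketch of an uncached check: no lru_cache, so every call re-resolves DNS."""
    parsed = urlparse(url)
    host = parsed.hostname  # IPv6 brackets are already stripped here
    if parsed.scheme != "https" or not host:
        return False
    if host.lower() == "localhost":
        return False
    try:
        # IP literals are judged directly, without a DNS lookup.
        return _is_public(host)
    except ValueError:
        pass  # not an IP literal, so resolve it below
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    except (OSError, UnicodeError):
        return False  # fail closed when resolution fails
    if not infos:
        return False
    for info in infos:
        try:
            if not _is_public(info[4][0]):
                return False  # any non-public answer rejects the URL
        except ValueError:
            return False
    return True
```

Note that validating at call time narrows the window for DNS rebinding but does not close it entirely unless the fetch connects to the same address that was validated.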
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| main.py | Removed lru_cache import, decorator, and cache clearing; added security comments explaining TOCTOU mitigation |
| tests/test_security.py | Added comprehensive SSRF protection tests including protocol validation, IP address checks, DNS resolution scenarios, and failure handling |
| # SECURITY: Removed cache clearing because we removed the cache on validate_folder_url | ||
| # to mitigate TOCTOU (DNS Rebinding) risks. Validation now happens per-fetch. |
The comment states "Validation now happens per-fetch" but this is not entirely accurate. Looking at the _fetch_if_valid function below (lines 1058-1060), URLs that are already in the data cache (_cache) are returned directly without re-validation. This means validation does not happen per-fetch for cached URLs, which could still present a TOCTOU risk if DNS records change between syncs within the same process. Consider updating the comment to clarify that validation happens per-fetch for non-cached URLs, or document the accepted risk of using cached data without re-validation.
…itization, add dry-run plan details

Incorporates the best changes from 36 Jules PRs, addressing review feedback:

Bolt (Performance) - from PR #173:
- Pre-compile PROFILE_ID_PATTERN and RULE_PATTERN at module level
- Use compiled patterns in is_valid_profile_id_format, validate_profile_id, and is_valid_rule
- Supersedes PRs: #140, #143, #152, #155, #158, #161, #167, #170, #173

Sentinel (Security) - from PR #172 with review feedback:
- Enhance sanitize_for_log to redact Basic Auth credentials in URLs
- Redact sensitive query parameters (token, key, secret, password, etc.)
- Handle fragment separators (#) per Gemini Code Assist review
- Use [^&#\s]* pattern per Copilot reviewer suggestion
- Update docstring per reviewer suggestion
- Supersedes PRs: #142, #145, #148, #151, #154, #157, #160, #169, #172

Palette (UX) - from PR #174 with lint fixes:
- Add print_plan_details function for dry-run visibility
- Fix duplicate render_progress_bar definition bug
- Supersedes PRs: #139, #141, #144, #147, #150, #153, #156, #159, #162, #165, #168, #171, #174

Also: #146, #149, #164 (parallel folder deletion) and #166 (auto-fix .env perms) are independent features not consolidated here.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
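The log-redaction items in this commit message can be pictured with a short sketch. The real `sanitize_for_log` is not shown in this thread, so the pattern names, the exact parameter list, and the `[REDACTED]` placeholder below are assumptions; only the `[^&#\s]*` value pattern and the idea of pre-compiling patterns at module level come from the message itself.

```python
import re

# Assumed patterns, compiled once at module level.
# Credentials between "://" and "@" (Basic Auth embedded in a URL).
_BASIC_AUTH = re.compile(r"(://)[^/@\s]+@")
# Sensitive query parameters; the value stops at "&", "#", or whitespace.
_SENSITIVE_PARAM = re.compile(
    r"([?&](?:token|key|secret|password)=)[^&#\s]*", re.IGNORECASE
)


def sanitize_for_log(text) -> str:
    """Sketch: redact URL credentials and sensitive query values before logging."""
    text = _BASIC_AUTH.sub(r"\1[REDACTED]@", str(text))
    return _SENSITIVE_PARAM.sub(r"\1[REDACTED]", text)
```

Stopping the value match at `#` keeps a URL fragment intact after redaction, which is the fragment-separator case the commit message mentions.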
Removed @lru_cache from validate_folder_url in main.py to mitigate TOCTOU (Time-of-Check Time-of-Use) vulnerabilities where DNS records could change between validation and fetch.
Added comprehensive tests in tests/test_security.py (merged with existing tests) to verify the validation behavior.
This ensures the SSRF protection mechanism is robust and verified by CI.
PR created automatically by Jules for task 13165274204186978288 started by @abhimehro