⚡ Bolt: Skip redundant DNS validation for cached blocklists#423
⚡ Bolt: Skip redundant DNS validation for cached blocklists#423
Conversation
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Merging to
|
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
Summary of ChangesHello @abhimehro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the startup performance of the Bolt application by optimizing the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
| @@ -0,0 +1,98 @@ | |||
| import time | |||
| import unittest | |||
| from unittest.mock import patch, MagicMock | |||
Check warning
Code scanning / Prospector (reported by Codacy)
Unused MagicMock imported from unittest.mock (unused-import) Warning test
| # Add root to path to import main | ||
| sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) | ||
|
|
||
| import main |
Check warning
Code scanning / Prospector (reported by Codacy)
Import "import main" should be placed at the top of the module (wrong-import-position) Warning test
|
|
||
| class TestWarmUpCachePerf(unittest.TestCase): | ||
| def setUp(self): | ||
| main._cache.clear() |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache of a client class (protected-access) Warning test
| class TestWarmUpCachePerf(unittest.TestCase): | ||
| def setUp(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
| main.validate_folder_url.cache_clear() | ||
|
|
||
| def tearDown(self): | ||
| main._cache.clear() |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache of a client class (protected-access) Warning test
|
|
||
| def tearDown(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
| test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]} | ||
|
|
||
| # Populate disk cache with fresh entry | ||
| main._disk_cache[test_url] = { |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
|
|
||
| # Populate disk cache with STALE entry | ||
| stale_time = time.time() - (main.CACHE_TTL_SECONDS + 100) | ||
| main._disk_cache[test_url] = { |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
| # Verify _gh_get was called | ||
| mock_gh_get.assert_called_with(test_url) | ||
|
|
||
| if __name__ == '__main__': |
Check warning
Code scanning / Prospector (reported by Codacy)
expected 2 blank lines after class or function definition, found 1 (E305) Warning test
| # Verify _gh_get was called | ||
| mock_gh_get.assert_called_with(test_url) | ||
|
|
||
| def test_warm_up_calls_validation_for_missing_cache(self): |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Method could be a function Warning test
| # Check in-memory cache first (though warm_up_cache filters these out, | ||
| # having it here makes the helper more robust) | ||
| with _cache_lock: | ||
| if url in _cache: |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (108/100) Warning
| # OPTIMIZATION: Combine validation (DNS) and fetching (HTTP) in one task | ||
| # to allow validation latency to be parallelized. | ||
| def _validate_and_fetch(url: str): | ||
| # Optimization: Skip DNS validation if cache is fresh |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (120/100) Warning
| @@ -0,0 +1,98 @@ | |||
| import time | |||
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing module docstring Warning test
| # This assertion will FAIL before the fix | ||
| mock_validate.assert_not_called() | ||
|
|
||
| def test_warm_up_calls_validation_for_stale_cache(self): |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Method could be a function Warning test
|
|
||
| def tearDown(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
| main.validate_folder_url.cache_clear() | ||
|
|
||
| def tearDown(self): | ||
| main._cache.clear() |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _cache of a client class Note test
| class TestWarmUpCachePerf(unittest.TestCase): | ||
| def setUp(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
|
|
||
| class TestWarmUpCachePerf(unittest.TestCase): | ||
| def setUp(self): | ||
| main._cache.clear() |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _cache of a client class Note test
| @@ -0,0 +1,98 @@ | |||
| import time | |||
| import unittest | |||
| from unittest.mock import patch, MagicMock | |||
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Unused MagicMock imported from unittest.mock Note test
| # Check in-memory cache first (though warm_up_cache filters these out, | ||
| # having it here makes the helper more robust) | ||
| with _cache_lock: | ||
| if url in _cache: |
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (108/100) Warning
| # OPTIMIZATION: Combine validation (DNS) and fetching (HTTP) in one task | ||
| # to allow validation latency to be parallelized. | ||
| def _validate_and_fetch(url: str): | ||
| # Optimization: Skip DNS validation if cache is fresh |
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (120/100) Warning
| @@ -0,0 +1,98 @@ | |||
| import time | |||
Check warning
Code scanning / Pylint (reported by Codacy)
Missing module docstring Warning test
| # Add root to path to import main | ||
| sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) | ||
|
|
||
| import main |
Check warning
Code scanning / Pylint (reported by Codacy)
Import "import main" should be placed at the top of the module Warning test
|
|
||
| import main | ||
|
|
||
| class TestWarmUpCachePerf(unittest.TestCase): |
Check warning
Code scanning / Pylint (reported by Codacy)
Missing class docstring Warning test
| class TestWarmUpCachePerf(unittest.TestCase): | ||
| def setUp(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
| main.validate_folder_url.cache_clear() | ||
|
|
||
| def tearDown(self): | ||
| main._cache.clear() |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _cache of a client class Note test
|
|
||
| def tearDown(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
| test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]} | ||
|
|
||
| # Populate disk cache with fresh entry | ||
| main._disk_cache[test_url] = { |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
|
|
||
| # Populate disk cache with STALE entry | ||
| stale_time = time.time() - (main.CACHE_TTL_SECONDS + 100) | ||
| main._disk_cache[test_url] = { |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
There was a problem hiding this comment.
Pull request overview
Optimizes startup cache warming by avoiding redundant DNS validation when blocklist data can be served from a fresh persistent disk cache, and repairs corrupted logic in validate_folder_data.
Changes:
- Add
_is_cache_fresh()and updatewarm_up_cache()to skipvalidate_folder_url()for URLs with fresh cached entries. - Fix syntax/corruption in
validate_folder_data()and ensure it returnsTrueon the success path. - Add
tests/test_warm_up_cache_perf.pyto verify validation is skipped for fresh cache entries and executed for stale/missing ones.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
main.py |
Adds cache-freshness helper + warm-up optimization; fixes validation logic corruption and restores correct return behavior. |
tests/test_warm_up_cache_perf.py |
Adds regression tests to ensure DNS validation is skipped only when disk cache entries are within TTL. |
| entry = _disk_cache.get(url) | ||
| if entry: | ||
| last_validated = entry.get("last_validated", 0) | ||
| if time.time() - last_validated < CACHE_TTL_SECONDS: | ||
| return True |
There was a problem hiding this comment.
_is_cache_fresh() subtracts time.time() - last_validated without validating the type. Since load_disk_cache() only guarantees a data field, a malformed cache file could contain last_validated: null or a string, causing a TypeError and breaking warm_up_cache startup. Defensive fix: coerce/validate last_validated (int/float) before subtraction and treat invalid values as stale (same hardening should be applied in _gh_get() which uses the same subtraction).
| def _is_cache_fresh(url: str) -> bool: | ||
| """Checks if the URL is in the persistent cache and within TTL.""" | ||
| # Check in-memory cache first (though warm_up_cache filters these out, | ||
| # having it here makes the helper more robust) | ||
| with _cache_lock: | ||
| if url in _cache: | ||
| return True | ||
|
|
There was a problem hiding this comment.
Docstring says this checks the “persistent cache”, but the function also returns True when the URL exists in the in-memory _cache. Update the docstring to reflect both caches (or drop the in-memory check if you want the helper to be strictly about disk freshness).
| @@ -0,0 +1,98 @@ | |||
| import time | |||
| import unittest | |||
| from unittest.mock import patch, MagicMock | |||
There was a problem hiding this comment.
MagicMock is imported but never used in this test file. Removing the unused import keeps the test code tidy and avoids confusing future readers about intended mocking behavior.
| from unittest.mock import patch, MagicMock | |
| from unittest.mock import patch |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant optimization by skipping redundant DNS validation for blocklists that are already present and fresh in the persistent disk cache. However, this optimization reveals a critical security control bypass: the application trusts pre-fetched data in the in-memory cache without performing necessary security validations (e.g., checking for malicious folder names). This allows potentially malicious content from remote blocklists to bypass security checks. Additionally, a critical syntax error and an incorrect variable reference in the validate_folder_data function have been fixed, improving code robustness and debuggability. New performance tests (test_warm_up_cache_perf.py) provide excellent coverage for the caching logic.
| def _is_cache_fresh(url: str) -> bool: | ||
| """Checks if the URL is in the persistent cache and within TTL.""" | ||
| # Check in-memory cache first (though warm_up_cache filters these out, | ||
| # having it here makes the helper more robust) | ||
| with _cache_lock: | ||
| if url in _cache: | ||
| return True | ||
|
|
||
| entry = _disk_cache.get(url) | ||
| if entry: | ||
| last_validated = entry.get("last_validated", 0) | ||
| if time.time() - last_validated < CACHE_TTL_SECONDS: | ||
| return True | ||
| return False |
| import time | ||
| import unittest | ||
| from unittest.mock import patch, MagicMock | ||
| import sys | ||
| import os | ||
|
|
||
| # Add root to path to import main | ||
| sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) | ||
|
|
||
| import main | ||
|
|
||
| class TestWarmUpCachePerf(unittest.TestCase): | ||
| def setUp(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() | ||
| main.validate_folder_url.cache_clear() | ||
|
|
||
| def tearDown(self): | ||
| main._cache.clear() | ||
| main._disk_cache.clear() | ||
| main.validate_folder_url.cache_clear() | ||
|
|
||
| def test_warm_up_skips_validation_for_fresh_cache(self): | ||
| """ | ||
| Test that warm_up_cache skips validate_folder_url if the URL is in disk cache and fresh. | ||
| """ | ||
| test_url = "https://example.com/test.json" | ||
| test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]} | ||
|
|
||
| # Populate disk cache with fresh entry | ||
| main._disk_cache[test_url] = { | ||
| "data": test_data, | ||
| "last_validated": time.time(), # Fresh | ||
| "fetched_at": time.time(), | ||
| } | ||
|
|
||
| # Mock validate_folder_url to ensure it is NOT called | ||
| # Mock _gh_get to verify it IS called (which will use the cache) | ||
| with patch('main.validate_folder_url') as mock_validate: | ||
| with patch('main._gh_get', return_value=test_data) as mock_gh_get: | ||
|
|
||
| main.warm_up_cache([test_url]) | ||
|
|
||
| # Verify _gh_get was called (it handles cache retrieval) | ||
| mock_gh_get.assert_called_with(test_url) | ||
|
|
||
| # Verify validate_folder_url was NOT called | ||
| # This assertion will FAIL before the fix | ||
| mock_validate.assert_not_called() | ||
|
|
||
| def test_warm_up_calls_validation_for_stale_cache(self): | ||
| """ | ||
| Test that warm_up_cache calls validate_folder_url if the URL is stale in disk cache. | ||
| """ | ||
| test_url = "https://example.com/test.json" | ||
| test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]} | ||
|
|
||
| # Populate disk cache with STALE entry | ||
| stale_time = time.time() - (main.CACHE_TTL_SECONDS + 100) | ||
| main._disk_cache[test_url] = { | ||
| "data": test_data, | ||
| "last_validated": stale_time, | ||
| "fetched_at": stale_time, | ||
| } | ||
|
|
||
| with patch('main.validate_folder_url', return_value=True) as mock_validate: | ||
| with patch('main._gh_get', return_value=test_data) as mock_gh_get: | ||
|
|
||
| main.warm_up_cache([test_url]) | ||
|
|
||
| # Verify validate_folder_url WAS called | ||
| mock_validate.assert_called_with(test_url) | ||
|
|
||
| # Verify _gh_get was called | ||
| mock_gh_get.assert_called_with(test_url) | ||
|
|
||
| def test_warm_up_calls_validation_for_missing_cache(self): | ||
| """ | ||
| Test that warm_up_cache calls validate_folder_url if the URL is not in disk cache. | ||
| """ | ||
| test_url = "https://example.com/test.json" | ||
| test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]} | ||
|
|
||
| # Cache is empty | ||
|
|
||
| with patch('main.validate_folder_url', return_value=True) as mock_validate: | ||
| with patch('main._gh_get', return_value=test_data) as mock_gh_get: | ||
|
|
||
| main.warm_up_cache([test_url]) | ||
|
|
||
| # Verify validate_folder_url WAS called | ||
| mock_validate.assert_called_with(test_url) | ||
|
|
||
| # Verify _gh_get was called | ||
| mock_gh_get.assert_called_with(test_url) | ||
|
|
||
| if __name__ == '__main__': | ||
| unittest.main() |
There was a problem hiding this comment.
|
🤖 Automated PR Review — Closing as duplicate This PR bundles cache warm-up optimization with the ubiquitous syntax fix. The cache warm-up concept is good but this implementation mixes concerns. The syntax fix is in #412. Closing — consider re-submitting the cache warm-up optimization as a standalone PR after the syntax fix lands on |
Understood. Acknowledging that this PR is closed due to mixed concerns with the syntax fix. I am stopping work on this task for now. |
💡 What: Optimized
warm_up_cacheto skip DNS validation (validate_folder_url) when the blocklist is already present in the persistent disk cache and within the TTL (24h).🎯 Why: DNS lookups are blocking I/O operations (even when threaded) and are unnecessary overhead when we serve content directly from a proven-safe local cache. This reduces startup latency for subsequent runs.
📊 Impact: Eliminates N DNS lookups on warm start, where N is the number of blocklists (default 23).
🔬 Measurement: Verified with
tests/test_warm_up_cache_perf.pywhich confirmsvalidate_folder_urlis skipped for fresh cache entries but called for stale/missing ones.Also fixed a syntax error/corruption in
validate_folder_data.PR created automatically by Jules for task 1269213969337536655 started by @abhimehro