⚡ Bolt: Optimize cache warm-up and fix validation bug#416
⚡ Bolt: Optimize cache warm-up and fix validation bug#416
Conversation
- Optimized `warm_up_cache` to skip DNS validation for fresh cache entries. - Fixed a bug in `validate_folder_data` where it returned `None` instead of `True`. - Fixed corrupted code in `main.py`. - Added `tests/test_warm_up_cache_perf.py` to verify optimization. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Merging to
|
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
Summary of ChangesHello @abhimehro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Bolt system's performance and reliability. It introduces an optimization to the cache warm-up process by intelligently skipping redundant DNS validations for recently validated cached data, thereby accelerating application startup. Concurrently, it resolves a critical bug in the data validation logic that previously caused sync failures due to incorrect return values and addresses minor syntax corruption. These changes collectively aim to improve the efficiency and robustness of data handling. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
| **Learning:** Python's `ThreadPoolExecutor` incurs measurable overhead (thread creation/shutdown) when created/destroyed repeatedly inside loops, even with small worker counts. | ||
| **Action:** Lift `ThreadPoolExecutor` creation to the highest possible scope and pass it down as a dependency (using `contextlib.nullcontext` for flexible ownership). | ||
|
|
||
| ## 2026-02-22 - [Stale Module References in Tests] |
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when references to undefined definitions are found. Note
| **Learning:** Python's `ThreadPoolExecutor` incurs measurable overhead (thread creation/shutdown) when created/destroyed repeatedly inside loops, even with small worker counts. | ||
| **Action:** Lift `ThreadPoolExecutor` creation to the highest possible scope and pass it down as a dependency (using `contextlib.nullcontext` for flexible ownership). | ||
|
|
||
| ## 2026-02-22 - [Stale Module References in Tests] |
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when shortcut reference links are used. Note
main.py
Outdated
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (120/100) Warning
| # We do NOT import main at top level to avoid holding a stale reference | ||
| # if other tests reload the module. | ||
|
|
||
| class TestWarmUpCachePerf(unittest.TestCase): |
Check warning
Code scanning / Pylint (reported by Codacy)
Missing class docstring Warning test
| URL is already fresh in the disk cache. | ||
| """ | ||
| import unittest | ||
| from unittest.mock import patch, MagicMock |
Check notice
Code scanning / Pylint (reported by Codacy)
Unused MagicMock imported from unittest.mock Note test
| import main | ||
| self.main = main | ||
|
|
||
| self.main._cache.clear() |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _cache of a client class Note test
| self.main = main | ||
|
|
||
| self.main._cache.clear() | ||
| self.main._disk_cache.clear() |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
| test_data = {"group": {"group": "Fresh Folder"}, "rules": []} | ||
|
|
||
| # Populate disk cache with a FRESH entry | ||
| with self.main._cache_lock: |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _cache_lock of a client class Note test
|
|
||
| # Populate disk cache with a FRESH entry | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
|
|
||
| # Populate disk cache with a STALE entry (older than TTL) | ||
| stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL | ||
| with self.main._cache_lock: |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _cache_lock of a client class Note test
| # Populate disk cache with a STALE entry (older than TTL) | ||
| stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
| test_data = {"group": {"group": "Missing Folder"}, "rules": []} | ||
|
|
||
| # Ensure cache is empty | ||
| self.assertNotIn(test_url, self.main._disk_cache) |
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
| URL is already fresh in the disk cache. | ||
| """ | ||
| import unittest | ||
| from unittest.mock import patch, MagicMock |
Check warning
Code scanning / Prospector (reported by Codacy)
Unused MagicMock imported from unittest.mock (unused-import) Warning test
| if 'main' in sys.modules: | ||
| self.main = sys.modules['main'] | ||
| else: | ||
| import main |
Check warning
Code scanning / Prospector (reported by Codacy)
Import "import main" should be placed at the top of the module (wrong-import-position) Warning test
| import main | ||
| self.main = main | ||
|
|
||
| self.main._cache.clear() |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache of a client class (protected-access) Warning test
| self.main = main | ||
|
|
||
| self.main._cache.clear() | ||
| self.main._disk_cache.clear() |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
|
|
||
| def tearDown(self): | ||
| """Clean up after each test.""" | ||
| self.main._cache.clear() |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache of a client class (protected-access) Warning test
|
|
||
| # Populate disk cache with a FRESH entry | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
|
|
||
| # Populate disk cache with a STALE entry (older than TTL) | ||
| stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL | ||
| with self.main._cache_lock: |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache_lock of a client class (protected-access) Warning test
| # Populate disk cache with a STALE entry (older than TTL) | ||
| stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
| test_data = {"group": {"group": "Missing Folder"}, "rules": []} | ||
|
|
||
| # Ensure cache is empty | ||
| self.assertNotIn(test_url, self.main._disk_cache) |
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
| # VERIFICATION: validate_folder_url SHOULD be called for missing entry | ||
| mock_validate.assert_called_with(test_url) | ||
|
|
||
| if __name__ == '__main__': |
Check warning
Code scanning / Prospector (reported by Codacy)
expected 2 blank lines after class or function definition, found 1 (E305) Warning test
main.py
Outdated
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (120/100) Warning
| # We do NOT import main at top level to avoid holding a stale reference | ||
| # if other tests reload the module. | ||
|
|
||
| class TestWarmUpCachePerf(unittest.TestCase): |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing class docstring Warning test
| if 'main' in sys.modules: | ||
| self.main = sys.modules['main'] | ||
| else: | ||
| import main |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Import outside toplevel (main) Warning test
| URL is already fresh in the disk cache. | ||
| """ | ||
| import unittest | ||
| from unittest.mock import patch, MagicMock |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Unused MagicMock imported from unittest.mock Note test
| import main | ||
| self.main = main | ||
|
|
||
| self.main._cache.clear() |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _cache of a client class Note test
| test_data = {"group": {"group": "Fresh Folder"}, "rules": []} | ||
|
|
||
| # Populate disk cache with a FRESH entry | ||
| with self.main._cache_lock: |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _cache_lock of a client class Note test
|
|
||
| # Populate disk cache with a FRESH entry | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
|
|
||
| # Populate disk cache with a STALE entry (older than TTL) | ||
| stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL | ||
| with self.main._cache_lock: |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _cache_lock of a client class Note test
| # Populate disk cache with a STALE entry (older than TTL) | ||
| stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
| test_data = {"group": {"group": "Missing Folder"}, "rules": []} | ||
|
|
||
| # Ensure cache is empty | ||
| self.assertNotIn(test_url, self.main._disk_cache) |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
There was a problem hiding this comment.
Code Review
This pull request introduces significant optimizations to the cache warm-up process and fixes a critical validation bug. However, a new caching optimization introduces a high-severity Server-Side Request Forgery (SSRF) flaw. This vulnerability allows a bypass of critical hostname validation if a local cache file is tampered with, and it is recommended to remove the optimization that skips the validation step to mitigate this risk. Additionally, the validate_folder_data function has been corrected to return True on successful validation, resolving a bug that caused sync failures. A new test file tests/test_warm_up_cache_perf.py has been added, and the .jules/bolt.md file has been updated with a new learning and action item regarding stale module references in tests.
| if _is_cache_fresh(url): | ||
| return _gh_get(url) |
There was a problem hiding this comment.
While this optimization aims to improve startup performance by skipping DNS validation if the cache is fresh, it introduces a high-severity Server-Side Request Forgery (SSRF) vulnerability. The _is_cache_fresh function trusts the last_validated timestamp from a disk-based cache file (blocklists.json) to bypass the validate_folder_url security check. An attacker with local file system access could tamper with this cache file, leading to the application making requests to attacker-specified URLs. It is critical that validation is not skipped based on a potentially user-controllable cache file.
| def _is_cache_fresh(url: str) -> bool: | ||
| """ | ||
| Checks if the URL is in the persistent cache and within TTL. | ||
|
|
||
| This optimization allows skipping expensive DNS validation for | ||
| content that is already known to be safe (validated at fetch time). | ||
| """ | ||
| # Check in-memory cache first | ||
| with _cache_lock: | ||
| if url in _cache: | ||
| return True | ||
|
|
||
| # Check disk cache | ||
| entry = _disk_cache.get(url) | ||
| if entry: | ||
| last_validated = entry.get("last_validated", 0) | ||
| if time.time() - last_validated < CACHE_TTL_SECONDS: | ||
| return True | ||
| return False |
| """ | ||
| Tests for the warm_up_cache optimization. | ||
|
|
||
| This module verifies that warm_up_cache skips DNS validation when the | ||
| URL is already fresh in the disk cache. | ||
| """ | ||
| import unittest | ||
| from unittest.mock import patch, MagicMock | ||
| import sys | ||
| import os | ||
| import time | ||
|
|
||
| # Add root to path to import main | ||
| sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) | ||
|
|
||
| # We do NOT import main at top level to avoid holding a stale reference | ||
| # if other tests reload the module. | ||
|
|
||
| class TestWarmUpCachePerf(unittest.TestCase): | ||
| def setUp(self): | ||
| """Reset cache state before each test.""" | ||
| # Get the current main module from sys.modules | ||
| # If not present, import it | ||
| if 'main' in sys.modules: | ||
| self.main = sys.modules['main'] | ||
| else: | ||
| import main | ||
| self.main = main | ||
|
|
||
| self.main._cache.clear() | ||
| self.main._disk_cache.clear() | ||
| self.main.validate_folder_url.cache_clear() | ||
|
|
||
| def tearDown(self): | ||
| """Clean up after each test.""" | ||
| self.main._cache.clear() | ||
| self.main._disk_cache.clear() | ||
| self.main.validate_folder_url.cache_clear() | ||
|
|
||
| def test_warm_up_skips_validation_for_fresh_cache(self): | ||
| """ | ||
| Test that warm_up_cache does NOT call validate_folder_url | ||
| if the URL is present in _disk_cache and fresh (within TTL). | ||
| """ | ||
| test_url = "https://example.com/fresh.json" | ||
| test_data = {"group": {"group": "Fresh Folder"}, "rules": []} | ||
|
|
||
| # Populate disk cache with a FRESH entry | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { | ||
| "data": test_data, | ||
| "fetched_at": time.time(), | ||
| "last_validated": time.time(), # Just now | ||
| "etag": "123", | ||
| "last_modified": "Tue, 15 Nov 1994 12:45:26 GMT" | ||
| } | ||
|
|
||
| # Mock validate_folder_url to track calls | ||
| with patch('main.validate_folder_url') as mock_validate: | ||
| # Mock _gh_get to return data without network | ||
| with patch('main._gh_get', return_value=test_data) as mock_gh_get: | ||
|
|
||
| self.main.warm_up_cache([test_url]) | ||
|
|
||
| # VERIFICATION: validate_folder_url should NOT be called | ||
| mock_validate.assert_not_called() | ||
|
|
||
| # _gh_get should still be called (it handles cache retrieval) | ||
| mock_gh_get.assert_called_with(test_url) | ||
|
|
||
| def test_warm_up_calls_validation_for_stale_cache(self): | ||
| """ | ||
| Test that warm_up_cache DOES call validate_folder_url | ||
| if the URL is present in _disk_cache but STALE (expired TTL). | ||
| """ | ||
| test_url = "https://example.com/stale.json" | ||
| test_data = {"group": {"group": "Stale Folder"}, "rules": []} | ||
|
|
||
| # Populate disk cache with a STALE entry (older than TTL) | ||
| stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL | ||
| with self.main._cache_lock: | ||
| self.main._disk_cache[test_url] = { | ||
| "data": test_data, | ||
| "fetched_at": stale_time, | ||
| "last_validated": stale_time, | ||
| "etag": "123", | ||
| "last_modified": "Tue, 15 Nov 1994 12:45:26 GMT" | ||
| } | ||
|
|
||
| with patch('main.validate_folder_url', return_value=True) as mock_validate: | ||
| with patch('main._gh_get', return_value=test_data): | ||
|
|
||
| self.main.warm_up_cache([test_url]) | ||
|
|
||
| # VERIFICATION: validate_folder_url SHOULD be called for stale entry | ||
| mock_validate.assert_called_with(test_url) | ||
|
|
||
| def test_warm_up_calls_validation_for_missing_cache(self): | ||
| """ | ||
| Test that warm_up_cache DOES call validate_folder_url | ||
| if the URL is NOT present in _disk_cache. | ||
| """ | ||
| test_url = "https://example.com/missing.json" | ||
| test_data = {"group": {"group": "Missing Folder"}, "rules": []} | ||
|
|
||
| # Ensure cache is empty | ||
| self.assertNotIn(test_url, self.main._disk_cache) | ||
|
|
||
| with patch('main.validate_folder_url', return_value=True) as mock_validate: | ||
| with patch('main._gh_get', return_value=test_data): | ||
|
|
||
| self.main.warm_up_cache([test_url]) | ||
|
|
||
| # VERIFICATION: validate_folder_url SHOULD be called for missing entry | ||
| mock_validate.assert_called_with(test_url) | ||
|
|
||
| if __name__ == '__main__': | ||
| unittest.main() |
There was a problem hiding this comment.
There was a problem hiding this comment.
Pull request overview
Optimizes cache warm-up behavior to reduce startup DNS validation overhead, fixes a validation return-value bug that caused sync failures, and records a testing lesson in Bolt notes.
Changes:
- Added
_is_cache_fresh()and used it inwarm_up_cache()to skip DNS validation when the cache is within TTL. - Fixed corrupted syntax in
validate_folder_data()and ensured it returnsTrueon successful validation. - Added a new test module covering the warm-up optimization behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
main.py |
Adds cache-freshness helper, changes warm-up validation flow, fixes validate_folder_data syntax/return value. |
tests/test_warm_up_cache_perf.py |
Adds tests verifying warm-up skips/doesn’t skip URL validation depending on disk-cache freshness. |
.jules/bolt.md |
Documents a testing pitfall around stale module references when reloading modules. |
| # Optimization: Skip DNS validation if cache is fresh | ||
| # This saves blocking I/O for known-good content | ||
| if _is_cache_fresh(url): | ||
| return _gh_get(url) | ||
|
|
||
| if validate_folder_url(url): | ||
| return _gh_get(url) |
There was a problem hiding this comment.
warm_up_cache currently warms the in-memory cache via _gh_get(), which does not validate the folder JSON schema. Since sync_profile’s _fetch_if_valid path returns cached entries directly (bypassing fetch_folder_data/validate_folder_data), a malformed disk cache entry (or unexpected remote JSON) can be propagated into the sync without structural validation. Consider warming using fetch_folder_data(url) (or explicitly calling validate_folder_data on the returned JSON) so cached entries are guaranteed to be schema-valid before sync_profile reuses them.
| # Optimization: Skip DNS validation if cache is fresh | |
| # This saves blocking I/O for known-good content | |
| if _is_cache_fresh(url): | |
| return _gh_get(url) | |
| if validate_folder_url(url): | |
| return _gh_get(url) | |
| # Optimization: Skip DNS validation if cache is fresh. | |
| # We still route through fetch_folder_data so that: | |
| # - Cached entries are loaded via the same path as normal sync. | |
| # - Schema validation (via validate_folder_data) is consistently applied | |
| # whether data comes from disk cache or the network. | |
| if _is_cache_fresh(url): | |
| # This should load and validate from cache without forcing a re-download. | |
| # Any validation errors will surface as exceptions and be logged by the | |
| # outer warm_up_cache loop. | |
| fetch_folder_data(url) | |
| return None | |
| # For non-fresh entries, validate the URL (DNS/format) first to avoid | |
| # unnecessary network calls, then fetch+validate the folder data. | |
| if validate_folder_url(url): | |
| fetch_folder_data(url) |
| # Optimization: Skip DNS validation if cache is fresh | ||
| # This saves blocking I/O for known-good content | ||
| if _is_cache_fresh(url): | ||
| return _gh_get(url) | ||
|
|
||
| if validate_folder_url(url): | ||
| return _gh_get(url) | ||
| return None |
There was a problem hiding this comment.
warm_up_cache now skips validate_folder_url entirely when _is_cache_fresh(url) is true. Since load_disk_cache only sanitizes structure (it does not validate that cache keys are https URLs with a hostname), a corrupted/hand-edited disk cache could cause an unsafe/non-https URL to be accepted from cache without any URL-level validation. Consider still performing lightweight URL/scheme/hostname validation (without DNS) before returning _gh_get(url), or splitting validate_folder_url into a cheap parse/scheme check plus an optional DNS-resolution check so only the DNS part is skipped.
| def _is_cache_fresh(url: str) -> bool: | ||
| """ | ||
| Checks if the URL is in the persistent cache and within TTL. | ||
|
|
||
| This optimization allows skipping expensive DNS validation for | ||
| content that is already known to be safe (validated at fetch time). | ||
| """ | ||
| # Check in-memory cache first | ||
| with _cache_lock: | ||
| if url in _cache: | ||
| return True | ||
|
|
There was a problem hiding this comment.
_is_cache_fresh() currently returns True for any URL present in the in-memory _cache, even though the docstring says it checks the persistent cache and TTL. This mismatch can be misleading for future callers and makes it unclear whether “fresh” means “no network I/O needed” vs “within TTL”. Consider renaming (or updating the docstring) to reflect the actual semantics, and/or checking TTL for the in-memory entry if that’s the intended meaning.
| URL is already fresh in the disk cache. | ||
| """ | ||
| import unittest | ||
| from unittest.mock import patch, MagicMock |
There was a problem hiding this comment.
tests/test_warm_up_cache_perf.py imports MagicMock but never uses it, which adds noise and can hide real unused-import issues. Remove the unused import to keep the test focused.
| from unittest.mock import patch, MagicMock | |
| from unittest.mock import patch |
Understood. Acknowledging that this PR is a duplicate and being closed as superseded. I will stop work on this task. |
warm_up_cacheto skip DNS validation for fresh cache entries. Fixed a critical bug invalidate_folder_datawhere it returnedNoneinstead ofTrue(causing sync failures). Fixed corrupted syntax inmain.py.KeyErrorcrashes during sync.tests/test_warm_up_cache_perf.pywhich confirms validation is skipped for fresh cache.PR created automatically by Jules for task 4821111787222158843 started by @abhimehro