Skip to content

⚡ Bolt: Optimize cache warm-up and fix validation bug#416

Closed
abhimehro wants to merge 1 commit intomainfrom
bolt/optimize-warm-up-cache-4821111787222158843
Closed

⚡ Bolt: Optimize cache warm-up and fix validation bug#416
abhimehro wants to merge 1 commit intomainfrom
bolt/optimize-warm-up-cache-4821111787222158843

Conversation

@abhimehro
Copy link
Owner

  • 💡 What: Optimized warm_up_cache to skip DNS validation for fresh cache entries. Fixed a critical bug in validate_folder_data where it returned None instead of True (causing sync failures). Fixed corrupted syntax in main.py.
  • 🎯 Why: Avoids blocking DNS lookups during startup when data is already cached locally, improving startup performance. Ensures sync process reliability.
  • 📊 Impact: Reduces startup time for subsequent runs by avoiding network I/O for cached items. Fixes KeyError crashes during sync.
  • 🔬 Measurement: Verified with new test tests/test_warm_up_cache_perf.py which confirms validation is skipped for fresh cache.

PR created automatically by Jules for task 4821111787222158843 started by @abhimehro

- Optimized `warm_up_cache` to skip DNS validation for fresh cache entries.
- Fixed a bug in `validate_folder_data` where it returned `None` instead of `True`.
- Fixed corrupted code in `main.py`.
- Added `tests/test_warm_up_cache_perf.py` to verify optimization.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
@google-labs-jules
Copy link

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings February 22, 2026 15:01
@trunk-io
Copy link

trunk-io bot commented Feb 22, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@github-actions
Copy link

👋 Development Partner is reviewing this PR. Will provide feedback shortly.

@gemini-code-assist
Copy link

Summary of Changes

Hello @abhimehro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Bolt system's performance and reliability. It introduces an optimization to the cache warm-up process by intelligently skipping redundant DNS validations for recently validated cached data, thereby accelerating application startup. Concurrently, it resolves a critical bug in the data validation logic that previously caused sync failures due to incorrect return values and addresses minor syntax corruption. These changes collectively aim to improve the efficiency and robustness of data handling.

Highlights

  • Cache Warm-up Optimization: Implemented a new _is_cache_fresh function and integrated it into warm_up_cache to skip expensive DNS validation for URLs already present and fresh in the disk cache, significantly reducing startup time.
  • Data Validation Bug Fix: Corrected a critical bug in validate_folder_data where it failed to explicitly return True upon successful validation, leading to sync failures. Also fixed corrupted syntax within this function.
  • New Performance Test: Added tests/test_warm_up_cache_perf.py to verify the warm_up_cache optimization, ensuring that DNS validation is correctly skipped for fresh cache entries and performed for stale or missing ones.
  • Documentation Update: Appended a new learning entry to .jules/bolt.md detailing issues with stale module references in tests and recommended actions to avoid them.
Changelog
  • .jules/bolt.md
    • Added a new learning entry about stale module references in tests.
  • main.py
    • Introduced _is_cache_fresh function to check cache freshness based on TTL.
    • Corrected syntax errors and ensured validate_folder_data explicitly returns True on successful validation.
    • Modified warm_up_cache to utilize _is_cache_fresh for conditional DNS validation.
  • tests/test_warm_up_cache_perf.py
    • Added a new test file to verify the warm_up_cache optimization.
    • Included tests for fresh, stale, and missing cache scenarios.
Activity
  • No human activity recorded for this pull request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

**Learning:** Python's `ThreadPoolExecutor` incurs measurable overhead (thread creation/shutdown) when created/destroyed repeatedly inside loops, even with small worker counts.
**Action:** Lift `ThreadPoolExecutor` creation to the highest possible scope and pass it down as a dependency (using `contextlib.nullcontext` for flexible ownership).

## 2026-02-22 - [Stale Module References in Tests]

Check notice

Code scanning / Remark-lint (reported by Codacy)

Warn when references to undefined definitions are found. Note

[no-undefined-references] Found reference to undefined definition
**Learning:** Python's `ThreadPoolExecutor` incurs measurable overhead (thread creation/shutdown) when created/destroyed repeatedly inside loops, even with small worker counts.
**Action:** Lift `ThreadPoolExecutor` creation to the highest possible scope and pass it down as a dependency (using `contextlib.nullcontext` for flexible ownership).

## 2026-02-22 - [Stale Module References in Tests]

Check notice

Code scanning / Remark-lint (reported by Codacy)

Warn when shortcut reference links are used. Note

[no-shortcut-reference-link] Use the trailing [] on reference links
main.py Outdated

Check warning

Code scanning / Pylint (reported by Codacy)

Line too long (120/100) Warning

Line too long (120/100)
# We do NOT import main at top level to avoid holding a stale reference
# if other tests reload the module.

class TestWarmUpCachePerf(unittest.TestCase):

Check warning

Code scanning / Pylint (reported by Codacy)

Missing class docstring Warning test

Missing class docstring
URL is already fresh in the disk cache.
"""
import unittest
from unittest.mock import patch, MagicMock

Check notice

Code scanning / Pylint (reported by Codacy)

Unused MagicMock imported from unittest.mock Note test

Unused MagicMock imported from unittest.mock
import main
self.main = main

self.main._cache.clear()

Check notice

Code scanning / Pylint (reported by Codacy)

Access to a protected member _cache of a client class Note test

Access to a protected member _cache of a client class
self.main = main

self.main._cache.clear()
self.main._disk_cache.clear()

Check notice

Code scanning / Pylint (reported by Codacy)

Access to a protected member _disk_cache of a client class Note test

Access to a protected member _disk_cache of a client class
test_data = {"group": {"group": "Fresh Folder"}, "rules": []}

# Populate disk cache with a FRESH entry
with self.main._cache_lock:

Check notice

Code scanning / Pylint (reported by Codacy)

Access to a protected member _cache_lock of a client class Note test

Access to a protected member _cache_lock of a client class

# Populate disk cache with a FRESH entry
with self.main._cache_lock:
self.main._disk_cache[test_url] = {

Check notice

Code scanning / Pylint (reported by Codacy)

Access to a protected member _disk_cache of a client class Note test

Access to a protected member _disk_cache of a client class

# Populate disk cache with a STALE entry (older than TTL)
stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL
with self.main._cache_lock:

Check notice

Code scanning / Pylint (reported by Codacy)

Access to a protected member _cache_lock of a client class Note test

Access to a protected member _cache_lock of a client class
# Populate disk cache with a STALE entry (older than TTL)
stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL
with self.main._cache_lock:
self.main._disk_cache[test_url] = {

Check notice

Code scanning / Pylint (reported by Codacy)

Access to a protected member _disk_cache of a client class Note test

Access to a protected member _disk_cache of a client class
test_data = {"group": {"group": "Missing Folder"}, "rules": []}

# Ensure cache is empty
self.assertNotIn(test_url, self.main._disk_cache)

Check notice

Code scanning / Pylint (reported by Codacy)

Access to a protected member _disk_cache of a client class Note test

Access to a protected member _disk_cache of a client class
URL is already fresh in the disk cache.
"""
import unittest
from unittest.mock import patch, MagicMock

Check warning

Code scanning / Prospector (reported by Codacy)

Unused MagicMock imported from unittest.mock (unused-import) Warning test

Unused MagicMock imported from unittest.mock (unused-import)
if 'main' in sys.modules:
self.main = sys.modules['main']
else:
import main

Check warning

Code scanning / Prospector (reported by Codacy)

Import "import main" should be placed at the top of the module (wrong-import-position) Warning test

Import "import main" should be placed at the top of the module (wrong-import-position)
import main
self.main = main

self.main._cache.clear()

Check warning

Code scanning / Prospector (reported by Codacy)

Access to a protected member _cache of a client class (protected-access) Warning test

Access to a protected member _cache of a client class (protected-access)
self.main = main

self.main._cache.clear()
self.main._disk_cache.clear()

Check warning

Code scanning / Prospector (reported by Codacy)

Access to a protected member _disk_cache of a client class (protected-access) Warning test

Access to a protected member _disk_cache of a client class (protected-access)

def tearDown(self):
"""Clean up after each test."""
self.main._cache.clear()

Check warning

Code scanning / Prospector (reported by Codacy)

Access to a protected member _cache of a client class (protected-access) Warning test

Access to a protected member _cache of a client class (protected-access)

# Populate disk cache with a FRESH entry
with self.main._cache_lock:
self.main._disk_cache[test_url] = {

Check warning

Code scanning / Prospector (reported by Codacy)

Access to a protected member _disk_cache of a client class (protected-access) Warning test

Access to a protected member _disk_cache of a client class (protected-access)

# Populate disk cache with a STALE entry (older than TTL)
stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL
with self.main._cache_lock:

Check warning

Code scanning / Prospector (reported by Codacy)

Access to a protected member _cache_lock of a client class (protected-access) Warning test

Access to a protected member _cache_lock of a client class (protected-access)
# Populate disk cache with a STALE entry (older than TTL)
stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL
with self.main._cache_lock:
self.main._disk_cache[test_url] = {

Check warning

Code scanning / Prospector (reported by Codacy)

Access to a protected member _disk_cache of a client class (protected-access) Warning test

Access to a protected member _disk_cache of a client class (protected-access)
test_data = {"group": {"group": "Missing Folder"}, "rules": []}

# Ensure cache is empty
self.assertNotIn(test_url, self.main._disk_cache)

Check warning

Code scanning / Prospector (reported by Codacy)

Access to a protected member _disk_cache of a client class (protected-access) Warning test

Access to a protected member _disk_cache of a client class (protected-access)
# VERIFICATION: validate_folder_url SHOULD be called for missing entry
mock_validate.assert_called_with(test_url)

if __name__ == '__main__':

Check warning

Code scanning / Prospector (reported by Codacy)

expected 2 blank lines after class or function definition, found 1 (E305) Warning test

expected 2 blank lines after class or function definition, found 1 (E305)
main.py Outdated

Check warning

Code scanning / Pylintpython3 (reported by Codacy)

Line too long (120/100) Warning

Line too long (120/100)
# We do NOT import main at top level to avoid holding a stale reference
# if other tests reload the module.

class TestWarmUpCachePerf(unittest.TestCase):

Check warning

Code scanning / Pylintpython3 (reported by Codacy)

Missing class docstring Warning test

Missing class docstring
if 'main' in sys.modules:
self.main = sys.modules['main']
else:
import main

Check warning

Code scanning / Pylintpython3 (reported by Codacy)

Import outside toplevel (main) Warning test

Import outside toplevel (main)
URL is already fresh in the disk cache.
"""
import unittest
from unittest.mock import patch, MagicMock

Check notice

Code scanning / Pylintpython3 (reported by Codacy)

Unused MagicMock imported from unittest.mock Note test

Unused MagicMock imported from unittest.mock
import main
self.main = main

self.main._cache.clear()

Check notice

Code scanning / Pylintpython3 (reported by Codacy)

Access to a protected member _cache of a client class Note test

Access to a protected member _cache of a client class
test_data = {"group": {"group": "Fresh Folder"}, "rules": []}

# Populate disk cache with a FRESH entry
with self.main._cache_lock:

Check notice

Code scanning / Pylintpython3 (reported by Codacy)

Access to a protected member _cache_lock of a client class Note test

Access to a protected member _cache_lock of a client class

# Populate disk cache with a FRESH entry
with self.main._cache_lock:
self.main._disk_cache[test_url] = {

Check notice

Code scanning / Pylintpython3 (reported by Codacy)

Access to a protected member _disk_cache of a client class Note test

Access to a protected member _disk_cache of a client class

# Populate disk cache with a STALE entry (older than TTL)
stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL
with self.main._cache_lock:

Check notice

Code scanning / Pylintpython3 (reported by Codacy)

Access to a protected member _cache_lock of a client class Note test

Access to a protected member _cache_lock of a client class
# Populate disk cache with a STALE entry (older than TTL)
stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL
with self.main._cache_lock:
self.main._disk_cache[test_url] = {

Check notice

Code scanning / Pylintpython3 (reported by Codacy)

Access to a protected member _disk_cache of a client class Note test

Access to a protected member _disk_cache of a client class
test_data = {"group": {"group": "Missing Folder"}, "rules": []}

# Ensure cache is empty
self.assertNotIn(test_url, self.main._disk_cache)

Check notice

Code scanning / Pylintpython3 (reported by Codacy)

Access to a protected member _disk_cache of a client class Note test

Access to a protected member _disk_cache of a client class
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant optimizations to the cache warm-up process and fixes a critical validation bug. However, a new caching optimization introduces a high-severity Server-Side Request Forgery (SSRF) flaw. This vulnerability allows a bypass of critical hostname validation if a local cache file is tampered with, and it is recommended to remove the optimization that skips the validation step to mitigate this risk. Additionally, the validate_folder_data function has been corrected to return True on successful validation, resolving a bug that caused sync failures. A new test file tests/test_warm_up_cache_perf.py has been added, and the .jules/bolt.md file has been updated with a new learning and action item regarding stale module references in tests.

Comment on lines +1817 to +1818
if _is_cache_fresh(url):
return _gh_get(url)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

While this optimization aims to improve startup performance by skipping DNS validation if the cache is fresh, it introduces a high-severity Server-Side Request Forgery (SSRF) vulnerability. The _is_cache_fresh function trusts the last_validated timestamp from a disk-based cache file (blocklists.json) to bypass the validate_folder_url security check. An attacker with local file system access could tamper with this cache file, leading to the application making requests to attacker-specified URLs. It is critical that validation is not skipped based on a potentially user-controllable cache file.

Comment on lines +683 to +701
def _is_cache_fresh(url: str) -> bool:
"""
Checks if the URL is in the persistent cache and within TTL.

This optimization allows skipping expensive DNS validation for
content that is already known to be safe (validated at fetch time).
"""
# Check in-memory cache first
with _cache_lock:
if url in _cache:
return True

# Check disk cache
entry = _disk_cache.get(url)
if entry:
last_validated = entry.get("last_validated", 0)
if time.time() - last_validated < CACHE_TTL_SECONDS:
return True
return False

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _is_cache_fresh function is a good optimization to avoid unnecessary DNS lookups for cached entries. This will improve performance, especially during warm-up.

Comment on lines +1 to +118
"""
Tests for the warm_up_cache optimization.

This module verifies that warm_up_cache skips DNS validation when the
URL is already fresh in the disk cache.
"""
import unittest
from unittest.mock import patch, MagicMock
import sys
import os
import time

# Add root to path to import main
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

# We do NOT import main at top level to avoid holding a stale reference
# if other tests reload the module.

class TestWarmUpCachePerf(unittest.TestCase):
def setUp(self):
"""Reset cache state before each test."""
# Get the current main module from sys.modules
# If not present, import it
if 'main' in sys.modules:
self.main = sys.modules['main']
else:
import main
self.main = main

self.main._cache.clear()
self.main._disk_cache.clear()
self.main.validate_folder_url.cache_clear()

def tearDown(self):
"""Clean up after each test."""
self.main._cache.clear()
self.main._disk_cache.clear()
self.main.validate_folder_url.cache_clear()

def test_warm_up_skips_validation_for_fresh_cache(self):
"""
Test that warm_up_cache does NOT call validate_folder_url
if the URL is present in _disk_cache and fresh (within TTL).
"""
test_url = "https://example.com/fresh.json"
test_data = {"group": {"group": "Fresh Folder"}, "rules": []}

# Populate disk cache with a FRESH entry
with self.main._cache_lock:
self.main._disk_cache[test_url] = {
"data": test_data,
"fetched_at": time.time(),
"last_validated": time.time(), # Just now
"etag": "123",
"last_modified": "Tue, 15 Nov 1994 12:45:26 GMT"
}

# Mock validate_folder_url to track calls
with patch('main.validate_folder_url') as mock_validate:
# Mock _gh_get to return data without network
with patch('main._gh_get', return_value=test_data) as mock_gh_get:

self.main.warm_up_cache([test_url])

# VERIFICATION: validate_folder_url should NOT be called
mock_validate.assert_not_called()

# _gh_get should still be called (it handles cache retrieval)
mock_gh_get.assert_called_with(test_url)

def test_warm_up_calls_validation_for_stale_cache(self):
"""
Test that warm_up_cache DOES call validate_folder_url
if the URL is present in _disk_cache but STALE (expired TTL).
"""
test_url = "https://example.com/stale.json"
test_data = {"group": {"group": "Stale Folder"}, "rules": []}

# Populate disk cache with a STALE entry (older than TTL)
stale_time = time.time() - (self.main.CACHE_TTL_SECONDS + 3600) # 1 hour past TTL
with self.main._cache_lock:
self.main._disk_cache[test_url] = {
"data": test_data,
"fetched_at": stale_time,
"last_validated": stale_time,
"etag": "123",
"last_modified": "Tue, 15 Nov 1994 12:45:26 GMT"
}

with patch('main.validate_folder_url', return_value=True) as mock_validate:
with patch('main._gh_get', return_value=test_data):

self.main.warm_up_cache([test_url])

# VERIFICATION: validate_folder_url SHOULD be called for stale entry
mock_validate.assert_called_with(test_url)

def test_warm_up_calls_validation_for_missing_cache(self):
"""
Test that warm_up_cache DOES call validate_folder_url
if the URL is NOT present in _disk_cache.
"""
test_url = "https://example.com/missing.json"
test_data = {"group": {"group": "Missing Folder"}, "rules": []}

# Ensure cache is empty
self.assertNotIn(test_url, self.main._disk_cache)

with patch('main.validate_folder_url', return_value=True) as mock_validate:
with patch('main._gh_get', return_value=test_data):

self.main.warm_up_cache([test_url])

# VERIFICATION: validate_folder_url SHOULD be called for missing entry
mock_validate.assert_called_with(test_url)

if __name__ == '__main__':
unittest.main()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new test file effectively verifies the cache warm-up optimization. It covers fresh, stale, and missing cache scenarios, ensuring that validate_folder_url is called only when necessary. This is a good addition to the test suite, improving confidence in the new caching logic.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes cache warm-up behavior to reduce startup DNS validation overhead, fixes a validation return-value bug that caused sync failures, and records a testing lesson in Bolt notes.

Changes:

  • Added _is_cache_fresh() and used it in warm_up_cache() to skip DNS validation when the cache is within TTL.
  • Fixed corrupted syntax in validate_folder_data() and ensured it returns True on successful validation.
  • Added a new test module covering the warm-up optimization behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
main.py Adds cache-freshness helper, changes warm-up validation flow, fixes validate_folder_data syntax/return value.
tests/test_warm_up_cache_perf.py Adds tests verifying warm-up skips/doesn’t skip URL validation depending on disk-cache freshness.
.jules/bolt.md Documents a testing pitfall around stale module references when reloading modules.

Comment on lines +1815 to 1821
# Optimization: Skip DNS validation if cache is fresh
# This saves blocking I/O for known-good content
if _is_cache_fresh(url):
return _gh_get(url)

if validate_folder_url(url):
return _gh_get(url)
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warm_up_cache currently warms the in-memory cache via _gh_get(), which does not validate the folder JSON schema. Since sync_profile’s _fetch_if_valid path returns cached entries directly (bypassing fetch_folder_data/validate_folder_data), a malformed disk cache entry (or unexpected remote JSON) can be propagated into the sync without structural validation. Consider warming using fetch_folder_data(url) (or explicitly calling validate_folder_data on the returned JSON) so cached entries are guaranteed to be schema-valid before sync_profile reuses them.

Suggested change
# Optimization: Skip DNS validation if cache is fresh
# This saves blocking I/O for known-good content
if _is_cache_fresh(url):
return _gh_get(url)
if validate_folder_url(url):
return _gh_get(url)
# Optimization: Skip DNS validation if cache is fresh.
# We still route through fetch_folder_data so that:
# - Cached entries are loaded via the same path as normal sync.
# - Schema validation (via validate_folder_data) is consistently applied
# whether data comes from disk cache or the network.
if _is_cache_fresh(url):
# This should load and validate from cache without forcing a re-download.
# Any validation errors will surface as exceptions and be logged by the
# outer warm_up_cache loop.
fetch_folder_data(url)
return None
# For non-fresh entries, validate the URL (DNS/format) first to avoid
# unnecessary network calls, then fetch+validate the folder data.
if validate_folder_url(url):
fetch_folder_data(url)

Copilot uses AI. Check for mistakes.
Comment on lines +1815 to 1822
# Optimization: Skip DNS validation if cache is fresh
# This saves blocking I/O for known-good content
if _is_cache_fresh(url):
return _gh_get(url)

if validate_folder_url(url):
return _gh_get(url)
return None
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warm_up_cache now skips validate_folder_url entirely when _is_cache_fresh(url) is true. Since load_disk_cache only sanitizes structure (it does not validate that cache keys are https URLs with a hostname), a corrupted/hand-edited disk cache could cause an unsafe/non-https URL to be accepted from cache without any URL-level validation. Consider still performing lightweight URL/scheme/hostname validation (without DNS) before returning _gh_get(url), or splitting validate_folder_url into a cheap parse/scheme check plus an optional DNS-resolution check so only the DNS part is skipped.

Copilot uses AI. Check for mistakes.
Comment on lines +683 to +694
def _is_cache_fresh(url: str) -> bool:
"""
Checks if the URL is in the persistent cache and within TTL.

This optimization allows skipping expensive DNS validation for
content that is already known to be safe (validated at fetch time).
"""
# Check in-memory cache first
with _cache_lock:
if url in _cache:
return True

Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_is_cache_fresh() currently returns True for any URL present in the in-memory _cache, even though the docstring says it checks the persistent cache and TTL. This mismatch can be misleading for future callers and makes it unclear whether “fresh” means “no network I/O needed” vs “within TTL”. Consider renaming (or updating the docstring) to reflect the actual semantics, and/or checking TTL for the in-memory entry if that’s the intended meaning.

Copilot uses AI. Check for mistakes.
URL is already fresh in the disk cache.
"""
import unittest
from unittest.mock import patch, MagicMock
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/test_warm_up_cache_perf.py imports MagicMock but never uses it, which adds noise and can hide real unused-import issues. Remove the unused import to keep the test focused.

Suggested change
from unittest.mock import patch, MagicMock
from unittest.mock import patch

Copilot uses AI. Check for mistakes.
@abhimehro
Copy link
Owner Author

🤖 Automated PR Review — Closing as duplicate

This mixes cache warm-up optimization with the syntax fix. The syntax fix is in #425, and a cleaner cache warm-up optimization is in #423. Closing as superseded.

@abhimehro abhimehro closed this Feb 28, 2026
@google-labs-jules
Copy link

🤖 Automated PR Review — Closing as duplicate

This mixes cache warm-up optimization with the syntax fix. The syntax fix is in #425, and a cleaner cache warm-up optimization is in #423. Closing as superseded.

Understood. Acknowledging that this PR is a duplicate and being closed as superseded. I will stop work on this task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants