⚡ Bolt: Skip redundant DNS validation for cached blocklists#423

Closed
abhimehro wants to merge 1 commit into main from bolt-cache-warmup-opt-1269213969337536655

Conversation

@abhimehro
Owner

💡 What: Optimized warm_up_cache to skip DNS validation (validate_folder_url) when the blocklist is already present in the persistent disk cache and within the TTL (24h).
🎯 Why: DNS lookups are blocking I/O operations (even when threaded) and are unnecessary overhead when we serve content directly from a proven-safe local cache. This reduces startup latency for subsequent runs.
📊 Impact: Eliminates N DNS lookups on warm start, where N is the number of blocklists (default 23).
🔬 Measurement: Verified with tests/test_warm_up_cache_perf.py which confirms validate_folder_url is skipped for fresh cache entries but called for stale/missing ones.

Also fixed a syntax error/corruption in validate_folder_data.


PR created automatically by Jules for task 1269213969337536655 started by @abhimehro

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 24, 2026 15:12
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@trunk-io

trunk-io bot commented Feb 24, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@github-actions

👋 Development Partner is reviewing this PR. Will provide feedback shortly.

@gemini-code-assist

Summary of Changes

Hello @abhimehro, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the startup performance of the Bolt application by optimizing the warm_up_cache function. It introduces a check to bypass unnecessary DNS lookups for blocklists that are already stored in a fresh, local disk cache, thereby reducing I/O operations and startup latency. Additionally, a minor syntax error in the validate_folder_data function has been corrected for improved code robustness.

Highlights

  • Performance Optimization: Implemented a mechanism to skip redundant DNS validation (validate_folder_url) during warm_up_cache if blocklist data is already present in the persistent disk cache and is within its TTL.
  • Code Correction: Fixed a syntax error and formatting issues within the validate_folder_data function, specifically in the rule group validation logic.
  • Testing: Added a new performance test file (tests/test_warm_up_cache_perf.py) to confirm that DNS validation is correctly skipped for fresh cache entries and performed for stale or missing ones.
Changelog
  • main.py
    • Corrected indentation and syntax errors in the validate_folder_data function's loop for validating rule group rules.
    • Introduced a new private helper function _is_cache_fresh to determine if a given URL's data is present and fresh in either the in-memory or disk cache.
    • Modified the _validate_and_fetch nested function within warm_up_cache to call _is_cache_fresh and conditionally skip validate_folder_url if the cache entry is fresh.
  • tests/test_warm_up_cache_perf.py
    • Added a new test file containing unittest cases to verify the performance optimization in warm_up_cache.
    • Included tests to assert that validate_folder_url is not called for fresh cache entries, but is called for stale or missing ones.
Activity
  • The pull request was automatically created by Jules for a task initiated by @abhimehro.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Code scanning annotations (reported by Codacy via Prospector and Pylint; the three tools flagged largely the same issues, collapsed here):

tests/test_warm_up_cache_perf.py
  • Unused MagicMock imported from unittest.mock (unused-import)
  • Import "import main" should be placed at the top of the module (wrong-import-position)
  • Access to a protected member _cache / _disk_cache of a client class (protected-access): repeated in setUp, tearDown, and the test bodies that populate the disk cache
  • Missing module docstring; missing class docstring
  • Method could be a function (the stale-cache and missing-cache test methods)
  • Expected 2 blank lines after class or function definition, found 1 (E305), before the if __name__ == '__main__': guard

main.py
  • Line too long (108/100): comment inside _is_cache_fresh
  • Line too long (120/100): comment inside _validate_and_fetch
Copilot AI left a comment

Pull request overview

Optimizes startup cache warming by avoiding redundant DNS validation when blocklist data can be served from a fresh persistent disk cache, and repairs corrupted logic in validate_folder_data.

Changes:

  • Add _is_cache_fresh() and update warm_up_cache() to skip validate_folder_url() for URLs with fresh cached entries.
  • Fix syntax/corruption in validate_folder_data() and ensure it returns True on the success path.
  • Add tests/test_warm_up_cache_perf.py to verify validation is skipped for fresh cache entries and executed for stale/missing ones.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
main.py Adds cache-freshness helper + warm-up optimization; fixes validation logic corruption and restores correct return behavior.
tests/test_warm_up_cache_perf.py Adds regression tests to ensure DNS validation is skipped only when disk cache entries are within TTL.

Comment on lines +1782 to +1786
entry = _disk_cache.get(url)
if entry:
    last_validated = entry.get("last_validated", 0)
    if time.time() - last_validated < CACHE_TTL_SECONDS:
        return True
Copilot AI Feb 24, 2026

_is_cache_fresh() subtracts time.time() - last_validated without validating the type. Since load_disk_cache() only guarantees a data field, a malformed cache file could contain last_validated: null or a string, causing a TypeError and breaking warm_up_cache startup. Defensive fix: coerce/validate last_validated (int/float) before subtraction and treat invalid values as stale (same hardening should be applied in _gh_get() which uses the same subtraction).

Comment on lines +1774 to +1781
def _is_cache_fresh(url: str) -> bool:
    """Checks if the URL is in the persistent cache and within TTL."""
    # Check in-memory cache first (though warm_up_cache filters these out,
    # having it here makes the helper more robust)
    with _cache_lock:
        if url in _cache:
            return True

Copilot AI Feb 24, 2026

Docstring says this checks the “persistent cache”, but the function also returns True when the URL exists in the in-memory _cache. Update the docstring to reflect both caches (or drop the in-memory check if you want the helper to be strictly about disk freshness).

@@ -0,0 +1,98 @@
import time
import unittest
from unittest.mock import patch, MagicMock
Copilot AI Feb 24, 2026

MagicMock is imported but never used in this test file. Removing the unused import keeps the test code tidy and avoids confusing future readers about intended mocking behavior.

Suggested change:
- from unittest.mock import patch, MagicMock
+ from unittest.mock import patch

gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant optimization by skipping redundant DNS validation for blocklists that are already present and fresh in the persistent disk cache. However, this optimization reveals a critical security control bypass: the application trusts pre-fetched data in the in-memory cache without performing necessary security validations (e.g., checking for malicious folder names). This allows potentially malicious content from remote blocklists to bypass security checks. Additionally, a critical syntax error and an incorrect variable reference in the validate_folder_data function have been fixed, improving code robustness and debuggability. New performance tests (test_warm_up_cache_perf.py) provide excellent coverage for the caching logic.

Comment on lines +1774 to +1787
def _is_cache_fresh(url: str) -> bool:
    """Checks if the URL is in the persistent cache and within TTL."""
    # Check in-memory cache first (though warm_up_cache filters these out,
    # having it here makes the helper more robust)
    with _cache_lock:
        if url in _cache:
            return True

    entry = _disk_cache.get(url)
    if entry:
        last_validated = entry.get("last_validated", 0)
        if time.time() - last_validated < CACHE_TTL_SECONDS:
            return True
    return False


medium

The new _is_cache_fresh helper function is a well-encapsulated and clear addition. It correctly checks both the in-memory and disk caches for freshness based on the CACHE_TTL_SECONDS, which is central to the PR's optimization goal.

Comment on lines +1 to +98
import time
import unittest
from unittest.mock import patch, MagicMock
import sys
import os

# Add root to path to import main
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import main


class TestWarmUpCachePerf(unittest.TestCase):
    def setUp(self):
        main._cache.clear()
        main._disk_cache.clear()
        main.validate_folder_url.cache_clear()

    def tearDown(self):
        main._cache.clear()
        main._disk_cache.clear()
        main.validate_folder_url.cache_clear()

    def test_warm_up_skips_validation_for_fresh_cache(self):
        """
        Test that warm_up_cache skips validate_folder_url if the URL is in disk cache and fresh.
        """
        test_url = "https://example.com/test.json"
        test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]}

        # Populate disk cache with fresh entry
        main._disk_cache[test_url] = {
            "data": test_data,
            "last_validated": time.time(),  # Fresh
            "fetched_at": time.time(),
        }

        # Mock validate_folder_url to ensure it is NOT called
        # Mock _gh_get to verify it IS called (which will use the cache)
        with patch('main.validate_folder_url') as mock_validate:
            with patch('main._gh_get', return_value=test_data) as mock_gh_get:
                main.warm_up_cache([test_url])

                # Verify _gh_get was called (it handles cache retrieval)
                mock_gh_get.assert_called_with(test_url)

                # Verify validate_folder_url was NOT called
                # This assertion will FAIL before the fix
                mock_validate.assert_not_called()

    def test_warm_up_calls_validation_for_stale_cache(self):
        """
        Test that warm_up_cache calls validate_folder_url if the URL is stale in disk cache.
        """
        test_url = "https://example.com/test.json"
        test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]}

        # Populate disk cache with STALE entry
        stale_time = time.time() - (main.CACHE_TTL_SECONDS + 100)
        main._disk_cache[test_url] = {
            "data": test_data,
            "last_validated": stale_time,
            "fetched_at": stale_time,
        }

        with patch('main.validate_folder_url', return_value=True) as mock_validate:
            with patch('main._gh_get', return_value=test_data) as mock_gh_get:
                main.warm_up_cache([test_url])

                # Verify validate_folder_url WAS called
                mock_validate.assert_called_with(test_url)

                # Verify _gh_get was called
                mock_gh_get.assert_called_with(test_url)

    def test_warm_up_calls_validation_for_missing_cache(self):
        """
        Test that warm_up_cache calls validate_folder_url if the URL is not in disk cache.
        """
        test_url = "https://example.com/test.json"
        test_data = {"group": {"group": "Test Folder"}, "domains": ["example.com"]}

        # Cache is empty

        with patch('main.validate_folder_url', return_value=True) as mock_validate:
            with patch('main._gh_get', return_value=test_data) as mock_gh_get:
                main.warm_up_cache([test_url])

                # Verify validate_folder_url WAS called
                mock_validate.assert_called_with(test_url)

                # Verify _gh_get was called
                mock_gh_get.assert_called_with(test_url)


if __name__ == '__main__':
    unittest.main()


medium

This new test file provides excellent coverage for the cache warm-up optimization. The tests clearly verify that validate_folder_url is correctly skipped for fresh cache entries and called for stale or missing ones. This ensures the new logic behaves as intended and prevents regressions.

@abhimehro
Owner Author

🤖 Automated PR Review — Closing as duplicate

This PR bundles the cache warm-up optimization with the validate_folder_data syntax fix that appears across several open PRs. The cache warm-up concept is good, but this implementation mixes concerns. The syntax fix is already covered by #412. Closing — consider re-submitting the cache warm-up optimization as a standalone PR after the syntax fix lands on main.

@abhimehro abhimehro closed this Feb 28, 2026
@google-labs-jules

Understood. Acknowledging that this PR is closed due to mixed concerns with the syntax fix. I am stopping work on this task for now.
