Add cache TTL and --clear-cache flag to persistent blocklist cache#390
Add cache TTL and --clear-cache flag to persistent blocklist cache#390
Conversation
|
😎 Merged manually by @abhimehro - details. |
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
| with _cache_lock: | ||
| _cache[url] = data | ||
| _cache_stats["hits"] += 1 | ||
| log.debug(f"Disk cache hit (within TTL) for {sanitize_for_log(url)}") |
Check warning
Code scanning / Prospector (reported by Codacy)
Use lazy % formatting in logging functions (logging-fstring-interpolation) Warning
| else: | ||
| print(f"{Colors.CYAN}ℹ No cache file found, nothing to clear{Colors.ENDC}") | ||
| _disk_cache.clear() | ||
| exit(0) |
Check warning
Code scanning / Prospector (reported by Codacy)
Consider using sys.exit() (consider-using-sys-exit) Warning
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache_stats of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache_stats of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _gh_get of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache_stats of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _cache_stats of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Prospector (reported by Codacy)
Access to a protected member _disk_cache of a client class (protected-access) Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Trailing whitespace Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Trailing whitespace Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Trailing whitespace Warning test
| else: | ||
| print(f"{Colors.CYAN}ℹ No cache file found, nothing to clear{Colors.ENDC}") | ||
| _disk_cache.clear() | ||
| exit(0) |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Consider using sys.exit() Warning
| print(f"{Colors.GREEN}✓ Cleared blocklist cache: {cache_file}{Colors.ENDC}") | ||
| except OSError as e: | ||
| print(f"{Colors.FAIL}✗ Failed to clear cache: {e}{Colors.ENDC}") | ||
| exit(1) |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Consider using sys.exit() Warning
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Access to a protected member _gh of a client class Note test
|
|
||
| # Handle --clear-cache: delete cache file and exit immediately | ||
| if args.clear_cache: | ||
| global _disk_cache |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Using the global statement Note
| with _cache_lock: | ||
| _cache[url] = data | ||
| _cache_stats["hits"] += 1 | ||
| log.debug(f"Disk cache hit (within TTL) for {sanitize_for_log(url)}") |
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Use lazy % formatting in logging functions Note
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylint (reported by Codacy)
Instance of 'PurePath' has no 'exists' member Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylint (reported by Codacy)
Instance of 'PurePath' has no 'unlink' member Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylint (reported by Codacy)
Instance of 'PurePath' has no 'exists' member Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylint (reported by Codacy)
Instance of 'PurePath' has no 'write_text' member Warning test
tests/test_disk_cache.py
Outdated
Check warning
Code scanning / Pylint (reported by Codacy)
Trailing whitespace Warning test
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _gh_get of a client class Note test
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylint (reported by Codacy)
Unused argument 'url' Note test
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _cache_stats of a client class Note test
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylint (reported by Codacy)
Unused argument 'method' Note test
tests/test_disk_cache.py
Outdated
Check notice
Code scanning / Pylint (reported by Codacy)
Access to a protected member _disk_cache of a client class Note test
There was a problem hiding this comment.
Pull request overview
This pull request adds cache TTL enforcement and a manual cache clearing mechanism to the existing persistent disk cache infrastructure. The implementation enables blocklists to be served directly from disk within a 24-hour TTL window without any HTTP requests, reducing network traffic and improving cold-start performance for frequent syncs.
Changes:
- Added 24-hour TTL for disk cache entries to avoid unnecessary HTTP requests within the TTL window
- Added
--clear-cacheCLI flag to manually invalidate and delete the persistent cache file - Added comprehensive tests for TTL behavior (within/beyond TTL) and cache clearing functionality
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| main.py | Added CACHE_TTL_SECONDS constant, TTL check in _gh_get() before making HTTP requests, --clear-cache argument and handler |
| tests/test_disk_cache.py | Added three new tests: TTL cache hit without HTTP request, TTL expiry triggering conditional request, and cache file deletion |
| uv.lock | Removed empty metadata section (automated cleanup) |
Comments suppressed due to low confidence (3)
main.py:1274
- The
blocklist_fetchescounter is incremented at line 1260 before checking if a TTL cache hit occurs. When an entry is within TTL (lines 1267-1274), no HTTP request is made, but the counter has already been incremented. This makes the "Blocklist fetches" statistic misleading.
The counter should only be incremented when an actual HTTP request is made to fetch or validate a blocklist. Move the counter increment to after the TTL check, so it's only incremented when we proceed to make a network request.
# Track that we're about to make a blocklist fetch
with _cache_lock:
_api_stats["blocklist_fetches"] += 1
# Check disk cache for TTL-based hit or conditional request headers
headers = {}
cached_entry = _disk_cache.get(url)
if cached_entry:
last_validated = cached_entry.get("last_validated", 0)
if time.time() - last_validated < CACHE_TTL_SECONDS:
# Within TTL: return cached data directly without any HTTP request
data = cached_entry["data"]
with _cache_lock:
_cache[url] = data
_cache_stats["hits"] += 1
log.debug(f"Disk cache hit (within TTL) for {sanitize_for_log(url)}")
return data
tests/test_disk_cache.py:339
- The test verifies that no HTTP request is made and that cache stats are correct, but it doesn't verify that the
blocklist_fetchescounter is not incremented. Since this counter is currently being incremented even for TTL cache hits (see the related bug in main.py), this test should also verify that_api_stats["blocklist_fetches"]remains at 0 when serving from TTL cache.
def test_ttl_within_ttl_returns_disk_cache_without_request(self):
"""Test that disk cache entries within TTL are returned without an HTTP request."""
test_url = "https://example.com/test.json"
test_data = {"group": {"group": "Test"}, "domains": ["example.com"]}
# Pre-populate disk cache with a recent last_validated timestamp (within TTL)
main._disk_cache[test_url] = {
"data": test_data,
"etag": "fresh123",
"last_modified": None,
"fetched_at": time.time(),
"last_validated": time.time(), # Just validated - within TTL
}
http_called = []
def mock_stream(method, url, headers=None):
http_called.append(url)
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.headers = {"Content-Type": "application/json"}
mock_response.__enter__ = MagicMock(return_value=mock_response)
mock_response.__exit__ = MagicMock(return_value=False)
return mock_response
with patch.object(main._gh, 'stream', side_effect=mock_stream):
result = main._gh_get(test_url)
# HTTP should NOT have been called (within TTL)
self.assertEqual(len(http_called), 0)
# Result should be the cached data
self.assertEqual(result, test_data)
# Should count as a hit
self.assertEqual(main._cache_stats["hits"], 1)
self.assertEqual(main._cache_stats["misses"], 0)
self.assertEqual(main._cache_stats["validations"], 0)
tests/test_disk_cache.py:373
- The test verifies the conditional request behavior when TTL is expired, but it doesn't verify the
blocklist_fetchescounter. This test should verify that_api_stats["blocklist_fetches"]is incremented by 1 when a conditional request is sent (even though it results in a 304 response), since an HTTP request was actually made.
def test_ttl_expired_sends_conditional_request(self):
"""Test that disk cache entries beyond TTL trigger a conditional HTTP request."""
test_url = "https://example.com/test.json"
test_data = {"group": {"group": "Test"}, "domains": ["example.com"]}
# Pre-populate disk cache with an old last_validated (beyond TTL)
main._disk_cache[test_url] = {
"data": test_data,
"etag": "stale123",
"last_modified": None,
"fetched_at": 0.0, # very old
"last_validated": 0.0, # very old - beyond any TTL
}
def mock_stream(method, url, headers=None):
# Conditional request should be sent with If-None-Match
self.assertEqual(headers.get("If-None-Match"), "stale123")
mock_response = MagicMock()
mock_response.status_code = 304
mock_response.raise_for_status = MagicMock()
mock_response.headers = {}
mock_response.__enter__ = MagicMock(return_value=mock_response)
mock_response.__exit__ = MagicMock(return_value=False)
return mock_response
with patch.object(main._gh, 'stream', side_effect=mock_stream):
result = main._gh_get(test_url)
# Should return cached data (304 response)
self.assertEqual(result, test_data)
# Should count as validation (conditional request)
self.assertEqual(main._cache_stats["validations"], 1)
self.assertEqual(main._cache_stats["hits"], 0)
Every sync downloaded all blocklists from scratch with no reuse between runs. The disk cache infrastructure (ETag/304 handling, platform-aware storage) was already in place but lacked TTL enforcement and a manual invalidation path.
Changes
Cache TTL (
main.py)CACHE_TTL_SECONDS = 24 * 60 * 60(24h default)_gh_get()now checkslast_validatedagainst TTL before making any HTTP request:hits)If-None-Match/If-Modified-Since); still avoids full re-download on 304--clear-cacheflag (main.py)blocklists.jsonfrom the platform cache dir, clears in-memory_disk_cache, exits 0load_disk_cache(), before any sync work beginsTests (
tests/test_disk_cache.py)test_ttl_within_ttl_returns_disk_cache_without_request— fresh entry, no HTTP call madetest_ttl_expired_sends_conditional_request— stale entry triggersIf-None-Matchheadertest_clear_cache_deletes_file— cache file removed and in-memory map clearedOriginal prompt
🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.