
Add --no-cache flag and document persistent blocklist cache #408

Closed
Copilot wants to merge 2 commits into main from copilot/implement-persistent-disk-cache


Copilot AI commented Feb 20, 2026

The linked issue reports that every sync cold-starts by re-downloading all blocklists. In practice, a persistent disk cache already existed in the codebase, but there was no way to bypass it and it had no user documentation. This PR adds both.

Changes

main.py

  • Added _no_cache: bool = False module-level flag
  • Added --no-cache CLI argument to parse_args()
  • In main(): when --no-cache is passed, sets _no_cache = True and clears the disk cache that was just loaded; global declarations consolidated
  • save_disk_cache() returns early when _no_cache is set
  • Both _disk_cache[url] = ... write paths in _gh_get() guarded by if not _no_cache:

tests/test_disk_cache.py

  • Added _no_cache = False resets to setUp/tearDown to prevent cross-test pollution
  • test_no_cache_skips_disk_cache_write — verifies _disk_cache is not populated during a fetch
  • test_no_cache_skips_save — verifies no cache file is written to disk
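The save-side test and the setUp/tearDown reset can be sketched with unittest as below. FakeMain is a hypothetical stand-in for the slice of main.py under test (the real tests presumably work against the module itself), and test_save_writes_file_by_default is an illustrative control case, not one of the PR's tests. Resetting _no_cache in both setUp and tearDown means a test that flips the flag cannot leak into the next one.

```python
import json
import os
import tempfile
import unittest


class FakeMain:
    """Hypothetical stand-in for the module-level state in main.py."""
    _no_cache = False
    _disk_cache: dict = {}
    cache_path = ""

    @classmethod
    def save_disk_cache(cls) -> None:
        if cls._no_cache:
            return  # --no-cache: never touch the file
        with open(cls.cache_path, "w", encoding="utf-8") as fh:
            json.dump(cls._disk_cache, fh)


class NoCacheTests(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        FakeMain.cache_path = os.path.join(self._tmp.name, "cache.json")
        # Reset shared state so one test's flag cannot leak into another.
        FakeMain._no_cache = False
        FakeMain._disk_cache = {}

    def tearDown(self):
        FakeMain._no_cache = False
        FakeMain._disk_cache = {}
        self._tmp.cleanup()

    def test_no_cache_skips_save(self):
        FakeMain._no_cache = True
        FakeMain._disk_cache = {"https://example.com/list": {"data": []}}
        FakeMain.save_disk_cache()
        self.assertFalse(os.path.exists(FakeMain.cache_path))

    def test_save_writes_file_by_default(self):
        FakeMain._disk_cache = {"https://example.com/list": {"data": []}}
        FakeMain.save_disk_cache()
        self.assertTrue(os.path.exists(FakeMain.cache_path))
```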

README.md

  • Added Blocklist Cache section documenting platform cache paths, the TTL → conditional request → full download validation chain, and both --clear-cache / --no-cache flags

    python main.py --no-cache       # fetch fresh, don't read or write cache
    python main.py --clear-cache    # delete cache file and exit

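The TTL → conditional request → full download chain that the README section describes might look roughly like this. The 24-hour TTL (the fallback the issue suggests), the cache entry field names, and the injected http_get(url, headers) transport returning (status, headers, body) are assumptions. If-None-Match, If-Modified-Since and the 304 Not Modified status are standard HTTP.

```python
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # assumed fallback TTL (24 h, as the issue suggests)

# Hypothetical transport: (url, request_headers) -> (status, response_headers, body)
HttpGet = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], str]]


def get_blocklist(url: str, cache: Dict[str, dict], http_get: HttpGet,
                  now: Callable[[], float] = time.time) -> str:
    entry: Optional[dict] = cache.get(url)

    # 1. TTL: a fresh entry is served without any network request.
    if entry and now() - entry["fetched_at"] < TTL_SECONDS:
        return entry["data"]

    # 2. Conditional request: send the stored validators, if any.
    headers: Dict[str, str] = {}
    if entry:
        if entry.get("etag"):
            headers["If-None-Match"] = entry["etag"]
        if entry.get("last_modified"):
            headers["If-Modified-Since"] = entry["last_modified"]
    status, resp_headers, body = http_get(url, headers)

    if status == 304 and entry:
        entry["fetched_at"] = now()  # unchanged upstream: renew the TTL
        return entry["data"]
    if status != 200:
        raise RuntimeError(f"unexpected HTTP {status} for {url}")

    # 3. Full download: replace the entry and record the new validators.
    # Assumes canonical header casing in resp_headers (a plain dict).
    cache[url] = {
        "data": body,
        "etag": resp_headers.get("ETag"),
        "last_modified": resp_headers.get("Last-Modified"),
        "fetched_at": now(),
    }
    return body
```

A 304 costs one round trip but no body transfer, so only lists that actually changed are downloaded in full.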
Original prompt

This section details the original issue you should resolve

<issue_title>[Code Quality] Implement persistent disk cache for blocklist data</issue_title>
<issue_description>### Description

Currently, every sync operation performs a cold start, downloading and validating all blocklists from scratch. This results in unnecessary network I/O and longer sync times, especially for users who sync frequently. A persistent disk cache with smart invalidation is estimated to reduce sync time by 30-50% on subsequent runs.

Problem

From Performance Research (Discussion #219):

Startup Performance

  • Cache warmup performs sequential DNS validation + HTTP fetches
  • No persistent caching between runs
  • Cold start downloads all blocklists every time

Impact:

  • Slower user experience on repeated syncs
  • Unnecessary bandwidth usage
  • Increased load on blocklist providers
  • No benefit from unchanged blocklists

Suggested Changes

  1. Implement persistent disk cache:

    • Store validated blocklist data in ~/.cache/ctrld-sync/ or /tmp/ctrld-sync-cache/
    • Cache key: blocklist URL + hash of content
    • Store: parsed rules, validation results, metadata
  2. Smart cache invalidation:

    • Use HTTP ETag header if available
    • Use Last-Modified header if available
    • Fallback: TTL-based expiration (e.g., 24 hours)
    • Manual invalidation via --no-cache flag
  3. Cache structure (example):

    ~/.cache/ctrld-sync/
    ├── blocklists/
    │   ├── (url_hash).json          # parsed rules
    │   └── (url_hash).meta          # ETag, Last-Modified, timestamp
    ├── dns_cache/
    │   └── (hostname_hash).json     # DNS validation results
    └── cache.db                     # SQLite for metadata (optional)
    
  4. Implementation approach:

    • Add cache module or functions to main.py
    • Check cache before HTTP fetch
    • Validate cache freshness via headers
    • Store new data after successful fetch
    • Add --clear-cache CLI option

Files Affected

  • main.py (add cache logic or create separate cache module)
  • New: cache.py (optional - if extracted to separate module)
  • README.md (document cache location and CLI flags)

Success Criteria

  • ✅ Cold start performance unchanged or improved
  • ✅ Warm cache reduces sync time by 30-50%
  • ✅ Cache automatically invalidates when blocklists update
  • ✅ Cache respects ETag/Last-Modified headers
  • ✅ Memory usage stays same or lower
  • ✅ Add --no-cache and --clear-cache CLI options
  • ✅ Cache location documented in README
  • ✅ All existing tests pass
  • ✅ Add cache-specific tests

Priority

Medium - Significant performance improvement for user experience

Estimated Effort

1-2 days

Source

Extracted from Daily Perf Improver discussion abhimehro/ctrld-sync#219

Performance Research Findings:

1. User Experience Performance

Target: Reduce cold-start sync time by 30-50%

Opportunities:

  • Implement persistent disk cache for validated blocklist data
  • Add cache invalidation based on ETag/Last-Modified headers
  • Optimize DNS validation with result caching
  • Streaming downloads for large files instead of full load into memory
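Result caching for DNS validation can be as small as a per-hostname memo with a TTL. This sketch keeps entries in memory only; persisting them under dns_cache/ as in the proposed layout would be a further step. DnsValidationCache, DNS_TTL_SECONDS and resolves() are hypothetical, and resolves() only checks that socket.getaddrinfo() returns an address, which may be looser than the project's real validation.

```python
import socket
import time
from typing import Callable, Dict, Tuple

DNS_TTL_SECONDS = 3600.0  # hypothetical TTL for cached validation results


class DnsValidationCache:
    """Memoise per-hostname validation results for a fixed TTL."""

    def __init__(self, resolve: Callable[[str], bool],
                 ttl: float = DNS_TTL_SECONDS,
                 now: Callable[[], float] = time.monotonic):
        self._resolve = resolve
        self._ttl = ttl
        self._now = now
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_valid(self, hostname: str) -> bool:
        hit = self._entries.get(hostname)
        if hit is not None and self._now() - hit[1] < self._ttl:
            return hit[0]  # fresh cached result: skip the lookup
        result = self._resolve(hostname)
        self._entries[hostname] = (result, self._now())
        return result


def resolves(hostname: str) -> bool:
    """Example resolver: True if the name has at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        return False
```

Usage: DnsValidationCache(resolves).is_valid("example.com") performs one lookup, and repeat calls within the TTL return the cached answer.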

Success Metrics:

  • Time to first folder sync (cold start)
  • Time to first folder sync (warm cache)
  • Memory usage during large folder processing

Implementation Priority (from #219):

High Priority (Weeks 1-2):

  1. CI dependency caching (low-hanging fruit)
  2. Persistent disk cache for blocklists ← This issue
  3. Parallel pytest execution

🔍 Task mining by Discussion Task Miner - Code Quality Improvement Agent

To install this workflow, run gh aw add github/gh-aw/.github/workflows/discussion-task-miner.md@94662b1dee8ce96c876ba9f33b3ab8be32de82a4. View source at https://github.com/github/gh-aw/tree/94662b1dee8ce96c876ba9f33b3ab8be32de82a4/.github/workflows/discussion-task-miner.md.

  • expires on Feb 20, 2026, 8:57 PM UTC

Comments on the Issue (you are @copilot in this section)

Custom agent used: Development Partner
The Development Partner Protocol




trunk-io bot commented Feb 20, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement persistent disk cache for blocklist data" to "Add --no-cache flag and document persistent blocklist cache" Feb 20, 2026
Copilot AI requested a review from abhimehro February 20, 2026 03:53

This pull request was automatically closed because it expired on 2026-02-20T20:57:37.606Z.

Closed by Workflow

github-actions bot closed this Feb 20, 2026
abhimehro deleted the copilot/implement-persistent-disk-cache branch February 22, 2026 10:25


Development

Successfully merging this pull request may close these issues.

[Code Quality] Implement persistent disk cache for blocklist data

2 participants