Merged
.jules/bolt.md (4 additions, 0 deletions)

@@ -31,3 +31,7 @@
## 2024-05-24 - Pass Local State to Avoid Redundant Reads
**Learning:** When a process involves modifying remote state (e.g. deleting folders) and then querying it (e.g. getting rules from remaining folders), maintaining a local replica of the state avoids redundant API calls. If you know what you deleted, you don't need to ask the server "what's left?".
**Action:** Identify sequences of "Read -> Modify -> Read" and optimize to "Read -> Modify (update local) -> Use local".
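A minimal sketch of this pattern, with a plain dict and `api_*` helpers standing in for the remote service (all names here are illustrative, not from the real codebase):

```python
# Illustrative sketch of "Read -> Modify (update local) -> Use local".
# `fake_server` and the api_* helpers are hypothetical stand-ins for a remote API.
fake_server = {"a": ["rule1"], "b": ["rule2"], "c": ["rule3"]}
api_calls = {"reads": 0}

def api_list_folders():
    api_calls["reads"] += 1
    return dict(fake_server)

def api_delete_folder(folder_id):
    fake_server.pop(folder_id, None)

def delete_and_get_remaining(to_delete):
    remaining = api_list_folders()      # the single read
    for folder_id in to_delete:
        api_delete_folder(folder_id)    # modify remote state...
        remaining.pop(folder_id, None)  # ...and mirror the change locally
    # No second read: the local replica already answers "what's left?"
    return remaining
```

Because the replica is updated in lockstep with each delete, the final "what remains" query costs zero extra API calls.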

## 2024-05-24 - Parallelize Validation with Fetching
**Learning:** Sequential validation (especially if it involves network IO like DNS lookups) before parallel fetching creates a bottleneck. Combining validation and fetching into a single task within a `ThreadPoolExecutor` allows validation latency to be absorbed by parallelism.
**Action:** Look for patterns like `[url for url in urls if validate(url)]` followed by `ThreadPoolExecutor`. Move the `validate(url)` check inside the executor task.
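A rough before/after sketch of this transformation, with `validate` and `fetch` as hypothetical stand-ins for a DNS check and an HTTP request:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: in real code these would be a DNS lookup and an HTTP GET.
def validate(url: str) -> bool:
    return url.startswith("https://")

def fetch(url: str) -> str:
    return f"data:{url}"

# Before: [url for url in urls if validate(url)] ran serially, then the pool fetched.
# After: validation rides along inside each task, so its latency is parallelized too.
def validate_and_fetch(url: str):
    return fetch(url) if validate(url) else None

def warm_up(urls):
    with ThreadPoolExecutor() as pool:
        return [r for r in pool.map(validate_and_fetch, urls) if r is not None]
```

The `None` filter replaces the up-front list comprehension: invalid URLs are dropped after the pool runs rather than before it starts.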
> **Copilot AI** (Jan 27, 2026), commenting on lines +35 to +37:
>
> This optimization contradicts the learning documented in .jules/bolt.md lines 15-17 about thread safety. The journal explicitly states "When parallelizing IO operations that update a shared collection (like a set of existing rules), always use a threading.Lock for the write operations." The `_cache` dictionary is a shared collection being updated from multiple threads without synchronization, which violates this established principle.
main.py (22 additions, 7 deletions)

@@ -469,17 +469,24 @@ def fetch_folder_data(url: str) -> Dict[str, Any]:

```diff
 def warm_up_cache(urls: Sequence[str]) -> None:
     urls = list(set(urls))
-    urls_to_fetch = [u for u in urls if u not in _cache and validate_folder_url(u)]
-    if not urls_to_fetch:
+    urls_to_process = [u for u in urls if u not in _cache]
+    if not urls_to_process:
         return

-    total = len(urls_to_fetch)
+    total = len(urls_to_process)
     if not USE_COLORS:
         log.info(f"Warming up cache for {total} URLs...")

+    # OPTIMIZATION: Combine validation (DNS) and fetching (HTTP) in one task
+    # to allow validation latency to be parallelized.
+    def _validate_and_fetch(url: str):
+        if validate_folder_url(url):
+            return _gh_get(url)
+        return None
```
> **Copilot AI** (Jan 27, 2026), commenting on lines +482 to +485:
>
> The parallelization of validation and fetching is not covered by tests. Given that this repository has comprehensive test coverage for other functionality, consider adding tests to verify the thread-safe behavior of the parallel validation and fetching, including scenarios where multiple threads attempt to fetch the same URL or where validation fails for some URLs.

> **Copilot AI** (Jan 27, 2026), commenting on lines +482 to +485:
>
> The `_validate_and_fetch` function calls `_gh_get`, which accesses the `_cache` dictionary (lines 319, 355, 359) from multiple threads without synchronization. This creates a race condition where multiple threads could simultaneously check `if url not in _cache` and proceed to fetch and write to the cache, potentially causing data corruption or redundant network requests. Consider adding a threading lock around cache access in `_gh_get`, or use `threading.Lock()` to protect the check-and-set pattern.

```diff
     completed = 0
     with concurrent.futures.ThreadPoolExecutor() as executor:
-        futures = {executor.submit(_gh_get, url): url for url in urls_to_fetch}
+        futures = {executor.submit(_validate_and_fetch, url): url for url in urls_to_process}

         if USE_COLORS:
             sys.stderr.write(f"\r{Colors.CYAN}⏳ Warming up cache: 0/{total}...{Colors.ENDC}")
```
@@ -735,15 +742,23 @@ def sync_profile(
```diff
     try:
         # Fetch all folder data first
         folder_data_list = []
-        valid_urls = [url for url in folder_urls if validate_folder_url(url)]
-
+        # OPTIMIZATION: Move validation inside the thread pool to parallelize DNS lookups.
+        # Previously, sequential validation blocked the main thread.
+        def _fetch_if_valid(url: str):
+            if validate_folder_url(url):
+                return fetch_folder_data(url)
+            return None
```
> **Copilot AI** (Jan 27, 2026), commenting on lines +748 to +751:
>
> The `_fetch_if_valid` function calls `fetch_folder_data`, which in turn calls `_gh_get`. The `_gh_get` function accesses the `_cache` dictionary from multiple threads without synchronization. This creates a race condition where multiple threads could simultaneously check `if url not in _cache` and proceed to fetch and write to the cache, potentially causing data corruption or redundant network requests. Consider adding a threading lock around cache access in `_gh_get`, or use `threading.Lock()` to protect the check-and-set pattern.

> **Copilot AI** (Jan 27, 2026), commenting on lines +748 to +751:
>
> The parallelization of validation and fetching is not covered by tests. Given that this repository has comprehensive test coverage for other functionality, consider adding tests to verify the thread-safe behavior of the parallel validation and fetching, including scenarios where multiple threads attempt to fetch the same URL or where validation fails for some URLs.
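A sketch of the kind of test the reviewer is asking for. It is self-contained: the cache and fetcher below are illustrative stand-ins, not the project's real `_gh_get` or `_cache`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def make_fetcher():
    cache, lock, real_fetches = {}, threading.Lock(), []
    def fetch(url):
        with lock:
            if url not in cache:
                real_fetches.append(url)   # record each underlying fetch
                cache[url] = f"data:{url}"
            return cache[url]
    return fetch, real_fetches

def test_concurrent_fetch_same_url():
    fetch, real_fetches = make_fetcher()
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(fetch, ["same-url"] * 16))
    assert results == ["data:same-url"] * 16
    assert real_fetches == ["same-url"]    # exactly one real fetch happened
```

An equivalent test against the real code would patch the network layer and assert both result consistency and the number of underlying calls.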

```diff
         with concurrent.futures.ThreadPoolExecutor() as executor:
-            future_to_url = {executor.submit(fetch_folder_data, url): url for url in valid_urls}
+            future_to_url = {executor.submit(_fetch_if_valid, url): url for url in folder_urls}

             for future in concurrent.futures.as_completed(future_to_url):
                 url = future_to_url[future]
                 try:
-                    folder_data_list.append(future.result())
+                    result = future.result()
+                    if result:
+                        folder_data_list.append(result)
                 except (httpx.HTTPError, KeyError, ValueError) as e:
                     log.error(f"Failed to fetch folder data from {sanitize_for_log(url)}: {sanitize_for_log(e)}")
                     continue
```