⚡ Bolt: Parallelize folder data fetching #3
Changes from all commits: 201eb34, 80ed686, 6ab3e6f, a830d61, 90edba7, f418f5b, 0676020, 5925c38, a08b97a, 35af64d, 2bfde42
```diff
@@ -18,6 +18,7 @@
 import os
 import logging
 import time
+import concurrent.futures
 import re
 from typing import Dict, List, Optional, Any, Set, Sequence
```
```diff
@@ -350,14 +351,31 @@ def sync_profile(
     try:
         # Fetch all folder data first
         folder_data_list = []
-        for url in folder_urls:
-            if not validate_folder_url(url):
-                continue
+        # Validate URLs first
+        valid_urls = [url for url in folder_urls if validate_folder_url(url)]
+
+        invalid_count = len(folder_urls) - len(valid_urls)
+        if invalid_count > 0:
+            log.warning(f"Filtered out {invalid_count} invalid URL(s)")
+
+        if not valid_urls:
+            log.error("No valid folder URLs to fetch")
+            return False
+
+        def safe_fetch(url):
             try:
-                folder_data_list.append(fetch_folder_data(url))
+                return fetch_folder_data(url)
             except (httpx.HTTPError, KeyError) as e:
                 log.error(f"Failed to fetch folder data from {url}: {e}")
-                continue
+                return None
+
+        # Fetch folder data in parallel to speed up startup
+        max_workers = min(10, len(valid_urls))
+        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
+            results = executor.map(safe_fetch, valid_urls)
+
+        folder_data_list = [r for r in results if r is not None]
+
         if not folder_data_list:
             log.error("No valid folder data found")
```

Comment on lines +366 to +371:
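The approach in the diff — wrapping each fetch in an exception-swallowing helper and mapping it across a bounded thread pool — can be sketched in isolation. This is a minimal illustration, not the PR's actual code: `fetch_all` and `fake_fetch` are made-up names, and the sketch catches a broad `Exception` where the real code catches `httpx.HTTPError` and `KeyError` and logs the failure.

```python
import concurrent.futures

def fetch_all(urls, fetch_one, max_workers_cap=10):
    """Fetch every URL in parallel, dropping failures instead of aborting."""
    def safe_fetch(url):
        try:
            return fetch_one(url)
        except Exception:
            return None  # the real code logs the error here

    # Cap the pool size; never ask for zero workers.
    max_workers = min(max_workers_cap, len(urls)) or 1
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so results line up with urls.
        results = list(executor.map(safe_fetch, urls))
    return [r for r in results if r is not None]

# Stand-in fetcher that fails on one URL:
def fake_fetch(url):
    if "bad" in url:
        raise KeyError(url)
    return {"url": url}

print(fetch_all(["a", "bad", "c"], fake_fetch))
# → [{'url': 'a'}, {'url': 'c'}]
```

Note that `executor.map` keeps results in input order regardless of which thread finishes first, which is why filtering out `None` afterward is enough to recover the successful fetches.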
The `safe_fetch` function is accessing shared mutable state (the `_cache` dictionary) from multiple threads without synchronization. The `_gh_get` function called by `fetch_folder_data` performs a check-then-act operation on the cache that is not thread-safe. Multiple threads could simultaneously check whether a URL is in the cache, both find it missing, and then both attempt to fetch and store it, potentially causing race conditions or inconsistent state. Consider adding a lock around the cache access in `_gh_get`, or using a thread-safe caching mechanism such as `functools.lru_cache` with appropriate thread-safety guarantees.