⚡ Bolt: Parallelize rule pushing and deduplicate hostnames #107
Parallelize batch pushing in `push_rules` using `ThreadPoolExecutor` (3 workers) to reduce I/O wait time. Add hostname deduplication to prevent sending duplicate rules to the API. Benchmarks show ~2x speedup for rule pushing (from ~0.4s to ~0.2s for 2000 rules with 100ms latency).
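The deduplication and batching steps described above could look roughly like the sketch below. It is not the PR's code: `dedupe_hostnames` and `make_batches` are hypothetical helper names, the batch size of 500 is an assumption (the description does not state it), and duplicates are matched by exact string equality, which may differ from what the real change does.

```python
from typing import Iterable, List

BATCH_SIZE = 500  # assumed value; the PR description does not state the real batch size


def dedupe_hostnames(hostnames: Iterable[str]) -> List[str]:
    """Drop repeated hostnames while keeping first-seen order (exact string match)."""
    # dict keys keep insertion order, so the first occurrence of each hostname wins.
    return list(dict.fromkeys(hostnames))


def make_batches(hostnames: List[str], batch_size: int = BATCH_SIZE) -> List[List[str]]:
    """Split the hostname list into consecutive slices of at most batch_size items."""
    return [
        hostnames[start : start + batch_size]
        for start in range(0, len(hostnames), batch_size)
    ]


# Small illustrative input with one duplicate entry.
unique = dedupe_hostnames(["a.example", "b.example", "a.example", "c.example"])
batches = make_batches(unique, batch_size=2)
```

Deduplicating before batching means fewer rules, and possibly fewer batches, ever reach the API.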
for i, start in enumerate(range(0, len(filtered_hostnames), BATCH_SIZE), 1):
    batch = filtered_hostnames[start : start + BATCH_SIZE]
# Prepare batches
batches = []
Check warning
Code scanning / Pylint (reported by Codacy)
Variable name "h" doesn't conform to snake_case naming style Warning
sanitize_for_log(folder_name), i, len(batch)
sanitize_for_log(folder_name), batch_idx, len(batch_data)
)
successful_batches += 1
Check warning
Code scanning / Pylint (reported by Codacy)
Missing function docstring Warning
if future.result():
    successful_batches += 1

if successful_batches == total_batches:
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (124/100) Warning
for i, start in enumerate(range(0, len(filtered_hostnames), BATCH_SIZE), 1):
    batch = filtered_hostnames[start : start + BATCH_SIZE]
# Prepare batches
batches = []
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Variable name "h" doesn't conform to snake_case naming style Warning
if future.result():
    successful_batches += 1

if successful_batches == total_batches:
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (124/100) Warning
if future.result():
    successful_batches += 1

if successful_batches == total_batches:
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Use lazy % formatting in logging functions Note
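For readers unfamiliar with this Pylint note, here is a small illustration of eager versus lazy formatting in a logging call. The logger name, message text, and values are placeholders and are not taken from the PR.

```python
import logging

logger = logging.getLogger("push_rules_example")  # hypothetical logger name

# Placeholder values, only for illustration.
folder_name = "ads"
batch_idx, batch_len = 1, 500

# Eager: the string is built with % before logging decides whether to emit it.
# This is the pattern the Pylint note flags.
logger.info("Folder %s: pushed batch %d (%d rules)" % (folder_name, batch_idx, batch_len))

# Lazy: the arguments are passed separately, so formatting only happens
# if a handler actually emits the record.
logger.info("Folder %s: pushed batch %d (%d rules)", folder_name, batch_idx, batch_len)
```

Both calls produce the same message when the record is emitted; the lazy form simply skips the formatting work when the log level is disabled.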
💡 What:
Updates `push_rules` to send batches of rules to the API in parallel using `concurrent.futures.ThreadPoolExecutor` with 3 workers.
🎯 Why:
The previous implementation pushed rule batches sequentially. For large blocklists, this meant the script spent most of its time waiting for API responses. By parallelizing the batches, we can utilize the network more efficiently. Deduplication further reduces the payload size.
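A minimal sketch of the parallel pattern described here, assuming each batch is pushed by a callable that returns `True` on success. The names `push_batches_parallel` and `push_batch` are hypothetical; only the worker count of 3 and the `successful_batches == total_batches` check come from the PR and its diff fragments.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence

MAX_WORKERS = 3  # worker count stated in the PR


def push_batches_parallel(
    batches: Sequence[List[str]],
    push_batch: Callable[[int, List[str]], bool],
) -> bool:
    """Push every batch on a small thread pool; return True only if all succeed."""
    total_batches = len(batches)
    successful_batches = 0
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Submit one task per batch; batch numbering starts at 1 for log messages.
        futures = [
            executor.submit(push_batch, batch_idx, batch_data)
            for batch_idx, batch_data in enumerate(batches, 1)
        ]
        # Collect results as they finish; result() re-raises any worker exception.
        for future in as_completed(futures):
            if future.result():
                successful_batches += 1
    return successful_batches == total_batches


# Usage with a stand-in push function that always succeeds.
all_ok = push_batches_parallel([["a.example"], ["b.example"]], lambda idx, batch: True)
```

Threads suit this workload because the time is spent waiting on network responses rather than computing. The counter is only updated in the submitting thread, so it needs no lock.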
📊 Impact:
🔬 Measurement:
Verified using a local benchmark script (`benchmark_push_rules.py`) mocking the `httpx` client with simulated latency.
Baseline (Sequential): ~0.40s
Optimized (Parallel): ~0.21s
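The contents of `benchmark_push_rules.py` are not shown, so the following is only a guess at the general shape of such a measurement. It replaces the mocked `httpx` client with a plain `time.sleep`, shortens the latency so it runs quickly, and assumes four batches, which is inferred from the reported numbers (0.40 s at 100 ms per call) rather than stated anywhere.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY = 0.01  # seconds; the PR used 100 ms, shortened here for a quick run
NUM_BATCHES = 4           # inferred from the reported timings, not stated in the PR
MAX_WORKERS = 3           # worker count stated in the PR


def fake_push(batch_idx: int) -> bool:
    """Stand-in for one API call: sleep to imitate latency, then report success."""
    time.sleep(SIMULATED_LATENCY)
    return True


def time_sequential() -> float:
    """Time pushing all batches one after another."""
    start = time.perf_counter()
    results = [fake_push(i) for i in range(1, NUM_BATCHES + 1)]
    assert all(results)
    return time.perf_counter() - start


def time_parallel() -> float:
    """Time pushing all batches on a thread pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        results = list(executor.map(fake_push, range(1, NUM_BATCHES + 1)))
    assert all(results)
    return time.perf_counter() - start


sequential_s = time_sequential()
parallel_s = time_parallel()
print(f"sequential: {sequential_s:.3f}s, parallel: {parallel_s:.3f}s")
```

With three workers and four batches, the parallel run needs two rounds of calls instead of four, which would be consistent with the roughly 2x improvement reported above.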
Verified functionality using `python3 main.py --help` to ensure no import errors.
PR created automatically by Jules for task 2918633131029543303 started by @abhimehro