⚡ Bolt: Avoid thread pool overhead for small rule updates #192
Conversation
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
👋 Development Partner is reviewing this PR. Will provide feedback shortly.
**Learning:** Re-validating resource properties (like DNS/IP) when using *cached content* is pure overhead. If the content is served from memory (proven safe at fetch time), checking the *current* state of the source is disconnected from the data being used.
**Action:** When using a multi-stage pipeline (Warmup -> Process), ensure validation state persists alongside the data cache. Avoid clearing validation caches between stages if the data cache is not also cleared.

## 2026-01-28 - [Avoid ThreadPoolExecutor Overhead]
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when references to undefined definitions are found.
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when shortcut reference links are used.
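The Action item above (keep validation state fused to the data cache) can be sketched as a tiny cache type. This is an illustrative sketch only, not the project's actual cache; the `WarmupCache` and `CachedEntry` names are invented for the example:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedEntry:
    """Content plus the validation verdict recorded at fetch time."""
    content: bytes
    validated: bool  # result of the DNS/IP checks performed when fetched


class WarmupCache:
    """Keeps validation state fused to the data it vouches for."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedEntry] = {}

    def store(self, url: str, content: bytes, validated: bool) -> None:
        self._entries[url] = CachedEntry(content, validated)

    def get(self, url: str) -> Optional[CachedEntry]:
        # No re-validation here: the entry's verdict travels with its bytes.
        return self._entries.get(url)

    def clear(self) -> None:
        # Clearing drops data AND validation together, never one without the other.
        self._entries.clear()
```

Because a single `clear()` empties both, the Process stage can never see cached bytes whose validation verdict was discarded between stages.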
@@ -0,0 +1,102 @@
Check warning
Code scanning / Pylint (reported by Codacy)
Missing module docstring
# Add root to path to import main
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import main
Check warning
Code scanning / Pylint (reported by Codacy)
Import "import main" should be placed at the top of the module
class TestPushRulesPerf(unittest.TestCase):
Check warning
Code scanning / Pylint (reported by Codacy)
Missing class docstring
self.profile_id = "test_profile"
self.folder_name = "test_folder"
self.folder_id = "test_folder_id"
self.do = 1
Check warning
Code scanning / Pylint (reported by Codacy)
Attribute name "do" doesn't conform to snake_case naming style
# Verify if Executor was called.
# AFTER OPTIMIZATION: This should be False.
self.assertFalse(mock_executor.called, "ThreadPoolExecutor should NOT be called for single batch")
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (106/100)
import unittest
from unittest.mock import MagicMock, patch, ANY
Check notice
Code scanning / Pylint (reported by Codacy)
Unused ANY imported from unittest.mock
Check warning
Code scanning / Prospector (reported by Codacy)
Unused ANY imported from unittest.mock (unused-import)
Check warning
Code scanning / Prospector (reported by Codacy)
Import "import main" should be placed at the top of the module (wrong-import-position)
# This should ALWAYS be True
self.assertTrue(mock_executor.called, "ThreadPoolExecutor should be called for multi-batch")

if __name__ == '__main__':
Check warning
Code scanning / Prospector (reported by Codacy)
expected 2 blank lines after class or function definition, found 1 (E305)
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing module docstring
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Import "import main" should be placed at the top of the module
class TestPushRulesPerf(unittest.TestCase):
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing class docstring
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Attribute name "do" doesn't conform to snake_case naming style
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (106/100)
Check notice
Code scanning / Pylintpython3 (reported by Codacy)
Unused ANY imported from unittest.mock
Pull request overview
This PR optimizes push_rules() to avoid ThreadPoolExecutor overhead when there’s only a single batch of rules to push, while preserving the existing parallel behavior for multi-batch updates. It also adds a regression test to ensure the executor is bypassed for single-batch workloads.
Changes:
- Run single-batch rule pushes synchronously (skip ThreadPoolExecutor) to reduce overhead.
- Keep parallel processing via ThreadPoolExecutor for multi-batch rule pushes.
- Add tests validating executor usage behavior for single vs multi-batch pushes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| main.py | Adds a total_batches == 1 fast path to avoid thread pool creation for single-batch updates. |
| tests/test_push_rules_perf.py | Adds regression coverage asserting executor is skipped for single batch and used for multi-batch. |
| .jules/bolt.md | Documents the optimization rationale as a Bolt learning/action item. |
import unittest
from unittest.mock import MagicMock, patch, ANY
ANY is imported from unittest.mock but never used in this test module. Please remove it to avoid unused-import lint noise and keep the test focused.
- from unittest.mock import MagicMock, patch, ANY
+ from unittest.mock import MagicMock, patch
Summary of Changes

Hello @abhimehro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a performance optimization for rule updates by intelligently managing the use of `ThreadPoolExecutor`.

Highlights
Changelog
Activity
Code Review
This pull request introduces a performance optimization to avoid ThreadPoolExecutor overhead for small rule updates by handling single-batch updates synchronously. The change is logical and includes a new regression test to verify the behavior. My review focuses on improving code maintainability by addressing code duplication in the implementation and simplifying the new test case for better clarity.
if total_batches == 1:
    result = process_batch(1, batches[0])
    if result:
        successful_batches += 1
        existing_rules.update(result)

        render_progress_bar(
            successful_batches,
            total_batches,
            f"Folder {sanitize_for_log(folder_name)}",
        )
else:
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(process_batch, i, batch): i
            for i, batch in enumerate(batches, 1)
        }

        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            if result:
                successful_batches += 1
                existing_rules.update(result)

                render_progress_bar(
                    successful_batches,
                    total_batches,
                    f"Folder {sanitize_for_log(folder_name)}",
                )
There is code duplication between the if and else blocks for handling the result of process_batch. Both branches contain the logic to increment successful_batches, update existing_rules, and render the progress bar. This makes the code harder to maintain, as any change to this logic must be applied in two places.
To improve maintainability and follow the DRY (Don't Repeat Yourself) principle, you can extract the result handling logic into a local helper function.
def _handle_batch_result(result: Optional[List[str]]):
    nonlocal successful_batches
    if result:
        successful_batches += 1
        existing_rules.update(result)

        render_progress_bar(
            successful_batches,
            total_batches,
            f"Folder {sanitize_for_log(folder_name)}",
        )

if total_batches == 1:
    result = process_batch(1, batches[0])
    _handle_batch_result(result)
else:
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(process_batch, i, batch): i
            for i, batch in enumerate(batches, 1)
        }
        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            _handle_batch_result(result)

# Mock executor context manager
mock_executor_instance = mock_executor.return_value
mock_executor_instance.__enter__.return_value = mock_executor_instance
mock_executor_instance.__exit__.return_value = None

# Mock future
mock_future = MagicMock()
mock_future.result.return_value = hostnames  # Success
mock_executor_instance.submit.return_value = mock_future

# Mock as_completed to yield the future immediately
mock_as_completed.return_value = [mock_future]

# Since we are bypassing TPE, we might need to mock API call?
# The code will call process_batch(1, batch).
# process_batch calls _api_post_form.
# client is mocked, so _api_post_form works (retries mocked).
# But we need to ensure process_batch works correctly in isolation.

# For this test, we mock _api_post_form?
# No, _api_post_form calls client.post.
The mocks for the ThreadPoolExecutor instance and the commented-out development notes are unnecessary for this test. Since this test verifies the single-batch scenario where the thread pool is not used, these lines are redundant and can be removed. This will make the test cleaner and more focused on its primary assertion.
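In the spirit of this comment, the single-batch assertion can be written without any executor-instance scaffolding: patch the class and assert it was never constructed. The `push_batches` helper below is a self-contained toy stand-in for `push_rules`' dispatch logic (the real function and its collaborators live in `main.py`), so the sketch only illustrates the shape of a leaner test:

```python
import concurrent.futures
from unittest.mock import patch


def push_batches(batches, process):
    """Toy stand-in for push_rules' dispatch: sync for one batch, pooled otherwise."""
    if len(batches) == 1:
        return [process(batches[0])]
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(process, b) for b in batches]
        return [f.result() for f in concurrent.futures.as_completed(futures)]


# Single batch: the executor class must never be constructed.
with patch("concurrent.futures.ThreadPoolExecutor") as mock_executor:
    single = push_batches([["a.com"]], process=len)
    assert single == [1]
    assert not mock_executor.called

# Multi batch: the real pool processes both batches.
multi = sorted(push_batches([["a"], ["b", "c"]], process=len))
assert multi == [1, 2]
```

No `__enter__`/`submit`/`as_completed` mocks are needed on the single-batch path, because that path never touches the pool at all.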
This work implements an optimization to avoid thread pool creation for single-batch rule pushes. However, this approach has been superseded by #192. Submitting for record keeping. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
💡 What: Modified `push_rules` to bypass `ThreadPoolExecutor` creation when `len(batches) == 1`. Instead of spinning up a thread pool for a single task, the batch is processed synchronously in the main thread.

🎯 Why: Creating a thread pool, submitting a future, and waiting for completion incurs overhead (context switching, thread management). For small updates (which result in a single batch of < 500 rules), this overhead is unnecessary and dominates the execution time compared to the actual API call. Running synchronously avoids this overhead entirely.

📊 Impact:
- Eliminates thread creation latency for small folders.
- Reduces CPU usage slightly by avoiding thread management.
- Makes small updates faster and more efficient.

🔬 Measurement: Run `uv run python -m unittest tests/test_push_rules_perf.py`. This test verifies that `ThreadPoolExecutor` is NOT initialized for single-batch payloads, but IS initialized for multi-batch payloads, ensuring the optimization works as intended without breaking larger updates.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Co-Authored-By: Warp <agent@warp.dev>
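The overhead claim in the commit message above can be sanity-checked with a quick, machine-dependent timing sketch. The workload here is a toy computation, not the project's real API call, so only the relative ordering matters:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def work():
    # Tiny stand-in for processing a single batch.
    return sum(range(1000))


def via_pool():
    # Build a fresh pool per call, mirroring a pool created for one batch.
    with ThreadPoolExecutor(max_workers=3) as executor:
        return executor.submit(work).result()


def timed(fn, runs=200):
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return time.perf_counter() - start


pool_s = timed(via_pool)
sync_s = timed(work)
print(f"pool: {pool_s:.4f}s  sync: {sync_s:.4f}s")
```

On any machine the synchronous path should come out well ahead, because pool creation and teardown dwarf the task itself at this size.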
Force-pushed 260ddc1 to 23d6d94
Avoids ThreadPoolExecutor overhead for small rule updates by processing single batches synchronously. Includes regression test.
PR created automatically by Jules for task 13197744824228566137 started by @abhimehro