Conversation
Previously, `sync_profile` would delete folders, wait for propagation (up to 60s), and only then fetch existing rules from the remaining folders. This serialized two independent I/O-bound operations. This change moves the `get_all_existing_rules` call into the background (using `shared_executor`) while the main thread handles folder deletions and the propagation wait, overlapping the rule-fetch latency with the deletion/wait time.

Key changes:
- Calculate `folders_to_delete` and `folders_to_scan` (kept folders) upfront.
- Offload `get_all_existing_rules` to `shared_executor` before starting deletions.
- Retrieve the result of the fetch after the deletion phase completes.
- Handle the `no_delete` flag correctly (scans all folders).

Impact:
- Significantly reduces end-to-end sync time by hiding the rule-fetch latency behind the deletion wait.
- Verified with `tests/test_parallel_fetch.py` (simulated ~50% reduction in the fetch+delete phase).

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
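The overlap described above can be sketched as follows. This is a minimal, self-contained sketch with simulated latencies; the function bodies, folder names, and timings are illustrative stand-ins for the real network calls in `main.py`:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_all_existing_rules(folders):
    """Stand-in for the real rule fetch; sleeps to simulate network latency."""
    time.sleep(0.2)
    return {name: {"some_rule"} for name in folders}


def delete_folders_and_wait(folders):
    """Stand-in for folder deletion plus the propagation wait."""
    time.sleep(0.2)
    return len(folders)


start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as shared_executor:
    # Kick off the rule fetch in the background before starting deletions...
    rules_future = shared_executor.submit(get_all_existing_rules, ["keep_folder"])
    # ...so the deletion/propagation wait runs concurrently on this thread.
    deleted = delete_folders_and_wait(["old_folder"])
    # Collect the overlapped fetch result once deletions are done.
    existing_rules = rules_future.result()
elapsed = time.monotonic() - start
```

Because the two 0.2s operations run concurrently, `elapsed` lands near one delay rather than the sum of both, which is the ~50% reduction the test simulates.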
😎 Merged manually by @abhimehro - details. |
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
Summary of Changes

Hello @abhimehro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant performance optimization to the `sync_profile` workflow.
Code Review
The pull request successfully implements fetching of existing rules concurrently with folder deletion, which is a significant performance improvement. The new test `tests/test_parallel_fetch.py` effectively demonstrates this optimization. The changes are well explained and address a clear performance bottleneck. I've identified a few areas for minor improvement related to clarity and error handling, but overall the changes are positive.
```python
# Identify folders to delete and folders to keep (scan)
folders_to_delete = []
folders_to_scan = existing_folders.copy()

# Identify folders to delete
folders_to_delete = []
if not no_delete:
    for folder_data in folder_data_list:
        name = folder_data["group"]["group"].strip()
        if name in existing_folders:
            folders_to_delete.append((name, existing_folders[name]))
            if name in folders_to_scan:
                del folders_to_scan[name]
```
The logic for `folders_to_delete` and `folders_to_scan` is duplicated inside and outside the `if not no_delete:` block. This can lead to confusion and potential bugs if `no_delete` is true, as `folders_to_delete` would be empty, but `folders_to_scan` would still be a copy of `existing_folders` and not reflect what's actually being scanned. `folders_to_scan` should only be modified if `no_delete` is false.
Suggested change:

```python
folders_to_delete = []
folders_to_scan = existing_folders.copy()
if not no_delete:
    for folder_data in folder_data_list:
        name = folder_data["group"]["group"].strip()
        if name in existing_folders:
            folders_to_delete.append((name, existing_folders[name]))
            if name in folders_to_scan:
                del folders_to_scan[name]
```
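The deduplicated partitioning can also be sketched as a single helper. `partition_folders` is a hypothetical function distilled from the diff above (the real code runs inline in `sync_profile`); it makes the `no_delete` behavior explicit, returning every existing folder for scanning when deletion is disabled:

```python
def partition_folders(existing_folders, folder_data_list, no_delete):
    """Split existing folders into (folders_to_delete, folders_to_scan).

    Hypothetical helper for illustration; with no_delete=True nothing is
    deleted and every existing folder is kept for scanning.
    """
    folders_to_delete = []
    folders_to_scan = dict(existing_folders)
    if not no_delete:
        for folder_data in folder_data_list:
            name = folder_data["group"]["group"].strip()
            if name in existing_folders:
                folders_to_delete.append((name, existing_folders[name]))
                # Deleted folders must not be scanned for rules.
                folders_to_scan.pop(name, None)
    return folders_to_delete, folders_to_scan
```

Initializing both containers exactly once removes the duplicated assignments the review flagged.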
```python
except Exception as e:
    log.error(f"Failed to fetch existing rules in background: {sanitize_for_log(e)}")
```
The `log.error` message here is generic. It would be more helpful to include the `profile_id` in the error message to provide better context for debugging.
Suggested change:

```python
except Exception as e:
    log.error(f"Failed to fetch existing rules for profile {profile_id} in background: {sanitize_for_log(e)}")
```
Code scanning (Codacy) reported the following warnings on `tests/test_parallel_fetch.py`; the Prospector, Pylint, and Pylintpython3 runs each reported largely the same findings.

Imports and path setup:

```python
import os
import unittest
from unittest.mock import MagicMock, patch
import time
import concurrent.futures

# Add root to path
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import main
```

- Unused import `concurrent.futures` (unused-import)
- Import `import main` should be placed at the top of the module (wrong-import-position)

Test signature:

```python
@patch("main.fetch_folder_data")
@patch("main.validate_folder_url")
@patch("main.countdown_timer")
def test_parallel_execution(self, mock_timer, mock_validate, mock_fetch, mock_verify, mock_delete, mock_get_rules):
```

- Too many arguments (7/5) (too-many-arguments)
- Unused argument 'mock_timer' (unused-argument)
- Line too long (119/100)

Latency simulation helpers:

```python
# Latency simulation
def slow_get_rules(*args, **kwargs):
    time.sleep(1.0)
    return {"rule_from_keep"}

def slow_delete(*args, **kwargs):
```

- Unused arguments 'args' and 'kwargs' in `slow_get_rules` and `slow_delete` (unused-argument)
- Missing function docstring on `slow_delete`

Assertion block:

```python
# Verify get_all_existing_rules was called with ONLY keep_folder
call_args = mock_get_rules.call_args
# args: client, profile_id, known_folders
known_folders = call_args[0][2] if len(call_args[0]) > 2 else call_args[1]['known_folders']

self.assertIn("keep_folder", known_folders)
self.assertNotIn("test_folder", known_folders, "Should not fetch rules from deleted folder")
```

- Line too long (103/100 and 104/100)
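Taken together, the unused-argument and missing-docstring warnings on the helpers suggest roughly this cleanup (a sketch only; underscore-prefixed parameter names match Pylint's default `ignored-argument-names` pattern, and the sleep is shortened here for illustration):

```python
"""Helpers for the timing-based parallel-fetch test, lint-clean variant."""
import time


def slow_get_rules(*_args, **_kwargs):
    """Simulate a slow rule fetch; underscore names mark the args as unused."""
    time.sleep(0.1)
    return {"rule_from_keep"}


def slow_delete(*_args, **_kwargs):
    """Simulate a slow folder deletion."""
    time.sleep(0.1)
    return True
```

The too-many-arguments warning on the test itself is a common side effect of stacked `@patch` decorators and is usually suppressed locally rather than refactored away.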
Pull request overview
This pull request aims to optimize the sync workflow by parallelizing the fetch of existing rules with the folder deletion phase. The change modifies sync_profile to start fetching rules from folders that will be kept while deletions are in progress, potentially hiding the fetch latency behind the 60-second deletion propagation wait.
Changes:
- Modified `main.py` to submit `get_all_existing_rules` to a background executor while deletions proceed
- Added test coverage in `tests/test_parallel_fetch.py` to verify the parallel execution
- Updated `test_main.py` to include the new `clear_cache` argument in mock objects
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| main.py | Refactored sync_profile to identify folders to keep/delete upfront and start background fetch of existing rules while deletions run |
| tests/test_parallel_fetch.py | New test file with timing-based test to verify parallel execution of delete and fetch operations |
| test_main.py | Added clear_cache=False to mock_args objects to match updated parse_args signature |
```python
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import main


class TestParallelFetch(unittest.TestCase):
```
Inconsistent Test Framework: This test file uses `unittest.TestCase` while the similar test file `tests/test_parallel_deletion.py` uses pytest. For consistency, and to leverage pytest's better fixtures and parametrization, consider converting this test to use pytest instead of unittest. This would also make it easier to use `monkeypatch` and other pytest features.
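A pytest-style version of the timing check could look like the sketch below: a plain function instead of a `TestCase` class, with bare `assert` statements. `slow_get_rules` and `slow_delete` are illustrative stand-ins for the patched calls, with delays shortened for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_get_rules():
    """Simulated slow rule fetch."""
    time.sleep(0.2)
    return {"rule_from_keep"}


def slow_delete():
    """Simulated slow folder deletion."""
    time.sleep(0.2)
    return True


def test_parallel_execution():
    """Fetch and delete should overlap, not run back to back."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as pool:
        rules_future = pool.submit(slow_get_rules)
        assert slow_delete()          # deletion runs while the fetch sleeps
        rules = rules_future.result()
    elapsed = time.monotonic() - start
    assert "rule_from_keep" in rules
    # Overlap means total time is near one delay, not the sum of both.
    assert elapsed < 0.35
```

In the real suite, pytest's `monkeypatch` fixture would replace the `@patch` decorator stack, which also sidesteps the too-many-arguments lint warning.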
```python
existing_rules_future = shared_executor.submit(
    get_all_existing_rules, client, profile_id, folders_to_scan
)
```
Critical Thread Safety Issue: The `httpx.Client` object is not thread-safe when used from multiple threads simultaneously. Here, the client is submitted to `shared_executor` to run `get_all_existing_rules`, which internally creates another `ThreadPoolExecutor` (5 workers) that also uses the same client object. This means the same `httpx.Client` instance is being used from multiple threads concurrently, which can lead to race conditions, corrupted connection state, or crashes.
According to the httpx documentation, `Client` objects should not be shared across threads without synchronization. The recommended approach is to either:
- Create a new client within the background task, or
- Use `httpx.AsyncClient` with asyncio for concurrent operations, or
- Add proper locking around all client operations (not recommended due to performance impact)
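The first option (a client per thread) is often implemented with `threading.local`, as in the sketch below. `client_factory` is any zero-argument callable that builds a fresh client; with httpx it would be `httpx.Client`, but a dummy factory stands in here so the sketch is self-contained. Each thread lazily creates and reuses its own instance, so no client object is ever shared across threads:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_thread_local = threading.local()


def get_thread_client(client_factory):
    """Return this thread's client, creating it on first use."""
    client = getattr(_thread_local, "client", None)
    if client is None:
        client = client_factory()
        _thread_local.client = client
    return client


# Demonstration with a dummy factory standing in for httpx.Client:
created = []
def dummy_factory():
    created.append(object())
    return created[-1]

main_client = get_thread_client(dummy_factory)      # built on the main thread
with ThreadPoolExecutor(max_workers=1) as pool:
    # The worker thread gets its own, separate instance.
    worker_client = pool.submit(get_thread_client, dummy_factory).result()
```

Remember to close per-thread clients on shutdown (e.g. via an `atexit` hook or by draining the pool) when the factory builds real network clients.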
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ization

- Corrected a potential corruption/syntax error in `main.py` that caused CI failures.
- Verified the parallel fetch optimization (fetching existing rules while deleting folders) with `tests/test_parallel_fetch.py`.
- Updated `test_main.py` to match the latest `main.py` changes (arguments/execution flow).
- Verified no regressions with `tests/test_performance_regression.py`.

This commit ensures the codebase is in a consistent state and the optimization is correctly applied without syntax errors.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
- Modified `sync_profile` to initiate `get_all_existing_rules` (for kept folders) in parallel with the folder deletion phase.
- Added `tests/test_parallel_fetch.py`, showing a reduction from ~2s to ~1s in simulated conditions.

PR created automatically by Jules for task 1449692924143549150 started by @abhimehro