Reducing python test execution time#9466
MohamedBilelBesbes wants to merge 1 commit into mozilla:master from
Conversation
Maybe you can access this overview of the pipeline execution time for PRs. The reduction is 25-55s. Thank you for the improvements.
Force-pushed from 3533ba3 to 7761ec8
Force-pushed from 7761ec8 to 4c68d88
```python
# Track which indices we've already added detections for (to avoid duplicates
# in both Phase 1 and the fallback equal voting strategy)
alerted_indices = set()
last_alerted_index = -detection_index_tolerance - 1
```
It looks like this may have been added after my last review (I don't remember seeing it before at least). Just a note for future PRs, please try to keep the scope limited to make the reviews quicker and less error-prone.
Sure thing, I could still separate the new changes into a separate PR if you want.
```python
# Track which indices we've already added detections for (to avoid duplicates
# in both Phase 1 and the fallback equal voting strategy)
alerted_indices = set()
last_alerted_index = -detection_index_tolerance - 1
```
Also, why is there a negative sign now for detection_index_tolerance? This seems convoluted.
Basically, the negative sign is there so the very first data point doesn't get skipped: it sets the starting value far enough back that the proximity check never accidentally fires before any real alert has been created.
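To make the sentinel idea concrete, here is a minimal sketch of how such a proximity check could work. Only `alerted_indices`, `last_alerted_index`, and `detection_index_tolerance` come from the quoted diff; the surrounding loop and the function name are assumptions, not the actual Perfherder code.

```python
def collect_alert_indices(detection_indices, detection_index_tolerance):
    """Hypothetical sketch: keep detections that are far enough apart."""
    alerted_indices = set()
    # Sentinel: placed far enough back that the very first detection
    # always passes the proximity check below, even at index 0.
    last_alerted_index = -detection_index_tolerance - 1
    for index in sorted(detection_indices):
        # Skip detections too close to the previously alerted index.
        if index - last_alerted_index <= detection_index_tolerance:
            continue
        alerted_indices.add(index)
        last_alerted_index = index
    return sorted(alerted_indices)
```

With a tolerance of 2, the sentinel is -3, so index 0 passes (0 - (-3) = 3 > 2); without the extra `- 1`, an initial value of `-detection_index_tolerance` would wrongly suppress the first data point.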
It was reported here that adding the voting system caused a 12% increase in Python unit test execution time.
Upon investigation, it turned out that running 5 statistical detectors twice per signature (once per voting strategy) was the main bottleneck. Now, detection runs once and the result is shared. On top of that, the 5 detector objects were being reconstructed on every loop iteration across the series; pinning them to a module-level constant eliminates that overhead entirely. Both fixes compound across every test that touches alert generation.
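The two optimizations described above can be sketched as follows. Every name here (the detector class, `CPD_METHODS`, the strategy functions) is illustrative, not the actual Perfherder API: the point is running the detectors once per series and sharing the result across voting strategies, with the detector list built once at import time instead of per loop iteration.

```python
class ThresholdDetector:
    """Stand-in for an expensive statistical change detector."""

    def __init__(self, threshold):
        self.threshold = threshold

    def detect(self, series):
        # Return the set of indices this detector flags.
        return {i for i, v in enumerate(series) if v > self.threshold}


# Built once at import time, not reconstructed inside the per-signature loop.
CPD_METHODS = [ThresholdDetector(t) for t in (5, 10, 15, 20, 25)]


def detect_once(series):
    # One detection pass whose result every voting strategy reuses.
    return [m.detect(series) for m in CPD_METHODS]


def _vote_counts(detections):
    counts = {}
    for found in detections:
        for i in found:
            counts[i] = counts.get(i, 0) + 1
    return counts


def majority_vote(detections, quorum=3):
    # Alert only where at least `quorum` detectors agree.
    return sorted(i for i, c in _vote_counts(detections).items() if c >= quorum)


def equal_vote(detections):
    # Fallback strategy: a single detector's vote is enough.
    return sorted(i for i, c in _vote_counts(detections).items() if c >= 1)
```

Both strategies now consume the same `detect_once(series)` result, so the expensive detection step runs once instead of once per strategy.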
Changes in `alerts.py`:
- `prepare_series_for_detection` to isolate the expensive DB fetch + `detect_methods_changes` step from the voting logic so it could be reused across multiple strategy calls.
- `_trim_series_after` helper to encapsulate the timestamp-based series filtering, replacing an inline list comprehension.
- `build_cpd_methods()` is called only once at import time instead of on every alert generation cycle, removing redundant instantiation of all detector objects across `get_methods_detecting_at_index`, `create_alert`, and `prepare_series_for_detection`.

Changes in `tasks.py` are related to creating the series with detections running once at first and used subsequently for all voting systems in use.

Changes in `test_methods_alerts.py` are related to reducing the calls of `build_cpd_methods`.

@gmierz is there a way to measure whether there is indeed a reduction of python unit tests runtime given the fixes of this PR?