⚡ Bolt: Pre-compile regex for validation functions#173
⚡ Bolt: Pre-compile regex for validation functions#173
Conversation
💡 What: Pre-compiled regex patterns for `is_valid_rule` and `is_valid_profile_id_format` at the module level. 🎯 Why: These functions are called repeatedly (thousands of times for rules), causing significant overhead from `re.match` recompiling or cache lookup. Pre-compiling saves CPU cycles. 📊 Impact: Reduces validation time by ~2.3x (benchmarked 0.0525s -> 0.0229s for 50k calls). 🔬 Measurement: Run a benchmark script comparing `re.match` vs compiled `pattern.match` in a loop. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Merging to
|
Summary of ChangesHello @abhimehro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the application's performance by optimizing regex-based validation. By pre-compiling regular expression patterns for critical functions like Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
| **Learning:** Re-validating resource properties (like DNS/IP) when using *cached content* is pure overhead. If the content is served from memory (proven safe at fetch time), checking the *current* state of the source is disconnected from the data being used. | ||
| **Action:** When using a multi-stage pipeline (Warmup -> Process), ensure validation state persists alongside the data cache. Avoid clearing validation caches between stages if the data cache is not also cleared. | ||
|
|
||
| ## 2025-02-24 - [Regex Compilation for Repeated Validation] |
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when references to undefined definitions are found. Note
| **Learning:** Re-validating resource properties (like DNS/IP) when using *cached content* is pure overhead. If the content is served from memory (proven safe at fetch time), checking the *current* state of the source is disconnected from the data being used. | ||
| **Action:** When using a multi-stage pipeline (Warmup -> Process), ensure validation state persists alongside the data cache. Avoid clearing validation caches between stages if the data cache is not also cleared. | ||
|
|
||
| ## 2025-02-24 - [Regex Compilation for Repeated Validation] |
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when shortcut reference links are used. Note
There was a problem hiding this comment.
Pull request overview
This PR improves performance in main.py by pre-compiling regexes used by validation helpers, and updates repo metadata/docs to reflect the optimization.
Changes:
- Pre-compile regex patterns for
is_valid_profile_id_format()andis_valid_rule()and switch validation to use the compiled patterns. - Minor
.python-versionformatting normalization. - Document the regex compilation learning in
.jules/bolt.md.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
main.py |
Introduces module-level compiled regex patterns and updates validation helpers to use them. |
.python-version |
Normalizes the Python version file formatting. |
.jules/bolt.md |
Adds a journal entry documenting the regex compilation performance win. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Compiled regex for performance | ||
| PROFILE_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$") |
There was a problem hiding this comment.
PROFILE_ID_PATTERN is introduced mid-file even though main.py already has a dedicated "1. Constants" section near the top (e.g., API_BASE, USER_AGENT). Consider moving this compiled regex (and the rule pattern below) into that constants block so module-level configuration stays centralized and easier to discover.
| def is_valid_profile_id_format(profile_id: str) -> bool: | ||
| if not re.match(r"^[a-zA-Z0-9_-]+$", profile_id): | ||
| if not PROFILE_ID_PATTERN.match(profile_id): | ||
| return False | ||
| if len(profile_id) > 64: | ||
| return False |
There was a problem hiding this comment.
validate_profile_id() still uses an inline re.match(r"^[a-zA-Z0-9_-]+$", profile_id) when logging errors, duplicating the regex now captured by PROFILE_ID_PATTERN. Reuse PROFILE_ID_PATTERN there as well so the pattern stays consistent and only needs to be updated in one place.
| ## 2026-01-27 - Redundant Validation for Cached Data | ||
| **Learning:** Re-validating resource properties (like DNS/IP) when using *cached content* is pure overhead. If the content is served from memory (proven safe at fetch time), checking the *current* state of the source is disconnected from the data being used. | ||
| **Action:** When using a multi-stage pipeline (Warmup -> Process), ensure validation state persists alongside the data cache. Avoid clearing validation caches between stages if the data cache is not also cleared. | ||
|
|
||
| ## 2025-02-24 - [Regex Compilation for Repeated Validation] | ||
| **Learning:** Pre-compiling regexes for functions called in tight loops (like `is_valid_rule` which runs on 10k+ items) yields a >2x performance improvement (0.0525s -> 0.0229s). | ||
| **Action:** Always pre-compile regexes used in validation loops. |
There was a problem hiding this comment.
This new journal entry is dated 2025-02-24 but is appended after a 2026-01-27 entry, which makes the journal timeline non-chronological. Either reorder the sections by date or adjust the date so entries remain in consistent chronological order.
| ## 2026-01-27 - Redundant Validation for Cached Data | |
| **Learning:** Re-validating resource properties (like DNS/IP) when using *cached content* is pure overhead. If the content is served from memory (proven safe at fetch time), checking the *current* state of the source is disconnected from the data being used. | |
| **Action:** When using a multi-stage pipeline (Warmup -> Process), ensure validation state persists alongside the data cache. Avoid clearing validation caches between stages if the data cache is not also cleared. | |
| ## 2025-02-24 - [Regex Compilation for Repeated Validation] | |
| **Learning:** Pre-compiling regexes for functions called in tight loops (like `is_valid_rule` which runs on 10k+ items) yields a >2x performance improvement (0.0525s -> 0.0229s). | |
| **Action:** Always pre-compile regexes used in validation loops. | |
| ## 2025-02-24 - [Regex Compilation for Repeated Validation] | |
| **Learning:** Pre-compiling regexes for functions called in tight loops (like `is_valid_rule` which runs on 10k+ items) yields a >2x performance improvement (0.0525s -> 0.0229s). | |
| **Action:** Always pre-compile regexes used in validation loops. | |
| ## 2026-01-27 - Redundant Validation for Cached Data | |
| **Learning:** Re-validating resource properties (like DNS/IP) when using *cached content* is pure overhead. If the content is served from memory (proven safe at fetch time), checking the *current* state of the source is disconnected from the data being used. | |
| **Action:** When using a multi-stage pipeline (Warmup -> Process), ensure validation state persists alongside the data cache. Avoid clearing validation caches between stages if the data cache is not also cleared. |
There was a problem hiding this comment.
Code Review
This pull request improves performance by pre-compiling regular expressions used in validation functions, which is a great optimization, especially for functions called in tight loops. The security review confirms that these changes do not introduce any new security vulnerabilities. A suggestion has been made to further improve the design around is_valid_profile_id_format by changing its return signature to eliminate redundant logic and duplicate checks in the calling function validate_profile_id, making the code more efficient and maintainable.
…itization, add dry-run plan details Incorporates the best changes from 36 Jules PRs, addressing review feedback: Bolt (Performance) - from PR #173: - Pre-compile PROFILE_ID_PATTERN and RULE_PATTERN at module level - Use compiled patterns in is_valid_profile_id_format, validate_profile_id, and is_valid_rule - Supersedes PRs: #140, #143, #152, #155, #158, #161, #167, #170, #173 Sentinel (Security) - from PR #172 with review feedback: - Enhance sanitize_for_log to redact Basic Auth credentials in URLs - Redact sensitive query parameters (token, key, secret, password, etc.) - Handle fragment separators (#) per Gemini Code Assist review - Use [^&#\s]* pattern per Copilot reviewer suggestion - Update docstring per reviewer suggestion - Supersedes PRs: #142, #145, #148, #151, #154, #157, #160, #169, #172 Palette (UX) - from PR #174 with lint fixes: - Add print_plan_details function for dry-run visibility - Fix duplicate render_progress_bar definition bug - Supersedes PRs: #139, #141, #144, #147, #150, #153, #156, #159, #162, #165, #168, #171, #174 Also: #146, #149, #164 (parallel folder deletion) and #166 (auto-fix .env perms) are independent features not consolidated here. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
…itization, add dry-run plan details Incorporates the best changes from 36 Jules PRs, addressing review feedback: Bolt (Performance) - from PR #173: - Pre-compile PROFILE_ID_PATTERN and RULE_PATTERN at module level - Use compiled patterns in is_valid_profile_id_format, validate_profile_id, and is_valid_rule - Supersedes PRs: #140, #143, #152, #155, #158, #161, #167, #170, #173 Sentinel (Security) - from PR #172 with review feedback: - Enhance sanitize_for_log to redact Basic Auth credentials in URLs - Redact sensitive query parameters (token, key, secret, password, etc.) - Handle fragment separators (#) per Gemini Code Assist review - Use [^&#\s]* pattern per Copilot reviewer suggestion - Update docstring per reviewer suggestion - Supersedes PRs: #142, #145, #148, #151, #154, #157, #160, #169, #172 Palette (UX) - from PR #174 with lint fixes: - Add print_plan_details function for dry-run visibility - Fix duplicate render_progress_bar definition bug - Supersedes PRs: #139, #141, #144, #147, #150, #153, #156, #159, #162, #165, #168, #171, #174 Also: #146, #149, #164 (parallel folder deletion) and #166 (auto-fix .env perms) are independent features not consolidated here. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Implemented pre-compiled regex for
is_valid_ruleandis_valid_profile_id_formatto improve performance. Verified with benchmarks showing >2x speedup.PR created automatically by Jules for task 11449181145803260853 started by @abhimehro