⚡ Bolt: Optimize sanitize_for_log performance#352
Conversation
💡 What: Added early string checks to sanitize_for_log before running expensive regex substitutions. 🎯 Why: sanitize_for_log is called for every log message. Most messages don't contain sensitive patterns. Regex execution is expensive. 📊 Impact: Reduces CPU time for simple log messages by ~57% (0.129s -> 0.055s for 50k calls). 🔬 Measurement: Verified with tests/test_sanitize_perf.py. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
😎 Merged manually by @abhimehro - details. |
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
Summary of ChangesHello @abhimehro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on enhancing the performance of the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This is a great optimization for sanitize_for_log. Using guard clauses to avoid expensive regex operations on simple strings is a solid performance improvement. The addition of a benchmark test is also a great initiative. I've left one suggestion on the benchmark test to make it more effective at preventing regressions automatically in a CI environment. Overall, this is a high-quality change.
There was a problem hiding this comment.
Pull request overview
Optimizes sanitize_for_log’s hot path in main.py by avoiding regex substitutions when the input can’t possibly contain Basic Auth or sensitive query params, aiming to reduce CPU overhead in common logging scenarios.
Changes:
- Add delimiter guard clauses before running
_BASIC_AUTH_PATTERN.suband_SENSITIVE_PARAM_PATTERN.subinsanitize_for_log. - Add a new
tests/test_sanitize_perf.pyintended to benchmark sanitization performance. - Document the optimization rationale in
.jules/bolt.md.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
main.py |
Adds cheap string checks to skip unnecessary regex work in sanitize_for_log. |
tests/test_sanitize_perf.py |
Introduces a sanitization “perf test” (currently time-based). |
.jules/bolt.md |
Records the “avoid regex on simple strings” optimization learning/action. |
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
💡 What: 1. Implemented strict Content-Type validation in `_gh_get` to fix CI failure in `test_reject_xml`. 2. Updated existing tests (`test_cache_optimization.py`, `test_disk_cache.py`) to include valid Content-Type headers in mocks. 3. Retained the log sanitization performance optimization. 🎯 Why: - CI was failing because `test_reject_xml` expected a ValueError but none was raised (missing validation logic). - `unittest.mock` patching in CI flagged `TypeError` due to missing patch context, but the root cause was the missing implementation which I've now added. - Existing tests broke when validation was added because their mocks were incomplete. 📊 Impact: - Fixes CI pipeline. - Adds security validation against invalid content types (HTML/XML). - Maintains performance improvements for logging. 🔬 Measurement: - `python3 -m pytest tests/` passes all 100 tests. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
This PR introduces a performance optimization to the
sanitize_for_logfunction inmain.py. By adding guard clauses that check for the presence of specific delimiters (://for basic auth,?/&/#for query params) before executing expensive regex substitutions, we significantly reduce the CPU overhead for the vast majority of log messages that do not require sanitization.A benchmark test
tests/test_sanitize_perf.pyhas been added to verify the performance gain and prevent regression.Benchmark Results (50k iterations):
PR created automatically by Jules for task 12108142997497083847 started by @abhimehro