⚡ Bolt: Optimized telemetry state caching and secret redaction#169
⚡ Bolt: Optimized telemetry state caching and secret redaction#169heidi-dang wants to merge 1 commit intofeat/bootstrap-scaffoldfrom
Conversation
- Fixed NameError in get_state by removing redundant/broken cache check. - Implemented write-through caching in save_state to eliminate disk I/O on subsequent reads. - Pre-compiled SECRET_PATTERNS to reduce regex overhead in redact_secrets. - Added thread-safe mtime-based caching for load_pricing_config with 5s negative cache for missing files. - Ensured cache isolation using deepcopy for all pricing configuration returns. Performance Impact: - redact_secrets (with secrets): ~34% improvement (0.24s -> 0.16s for 10k calls). - get_state (cache hit): ~28ms for 10k calls (vs disk I/O). - load_pricing_config: significant reduction in syscalls and JSON parsing.
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the performance and efficiency of the telemetry module by introducing several critical optimizations. It addresses potential Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces several valuable performance optimizations to the telemetry module, such as pre-compiling regular expressions for secret redaction and implementing caching for pricing configurations and run state. The changes are well-reasoned and are accompanied by a new benchmark script to validate the improvements. My review focuses on a couple of areas where the implementation can be made more robust and Pythonic. Specifically, I've suggested a refinement to the caching logic in load_pricing_config and an improvement to the timer used in the new benchmark script for more accurate measurements.
| try: | ||
| with open(pricing_file) as f: | ||
| custom = json.load(f) | ||
| pricing.update(custom) | ||
| if pricing_file.exists(): | ||
| mtime = pricing_file.stat().st_mtime | ||
| if abs_path in _pricing_cache and _pricing_mtimes.get(abs_path) == mtime: | ||
| return copy.deepcopy(_pricing_cache[abs_path]) | ||
|
|
||
| with open(pricing_file) as f: | ||
| pricing = DEFAULT_PRICING.copy() | ||
| custom = json.load(f) | ||
| pricing.update(custom) | ||
| _pricing_cache[abs_path] = pricing | ||
| _pricing_mtimes[abs_path] = mtime | ||
| return copy.deepcopy(pricing) | ||
| else: | ||
| _pricing_mtimes[abs_path] = -1.0 | ||
| _pricing_check_ts[abs_path] = now | ||
| except Exception as e: | ||
| print(f"[WARN] Failed to load pricing config: {e}", file=sys.stderr) |
There was a problem hiding this comment.
The current implementation for loading the pricing configuration uses a 'Look Before You Leap' (LBYL) approach with pricing_file.exists(). This can be simplified and made more robust by adopting an 'Easier to Ask for Forgiveness than Permission' (EAFP) approach using a try...except FileNotFoundError block. This change would improve readability, be more Pythonic, and avoid a potential race condition between checking for the file's existence and opening it, while also being slightly more performant by reducing a syscall.
| try: | |
| with open(pricing_file) as f: | |
| custom = json.load(f) | |
| pricing.update(custom) | |
| if pricing_file.exists(): | |
| mtime = pricing_file.stat().st_mtime | |
| if abs_path in _pricing_cache and _pricing_mtimes.get(abs_path) == mtime: | |
| return copy.deepcopy(_pricing_cache[abs_path]) | |
| with open(pricing_file) as f: | |
| pricing = DEFAULT_PRICING.copy() | |
| custom = json.load(f) | |
| pricing.update(custom) | |
| _pricing_cache[abs_path] = pricing | |
| _pricing_mtimes[abs_path] = mtime | |
| return copy.deepcopy(pricing) | |
| else: | |
| _pricing_mtimes[abs_path] = -1.0 | |
| _pricing_check_ts[abs_path] = now | |
| except Exception as e: | |
| print(f"[WARN] Failed to load pricing config: {e}", file=sys.stderr) | |
| try: | |
| mtime = pricing_file.stat().st_mtime | |
| if abs_path in _pricing_cache and _pricing_mtimes.get(abs_path) == mtime: | |
| return copy.deepcopy(_pricing_cache[abs_path]) | |
| with open(pricing_file) as f: | |
| pricing = DEFAULT_PRICING.copy() | |
| custom = json.load(f) | |
| pricing.update(custom) | |
| _pricing_cache[abs_path] = pricing | |
| _pricing_mtimes[abs_path] = mtime | |
| return copy.deepcopy(pricing) | |
| except FileNotFoundError: | |
| _pricing_mtimes[abs_path] = -1.0 | |
| _pricing_check_ts[abs_path] = now | |
| except Exception as e: | |
| print(f"[WARN] Failed to load pricing config: {e}", file=sys.stderr) |
| start = time.time() | ||
| for _ in range(10000): | ||
| redact_secrets(text_no_secrets) | ||
| end = time.time() |
There was a problem hiding this comment.
For more accurate performance measurement, it's better to use time.perf_counter() instead of time.time(). time.perf_counter() provides a high-resolution monotonic clock that is not affected by system time changes, making it ideal for benchmarking short-duration operations. This change should be applied to all benchmarks in this file.
| start = time.time() | |
| for _ in range(10000): | |
| redact_secrets(text_no_secrets) | |
| end = time.time() | |
| start = time.perf_counter() | |
| for _ in range(10000): | |
| redact_secrets(text_no_secrets) | |
| end = time.perf_counter() |
Optimized the telemetry module in
heidi_engine/telemetry.pyby fixing aNameErrorand implementing several performance-critical caching and regex optimizations. Specifically, I added a write-through cache for the run state, pre-compiled regex patterns for secret redaction, and implemented mtime-based caching for the pricing configuration. These changes measurably reduce disk I/O and CPU overhead in the telemetry hot-paths.PR created automatically by Jules for task 7177115978715405411 started by @heidi-dang