🛡️ Sentinel: Redact credentials from URLs in logs #160
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Summary of Changes

Hello @abhimehro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical security flaw where sensitive Basic Authentication credentials embedded in URLs were inadvertently logged in plain text. By introducing a regular expression-based redaction mechanism within the …

    # Redact credentials in URLs (e.g. https://user:pass@host)
    # Pattern: scheme://user:pass@host -> scheme://[REDACTED]@host
    s = re.sub(r"(https?://)[^/\s@]+@([^/\s]+)", r"\1[REDACTED]@\2", s)

Code scanning warning (Pylint, python3; reported by Codacy): Variable name "s" doesn't conform to snake_case naming style.
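A self-contained sketch of the same substitution, assuming it is pulled out into a hypothetical module-level helper (`URL_CREDENTIALS_PATTERN` and `redact_url_credentials` are illustrative names, not taken from the PR). Giving the string a descriptive name such as `text` instead of `s` should also satisfy the Pylint naming warning:

```python
import re

# Same pattern as the diff: the scheme, then userinfo made of characters that are
# not "/", whitespace or "@", then "@" and everything up to the next "/" or whitespace.
# Compiled once at import time instead of on every call.
URL_CREDENTIALS_PATTERN = re.compile(r"(https?://)[^/\s@]+@([^/\s]+)")


def redact_url_credentials(text: str) -> str:
    """Replace user:password in http(s) URLs found in text with [REDACTED]."""
    return URL_CREDENTIALS_PATTERN.sub(r"\1[REDACTED]@\2", text)


print(redact_url_credentials("Fetching https://user:pass@example.com/list.json"))
# Fetching https://[REDACTED]@example.com/list.json
```

Because the userinfo class excludes `/`, an `@` that only appears in the path (as in `https://example.com/a@b`) is left alone.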
Code Review
The pull request successfully addresses a critical security vulnerability by implementing credential redaction from URLs in logs. The addition of the .jules/sentinel.md file is a valuable step for documenting security findings and prevention strategies. The core change in main.py introduces a regular expression to sanitize URLs, which is a functional fix for the identified issue. My review focuses on enhancing the robustness and maintainability of the URL sanitization logic.
While the current regular expression redacts credentials in common URL formats, parsing the URL with `urllib.parse` (which is already imported) and rebuilding it with `urlunparse` would be more robust and maintainable. `urllib.parse` is designed specifically for URL parsing and is less likely than a hand-written regex to break on edge cases such as complex netloc structures, which makes the credential redaction more reliable.
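A minimal sketch of the parsing-based alternative this comment suggests, assuming the helper receives a single, already isolated URL rather than a whole log message (`redact_single_url` is a hypothetical name). It uses `urlsplit`/`urlunsplit`, the variants of `urlparse`/`urlunparse` that do not split out the legacy `;params` component:

```python
from urllib.parse import urlsplit, urlunsplit


def redact_single_url(url: str) -> str:
    """Return url with any userinfo (user:password@) replaced by [REDACTED]."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # no credentials present; leave the URL untouched
    # The host (and optional port) is everything after the LAST "@" in netloc,
    # so an unencoded "@" inside the password cannot leak into the output.
    host_and_port = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=f"[REDACTED]@{host_and_port}"))


print(redact_single_url("https://user:p@ss@example.com:8443/feed?x=1"))
# https://[REDACTED]@example.com:8443/feed?x=1
```

For comparison, the regex from the diff turns that same malformed input into `https://[REDACTED]@ss@example.com:8443/feed?x=1`, exposing part of the password. The trade-off is that this helper needs the URL extracted first, whereas the regex can scan free-form log text directly.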
Pull request overview
This PR addresses a security vulnerability where URLs containing Basic Authentication credentials (e.g., https://user:password@example.com) were being logged in plain text by the sanitize_for_log function.
Changes:
- Updated `sanitize_for_log` in `main.py` to use regex pattern matching to identify and redact credentials from HTTP/HTTPS URLs
- Added documentation in `.jules/sentinel.md` to record the vulnerability and prevention strategy
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| main.py | Added regex substitution to redact credentials from URLs before logging |
| .jules/sentinel.md | Documented the URL credential leakage vulnerability and prevention approach |

    ## 2025-05-23 - URL Credential Leakage in Logs
    **Vulnerability:** `sanitize_for_log` only redacted the API token but allowed URLs containing Basic Auth credentials (e.g. `https://user:pass@host`) to be logged in plain text.
    **Learning:** Sanitization functions often focus on known secrets (like specific tokens) but miss pattern-based leaks like standard URI credentials.
    **Prevention:** Always scrub user:password combinations from any URL before logging. Use regex or URL parsing libraries to identifying and redact the authority section.

Typo in the documentation: "identifying" should be "identify" so that it parallels "redact" in "Use regex or URL parsing libraries to identify and redact the authority section."
Suggested change:

    - **Prevention:** Always scrub user:password combinations from any URL before logging. Use regex or URL parsing libraries to identifying and redact the authority section.
    + **Prevention:** Always scrub user:password combinations from any URL before logging. Use regex or URL parsing libraries to identify and redact the authority section.

The new URL credential redaction functionality lacks test coverage. Consider adding a test to verify that URLs containing credentials (e.g., 'https://user:pass@host.com') are properly redacted to 'https://[REDACTED]@host.com' while URLs without credentials remain unchanged. This would follow the pattern established in tests/test_log_sanitization.py and tests/test_security.py.
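A sketch of the suggested tests, written as plain functions that pytest would collect. To keep it self-contained it exercises a stand-in `redact_url_credentials` that applies the PR's regex; in the repository the assertions would call `sanitize_for_log` from `main.py` instead (its exact call signature is an assumption here), and the test names are hypothetical rather than copied from `tests/test_log_sanitization.py`:

```python
import re


def redact_url_credentials(text: str) -> str:
    """Stand-in for the URL step of sanitize_for_log (hypothetical helper)."""
    return re.sub(r"(https?://)[^/\s@]+@([^/\s]+)", r"\1[REDACTED]@\2", text)


def test_url_credentials_are_redacted():
    cases = {
        "https://user:pass@host.com": "https://[REDACTED]@host.com",
        "http://user:pass@host.com/path": "http://[REDACTED]@host.com/path",
    }
    for raw, expected in cases.items():
        assert redact_url_credentials(raw) == expected


def test_urls_without_credentials_are_unchanged():
    # An "@" in the path is not userinfo and must not trigger redaction.
    for raw in ("https://host.com", "https://host.com/a@b"):
        assert redact_url_credentials(raw) == raw
```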
…itization, add dry-run plan details

Incorporates the best changes from 36 Jules PRs, addressing review feedback:

Bolt (Performance) - from PR #173:
- Pre-compile PROFILE_ID_PATTERN and RULE_PATTERN at module level
- Use compiled patterns in is_valid_profile_id_format, validate_profile_id, and is_valid_rule
- Supersedes PRs: #140, #143, #152, #155, #158, #161, #167, #170, #173

Sentinel (Security) - from PR #172 with review feedback:
- Enhance sanitize_for_log to redact Basic Auth credentials in URLs
- Redact sensitive query parameters (token, key, secret, password, etc.)
- Handle fragment separators (#) per Gemini Code Assist review
- Use `[^&#\s]*` pattern per Copilot reviewer suggestion
- Update docstring per reviewer suggestion
- Supersedes PRs: #142, #145, #148, #151, #154, #157, #160, #169, #172

Palette (UX) - from PR #174 with lint fixes:
- Add print_plan_details function for dry-run visibility
- Fix duplicate render_progress_bar definition bug
- Supersedes PRs: #139, #141, #144, #147, #150, #153, #156, #159, #162, #165, #168, #171, #174

Also: #146, #149, #164 (parallel folder deletion) and #166 (auto-fix .env perms) are independent features not consolidated here.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
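The query-parameter redaction in the Sentinel commit bullets can be sketched as follows. This is an illustration only: the parameter list is taken from the commit message's examples (token, key, secret, password; the "etc." is not expanded), and `SENSITIVE_QUERY_PATTERN` and `redact_query_params` are hypothetical names, not the code that was merged. The value class `[^&#\s]*` stops at the next `&`, at a `#` fragment separator, or at whitespace, which covers the two review points the commit mentions:

```python
import re

# [?&] anchors on a parameter boundary, so "apikey=" or "monkey=" is not matched.
# [^&#\s]* consumes the value up to the next parameter, fragment, or whitespace.
SENSITIVE_QUERY_PATTERN = re.compile(
    r"([?&](?:token|key|secret|password)=)[^&#\s]*",
    re.IGNORECASE,
)


def redact_query_params(text: str) -> str:
    """Replace the values of sensitive query parameters with [REDACTED]."""
    return SENSITIVE_QUERY_PATTERN.sub(r"\1[REDACTED]", text)


print(redact_query_params("GET https://example.com/list?token=abc123&page=2#top"))
# GET https://example.com/list?token=[REDACTED]&page=2#top
```

The parameter name is kept in its original case (for example `&Password=`), so the redacted log still shows which parameter was present.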
Vulnerability: URLs containing Basic Auth credentials (e.g., `https://user:password@example.com`) were logged in plain text by `sanitize_for_log`, exposing sensitive information if a user provided such a URL (e.g. via `FOLDER_URLS`).

Fix: Updated `sanitize_for_log` in `main.py` to use a regular expression that identifies and replaces the credential part of HTTP/HTTPS URLs with `[REDACTED]`.

Verification:
- … `https://user:password@host` is logged as `https://[REDACTED]@host`.
- … (`uv run pytest`) to ensure no regressions.
- … `re` module is imported (it was already).

PR created automatically by Jules for task 3352341414052035153 started by @abhimehro