🛡️ Sentinel: [CRITICAL] Fix sensitive data exposure in logs#172
🛡️ Sentinel: [CRITICAL] Fix sensitive data exposure in logs#172
Conversation
🚨 Severity: CRITICAL 💡 Vulnerability: Credentials in URLs (Basic Auth and sensitive query parameters) were logged to stdout/stderr in plain text when using `sanitize_for_log`. 🎯 Impact: Attackers with access to logs could steal credentials or API tokens. 🔧 Fix: Updated `sanitize_for_log` in `main.py` to redact Basic Auth credentials and sensitive query parameters (token, key, secret, password, auth, access_token, api_key). ✅ Verification: Added new test case in `tests/test_log_sanitization.py`. Verified with reproduction script `repro_log_leak.py`. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Merging to
|
Summary of ChangesHello @abhimehro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request implements a crucial security enhancement by preventing the exposure of sensitive credentials and API tokens in application logs. It achieves this by significantly improving the log sanitization mechanism, ensuring that confidential information within URLs is automatically redacted, thereby mitigating a critical data leakage risk. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
| s = s.replace(TOKEN, "[REDACTED]") | ||
|
|
||
| # Redact Basic Auth in URLs (e.g. https://user:pass@host) | ||
| s = re.sub(r"://[^/@]+@", "://[REDACTED]@", s) |
Check warning
Code scanning / Pylint (reported by Codacy)
Variable name "s" doesn't conform to snake_case naming style Warning
|
|
||
| # Redact sensitive query parameters | ||
| sensitive_keys = r"token|key|secret|password|auth|access_token|api_key" | ||
| s = re.sub( |
Check warning
Code scanning / Pylint (reported by Codacy)
Variable name "s" doesn't conform to snake_case naming style Warning
| s = s.replace(TOKEN, "[REDACTED]") | ||
|
|
||
| # Redact Basic Auth in URLs (e.g. https://user:pass@host) | ||
| s = re.sub(r"://[^/@]+@", "://[REDACTED]@", s) |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Variable name "s" doesn't conform to snake_case naming style Warning
|
|
||
| # Redact sensitive query parameters | ||
| sensitive_keys = r"token|key|secret|password|auth|access_token|api_key" | ||
| s = re.sub( |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Variable name "s" doesn't conform to snake_case naming style Warning
There was a problem hiding this comment.
Code Review
This pull request addresses a critical security vulnerability by redacting sensitive credentials from logs. However, the implementation is incomplete as it misses sensitive data in URL fragments (e.g., #access_token=...) and fails to apply sanitize_for_log to exception objects (e) in several logging statements throughout main.py (e.g., lines 537, 632, 670). This means credentials could still leak in error logs, which the PR intended to prevent. Additionally, consider improving query parameter redaction to handle empty values and expanding the test suite to cover more edge cases.
| s = re.sub( | ||
| r"([?&])(" + sensitive_keys + r")=[^&\s]+", | ||
| r"\1\2=[REDACTED]", | ||
| s, | ||
| flags=re.IGNORECASE, | ||
| ) |
There was a problem hiding this comment.
The regex for redacting sensitive query parameters should also include the fragment separator #. Sensitive data like access_token is frequently passed in the URL fragment (after the # symbol) and will not be redacted by the current implementation. Additionally, the current regex [^&\s]+ doesn't handle cases where the value is empty (e.g., ?token=&...), which could lead to incomplete redaction. Consider changing + to * to correctly redact such cases.
| s = re.sub( | |
| r"([?&])(" + sensitive_keys + r")=[^&\s]+", | |
| r"\1\2=[REDACTED]", | |
| s, | |
| flags=re.IGNORECASE, | |
| ) | |
| s = re.sub( | |
| r"([?&#])(" + sensitive_keys + r")=[^&\s]+", | |
| r"\1\2=[REDACTED]", | |
| s, | |
| flags=re.IGNORECASE, | |
| ) |
| url_with_token = "https://example.com/folder.json?TOKEN=mytoken" | ||
| sanitized_token = main.sanitize_for_log(url_with_token) | ||
| self.assertNotIn("mytoken", sanitized_token) | ||
| self.assertIn("[REDACTED]", sanitized_token) |
There was a problem hiding this comment.
The current tests are a great start! To make them more robust, I'd suggest a few things:
- Make existing assertions more specific. For example, instead of
self.assertIn("[REDACTED]"), you could check for the full redacted key-value pair likeself.assertIn("secret=[REDACTED]"). - Add test cases for more scenarios, such as empty sensitive values and multiple sensitive parameters in one URL.
Here's a suggestion to add a couple more test cases:
| self.assertIn("[REDACTED]", sanitized_token) | |
| self.assertIn("[REDACTED]", sanitized_token) | |
| # Test multiple sensitive params and mixed params | |
| url_multi = "https://example.com/api?id=123&token=abc&name=user&api_key=def" | |
| sanitized_multi = main.sanitize_for_log(url_multi) | |
| self.assertIn("id=123", sanitized_multi) | |
| self.assertIn("name=user", sanitized_multi) | |
| self.assertNotIn("token=abc", sanitized_multi) | |
| self.assertNotIn("api_key=def", sanitized_multi) | |
| self.assertIn("token=[REDACTED]", sanitized_multi) | |
| self.assertIn("api_key=[REDACTED]", sanitized_multi) | |
| # Test empty sensitive param | |
| url_empty = "https://example.com/api?token=&id=123" | |
| sanitized_empty = main.sanitize_for_log(url_empty) | |
| self.assertIn("token=[REDACTED]", sanitized_empty) | |
| self.assertIn("id=123", sanitized_empty) |
There was a problem hiding this comment.
Pull request overview
This PR strengthens log sanitization to prevent leaking credentials embedded in URLs, aligning with the project’s existing focus on safe terminal/log output.
Changes:
- Extend
sanitize_for_logto redact Basic Auth credentials in URLs and redact common sensitive query parameters. - Add a unit test covering Basic Auth and sensitive query parameter redaction.
- Add a
.jules/sentinel.mdjournal stub for documenting critical security learnings.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
main.py |
Expands sanitize_for_log to redact URL credentials and sensitive query params. |
tests/test_log_sanitization.py |
Adds coverage for URL credential/query-param redaction behavior. |
.jules/sentinel.md |
Introduces a security journal template for future audit notes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sensitive_keys = r"token|key|secret|password|auth|access_token|api_key" | ||
| s = re.sub( | ||
| r"([?&])(" + sensitive_keys + r")=[^&\s]+", | ||
| r"\1\2=[REDACTED]", | ||
| s, | ||
| flags=re.IGNORECASE, | ||
| ) |
There was a problem hiding this comment.
The query-parameter redaction pattern ...=[^&\s]+ will also consume trailing punctuation when a URL is followed by characters like ), ., ,, etc. in the surrounding log message (because those chars are not & or whitespace). That can unintentionally remove non-secret context from logs. Consider parsing URLs with urllib.parse and rewriting query values, or tightening the regex so it stops at common URL terminators (e.g., # and trailing punctuation) while preserving surrounding text.
| # Redact Basic Auth in URLs (e.g. https://user:pass@host) | ||
| s = re.sub(r"://[^/@]+@", "://[REDACTED]@", s) | ||
|
|
||
| # Redact sensitive query parameters | ||
| sensitive_keys = r"token|key|secret|password|auth|access_token|api_key" | ||
| s = re.sub( | ||
| r"([?&])(" + sensitive_keys + r")=[^&\s]+", | ||
| r"\1\2=[REDACTED]", | ||
| s, | ||
| flags=re.IGNORECASE, | ||
| ) |
There was a problem hiding this comment.
PR description mentions verification with a repro_log_leak.py reproduction script, but that file isn’t included in this PR (and doesn’t appear to exist in the repository root). Either add the script to the PR (if it’s intended to be versioned) or update the PR description to avoid referring to a non-existent artifact.
| def sanitize_for_log(text: Any) -> str: | ||
| """Sanitize text for logging, ensuring TOKEN is redacted and control chars are escaped.""" | ||
| s = str(text) | ||
| if TOKEN and TOKEN in s: | ||
| s = s.replace(TOKEN, "[REDACTED]") | ||
|
|
||
| # Redact Basic Auth in URLs (e.g. https://user:pass@host) | ||
| s = re.sub(r"://[^/@]+@", "://[REDACTED]@", s) |
There was a problem hiding this comment.
sanitize_for_log’s docstring no longer reflects its behavior: it says it only redacts TOKEN/control chars, but the function now also redacts Basic Auth credentials and sensitive query parameters. Please update the docstring to describe the full set of redactions so callers know what guarantees it provides.
…itization, add dry-run plan details Incorporates the best changes from 36 Jules PRs, addressing review feedback: Bolt (Performance) - from PR #173: - Pre-compile PROFILE_ID_PATTERN and RULE_PATTERN at module level - Use compiled patterns in is_valid_profile_id_format, validate_profile_id, and is_valid_rule - Supersedes PRs: #140, #143, #152, #155, #158, #161, #167, #170, #173 Sentinel (Security) - from PR #172 with review feedback: - Enhance sanitize_for_log to redact Basic Auth credentials in URLs - Redact sensitive query parameters (token, key, secret, password, etc.) - Handle fragment separators (#) per Gemini Code Assist review - Use [^&#\s]* pattern per Copilot reviewer suggestion - Update docstring per reviewer suggestion - Supersedes PRs: #142, #145, #148, #151, #154, #157, #160, #169, #172 Palette (UX) - from PR #174 with lint fixes: - Add print_plan_details function for dry-run visibility - Fix duplicate render_progress_bar definition bug - Supersedes PRs: #139, #141, #144, #147, #150, #153, #156, #159, #162, #165, #168, #171, #174 Also: #146, #149, #164 (parallel folder deletion) and #166 (auto-fix .env perms) are independent features not consolidated here. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
…itization, add dry-run plan details Incorporates the best changes from 36 Jules PRs, addressing review feedback: Bolt (Performance) - from PR #173: - Pre-compile PROFILE_ID_PATTERN and RULE_PATTERN at module level - Use compiled patterns in is_valid_profile_id_format, validate_profile_id, and is_valid_rule - Supersedes PRs: #140, #143, #152, #155, #158, #161, #167, #170, #173 Sentinel (Security) - from PR #172 with review feedback: - Enhance sanitize_for_log to redact Basic Auth credentials in URLs - Redact sensitive query parameters (token, key, secret, password, etc.) - Handle fragment separators (#) per Gemini Code Assist review - Use [^&#\s]* pattern per Copilot reviewer suggestion - Update docstring per reviewer suggestion - Supersedes PRs: #142, #145, #148, #151, #154, #157, #160, #169, #172 Palette (UX) - from PR #174 with lint fixes: - Add print_plan_details function for dry-run visibility - Fix duplicate render_progress_bar definition bug - Supersedes PRs: #139, #141, #144, #147, #150, #153, #156, #159, #162, #165, #168, #171, #174 Also: #146, #149, #164 (parallel folder deletion) and #166 (auto-fix .env perms) are independent features not consolidated here. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
🚨 Severity: CRITICAL
💡 Vulnerability: Credentials in URLs (Basic Auth and sensitive query parameters) were logged to stdout/stderr in plain text when using
sanitize_for_log.🎯 Impact: Attackers with access to logs could steal credentials or API tokens.
🔧 Fix: Updated
sanitize_for_loginmain.pyto redact Basic Auth credentials and sensitive query parameters (token, key, secret, password, auth, access_token, api_key).✅ Verification: Added new test case in
tests/test_log_sanitization.py. Verified with reproduction scriptrepro_log_leak.py.PR created automatically by Jules for task 14313453863725986533 started by @abhimehro