
🛡️ Sentinel: Enforce strict Content-Type validation #344

Merged
abhimehro merged 7 commits into main from sentinel/content-type-validation-2047347595139016233
Feb 19, 2026

@abhimehro
Owner

🛡️ Sentinel: Security Enhancement - Enforce Content-Type validation

Context:
The application fetches blocklists from remote URLs and parses each response as JSON, but it previously did not validate the Content-Type header. A non-JSON response, such as an HTML captive-portal or error page, would either fail with a confusing JSON parse error or, if the body happened to parse as JSON, be processed as blocklist data.
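To make the "confusing errors" case concrete, here is a minimal standard-library sketch. The HTML body is an invented example, not a captured portal response. When such a page reaches the JSON parser, the only diagnostic is a parse position, with no hint that the server returned HTML.

```python
import json

# Invented example of a page a captive portal might serve instead of the blocklist.
portal_body = "<html><body>Please sign in to continue</body></html>"

try:
    json.loads(portal_body)
except json.JSONDecodeError as exc:
    # The decoder reports where parsing failed, not why the content is wrong.
    print(f"JSONDecodeError: {exc.msg} (line {exc.lineno}, column {exc.colno})")
```

Here the message is "Expecting value" at line 1, column 1. A Content-Type check can instead name the URL and the offending header value.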

Changes:

  • Modified _gh_get in main.py to enforce that the Content-Type header contains one of: application/json, text/json, or text/plain.
  • If the content type is invalid (e.g., text/html), the request is rejected with a clear ValueError, preventing further processing.
  • Updated existing tests in tests/test_cache_optimization.py and tests/test_disk_cache.py to include valid Content-Type headers in their mocks.
  • Added tests/test_content_type.py to verify the new validation logic.
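The check described in the first bullet can be sketched as a standalone function. This is a hedged illustration, not the code from main.py. The name `check_content_type` is hypothetical, headers are assumed to be a plain dict with lowercase keys (httpx's `Headers` is case-insensitive, a plain dict is not), and the project's `sanitize_for_log` call is left out. Like the PR, it uses a case-insensitive substring match, so parameters such as `; charset=utf-8` pass and a missing header is rejected.

```python
from typing import Mapping

# Allowed media types, mirroring the list in the PR description.
ALLOWED_TYPES = ("application/json", "text/json", "text/plain")


def check_content_type(headers: Mapping[str, str], url: str) -> None:
    """Raise ValueError unless Content-Type contains an allowed type.

    Hypothetical helper: `headers` is assumed to use lowercase keys.
    """
    # A missing header becomes "", which matches nothing and is rejected.
    ct = headers.get("content-type", "").lower()
    if not any(t in ct for t in ALLOWED_TYPES):
        raise ValueError(
            f"Invalid Content-Type from {url}: {ct}. "
            f"Expected one of: {', '.join(ALLOWED_TYPES)}"
        )


# Usage: JSON with a charset parameter passes; an HTML page is rejected.
check_content_type(
    {"content-type": "application/json; charset=utf-8"},
    "https://example.com/list.json",
)
try:
    check_content_type({"content-type": "text/html"}, "https://example.com/portal")
except ValueError as exc:
    print(exc)
```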

Impact:

  • Prevents the application from processing responses whose Content-Type is not JSON or plain text.
  • Improves error reporting for misconfigured URLs or network interception (captive portals).
  • Enhances defense-in-depth for input validation.

PR created automatically by Jules for task 2047347595139016233 started by @abhimehro

- Added Content-Type validation to `_gh_get` in `main.py`.
- Updated existing tests to comply with new validation.
- Added new test file `tests/test_content_type.py`.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 10:59
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@trunk-io

trunk-io bot commented Feb 18, 2026

😎 Merged manually by @abhimehro.

@github-actions

👋 Development Partner is reviewing this PR. Will provide feedback shortly.

@gemini-code-assist

Summary of Changes

Hello @abhimehro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial security enhancement by enforcing Content-Type validation when the application fetches remote blocklists. Previously, the system would parse any content as JSON if it was syntactically valid, even if the Content-Type header indicated something else. This change prevents the processing of potentially malicious or unintended content, such as HTML from a captive portal, thereby improving the application's robustness and security posture against unexpected network responses.

Highlights

  • Security Enhancement: Implemented strict Content-Type validation within the _gh_get function to prevent the application from processing unexpected content types when fetching blocklists.
  • Content-Type Enforcement: The _gh_get function now checks whether the Content-Type header contains application/json, text/json, or text/plain, raising a ValueError for any other type.
  • Test Updates: Existing tests in test_cache_optimization.py and test_disk_cache.py were updated to include valid Content-Type headers in their mock responses to align with the new validation.
  • New Test Suite: A new test file, test_content_type.py, was added to specifically verify the correct behavior of the Content-Type validation logic, including both allowed and rejected types.
Changelog
  • main.py
    • Added Content-Type validation logic in _gh_get for both initial requests and retries.
    • Introduced a ValueError if the Content-Type header does not match allowed types (application/json, text/json, text/plain).
  • tests/test_cache_optimization.py
    • Imported the httpx library.
    • Updated mock HTTP responses to explicitly include Content-Type: application/json in headers.
  • tests/test_content_type.py
    • Added a new test file to verify Content-Type validation.
    • Included tests to ensure application/json and text/plain are allowed.
    • Added tests to confirm text/html and application/xml are correctly rejected with a ValueError.
  • tests/test_disk_cache.py
    • Imported the httpx library.
    • Updated mock HTTP responses to explicitly include Content-Type: application/json in headers.

Copilot AI left a comment


Pull request overview

This PR enhances security by enforcing strict Content-Type validation when fetching blocklist data from remote URLs, preventing the application from processing unexpected content types like HTML from captive portals or error pages.

Changes:

  • Added Content-Type header validation to the _gh_get function in both the normal response path and the 304 retry path
  • Updated existing tests to include valid Content-Type headers in their mocks
  • Added comprehensive test coverage for the new validation logic

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

  • main.py: Added Content-Type validation logic (lines 941-948 and 1006-1014) to enforce that responses contain application/json, text/json, or text/plain
  • tests/test_content_type.py: New test file verifying that Content-Type validation accepts valid types (application/json, text/plain with charset) and rejects invalid types (text/html, application/xml)
  • tests/test_cache_optimization.py: Updated mock response to include a Content-Type: application/json header
  • tests/test_disk_cache.py: Updated mock response to include a Content-Type: application/json header


gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable security enhancement by enforcing Content-Type validation when fetching remote blocklists. The implementation is correct and is accompanied by good test coverage, including a new dedicated test file. My review includes a couple of suggestions to improve maintainability by reducing code duplication in main.py and to enhance the readability of the new tests.

Comment on lines +1006 to +1014
        # Security: Enforce Content-Type to be JSON or text
        # This prevents processing of unexpected content (e.g., HTML from captive portals)
        ct = r.headers.get("content-type", "").lower()
        allowed_types = ("application/json", "text/json", "text/plain")
        if not any(t in ct for t in allowed_types):
            raise ValueError(
                f"Invalid Content-Type from {sanitize_for_log(url)}: {ct}. "
                f"Expected one of: {', '.join(allowed_types)}"
            )


medium

This Content-Type validation logic is nearly identical to the block on lines 941-948. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider extracting this logic into a helper function that can be called from both locations.
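One way to act on this suggestion, sketched under assumptions: the duplicated block moves into a single helper that both the normal response path and the 304 retry path call. `_validate_content_type`, `FakeResponse`, and `fetch` are hypothetical stand-ins, not code from main.py, and headers are assumed to be a dict with lowercase keys. As a design alternative to the PR's substring match, this variant parses the media type (the part before `;`) and compares it exactly, which also rejects values such as `text/html; note=application/json` or `application/jsonp`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

ALLOWED_TYPES = frozenset({"application/json", "text/json", "text/plain"})


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)
    text: str = ""


def _validate_content_type(response: FakeResponse, url: str) -> None:
    """Single shared check, called from every path that reads a body."""
    raw = response.headers.get("content-type", "")
    # Keep only the media type, dropping parameters such as "; charset=utf-8".
    media_type = raw.split(";", 1)[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        raise ValueError(
            f"Invalid Content-Type from {url}: {raw!r}. "
            f"Expected one of: {', '.join(sorted(ALLOWED_TYPES))}"
        )


def fetch(url: str, send: Callable[[bool], FakeResponse]) -> str:
    """Hypothetical flow: conditional request first, full retry on 304."""
    response = send(True)  # conditional request
    if response.status_code == 304:
        retry = send(False)  # retry path
        _validate_content_type(retry, url)
        return retry.text
    _validate_content_type(response, url)  # normal path
    return response.text
```

With the check in one place, tightening or loosening the allowed types later is a single edit rather than two that can drift apart.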

Comment on lines +67 to +80
        try:
            main._gh_get("https://example.com/malicious.html")
            # If it doesn't raise, we fail the test (once fixed)
            # But for TDD, we expect this to fail AFTER the fix.
            # For now, let's assert that it *should* raise ValueError
        except ValueError as e:
            self.assertIn("Invalid Content-Type", str(e))
            return

        # If we are here, no exception was raised.
        # This confirms the vulnerability (or lack of validation).
        # We can mark this as "expected failure" or just print it.
        # For now, I'll fail the test so I can see it pass later.
        self.fail("Should have raised ValueError for text/html Content-Type")


medium

Now that the validation logic is implemented, this test can be simplified. Using the assertRaises context manager, as you've done in test_reject_xml, would make this test more concise and readable. The comments related to TDD are also no longer necessary.

        with self.assertRaises(ValueError) as cm:
            main._gh_get("https://example.com/malicious.html")
        self.assertIn("Invalid Content-Type", str(cm.exception))
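A self-contained sketch of how the whole test module could look once every rejection case uses `assertRaises`. It does not import `main`; instead it defines a hypothetical `fake_gh_get` that applies the same substring check as the PR to an injected header value, so the test structure runs on its own. `subTest` keeps one failing media type from hiding the others. Run it with `python -m unittest`.

```python
import unittest

ALLOWED_TYPES = ("application/json", "text/json", "text/plain")


def fake_gh_get(url: str, content_type: str) -> dict:
    """Hypothetical stand-in for main._gh_get with the header injected."""
    ct = content_type.lower()
    if not any(t in ct for t in ALLOWED_TYPES):
        raise ValueError(f"Invalid Content-Type from {url}: {ct}")
    return {}  # placeholder for the parsed blocklist


class TestContentType(unittest.TestCase):
    def test_allowed_types(self):
        for ct in ("application/json", "text/plain; charset=utf-8"):
            with self.subTest(content_type=ct):
                self.assertEqual(fake_gh_get("https://example.com/list.json", ct), {})

    def test_rejected_types(self):
        for ct in ("text/html", "application/xml"):
            with self.subTest(content_type=ct):
                with self.assertRaises(ValueError) as cm:
                    fake_gh_get("https://example.com/malicious", ct)
                self.assertIn("Invalid Content-Type", str(cm.exception))
```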

abhimehro and others added 2 commits February 18, 2026 13:10
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@abhimehro abhimehro merged commit b6e9558 into main Feb 19, 2026
13 of 15 checks passed
@abhimehro abhimehro deleted the sentinel/content-type-validation-2047347595139016233 branch February 19, 2026 00:41