🛡️ Sentinel: [HIGH] Fix XSS by sanitizing go-readability output with bluemonday #135

Open
lucasew wants to merge 5 commits into master from sentinel/fix-xss-bluemonday-1476203048334350304

Conversation

@lucasew
Owner

@lucasew lucasew commented Feb 7, 2026

Severity

High

Vulnerability

The application uses go-readability to parse and clean up web articles. However, go-readability is not a strict security sanitizer and may preserve dangerous HTML attributes (like onerror) or schemes (like javascript:). This allows a malicious or compromised website to serve content that, when processed by our API and rendered to a user, executes arbitrary JavaScript in the user's browser (Stored/Reflected XSS).

Impact

An attacker can craft a URL that, when viewed through this reader application, executes malicious scripts. This could lead to session hijacking, phishing, or other client-side attacks against users of the reader.

Fix

Implemented strict HTML sanitization using github.com/microcosm-cc/bluemonday with the UGCPolicy. The sanitization is applied immediately after the go-readability rendering step, ensuring that the contentBuf used for all output formats (HTML, JSON, Markdown) is free of XSS vectors.

Verification

Added api/xss_test.go, which:

  1. Sets up a mock server returning HTML with <img onerror=...> and <a href="javascript:...">.
  2. Bypasses SSRF protection for the test environment.
  3. Asserts that the processed output does not contain onerror or javascript:.

Run go test ./api to verify.
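A minimal sketch of the shape such a test could take, using only the standard library. The names maliciousHTML, newMockServer, fetchBody, and xssMarkers are illustrative assumptions, not identifiers from api/xss_test.go, and the sketch fetches the mock page directly instead of going through the real handler or sanitizer.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maliciousHTML mimics the payload described in the verification steps
// (hypothetical content, not copied from the PR).
const maliciousHTML = `<html><body><article>
<p>Hello</p>
<img src="x" onerror="alert(1)">
<a href="javascript:alert(2)">click</a>
</article></body></html>`

// newMockServer returns a local test server that always serves maliciousHTML.
func newMockServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		_, _ = io.WriteString(w, maliciousHTML)
	}))
}

// fetchBody downloads url and returns the response body as a string.
func fetchBody(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// xssMarkers reports which of the two substrings checked by the test
// ("onerror" and "javascript:") appear in output, case-insensitively.
func xssMarkers(output string) []string {
	var found []string
	lower := strings.ToLower(output)
	for _, m := range []string{"onerror", "javascript:"} {
		if strings.Contains(lower, m) {
			found = append(found, m)
		}
	}
	return found
}

func main() {
	srv := newMockServer()
	defer srv.Close()

	body, err := fetchBody(srv.URL)
	if err != nil {
		panic(err)
	}
	// The raw, unsanitized page still carries both markers; a real test
	// would run the handler and expect this list to be empty.
	fmt.Println(xssMarkers(body))
}
```

A substring check like this is a regression guard, not proof of safety: it only catches the two vectors it names.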

PR created automatically by Jules for task 1476203048334350304 started by @lucasew

- Adds github.com/microcosm-cc/bluemonday dependency
- Initializes global htmlSanitizer with UGCPolicy
- Sanitizes contentBuf in handler before formatting
- Adds api/xss_test.go to verify fix and prevent regression
- Updates .jules/sentinel.md with learning

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@vercel

vercel bot commented Feb 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| articleparser | Ready | Preview, Comment | Feb 7, 2026 9:38am |

@gemini-code-assist

Summary of Changes

Hello @lucasew, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a critical security enhancement by introducing robust HTML sanitization to mitigate a high-severity Cross-Site Scripting (XSS) vulnerability. By integrating the bluemonday library, the application now strictly cleanses all content processed by go-readability before rendering, effectively preventing malicious scripts from executing in users' browsers. This significantly improves the security posture of the application, protecting users from potential client-side attacks.

Highlights

  • Vulnerability Fix: Addresses a high-severity Cross-Site Scripting (XSS) vulnerability where go-readability output could contain dangerous HTML attributes (onerror) or schemes (javascript:), allowing arbitrary JavaScript execution.
  • Sanitization Implementation: Integrates github.com/microcosm-cc/bluemonday with its UGCPolicy to strictly sanitize HTML content immediately after go-readability processing.
  • Comprehensive Protection: Ensures that the contentBuf used for all output formats (HTML, JSON, Markdown) is free of XSS vectors, protecting users from client-side attacks.
  • New Test Coverage: Adds api/xss_test.go to specifically verify the XSS prevention by setting up a mock server with malicious content and asserting its removal from the processed output.

Changelog

  • .jules/sentinel.md
    • Added a new entry documenting the XSS fix and the use of bluemonday for HTML sanitization.
  • api/index.go
    • Imported the github.com/microcosm-cc/bluemonday package.
    • Initialized a global htmlSanitizer instance using bluemonday.UGCPolicy().
    • Applied the htmlSanitizer to the contentBuf after go-readability processing to remove XSS vectors.
  • api/xss_test.go
    • Added a new test file to verify XSS prevention.
    • Includes a mock HTTP server serving HTML with onerror and javascript: attributes.
    • Asserts that the processed output no longer contains these XSS vectors.
  • go.mod
    • Added github.com/microcosm-cc/bluemonday as a new dependency.
    • Updated codeberg.org/readeck/go-readability/v2 from v2.1.0 to v2.1.1.
    • Updated golang.org/x/net from v0.48.0 to v0.49.0.
  • go.sum
    • Updated checksums for new and updated dependencies.

Activity

  • The pull request was automatically generated by Jules for a task initiated by @lucasew.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively mitigates a high-severity XSS vulnerability by sanitizing HTML output using bluemonday. The approach is sound, and the inclusion of a dedicated test case is excellent for verifying the fix. My review includes a couple of suggestions for improvement: one to enhance the efficiency and readability of the sanitization logic, and another to address a potential race condition in the new test code to improve test suite robustness. Overall, this is a well-executed security enhancement.

api/index.go Outdated
Comment on lines +500 to +502
sanitized := htmlSanitizer.SanitizeBytes(contentBuf.Bytes())
contentBuf.Reset()
contentBuf.Write(sanitized)

medium

This sequence of sanitizing, resetting, and writing to the buffer can be simplified and made more idiomatic. SanitizeReader reads from the original buffer and returns a new buffer containing the sanitized content, which is cleaner than modifying the buffer in place and avoids multiple separate operations.

Suggested change
- sanitized := htmlSanitizer.SanitizeBytes(contentBuf.Bytes())
- contentBuf.Reset()
- contentBuf.Write(sanitized)
+ contentBuf = htmlSanitizer.SanitizeReader(contentBuf)
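The two buffer-handling styles compared in this comment can be sketched without pulling in bluemonday. The sanitizer interface below mirrors the two Policy methods named in this thread (SanitizeBytes and SanitizeReader). naiveStripper is a hypothetical stand-in that only makes the sketch runnable; it is NOT a real sanitizer and must not be used as one. The sketch also assumes contentBuf is a *bytes.Buffer, which the one-line reassignment needs in order to compile; the PR page does not show how contentBuf is declared.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// sanitizer mirrors the two method shapes discussed in the review comment.
type sanitizer interface {
	SanitizeBytes(b []byte) []byte
	SanitizeReader(r io.Reader) *bytes.Buffer
}

// naiveStripper is a hypothetical stand-in for a real HTML sanitizer.
// It removes one fixed attribute string and nothing else: NOT safe.
type naiveStripper struct{}

func (naiveStripper) SanitizeBytes(b []byte) []byte {
	return bytes.ReplaceAll(b, []byte(` onerror="alert(1)"`), nil)
}

func (s naiveStripper) SanitizeReader(r io.Reader) *bytes.Buffer {
	raw, _ := io.ReadAll(r) // error ignored for brevity in this sketch
	return bytes.NewBuffer(s.SanitizeBytes(raw))
}

// sanitizeInPlace is the original three-step form: the same buffer
// is emptied and refilled, so every holder of buf sees clean content.
func sanitizeInPlace(s sanitizer, buf *bytes.Buffer) {
	clean := s.SanitizeBytes(buf.Bytes())
	buf.Reset()
	buf.Write(clean)
}

// demo runs both styles on the same input and reports whether the
// reader-based style drained the buffer it read from.
func demo() (inPlace, reassigned string, drained bool) {
	const in = `<img src="x" onerror="alert(1)">`
	s := naiveStripper{}

	a := bytes.NewBufferString(in)
	sanitizeInPlace(s, a)

	b := bytes.NewBufferString(in)
	c := s.SanitizeReader(b) // suggested form: a new buffer is returned
	return a.String(), c.String(), b.Len() == 0
}

func main() {
	inPlace, reassigned, drained := demo()
	fmt.Println(inPlace == reassigned, drained)
}
```

One design point worth noting: the reader-based form consumes the source buffer, so any other code still holding the old pointer would see it empty, whereas the in-place form keeps a single buffer identity.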

Comment on lines +13 to +17
originalClient := httpClient
httpClient = &http.Client{
Timeout: httpClientTimeout,
}
defer func() { httpClient = originalClient }()

medium

Modifying the global httpClient variable here is not safe for parallel execution. If tests are run in parallel in the future (e.g., with t.Parallel()), this could cause data races. To make this test safe for parallelism, you should protect the modification of the global variable with a mutex. A longer-term solution would be to refactor the code to avoid a global HTTP client and instead use dependency injection.
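The dependency-injection option mentioned above could look roughly like the following sketch. The fetcher type, newFetcher, and demo are hypothetical names introduced for illustration; the project's actual handler structure is not shown on this page, and the 10-second default timeout is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetcher holds its HTTP client as a field instead of reading a
// package-level variable, so each test can supply its own client.
type fetcher struct {
	client *http.Client
}

// newFetcher falls back to a default client when none is injected.
func newFetcher(c *http.Client) *fetcher {
	if c == nil {
		c = &http.Client{Timeout: 10 * time.Second} // assumed default
	}
	return &fetcher{client: c}
}

// fetch downloads url using the injected client.
func (f *fetcher) fetch(url string) (string, error) {
	resp, err := f.client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// demo shows how a test would inject a client without touching any
// shared state, which keeps it safe to run alongside other tests.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		_, _ = io.WriteString(w, "ok")
	}))
	defer srv.Close()

	f := newFetcher(srv.Client())
	return f.fetch(srv.URL)
}

func main() {
	body, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Because nothing global is swapped and restored, no mutex or deferred cleanup of the client is needed in tests written this way.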

- Adds github.com/microcosm-cc/bluemonday dependency
- Initializes global htmlSanitizer with UGCPolicy
- Sanitizes contentBuf in handler before formatting
- Adds api/xss_test.go to verify fix and prevent regression
- Updates .jules/sentinel.md with learning
- Fixes unused parameter lint error in api/xss_test.go

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
- Adds github.com/microcosm-cc/bluemonday dependency
- Initializes global htmlSanitizer with UGCPolicy
- Sanitizes contentBuf in handler before formatting
- Adds api/xss_test.go to verify fix and prevent regression
- Updates .jules/sentinel.md with learning
- Fixes unused parameter lint error in api/xss_test.go
- Runs go mod tidy to update dependencies correctly

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
- Adds github.com/microcosm-cc/bluemonday dependency
- Initializes global htmlSanitizer with UGCPolicy
- Sanitizes contentBuf in handler before formatting
- Adds api/xss_test.go to verify fix and prevent regression
- Updates .jules/sentinel.md with learning
- Fixes unused parameter lint error in api/xss_test.go
- Runs go mod tidy to update dependencies correctly

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
- Adds github.com/microcosm-cc/bluemonday dependency
- Initializes global htmlSanitizer with UGCPolicy
- Sanitizes contentBuf in handler before formatting
- Adds api/xss_test.go to verify fix and prevent regression
- Updates .jules/sentinel.md with learning
- Fixes unused parameter lint error in api/xss_test.go
- Runs go mod tidy to update dependencies correctly
- Explicitly ignores errors in tests and handler to satisfy strict linters

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>