Skip to content

fix(xss): strip all C0 control chars from URI schemes per WHATWG spec#7280

Open
mendarb wants to merge 2 commits intoprojectdiscovery:devfrom
mendarb:fix/xss-context-analyzer-v2
Open

fix(xss): strip all C0 control chars from URI schemes per WHATWG spec#7280
mendarb wants to merge 2 commits intoprojectdiscovery:devfrom
mendarb:fix/xss-context-analyzer-v2

Conversation

@mendarb
Copy link

@mendarb mendarb commented Mar 20, 2026

Summary

  • Fixes a bypass in the XSS context analyzer's dangerous URI scheme detection where only tab (0x09), LF (0x0A), and CR (0x0D) were stripped from scheme prefixes before comparison
  • Per the WHATWG URL spec, browsers silently strip all C0 control characters (0x00-0x1F) and DEL (0x7F) from URI schemes, meaning attackers could bypass detection with payloads like java\x00script:, java\x0Cscript:, \x07javascript:, etc.
  • Adds stripC0Controls() using strings.Map to remove the full C0 range (0x00-0x1F) plus DEL (0x7F) before scheme prefix matching
  • Adds 9 new test cases covering NULL byte, form feed, vertical tab, bell, backspace, DEL, multiple mixed C0 chars, vbscript with C0, and data: URI with C0

Context

This is a v2 of #7279 which was closed because the Neo security bot correctly identified that the original fix only handled tab/LF/CR and missed the broader C0 control character range.

Fixes #7086

Test plan

  • All 69 existing + new tests pass (go test ./pkg/fuzz/analyzers/xss/...)
  • New tests specifically cover: NULL (0x00), vertical tab (0x0B), form feed (0x0C), bell (0x07), backspace (0x08), DEL (0x7F), multiple mixed C0 chars, vbscript with C0, data:text/html with C0
  • Verify no regressions in broader test suite

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved XSS detection for URL-like attributes by normalizing and filtering control characters so dangerous URI schemes are consistently recognized.
  • Tests
    • Expanded test coverage with cases asserting correct classification when control characters are embedded in URI-like values.

…spec

The previous implementation only stripped tab (0x09), LF (0x0A), and
CR (0x0D) from URI scheme prefixes before comparison. However, per the
WHATWG URL specification, browsers silently strip ALL C0 control
characters (0x00-0x1F) and DEL (0x7F) from URI schemes. This means an
attacker could bypass the dangerous-scheme check by inserting characters
like NULL (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07),
backspace (0x08), or DEL (0x7F) into the scheme prefix.

Add stripC0Controls() that removes the full C0 range plus DEL before
scheme comparison, and add comprehensive test coverage for each bypass
vector.

Fixes projectdiscovery#7086

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@auto-assign auto-assign bot requested a review from dogancanbakir March 20, 2026 22:06
@neo-by-projectdiscovery-dev
Copy link

neo-by-projectdiscovery-dev bot commented Mar 20, 2026

Neo - PR Security Review

No security issues found

Highlights

  • Incremental diff analysis shows no code changes between the two commits
  • The stripC0Controls() implementation correctly removes all C0 control characters (0x00-0x1F) and DEL (0x7F)
  • All 9 test cases covering NULL byte, form feed, vertical tab, bell, backspace, DEL, and mixed C0 characters remain present
Hardening Notes
  • ✅ Implementation continues to correctly match WHATWG URL specification for C0 control character stripping
  • ✅ Order of operations remains browser-accurate: TrimSpace → ToLower → stripC0Controls → HasPrefix check
  • ✅ Test coverage remains comprehensive across the full C0 range (0x00-0x1F) plus DEL (0x7F)

Comment @pdneo help for available commands. · Open in Neo

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e13a7921-1790-40a0-8184-ba9ce8e77c9a

📥 Commits

Reviewing files that changed from the base of the PR and between 077daf2 and 780664d.

📒 Files selected for processing (1)
  • pkg/fuzz/analyzers/xss/analyzer_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/fuzz/analyzers/xss/analyzer_test.go

Walkthrough

Added C0 control character filtering (U+0000–U+001F and U+007F) to URL attribute normalization in the XSS analyzer so scheme-prefix checks (e.g., javascript:, vbscript:, data:) operate on values with those control characters removed. Tests were added to cover these cases.

Changes

Cohort / File(s) Summary
XSS analyzer & tests
pkg/fuzz/analyzers/xss/analyzer.go, pkg/fuzz/analyzers/xss/analyzer_test.go
Added stripC0Controls helper that removes C0 control characters and DEL from normalized attribute values before HasPrefix checks for javascript:, vbscript:, and certain data: URIs. Extended tests with multiple cases embedding C0 bytes in href/src values to assert ContextScript classification.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I nibble bytes both bold and small,

I strip the hidden control-char sprawl.
With C0s gone from scheme and link,
No sneaky script can hop or slink.
A rabbit's fix — quick, sure, and small.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR partially addresses issue #7086 by fixing the C0 control character bypass in XSS context detection, but does not implement all four requirements from the linked issue. Address remaining requirements: case-insensitive reflection detection, proper classification of script type="application/json", and srcdoc attribute handling as HTML injection context.
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: stripping C0 control characters from URI schemes in the XSS analyzer, aligning with WHATWG spec.
Out of Scope Changes check ✅ Passed All changes are focused on fixing C0 control character handling in URI scheme detection, which is directly related to the XSS context analyzer bypass mentioned in issue #7086.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/analyzer_test.go (1)

225-279: Comprehensive test coverage for C0 control character bypass scenarios.

The new test cases effectively cover:

  • Various C0 code points (NULL, FF, VT, BEL, BS, DEL)
  • Control characters at different positions (prefix, mid-scheme)
  • All three dangerous URI schemes (javascript:, vbscript:, data:text/html)
  • Mixed sequences to ensure cumulative stripping works

One optional addition: explicit tests for tab (0x09), LF (0x0A), and CR (0x0D) embedded mid-scheme (e.g., java\x09script:) would serve as regression tests, since the PR summary mentions these were previously only handled at boundaries by TrimSpace.

,

💡 Optional: add regression tests for tab/LF/CR mid-scheme
{
    name:     "tab (0x09) mid-scheme in javascript: URI",
    body:     "<a href=\"java\x09script:alert('FUZZ1337MARKER')\">xss</a>",
    marker:   marker,
    expected: ContextScript,
},
{
    name:     "LF (0x0A) mid-scheme in javascript: URI",
    body:     "<a href=\"java\x0ascript:alert('FUZZ1337MARKER')\">xss</a>",
    marker:   marker,
    expected: ContextScript,
},
{
    name:     "CR (0x0D) mid-scheme in javascript: URI",
    body:     "<a href=\"java\x0dscript:alert('FUZZ1337MARKER')\">xss</a>",
    marker:   marker,
    expected: ContextScript,
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer_test.go` around lines 225 - 279, Add three
regression test cases to the same test table in analyzer_test.go alongside the
existing C0 control-char cases: "tab (0x09) mid-scheme in javascript: URI", "LF
(0x0A) mid-scheme in javascript: URI", and "CR (0x0D) mid-scheme in javascript:
URI"; set each body to embed the respective escape (e.g., java\x09script:) using
the existing marker variable and expected to ContextScript so they exercise the
same detection path as the other cases (place them near the other javascript:
entries in the test slice).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/fuzz/analyzers/xss/analyzer_test.go`:
- Around line 225-279: Add three regression test cases to the same test table in
analyzer_test.go alongside the existing C0 control-char cases: "tab (0x09)
mid-scheme in javascript: URI", "LF (0x0A) mid-scheme in javascript: URI", and
"CR (0x0D) mid-scheme in javascript: URI"; set each body to embed the respective
escape (e.g., java\x09script:) using the existing marker variable and expected
to ContextScript so they exercise the same detection path as the other cases
(place them near the other javascript: entries in the test slice).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e26fb27b-9b53-4613-bf7a-e792b6b70301

📥 Commits

Reviewing files that changed from the base of the PR and between 48e05ca and 077daf2.

📒 Files selected for processing (2)
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/analyzer_test.go

Addresses CodeRabbit review suggestion to add explicit regression tests
for tab (0x09), LF (0x0A), and CR (0x0D) embedded mid-scheme, since
these were previously only handled at boundaries by TrimSpace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@themavik themavik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripC0Controls before the scheme prefix checks matches WHATWG stripping; tests cover the interesting C0 cases. nit: any other branches in this analyzer that classify javascript:/data: without going through this trimmed path could still be bypassed the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

XSS Context Analyzer misclassifies javascript: URIs and JSON script blocks

2 participants