fix(xss): strip all C0 control chars from URI schemes per WHATWG spec by mendarb · Pull Request #7280 · projectdiscovery/nuclei

mendarb · 2026-03-20T22:06:28Z

Summary

Fixes a bypass in the XSS context analyzer's dangerous URI scheme detection where only tab (0x09), LF (0x0A), and CR (0x0D) were stripped from scheme prefixes before comparison
Per the WHATWG URL spec, browsers silently strip all C0 control characters (0x00-0x1F) and DEL (0x7F) from URI schemes, meaning attackers could bypass detection with payloads like java\x00script:, java\x0Cscript:, \x07javascript:, etc.
Adds stripC0Controls() using strings.Map to remove the full C0 range (0x00-0x1F) plus DEL (0x7F) before scheme prefix matching
Adds 9 new test cases covering NULL byte, form feed, vertical tab, bell, backspace, DEL, multiple mixed C0 chars, vbscript with C0, and data: URI with C0

Context

This is a v2 of #7279 which was closed because the Neo security bot correctly identified that the original fix only handled tab/LF/CR and missed the broader C0 control character range.

Fixes #7086

Test plan

All 69 existing + new tests pass (go test ./pkg/fuzz/analyzers/xss/...)
New tests specifically cover: NULL (0x00), vertical tab (0x0B), form feed (0x0C), bell (0x07), backspace (0x08), DEL (0x7F), multiple mixed C0 chars, vbscript with C0, data:text/html with C0
Verify no regressions in broader test suite

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved XSS detection for URL-like attributes by normalizing and filtering control characters so dangerous URI schemes are consistently recognized.
Tests
- Expanded test coverage with cases asserting correct classification when control characters are embedded in URI-like values.

…spec The previous implementation only stripped tab (0x09), LF (0x0A), and CR (0x0D) from URI scheme prefixes before comparison. However, per the WHATWG URL specification, browsers silently strip ALL C0 control characters (0x00-0x1F) and DEL (0x7F) from URI schemes. This means an attacker could bypass the dangerous-scheme check by inserting characters like NULL (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07), backspace (0x08), or DEL (0x7F) into the scheme prefix. Add stripC0Controls() that removes the full C0 range plus DEL before scheme comparison, and add comprehensive test coverage for each bypass vector. Fixes projectdiscovery#7086 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

neo-by-projectdiscovery-dev · 2026-03-20T22:06:36Z

Neo - PR Security Review

No security issues found

Highlights

Incremental diff analysis shows no code changes between the two commits
The stripC0Controls() implementation correctly removes all C0 control characters (0x00-0x1F) and DEL (0x7F)
All 9 test cases covering NULL byte, form feed, vertical tab, bell, backspace, DEL, and mixed C0 characters remain present

Hardening Notes

✅ Implementation continues to correctly match WHATWG URL specification for C0 control character stripping
✅ Order of operations remains browser-accurate: TrimSpace → ToLower → stripC0Controls → HasPrefix check
✅ Test coverage remains comprehensive across the full C0 range (0x00-0x1F) plus DEL (0x7F)

_{Comment @pdneo help for available commands. · Open in Neo}

coderabbitai · 2026-03-20T22:06:46Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e13a7921-1790-40a0-8184-ba9ce8e77c9a

📥 Commits

Reviewing files that changed from the base of the PR and between 077daf2 and 780664d.

📒 Files selected for processing (1)

pkg/fuzz/analyzers/xss/analyzer_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

pkg/fuzz/analyzers/xss/analyzer_test.go

Walkthrough

Added C0 control character filtering (U+0000–U+001F and U+007F) to URL attribute normalization in the XSS analyzer so scheme-prefix checks (e.g., javascript:, vbscript:, data:) operate on values with those control characters removed. Tests were added to cover these cases.

Changes

Cohort / File(s)	Summary
XSS analyzer & tests `pkg/fuzz/analyzers/xss/analyzer.go`, `pkg/fuzz/analyzers/xss/analyzer_test.go`	Added `stripC0Controls` helper that removes C0 control characters and DEL from normalized attribute values before `HasPrefix` checks for `javascript:`, `vbscript:`, and certain `data:` URIs. Extended tests with multiple cases embedding C0 bytes in `href`/`src` values to assert `ContextScript` classification.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I nibble bytes both bold and small,

I strip the hidden control-char sprawl.
With C0s gone from scheme and link,
No sneaky script can hop or slink.
A rabbit's fix — quick, sure, and small.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR partially addresses issue `#7086` by fixing the C0 control character bypass in XSS context detection, but does not implement all four requirements from the linked issue.	Address remaining requirements: case-insensitive reflection detection, proper classification of script type="application/json", and srcdoc attribute handling as HTML injection context.
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: stripping C0 control characters from URI schemes in the XSS analyzer, aligning with WHATWG spec.
Out of Scope Changes check	✅ Passed	All changes are focused on fixing C0 control character handling in URI scheme detection, which is directly related to the XSS context analyzer bypass mentioned in issue `#7086`.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

pkg/fuzz/analyzers/xss/analyzer_test.go (1)
225-279: Comprehensive test coverage for C0 control character bypass scenarios.

The new test cases effectively cover:

Various C0 code points (NULL, FF, VT, BEL, BS, DEL)

Control characters at different positions (prefix, mid-scheme)

All three dangerous URI schemes (javascript:, vbscript:, data:text/html)

Mixed sequences to ensure cumulative stripping works

One optional addition: explicit tests for tab (0x09), LF (0x0A), and CR (0x0D) embedded mid-scheme (e.g., java\x09script:) would serve as regression tests, since the PR summary mentions these were previously only handled at boundaries by TrimSpace.

,
💡 Optional: add regression tests for tab/LF/CR mid-scheme
{
    name:     "tab (0x09) mid-scheme in javascript: URI",
    body:     "<a href=\"java\x09script:alert('FUZZ1337MARKER')\">xss</a>",
    marker:   marker,
    expected: ContextScript,
},
{
    name:     "LF (0x0A) mid-scheme in javascript: URI",
    body:     "<a href=\"java\x0ascript:alert('FUZZ1337MARKER')\">xss</a>",
    marker:   marker,
    expected: ContextScript,
},
{
    name:     "CR (0x0D) mid-scheme in javascript: URI",
    body:     "<a href=\"java\x0dscript:alert('FUZZ1337MARKER')\">xss</a>",
    marker:   marker,
    expected: ContextScript,
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer_test.go` around lines 225 - 279, Add three
regression test cases to the same test table in analyzer_test.go alongside the
existing C0 control-char cases: "tab (0x09) mid-scheme in javascript: URI", "LF
(0x0A) mid-scheme in javascript: URI", and "CR (0x0D) mid-scheme in javascript:
URI"; set each body to embed the respective escape (e.g., java\x09script:) using
the existing marker variable and expected to ContextScript so they exercise the
same detection path as the other cases (place them near the other javascript:
entries in the test slice).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/fuzz/analyzers/xss/analyzer_test.go`:
- Around line 225-279: Add three regression test cases to the same test table in
analyzer_test.go alongside the existing C0 control-char cases: "tab (0x09)
mid-scheme in javascript: URI", "LF (0x0A) mid-scheme in javascript: URI", and
"CR (0x0D) mid-scheme in javascript: URI"; set each body to embed the respective
escape (e.g., java\x09script:) using the existing marker variable and expected
to ContextScript so they exercise the same detection path as the other cases
(place them near the other javascript: entries in the test slice).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e26fb27b-9b53-4613-bf7a-e792b6b70301

📥 Commits

Reviewing files that changed from the base of the PR and between 48e05ca and 077daf2.

📒 Files selected for processing (2)

pkg/fuzz/analyzers/xss/analyzer.go
pkg/fuzz/analyzers/xss/analyzer_test.go

Addresses CodeRabbit review suggestion to add explicit regression tests for tab (0x09), LF (0x0A), and CR (0x0D) embedded mid-scheme, since these were previously only handled at boundaries by TrimSpace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

themavik

stripC0Controls before the scheme prefix checks matches WHATWG stripping; tests cover the interesting C0 cases. nit: any other branches in this analyzer that classify javascript:/data: without going through this trimmed path could still be bypassed the same way.

auto-assign bot requested a review from dogancanbakir March 20, 2026 22:06

coderabbitai bot reviewed Mar 20, 2026

View reviewed changes

themavik reviewed Mar 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(xss): strip all C0 control chars from URI schemes per WHATWG spec#7280

fix(xss): strip all C0 control chars from URI schemes per WHATWG spec#7280
mendarb wants to merge 2 commits intoprojectdiscovery:devfrom
mendarb:fix/xss-context-analyzer-v2

mendarb commented Mar 20, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

neo-by-projectdiscovery-dev bot commented Mar 20, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 20, 2026 •

edited

Loading

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Uh oh!

themavik left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mendarb commented Mar 20, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Context

Test plan

Summary by CodeRabbit

Uh oh!

neo-by-projectdiscovery-dev bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Neo - PR Security Review

Highlights

Uh oh!

coderabbitai bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

themavik left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mendarb commented Mar 20, 2026 •

edited by coderabbitai bot

Loading

neo-by-projectdiscovery-dev bot commented Mar 20, 2026 •

edited

Loading

coderabbitai bot commented Mar 20, 2026 •

edited

Loading