fix(xss): strip all C0 control chars from URI schemes per WHATWG spec#7280
fix(xss): strip all C0 control chars from URI schemes per WHATWG spec#7280mendarb wants to merge 2 commits intoprojectdiscovery:devfrom
Conversation
…spec The previous implementation only stripped tab (0x09), LF (0x0A), and CR (0x0D) from URI scheme prefixes before comparison. However, per the WHATWG URL specification, browsers silently strip ALL C0 control characters (0x00-0x1F) and DEL (0x7F) from URI schemes. This means an attacker could bypass the dangerous-scheme check by inserting characters like NULL (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07), backspace (0x08), or DEL (0x7F) into the scheme prefix. Add stripC0Controls() that removes the full C0 range plus DEL before scheme comparison, and add comprehensive test coverage for each bypass vector. Fixes projectdiscovery#7086 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Neo - PR Security ReviewNo security issues found Highlights
Hardening Notes
Comment |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdded C0 control character filtering (U+0000–U+001F and U+007F) to URL attribute normalization in the XSS analyzer so scheme-prefix checks (e.g., Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/analyzer_test.go (1)
225-279: Comprehensive test coverage for C0 control character bypass scenarios.The new test cases effectively cover:
- Various C0 code points (NULL, FF, VT, BEL, BS, DEL)
- Control characters at different positions (prefix, mid-scheme)
- All three dangerous URI schemes (javascript:, vbscript:, data:text/html)
- Mixed sequences to ensure cumulative stripping works
One optional addition: explicit tests for tab (0x09), LF (0x0A), and CR (0x0D) embedded mid-scheme (e.g.,
java\x09script:) would serve as regression tests, since the PR summary mentions these were previously only handled at boundaries byTrimSpace.,
💡 Optional: add regression tests for tab/LF/CR mid-scheme
{ name: "tab (0x09) mid-scheme in javascript: URI", body: "<a href=\"java\x09script:alert('FUZZ1337MARKER')\">xss</a>", marker: marker, expected: ContextScript, }, { name: "LF (0x0A) mid-scheme in javascript: URI", body: "<a href=\"java\x0ascript:alert('FUZZ1337MARKER')\">xss</a>", marker: marker, expected: ContextScript, }, { name: "CR (0x0D) mid-scheme in javascript: URI", body: "<a href=\"java\x0dscript:alert('FUZZ1337MARKER')\">xss</a>", marker: marker, expected: ContextScript, },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/analyzer_test.go` around lines 225 - 279, Add three regression test cases to the same test table in analyzer_test.go alongside the existing C0 control-char cases: "tab (0x09) mid-scheme in javascript: URI", "LF (0x0A) mid-scheme in javascript: URI", and "CR (0x0D) mid-scheme in javascript: URI"; set each body to embed the respective escape (e.g., java\x09script:) using the existing marker variable and expected to ContextScript so they exercise the same detection path as the other cases (place them near the other javascript: entries in the test slice).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/fuzz/analyzers/xss/analyzer_test.go`:
- Around line 225-279: Add three regression test cases to the same test table in
analyzer_test.go alongside the existing C0 control-char cases: "tab (0x09)
mid-scheme in javascript: URI", "LF (0x0A) mid-scheme in javascript: URI", and
"CR (0x0D) mid-scheme in javascript: URI"; set each body to embed the respective
escape (e.g., java\x09script:) using the existing marker variable and expected
to ContextScript so they exercise the same detection path as the other cases
(place them near the other javascript: entries in the test slice).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e26fb27b-9b53-4613-bf7a-e792b6b70301
📒 Files selected for processing (2)
pkg/fuzz/analyzers/xss/analyzer.gopkg/fuzz/analyzers/xss/analyzer_test.go
Addresses CodeRabbit review suggestion to add explicit regression tests for tab (0x09), LF (0x0A), and CR (0x0D) embedded mid-scheme, since these were previously only handled at boundaries by TrimSpace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
themavik
left a comment
There was a problem hiding this comment.
stripC0Controls before the scheme prefix checks matches WHATWG stripping; tests cover the interesting C0 cases. nit: any other branches in this analyzer that classify javascript:/data: without going through this trimmed path could still be bypassed the same way.
Summary
java\x00script:,java\x0Cscript:,\x07javascript:, etc.stripC0Controls()usingstrings.Mapto remove the full C0 range (0x00-0x1F) plus DEL (0x7F) before scheme prefix matchingContext
This is a v2 of #7279 which was closed because the Neo security bot correctly identified that the original fix only handled tab/LF/CR and missed the broader C0 control character range.
Fixes #7086
Test plan
go test ./pkg/fuzz/analyzers/xss/...)🤖 Generated with Claude Code
Summary by CodeRabbit