Update rawbody.go - Fix broken encoding due to mishandling Content-Type #35
gunir wants to merge 2 commits into ZenPrivacy:master from
Fix: ZenPrivacy#34

This is a classic "mojibake" issue caused by double-decoding. The artifacts you are seeing (`â†’` instead of `→`) occur when UTF-8 bytes are misinterpreted as Windows-1252 (Latin-1).

The function `getRawBodyReader` relies on `golang.org/x/net/html/charset.NewReader` to handle character encoding. When the upstream website (e.g., data-star.dev) returns `Content-Type: text/html` without an explicit `charset=utf-8` parameter, the Go charset library defaults to Windows-1252 (for compliance with legacy HTML). The proxy takes the valid UTF-8 bytes (e.g., `E2 86 92` for `→`), "decodes" them as Windows-1252 (producing `â`, `†`, `’`), and then re-encodes them as UTF-8 for the output. The browser receives this re-encoded text and displays `â†’`.

Both `StreamRewrite` and `BufferRewrite` unconditionally set a `charset=utf-8` Content-Type header on the rewritten response, because they assume `getRawBodyReader` has already converted the content to UTF-8.

Original response: `Content-Type: text/html` (no charset specified).
`getRawBodyReader`: sees no charset. The `golang.org/x/net/html/charset` library defaults to Windows-1252 for compatibility, so it reads the valid UTF-8 bytes from the server as if they were Windows-1252 bytes. Result: UTF-8 `→` (bytes `E2 86 92`) becomes the string `â†’`.
`StreamRewrite`: sets `Content-Type: text/html; charset=utf-8`.
Browser: sees the `charset=utf-8` header and faithfully renders the string `â†’` as those three characters, instead of the intended arrow.

Signed-off-by: Gunir <134402102+gunir@users.noreply.github.com>
Estimated code review effort: 3 (Moderate), ~25 minutes. Pre-merge checks: 5 of 5 passed.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
httprewrite/rawbody.go (1)
110-116: Critical: `body` is nil in `multiCloser` – will panic on `Close()`.

The `multiCloser` includes `body`, which is `nil`. When the returned reader is closed, `multiCloser.Close()` will attempt to call `Close()` on a nil interface, causing a panic.

🐛 Proposed fix

```diff
 	return struct {
 		io.Reader
 		io.Closer
 	}{
 		decodedReader,
-		&multiCloser{[]io.Closer{decompressedReader, body}},
+		&multiCloser{[]io.Closer{decompressedReader, res.Body}},
 	}, mimeType, nil
```
🤖 Fix all issues with AI agents
In `@httprewrite/rawbody.go`:
- Around lines 79-80: The code uses an undefined variable `header` when reading headers (`encoding := header.Get("Content-Encoding")` and `contentType := header.Get("Content-Type")`), causing a compile error; update those calls to use the response's header (`res.Header.Get(...)`) so they reference the existing `res.Header`, keeping the same variable names `encoding` and `contentType`.
- Around lines 86-91: The early return uses the named return variable `body`, which is never assigned and is therefore nil; change the pass-through branch so it returns the actual response body (`res.Body`) to callers instead of the zero-valued `body`, either by assigning `res.Body` to `body` before returning or by returning the response body directly, along with `mimeType` and a nil error, in the branch that checks `charsetParam`, `encoding`, and `params`.
- Around lines 93-96: The code passes the uninitialized (nil) variable `body` into `decompressReader`, which will fail; initialize it from the actual response body (e.g., `res.Body`) before calling `decompressReader(encoding)`, handle errors, and close any readers to avoid leaks.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
httprewrite/rawbody.go
🔇 Additional comments (2)
httprewrite/rawbody.go (2)
82-84: LGTM on the fallback handling. Falling back to `text/plain` when Content-Type parsing fails is reasonable. Accessing `params["charset"]` on a nil map safely returns an empty string in Go.
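The fallback and the nil-map read can be checked in a few lines. This sketch uses the standard library's `mime.ParseMediaType`; `parseContentType` is a hypothetical helper name, since the real logic is inline in `rawbody.go`.

```go
package main

import (
	"fmt"
	"mime"
)

// parseContentType mirrors the fallback the review describes: unparsable
// Content-Type values are treated as text/plain.
func parseContentType(contentType string) (mimeType, charsetParam string) {
	mt, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		// On error, params may be nil; indexing a nil map still returns "".
		mt = "text/plain"
	}
	return mt, params["charset"]
}

func main() {
	for _, ct := range []string{"text/html; charset=utf-8", "text/html", ""} {
		mimeType, charsetParam := parseContentType(ct)
		fmt.Printf("%q -> %q, charset %q\n", ct, mimeType, charsetParam)
	}
}
```

An empty header fails to parse and lands on `text/plain` with an empty charset, while a bare `text/html` parses fine but also yields an empty charset, which is exactly the case issue #34 is about.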
98-102: Good fix for the Windows-1252 default issue. This correctly forces `charset=utf-8` when the upstream response has compression but no explicit charset, preventing `charset.NewReader` from defaulting to Windows-1252 per HTML5 legacy behavior. This addresses the root cause described in issue #34.
Update rawbody.go - Fix some variable mismatches

Signed-off-by: Gunir <134402102+gunir@users.noreply.github.com>
```go
// [FIX 2] If we reach here (because of compression), ensure we don't
// let charset.NewReader default to Windows-1252 if charset is missing.
if charsetParam == "" {
	contentType = mimeType + "; charset=utf-8"
}
```
This will unfortunately prevent charset.NewReader from performing a prescan algorithm to find a potential <meta charset> element, see:

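The precedence the maintainer is pointing at can be modeled with a toy function. This is a deliberately simplified sketch, not the library's code: `sniffEncoding` and its regular expression are hypothetical, it omits steps of the real `charset.DetermineEncoding` such as BOM detection, and it keeps only the ordering that matters here, namely that a `charset` parameter in the Content-Type argument is honored before any `<meta>` prescan runs.

```go
package main

import (
	"bytes"
	"fmt"
	"mime"
	"regexp"
)

// metaCharset is a crude, hypothetical stand-in for the HTML5 prescan. It
// only spots "charset=..." inside a <meta> tag and ignores many edge cases.
var metaCharset = regexp.MustCompile(`(?i)<meta[^>]*charset\s*=\s*["']?([a-z0-9._-]+)`)

// sniffEncoding models only the ordering relevant to the review comment:
// a charset parameter in contentType wins, and the <meta> prescan runs only
// when no such parameter is present.
func sniffEncoding(content []byte, contentType string) string {
	if len(content) > 1024 {
		content = content[:1024] // only a prefix of the document is examined
	}
	if _, params, err := mime.ParseMediaType(contentType); err == nil {
		if cs := params["charset"]; cs != "" {
			return cs
		}
	}
	if m := metaCharset.FindSubmatch(content); m != nil {
		return string(bytes.ToLower(m[1]))
	}
	return "windows-1252" // this toy's fallback when nothing is declared
}

func main() {
	page := []byte(`<html><head><meta charset="Shift_JIS"></head><body>...</body></html>`)

	// Upstream header without charset: the <meta> declaration is found.
	fmt.Println(sniffEncoding(page, "text/html")) // shift_jis

	// Header rewritten by FIX 2: the <meta> declaration is never consulted.
	fmt.Println(sniffEncoding(page, "text/html; charset=utf-8")) // utf-8
}
```

Under this model, a compressed page that declares a non-UTF-8 encoding only in a `<meta>` tag would be decoded as UTF-8 once FIX 2 appends `charset=utf-8`, which is the regression the comment describes.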