Fix DoH response header corruption and POST body mangling#34
Open
Gobo42 wants to merge 1 commit into
Open
Conversation
X-DNS-Answers included raw binary RDATA for record types with no dedicated decoder (e.g. HTTPS/SVCB), which nginx rejected as an invalid upstream header. Unhandled RDATA is now hex-encoded. DoH POST request bodies were decoded as UTF-8 text and re-encoded back to bytes to extract the message from the HTTP payload. Any byte sequence that wasn't valid UTF-8 (random query IDs, EDNS0 COOKIE options, etc.) got replaced with U+FFFD and corrupted before the DNS packet was ever parsed, breaking real-world DoH clients like dig. The body is now sliced directly from the raw request buffer at the header/body boundary instead of round-tripping through a string.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
1. Invalid
X-DNS-Answersheader for unhandled record types (e.g. HTTPS/SVCB)Symptom in
error.log:parse_resource_record()inlibdns.jsonly had decoders forA,AAAA,NS,CNAME,SOA,SRV,MX,TXT. Any other type (notably HTTPS/SVCB, increasingly common in real-world responses) fell into the default branch, which leftrdataas a raw binaryBuffer. That buffer was then string-concatenated directly into theX-DNS-Answersdebug header, embedding raw/control bytes into an HTTP header value — which nginx rejects.Fixed by hex-encoding unhandled RDATA instead of leaving it raw.
Note: nginx 1.21.1 (Jul 2021) tightened validation of headers and improved the invalid-header log message to include the offending bytes in hex (nginx trac #2437, nginx-devel changeset). This bug likely existed on older versions too but wouldn't produce this specific hard failure/log format — it's 1.21.1+ where it surfaces exactly like this.
2. DoH POST body corruption breaking real DoH clients
Symptom in
error.log:sometimes preceded by:
process_doh_request()decoded the raw POST body viadata.toString('utf8')and later reconstructed the DNS message bytes viaBuffer.from(lines[index + 1]). Since the body is arbitrary binary DNS wire format, any byte sequence that isn't valid UTF-8 (random 16-bit query IDs, EDNS0 COOKIE options, etc.) gets silently replaced withU+FFFDon decode, then re-encoded as a 3-byteEF BF BDsequence — corrupting the packet before it's ever parsed. This is intermittent by nature (depends on whether the random ID/cookie bytes happen to be valid UTF-8), which is why it can look like "some domains work, others don't" when it's actually query-dependent, not domain-dependent, and independent of nginx version. Reliably breaks clients likedig +httpsthat include EDNS0 cookies.How this was diagnosed: manually crafted, hand-built wire-format DNS queries sent via
curl --data-binary (POST)worked fine and returned correct200responses — including for domains with CNAME chains and DNS name compression — which initially looked like a complete fix. Butdig +httpsagainst the same endpoint still failed every time with the "prematurely closed" error. The difference turned out to be the query content, not the client or method: the hand-built test queries used simple, low-byte-value/ASCII-safe fields (fixed query ID, no EDNS0 options), whiledigfills in a random query ID and attaches an EDNS0COOKIEoption with random bytes — both far more likely to contain invalid UTF-8 sequences. Bumping nginx'serror_loglevel towarnorinfo(it was set to the defaulterrorlevel, which is less verbose thanwarnand was silently suppressing the script's own debug output) surfaced the actual bytes njs received: repeatedefbfbdsequences — the UTF-8 encoding of theU+FFFDreplacement character — confirming the corruption was happening inside the request-body decoding step, not in DNS packet parsing itself.Fixed by slicing the raw request buffer directly at the header/body boundary (
\r\n\r\n) instead of round-tripping the body through a string.Test plan
dig +tcpagainst local resolver confirms TCP path unaffected200with correct decoded answersdig +httpsagainst a live DoH endpoint (previously failing withupstream prematurely closed connection) now succeeds and returns correct answersX-DNS-Answersheader