Skip to content

Fix DoH response header corruption and POST body mangling#34

Open
Gobo42 wants to merge 1 commit into
TuxInvader:masterfrom
Gobo42:fix/doh-header-and-post-body-corruption
Open

Fix DoH response header corruption and POST body mangling#34
Gobo42 wants to merge 1 commit into
TuxInvader:masterfrom
Gobo42:fix/doh-header-and-post-body-corruption

Conversation

@Gobo42

@Gobo42 Gobo42 commented Jul 3, 2026

Copy link
Copy Markdown

1. Invalid X-DNS-Answers header for unhandled record types (e.g. HTTPS/SVCB)

Symptom in error.log:

upstream sent invalid header: "X-DNS-Answers: [HTTPS:\x00..." while reading response header from upstream

parse_resource_record() in libdns.js only had decoders for A, AAAA, NS, CNAME, SOA, SRV, MX, TXT. Any other type (notably HTTPS/SVCB, increasingly common in real-world responses) fell into the default branch, which left rdata as a raw binary Buffer. That buffer was then string-concatenated directly into the X-DNS-Answers debug header, embedding raw/control bytes into an HTTP header value — which nginx rejects.

Fixed by hex-encoding unhandled RDATA instead of leaving it raw.

Note: nginx 1.21.1 (Jul 2021) tightened validation of headers and improved the invalid-header log message to include the offending bytes in hex (nginx trac #2437, nginx-devel changeset). This bug likely existed on older versions too but wouldn't produce this specific hard failure/log format — it's 1.21.1+ where it surfaces exactly like this.

2. DoH POST body corruption breaking real DoH clients

Symptom in error.log:

upstream prematurely closed connection while reading response header from upstream

sometimes preceded by:

js exception: RangeError: index ## is outside the bound of the buffer
    at readUInt16BE (native)
    at parse_question (/etc/nginx/njs.d/dns/libdns.js:185)
    at <anonymous> (/etc/nginx/njs.d/dns/dns.js:91)

process_doh_request() decoded the raw POST body via data.toString('utf8') and later reconstructed the DNS message bytes via Buffer.from(lines[index + 1]). Since the body is arbitrary binary DNS wire format, any byte sequence that isn't valid UTF-8 (random 16-bit query IDs, EDNS0 COOKIE options, etc.) gets silently replaced with U+FFFD on decode, then re-encoded as a 3-byte EF BF BD sequence — corrupting the packet before it's ever parsed. This is intermittent by nature (depends on whether the random ID/cookie bytes happen to be valid UTF-8), which is why it can look like "some domains work, others don't" when it's actually query-dependent, not domain-dependent, and independent of nginx version. Reliably breaks clients like dig +https that include EDNS0 cookies.

How this was diagnosed: manually crafted, hand-built wire-format DNS queries sent via curl --data-binary (POST) worked fine and returned correct 200 responses — including for domains with CNAME chains and DNS name compression — which initially looked like a complete fix. But dig +https against the same endpoint still failed every time with the "prematurely closed" error. The difference turned out to be the query content, not the client or method: the hand-built test queries used simple, low-byte-value/ASCII-safe fields (fixed query ID, no EDNS0 options), while dig fills in a random query ID and attaches an EDNS0 COOKIE option with random bytes — both far more likely to contain invalid UTF-8 sequences. Bumping nginx's error_log level to warn or info (it was set to the default error level, which is less verbose than warn and was silently suppressing the script's own debug output) surfaced the actual bytes njs received: repeated efbfbd sequences — the UTF-8 encoding of the U+FFFD replacement character — confirming the corruption was happening inside the request-body decoding step, not in DNS packet parsing itself.

Fixed by slicing the raw request buffer directly at the header/body boundary (\r\n\r\n) instead of round-tripping the body through a string.

Test plan

  • dig +tcp against local resolver confirms TCP path unaffected
  • DoH POST with a manually-crafted ASCII-safe wire-format query returns 200 with correct decoded answers
  • dig +https against a live DoH endpoint (previously failing with upstream prematurely closed connection) now succeeds and returns correct answers
  • Query for a domain returning an HTTPS/SVCB record no longer produces an invalid X-DNS-Answers header

X-DNS-Answers included raw binary RDATA for record types with no
dedicated decoder (e.g. HTTPS/SVCB), which nginx rejected as an
invalid upstream header. Unhandled RDATA is now hex-encoded.

DoH POST request bodies were decoded as UTF-8 text and re-encoded
back to bytes to extract the message from the HTTP payload. Any
byte sequence that wasn't valid UTF-8 (random query IDs, EDNS0
COOKIE options, etc.) got replaced with U+FFFD and corrupted before
the DNS packet was ever parsed, breaking real-world DoH clients
like dig. The body is now sliced directly from the raw request
buffer at the header/body boundary instead of round-tripping
through a string.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant