fix(data/json): escape strings and keys on emit (RFC 8259) #341
Merged
emit_value wrote string values and object keys as raw bytes between quotes, so any string or key containing a quote, backslash, or control byte produced invalid JSON. Add emit_escaped, which escapes '"' and '\', writes control bytes 0x00-0x1F as the short escapes \b \t \n \f \r where JSON defines them and as \u00xx otherwise, and leaves all other bytes (including valid UTF-8) verbatim per RFC 8259. This is the minimal structural escaping, not the #338 ensure-ascii policy. Document the parse/emit representation asymmetry (parse yields raw wire bytes; emit treats str_val as logical bytes) on both sides; the deeper contract question is tracked in #340. Closes #337.
Tests: exact-byte assertions for quote/backslash escaping and for control bytes (named short escapes plus \u00xx); the emitted document for a tree containing a quote, a backslash, and a control byte re-parses cleanly through the module's own parser; object-key escaping is covered with exact bytes and a clean re-parse.
Closes #337.
Problem
emit_value wrote string values and object keys as raw bytes between quotes, so any value holding a " or \ character, or a control byte (0x00-0x1F), emitted invalid JSON: the value say "hi" came out as "say "hi"" rather than "say \"hi\"".
Fix
emit_escaped writes a string body, escaping " -> \", \ -> \\, and control bytes via the short escapes
\b \t \n \f \r where JSON defines them, else \u00xx. All other bytes, including valid UTF-8, pass through verbatim per RFC 8259. Applied to both the STRING value
path and the object-key path (the issue only named values; keys had the identical bug).
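A minimal sketch of the escaping rule above, written in Python over bytes rather than in mach, so it illustrates the rule and is not the PR's code. The helpers quoted and emit_member are hypothetical; they only show that the key path and the value path share one escaper. Lowercase hex in \u00xx is an assumption, since the PR does not pin the case.

```python
# Short escapes JSON defines for control bytes (RFC 8259, section 7).
SHORT_ESCAPES = {
    0x08: b"\\b",
    0x09: b"\\t",
    0x0A: b"\\n",
    0x0C: b"\\f",
    0x0D: b"\\r",
}


def emit_escaped(body: bytes) -> bytes:
    """Escape a logical byte string for use between JSON quotes (quotes not included)."""
    out = bytearray()
    for b in body:
        if b == 0x22:  # quote byte, emitted as backslash + quote
            out += b'\\"'
        elif b == 0x5C:  # backslash byte, emitted as two backslashes
            out += b"\\\\"
        elif b in SHORT_ESCAPES:  # the named short escapes
            out += SHORT_ESCAPES[b]
        elif b < 0x20:  # any other control byte: \u00xx (lowercase hex assumed)
            out += b"\\u%04x" % b
        else:  # everything else, including UTF-8 multi-byte sequences, verbatim
            out.append(b)
    return bytes(out)


def quoted(body: bytes) -> bytes:
    """Hypothetical helper: wrap an escaped body in quotes."""
    return b'"' + emit_escaped(body) + b'"'


def emit_member(key: bytes, value: bytes) -> bytes:
    """Hypothetical helper: object keys and string values use the same escaper."""
    return quoted(key) + b":" + quoted(value)


print(emit_member(b'say "hi"', b"a\\b\x01\n").decode())
# "say \"hi\"":"a\\b\u0001\n"
```

Only U+0000 through U+001F, the quote, and the backslash must be escaped, so bytes such as 0x7F and UTF-8 sequences are left alone.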
This is the minimal structural escaping. It deliberately does not adopt mach.cli.json's
UTF-8-validating / ensure-ascii policy; that unification is #338.
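To make the policy difference concrete, here is a sketch using Python's standard json module as a stand-in; it is not mach.cli.json. Its ensure_ascii flag toggles between the two behaviors: with ensure_ascii=False only the quote, backslash, and control characters are escaped, while ensure_ascii=True also escapes every non-ASCII code point. Python operates on str code points rather than raw bytes, and this sketch does not model the UTF-8 validation half of mach.cli.json's policy.

```python
import json

text = 'caf\u00e9 "x"\x01'  # a non-ASCII letter, quotes, and a control char

# Minimal structural escaping: only quote, backslash, and control chars.
minimal = json.dumps(text, ensure_ascii=False)

# ensure-ascii policy: additionally escapes every non-ASCII code point.
ascii_only = json.dumps(text, ensure_ascii=True)

print(minimal)     # "café \"x\"\u0001"
print(ascii_only)  # "caf\u00e9 \"x\"\u0001"
```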
Representation asymmetry (documented, tracked as #340)
The parser stores string values as raw on-wire bytes (escapes intact, zero-copy); emit now
treats them as logical bytes. So emitting a parsed tree double-escapes any wire escapes it
held: the JSON string "say \"hi\"" is stored as say \"hi\" and re-emitted as "say \\\"hi\\\"". This asymmetry is documented loudly in the file header and
the emit doc comment, and the deeper contract decision (parser-side decode vs. a decode-on-demand accessor) is filed as #340, linked
to #337/#338.
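A small sketch of the double-escape, again in Python. emit_escaped here is a minimal stand-in following the rules under Fix, and json.loads stands in for a parser-side decode, which is just one of the options #340 weighs and not a decision the PR makes.

```python
import json


def emit_escaped(body: bytes) -> bytes:
    """Minimal RFC 8259 escaping of a logical byte string (quotes not included)."""
    short = {0x08: b"\\b", 0x09: b"\\t", 0x0A: b"\\n", 0x0C: b"\\f", 0x0D: b"\\r"}
    out = bytearray()
    for b in body:
        if b in (0x22, 0x5C):  # quote or backslash: prefix with a backslash
            out += b"\\" + bytes([b])
        elif b in short:
            out += short[b]
        elif b < 0x20:
            out += b"\\u%04x" % b
        else:
            out.append(b)
    return bytes(out)


# What a zero-copy parser keeps for the JSON text "say \"hi\"":
# the bytes between the quotes, with the escapes still in place.
wire = b'say \\"hi\\"'

# Treating those wire bytes as logical bytes escapes them a second time.
doubled = emit_escaped(wire)
print(doubled.decode())  # say \\\"hi\\\"

# Decoding first yields the logical bytes, and emit then reproduces the wire form.
logical = json.loads(b'"' + wire + b'"').encode("utf-8")
print(emit_escaped(logical) == wire)  # True
```

Decode-then-emit is equivalent rather than byte-identical in general: a wire escape such as \u0041 would come back out as a plain A.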
Tests
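- Exact-byte assertions for quote and backslash escaping.
- Exact-byte assertions for control bytes (named short escapes plus \u00xx).
- The emitted document for a tree with a quote, a backslash, and a control byte re-parses cleanly through the module's own parser.
- Object-key escaping, covered with exact bytes and a clean re-parse.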
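The same shape of checks, sketched in Python against a minimal escaper that follows the rules under Fix. json.loads stands in for the module's own parser, and emit_object is a hypothetical flat-object emitter, so this mirrors the test plan rather than reproducing the PR's test file.

```python
import json


def emit_escaped(body: bytes) -> bytes:
    """Minimal RFC 8259 escaping of a logical byte string (quotes not included)."""
    short = {0x08: b"\\b", 0x09: b"\\t", 0x0A: b"\\n", 0x0C: b"\\f", 0x0D: b"\\r"}
    out = bytearray()
    for b in body:
        if b in (0x22, 0x5C):  # quote or backslash: prefix with a backslash
            out += b"\\" + bytes([b])
        elif b in short:
            out += short[b]
        elif b < 0x20:
            out += b"\\u%04x" % b
        else:
            out.append(b)
    return bytes(out)


def emit_object(members):
    """Hypothetical emitter for a flat object of byte-string keys and values."""
    parts = [b'"' + emit_escaped(k) + b'":"' + emit_escaped(v) + b'"' for k, v in members]
    return b"{" + b",".join(parts) + b"}"


# Exact-byte assertions: quote/backslash, named short escapes, \u00xx fallback.
assert emit_escaped(b'a"b\\c') == b'a\\"b\\\\c'
assert emit_escaped(b"\b\t\n\f\r") == b"\\b\\t\\n\\f\\r"
assert emit_escaped(b"\x00\x1f") == b"\\u0000\\u001f"

# Object keys take the same path, and the whole document re-parses cleanly.
doc = emit_object([(b'k"\\\x01', b'v"\\\x02')])
assert doc == b'{"k\\"\\\\\\u0001":"v\\"\\\\\\u0002"}'
assert json.loads(doc) == {'k"\\\x01': 'v"\\\x02'}
print("ok")
```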
mach build . and mach test . are green.
🤖 Generated with Claude Code