
fix(data/json): escape strings and keys on emit (RFC 8259) #341

Merged
octalide merged 2 commits into dev from fix/337-json-emit-escaping
Jul 2, 2026


@octalide octalide commented Jul 2, 2026

Collaborator

Closes #337.

Problem

emit_value wrote string values and object keys as raw bytes between quotes, so any value
holding a ", \, or control byte (0x00-0x1F) emitted invalid JSON (e.g. say "hi" ->
"say "hi"").
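The failure mode can be reproduced outside the module. The following is a minimal Python sketch, not the mach source: `emit_string_raw` is a hypothetical stand-in for the pre-fix behavior, and Python's standard `json` parser plays the role of a strict consumer.

```python
import json

def emit_string_raw(value: str) -> str:
    # Hypothetical stand-in for the pre-fix emit path:
    # the string body is copied verbatim between two quotes.
    return '"' + value + '"'

doc = emit_string_raw('say "hi"')
print(doc)  # "say "hi""

# A conforming parser sees the string end at the second quote and
# rejects the trailing bytes, so the document as a whole is invalid.
try:
    json.loads(doc)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

print(parsed_ok)  # False
```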

Fix

emit_escaped writes a string body escaping " -> \", \ -> \\, and control bytes via
the short escapes \b \t \n \f \r where JSON defines them, else \u00xx. All other bytes,
including valid UTF-8, pass through verbatim per RFC 8259. Applied to both the STRING value
path and the object-key path (the issue only named values; keys had the identical bug).
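The escaping rule described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the mach implementation: the name `emit_escaped` is borrowed from the PR, but the signature, the `SHORT_ESCAPES` table, the `emit_string` wrapper, and the lowercase hex digits in `\u00xx` are all choices made for this sketch.

```python
import json

# Short escapes JSON defines for control bytes (RFC 8259, section 7).
SHORT_ESCAPES = {
    0x08: b"\\b",
    0x09: b"\\t",
    0x0A: b"\\n",
    0x0C: b"\\f",
    0x0D: b"\\r",
}

def emit_escaped(body: bytes) -> bytes:
    """Escape a string body byte by byte (sketch, assumed signature)."""
    out = bytearray()
    for byte in body:
        if byte == 0x22:            # '"'
            out += b'\\"'
        elif byte == 0x5C:          # '\'
            out += b"\\\\"
        elif byte < 0x20:           # control byte: short escape, else \u00xx
            out += SHORT_ESCAPES.get(byte, b"\\u%04x" % byte)
        else:                       # everything else, including UTF-8, verbatim
            out.append(byte)
    return bytes(out)

def emit_string(body: bytes) -> bytes:
    # Hypothetical wrapper: the same call would serve values and object keys.
    return b'"' + emit_escaped(body) + b'"'

print(emit_string(b'say "hi"').decode())  # "say \"hi\""
```

Multi-byte UTF-8 sequences only contain bytes at or above 0x80, so the byte-wise loop never splits or rewrites them.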

This is the minimal structural escaping. It deliberately does not adopt mach.cli.json's
UTF-8-validating / ensure-ascii policy; that unification is #338.

Representation asymmetry (documented, tracked as #340)

The parser stores string values as raw on-wire bytes (escapes intact, zero-copy); emit now
treats them as logical bytes. So emitting a parsed tree double-escapes any wire escapes it
held. This asymmetry is documented loudly in the file header and emit doc, and the deeper
contract decision (parser-side decode vs decode-on-demand accessor) is filed as #340, linked
to #337/#338.
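A small Python sketch of the round trip makes the double escaping concrete. It assumes, as described above, that the parser keeps the bytes between the quotes untouched; `escape_minimal` is a hypothetical, cut-down escaper (quote and backslash only) used just for this illustration.

```python
import json

def escape_minimal(body: bytes) -> bytes:
    # Backslash first, so the backslashes added for quotes are not re-escaped.
    return body.replace(b"\\", b"\\\\").replace(b'"', b'\\"')

wire = b'"line1\\nline2"'        # on the wire: "line1\nline2" (one escape)
stored = wire[1:-1]              # parser side: raw wire bytes, escape intact
reemitted = b'"' + escape_minimal(stored) + b'"'

print(reemitted.decode())        # "line1\\nline2"

original_text = json.loads(wire.decode("utf-8"))        # contains a real newline
roundtrip_text = json.loads(reemitted.decode("utf-8"))  # contains backslash + 'n'
print(original_text == roundtrip_text)  # False
```

Both documents are valid JSON, but they no longer decode to the same string, which is the contract question deferred to #340.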

Tests

  • exact-byte assertions for quote/backslash escaping
  • exact-byte assertions for control bytes (named short escapes + \u00xx)
  • emitted output for a tree with quote+backslash+control re-parses cleanly through the
    module's own parser
  • object-key escaping, exact bytes + clean re-parse
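The shape of these checks, exact bytes plus a clean re-parse for both values and keys, might look like the Python sketch below. It is not the mach test suite: `escape` and `emit_object` are hypothetical helpers written for this example, and Python's `json.loads` stands in for the module's own parser.

```python
import json

_SHORT = {0x08: "\\b", 0x09: "\\t", 0x0A: "\\n", 0x0C: "\\f", 0x0D: "\\r"}

def escape(text: str) -> str:
    # Same rule as the fix: quote, backslash, control characters; rest verbatim.
    out = []
    for ch in text:
        code = ord(ch)
        if ch == '"':
            out.append('\\"')
        elif ch == "\\":
            out.append("\\\\")
        elif code < 0x20:
            out.append(_SHORT.get(code, "\\u%04x" % code))
        else:
            out.append(ch)
    return "".join(out)

def emit_object(obj: dict) -> str:
    # Keys go through the same escaper as values.
    members = ",".join('"%s":"%s"' % (escape(k), escape(v)) for k, v in obj.items())
    return "{" + members + "}"

tree = {'k"ey': 'a"b\\c\x01\n'}
doc = emit_object(tree)

# Exact-output assertion, then a clean re-parse back to the same tree.
assert doc == '{"k\\"ey":"a\\"b\\\\c\\u0001\\n"}'
assert json.loads(doc) == tree
```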

mach build . and mach test . green.

🤖 Generated with Claude Code

octalide added 2 commits July 2, 2026 11:27
emit_value wrote string values and object keys as raw bytes between
quotes, so any value containing a quote, backslash, or control byte
produced invalid JSON. add emit_escaped: escape '"' and '\', control
bytes 0x00-0x1F via the short escapes \b \t \n \f \r else \u00xx,
leaving other bytes (including valid UTF-8) verbatim per RFC 8259. this
is the minimal structural escaping, not the #338 ensure-ascii policy.

document the parse/emit representation asymmetry (parse yields raw wire
bytes, emit treats str_val as logical bytes) on both sides; the deeper
contract question is tracked in #340.

Closes #337

exact-byte assertions for quote/backslash escaping and for control bytes
(named short escapes plus \u00xx); the emitted document for a tree with
quote+backslash+control re-parses cleanly through the module's own parser;
object-key escaping covered with exact bytes and a clean re-parse.
@octalide octalide marked this pull request as ready for review July 2, 2026 15:32
@octalide octalide merged commit 4cae93d into dev Jul 2, 2026
2 checks passed
@octalide octalide deleted the fix/337-json-emit-escaping branch July 2, 2026 15:37

Development

Successfully merging this pull request may close these issues.

data/json: emit writes strings unescaped — quotes/backslashes/control bytes produce invalid JSON
