Skip to content

perf(proto-gen): backpatched length-delim writes#25

Merged
dfa1 merged 1 commit into
mainfrom
proto-writer-backpatch
Jun 10, 2026
Merged

perf(proto-gen): backpatched length-delim writes#25
dfa1 merged 1 commit into
mainfrom
proto-writer-backpatch

Conversation

@dfa1

@dfa1 dfa1 commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Summary

Eliminates per-nested-message allocation + double-copy in ProtoWriter.

Old pattern (per nested message / packed-repeated):

ProtoWriter packed = new ProtoWriter();   // alloc 64-byte buf
packed.writeXxx(...);
byte[] __bytes = packed.toByteArray();    // copy #1
w.writeEmbedded(__bytes);                 // copy #2

New pattern:

w.writeTag(N, 2);
int __mark = w.beginLenDelim();           // reserve 5 bytes
field.encodeTo(w);                        // write into parent buffer
w.endLenDelim(__mark);                    // backpatch length, shift left if short

Generator now emits a package-private encodeTo(ProtoWriter) per record. Public encode() is a thin wrapper.

Wire compatibility

Verified by Rust round-trip integration suite (./mvnw verify, 243 integration tests pass).

Test plan

  • ./mvnw verify — 872 unit + 243 integration, 0 failures
  • New Backpatch tests in ProtoRuntimeTest: short payload compaction, byte-parity with legacy writeEmbedded, empty payload, multi-byte length varint
  • Reviewer: confirm beginLenDelim/endLenDelim API shape
  • Reviewer: consider whether encodeTo should be public for external callers

🤖 Generated with Claude Code

…Writer

The generator's nested-message and packed-repeated encode paths used to
allocate a fresh ProtoWriter per nested region, call toByteArray() to
materialise its bytes, then writeEmbedded() to copy them again into the
parent buffer — one alloc plus two copies per nested field.

This commit replaces that pattern with backpatched length-delimited
regions on the parent writer:

- ProtoWriter.beginLenDelim() reserves 5 bytes (max varint32 length) at
  the current position and returns a mark.
- The caller writes the payload directly into the parent buffer.
- ProtoWriter.endLenDelim(mark) writes the actual length varint at the
  reserved offset and shifts the payload left if the varint is shorter
  than 5 bytes.

Generator changes:

- Each message now exposes a package-private `encodeTo(ProtoWriter w)`
  that writes the record into the caller's writer. The public
  `encode()` becomes a thin wrapper that constructs a writer, calls
  encodeTo, and returns toByteArray().
- MessageOptionalEmitter / MessageRepeatedEmitter call
  `field.encodeTo(w)` between beginLenDelim/endLenDelim instead of
  `w.writeEmbedded(field.encode())`.
- ScalarRepeatedEmitter and EnumRepeatedEmitter use the same backpatch
  pattern for packed varint regions instead of a temporary writer.

Removed sites (all in CodeGen): 4 nested-message and packed-repeated
emitters no longer allocate `new ProtoWriter()` or call `toByteArray()`
on a child writer.

Wire-format compatibility verified by the full Rust round-trip
integration suite (`./mvnw verify`, 243 integration tests including
RustWritesJavaReadsIntegrationTest and JavaWritesRustReadsIntegrationTest).

ProtoRuntimeTest additions in a new `Backpatch` nested class:
- shortPayloadCompactsLengthVarint: 3-byte payload yields 1-byte length
- backpatchedMatchesLegacyEmbeddedPattern: byte-for-byte parity with
  the old temp-writer + writeEmbedded path
- emptyPayloadProducesSingleZeroLength: zero-length region collapses to
  a single 0x00 byte
- largePayloadKeepsMultiByteLengthVarint: 800-byte payload survives the
  3-byte leftward shift

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dfa1 dfa1 merged commit c79611e into main Jun 10, 2026
3 checks passed
@dfa1 dfa1 deleted the proto-writer-backpatch branch June 10, 2026 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant