perf(proto-gen): backpatched length-delim writes#25
Merged
Conversation
…Writer The generator's nested-message and packed-repeated encode paths used to allocate a fresh ProtoWriter per nested region, call toByteArray() to materialise its bytes, then writeEmbedded() to copy them again into the parent buffer — one alloc plus two copies per nested field. This commit replaces that pattern with backpatched length-delimited regions on the parent writer: - ProtoWriter.beginLenDelim() reserves 5 bytes (max varint32 length) at the current position and returns a mark. - The caller writes the payload directly into the parent buffer. - ProtoWriter.endLenDelim(mark) writes the actual length varint at the reserved offset and shifts the payload left if the varint is shorter than 5 bytes. Generator changes: - Each message now exposes a package-private `encodeTo(ProtoWriter w)` that writes the record into the caller's writer. The public `encode()` becomes a thin wrapper that constructs a writer, calls encodeTo, and returns toByteArray(). - MessageOptionalEmitter / MessageRepeatedEmitter call `field.encodeTo(w)` between beginLenDelim/endLenDelim instead of `w.writeEmbedded(field.encode())`. - ScalarRepeatedEmitter and EnumRepeatedEmitter use the same backpatch pattern for packed varint regions instead of a temporary writer. Removed sites (all in CodeGen): 4 nested-message and packed-repeated emitters no longer allocate `new ProtoWriter()` or call `toByteArray()` on a child writer. Wire-format compatibility verified by the full Rust round-trip integration suite (`./mvnw verify`, 243 integration tests including RustWritesJavaReadsIntegrationTest and JavaWritesRustReadsIntegrationTest). ProtoRuntimeTest additions in a new `Backpatch` nested class: - shortPayloadCompactsLengthVarint: 3-byte payload yields 1-byte length - backpatchedMatchesLegacyEmbeddedPattern: byte-for-byte parity with the old temp-writer + writeEmbedded path - emptyPayloadProducesSingleZeroLength: zero-length region collapses to a single 0x00 byte - largePayloadKeepsMultiByteLengthVarint: 800-byte payload survives the 3-byte leftward shift Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Eliminates per-nested-message allocation + double-copy in
ProtoWriter.Old pattern (per nested message / packed-repeated):
New pattern:
Generator now emits a package-private
encodeTo(ProtoWriter)per record. Publicencode()is a thin wrapper.Wire compatibility
Verified by Rust round-trip integration suite (
./mvnw verify, 243 integration tests pass).Test plan
./mvnw verify— 872 unit + 243 integration, 0 failuresBackpatchtests inProtoRuntimeTest: short payload compaction, byte-parity with legacywriteEmbedded, empty payload, multi-byte length varintbeginLenDelim/endLenDelimAPI shapeencodeToshould bepublicfor external callers🤖 Generated with Claude Code