[producer] Prevent RecordTooLargeException when batched partial updates exceed size limit#2660
[producer] Prevent RecordTooLargeException when batched partial updates exceed size limit#2660sixpluszero wants to merge 4 commits into
Conversation
exceed size limit When BatchingVeniceWriter merges multiple partial updates for the same key, the merged payload can exceed the 950KB producer size limit, causing RecordTooLargeException since UPDATE operations don't support chunking. This adds a size-aware fallback: after the optimistic merge, if the result exceeds the limit, the merge is redone incrementally. When the next merge would exceed the limit, the accumulated result is produced as an intermediate UPDATE and a new accumulation starts. This preserves as much batching as possible while preventing oversize records. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses nearline producer failures caused by RecordTooLargeException when BatchingVeniceWriter merges multiple partial UPDATEs for the same key into a single oversized payload (UPDATEs can’t be chunked).
Changes:
- Add a size check after the optimistic merged UPDATE serialization and fall back to an incremental merge that emits intermediate UPDATEs when needed.
- Introduce helpers to split/produce intermediate UPDATEs and a getter for the max payload size to improve testability.
- Add unit tests covering oversize-merge splitting behavior and regression coverage for the normal merge path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
internal/venice-common/src/main/java/com/linkedin/venice/writer/BatchingVeniceWriter.java |
Adds size-aware fallback merging/splitting to prevent producing oversized merged UPDATE records. |
internal/venice-common/src/test/java/com/linkedin/venice/writer/BatchingVeniceWriterTest.java |
Adds unit tests validating the new splitting behavior and the unchanged normal merge behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…e after split, use atLeast(2) - Chain intermediate produce future to shared produceResultFuture so async failures propagate to callers - Re-serialize single update through superset schema after split for consistency - Complete produceResultFuture exceptionally on synchronous failures too - Use atLeast(2) in tests instead of times(2) to avoid brittleness from serialized size variations Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
correctness - testIntermediateProduceFailurePropagesToProduceResultFuture: verifies that when VeniceWriter.update() throws during intermediate produce, the shared produceResultFuture is completed exceptionally - testMultipleSplitsWithFiveUpdates: 5 updates with a tight limit forcing 3+ splits, verifies all 5 callbacks are completed (no dropped callbacks) - testIntermediateProducePayloadContainsCorrectFields: verifies each intermediate/final payload deserializes to a valid update record and the final payload contains expected field data Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When the final merge exceeds the limit but lastValidSerialized is null (e.g., anchor is DELETE with no prior accumulated UPDATEs), the intermediate produce is skipped. Previously, dependentCallbackList was cleared unconditionally, dropping callbacks and leaving callers hanging. Now callbacks are preserved on the final record when no intermediate produce occurred. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Hi there. This pull request has been inactive for 30 days. To keep our review queue healthy, we plan to close it in 7 days unless there is new activity. If you are still working on this, please push a commit, leave a comment, or convert it to draft to signal intent. Thank you for your time and contributions. |
|
Closing this pull request due to 37 days of inactivity. This is not a judgment on the value of the work. If you would like to continue, please reopen or open a new PR and we will be happy to take another look. Thank you again for contributing. |
Problem Statement
When
BatchingVeniceWritermerges multiple partial update (UPDATE) records for the same key, the merged payload can exceed the 950KB producer size limit (DEFAULT_MAX_SIZE_FOR_USER_PAYLOAD_PER_MESSAGE_IN_BYTES). This happens when updates touch different fields — the merged result contains all fields and grows beyond any individual update's size.When the oversized merged record is produced via
VeniceWriter.update(), it throwsRecordTooLargeException. Since UPDATE operations do not support chunking, there is no recovery path — the exception propagates and causes nearline producer failures.Solution
Added a size-aware fallback in
BatchingVeniceWriter.maybeUpdateRecordUpdatePayload():This preserves as much batching as possible while preventing oversize records. The fallback only triggers when the merged payload actually exceeds the limit, so there is no performance impact on the normal path.
Key changes:
maybeUpdateRecordUpdatePayload(): Added size check after optimistic merge; calls fallback if exceeded.mergeAndProduceWithSizeLimit(): New method that re-does the merge incrementally, producing intermediate results when the limit would be exceeded.produceIntermediateUpdate(): New helper that produces an intermediate merged UPDATE directly to the internal VeniceWriter with proper callback management.getMaxSizeForUserPayloadPerMessageInBytes(): Package-private getter for testability.Code changes
Concurrency-Specific Checks
ReentrantLockincheckAndMaybeProduceBatchRecord().synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
New tests:
testMergedUpdateExceedsSizeLimitProducesIntermediateBatch— 3 UPDATE records touching different fields; verifies intermediate + final produce when merged result exceeds limit.testMergedUpdateUnderSizeLimitStillMerges— Regression test verifying normal merge path is unaffected.testMergedUpdateWithPutAnchorExceedsSizeLimit— PUT + 2 UPDATEs where full merge exceeds limit; verifies splitting with PUT anchor.All existing
BatchingVeniceWriterTesttests continue to pass.Does this PR introduce any user-facing or breaking changes?
🤖 Generated with Claude Code