issue #619: bulk-prefetch vector segment warmup via SegmentBlockCache.bulkInsert#625
Merged
Merged
Conversation
….bulkInsert The pre-publish warmup BFS in PersistentVectorStore.warmUpNewSegmentsBeforePublish issues one ~16 KiB readFileRange RPC per HNSW Layer-0 node up to the configured warmup budget — ~2000 round-trips per 32 MiB-budgeted segment, dominating checkpoint Phase C-prep and compaction publish latency when the indexing service runs against the remote file server / S3 / GCS. This change adds: - BulkPrefetchReaderSupplier, a ReaderSupplier mixin that fetches a contiguous prefix of the segment file in multipart-chunk-sized async reads (one per underlying multipart chunk, in parallel) and bulk-inserts every covered cache-block into SegmentBlockCache; - SegmentBlockCache.bulkInsert(path, baseOffset, blockSize, ByteBuf), which splits the prefetched buffer into block-sized retained slices keyed exactly like getBlock and never overwrites a single-flighted entry, with refcount discipline verified by SegmentBlockCacheBulkInsertTest; - bulkPrefetchSegmentIntoCache in PersistentVectorStore, invoked from warmUpSegment before the BFS so that the BFS runs entirely against a hot cache (no wire I/O), preserving the per-segment "warmedUp" accounting and the pin BFS pass for the frontier region; the prefetch is best-effort — an IOException is swallowed and the original BFS path still runs. When the segment's onDiskReaderSupplier is not a remote/multipart-backed one (local-file, in-memory) the prefetch is skipped and behaviour is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes the BLOCK and REQUEST_CHANGES items raised by the pr-reviewer on PR #625: - (BLOCK) SegmentBlockCache.bulkInsert no longer stores retainedSlice(...) of the multipart-chunk-sized parent buffer; instead it allocates a fresh pooled direct ByteBuf per block (matching the cache's blockSize weigher exactly) and copies the bytes across before releasing the parent. This eliminates the off-heap memory amplification where a single hot 16 KiB block could pin a 4 MiB multipart-chunk pool buffer alive, making weightedSize under-account by up to 256x. Per-block copies cost one memcpy each, negligible vs. the network read that produced the chunk. - (R2) SegmentBlockCache.pinBlock now checks the main eviction region BEFORE invoking the loader; on hit it allocates a fresh pooled buffer for the frontier and copies the bytes from the main entry. The warmup pin-BFS pass that follows bulk-prefetch therefore promotes blocks main -> frontier with zero additional wire reads. - (R3) PersistentVectorStore.bulkPrefetchSegmentIntoCache now returns the inserted-byte count. warmUpSegment uses it to skip warmUpSegmentBfs when the prefetch fully covers Math.min(budget, fileSize). A new counter warmedSegmentsBulkPrefetchOnly tracks the share of warmups served by the fast path; getWarmedSegmentsBulkPrefetchOnly() exposes it for tests and Prometheus. - (R4) SegmentBlockCacheBulkInsertTest gains concurrentBulkInsertGetBlockAndInvalidate: 8 threads race bulkInsert / getBlock / invalidatePath against the same path for 200 iterations; assertions on payload correctness and final weightedSize == 0. - (R5) RemoteRandomAccessReaderBulkPrefetchTest gains prefetchFailsCleanlyWhenServerMissingTrailingChunk (partial availability, IOException + clean cache after invalidate) and the pin-after-prefetch pair pinModeAfterPrefetchPromotesMainCacheWithoutWireRead / pinModeWithoutPriorPrefetchStillLoadsFromWire that prove R2's main->frontier promotion path. - (R7) BulkPrefetchReaderSupplier and PersistentVectorStore.bulkPrefetchSegmentIntoCache javadoc the "first N bytes ~= HNSW entry-frontier" layout assumption. - (R8) RemoteRandomAccessReader.Supplier.bulkPrefetchIntoCache javadoc the "must not be called from a netty event-loop thread" constraint. - Style nits: rename the per-block inserted[] flag to didInsert[], expand in-line comments on the per-block-copy and main-cache-promotion semantics, test assertions updated to reflect the new eager-release ownership contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace `assertNull(cache.estimatedSize() > 0 ? "non-empty" : null)` with `assertEquals(0L, cache.estimatedSize())` for readability. The semantic is identical: both assert that the cache has no entries after invalidatePath. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #619.
What
PersistentVectorStore.warmUpNewSegmentsBeforePublishis the dominant cost ofcheckpoint Phase C-prep and compaction publish on the indexing service —
because the warmup BFS issues one ~16 KiB
readFileRangeRPC per HNSWLayer-0 node, ~2000 round-trips per 32 MiB-budgeted segment, against the
remote file server / S3 / GCS.
This PR replaces those many small reads with a small number of large
multipart-chunk-sized bulk reads that go straight into the
SegmentBlockCache. The BFS still runs afterwards — exclusively against thehot cache — so it preserves the per-segment
warmedUpaccounting and thepin BFS pass for the frontier region (#578), with no behaviour change for
local-file / in-memory readers.
Changes
BulkPrefetchReaderSupplier(new, inherddb-core): mixin interfaceanalogous to
PinModeReaderSupplier. Exposeslong bulkPrefetchIntoCache(long startOffset, long maxBytes).SegmentBlockCache.bulkInsert(path, baseOffset, blockSize, ByteBuf):new API that splits a contiguous prefetched range into block-sized retained
slices keyed identically to
getBlock, taking ownership of the input bufferand never overwriting a single-flighted entry. New stats counters
bulkInsertedBlockCount/bulkInsertedByteCount/bulkInsertSkippedCount.RemoteRandomAccessReader.Suppliernow implementsBulkPrefetchReaderSupplier: dispatches onereadFileRangeAsByteBufAsyncper multipart chunk in parallel, thenbulk-inserts each completed chunk into the cache. Aligns to the
writeBlockSizeboundary because the storage backend forbids readsspanning more than one multipart chunk.
PersistentVectorStore.bulkPrefetchSegmentIntoCache: invoked fromwarmUpSegmentbefore the BFS. Best-effort: anIOExceptionfrom thesupplier is logged at
WARNINGand the original per-node BFS path stillruns, so we never regress versus pre-Speed up segment warm up #619 behaviour even when the remote
storage is degraded.
Tests
SegmentBlockCacheBulkInsertTest(10 tests): contract of the newbulkInsertAPI — full-block range, partial last block, argumentvalidation (always releases the input buffer), disabled-cache no-op,
existing-entry preservation + skip counting,
invalidatePathdrives theparent buffer's refCnt back to 0 (no leak), non-zero aligned offsets,
repeated insert.
RemoteRandomAccessReaderBulkPrefetchTest(11 tests): end-to-endagainst an in-process
RemoteFileServer+ multipart file spanning 3 chunks— supplier implements the mixin; full prefetch leaves the cache hot with
zero loader invocations; partial prefetch only caches the requested chunks;
disabled cache short-circuits; missing path surfaces as
IOException;repeated prefetch counts skips;
invalidatePathreleases everyprefetched block.
Issue619BulkWarmupTest(7 tests): dispatch contract fromPersistentVectorStore.bulkPrefetchSegmentIntoCache— exactly-once callwith
(0, budget)for matching suppliers, no call for plainReaderSupplier,IOExceptionand zero/negative budget swallowed, segment state notmutated.
Test plan
mvn -pl herddb-remote-file-service -Dtest='SegmentBlockCacheBulkInsertTest,RemoteRandomAccessReaderBulkPrefetchTest,SegmentBlockCacheTest,SegmentBlockCacheAsyncTest,SegmentBlockCacheHammerTest' test— 57/57 ✅mvn -pl herddb-indexing-service -Dtest='Issue619BulkWarmupTest,Issue569WarmupBeforePublishTest,BlockCacheWarmupTest' test— 19/19 ✅LazyValueCacheConcurrentUpdatesSuite{NoIndexes,WithNonUniqueIndexes,WithUniqueIndexes}Test— 12/12 ✅mvn -B spotless:check apache-rat:check spotbugs:check install -DskipTests -Pci— BUILD SUCCESS ✅🤖 Implemented by the
pr-workeragent.