Skip to content

issue #619: bulk-prefetch vector segment warmup via SegmentBlockCache.bulkInsert#625

Merged
eolivelli merged 3 commits into
masterfrom
issue-619-speed-up-segment-warm-up
May 20, 2026
Merged

issue #619: bulk-prefetch vector segment warmup via SegmentBlockCache.bulkInsert#625
eolivelli merged 3 commits into
masterfrom
issue-619-speed-up-segment-warm-up

Conversation

@eolivelli
Copy link
Copy Markdown
Owner

Fixes #619.

What

PersistentVectorStore.warmUpNewSegmentsBeforePublish is the dominant cost of
checkpoint Phase C-prep and compaction publish on the indexing service —
because the warmup BFS issues one ~16 KiB readFileRange RPC per HNSW
Layer-0 node, ~2000 round-trips per 32 MiB-budgeted segment, against the
remote file server / S3 / GCS.

This PR replaces those many small reads with a small number of large
multipart-chunk-sized bulk reads that go straight into the
SegmentBlockCache. The BFS still runs afterwards — exclusively against the
hot cache — so it preserves the per-segment warmedUp accounting and the
pin BFS pass for the frontier region (#578), with no behaviour change for
local-file / in-memory readers.

Changes

  • BulkPrefetchReaderSupplier (new, in herddb-core): mixin interface
    analogous to PinModeReaderSupplier. Exposes
    long bulkPrefetchIntoCache(long startOffset, long maxBytes).
  • SegmentBlockCache.bulkInsert(path, baseOffset, blockSize, ByteBuf):
    new API that splits a contiguous prefetched range into block-sized retained
    slices keyed identically to getBlock, taking ownership of the input buffer
    and never overwriting a single-flighted entry. New stats counters
    bulkInsertedBlockCount / bulkInsertedByteCount / bulkInsertSkippedCount.
  • RemoteRandomAccessReader.Supplier now implements
    BulkPrefetchReaderSupplier: dispatches one
    readFileRangeAsByteBufAsync per multipart chunk in parallel, then
    bulk-inserts each completed chunk into the cache. Aligns to the
    writeBlockSize boundary because the storage backend forbids reads
    spanning more than one multipart chunk.
  • PersistentVectorStore.bulkPrefetchSegmentIntoCache: invoked from
    warmUpSegment before the BFS. Best-effort: an IOException from the
    supplier is logged at WARNING and the original per-node BFS path still
    runs, so we never regress versus pre-Speed up segment warm up #619 behaviour even when the remote
    storage is degraded.

Tests

  • SegmentBlockCacheBulkInsertTest (10 tests): contract of the new
    bulkInsert API — full-block range, partial last block, argument
    validation (always releases the input buffer), disabled-cache no-op,
    existing-entry preservation + skip counting, invalidatePath drives the
    parent buffer's refCnt back to 0 (no leak), non-zero aligned offsets,
    repeated insert.
  • RemoteRandomAccessReaderBulkPrefetchTest (11 tests): end-to-end
    against an in-process RemoteFileServer + multipart file spanning 3 chunks
    — supplier implements the mixin; full prefetch leaves the cache hot with
    zero loader invocations; partial prefetch only caches the requested chunks;
    disabled cache short-circuits; missing path surfaces as IOException;
    repeated prefetch counts skips; invalidatePath releases every
    prefetched block.
  • Issue619BulkWarmupTest (7 tests): dispatch contract from
    PersistentVectorStore.bulkPrefetchSegmentIntoCache — exactly-once call
    with (0, budget) for matching suppliers, no call for plain ReaderSupplier,
    IOException and zero/negative budget swallowed, segment state not
    mutated.

Test plan

  • mvn -pl herddb-remote-file-service -Dtest='SegmentBlockCacheBulkInsertTest,RemoteRandomAccessReaderBulkPrefetchTest,SegmentBlockCacheTest,SegmentBlockCacheAsyncTest,SegmentBlockCacheHammerTest' test — 57/57 ✅
  • mvn -pl herddb-indexing-service -Dtest='Issue619BulkWarmupTest,Issue569WarmupBeforePublishTest,BlockCacheWarmupTest' test — 19/19 ✅
  • LazyValueCacheConcurrentUpdatesSuite{NoIndexes,WithNonUniqueIndexes,WithUniqueIndexes}Test — 12/12 ✅
  • mvn -B spotless:check apache-rat:check spotbugs:check install -DskipTests -Pci — BUILD SUCCESS ✅
  • CI

🤖 Implemented by the pr-worker agent.

eolivelli and others added 3 commits May 20, 2026 20:22
….bulkInsert

The pre-publish warmup BFS in PersistentVectorStore.warmUpNewSegmentsBeforePublish
issues one ~16 KiB readFileRange RPC per HNSW Layer-0 node up to the configured
warmup budget — ~2000 round-trips per 32 MiB-budgeted segment, dominating
checkpoint Phase C-prep and compaction publish latency when the indexing
service runs against the remote file server / S3 / GCS.

This change adds:
- BulkPrefetchReaderSupplier, a ReaderSupplier mixin that fetches a contiguous
  prefix of the segment file in multipart-chunk-sized async reads (one per
  underlying multipart chunk, in parallel) and bulk-inserts every covered
  cache-block into SegmentBlockCache;
- SegmentBlockCache.bulkInsert(path, baseOffset, blockSize, ByteBuf), which
  splits the prefetched buffer into block-sized retained slices keyed exactly
  like getBlock and never overwrites a single-flighted entry, with refcount
  discipline verified by SegmentBlockCacheBulkInsertTest;
- bulkPrefetchSegmentIntoCache in PersistentVectorStore, invoked from
  warmUpSegment before the BFS so that the BFS runs entirely against a hot
  cache (no wire I/O), preserving the per-segment "warmedUp" accounting and
  the pin BFS pass for the frontier region; the prefetch is best-effort — an
  IOException is swallowed and the original BFS path still runs.

When the segment's onDiskReaderSupplier is not a remote/multipart-backed one
(local-file, in-memory) the prefetch is skipped and behaviour is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes the BLOCK and REQUEST_CHANGES items raised by the pr-reviewer on PR #625:

- (BLOCK) SegmentBlockCache.bulkInsert no longer stores retainedSlice(...) of
  the multipart-chunk-sized parent buffer; instead it allocates a fresh pooled
  direct ByteBuf per block (matching the cache's blockSize weigher exactly) and
  copies the bytes across before releasing the parent. This eliminates the
  off-heap memory amplification where a single hot 16 KiB block could pin a
  4 MiB multipart-chunk pool buffer alive, making weightedSize under-account by
  up to 256x. Per-block copies cost one memcpy each, negligible vs. the network
  read that produced the chunk.
- (R2) SegmentBlockCache.pinBlock now checks the main eviction region BEFORE
  invoking the loader; on hit it allocates a fresh pooled buffer for the
  frontier and copies the bytes from the main entry. The warmup pin-BFS pass
  that follows bulk-prefetch therefore promotes blocks main -> frontier with
  zero additional wire reads.
- (R3) PersistentVectorStore.bulkPrefetchSegmentIntoCache now returns the
  inserted-byte count. warmUpSegment uses it to skip warmUpSegmentBfs when the
  prefetch fully covers Math.min(budget, fileSize). A new counter
  warmedSegmentsBulkPrefetchOnly tracks the share of warmups served by the
  fast path; getWarmedSegmentsBulkPrefetchOnly() exposes it for tests and
  Prometheus.
- (R4) SegmentBlockCacheBulkInsertTest gains concurrentBulkInsertGetBlockAndInvalidate:
  8 threads race bulkInsert / getBlock / invalidatePath against the same path
  for 200 iterations; assertions on payload correctness and final weightedSize == 0.
- (R5) RemoteRandomAccessReaderBulkPrefetchTest gains
  prefetchFailsCleanlyWhenServerMissingTrailingChunk (partial availability,
  IOException + clean cache after invalidate) and the pin-after-prefetch pair
  pinModeAfterPrefetchPromotesMainCacheWithoutWireRead /
  pinModeWithoutPriorPrefetchStillLoadsFromWire that prove R2's main->frontier
  promotion path.
- (R7) BulkPrefetchReaderSupplier and PersistentVectorStore.bulkPrefetchSegmentIntoCache
  javadoc the "first N bytes ~= HNSW entry-frontier" layout assumption.
- (R8) RemoteRandomAccessReader.Supplier.bulkPrefetchIntoCache javadoc the
  "must not be called from a netty event-loop thread" constraint.
- Style nits: rename the per-block inserted[] flag to didInsert[], expand
  in-line comments on the per-block-copy and main-cache-promotion semantics,
  test assertions updated to reflect the new eager-release ownership contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace `assertNull(cache.estimatedSize() > 0 ? "non-empty" : null)` with
`assertEquals(0L, cache.estimatedSize())` for readability. The semantic is
identical: both assert that the cache has no entries after invalidatePath.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@eolivelli eolivelli merged commit 7c2f7ca into master May 20, 2026
19 of 22 checks passed
@eolivelli eolivelli deleted the issue-619-speed-up-segment-warm-up branch May 20, 2026 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Speed up segment warm up

1 participant