issue #587: cap input segments per vector-index compaction cycle by eolivelli · Pull Request #589 · eolivelli/herddb

eolivelli · 2026-05-18T13:25:40Z

Fixes #587.

A vector-index compaction cycle that selected 53 input segments stalled the Indexing Service for 2+ hours. The downstream PQ-retraining step (jvector PQRetrainer.extractVectorsSequential) samples training vectors per input segment with random remote-storage reads, so its I/O cost scales with the number of input segments. VectorIndexCompactor.chooseSegmentsToMerge bounded the picked set only by a byte cap, which never bites when many segments are individually small — leaving the input count effectively unbounded.

This PR is the HerdDB-side mitigation. The complementary jvector-side fix (parallelizing the per-source extraction so the per-read latency is hidden) is in eolivelli/jvector#12. The two changes are independent — HerdDB CI builds jvector from eolivelli/jvector main and is unaffected by the jvector PR's merge state — but full latency relief needs both.

Changes

VectorIndexCompactor: new 7-arg chooseSegmentsToMerge overload with a maxInputs parameter (the 6-arg overload delegates with the cap disabled). After the fire/no-fire trigger decision, the normal byte-capped selection is truncated smallest-first to at most maxInputs segments, with an INFO log when truncation happens. The micro-segment fast path (#570, "[IS compaction] Prioritize merging micro-segments to relieve back-pressure faster") is deliberately exempt: those cycles must stay fast slot-reclaiming merges, and the PQ-retraining I/O concern does not apply to them. Added clampMaxInputs (a value <= 0 disables the cap; 1 is raised to 2) and computeTieredMaxInputs.
PersistentVectorStore: DEFAULT_VECTOR_INDEX_COMPACTION_MAX_INPUTS = 16, a vectorIndexCompactionMaxInputs field, and setCompactionMaxInputs/getCompactionMaxInputs accessors. The base cap is tier-scaled per cycle (2×/4×/8× at 100/300/500 segments) alongside the byte/count caps, so the per-cycle drain rate rises with the backlog and the cap cannot starve the tailer toward the back-pressure threshold. The cycle still fires on the same triggers, and leftover segments are merged in subsequent cycles.
IndexingServerConfiguration / IndexingServiceEngine: new vector.index.compaction.maxInputs config key (default 16), wired into the store and the startup config log.
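
The cap logic described in this list can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names clampMaxInputs and computeTieredMaxInputs come from the description, but their signatures, the use of 0 as the "disabled" marker, the `>=` comparison at the 100/300/500 thresholds, the saturating overflow handling, and the truncateSmallestFirst helper are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CompactionInputCapSketch {

    /** Normalises a configured cap: <= 0 disables it, 1 is raised to 2. */
    static int clampMaxInputs(int configured) {
        if (configured <= 0) {
            return 0; // assumption: 0 is this sketch's "cap disabled" marker
        }
        return Math.max(2, configured);
    }

    /** Scales the base cap by 2x/4x/8x once the backlog reaches 100/300/500 segments. */
    static int computeTieredMaxInputs(int baseCap, int segmentCount) {
        int clamped = clampMaxInputs(baseCap);
        if (clamped == 0) {
            return 0; // a disabled cap stays disabled at every tier
        }
        // assumption: thresholds are inclusive
        int multiplier = segmentCount >= 500 ? 8
                : segmentCount >= 300 ? 4
                : segmentCount >= 100 ? 2
                : 1;
        long scaled = (long) clamped * multiplier; // widen to avoid int overflow
        return (int) Math.min(Integer.MAX_VALUE, scaled);
    }

    /** Keeps at most maxInputs of the picked segment sizes, smallest first. */
    static List<Long> truncateSmallestFirst(List<Long> pickedSizes, int maxInputs) {
        if (maxInputs <= 0 || pickedSizes.size() <= maxInputs) {
            return pickedSizes; // cap disabled or already within the cap
        }
        List<Long> sorted = new ArrayList<>(pickedSizes);
        sorted.sort(Comparator.naturalOrder());
        return new ArrayList<>(sorted.subList(0, maxInputs));
    }

    public static void main(String[] args) {
        List<Long> picked = new ArrayList<>();
        for (long size = 53; size >= 1; size--) {
            picked.add(size); // 53 hypothetical segment sizes, largest first
        }
        int cap = computeTieredMaxInputs(16, picked.size());
        System.out.println("cap=" + cap + " kept=" + truncateSmallestFirst(picked, cap));
    }
}
```

In this sketch the truncation runs only on the already-selected set, which matches the description's point that the cap never changes the fire/no-fire decision.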

Tests

VectorIndexCompactorChooseTest — new cases: a 53-segment pick is truncated to the 16 smallest in order; maxInputs=0 disables the cap; the cap never changes the fire/no-fire trigger decision; a picked set within the cap is returned untruncated; the micro-segment fast-path result is not capped; clampMaxInputs normalisation.
Issue587CompactionInputCapTest — new end-to-end test: builds a 50-segment backlog with the cap enabled at its default, drives multiple compaction cycles, and asserts every cycle merges at most the cap and the segment count strictly converges (no starvation).
Issue354TieredCompactionTest — new computeTieredMaxInputs unit tests (scaling, overflow, disabled-cap); the two end-to-end tiered tests disable the orthogonal input cap so their "drain the whole backlog in one cycle" premise still holds.
Pre-PR validation green: spotless:check apache-rat:check install -DskipTests spotbugs:check -Pci (the exact CI gate).
Hammer suite green (twice): DirectMultipleConcurrentUpdatesSuite{NoIndexes,WithNonUniqueIndexes,WithUniqueIndexes}Test, BLinkConcurrentSearchInsertTest.
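
The convergence property that Issue587CompactionInputCapTest asserts (every cycle merges at most the cap, and the segment count strictly decreases) can be illustrated with a toy model. This sketch is not the test from the PR: the class and method names are hypothetical, it assumes each cycle merges its inputs into exactly one output segment, and it ignores trigger thresholds, tier scaling, and segments arriving during the drain.

```java
import java.util.ArrayList;
import java.util.List;

class CappedDrainSketch {

    /** Returns the segment count before the first cycle and after each cycle. */
    static List<Integer> drain(int initialSegments, int maxInputs) {
        if (maxInputs < 2) {
            // merging a single segment makes no progress in this model
            throw new IllegalArgumentException("maxInputs must be at least 2");
        }
        List<Integer> counts = new ArrayList<>();
        int count = initialSegments;
        counts.add(count);
        while (count > 1) {
            int merged = Math.min(count, maxInputs); // never more than the cap
            count = count - merged + 1;              // merged inputs become one output
            counts.add(count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 50-segment backlog with the default cap of 16
        System.out.println(drain(50, 16));
    }
}
```

In this model a 50-segment backlog with a cap of 16 goes 50, 35, 20, 5, 1. Progress is guaranteed only when the cap is at least 2, which is consistent with clampMaxInputs raising a configured value of 1 to 2.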

🤖 Generated with Claude Code

A compaction cycle that selected 53 input segments stalled for 2+ hours because the downstream PQ-retraining step samples training vectors per input segment with random remote-storage reads, so its I/O cost scales with the input count. chooseSegmentsToMerge bounded the picked set only by a byte cap, which never bites when many segments are individually small, leaving the input count effectively unbounded.

Add a hard, non-tier-scaled cap (vector.index.compaction.maxInputs, default 16) on the number of input segments a single cycle may merge. The cycle still fires on the same triggers; it merges at most maxInputs segments (smallest-first) and the rest are picked up by later cycles, bounding worst-case per-cycle retraining I/O.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address pr-reviewer findings on #589:
- Tier-scale the maxInputs cap (2x/4x/8x) alongside maxBytes/maxCount via new computeTieredMaxInputs, so the per-cycle drain rate rises with the backlog and a flat cap can never starve the tailer toward the back-pressure threshold.
- Do not apply the cap to the micro-segment fast path (#570): those cycles must stay fast slot-reclaiming merges and the PQ-retraining I/O concern does not apply to them.
- Add Issue587CompactionInputCapTest: an end-to-end multi-cycle drain test with the cap enabled, asserting every cycle merges at most the cap and the backlog strictly converges.
- Add computeTieredMaxInputs unit tests and a micro-path no-cap / under-cap no-op test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The wait loop exited on getOnDiskSegmentCount()==1 && adoptions>=1, but onSegmentReleased calls store.dropSegmentByUuid() (decrementing the on-disk count) before drops.incrementAndGet(). On a slow CI runner the loop could observe count==1 after the 3rd drop's dropSegmentByUuid() but before its counter increment, exit early, and fail the drops>=3 assertion. Add drops.get() < 3 to the wait condition so the loop keeps waiting until all three drops have been counted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
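
The race described in this commit message can be shown with the two exit conditions side by side. This is only a sketch of the predicate logic, assuming the three values are read as plain integers; the class and method names are hypothetical and the real test polls live counters rather than fixed values.

```java
class WaitConditionSketch {

    /** Exit condition before the fix: the drops counter is not consulted. */
    static boolean doneBeforeFix(int onDiskSegments, int adoptions, int drops) {
        return onDiskSegments == 1 && adoptions >= 1;
    }

    /** Exit condition after the fix: also require that all three drops were counted. */
    static boolean doneAfterFix(int onDiskSegments, int adoptions, int drops) {
        return onDiskSegments == 1 && adoptions >= 1 && drops >= 3;
    }

    public static void main(String[] args) {
        // State visible between the 3rd dropSegmentByUuid() and its drops.incrementAndGet():
        // the on-disk count is already 1 but only two drops have been counted.
        int onDisk = 1;
        int adoptions = 1;
        int drops = 2;
        System.out.println("before fix, loop exits: " + doneBeforeFix(onDisk, adoptions, drops));
        System.out.println("after fix, loop exits:  " + doneAfterFix(onDisk, adoptions, drops));
    }
}
```

With the intermediate state above, the original predicate is already satisfied while the fixed one is not, so the fixed loop keeps polling until the counter catches up.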

eolivelli and others added 3 commits May 18, 2026 15:25

eolivelli merged commit fdc0a10 into master May 18, 2026
0 of 2 checks passed

eolivelli deleted the issue-587-k3s-bench-pqretrainerextractvectorssequen branch May 18, 2026 14:22

eolivelli mentioned this pull request May 18, 2026

[k3s-bench] ByteBuf leak in OnDiskGraphIndexCompactor.CompactVamanaDiversityProvider.retainDiverse() causes IS apply queue stall #590

Open
