A lock profile on the indexing path showed ~92% of lock-wait time inside jvector DenseIntMap during concurrent graph build, driven by the resize lock taken every time the backing array had to grow.

Upstream jvector (branch reduce-denseintmap-lock-contention, commit 87e3bfff) rewrites DenseIntMap as a lock-free spine-of-segments and adds an initialCapacity hint on GraphIndexBuilder that pre-allocates the base-layer map so the hot insert phase never touches the spine lock.

This commit adopts the new API at the two places where herddb builds a graph with a known node count:

- createEmptyLiveShard: pass cap (= computeEffectiveMaxLiveGraphSize), which is the same bound already used to pre-size the ConcurrentHashMaps living next to the builder.
- writeFusedPQGraphToTempFile: pass totalVectors, the exact number of nodes about to be inserted in the compaction/merge path.

CI (ci.yml and kubernetes-tests.yml) now checks out the reduce-denseintmap-lock-contention branch of eolivelli/jvector so the new constructor resolves at compile time. The jvector artifact version (4.0.0-rc.9-herddb-SNAPSHOT) is unchanged, so herddb-core/pom.xml does not need a bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Adopt the initialCapacity hint on GraphIndexBuilder introduced by jvector branch reduce-denseintmap-lock-contention (commit 87e3bfff), which rewrites DenseIntMap as a lock-free spine-of-segments. A herddb lock profile showed ~92% of lock-wait time inside that map during concurrent graph build.
- PersistentVectorStore.createEmptyLiveShard: pass cap = computeEffectiveMaxLiveGraphSize() as the hint. This is the same bound already used to pre-size the two ConcurrentHashMaps next to the builder.
- PersistentVectorStore.writeFusedPQGraphToTempFile: pass totalVectors = allNodeToPk.size(), the exact node count about to be inserted in the compaction/merge path.
- CI (ci.yml + kubernetes-tests.yml) now checks out the new jvector branch so the 11-arg constructor resolves at compile time. The artifact version (4.0.0-rc.9-herddb-SNAPSHOT) is unchanged, so no pom bump is required.

Closes #223.
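For reviewers unfamiliar with the spine-of-segments idea, the following is a rough, self-contained sketch of why a capacity hint keeps inserts off the allocation path. It is not jvector's DenseIntMap: the class name SegmentedIntMap, the segment size, the maxCapacity parameter, and the allocatedSegments helper are all hypothetical, and the spine here is a fixed-length array (so there is no spine lock at all), which is a simplification of what the upstream branch does.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Toy segmented int-keyed map. Illustrative only; not jvector's DenseIntMap. */
final class SegmentedIntMap<T> {
    private static final int SEGMENT_BITS = 10;                // assumed: 1024 slots per segment
    private static final int SEGMENT_SIZE = 1 << SEGMENT_BITS;
    private static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    // Fixed-length spine: one entry per segment, filled eagerly or lazily.
    private final AtomicReferenceArray<AtomicReferenceArray<T>> spine;

    SegmentedIntMap(int maxCapacity, int initialCapacity) {
        int spineLength = (maxCapacity + SEGMENT_SIZE - 1) >>> SEGMENT_BITS;
        spine = new AtomicReferenceArray<>(spineLength);
        // Pre-allocate every segment covering keys [0, initialCapacity).
        int preallocated = Math.min(spineLength,
                (initialCapacity + SEGMENT_SIZE - 1) >>> SEGMENT_BITS);
        for (int i = 0; i < preallocated; i++) {
            spine.set(i, new AtomicReferenceArray<>(SEGMENT_SIZE));
        }
    }

    void put(int key, T value) {
        segmentFor(key).set(key & SEGMENT_MASK, value);
    }

    T get(int key) {
        AtomicReferenceArray<T> segment = spine.get(key >>> SEGMENT_BITS);
        return segment == null ? null : segment.get(key & SEGMENT_MASK);
    }

    /** Number of segments currently allocated (for demonstration only). */
    int allocatedSegments() {
        int count = 0;
        for (int i = 0; i < spine.length(); i++) {
            if (spine.get(i) != null) {
                count++;
            }
        }
        return count;
    }

    private AtomicReferenceArray<T> segmentFor(int key) {
        int index = key >>> SEGMENT_BITS;
        AtomicReferenceArray<T> segment = spine.get(index);
        if (segment == null) {
            AtomicReferenceArray<T> fresh = new AtomicReferenceArray<>(SEGMENT_SIZE);
            // Exactly one thread wins the CAS; the others adopt the winner's segment.
            segment = spine.compareAndSet(index, null, fresh) ? fresh : spine.get(index);
        }
        return segment;
    }
}
```

In this sketch, keys below the hint always find their segment already present, so put is a single array write; only keys beyond the hint pay for a lazy allocation. That is the property the two call sites rely on when they pass a known node count.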
Test plan
- mvn -B checkstyle:check apache-rat:check spotbugs:check install -DskipTests -Pci (green locally)
- CI (ci.yml + kubernetes-tests.yml) runs against the new jvector branch
- DirectMultipleConcurrentUpdatesSuite{NoIndexes,WithNonUniqueIndexes,WithUniqueIndexes}Test (hammer gate for index/checkpoint/concurrency changes)

🤖 Generated with Claude Code