issue #223: pre-size jvector GraphIndexBuilder base layer #224

Merged
eolivelli merged 1 commit into master from issue-223 on Apr 22, 2026
Conversation

@eolivelli
Owner

Summary

  • Adopt the new initialCapacity hint on GraphIndexBuilder introduced by jvector branch reduce-denseintmap-lock-contention (commit 87e3bfff), which rewrites DenseIntMap as a lock-free spine-of-segments. A herddb lock-profile showed ~92% of lock-wait time inside that map during concurrent graph build.
  • PersistentVectorStore.createEmptyLiveShard — pass cap = computeEffectiveMaxLiveGraphSize() as the hint. This is the same bound already used to pre-size the two ConcurrentHashMaps next to the builder.
  • PersistentVectorStore.writeFusedPQGraphToTempFile — pass totalVectors = allNodeToPk.size(), the exact node count about to be inserted in the compaction/merge path.
  • CI (ci.yml + kubernetes-tests.yml) now checks out the new jvector branch so the 11-arg constructor resolves at compile time. Artifact version (4.0.0-rc.9-herddb-SNAPSHOT) is unchanged, so no pom bump is required.

Closes #223.

Test plan

  • mvn -B checkstyle:check apache-rat:check spotbugs:check install -DskipTests -Pci (green locally)
  • CI (ci.yml + kubernetes-tests.yml) runs against the new jvector branch
  • DirectMultipleConcurrentUpdatesSuite{NoIndexes,WithNonUniqueIndexes,WithUniqueIndexes}Test (hammer gate for index/checkpoint/concurrency changes)
  • Vector indexing smoke on k3s-local / GKE confirms no lock-profile regression

🤖 Generated with Claude Code

A lock profile on the indexing path showed ~92% of lock-wait time inside
jvector DenseIntMap during concurrent graph build, driven by the resize
lock that is taken every time the backing array has to grow.
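
To illustrate why growth can become the contention point, here is a toy sketch. It is not jvector's DenseIntMap: it only assumes a dense int-keyed map whose backing array is swapped under an exclusive lock whenever a key falls past the end. The names `GrowUnderLockMap`, `ResizeDemo` and `resizesAfterInserting` are hypothetical and exist only for this example.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy dense int-keyed map; growth swaps the backing array under an exclusive lock. */
final class GrowUnderLockMap<T> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private AtomicReferenceArray<T> cells; // replaced only under the write lock
    private int resizeCount;               // written only under the write lock

    GrowUnderLockMap(int initialCapacity) {
        cells = new AtomicReferenceArray<>(Math.max(1, initialCapacity));
    }

    void put(int key, T value) {
        if (key < 0) throw new IllegalArgumentException("negative key: " + key);
        while (true) {
            lock.readLock().lock(); // shared: writers to distinct cells do not block each other
            try {
                if (key < cells.length()) {
                    cells.set(key, value);
                    return;
                }
            } finally {
                lock.readLock().unlock();
            }
            grow(key + 1); // exclusive: every concurrent put waits here
        }
    }

    T get(int key) {
        lock.readLock().lock();
        try {
            return (key >= 0 && key < cells.length()) ? cells.get(key) : null;
        } finally {
            lock.readLock().unlock();
        }
    }

    private void grow(int minCapacity) {
        lock.writeLock().lock();
        try {
            if (minCapacity <= cells.length()) return; // another thread already grew it
            int newLen = Math.max(minCapacity, cells.length() * 2);
            AtomicReferenceArray<T> bigger = new AtomicReferenceArray<>(newLen);
            for (int i = 0; i < cells.length(); i++) bigger.set(i, cells.get(i));
            cells = bigger;
            resizeCount++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int resizeCount() {
        lock.readLock().lock();
        try {
            return resizeCount;
        } finally {
            lock.readLock().unlock();
        }
    }
}

final class ResizeDemo {
    /** Inserts keys 0..n-1 and reports how often the exclusive lock was needed to grow. */
    static int resizesAfterInserting(int initialCapacity, int n) {
        GrowUnderLockMap<Integer> map = new GrowUnderLockMap<>(initialCapacity);
        for (int i = 0; i < n; i++) map.put(i, i);
        return map.resizeCount();
    }

    public static void main(String[] args) {
        System.out.println("cap=16:    " + resizesAfterInserting(16, 10_000) + " resizes");
        System.out.println("cap=10000: " + resizesAfterInserting(10_000, 10_000) + " resizes");
    }
}
```

In this sketch a map created with capacity 16 has to grow repeatedly while inserting 10,000 sequential keys, whereas one created with capacity 10,000 never takes the write lock at all.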

Upstream jvector (branch reduce-denseintmap-lock-contention,
commit 87e3bfff) rewrites DenseIntMap as a lock-free spine-of-segments
and adds an initialCapacity hint on GraphIndexBuilder that pre-allocates
the base-layer map so the hot insert phase never touches the spine lock.
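
A minimal sketch of the spine-of-segments idea follows, under stated assumptions. It is not the upstream implementation: the spine here has a fixed length chosen at construction (so there is no spine lock to model), segments are installed with a compare-and-set, and the class `SegmentedIntMap`, its constructor parameters and the `lazyAllocations` counter are hypothetical names for illustration. The point it shows is that segments covered by an initial-capacity hint already exist when inserts start.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Toy segmented int-keyed map: a spine of fixed-size segments, no locks. */
final class SegmentedIntMap<T> {
    private static final int SEGMENT_BITS = 10;
    private static final int SEGMENT_SIZE = 1 << SEGMENT_BITS; // 1024 slots per segment
    private static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final AtomicReferenceArray<AtomicReferenceArray<T>> spine;
    private final AtomicInteger lazyAllocations = new AtomicInteger();

    /** maxCapacity bounds the key range; initialCapacity is the pre-allocation hint. */
    SegmentedIntMap(int maxCapacity, int initialCapacity) {
        int spineLen = Math.max(1, (maxCapacity + SEGMENT_SIZE - 1) >>> SEGMENT_BITS);
        spine = new AtomicReferenceArray<>(spineLen);
        int preAllocated = Math.min(spineLen, (initialCapacity + SEGMENT_SIZE - 1) >>> SEGMENT_BITS);
        for (int s = 0; s < preAllocated; s++) {
            spine.set(s, new AtomicReferenceArray<>(SEGMENT_SIZE));
        }
    }

    void put(int key, T value) {
        segmentFor(key, true).set(key & SEGMENT_MASK, value);
    }

    T get(int key) {
        AtomicReferenceArray<T> seg = segmentFor(key, false);
        return seg == null ? null : seg.get(key & SEGMENT_MASK);
    }

    private AtomicReferenceArray<T> segmentFor(int key, boolean create) {
        int s = key >>> SEGMENT_BITS;
        if (key < 0 || s >= spine.length()) {
            throw new IndexOutOfBoundsException("key out of range: " + key);
        }
        AtomicReferenceArray<T> seg = spine.get(s);
        if (seg == null && create) {
            AtomicReferenceArray<T> fresh = new AtomicReferenceArray<>(SEGMENT_SIZE);
            if (spine.compareAndSet(s, null, fresh)) {
                lazyAllocations.incrementAndGet(); // this thread installed the segment
                seg = fresh;
            } else {
                seg = spine.get(s); // another thread won the race; use its segment
            }
        }
        return seg;
    }

    /** Number of segments that had to be created during inserts. */
    int lazyAllocations() {
        return lazyAllocations.get();
    }
}

final class SegmentDemo {
    static int lazyAllocationsAfterInserting(int initialCapacity, int n) {
        SegmentedIntMap<Integer> map = new SegmentedIntMap<>(1 << 20, initialCapacity);
        for (int i = 0; i < n; i++) map.put(i, i);
        return map.lazyAllocations();
    }

    public static void main(String[] args) {
        System.out.println("no hint:   " + lazyAllocationsAfterInserting(0, 5_000));
        System.out.println("with hint: " + lazyAllocationsAfterInserting(5_000, 5_000));
    }
}
```

Existing segments are never copied or moved in this design, so readers and writers of already-allocated slots never wait on growth.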

This commit adopts the new API at the two places where herddb builds a
graph with a known node count:

- createEmptyLiveShard: pass cap (= computeEffectiveMaxLiveGraphSize),
  which is the same bound already used to pre-size the ConcurrentHashMaps
  living next to the builder.
- writeFusedPQGraphToTempFile: pass totalVectors, the exact number of
  nodes about to be inserted in the compaction/merge path.
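
The two call sites differ in what they know: the live path has only an upper bound, while the compaction path has an exact count. The sketch below is a rough illustration of that distinction and does not use the real herddb or jvector classes. `SketchBuilder` is a hypothetical stand-in for a builder that accepts a capacity hint, and the field names and the `Map<Integer, ?>` type for `allNodeToPk` are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a graph builder that accepts a capacity hint. */
final class SketchBuilder {
    final int initialCapacity;

    SketchBuilder(int initialCapacity) {
        if (initialCapacity < 0) throw new IllegalArgumentException("negative capacity");
        this.initialCapacity = initialCapacity;
    }
}

/** Live path: one upper bound sizes the side maps and the builder alike. */
final class LiveShardSketch {
    final ConcurrentHashMap<Integer, Long> nodeToPk; // assumed name and types
    final ConcurrentHashMap<Long, Integer> pkToNode; // assumed name and types
    final SketchBuilder builder;

    LiveShardSketch(int cap) {
        this.nodeToPk = new ConcurrentHashMap<>(cap);
        this.pkToNode = new ConcurrentHashMap<>(cap);
        this.builder = new SketchBuilder(cap);
    }
}

final class CapacityHintSketch {
    /** Compaction/merge path: the node count is known exactly before building. */
    static SketchBuilder forCompaction(Map<Integer, ?> allNodeToPk) {
        int totalVectors = allNodeToPk.size();
        return new SketchBuilder(totalVectors);
    }

    public static void main(String[] args) {
        LiveShardSketch shard = new LiveShardSketch(100_000);
        SketchBuilder merged = forCompaction(Map.of(0, "pk-a", 1, "pk-b"));
        System.out.println(shard.builder.initialCapacity + " " + merged.initialCapacity);
    }
}
```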

CI (ci.yml and kubernetes-tests.yml) now checks out the
reduce-denseintmap-lock-contention branch of eolivelli/jvector so the
new constructor resolves at compile time. The jvector artifact version
(4.0.0-rc.9-herddb-SNAPSHOT) is unchanged, so herddb-core/pom.xml does
not need a bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eolivelli eolivelli merged commit 481dd36 into master Apr 22, 2026
5 of 7 checks passed
@eolivelli eolivelli deleted the issue-223 branch April 22, 2026 16:05

Development

Successfully merging this pull request may close these issues.

Update Jvector to get benefits from DenseIntMap improvements