Skip to content

Search candidate heap grows unbounded: upHeap dominates CPU over vector comparisons #10

@eolivelli

Description

@eolivelli

Summary

CPU profiling of a graph-index build/compaction workload shows AbstractLongHeap.upHeap consuming ~43% of CPU — more than the actual vector distance computations (squareDistance, ~32%).

Method Self CPU
AbstractLongHeap.upHeap 43.1%
PanamaVectorUtilSupport.squareDistancePreferred (real vector math) 32.3%
IntHashSet.add (visited set) 8.5%

Call path: GraphIndexBuilder.addGraphNodeGraphSearcher.resumesearchLayer0searchOneLayer → neighbor callback → NodeQueue.pushGrowableLongHeap.pushAbstractLongHeap.addupHeap. All upHeap samples land on the sift-up loop.

Root cause

GraphSearcher.candidates is an unbounded GrowableLongHeap (MAX-heap). The neighbor callback in searchOneLayer pushes every scored neighbor onto it unconditionally — roughly maxDegree (~32) pushes per expanded node against a single pop. Within one search the heap grows to many thousands of entries (an 80 KB–800 KB long[]). Each push then performs an O(log N) sift-up that walks parent indices i, i/2, i/4 … — addresses scattered far apart in an array much larger than CPU cache, so the loop becomes cache-miss bound.

NodeQueue / AbstractLongHeap.upHeap themselves are fine — the sift-up is already an optimal implementation. The problem is the size of the heap, driven by the caller.

Why most of those pushes are wasted work

stopSearch ends the loop once the result set is full (approximateResults.size() >= rerankK) and the best remaining candidate is worse than the worst kept result (topCandidateScore < approximateResults.topScore()).

A candidate pushed with score < approximateResults.topScore() can only reach the top of the MAX-heap after every better candidate — by which point stopSearch has already fired. Since approximateResults.topScore() (the worst kept result) only ever rises, such a candidate is provably unexpandable for the rest of the search. Pushing it is pure heap bloat that slows every subsequent upHeap.

This is the standard HNSW frontier-pruning step, which jvector currently omits.

Proposed fix

Divert provably-unexpandable neighbors away from the hot candidates heap into the existing evictedResults buffer (a NodesUnsorted with O(1) append and no sift-up), which is already drained back into candidates on resume() / layer descent. This keeps resume() bit-exact while bounding the candidates heap to ~rerankK + maxDegree for non-resumed searches and graph builds.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions