Search candidate heap grows unbounded: upHeap dominates CPU over vector comparisons

## Summary

CPU profiling of a graph-index build/compaction workload shows `AbstractLongHeap.upHeap` consuming **~43% of CPU** — more than the actual vector distance computations (`squareDistance`, ~32%).

| Method | Self CPU |
|---|---|
| `AbstractLongHeap.upHeap` | **43.1%** |
| `PanamaVectorUtilSupport.squareDistancePreferred` (real vector math) | 32.3% |
| `IntHashSet.add` (visited set) | 8.5% |

Call path: `GraphIndexBuilder.addGraphNode` → `GraphSearcher.resume` → `searchLayer0` → `searchOneLayer` → neighbor callback → `NodeQueue.push` → `GrowableLongHeap.push` → `AbstractLongHeap.add` → `upHeap`. All `upHeap` samples land on the sift-up loop.

## Root cause

`GraphSearcher.candidates` is an **unbounded** `GrowableLongHeap` (MAX-heap). The neighbor callback in `searchOneLayer` pushes **every** scored neighbor onto it unconditionally — roughly `maxDegree` (~32) pushes per expanded node against a single `pop`. Within one search the heap grows to many thousands of entries (an 80 KB–800 KB `long[]`). Each `push` then performs an O(log N) sift-up that walks parent indices `i, i/2, i/4 …` — addresses scattered far apart in an array much larger than CPU cache, so the loop becomes cache-miss bound.

`NodeQueue` / `AbstractLongHeap.upHeap` themselves are fine — the sift-up is already an optimal implementation. The problem is the *size* of the heap, driven by the caller.

## Why most of those pushes are wasted work

`stopSearch` ends the loop once the result set is full (`approximateResults.size() >= rerankK`) and the best remaining candidate is worse than the worst kept result (`topCandidateScore < approximateResults.topScore()`).

A candidate pushed with `score < approximateResults.topScore()` can only reach the top of the MAX-heap *after* every better candidate — by which point `stopSearch` has already fired. Since `approximateResults.topScore()` (the worst kept result) only ever *rises*, such a candidate is provably unexpandable for the rest of the search. Pushing it is pure heap bloat that slows every subsequent `upHeap`.

This is the standard HNSW frontier-pruning step, which jvector currently omits.

## Proposed fix

Divert provably-unexpandable neighbors away from the hot `candidates` heap into the existing `evictedResults` buffer (a `NodesUnsorted` with O(1) append and no sift-up), which is already drained back into `candidates` on `resume()` / layer descent. This keeps `resume()` bit-exact while bounding the `candidates` heap to ~`rerankK + maxDegree` for non-resumed searches and graph builds.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search candidate heap grows unbounded: upHeap dominates CPU over vector comparisons #10

Summary

Root cause

Why most of those pushes are wasted work

Proposed fix

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Method	Self CPU
`AbstractLongHeap.upHeap`	43.1%
`PanamaVectorUtilSupport.squareDistancePreferred` (real vector math)	32.3%
`IntHashSet.add` (visited set)	8.5%

Search candidate heap grows unbounded: upHeap dominates CPU over vector comparisons #10

Description

Summary

Root cause

Why most of those pushes are wasted work

Proposed fix

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions