Prune unexpandable candidates from the search frontier #11

Merged
eolivelli merged 1 commit into parallel-fusedpq-io from prune-search-frontier
May 16, 2026

Conversation

@eolivelli
Owner

Closes #10

Problem

CPU profiling of a graph-index build/compaction workload showed AbstractLongHeap.upHeap at ~43% of CPU — more than the actual vector distance computations (~32%).

GraphSearcher.candidates is an unbounded GrowableLongHeap. The neighbor callback in searchOneLayer pushed every scored neighbor onto it: roughly maxDegree pushes against one pop per expanded node, a net growth of about maxDegree - 1 entries per expansion. Within a single search the heap therefore grew to thousands of entries. Each push then sifts up through a long[] far larger than CPU cache, which makes upHeap cache-miss bound.

Most of those pushes are wasted. Once the result set is full, a candidate scoring below the worst kept result can never be expanded: approximateResults.topScore() only rises, so by the time such a candidate would reach the top of the queue, stopSearch() has already terminated the loop. This is the standard HNSW frontier-pruning step, previously omitted.

Change

In both the sync searchOneLayer and the async searchOneLayerAsync neighbor handlers, a neighbor with score < approximateResults.topScore() (when the result set is full) is diverted to the existing evictedResults buffer instead of the hot candidates heap.
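The routing decision described above can be sketched in a few lines. This is a minimal stand-in, not the GraphSearcher code: the class `DivertingFrontierSketch`, its methods, and the use of `java.util.PriorityQueue` and `ArrayList` in place of GrowableLongHeap and NodesUnsorted are all assumptions made for illustration, and it tracks scores only rather than node ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative stand-in only; names and types here are hypothetical, not jvector APIs.
class DivertingFrontierSketch {
    // Hot heap: best score on top, every insert pays a sift-up.
    final PriorityQueue<Float> candidates = new PriorityQueue<>(Comparator.reverseOrder());
    // Side buffer: plain append, no ordering maintained.
    final List<Float> evicted = new ArrayList<>();

    // Route one scored neighbor. worstKept is the lowest score currently in the result set.
    void offer(float score, boolean resultsFull, float worstKept) {
        if (resultsFull && score < worstKept) {
            evicted.add(score);      // cannot be expanded in this search; park it cheaply
        } else {
            candidates.add(score);   // still expandable (includes score == worstKept)
        }
    }

    // Move everything parked in the buffer back onto the heap, as a resumed search would need.
    void drainEvicted() {
        candidates.addAll(evicted);
        evicted.clear();
    }

    public static void main(String[] args) {
        DivertingFrontierSketch f = new DivertingFrontierSketch();
        f.offer(0.9f, true, 0.5f);   // above the worst kept score: heap
        f.offer(0.5f, true, 0.5f);   // tie with the worst kept score: heap
        f.offer(0.2f, true, 0.5f);   // below the worst kept score: buffer
        f.offer(0.1f, false, 0.5f);  // result set not full yet: heap
        System.out.println("heap=" + f.candidates.size() + " buffer=" + f.evicted.size());
        f.drainEvicted();
        System.out.println("after drain heap=" + f.candidates.size());
    }
}
```

The point of the split is that the cost of an insert depends on which container receives it: only entries that can still be expanded pay for heap ordering.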

evictedResults is a NodesUnsorted (O(1) append, no sift-up) and is already drained back into candidates at the start of searchLayer0 and setEntryPointsFromPreviousLayer. So:

  • Build / non-resumed search: candidates stays bounded at ~rerankK + maxDegree entries and fits in cache, so the upHeap cost collapses. Zero recall change.
  • resume(): the next call drains evictedResults (including pruned candidates) back into candidates, so the exact same candidate set is reconsidered — bit-exact recall, no new fields or plumbing.
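The claim that pruning shrinks the heap without changing results can be checked on a toy best-first search. The sketch below is not the jvector search loop: the class `FrontierPruningSketch`, the random graph, the precomputed scores, and the parameter `k` (standing in for rerankK) are all assumptions, and it requires k >= 1. It runs the same search with and without diversion and reports the kept nodes and the peak heap size for each.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Toy model only: class, method, and field names are hypothetical, not jvector APIs.
class FrontierPruningSketch {
    static final class Scored {
        final int node;
        final float score;
        Scored(int node, float score) { this.node = node; this.score = score; }
    }

    // Total order (score, then node id) so equal scores pop deterministically.
    static final Comparator<Scored> BY_SCORE =
            Comparator.<Scored>comparingDouble(s -> s.score).thenComparingInt(s -> s.node);

    static final class Outcome {
        final List<Integer> topK;  // kept node ids, ascending
        final int peakFrontier;    // largest candidates-heap size observed
        final int diverted;        // neighbors parked in the side buffer
        Outcome(List<Integer> topK, int peakFrontier, int diverted) {
            this.topK = topK; this.peakFrontier = peakFrontier; this.diverted = diverted;
        }
    }

    // Best-first search over precomputed scores (higher is better). Assumes k >= 1.
    static Outcome search(int[][] graph, float[] score, int entry, int k, boolean prune) {
        PriorityQueue<Scored> candidates = new PriorityQueue<>(BY_SCORE.reversed()); // best on top
        PriorityQueue<Scored> results = new PriorityQueue<>(BY_SCORE);               // worst kept on top
        List<Scored> evicted = new ArrayList<>();
        boolean[] visited = new boolean[graph.length];
        visited[entry] = true;
        candidates.add(new Scored(entry, score[entry]));
        int peak = 1;
        while (!candidates.isEmpty()) {
            // Stop rule with a strict comparison, mirroring the description of stopSearch.
            if (results.size() >= k && candidates.peek().score < results.peek().score) break;
            Scored best = candidates.poll();
            results.add(best);
            if (results.size() > k) results.poll(); // drop the worst so the set stays at k
            for (int nb : graph[best.node]) {
                if (visited[nb]) continue;
                visited[nb] = true;
                Scored s = new Scored(nb, score[nb]);
                if (prune && results.size() >= k && s.score < results.peek().score) {
                    evicted.add(s);      // can never be expanded in this search
                } else {
                    candidates.add(s);
                }
            }
            peak = Math.max(peak, candidates.size());
        }
        List<Integer> topK = new ArrayList<>();
        for (Scored s : results) topK.add(s.node);
        Collections.sort(topK);
        return new Outcome(topK, peak, evicted.size());
    }

    static int[][] randomGraph(int n, int degree, long seed) {
        Random rnd = new Random(seed);
        int[][] g = new int[n][degree];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < degree; j++) g[i][j] = rnd.nextInt(n);
        return g;
    }

    static float[] randomScores(int n, long seed) {
        Random rnd = new Random(seed);
        float[] s = new float[n];
        for (int i = 0; i < n; i++) s[i] = rnd.nextFloat();
        return s;
    }

    public static void main(String[] args) {
        int[][] g = randomGraph(5000, 16, 42L);
        float[] s = randomScores(5000, 7L);
        Outcome plain = search(g, s, 0, 20, false);
        Outcome pruned = search(g, s, 0, 20, true);
        System.out.println("same results: " + plain.topK.equals(pruned.topK));
        System.out.println("peak heap without pruning: " + plain.peakFrontier);
        System.out.println("peak heap with pruning: " + pruned.peakFrontier
                + " (diverted " + pruned.diverted + ")");
    }
}
```

In this model the pruned heap is always a subset of the unpruned one, and every diverted entry scores below the stop threshold, so both runs pop the same nodes in the same order. A resumed search would additionally need the diverted entries pushed back before continuing, which the toy does not model.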

A strict < is used to match stopSearch's strict comparison: a candidate scoring exactly topScore() is still expandable and stays in candidates.

Testing

  • TestVectorGraph (15 tests, recall) and GraphIndexBuilderTest (6 tests, build): all pass.
  • SearchAllocationProfileTest: passes.
  • TestAsyncPipelineSearch currently fails on parallel-fusedpq-io with a pre-existing NoSuchMethodError (a stale multi-release jar shadows the branch's new async methods on the test classpath); confirmed to fail identically with this change reverted, so it is unrelated to this PR.

Re-profiling the same workload should show upHeap drop from ~43% to single digits, leaving squareDistance as the dominant cost.

🤖 Generated with Claude Code

The candidates queue in GraphSearcher is an unbounded GrowableLongHeap.
The neighbor callback in searchOneLayer pushed every scored neighbor onto
it unconditionally (~maxDegree pushes per expanded node, one pop), so the
heap grew to many thousands of entries within a single search. Each push
then sifts up through a long[] far larger than CPU cache, making
AbstractLongHeap.upHeap cache-miss bound. Profiling a graph-build workload
showed upHeap at ~43% of CPU, more than the vector comparisons themselves.

A candidate scoring below the worst kept result can never be expanded:
stopSearch() terminates the loop before it reaches the top of the queue,
and approximateResults.topScore() only rises. Such candidates are pure
heap bloat (the HNSW frontier-pruning step, previously omitted).

Divert these candidates to the existing evictedResults buffer (a
NodesUnsorted with O(1) append and no sift-up) instead of the hot
candidates heap. evictedResults is already drained back into candidates
at the start of searchLayer0 and setEntryPointsFromPreviousLayer, so
resume() and layer descent stay bit-exact with no new fields. Applied to
both the sync searchOneLayer and the async searchOneLayerAsync paths.

Closes #10

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eolivelli eolivelli merged commit a287053 into parallel-fusedpq-io May 16, 2026
0 of 4 checks passed