Prune unexpandable candidates from the search frontier #11
Merged
The candidates queue in GraphSearcher is an unbounded GrowableLongHeap. The neighbor callback in searchOneLayer pushed every scored neighbor onto it unconditionally (~maxDegree pushes per expanded node, one pop), so the heap grew to many thousands of entries within a single search. Each push then sifts up over a long[] far larger than CPU cache, making AbstractLongHeap.upHeap cache-miss bound. Profiling a graph-build workload showed upHeap at ~43% of CPU, more than the vector comparisons themselves.
A candidate scoring below the worst kept result can never be expanded: stopSearch() terminates the loop before such a candidate reaches the top of the queue, and approximateResults.topScore() only rises. Such candidates are pure heap bloat (the HNSW frontier-pruning step, previously omitted).
Divert these candidates to the existing evictedResults buffer (a NodesUnsorted with O(1) append and no sift-up) instead of the hot candidates heap. evictedResults is already drained back into candidates at the start of searchLayer0 and setEntryPointsFromPreviousLayer, so resume() and layer descent stay bit-exact with no new fields.
Applied to both the sync searchOneLayer and the async searchOneLayerAsync paths.
Closes #10
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
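To illustrate why a push gets more expensive as the heap grows, here is a minimal array-backed max-heap of packed longs. It is a hypothetical stand-in, not jvector's GrowableLongHeap or AbstractLongHeap: the packing (order-preserving float bits in the high 32 bits, node id in the low 32) and the class name are assumptions for illustration. Each push walks from the new leaf toward the root, touching slots i/2, i/4, and so on; in a heap of thousands of entries those slots are far apart in the long[], which plausibly explains the cache misses described above.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified max-heap of packed (score, node) longs.
 * Not jvector's implementation; the encoding below is an assumption.
 */
final class PackedMaxHeap {
    private long[] heap = new long[16]; // 1-based: heap[1] is the root
    private int size = 0;
    long siftSteps = 0; // parent-to-child moves performed by upHeap

    /** High 32 bits: float bits remapped so signed-int order matches float order. */
    static long encode(int node, float score) {
        int bits = Float.floatToIntBits(score);
        int sortable = bits ^ ((bits >> 31) & 0x7fffffff);
        return ((long) sortable << 32) | (node & 0xFFFFFFFFL);
    }

    static int decodeNode(long packed) { return (int) packed; }

    static float decodeScore(long packed) {
        int sortable = (int) (packed >>> 32);
        int bits = sortable ^ ((sortable >> 31) & 0x7fffffff); // the remap is its own inverse
        return Float.intBitsToFloat(bits);
    }

    void push(long value) {
        if (++size == heap.length) heap = Arrays.copyOf(heap, heap.length * 2);
        heap[size] = value;
        upHeap(size);
    }

    /** Sift-up: one parent per level, each at roughly half the previous index. */
    private void upHeap(int i) {
        long value = heap[i];
        int parent = i >>> 1;
        while (parent > 0 && value > heap[parent]) {
            heap[i] = heap[parent]; // move the smaller parent down one level
            i = parent;
            parent = i >>> 1;
            siftSteps++;
        }
        heap[i] = value;
    }

    /** Removes and returns the largest entry; caller must ensure size() > 0. */
    long pop() {
        long top = heap[1];
        heap[1] = heap[size--];
        downHeap(1);
        return top;
    }

    private void downHeap(int i) {
        long value = heap[i];
        int child = 2 * i;
        while (child <= size) {
            if (child < size && heap[child + 1] > heap[child]) child++;
            if (heap[child] <= value) break;
            heap[i] = heap[child];
            i = child;
            child = 2 * i;
        }
        heap[i] = value;
    }

    int size() { return size; }
}
```

Pushing strictly increasing scores is the worst case for this sketch: every push climbs all the way to the root, so 1,023 pushes perform 8,194 parent moves. Keeping unexpandable entries out of the heap shrinks both the path length and, more importantly, the memory footprint each path walks over.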
Closes #10
Problem
CPU profiling of a graph-index build/compaction workload showed AbstractLongHeap.upHeap at ~43% of CPU, more than the actual vector distance computations (~32%). GraphSearcher.candidates is an unbounded GrowableLongHeap. The neighbor callback in searchOneLayer pushed every scored neighbor onto it (~maxDegree pushes per expanded node, one pop), so within a single search the heap grew to thousands of entries. Each push then sifts up over a long[] far larger than CPU cache, making upHeap cache-miss bound.
Most of those pushes are wasted: a candidate scoring below the worst kept result can never be expanded. stopSearch() terminates the loop before such a candidate reaches the top of the queue, and approximateResults.topScore() only rises, so the candidate stays below the threshold for the rest of the search. This is the standard HNSW frontier-pruning step, previously omitted.
Change
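A minimal sketch of the neighbor-handler rule, using java.util collections as stand-ins: a max-heap for candidates, a min-heap of kept results whose peek() plays the role of approximateResults.topScore(), and an ArrayList in place of the NodesUnsorted evictedResults buffer. The class, record, and method names are hypothetical; this is not the GraphSearcher code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of frontier pruning; higher score is better. Assumes topK >= 1. */
final class FrontierSketch {
    record Scored(int node, float score) {}

    private final int topK;
    // Max-heap: best candidate first (stand-in for `candidates`).
    final PriorityQueue<Scored> candidates =
        new PriorityQueue<>((a, b) -> Float.compare(b.score(), a.score()));
    // Min-heap of kept results: peek() is the worst kept score.
    final PriorityQueue<Scored> results =
        new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score()));
    // Unsorted side buffer with O(1) append (stand-in for evictedResults).
    final List<Scored> evicted = new ArrayList<>();

    FrontierSketch(int topK) { this.topK = topK; }

    /** Neighbor handler: divert candidates that can never be expanded. */
    void onNeighbor(int node, float score) {
        Scored s = new Scored(node, score);
        if (results.size() >= topK && score < results.peek().score()) {
            // Strict <: a tie with the worst kept score is still expandable.
            evicted.add(s);
        } else {
            candidates.add(s);
        }
    }

    /** Start of the next search call: pruned candidates rejoin the frontier. */
    void drainEvicted() {
        candidates.addAll(evicted);
        evicted.clear();
    }
}
```

The side buffer is what keeps resume() exact in this sketch: nothing is discarded, it only skips the heap while it cannot matter.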
In both the sync searchOneLayer and the async searchOneLayerAsync neighbor handlers, a neighbor with score < approximateResults.topScore() (when the result set is full) is diverted to the existing evictedResults buffer instead of the hot candidates heap. evictedResults is a NodesUnsorted (O(1) append, no sift-up) and is already drained back into candidates at the start of searchLayer0 and setEntryPointsFromPreviousLayer. So:
- candidates stays bounded at ~rerankK + maxDegree, fits in cache, and upHeap collapses. Zero recall change.
- resume(): the next call drains evictedResults (including pruned candidates) back into candidates, so the exact same candidate set is reconsidered. Recall is bit-exact, with no new fields or plumbing.
A strict < is used to match stopSearch's strict comparison: a candidate scoring exactly topScore() is still expandable and stays in candidates.
Testing
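As a standalone sanity check of the zero-recall-change argument, separate from the project's own tests, here is a toy best-first search that runs on random graphs twice, with and without the pruning rule, and compares the expansion order and the returned results. Everything here is an assumption for illustration: the class, the graph parameters, and the simplified stop condition (stop once the best candidate scores strictly below the worst kept result). The real stopSearch may apply further conditions, and pruned entries here are only set aside, not drained on a later resume().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical toy search; not jvector's GraphSearcher. Higher score is better. */
final class PruningEquivalenceCheck {
    record Scored(int node, float score) {}

    // Best first; ties broken by node id so pop order is a total order.
    static final Comparator<Scored> BEST_FIRST =
        Comparator.comparingDouble((Scored s) -> s.score()).reversed()
                  .thenComparingInt(Scored::node);
    static final Comparator<Scored> WORST_FIRST = BEST_FIRST.reversed();

    /** Returns expanded nodes in order, then -1, then the sorted result ids. */
    static List<Integer> search(int[][] adj, float[] score, int entry, int topK, boolean prune) {
        PriorityQueue<Scored> candidates = new PriorityQueue<>(BEST_FIRST);
        PriorityQueue<Scored> results = new PriorityQueue<>(WORST_FIRST); // peek() = worst kept
        List<Scored> evicted = new ArrayList<>(); // pruned entries, set aside
        boolean[] visited = new boolean[adj.length];
        List<Integer> trace = new ArrayList<>();

        Scored start = new Scored(entry, score[entry]);
        visited[entry] = true;
        candidates.add(start);
        results.add(start);

        while (!candidates.isEmpty()) {
            // Simplified stop rule: best candidate strictly worse than worst kept result.
            if (results.size() >= topK && candidates.peek().score() < results.peek().score()) break;
            Scored current = candidates.poll();
            trace.add(current.node());
            for (int nb : adj[current.node()]) {
                if (visited[nb]) continue;
                visited[nb] = true;
                Scored s = new Scored(nb, score[nb]);
                if (results.size() < topK || s.score() > results.peek().score()) {
                    results.add(s);
                    if (results.size() > topK) results.poll(); // drop the worst kept result
                }
                if (prune && results.size() >= topK && s.score() < results.peek().score()) {
                    evicted.add(s); // can never be expanded: skip the candidates heap
                } else {
                    candidates.add(s);
                }
            }
        }
        trace.add(-1);
        results.stream().map(Scored::node).sorted().forEach(trace::add);
        return trace;
    }

    /** Builds a random graph from the seed and compares both variants. */
    static boolean sameTrace(long seed) {
        Random rnd = new Random(seed);
        int n = 200, degree = 8, topK = 10;
        int[][] adj = new int[n][degree];
        float[] score = new float[n];
        for (int i = 0; i < n; i++) {
            score[i] = rnd.nextFloat();
            for (int d = 0; d < degree; d++) adj[i][d] = rnd.nextInt(n);
        }
        return search(adj, score, 0, topK, false).equals(search(adj, score, 0, topK, true));
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 100; seed++) {
            if (!sameTrace(seed)) throw new AssertionError("diverged for seed " + seed);
        }
        System.out.println("pruned and unpruned searches matched for all seeds");
    }
}
```

The check works because a pruned entry's score is below a threshold that never decreases, so in the unpruned run it can only reach the top of the queue when the stop rule already fires; the tie-break by node id keeps the pop order independent of which extra entries sit in the heap.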
- TestVectorGraph (15, recall) and GraphIndexBuilderTest (6, build): all pass.
- SearchAllocationProfileTest: passes.
- TestAsyncPipelineSearch currently fails on parallel-fusedpq-io with a pre-existing NoSuchMethodError (a stale multi-release jar shadows the branch's new async methods on the test classpath). It fails identically with this change reverted, so the failure is unrelated to this PR.
Re-profiling the same workload should show upHeap dropping from ~43% to single digits, leaving squareDistance as the dominant cost.
🤖 Generated with Claude Code