perf: optimize byNeighborhoodSimilarity with degree pruning and allocation-free Jaccard #103

Open
sauravbhattacharya001 wants to merge 1 commit into master from perf/compressor-jaccard-optimization

Conversation

@sauravbhattacharya001
Owner

Problem

`byNeighborhoodSimilarity()` in `GraphCompressor` performs O(V²) pairwise Jaccard comparisons. Each `jaccardSimilarity()` call allocates two new `HashSet` objects (union + intersection), creating massive GC pressure on large graphs.

Changes

  1. Degree-based upper-bound pruning: Vertices are sorted by ascending degree. Since |A ∩ B| ≤ min(|A|,|B|) and |A ∪ B| ≥ max(|A|,|B|), Jaccard(A,B) ≤ min(|A|,|B|)/max(|A|,|B|). Once this degree ratio drops below the threshold for a candidate, all subsequent candidates (with equal or larger degree) can be skipped entirely via `break`.

  2. Allocation-free Jaccard: New `jaccardFast()` computes the intersection size by iterating the smaller set and checking membership in the larger one. Union = |A| + |B| - intersection. Zero intermediate `HashSet` allocations.

  3. Removed unused `indexMap`: It was allocated but never read.
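
The first two changes can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: only the name `jaccardFast` comes from this PR, while the class `JaccardSketch`, the method `similarPairs`, the `Map<String, Set<String>>` adjacency representation, and the choice to treat two empty neighborhoods as similarity 0 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical holder class; not part of GraphCompressor.
final class JaccardSketch {

    /** Jaccard similarity computed without building union or intersection sets. */
    static <T> double jaccardFast(Set<T> a, Set<T> b) {
        // Iterate the smaller set and probe the larger one.
        Set<T> small = a.size() <= b.size() ? a : b;
        Set<T> large = (small == a) ? b : a;
        int intersection = 0;
        for (T x : small) {
            if (large.contains(x)) {
                intersection++;
            }
        }
        // |A ∪ B| = |A| + |B| - |A ∩ B|
        int union = a.size() + b.size() - intersection;
        // Assumption: two empty neighborhoods count as similarity 0.
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    /** Returns vertex pairs whose neighborhood similarity is at least the threshold. */
    static List<String[]> similarPairs(Map<String, Set<String>> neighbors, double threshold) {
        // Sort vertices by ascending degree (neighborhood size).
        List<String> order = new ArrayList<>(neighbors.keySet());
        order.sort(Comparator.comparingInt((String v) -> neighbors.get(v).size()));

        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < order.size(); i++) {
            Set<String> a = neighbors.get(order.get(i));
            for (int j = i + 1; j < order.size(); j++) {
                Set<String> b = neighbors.get(order.get(j));
                // Sort order guarantees |a| <= |b|, so |a| / |b| bounds the similarity.
                // Later candidates have equal or larger degree, so the bound only shrinks.
                if (!b.isEmpty() && (double) a.size() / b.size() < threshold) {
                    break;
                }
                if (jaccardFast(a, b) >= threshold) {
                    result.add(new String[] {order.get(i), order.get(j)});
                }
            }
        }
        return result;
    }
}
```

With a threshold of 0.5, a degree-1 vertex is compared against nothing once the next candidate has degree 3, because the bound 1/3 is already below the threshold.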

Performance Impact

  • Best case (high threshold, varied degrees): ~O(V log V) from sorting + very few actual comparisons due to early breaks
  • Worst case (threshold=0, uniform degrees): Still O(V²) comparisons but ~2x faster per comparison (no HashSet allocation)
  • GC pressure: Reduced from 2 HashSet allocations per comparison to zero
  • Memory: Removed unused HashMap allocation

…ation-free Jaccard

- Sort vertices by degree for effective upper-bound pruning
- Break early when max possible Jaccard < threshold (sorted order guarantee)
- Replace jaccardSimilarity's 2x HashSet allocation per call with
  jaccardFast that iterates smaller set and counts in larger (zero alloc)
- Remove unused indexMap
- Reduces O(V^2 * avg_degree) to O(V * effective_candidates * min_degree)
  with significant constant-factor improvement from avoiding GC pressure

@github-actions (bot) added the `visualization` (Graph visualization and UI) and `size/m` labels on Mar 21, 2026
