perf(sorted_set): rewrite set algebra ops with split/join tree recursion#3333

Open
bobzhang wants to merge 2 commits into main from perf/sorted-set-algebra-ops

@bobzhang bobzhang commented Mar 23, 2026

Summary

  • Rewrite intersection, difference, and symmetric_difference using split/join tree recursion instead of naive contains+add loops, improving the cost from O(m log n) to O(m log(n/m + 1)) for balanced inputs, where m ≤ n are the sizes of the two sets.
  • Fix union to track size through recursion instead of performing an O(n) recount at the end.
  • Add size verification tests for all set algebra operations (union, intersection, difference, symmetric_difference) to ensure the cached size field stays consistent.
  • This addresses existing TODO comments in the code that noted the previous implementations were suboptimal.
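
To illustrate the split/join recursion named in the summary, here is a minimal sketch in Python rather than MoonBit. It is not the PR's code: the tuple representation, `from_sorted`, and `to_list` are hypothetical, and `join` deliberately skips rebalancing, so only the recursion structure is shown, not the complexity bound. The names `split_member`, `join2`, and `split_min` mirror the helpers listed in the commit message, but their signatures here are assumptions.

```python
# A tree is None (empty) or a tuple (left, key, right).

def join(left, key, right):
    # Assumption: no rebalancing. A balanced tree would restore its invariant here.
    return (left, key, right)

def split_member(tree, pivot):
    """Return (keys < pivot, whether pivot was present, keys > pivot)."""
    if tree is None:
        return None, False, None
    left, key, right = tree
    if pivot == key:
        return left, True, right
    if pivot < key:
        ll, found, lr = split_member(left, pivot)
        return ll, found, join(lr, key, right)
    rl, found, rr = split_member(right, pivot)
    return join(left, key, rl), found, rr

def split_min(tree):
    """Return (minimum key, tree without it). The tree must be non-empty."""
    left, key, right = tree
    if left is None:
        return key, right
    smallest, rest = split_min(left)
    return smallest, join(rest, key, right)

def join2(left, right):
    """Join two trees where every key in left is below every key in right."""
    if left is None:
        return right
    if right is None:
        return left
    smallest, rest = split_min(right)
    return join(left, smallest, rest)

def intersection(a, b):
    if a is None or b is None:
        return None
    a_left, key, a_right = a
    b_left, found, b_right = split_member(b, key)
    left = intersection(a_left, b_left)
    right = intersection(a_right, b_right)
    # Keep the pivot only if it occurs in both sets.
    return join(left, key, right) if found else join2(left, right)

def difference(a, b):
    if a is None:
        return None
    if b is None:
        return a
    a_left, key, a_right = a
    b_left, found, b_right = split_member(b, key)
    left = difference(a_left, b_left)
    right = difference(a_right, b_right)
    # Drop the pivot if it also occurs in b.
    return join2(left, right) if found else join(left, key, right)

def from_sorted(keys):
    """Hypothetical helper: build a tree from a sorted list of distinct keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return join(from_sorted(keys[:mid]), keys[mid], from_sorted(keys[mid + 1:]))

def to_list(tree):
    """Hypothetical helper: in-order traversal."""
    if tree is None:
        return []
    left, key, right = tree
    return to_list(left) + [key] + to_list(right)
```

Each call splits `b` around the root of `a` and recurses on the two halves independently, which is what lets a balanced implementation avoid one full lookup per element.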

Test plan

  • moon test -p moonbitlang/core/sorted_set — all 54 tests pass, including new size verification tests.

🤖 Generated with Claude Code


coveralls commented Mar 23, 2026

Pull Request Test Coverage Report for Build 3147

Details

  • 63 of 71 (88.73%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 95.703%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| sorted_set/set.mbt | 63 | 71 | 88.73% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 3138 | -0.04% |
| Covered Lines | 13874 |
| Relevant Lines | 14497 |

💛 - Coveralls

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

bobzhang and others added 2 commits March 23, 2026 17:18
Rewrite intersection, difference, and symmetric_difference to use
the split/join approach instead of naive contains+add loops.

Before: intersection/difference were O(n * log(m)) where n is the
size of one set and m is the other, using contains() + add() per
element which each do O(log m) work including tree rebalancing.

After: All three operations use split_member + join/join2 tree
recursion, achieving O(n * log(m/n + 1)) which is optimal for
set operations on balanced trees.
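
As a sanity check on the stated bound, the two extremes can be worked out directly. This assumes n ≤ m (n being the smaller set), which the commit message does not state explicitly:

```latex
\begin{aligned}
n = m:&\quad n \log\!\left(\tfrac{m}{n} + 1\right) = n \log 2 = \Theta(n)
  && \text{versus } n \log m = \Theta(n \log n),\\
n = 1:&\quad \log(m + 1) = \Theta(\log m)
  && \text{versus } \log m .
\end{aligned}
```

So the new bound matches a single lookup when one set is tiny and a linear merge when the sets are the same size.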

Also fix union to track size through recursion instead of doing
a full O(n) tree traversal to recount after construction.

Helper functions added:
- split_member: like split but also reports if pivot was found
- join2: join two trees where all left < all right (no pivot)
- split_min: extract minimum element from a tree
- tree_count: count nodes in a tree
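
One way the union size could be carried through the recursion, instead of recounted afterwards, is to count how many pivots occur in both inputs and apply |A ∪ B| = |A| + |B| − |A ∩ B| to the cached sizes. The Python sketch below shows that idea only; it is an assumption and may differ from how the PR tracks size. `union_with_overlap`, `union_sets`, and `from_sorted` are hypothetical names, `join` omits rebalancing, and `tree_count` is used here purely to cross-check the result.

```python
# A tree is None (empty) or a tuple (left, key, right).

def join(left, key, right):
    # Assumption: no rebalancing, to keep the sketch short.
    return (left, key, right)

def split_member(tree, pivot):
    """Return (keys < pivot, whether pivot was present, keys > pivot)."""
    if tree is None:
        return None, False, None
    left, key, right = tree
    if pivot == key:
        return left, True, right
    if pivot < key:
        ll, found, lr = split_member(left, pivot)
        return ll, found, join(lr, key, right)
    rl, found, rr = split_member(right, pivot)
    return join(left, key, rl), found, rr

def union_with_overlap(a, b):
    """Return (union tree, number of keys present in both a and b)."""
    if a is None:
        return b, 0
    if b is None:
        return a, 0
    a_left, key, a_right = a
    b_left, found, b_right = split_member(b, key)
    left, dup_left = union_with_overlap(a_left, b_left)
    right, dup_right = union_with_overlap(a_right, b_right)
    return join(left, key, right), dup_left + dup_right + (1 if found else 0)

def union_sets(a, size_a, b, size_b):
    """Return (union tree, size) using the cached input sizes; no recount."""
    tree, overlap = union_with_overlap(a, b)
    return tree, size_a + size_b - overlap

def tree_count(tree):
    """Count nodes by full traversal; used only for verification here."""
    if tree is None:
        return 0
    left, _, right = tree
    return tree_count(left) + 1 + tree_count(right)

def from_sorted(keys):
    """Hypothetical helper: build a tree from a sorted list of distinct keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return join(from_sorted(keys[:mid]), keys[mid], from_sorted(keys[mid + 1:]))
```

The overlap counter adds constant work per recursive call, so the size comes out of the same pass that builds the tree.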

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add tests that verify the .length() of results from intersection,
difference, union, and symmetric_difference are correctly tracked
through the split/join recursion. Also add tests for empty set
edge cases in intersection and difference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
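
A size verification check of the kind this commit describes might look like the following Python sketch for symmetric_difference. It is illustrative only and not taken from the PR's test file: the tree representation and every function here are hypothetical, `join` omits rebalancing, and the size is derived from |A Δ B| = |A| + |B| − 2|A ∩ B| as one assumed way of tracking it.

```python
# A tree is None (empty) or a tuple (left, key, right).

def join(left, key, right):
    # Assumption: no rebalancing, to keep the sketch short.
    return (left, key, right)

def split_member(tree, pivot):
    """Return (keys < pivot, whether pivot was present, keys > pivot)."""
    if tree is None:
        return None, False, None
    left, key, right = tree
    if pivot == key:
        return left, True, right
    if pivot < key:
        ll, found, lr = split_member(left, pivot)
        return ll, found, join(lr, key, right)
    rl, found, rr = split_member(right, pivot)
    return join(left, key, rl), found, rr

def split_min(tree):
    """Return (minimum key, tree without it). The tree must be non-empty."""
    left, key, right = tree
    if left is None:
        return key, right
    smallest, rest = split_min(left)
    return smallest, join(rest, key, right)

def join2(left, right):
    """Join two trees where every key in left is below every key in right."""
    if left is None:
        return right
    if right is None:
        return left
    smallest, rest = split_min(right)
    return join(left, smallest, rest)

def symmetric_difference(a, b):
    """Return (tree of keys in exactly one input, number of shared keys)."""
    if a is None:
        return b, 0
    if b is None:
        return a, 0
    a_left, key, a_right = a
    b_left, found, b_right = split_member(b, key)
    left, dup_left = symmetric_difference(a_left, b_left)
    right, dup_right = symmetric_difference(a_right, b_right)
    if found:
        # Shared key: drop it and glue the two sides together.
        return join2(left, right), dup_left + dup_right + 1
    return join(left, key, right), dup_left + dup_right

def tree_count(tree):
    """Count nodes by full traversal."""
    if tree is None:
        return 0
    left, _, right = tree
    return tree_count(left) + 1 + tree_count(right)

def from_sorted(keys):
    """Build a tree from a sorted list of distinct keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return join(from_sorted(keys[:mid]), keys[mid], from_sorted(keys[mid + 1:]))

def check_size(keys_a, keys_b):
    """Return True if the tracked size equals the actual node count."""
    tree, shared = symmetric_difference(from_sorted(keys_a), from_sorted(keys_b))
    tracked = len(keys_a) + len(keys_b) - 2 * shared
    return tracked == tree_count(tree)
```

Calling `check_size` on overlapping, disjoint, and empty inputs exercises the same edge cases the commit message mentions.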
@bobzhang bobzhang force-pushed the perf/sorted-set-algebra-ops branch from d88aca3 to 21fbe71 Compare March 23, 2026 09:18

2 participants