perf(ram): lazy SymSpell and emoji catalog, bounded emoji frequency map #682
Merged
The English SymSpell index was built eagerly at startup (an in-memory delete-variant index measured in the tens of MB) for a corrector that is only consulted once the typo gate finds a misspelling, and whose designed NSSpellChecker fallback already covers the load window. The emoji catalog (252KB JSON decoded into a few MB of indexed lowercased strings) was decoded at app construction for a picker most sessions never open. Both now build on first use. The emoji frequency map also grew one entry per unique emoji forever; it is now bounded with trimming that keeps recents and heavy hitters, which cannot change match results because frequency only breaks ties within a relevance tier.
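The cold-path behavior described above can be sketched as follows. This is a minimal illustration, not the PR's code: `LazyIndex`, `rankCorrections`, the injectable `schedule` closure, and the dictionary-shaped index are all hypothetical stand-ins. Only the name `cachedIndexOrRequestLoad` and the idea of falling back to the system spell checker while the index builds come from the description.

```swift
import Foundation

/// Hypothetical holder for an expensive index that is built on first request.
final class LazyIndex<Index> {
    private let lock = NSLock()
    private var index: Index?
    private var loadRequested = false
    private let build: () -> Index
    private let schedule: (@escaping () -> Void) -> Void

    /// `schedule` is an assumption added for testability; by default the
    /// build runs on a background queue.
    init(build: @escaping () -> Index,
         schedule: @escaping (@escaping () -> Void) -> Void = { work in
             DispatchQueue.global(qos: .utility).async { work() }
         }) {
        self.build = build
        self.schedule = schedule
    }

    /// Returns the index if it is ready. Otherwise requests a build
    /// (at most once) and returns nil so the caller can fall back.
    func cachedIndexOrRequestLoad() -> Index? {
        lock.lock()
        if let ready = index {
            lock.unlock()
            return ready
        }
        let shouldStart = !loadRequested
        loadRequested = true
        lock.unlock()

        if shouldStart {
            schedule {
                let built = self.build()
                self.lock.lock()
                self.index = built
                self.lock.unlock()
            }
        }
        return nil
    }
}

/// Ranks via the index when it is warm, else via the supplied fallback
/// (standing in for the system spell checker).
func rankCorrections(for word: String,
                     using lazyIndex: LazyIndex<[String: [String]]>,
                     systemFallback: (String) -> [String]) -> [String] {
    if let index = lazyIndex.cachedIndexOrRequestLoad() {
        return index[word] ?? []
    }
    return systemFallback(word)
}
```

In this sketch nothing is built until the first call, the first call is answered by the fallback, and later calls use the index once the build has finished.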
Summary
Three idle-footprint cuts, none of which change steady-state behavior:
- SymSpell index: the corrector already falls back to `NSSpellChecker` while an index builds; the only eager part was the launch-time preload. A built index is an `[Int: [Int32]]` delete-variant map over roughly 83-100k words (on the order of tens of MB resident), paid by every session for a feature consulted only when the typo gate actually finds a misspelling. The first consultation now triggers the same background build through `cachedIndexOrRequestLoad`. Cost of staying cold: the first correction or two after launch rank via the system spell checker instead of corpus frequency, exactly the designed fallback.
- Emoji catalog: built on the first `:capture`. `EmojiPickerController` now takes a `matcherProvider` and builds the matcher (bundle JSON decode + lowercased index, a few MB) on the first match request instead of at app construction. The cost is a one-time build of roughly tens of ms on first picker use, which is user-paced.
- Emoji frequency map: `EmojiUsageStore.frequency` grew one entry per unique emoji forever. It now trims above 300 entries down to 200, preserving current recents and (by construction, since trimming removes the rarest first) heavy hitters. Frequency only breaks ties within a relevance tier, so trimming cannot change which emoji match a query.

Validation
New test: `test_frequencyIsBoundedAndKeepsHeavyHitters` floods the store with 320 unique aliases and asserts the bound plus survivor counts.

Linked issues
Refs #661
Risk / rollout notes
🤖 Generated with Claude Code
Greptile Summary
This PR reduces idle memory footprint across three subsystems without changing steady-state behavior: SymSpell index load is deferred until the first correction, the emoji catalog is built on the first picker interaction, and the emoji frequency map is capped at 300 entries (trimming to 200 on overflow).
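The first-use construction of the emoji matcher can be illustrated with a small sketch. The `matcherProvider` parameter and the `EmojiPickerController` name come from the PR. The `EmojiMatcher` type, its prefix-matching logic, and the `matches(for:)` method are hypothetical placeholders for the real bundle-JSON-backed matcher, and actor isolation is left out for brevity.

```swift
/// Hypothetical stand-in for the real matcher, which decodes bundle JSON.
struct EmojiMatcher {
    let lowercasedAliases: [String]

    init(aliases: [String]) {
        // Building the lowercased index is the expensive step being deferred.
        lowercasedAliases = aliases.map { $0.lowercased() }
    }

    func matches(for query: String) -> [String] {
        let needle = query.lowercased()
        return lowercasedAliases.filter { $0.hasPrefix(needle) }
    }
}

final class EmojiPickerController {
    private let matcherProvider: () -> EmojiMatcher

    // The provider runs on first access only; later accesses reuse the result.
    private lazy var matcher: EmojiMatcher = self.matcherProvider()

    init(matcherProvider: @escaping () -> EmojiMatcher) {
        self.matcherProvider = matcherProvider
    }

    func matches(for query: String) -> [String] {
        matcher.matches(for: query)
    }
}
```

Constructing the controller does no catalog work in this sketch; the provider is invoked by the first `matches(for:)` call and never again.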
- The SymSpell index and the emoji matcher are deferred via a `nil` preload and a `matcherProvider` closure, respectively. The SymSpell fallback (`NSSpellChecker`) covers the cold path; the emoji catalog builds on first `:capture`, which is user-paced.
- `trimFrequencyIfNeeded()` fires above 300 entries and cuts to 200, protecting the current recents set and using `effectiveTarget = max(frequencyTrimTarget, keepAlways.count)` to stay correct if `recentsCap` is ever raised above `frequencyTrimTarget`.

Confidence Score: 5/5
Safe to merge: all three changes preserve existing runtime behavior, and the previous review concerns about under-trim and loose test assertions are both resolved.
The lazy SymSpell path is a deliberate, documented tradeoff with a safe fallback (NSSpellChecker). The emoji matcher lazy-init is correct and main-actor-isolated. The trim logic is sound: overflow ≤ removable.count is guaranteed by the effectiveTarget = max(frequencyTrimTarget, keepAlways.count) construction, and the new test uses an exact count assertion that would catch both a missing trim and an under-trim.
No files require special attention.
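The trim logic reviewed above can be sketched as follows. The thresholds (300 and 200), the `effectiveTarget` expression, and the names `trimFrequencyIfNeeded`, `keepAlways`, `recentsCap`, and `frequencyTrimTarget` come from the PR and review text. The `EmojiUsageSketch` type, `frequencyTrimThreshold`, `recordUse`, the `recentsCap` value of 3, and the alphabetical tie-break are assumptions made for illustration.

```swift
struct EmojiUsageSketch {
    static let recentsCap = 3                 // assumed value
    static let frequencyTrimThreshold = 300   // trim fires above this
    static let frequencyTrimTarget = 200      // trim cuts down to this

    private(set) var frequency: [String: Int] = [:]
    private(set) var recents: [String] = []

    mutating func recordUse(_ alias: String) {
        frequency[alias, default: 0] += 1
        recents.removeAll { $0 == alias }
        recents.insert(alias, at: 0)
        if recents.count > Self.recentsCap {
            recents.removeLast(recents.count - Self.recentsCap)
        }
        trimFrequencyIfNeeded()
    }

    private mutating func trimFrequencyIfNeeded() {
        guard frequency.count > Self.frequencyTrimThreshold else { return }
        let keepAlways = Set(recents)
        // Never aim below the protected set, so overflow <= removable.count.
        let effectiveTarget = max(Self.frequencyTrimTarget, keepAlways.count)
        let overflow = frequency.count - effectiveTarget
        // Rarest first; ties broken by alias so the result is deterministic.
        let removable = frequency
            .filter { !keepAlways.contains($0.key) }
            .sorted { ($0.value, $0.key) < ($1.value, $1.key) }
        for (alias, _) in removable.prefix(overflow) {
            frequency.removeValue(forKey: alias)
        }
    }
}
```

Because removal walks the rarest entries first and skips the recents set, frequently used aliases and current recents survive a trim in this sketch.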
Important Files Changed
Reviews (2): Last reviewed commit: "review: clamp the frequency trim floor t..."