Add synonym expansion to component lexical search #2425
Open
Mbeaulne wants to merge 1 commit into
This was referenced Jun 18, 2026
Mbeaulne
commented
Jun 18, 2026

Description

Adds a synonym expansion system to the component lexical search so that queries using common aliases resolve to the intended components. For example, searching `gcs` now surfaces storage-related components, `fit` surfaces training components, `infer` surfaces prediction components, and `df` surfaces dataframe/table components.

A new `componentSearchSynonyms.ts` module defines synonym groups (e.g. `gcs ↔ storage ↔ bucket`, `train ↔ fit`, `predict ↔ infer`, `df ↔ dataframe ↔ table`) and exposes `expandSynonymTokens`, which fans out any recognized token into all members of its group.

The search pipeline was also refactored to separate base tokenization (`baseSearchTokens`) from the full normalized text used for document indexing. Synonym expansion is applied to query tokens before scoring, and the phrase-match bonus now uses the original (pre-expansion) token sequence so multi-word phrase matching remains accurate.

Related Issue and Pull requests
Type of Change
Checklist
Screenshots (if applicable)
Test Instructions
- Search for `gcs` and confirm storage/bucket components appear at the top.
- Search for `fit` and confirm model training components appear.
- Search for `infer` and confirm prediction components appear.
- Search for `df` and confirm dataframe/table components appear.
- Confirm that a multi-word query (e.g. `train test split`) still correctly ranks exact name matches above partial matches.

Additional Comments
The synonym groups are intentionally domain-neutral and kept in a single flat list in `componentSearchSynonyms.ts` to make it easy to extend with additional aliases in the future. This is not an exhaustive list.
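
For reviewers who want a concrete picture, here is a minimal sketch of how a flat synonym list, the expansion step, and a phrase bonus computed on the original tokens could fit together. It is an illustration only, not the code in this PR: the group contents are copied from the examples in the description, while the `expandSynonymTokens` signature, the `SYNONYM_GROUPS` and `GROUP_BY_TOKEN` names, the `scoreDocument` helper, and the scoring weights are all assumptions.

```typescript
// Hypothetical flat list of synonym groups. The entries mirror the examples
// in the PR description; the real componentSearchSynonyms.ts may differ.
const SYNONYM_GROUPS: readonly (readonly string[])[] = [
  ["gcs", "storage", "bucket"],
  ["train", "fit"],
  ["predict", "infer"],
  ["df", "dataframe", "table"],
];

// Lookup from each token to its group, built once from the flat list.
const GROUP_BY_TOKEN = new Map<string, readonly string[]>();
for (const group of SYNONYM_GROUPS) {
  for (const token of group) {
    GROUP_BY_TOKEN.set(token, group);
  }
}

// Assumed signature: takes already-normalized (lowercase) query tokens and
// returns them plus every member of any group they belong to, de-duplicated,
// with the original tokens first.
function expandSynonymTokens(tokens: readonly string[]): string[] {
  const expanded = new Set<string>(tokens);
  for (const token of tokens) {
    for (const synonym of GROUP_BY_TOKEN.get(token) ?? []) {
      expanded.add(synonym);
    }
  }
  return Array.from(expanded);
}

// Hypothetical scorer (weights are arbitrary): one point per expanded token
// found in the document, plus a bonus when the original, pre-expansion
// tokens appear as a contiguous phrase.
function scoreDocument(queryTokens: readonly string[], documentText: string): number {
  const docTokens = documentText.toLowerCase().split(/\s+/).filter(Boolean);
  const docTokenSet = new Set(docTokens);

  let score = 0;
  for (const token of expandSynonymTokens(queryTokens)) {
    if (docTokenSet.has(token)) {
      score += 1;
    }
  }

  // Pad with spaces so the phrase only matches on whole-token boundaries.
  const paddedDoc = " " + docTokens.join(" ") + " ";
  const paddedPhrase = " " + queryTokens.join(" ") + " ";
  if (queryTokens.length > 1 && paddedDoc.includes(paddedPhrase)) {
    score += 2;
  }
  return score;
}
```

In this sketch, `expandSynonymTokens(["gcs"])` returns `["gcs", "storage", "bucket"]`, and unrecognized tokens pass through unchanged. Because the phrase check runs on the original tokens, a component named "Train Test Split" outscores one that only matches through the `fit` synonym, which is the ranking behavior the test instructions ask reviewers to verify.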