Improve component search scoring relevance by Mbeaulne · Pull Request #2426 · TangleML/tangle-ui

Mbeaulne · 2026-06-18T17:09:24Z

Description

Improves the lexical search scoring model with three enhancements:

Prefix match boost: For a partial query term (e.g., classif), components where the term is a prefix of a token now rank higher than components where it appears only as a mid-string substring.
IDF-style rare-token weighting: Query tokens that match fewer components are weighted more heavily than common tokens, which keeps high-frequency terms from dominating scores. For example, searching train xgboost surfaces components mentioning xgboost above generic train matches.
All-query-tokens bonus: When a component matches every token in the query (in any combination of fields), it receives an additional score bonus, so that complete matches tend to rank above partial ones.
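The prefix boost and the all-query-tokens bonus can be illustrated with a minimal sketch. It assumes each entry has already been reduced to one flat list of normalized tokens from all its fields. The names EXACT_WEIGHT, PREFIX_WEIGHT, SUBSTRING_WEIGHT, ALL_TOKENS_BONUS, SearchEntry, matchScore and scoreEntry, and all weight values, are hypothetical; this is not the code in componentSearchIndex.ts. IDF weighting is left out to keep it small.

```typescript
// Hypothetical weights; the PR's real constants are not shown in the description.
const EXACT_WEIGHT = 3;
const PREFIX_WEIGHT = 2;
const SUBSTRING_WEIGHT = 1;
const ALL_TOKENS_BONUS = 5;

// Assumption: all indexed fields are flattened into one normalized token list.
interface SearchEntry {
  id: string;
  tokens: string[];
}

// Best match tier for one query token: exact > prefix > mid-string substring.
function matchScore(queryToken: string, entryTokens: string[]): number {
  let best = 0;
  for (const token of entryTokens) {
    const pos = token.indexOf(queryToken);
    if (pos === -1) continue;
    if (token === queryToken) return EXACT_WEIGHT;
    best = Math.max(best, pos === 0 ? PREFIX_WEIGHT : SUBSTRING_WEIGHT);
  }
  return best;
}

// Sum per-token scores, then add a flat bonus if every query token matched somewhere.
function scoreEntry(queryTokens: string[], entry: SearchEntry): number {
  let score = 0;
  let matched = 0;
  for (const q of queryTokens) {
    const s = matchScore(q, entry.tokens);
    if (s > 0) matched++;
    score += s;
  }
  if (queryTokens.length > 0 && matched === queryTokens.length) {
    score += ALL_TOKENS_BONUS;
  }
  return score;
}

const entries: SearchEntry[] = [
  { id: "declassify_text", tokens: ["declassify", "text"] },
  { id: "classify_rows", tokens: ["classify", "rows"] },
];
const query = ["classif"];
const ranked = entries.slice().sort((a, b) => scoreEntry(query, b) - scoreEntry(query, a));
// classify_rows (prefix: 2 + 5) ranks above declassify_text (substring: 1 + 5)
console.log(ranked.map((e) => e.id));
```

Scoring each query token by its best match tier and then adding a flat completeness bonus is one simple way to get the ordering the description asks for; real weights would need tuning against the test cases.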

The phrase match bonus, previously applied only to the name field, now applies to all search fields, using per-field bonus weights (FIELD_PHRASE_BONUS).
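A per-field phrase bonus might look like the following minimal sketch, assuming field text is already tokenized. Only the name FIELD_PHRASE_BONUS comes from this PR; the field names, weights, and the helpers SearchField, containsPhrase and phraseBonus are hypothetical, and summing the bonus across fields (rather than taking the maximum) is an assumption.

```typescript
// Hypothetical field names and weights; only the FIELD_PHRASE_BONUS name comes from the PR.
type SearchField = "name" | "description" | "inputs";

const FIELD_PHRASE_BONUS: Record<SearchField, number> = {
  name: 8,
  description: 3,
  inputs: 2,
};

// True if `phrase` occurs as a contiguous run of tokens inside `tokens`.
function containsPhrase(tokens: string[], phrase: string[]): boolean {
  if (phrase.length === 0 || phrase.length > tokens.length) return false;
  for (let i = 0; i + phrase.length <= tokens.length; i++) {
    if (phrase.every((p, j) => tokens[i + j] === p)) return true;
  }
  return false;
}

// Add each field's bonus when that field contains the whole query phrase.
// Assumption: bonuses from different fields are summed.
function phraseBonus(
  fieldTokens: Partial<Record<SearchField, string[]>>,
  queryTokens: string[],
): number {
  let bonus = 0;
  for (const field of Object.keys(FIELD_PHRASE_BONUS) as SearchField[]) {
    const tokens = fieldTokens[field];
    if (tokens && containsPhrase(tokens, queryTokens)) {
      bonus += FIELD_PHRASE_BONUS[field];
    }
  }
  return bonus;
}

// "train model" is contiguous in the name but not in the description.
console.log(
  phraseBonus(
    { name: ["train", "model"], description: ["train", "a", "model"] },
    ["train", "model"],
  ),
); // 8
```

In this sketch, giving name a larger weight than the other fields preserves the earlier name-first behavior while still rewarding phrase matches elsewhere.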

The tokenize function has been refactored to extract a reusable uniqueTokens helper. A new requiredQueryTokens function produces stemmed, deduplicated tokens from the raw query without synonym expansion; these tokens are used for the phrase and completeness checks.
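The split between synonym-expanded tokens and required query tokens can be sketched as follows. Only the names tokenize, uniqueTokens and requiredQueryTokens come from the description; rawTokens, stem, SUFFIXES and SYNONYMS are hypothetical placeholders. The stemmer is a crude suffix stripper, the synonym table is invented for illustration (the real synonym groups come from #2425), and treating tokenize as the synonym-expanding path is an assumption.

```typescript
// Hypothetical normalization: lowercase, split on runs of non-alphanumerics.
function rawTokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Placeholder stemmer (assumption): strips one of a few English suffixes
// when enough of the word remains. The PR's real stemming is not shown.
const SUFFIXES = ["ing", "es", "s"];
function stem(token: string): string {
  for (const suffix of SUFFIXES) {
    if (token.length > suffix.length + 2 && token.slice(-suffix.length) === suffix) {
      return token.slice(0, token.length - suffix.length);
    }
  }
  return token;
}

// Deduplicate, keeping the first occurrence of each token in order.
function uniqueTokens(tokens: string[]): string[] {
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

// Hypothetical synonym table for illustration only.
const SYNONYMS: Record<string, string[]> = { train: ["fit"] };

// Assumed synonym-expanding tokenizer: stem, expand synonyms, then deduplicate.
function tokenize(text: string): string[] {
  const expanded: string[] = [];
  for (const t of rawTokens(text).map(stem)) {
    expanded.push(t);
    if (Object.prototype.hasOwnProperty.call(SYNONYMS, t)) {
      for (const s of SYNONYMS[t]) expanded.push(s);
    }
  }
  return uniqueTokens(expanded);
}

// Tokens the user actually typed: stemmed and deduplicated, no synonyms.
function requiredQueryTokens(query: string): string[] {
  return uniqueTokens(rawTokens(query).map(stem));
}

console.log(tokenize("Training models")); // train, fit, model
console.log(requiredQueryTokens("Training models, training")); // train, model
```

Keeping synonyms out of requiredQueryTokens means the phrase and completeness checks are judged against what the user typed: a query for train should not require a component to also mention fit.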

Related Issue and Pull requests

Type of Change

Checklist

I have tested that this does not break current pipeline/run functionality
I have tested the changes on staging

Test Instructions

Three new unit tests cover the added behaviors:

Search classif: verify that classify_rows ranks above a component where classif appears only as a non-prefix substring.
Search train xgboost: verify that the component containing the rare token xgboost ranks first.
Search train model: verify that a component matching both tokens (across any fields) ranks above one that matches only train.

Run the test suite with:

npx jest componentSearchIndex

Additional Comments

Token weights are computed per query using a smoothed inverse document frequency, 1 + log((N+1) / (df+1)), where N is the number of entries in the index and df is the number of entries containing the token.
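The smoothed IDF formula can be sketched in TypeScript as follows. The natural logarithm, counting df by exact token membership, and summing the weights of matched tokens into a score are all assumptions; IndexEntry, idfWeight, queryTokenWeights and weightedScore are hypothetical names, not the PR's code.

```typescript
// Hypothetical entry shape: all indexed fields flattened into normalized tokens.
interface IndexEntry {
  id: string;
  tokens: string[];
}

// Smoothed IDF from the PR description: 1 + log((N + 1) / (df + 1)).
// Assumption: natural log (Math.log); the description does not name a base.
function idfWeight(totalEntries: number, docFrequency: number): number {
  return 1 + Math.log((totalEntries + 1) / (docFrequency + 1));
}

// Weight each query token against the whole index.
// Assumption: df counts entries that contain the token exactly.
function queryTokenWeights(
  queryTokens: string[],
  index: IndexEntry[],
): Record<string, number> {
  const weights: Record<string, number> = {};
  for (const q of queryTokens) {
    let df = 0;
    for (const entry of index) {
      if (entry.tokens.indexOf(q) !== -1) df++;
    }
    weights[q] = idfWeight(index.length, df);
  }
  return weights;
}

// Assumption: an entry's score is the sum of the weights of the query tokens it contains.
function weightedScore(entry: IndexEntry, weights: Record<string, number>): number {
  let score = 0;
  for (const token of Object.keys(weights)) {
    if (entry.tokens.indexOf(token) !== -1) score += weights[token];
  }
  return score;
}

const index: IndexEntry[] = [
  { id: "train_model", tokens: ["train", "model"] },
  { id: "train_xgboost", tokens: ["train", "xgboost"] },
  { id: "train_test_split", tokens: ["train", "test", "split"] },
  { id: "read_csv", tokens: ["read", "csv"] },
];

const weights = queryTokenWeights(["train", "xgboost"], index);
// train: df = 3, weight ≈ 1.22; xgboost: df = 1, weight ≈ 1.92
console.log(weights);
```

With N = 4, train (df = 3) gets 1 + ln(5/4) ≈ 1.22 while xgboost (df = 1) gets 1 + ln(5/2) ≈ 1.92. Because of the +1 smoothing, a token present in every entry still has weight 1 rather than 0, so common tokens are down-weighted but never ignored.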

github-actions · 2026-06-18T17:09:40Z

🎩 Preview

A preview build has been created at: 06-18-improve_component_search_scoring_relevance/d4d0a60

Mbeaulne · 2026-06-18T17:09:45Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

This was referenced Jun 18, 2026

Expand component search indexing fields #2423 (Open)
Normalize component search tokens for better matching #2424 (Open)

Mbeaulne mentioned this pull request Jun 18, 2026

Add synonym expansion to component lexical search #2425 (Open, 8 tasks)

Mbeaulne marked this pull request as ready for review June 18, 2026 17:10

Mbeaulne requested a review from a team as a code owner June 18, 2026 17:10

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.ts (outdated)

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.ts

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.test.ts (outdated)

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.test.ts (outdated)

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from 2655160 to dce82a1 June 18, 2026 19:12

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from bbd53a7 to 36032c1 June 18, 2026 19:12

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from dce82a1 to f5a29c0 June 18, 2026 20:28

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from 36032c1 to d8e31f8 June 18, 2026 20:28

Improve component search scoring relevance (d4d0a60)

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from d8e31f8 to d4d0a60 June 18, 2026 20:49

Mbeaulne wants to merge 1 commit into 06-18-add_synonym_groups from 06-18-improve_component_search_scoring_relevance

1 participant
