Normalize component search tokens for better matching by Mbeaulne · Pull Request #2424 · TangleML/tangle-ui

Mbeaulne · 2026-06-18T16:56:34Z

Description

Improves the component search index by normalizing indexed and query text beyond simple lowercasing. Specifically:

- Identifier splitting: snake_case, kebab-case, and camelCase component names are split into individual words before indexing, so a query like "train model" matches a component named train-model or train_model, and "load csv file" matches loadCSVFile.
- Lightweight stemming: a stemToken function reduces common English inflections (plurals via -s/-ies, sibilant plurals, gerunds via -ing, past tense via -ed) to their base forms. Both the original token and its stem are stored in the index, so queries like "training", "datasets", or "batch" match components described with "train", "dataset", or "batches".
- Normalized query tokenization: the same normalizeSearchText pipeline is applied to query text before scoring, ensuring query tokens and indexed tokens are in the same form.
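The identifier splitting and shared tokenization described above can be sketched roughly as follows. The names splitIdentifierText and normalizeSearchText come from this PR, but the bodies here are assumptions written for illustration, matchesAllTokens is a hypothetical helper, and stemming is left out to keep the example short.

```typescript
// Hypothetical sketch; the PR's actual splitIdentifierText and
// normalizeSearchText may differ in details.

/** Split snake_case, kebab-case, and camelCase identifiers into words. */
function splitIdentifierText(text: string): string {
  return text
    // Identifier separators become spaces: train_model, train-model.
    .replace(/[_-]+/g, " ")
    // Lowercase letter or digit followed by an uppercase letter: loadCsv -> load Csv.
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // End of an acronym followed by a capitalized word: CSVFile -> CSV File.
    .replace(/([A-Z])([A-Z][a-z])/g, "$1 $2");
}

/** Normalize text into lowercase word tokens (no stemming in this sketch). */
function normalizeSearchText(text: string): string[] {
  return splitIdentifierText(text)
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

/** Hypothetical helper: true when every query token is among the indexed tokens. */
function matchesAllTokens(query: string, indexedText: string): boolean {
  const indexed = normalizeSearchText(indexedText);
  return normalizeSearchText(query).every((token) => indexed.indexOf(token) !== -1);
}
```

Because the query and the indexed name go through the same function, matchesAllTokens("load csv file", "loadCSVFile") compares ["load", "csv", "file"] against itself and succeeds, while a plain lowercase comparison would not.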

Related Issue and Pull requests

Type of Change

Checklist

I have tested this does not break current pipelines / runs functionality
I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

Run the existing test suite (componentSearchIndex.test.ts) to verify the new normalization cases pass:
- Snake/kebab/camelCase names matched by space-separated queries.
- Stemmed query terms (training, datasets, batch) matching indexed descriptions.
Manually search for components using inflected or hyphenated terms in the UI and confirm relevant results surface.

Additional Comments

The stemmer is intentionally minimal — it handles the most common English suffixes without introducing a full NLP dependency. Both the raw token and its stem are stored so that exact matches are never lost.
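A minimal sketch of what such a suffix stemmer could look like, assuming simple rule ordering and a short-token cutoff. stemToken is the name used in this PR, but the rules, thresholds, and the expandToken helper below are hypothetical and not taken from the actual change.

```typescript
// Hypothetical sketch of a minimal suffix stemmer; rule order, length
// thresholds, and helper names are assumptions, not the PR's actual code.

const hasVowel = (s: string): boolean => /[aeiouy]/.test(s);

function stemToken(token: string): string {
  // Leave short tokens alone so words like "is", "bus", "red", or "axis" survive.
  if (token.length <= 4) return token;

  // -ies plural: "queries" -> "query".
  if (/ies$/.test(token)) return token.slice(0, -3) + "y";

  // Sibilant plurals: "batches" -> "batch", "boxes" -> "box", "classes" -> "class".
  if (/(ch|sh|x|z|ss)es$/.test(token)) return token.slice(0, -2);

  // Gerunds: "training" -> "train". Require a vowel in the remainder so
  // "string" is not reduced to "str".
  if (/ing$/.test(token) && hasVowel(token.slice(0, -3))) return token.slice(0, -3);

  // Past tense: "filtered" -> "filter".
  if (/ed$/.test(token) && hasVowel(token.slice(0, -2))) return token.slice(0, -2);

  // Plain -s plural, guarding -ss/-is/-us so "analysis" and "status" are kept.
  if (/s$/.test(token) && !/(ss|is|us)$/.test(token)) return token.slice(0, -1);

  return token;
}

/** Hypothetical helper: the raw token plus its stem (deduplicated) for indexing. */
function expandToken(token: string): string[] {
  const stem = stemToken(token);
  return stem === token ? [token] : [token, stem];
}
```

Storing the output of expandToken for every indexed word means a description containing "batches" is indexed under both "batches" and "batch", so an exact query still hits the raw form even where a rule over-stems.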

github-actions · 2026-06-18T16:56:44Z

🎩 Preview

A preview build has been created at: 06-18-normalize_component_search_tokens_for_better_matching/0494d71

Mbeaulne · 2026-06-18T16:56:50Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

- splitIdentifierText: anchor first capital group to a single char to remove O(n²) regex backtracking on long uppercase runs (behavior-preserving)
- stemToken: guard -is/-us endings so status/analysis/axis aren't over-stemmed

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
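One plausible reading of the splitIdentifierText change, sketched under assumptions: the commit does not show the regexes, so both patterns below are guesses at the before and after forms. The inserted space always lands just before the second group, so only the last capital of the first group matters, and limiting that group to one character avoids rescanning a long uppercase run from every start position.

```typescript
// Both patterns are assumed reconstructions from the commit message,
// not the PR's actual code.
const greedyPattern = /([A-Z]+)([A-Z][a-z])/g; // first group spans the whole run
const anchoredPattern = /([A-Z])([A-Z][a-z])/g; // first group is a single capital

/** Insert a space between an acronym and a following capitalized word. */
function splitAcronymBoundary(text: string, pattern: RegExp): string {
  return text.replace(pattern, "$1 $2");
}

// A run of 2000 capitals with no lowercase letter: the greedy form backtracks
// through the run at every start position, the anchored form does not.
const longUppercaseRun = new Array(2001).join("A");

const samples = ["loadCSVFile", "XMLHttpRequest", "HTTPServerError", "ABC", longUppercaseRun];

// Both patterns should produce identical output on every sample.
const sameOutput = samples.every(
  (sample) =>
    splitAcronymBoundary(sample, greedyPattern) === splitAcronymBoundary(sample, anchoredPattern),
);
```

With either pattern, "loadCSVFile" becomes "loadCSV File"; splitting at the lowercase-to-uppercase boundary would be handled by a separate rule.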

Normalize component search tokens for better matching

e7b76a8

Mbeaulne mentioned this pull request Jun 18, 2026

Expand component search indexing fields #2423

Open

8 tasks

Mbeaulne mentioned this pull request Jun 18, 2026

Add synonym expansion to component lexical search #2425

Open

8 tasks

Mbeaulne marked this pull request as ready for review June 18, 2026 17:06

Mbeaulne requested a review from a team as a code owner June 18, 2026 17:06

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.ts

Mbeaulne wants to merge 2 commits into 06-18-expand_component_search_indexing_fields from 06-18-normalize_component_search_tokens_for_better_matching
