Add fuzzy/typo-tolerant matching for component name and I/O fields#2427

Open
Mbeaulne wants to merge 1 commit into 06-18-improve_component_search_scoring_relevance from 06-18-add_safe_typo_tolerance_for_names_and_io_fields

Conversation

@Mbeaulne Mbeaulne commented Jun 18, 2026

Collaborator

Description

Adds typo tolerance to lexical search for component names and input/output fields. Query tokens of 4–6 characters may match within an edit distance of 1; tokens of 7 or more characters may match within an edit distance of 2. Fuzzy matches score slightly lower than exact matches via a dedicated FUZZY_MATCH_BONUS_MULTIPLIER. Typo tolerance is intentionally restricted to the name and io fields: description and implementation text are matched exactly only, which avoids noisy results.
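The length thresholds and field restriction described above could be sketched roughly as follows. This is an illustration, not the code in this PR: only FUZZY_MATCH_BONUS_MULTIPLIER is named in the description, so its value here, the `SearchField` type, and the helpers `maxEditDistance`, `isFuzzyEligible`, and `matchScore` are all assumptions.

```typescript
// Assumed value: the PR only says fuzzy matches score "slightly lower" than exact ones.
const FUZZY_MATCH_BONUS_MULTIPLIER = 0.8;

// Hypothetical field names, taken from the fields mentioned in the description.
type SearchField = "name" | "io" | "description" | "implementation";

// Only these fields are allowed to match fuzzily.
const FUZZY_FIELDS: ReadonlySet<SearchField> = new Set<SearchField>(["name", "io"]);

// Maximum allowed edit distance for a query token, based on its length:
// under 4 characters -> 0 (exact only), 4-6 -> 1, 7 or more -> 2.
function maxEditDistance(token: string): number {
  if (token.length >= 7) return 2;
  if (token.length >= 4) return 1;
  return 0;
}

// Fuzzy matching applies only to eligible fields and only when at least one edit is allowed.
function isFuzzyEligible(field: SearchField, token: string): boolean {
  return FUZZY_FIELDS.has(field) && maxEditDistance(token) > 0;
}

// Scale the score down for fuzzy matches so that exact matches rank higher.
function matchScore(baseScore: number, isExact: boolean): number {
  return isExact ? baseScore : baseScore * FUZZY_MATCH_BONUS_MULTIPLIER;
}
```

Under these assumptions, `isFuzzyEligible("name", "filtr")` is true, while `isFuzzyEligible("description", "xgbost")` is false regardless of token length.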

The implementation uses a standard dynamic programming Levenshtein distance algorithm with an early-exit optimisation that abandons the computation once the running row minimum exceeds the allowed distance.
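For reviewers unfamiliar with the technique, a minimal sketch of a bounded Levenshtein distance with a row-minimum early exit follows. The function name `boundedLevenshtein`, its signature, and the convention of returning `maxDistance + 1` for "too far" are assumptions for illustration and may differ from the code in this PR.

```typescript
// Returns the Levenshtein distance between a and b if it is <= maxDistance,
// otherwise returns maxDistance + 1 as a "too far" sentinel (assumed convention).
function boundedLevenshtein(a: string, b: string, maxDistance: number): number {
  // The length difference is a lower bound on the edit distance.
  if (Math.abs(a.length - b.length) > maxDistance) return maxDistance + 1;

  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      const value = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
      curr.push(value);
      if (value < rowMin) rowMin = value;
    }
    // Row minima never decrease, so once the minimum exceeds the budget
    // the final distance must exceed it too.
    if (rowMin > maxDistance) return maxDistance + 1;
    prev = curr;
  }
  return Math.min(prev[b.length], maxDistance + 1);
}
```

With this sketch, `boundedLevenshtein("datset", "dataset", 1)` returns 1, and `boundedLevenshtein("abc", "xyz", 1)` exits after the second row and returns 2. Matching `filtr` against `filter_rows` additionally assumes the name is tokenised into `filter` and `rows` before comparison.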

Related Issue and Pull requests

Type of Change

  • Bug fix
  • New feature
  • Improvement
  • Cleanup/Refactor
  • Breaking change
  • Documentation update

Checklist

  • I have tested this does not break current pipelines / runs functionality
  • I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

  1. Run the existing test suite — two new test cases cover the expected behaviour:
    • Confirm that queries like filtr (for filter_rows) and datset (for dataset) return the correct component.
    • Confirm that typo queries against description or implementation text (e.g. xgbost) return no results.

Additional Comments

Fuzzy matching is skipped entirely when the computed max edit distance is 0 (tokens shorter than 4 characters), keeping short-token searches fast and precise.

github-actions Bot commented Jun 18, 2026

🎩 Preview

A preview build has been created at: 06-18-add_safe_typo_tolerance_for_names_and_io_fields/fc80727

@Mbeaulne Mbeaulne changed the title from "Add safe typo tolerance for names and IO fields." to "Add fuzzy/typo-tolerant matching for component name and I/O fields" Jun 18, 2026
@Mbeaulne Mbeaulne marked this pull request as ready for review June 18, 2026 17:35
@Mbeaulne Mbeaulne requested a review from a team as a code owner June 18, 2026 17:35
Comment thread src/services/componentSearchIndex.ts Outdated
Comment thread src/services/componentSearchIndex.ts Outdated
@Mbeaulne Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from bbd53a7 to 36032c1 Compare June 18, 2026 19:12
@Mbeaulne Mbeaulne force-pushed the 06-18-add_safe_typo_tolerance_for_names_and_io_fields branch 2 times, most recently from 0a7d588 to e379e64 Compare June 18, 2026 20:28
@Mbeaulne Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from 36032c1 to d8e31f8 Compare June 18, 2026 20:28
@Mbeaulne Mbeaulne force-pushed the 06-18-add_safe_typo_tolerance_for_names_and_io_fields branch from e379e64 to 89029f0 Compare June 18, 2026 20:49
@Mbeaulne Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from d8e31f8 to d4d0a60 Compare June 18, 2026 20:49
@Mbeaulne Mbeaulne force-pushed the 06-18-add_safe_typo_tolerance_for_names_and_io_fields branch from 89029f0 to fc80727 Compare June 18, 2026 21:02