fix(search): filter stopwords from FTS5 queries and proximity reranking #271

Merged
mksglu merged 2 commits into mksglu:next from sebastianbreguel:fix/stopword-filter-search-queries
Apr 14, 2026

@sebastianbreguel
Contributor

Summary

  • Domain-specific stopwords (update, test, fix, run, using, etc.) were defined in STOPWORDS but only used for vocabulary extraction — search queries passed them through to FTS5 unfiltered, diluting BM25 ranking
  • Now filtered in sanitizeQuery(), sanitizeTrigramQuery(), and #applyProximityReranking() — meaningful terms drive ranking while stopwords are dropped
  • Falls back to unfiltered terms when ALL query words are stopwords (no empty results)
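
A minimal sketch of the filter-with-fallback behavior described above, not the PR's actual code. The `STOPWORDS` contents here are a small illustrative subset, and `tokenize`, `filterStopwords`, and `buildMatchQuery` are hypothetical helper names; joining quoted terms with `OR` is also an assumption about how the query is assembled.

```typescript
// Illustrative subset only; the real STOPWORDS list in the project is larger.
const STOPWORDS: ReadonlySet<string> = new Set(["update", "test", "fix", "run", "using"]);

// Split a raw query into lowercase word tokens.
function tokenize(query: string): string[] {
  return query.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Drop stopwords, but fall back to the unfiltered terms when every word is a stopword,
// so the query never becomes empty just because of filtering.
function filterStopwords(terms: string[]): string[] {
  const meaningful = terms.filter((t) => !STOPWORDS.has(t));
  return meaningful.length > 0 ? meaningful : terms;
}

// Build an FTS5 MATCH expression: each term is double-quoted and joined with OR
// (an assumed joining strategy, chosen for the sketch).
function buildMatchQuery(query: string): string {
  return filterStopwords(tokenize(query))
    .map((t) => `"${t}"`)
    .join(" OR ");
}
```

With this sketch, `buildMatchQuery("fix database connection")` keeps only `database` and `connection`, while `buildMatchQuery("fix test")` keeps both words because filtering would otherwise leave nothing to search for.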

Example

Query "fix database connection" — before, "fix" got equal BM25 weight despite appearing in most chunks. Now only "database" and "connection" drive the ranking.

Test plan

  • Stopwords filtered from porter search — meaningful terms drive ranking
  • All-stopword query falls back to unfiltered (no empty results)
  • Stopwords filtered from trigram search
  • Proximity reranking excludes stopwords from boost calculation
  • Full test suite — 0 regressions (5 pre-existing failures in hooks/integration unrelated)
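
The proximity case can be sketched in the same spirit. This is an assumed scoring scheme (number of distinct terms divided by the smallest token window that contains all of them), not the logic of `#applyProximityReranking()` itself; `minSpan` and `proximityBoost` are hypothetical names and `STOPWORDS` is again an illustrative subset.

```typescript
// Illustrative subset only; the real STOPWORDS list in the project is larger.
const STOPWORDS: ReadonlySet<string> = new Set(["update", "test", "fix", "run", "using"]);

// Width (in tokens) of the smallest window in `text` containing every wanted term,
// or null if at least one term never occurs.
function minSpan(text: string, wanted: ReadonlySet<string>): number | null {
  if (wanted.size === 0) return null;
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
  const counts = new Map<string, number>();
  let best: number | null = null;
  let left = 0;
  for (let right = 0; right < tokens.length; right++) {
    const tok = tokens[right]!;
    if (wanted.has(tok)) counts.set(tok, (counts.get(tok) ?? 0) + 1);
    // Shrink from the left while the window still covers every wanted term.
    while (counts.size === wanted.size) {
      const width = right - left + 1;
      if (best === null || width < best) best = width;
      const leftTok = tokens[left]!;
      if (wanted.has(leftTok)) {
        const remaining = counts.get(leftTok)! - 1;
        if (remaining === 0) counts.delete(leftTok);
        else counts.set(leftTok, remaining);
      }
      left++;
    }
  }
  return best;
}

// Boost in (0, 1]: 1 when the meaningful terms are adjacent, 0 when any is missing.
// Stopwords are excluded unless the query consists only of stopwords.
function proximityBoost(text: string, queryTerms: string[]): number {
  const meaningful = queryTerms.filter((t) => !STOPWORDS.has(t));
  const wanted = new Set(meaningful.length > 0 ? meaningful : queryTerms);
  const span = minSpan(text, wanted);
  return span === null ? 0 : wanted.size / span;
}
```

Under this scheme, a chunk such as "fix the database connection pool" gets the full boost for the query terms `fix`, `database`, `connection`, because only the distance between `database` and `connection` is measured; the position of `fix` no longer widens the window.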

github-actions bot and others added 2 commits April 14, 2026 00:50
Domain-specific stopwords (update, test, fix, run, using, etc.) were defined
in STOPWORDS but only used for vocabulary extraction. Search queries passed
them through to FTS5, diluting BM25 ranking — e.g. "fix database connection"
gave equal weight to "fix" which appears everywhere.

Now filtered in sanitizeQuery, sanitizeTrigramQuery, and proximity reranking.
Falls back to unfiltered terms when ALL query words are stopwords.
@mksglu mksglu changed the base branch from main to next April 14, 2026 01:05
@mksglu mksglu merged commit 707c4ae into mksglu:next Apr 14, 2026
6 of 8 checks passed
@mksglu
Owner

mksglu commented Apr 14, 2026

@sebastianbreguel I hope you also tested this manually. :)
