Description
Is it possible to customize the stop words used, so that I can provide my own list instead of the default one, or disable stop words entirely?
Context: I'm setting up Haystack for the search in https://rocketvalidator.com/html-validation - currently it just uses a simple search by substring but I want to use Haystack instead. So far it's going great!
During the integration, I found that the results for many searches were not as expected. This appears to be because most of the titles include characters like double quotes.
So when I searched for something containing double quotes, these guides would appear first: they scored higher because they contain many double quotes.
I guess this could be solved by adding the double quotes (and other characters like parentheses, brackets, < and >, etc.) to the stop words. My workaround was to clean up the strings, both during the load and the search:
defp cleanup(str) do
  str
  |> String.replace(["“", "”", "<", ">", "(", ")", ".", ",", ";", ":"], "")
  |> String.trim()
end

After that, I found that a search for "must not appear" (https://rocketvalidator.com/html-validation?search=must+not+appear) returned no results using Haystack, because these are all stop words.
Finally, for non-English content it would be great to be able to customize the stop words.
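To illustrate the behaviour I'm asking for, here is a minimal sketch of a tokenizer with a configurable stop-word list. It does not use Haystack's API: the module `StopWordsSketch`, the function `tokenize/2`, and the short default word list are all hypothetical and only there to show the idea.

```elixir
defmodule StopWordsSketch do
  # Hypothetical helper, NOT part of Haystack.
  # Illustrative default list only; the real default list may differ.
  @default_stop_words MapSet.new(~w(a an and appear must not the))

  # Lowercases the string, splits on whitespace, and drops any token
  # found in `stop_words` (a MapSet). Pass an empty MapSet to disable
  # stop-word filtering altogether.
  def tokenize(str, stop_words \\ @default_stop_words) do
    str
    |> String.downcase()
    |> String.split(~r/\s+/, trim: true)
    |> Enum.reject(&MapSet.member?(stop_words, &1))
  end
end

# With the illustrative default list, every token is filtered out,
# which matches the empty result I saw for this query.
StopWordsSketch.tokenize("must not appear")
# => []

# With stop words disabled, the query keeps all of its tokens.
StopWordsSketch.tokenize("must not appear", MapSet.new())
# => ["must", "not", "appear"]
```

Being able to pass something like the second argument (a custom list for another language, or an empty one) is what I'm looking for.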