suggestion: enable phrase generation by target length instead of word count

# info 

e.g. to match cases like the one described at https://contest-2023.korelogic.com/password-info.html:

>  Tokenize the resulting short lines into shorter lines or phrases, typically by grabbing N characters and then moving forward until a word boundary is reached. N was ~12+ for English, lower for multibyte languages (because each character might be 2-3+ bytes long).
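A rough sketch of what the quoted approach could look like, to make the request concrete. The function name `split_by_target_length` and its parameter `n` are hypothetical and not part of any existing tool. It assumes whitespace is the word boundary and counts characters (Unicode code points) rather than bytes, so a byte-based target would need a different length check.

```python
def split_by_target_length(line: str, n: int = 12) -> list[str]:
    """Split `line` into phrases of at least `n` characters,
    each extended forward to the next whitespace boundary.

    Hypothetical helper: `n` counts characters, not bytes.
    """
    phrases = []
    start = 0
    length = len(line)
    while start < length:
        # Skip whitespace between phrases.
        while start < length and line[start].isspace():
            start += 1
        if start >= length:
            break
        # Grab n characters, then move forward until a word boundary.
        end = min(start + n, length)
        while end < length and not line[end].isspace():
            end += 1
        phrases.append(line[start:end].rstrip())
        start = end
    return phrases


print(split_by_target_length("the quick brown fox jumps over the lazy dog", 12))
# ['the quick brown', 'fox jumps over', 'the lazy dog']
```

Unlike a fixed word count, the phrases here come out with similar character lengths regardless of how long the individual words are.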