info
e.g. to match cases like those described at https://contest-2023.korelogic.com/password-info.html
Tokenize the resulting short lines into still shorter lines or phrases, typically by taking N characters and then advancing until a word boundary is reached. N was roughly 12 or more for English and lower for multibyte languages (because each character might be 2-3+ bytes long).
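A minimal sketch of that chunking step, under stated assumptions: the function name `chunk_phrases` is hypothetical, a "word boundary" is taken to mean the next whitespace character, N is counted in Python string characters (code points) rather than bytes, and chunks do not overlap. The tooling actually used may differ on any of these points.

```python
def chunk_phrases(line: str, n: int = 12) -> list:
    """Split `line` into phrases of at least `n` characters,
    each extended forward to the next whitespace (assumed word boundary)."""
    phrases = []
    pos = 0
    length = len(line)
    while pos < length:
        # Skip leading whitespace so each phrase starts on a word.
        while pos < length and line[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Grab n characters, then move forward until whitespace or end of line.
        end = min(pos + n, length)
        while end < length and not line[end].isspace():
            end += 1
        phrases.append(line[pos:end].rstrip())
        pos = end
    return phrases


if __name__ == "__main__":
    for phrase in chunk_phrases("the quick brown fox jumps over the lazy dog", n=12):
        print(phrase)
```

For languages whose characters take several bytes in UTF-8, a smaller `n` would be passed. Languages written without spaces between words would need a different boundary test than `str.isspace`.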