suggestion: enable phrase by target length instead of wordcount #7

@bigpick

Description

info

e.g. to match cases like the one described at https://contest-2023.korelogic.com/password-info.html:

Tokenize the resulting short lines into shorter lines or phrases, typically by grabbing N characters and then moving forward until a word boundary is reached. N was ~12+ for English, lower for multibyte languages (because each character might be 2-3+ bytes long).
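
A rough sketch of what the quoted approach could look like, to make the request concrete. This is only an illustration under assumptions: the function name `split_by_target_length` and the `min_chars` parameter are hypothetical and are not part of this project's existing code. It measures length in characters (not bytes) and collapses runs of whitespace to single spaces.

```python
def split_by_target_length(line: str, min_chars: int = 12) -> list[str]:
    """Split a line into phrases of at least `min_chars` characters.

    Each phrase grows word by word until it reaches `min_chars`, which
    amounts to grabbing `min_chars` characters and then moving forward
    to the next word boundary. Hypothetical helper, not the project's API.
    """
    phrases: list[str] = []
    current: list[str] = []
    length = 0

    for word in line.split():
        # Count one separating space for every word after the first.
        length += len(word) + (1 if current else 0)
        current.append(word)
        if length >= min_chars:
            phrases.append(" ".join(current))
            current = []
            length = 0

    # Keep any leftover words that never reached the target length.
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    for phrase in split_by_target_length("the quick brown fox jumps over the lazy dog"):
        print(phrase)
```

For multibyte languages a lower `min_chars` would presumably be passed in, as the quote suggests; whether the trailing short remainder should be kept or dropped is an open design choice.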
