Squish does not work on unicode whitespace #1

@mtarnovan

Example:

> WordSmith.squish(String.duplicate(<<194,160>>, 3))
"   "

This would be easy to fix by adding the u modifier to the regex used in replace here. Without u, [:space:] only matches ASCII whitespace, so the non-breaking space (U+00A0, <<194,160>>) above is left untouched. However, while investigating this I noticed that the implementation uses String.replace for shorter strings (< 150 bytes) and recursive pattern matching for longer ones. This is problematic because the pattern matching does not cover the entire range of characters matched by the POSIX [:space:] character class, so squish behaves inconsistently depending on the input length. An example:

iex(4)> String.duplicate("\f", 100) |> WordSmith.squish
"\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f"
iex(5)> String.duplicate("\f", 200) |> WordSmith.squish
""
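For comparison, here is a minimal regex-only sketch with the u modifier. It applies one rule at every input length. It is not WordSmith's actual implementation, and the module name SquishRegexSketch is hypothetical. In Elixir, u compiles the regex in Unicode mode with Unicode character properties, so [[:space:]] also matches separators such as U+00A0 (the <<194,160>> from the first example), not just ASCII whitespace like \f.

```elixir
defmodule SquishRegexSketch do
  # Hypothetical module: a regex-only squish, not WordSmith's code.
  # The `u` modifier makes [[:space:]] match Unicode separators (e.g. U+00A0)
  # in addition to ASCII whitespace such as "\f".
  def squish(string) when is_binary(string) do
    ~r/[[:space:]]+/u
    # Collapse every run of whitespace into a single ASCII space...
    |> Regex.replace(string, " ")
    # ...after which at most one ASCII space can remain at each end.
    |> String.trim(" ")
  end
end
```

With this sketch, the three non-breaking spaces and the 100 or 200 form feeds all squish to "", and " foo \u00A0\f bar " becomes "foo bar". One trade-off is that with u, Regex expects valid UTF-8 input.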

A possible solution would be to make the pattern-matching clauses also match every character matched by [:space:], including those it matches with the Unicode modifier u (maybe generating the clauses with macros).
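The pattern-matching route could look roughly like the following sketch, in which every clause treats the same set of code points as whitespace. It is hypothetical: SquishMatchSketch, @ws and is_ws are illustrative names, and it uses a module attribute plus a guard instead of macro-generated clauses. The @ws list is an assumption meant to mirror what [[:space:]] matches under u (per the PCRE docs: tab, LF, VT, FF, CR and the Unicode Z separators). It should be checked against the Unicode version of the PCRE build in use.

```elixir
defmodule SquishMatchSketch do
  # Hypothetical module, not WordSmith's code. @ws is an assumed hand-written
  # list meant to mirror [[:space:]] under `u`: tab, LF, VT, FF, CR plus the
  # Unicode separator (Z) characters.
  @ws [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0xA0, 0x1680] ++
        Enum.to_list(0x2000..0x200A) ++
        [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]

  # `in` with a compile-time list expands to a chain of comparisons.
  defguardp is_ws(c) when c in @ws

  def squish(string) when is_binary(string) do
    string |> skip_ws() |> collapse([])
  end

  # Drops a leading run of whitespace code points.
  defp skip_ws(<<c::utf8, rest::binary>>) when is_ws(c), do: skip_ws(rest)
  defp skip_ws(rest), do: rest

  # acc holds output chunks in reverse order (iodata).
  defp collapse(<<>>, acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()

  defp collapse(<<c::utf8, rest::binary>>, acc) when is_ws(c) do
    case skip_ws(rest) do
      # Whitespace at the end of the input is dropped entirely.
      <<>> -> collapse(<<>>, acc)
      # An inner run of whitespace becomes a single ASCII space.
      more -> collapse(more, [" " | acc])
    end
  end

  defp collapse(<<c::utf8, rest::binary>>, acc), do: collapse(rest, [<<c::utf8>> | acc])

  # Bytes that are not valid UTF-8 are copied through unchanged.
  defp collapse(<<byte, rest::binary>>, acc), do: collapse(rest, [byte | acc])
end
```

Keeping the whitespace set in a single attribute is the point of the design. Any clause that consults is_ws agrees on what counts as whitespace, regardless of input length.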

In either case, this caveat might be worth mentioning in the README.

Thanks,
Mihai
