Example:
> WordSmith.squish(String.duplicate(<<194,160>>, 3))
" "
This would be easy to fix by adding the u modifier to the regex used in replace here. However, while investigating this I noticed that the implementation uses String.replace for shorter strings (< 150 bytes) and recursive pattern matching for larger strings. This is problematic because the pattern matching does not cover the entire range of characters matched by the POSIX [:space:] character class. As a result, squish behaves inconsistently depending on the input length. An example:
iex(4)> String.duplicate("\f", 100) |> WordSmith.squish
"\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f\f"
iex(5)> String.duplicate("\f", 200) |> WordSmith.squish
""
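For illustration, here is a minimal sketch of the regex side of the fix. It assumes squish boils down to a regex replace followed by a trim. SquishSketch is a hypothetical module, not WordSmith's actual code. With the u modifier, [[:space:]] is matched against Unicode code points, so the no-break space (bytes <<194, 160>>) should be treated as whitespace regardless of input length.

```elixir
# Hypothetical module for illustration only; not WordSmith's implementation.
defmodule SquishSketch do
  # Collapse every run of whitespace into a single ASCII space, then trim.
  # The `u` modifier makes the regex operate on Unicode code points, so
  # U+00A0 (NO-BREAK SPACE, encoded as <<194, 160>>) is covered as well.
  def squish(str) when is_binary(str) do
    ~r/[[:space:]]+/u
    |> Regex.replace(str, " ")
    |> String.trim()
  end
end
```

Because the same regex is applied to every input, the result no longer depends on whether the string is shorter or longer than 150 bytes.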
A possible solution would be to also match all characters matched by [:space:] (including those matched when using the unicode modifier u) in the pattern matching parts (maybe using macros).
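A rough sketch of the pattern-matching variant, generating one function clause per whitespace code point at compile time with unquote fragments instead of hand-written macros. The module name SquishClauses and the @whitespace list are assumptions made for this example. The list is deliberately a small, non-exhaustive subset; a real fix would need the full set of code points that [:space:] matches under the u modifier.

```elixir
# Hypothetical module for illustration only; not WordSmith's implementation.
defmodule SquishClauses do
  # Assumed, NON-exhaustive subset of whitespace code points.
  @whitespace [?\t, ?\n, ?\v, ?\f, ?\r, ?\s, 0x00A0, 0x2028, 0x2029, 0x3000]

  def squish(str) when is_binary(str) do
    str |> skip([]) |> IO.iodata_to_binary()
  end

  # skip/2 drops a run of whitespace (leading, trailing, or between words).
  for cp <- @whitespace do
    defp skip(<<unquote(cp)::utf8, rest::binary>>, acc), do: skip(rest, acc)
  end

  defp skip(<<>>, acc), do: Enum.reverse(acc)
  # First word: no separator needed.
  defp skip(<<byte, rest::binary>>, []), do: word(rest, [byte])
  # Later words: emit exactly one space before the word.
  defp skip(<<byte, rest::binary>>, acc), do: word(rest, [byte, ?\s | acc])

  # word/2 copies bytes until the next whitespace code point.
  for cp <- @whitespace do
    defp word(<<unquote(cp)::utf8, rest::binary>>, acc), do: skip(rest, acc)
  end

  defp word(<<>>, acc), do: Enum.reverse(acc)
  defp word(<<byte, rest::binary>>, acc), do: word(rest, [byte | acc])
end
```

Keeping the whitespace list in one module attribute means the short-string and long-string paths could be driven from the same definition, which is what would make the two code paths agree.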
In either case, this is a caveat that might be worth mentioning in the readme.
Thanks,
Mihai