Unicode String version 2.1.0

kipcole9 released this 30 Apr 21:56

Bug Fixes

  • Improve line break segmentation conformance and compatibility with ICU.

Enhancements

  • Replace the regex-based segmentation engine with a single-pass DFA evaluator. Sentence breaking on a 4 KB unbroken sentence drops from ~9,200 ms to ~11 ms (~840× faster), and word breaking on a 4 KB sentence drops from ~7,000 ms to ~12 ms (~580× faster). Run time now scales linearly with input length instead of O(N²).
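
To illustrate why a single-pass DFA scales linearly, here is a minimal Python sketch. It is not the library's implementation. The character classes, the state names, and the `split_sentences` function are all hypothetical, and the rule (break after a terminator followed by whitespace) is a toy stand-in for the real Unicode sentence break rules. The point is only that each character is classified once and drives exactly one table lookup, with no backtracking.

```python
# Toy character classes (assumption: real segmentation uses Unicode break properties).
def char_class(ch):
    if ch in ".!?":
        return "term"
    if ch.isspace():
        return "space"
    return "other"

# Hypothetical transition table:
# (state, class) -> (next_state, break_before_this_char)
TRANSITIONS = {
    ("text", "other"): ("text", False),
    ("text", "space"): ("text", False),
    ("text", "term"): ("term", False),
    ("term", "term"): ("term", False),
    ("term", "space"): ("gap", False),
    ("term", "other"): ("text", False),  # e.g. "3.14" does not break
    ("gap", "space"): ("gap", False),
    ("gap", "term"): ("term", True),
    ("gap", "other"): ("text", True),
}

def split_sentences(text):
    """Split text into segments in one left-to-right pass."""
    segments = []
    state = "text"
    start = 0
    for i, ch in enumerate(text):
        # One lookup per character: total work is linear in len(text).
        state, break_here = TRANSITIONS[(state, char_class(ch))]
        if break_here:
            segments.append(text[start:i])
            start = i
    if start < len(text):
        segments.append(text[start:])
    return segments

print(split_sentences("Hi there. How are you? Fine."))
```

Because the evaluator never revisits a character, doubling the input roughly doubles the work. A matcher that rescans the remaining input at each candidate boundary can instead degrade to quadratic time on long unbroken text.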