grep: support POSIX equivalence classes #39
lhecker
left a comment
I'm personally willing to accept this as a broken feature until we have an alternative to Oniguruma. The equivalence classes are somewhat niche. Not sure how others view it...
That is a fair concern. This patch is deliberately narrow: it only normalizes single-character [=c=] entries.
Given the feedback on #55, I agree the main tradeoff is not whether this small parser can pass the local case, but whether grep should grow local pre-compile normalizers around Oniguruma at all. If the project direction is to avoid those until there is a broader regex-engine strategy, then this PR is probably not worth merging, even though the current checks were green before.
My view is: this is acceptable only if maintainers want the narrow C-locale compatibility improvement now. If the preferred direction is “leave equivalence classes broken until the engine story changes,” I can close this PR rather than rebase and keep adding review noise.
I'll close this PR for now then. If other maintainers consider this a priority to fix, let's reopen the PR!
Fixes #35.
This adds a small BRE/ERE pre-compile normalization step for POSIX equivalence-class entries inside bracket expressions. Because uu_grep does not currently implement locale collation, the implementation only normalizes single-character [=c=] entries to the equivalent literal character, which matches the C-locale behavior described in the issue. Multi-character collating elements are left unchanged rather than guessing at locale-specific behavior.
The parser handles POSIX character-class and collating-symbol tokens inside bracket expressions so patterns such as [[:alpha:]][[=1=]] keep their existing meaning, and fixed-string mode is intentionally left literal.
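For readers who want a concrete picture of the normalization described above, here is a minimal sketch. It is not the code from this PR: the function names (normalize_equivalence_classes, copy_bracket, find_close, is_bracket_special) are hypothetical. It assumes POSIX bracket semantics, where a backslash is an ordinary character inside a bracket expression. As an extra conservative choice of this sketch, it also leaves [=c=] untouched when c would be special as a bare bracket member (for example - or ]).

```rust
/// Hypothetical sketch (not the PR's code): rewrite single-character
/// `[=c=]` entries inside bracket expressions to the literal `c`.
pub fn normalize_equivalence_classes(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Outside brackets, a backslash escapes the next character,
            // so `\[` must not be mistaken for a bracket expression.
            '\\' => {
                out.push('\\');
                if let Some(&next) = chars.get(i + 1) {
                    out.push(next);
                }
                i += 2;
            }
            '[' => i = copy_bracket(&chars, i, &mut out),
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    out
}

/// Copies one bracket expression starting at `start` (which holds `[`)
/// and returns the index of the first character after it.
fn copy_bracket(chars: &[char], start: usize, out: &mut String) -> usize {
    let mut i = start + 1;
    out.push('[');
    // A leading `^` negates; a `]` right after `[` or `[^` is a literal.
    if chars.get(i) == Some(&'^') {
        out.push('^');
        i += 1;
    }
    if chars.get(i) == Some(&']') {
        out.push(']');
        i += 1;
    }
    while i < chars.len() {
        let c = chars[i];
        if c == ']' {
            out.push(']');
            return i + 1;
        }
        if c == '[' {
            if let Some(&delim) = chars.get(i + 1) {
                if matches!(delim, ':' | '.' | '=') {
                    if let Some(end) = find_close(chars, i + 2, delim) {
                        let body = &chars[i + 2..end];
                        if delim == '=' && body.len() == 1 && !is_bracket_special(body[0]) {
                            // Single-character equivalence class: emit the literal.
                            out.push(body[0]);
                        } else {
                            // Character class, collating symbol, or
                            // multi-character class: copy the token unchanged.
                            out.extend(chars[i..end + 2].iter());
                        }
                        i = end + 2;
                        continue;
                    }
                }
            }
        }
        out.push(c);
        i += 1;
    }
    i // unterminated bracket: everything was copied as-is
}

/// Finds the index of `delim` in the closing `delim]` pair, if any.
fn find_close(chars: &[char], from: usize, delim: char) -> Option<usize> {
    (from..chars.len().saturating_sub(1)).find(|&j| chars[j] == delim && chars[j + 1] == ']')
}

/// Characters whose meaning could change if emitted bare inside brackets.
fn is_bracket_special(c: char) -> bool {
    matches!(c, ']' | '^' | '-' | '[' | '\\')
}
```

Under these assumptions, [[:alpha:]][[=1=]] would become [[:alpha:]][1], while [[=ch=]] passes through unchanged. A caller in fixed-string mode would simply skip the function so the pattern stays literal.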