grep: support POSIX equivalence classes by wondr-wclabs · Pull Request #39 · uutils/grep

wondr-wclabs · 2026-06-04T22:21:48Z

Fixes #35.

This adds a small BRE/ERE pre-compile normalization step for POSIX equivalence-class entries inside bracket expressions. Because uu_grep does not currently implement locale collation, the implementation only normalizes single-character [=c=] entries to the equivalent literal character, which matches the C-locale behavior described in the issue. Multi-character collating elements are left unchanged rather than guessing at locale-specific behavior.

The parser handles POSIX character-class and collating-symbol tokens inside bracket expressions so patterns such as [[:alpha:]][[=1=]] keep their existing meaning, and fixed-string mode is intentionally left literal.
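To make the described normalization concrete, here is a minimal sketch of such a pre-compile pass. It is not the code from this PR: the function names `normalize_equivalence_classes` and `copy_bracket` are hypothetical, and the sketch assumes, as a simplification, that only alphanumeric single-character `[=c=]` entries are rewritten, so that characters with special meaning inside a bracket expression are never emitted unescaped.

```rust
/// Hypothetical helper (not the PR's code): rewrite single-character
/// `[=c=]` entries inside bracket expressions to the literal `c`.
fn normalize_equivalence_classes(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Outside a bracket expression, a backslash escapes the next char.
            '\\' => {
                out.push('\\');
                if i + 1 < chars.len() {
                    out.push(chars[i + 1]);
                }
                i += 2;
            }
            '[' => i = copy_bracket(&chars, i, &mut out),
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    out
}

/// Copy one bracket expression starting at `start` (which must be `[`),
/// returning the index just past it.
fn copy_bracket(chars: &[char], start: usize, out: &mut String) -> usize {
    out.push('[');
    let mut j = start + 1;
    // A leading `^` negates; a `]` in first position is a literal.
    if j < chars.len() && chars[j] == '^' {
        out.push('^');
        j += 1;
    }
    if j < chars.len() && chars[j] == ']' {
        out.push(']');
        j += 1;
    }
    while j < chars.len() {
        let c = chars[j];
        if c == ']' {
            out.push(']');
            return j + 1;
        }
        if c == '[' && j + 1 < chars.len() && matches!(chars[j + 1], ':' | '.' | '=') {
            let kind = chars[j + 1];
            // Look for the matching `:]`, `.]`, or `=]` terminator.
            let mut end = None;
            let mut k = j + 2;
            while k + 1 < chars.len() {
                if chars[k] == kind && chars[k + 1] == ']' {
                    end = Some(k);
                    break;
                }
                k += 1;
            }
            if let Some(end) = end {
                let body = &chars[j + 2..end];
                if kind == '=' && body.len() == 1 && body[0].is_alphanumeric() {
                    // Single-character equivalence class: C-locale literal.
                    out.push(body[0]);
                } else {
                    // Character classes, collating symbols, and
                    // multi-character entries are copied unchanged.
                    out.extend(chars[j..end + 2].iter());
                }
                j = end + 2;
                continue;
            }
        }
        out.push(c);
        j += 1;
    }
    j // Unterminated bracket expression: everything was copied as-is.
}
```

Under these assumptions, `normalize_equivalence_classes("[[:alpha:]][[=1=]]")` yields `[[:alpha:]][1]`, while `[[=ch=]]` and a top-level `[=b=]` (an ordinary bracket expression) pass through unchanged. A caller would skip this pass entirely in fixed-string mode.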

codspeed-hq · 2026-06-05T05:16:13Z

Merging this PR will not alter performance

✅ 10 untouched benchmarks
⏩ 17 skipped benchmarks¹

Comparing wondr-wclabs:codex/posix-equivalence-classes (c1d4187) with main (d28bf76)

¹ 17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

lhecker

I'm personally willing to accept this as a broken feature until we have an alternative to Oniguruma. The equivalence classes are somewhat niche. Not sure how others view it...

wondr-wclabs · 2026-06-05T16:18:22Z

That is a fair concern. This patch is deliberately narrow: it only normalizes single-character [=c=] entries to the C-locale literal form, and it does not attempt locale collation or multi-character collating elements. That means it fixes the concrete C-locale behavior from #35, but it still leaves the larger POSIX equivalence-class semantics unimplemented.

Given the feedback on #55, I agree the main tradeoff is not whether this small parser can pass the local case, but whether grep should grow local pre-compile normalizers around Oniguruma at all. If the project direction is to avoid those until there is a broader regex-engine strategy, then this PR is probably not worth merging even though the current checks were green before main moved.

My view is: this is acceptable only if maintainers want the narrow C-locale compatibility improvement now. If the preferred direction is “leave equivalence classes broken until the engine story changes,” I can close this PR rather than rebase and keep adding review noise.

lhecker · 2026-06-05T16:34:11Z

> the engine story

I'll close this PR for now then. If other maintainers consider this a priority to fix, let's reopen the PR!

grep: support POSIX equivalence classes

c1d4187

wondr-wclabs force-pushed the codex/posix-equivalence-classes branch from 910bbd7 to c1d4187 · June 5, 2026 08:42

lhecker requested changes Jun 5, 2026

lhecker closed this Jun 5, 2026

wondr-wclabs wants to merge 1 commit into uutils:main from wondr-wclabs:codex/posix-equivalence-classes
