Support POSIX bracket classes in regex char classes #1552

Merged: matz merged 1 commit into matz:master from ryanseys:rs-regex-posix-class on Jun 24, 2026

@ryanseys

Contributor

POSIX bracket classes inside a character class ([[:alpha:]], [[:digit:]], and the rest of the family) previously matched nothing. The bracket reader did not recognize the [:name:] syntax, so [:alpha:] was parsed as a char class containing the literal characters :, a, l, p, h rather than as the class of alphabetic characters.

The bracket-class reader in lib/regexp/re_compile.c now recognizes [:name:] and expands the named class to its ASCII ranges using the same range/bit helpers that back \d and \w (no parallel mechanism). Supported names: alpha, digit, alnum, space, upper, lower, punct, blank, cntrl, xdigit, print, graph, word. Enclosing negation such as [^[:alpha:]] is handled by the existing RE_NCLASS emit, identical to how the engine already negates classes. The classes use ASCII semantics, which match CRuby for ASCII input; all 13 names were verified byte-for-byte against ruby. Unicode \p{L} properties remain out of scope.
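
As a rough illustration of what "expands the named class to its ASCII ranges" means, here is a minimal Ruby sketch of a name-to-ranges table. The constant `POSIX_ASCII` and both helper methods are hypothetical names introduced for this example, and the ranges are the conventional C-locale ASCII definitions; they are not copied from re_compile.c, whose contents this PR page does not show.

```ruby
# Hypothetical table: POSIX class name => ASCII byte ranges.
# Conventional C-locale definitions, assumed rather than taken from re_compile.c.
POSIX_ASCII = {
  "alpha"  => [0x41..0x5A, 0x61..0x7A],
  "digit"  => [0x30..0x39],
  "alnum"  => [0x30..0x39, 0x41..0x5A, 0x61..0x7A],
  "space"  => [0x09..0x0D, 0x20..0x20],
  "upper"  => [0x41..0x5A],
  "lower"  => [0x61..0x7A],
  "punct"  => [0x21..0x2F, 0x3A..0x40, 0x5B..0x60, 0x7B..0x7E],
  "blank"  => [0x09..0x09, 0x20..0x20],
  "cntrl"  => [0x00..0x1F, 0x7F..0x7F],
  "xdigit" => [0x30..0x39, 0x41..0x46, 0x61..0x66],
  "print"  => [0x20..0x7E],
  "graph"  => [0x21..0x7E],
  "word"   => [0x30..0x39, 0x41..0x5A, 0x5F..0x5F, 0x61..0x7A],
}.freeze

# True when the byte falls inside one of the ranges for the named class.
def posix_member?(name, byte)
  ranges = POSIX_ASCII.fetch(name) { raise ArgumentError, "unknown POSIX class: #{name}" }
  ranges.any? { |r| r.cover?(byte) }
end

# Compare the table with the host Ruby's regex engine over all 7-bit bytes.
def agrees_with_host?(name)
  re = Regexp.new("[[:#{name}:]]")
  (0..127).all? { |b| posix_member?(name, b) == re.match?(b.chr) }
end

p posix_member?("word", "_".ord)   # => true
p posix_member?("alpha", "1".ord)  # => false
```

Calling `agrees_with_host?("alpha")` under CRuby is one way to spot-check a row of the table, in the spirit of the byte-for-byte comparison described above.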

A new test/regex_posix_class.rb (.expected regenerated from ruby) covers alpha/digit/alnum/space/upper/lower, a negated class, combined literal+POSIX brackets, both scan and a plain match, and a non-constant subject routed through a method parameter so the runtime engine is exercised rather than constant-folded. The full suite is green under -Werror.
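
The shape of such a test might look like the sketch below. The method names and subject strings are made up for illustration (the actual contents of test/regex_posix_class.rb are not shown here); the point is that each subject arrives through a method parameter, so the pattern has to run in the runtime engine.

```ruby
# The subject is a method parameter, so the match cannot be folded
# against a constant string at compile time.
def words(subject)
  subject.scan(/[[:alpha:]]+/)
end

# Enclosing negation: anything that is not an ASCII letter.
def non_letters(subject)
  subject.scan(/[^[:alpha:]]/)
end

# A POSIX class combined with literal characters in one bracket.
def version_parts(subject)
  subject.scan(/[[:digit:]_.]+/)
end

# A plain match rather than a scan.
def first_capitalized(subject)
  m = /[[:upper:]][[:lower:]]+/.match(subject)
  m && m[0]
end

p words("ab12 cd")               # => ["ab", "cd"]
p non_letters("a1_b2")           # => ["1", "_", "2"]
p version_parts("v1.2_3 ok")     # => ["1.2_3"]
p first_capitalized("say Hello") # => "Hello"
```

The expected values in the comments are what CRuby produces for these ASCII inputs.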

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements support for POSIX bracket classes (e.g., [:alpha:], [:digit:]) within character classes in the regex engine, accompanied by new test cases. The review feedback identifies two key improvements: raising a compile error for invalid POSIX class names instead of falling back to literal parsing to avoid silent failures, and adding support for the [:ascii:] class to improve compatibility with Ruby's regex engine.

Comment thread lib/regexp/re_compile.c Outdated
Comment thread lib/regexp/re_compile.c
Recognize [:name:] POSIX classes inside bracket expressions and expand
them to their ASCII ranges via the existing class-add helpers. Supports
alpha, digit, alnum, space, upper, lower, punct, blank, cntrl, xdigit,
print, graph, word, and ascii. The in-class negated form [:^name:] adds
the ASCII complement plus utf8_any, mirroring the \D/\W/\S shorthands.
Enclosing negation ([^[:alpha:]]) is applied by the existing RE_NCLASS
emit. An unrecognized POSIX class name raises a RegexpError ("invalid
POSIX bracket type"), matching CRuby, instead of silently matching
nothing. Previously [:alpha:] was parsed as the literal set {:,a,l,p,h},
so the class never matched alphabetic input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
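
The three behaviours added in this revision (the in-class negated form, the ascii class, and the error for an unknown name) can be sketched in Ruby as follows. The helper names are hypothetical, and the expected values are CRuby's results for these inputs; only ASCII letters are used where negation is involved, since that is the range the commit message says matches CRuby.

```ruby
# In-class negation: [:^alpha:] inside the bracket, not ^ in front of it.
def outside_alpha(subject)
  subject.scan(/[[:^alpha:]]/)
end

# The ascii class keeps only 7-bit characters.
def ascii_only(subject)
  subject.scan(/[[:ascii:]]/)
end

# Compile a pattern from a string and return the error message, or nil on success.
# Regexp.new is used so that a bad pattern raises at run time.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

p outside_alpha("a1-b")         # => ["1", "-"]
p ascii_only("a\u00e9")         # => ["a"]
p compile_error("[[:alpha:]]")  # => nil
p compile_error("[[:bogus:]]")  # an "invalid POSIX bracket type" message under CRuby
```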
@ryanseys ryanseys force-pushed the rs-regex-posix-class branch from 4c7468a to e0af587 Compare June 24, 2026 05:34
@matz matz merged commit b8040c2 into matz:master Jun 24, 2026
3 checks passed
@ryanseys ryanseys deleted the rs-regex-posix-class branch June 24, 2026 14:53