regex: accept literal [ in bracket expressions #439
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #439      +/-   ##
==========================================
+ Coverage   82.18%   82.50%   +0.32%
==========================================
  Files          13       13
  Lines        5551     5660     +109
  Branches      312      320       +8
==========================================
+ Hits         4562     4670     +108
- Misses        986      987       +1
  Partials        3        3
a5d4b13 to cdd35dd
Merging this PR will degrade performance by 2.07%
    result
}

fn escape_literal_open_brackets_in_classes(pattern: &str) -> String {
please add a rustdoc comment here :)
POSIX allows '[' to represent itself inside a bracket expression
unless it starts a character class, collating symbol, or equivalence
class construct. The parser was consuming the following character
after a literal '[', which made '[[]' look unterminated, and the Rust
regex backend also rejects unescaped literal '[' in classes.
Reproduce the compatibility gap by comparing these commands:
printf 'x\n' | ./target/debug/sed -E 's/[[]/X/'
printf 'x\n' | gsed -E 's/[[]/X/'
printf 'x\n' | ./target/debug/sed -E 's/[^[]/X/'
printf 'x\n' | gsed -E 's/[^[]/X/'
Before this change, uutils sed rejected the scripts as unterminated or
invalid regexes while GNU sed parsed them cleanly. After this change
the outputs match GNU sed: x for 's/[[]/X/' and X for 's/[^[]/X/'.
Leave the following character available for normal class parsing, then
escape literal '[' characters inside parsed classes before compiling
with the regex backend. Add parser, compiler, and command-level
regressions for the failing sed -E substitution cases.
Observed this while trying to install Python 3.14.5 with Pyenv and
uutils `sed` on the PATH, which failed.
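The escaping step described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the names `escape_class_open_brackets` and `find_construct_end` are hypothetical, and the handling of unterminated `[:`, `[.`, and `[=` constructs (treated here as a literal `[`) is an assumption. Backslashes inside a class are copied verbatim, which the real implementation may handle differently.

```rust
/// Hypothetical helper: find the index of the `]` that closes a `[:`, `[.`
/// or `[=` construct whose body starts at `start`; `kind` is `:`, `.` or `=`.
fn find_construct_end(chars: &[char], start: usize, kind: char) -> Option<usize> {
    (start..chars.len().saturating_sub(1))
        .find(|&j| chars[j] == kind && chars[j + 1] == ']')
        .map(|j| j + 1)
}

/// Hypothetical sketch (not the PR's function): return `pattern` with every
/// literal `[` inside a bracket expression rewritten as `\[`.
fn escape_class_open_brackets(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len() + 4);
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        // Outside a class, a backslash escapes the next character.
        if c == '\\' && i + 1 < chars.len() {
            out.push(c);
            out.push(chars[i + 1]);
            i += 2;
            continue;
        }
        if c != '[' {
            out.push(c);
            i += 1;
            continue;
        }
        // `[` opens a bracket expression; keep an optional `^` negation.
        out.push('[');
        i += 1;
        if i < chars.len() && chars[i] == '^' {
            out.push('^');
            i += 1;
        }
        // A `]` in first position is a literal member, not the terminator.
        if i < chars.len() && chars[i] == ']' {
            out.push(']');
            i += 1;
        }
        while i < chars.len() {
            let c = chars[i];
            if c == ']' {
                out.push(']');
                i += 1;
                break;
            }
            if c == '[' {
                // `[:`, `[.` and `[=` start a POSIX construct; copy it whole.
                let kind = chars.get(i + 1).copied().filter(|k| matches!(k, ':' | '.' | '='));
                if let Some(end) = kind.and_then(|k| find_construct_end(&chars, i + 2, k)) {
                    out.extend(chars[i..=end].iter());
                    i = end + 1;
                    continue;
                }
                // Assumption: anything else is a literal `[` and gets escaped.
                out.push_str("\\[");
                i += 1;
                continue;
            }
            out.push(c);
            i += 1;
        }
    }
    out
}
```

Under these assumptions, `escape_class_open_brackets("[[]")` yields `[\[]` and `escape_class_open_brackets("[^[]")` yields `[^\[]`, while `[[:alpha:]]` and an already escaped `a\[b` pass through unchanged.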
cdd35dd to 8979d08
    result
}

/// Escape literal `[` characters that appear inside a bracket expression.
Well it is a bit long now :/
0d140be to 5dced76