
regex: accept literal [ in bracket expressions #439

Open
kevinburke wants to merge 2 commits into uutils:main from kevinburke:fix-literal-open-bracket-class

@kevinburke

POSIX allows '[' to represent itself inside a bracket expression unless it starts a character class ([:name:]), collating symbol ([.x.]), or equivalence class ([=x=]). The parser was consuming the character after a literal '[', which made '[[]' look unterminated. Separately, the Rust regex backend rejects an unescaped literal '[' inside a class, because it reads '[' there as the start of a nested class.
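The POSIX rule above can be illustrated with a small bracket-expression scanner. This is a minimal sketch, not the uutils parser: `find_bracket_end` and `find_delim_close` are hypothetical names, the sketch works on raw bytes, and it ignores backslash handling. The important part is the arm for a literal `[`: it advances past the `[` only, so a following `]` can still close the expression.

```rust
/// Hypothetical helper (not the uutils parser): given the index `open` of the
/// `[` that starts a bracket expression, return the index of its closing `]`,
/// or `None` if the expression is unterminated.
fn find_bracket_end(pat: &[u8], open: usize) -> Option<usize> {
    let mut i = open + 1;
    // A leading `^` negates the set.
    if pat.get(i) == Some(&b'^') {
        i += 1;
    }
    // A `]` immediately after `[` or `[^` is a literal member.
    if pat.get(i) == Some(&b']') {
        i += 1;
    }
    while i < pat.len() {
        match pat[i] {
            b']' => return Some(i),
            b'[' => match pat.get(i + 1) {
                // `[:`, `[.`, or `[=` opens a construct ending at `:]`, `.]`, or `=]`.
                Some(&d) if d == b':' || d == b'.' || d == b'=' => {
                    let close = find_delim_close(pat, i + 2, d)?;
                    i = close + 2;
                }
                // Any other `[` is literal. Advance past it only, so a following
                // `]` still terminates the expression (e.g. `[[]`).
                _ => i += 1,
            },
            _ => i += 1,
        }
    }
    None
}

/// Index of the first `d` immediately followed by `]`, searching from `start`.
fn find_delim_close(pat: &[u8], start: usize, d: u8) -> Option<usize> {
    (start..pat.len().saturating_sub(1)).find(|&j| pat[j] == d && pat[j + 1] == b']')
}
```

With this rule `[[]` closes at index 2 and `[^[]` at index 3, while `[[:alpha:]]` is still treated as one class containing a named character class.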

Reproduce the compatibility gap by comparing these commands:

    printf 'x\n' | ./target/debug/sed -E 's/[[]/X/'
    printf 'x\n' | gsed -E 's/[[]/X/'
    printf 'x\n' | ./target/debug/sed -E 's/[^[]/X/'
    printf 'x\n' | gsed -E 's/[^[]/X/'

Before this change, uutils sed rejected both scripts as unterminated or invalid regexes, while GNU sed parsed them cleanly. After this change the output matches GNU sed: 's/[[]/X/' prints x (the input contains no '['), and 's/[^[]/X/' prints X.
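The expected outputs follow from treating `[` as an ordinary member of the set. As a sanity check, here is a deliberately tiny matcher for the two scripts above. It is a hypothetical helper (`subst_first_bracket`) that only supports bracket expressions made of literal characters with an optional leading `^`; it is not how either sed implements substitution.

```rust
/// Hypothetical sketch: apply `s/<bracket>/<repl>/` (first match only), where
/// `<bracket>` is a simple bracket expression of literal characters such as
/// `[[]` or `[^[]`. Ranges and `[:class:]` constructs are not handled.
fn subst_first_bracket(line: &str, bracket: &str, repl: &str) -> String {
    let inner = bracket
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .expect("expected a bracket expression");
    let (negated, members) = match inner.strip_prefix('^') {
        Some(rest) => (true, rest),
        None => (false, inner),
    };
    // A character matches when its set membership differs from `negated`.
    match line.char_indices().find(|&(_, c)| members.contains(c) != negated) {
        Some((i, c)) => format!("{}{}{}", &line[..i], repl, &line[i + c.len_utf8()..]),
        None => line.to_string(),
    }
}
```

On the input `x`, `[[]` finds nothing to replace, while `[^[]` matches the `x` itself.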

Leave the character after a literal '[' available for normal class parsing, then escape literal '[' characters inside parsed classes before compiling with the regex backend. Add parser, compiler, and command-level regression tests for the failing sed -E substitution cases.
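One way the escaping step could look is sketched below, assuming the pattern is otherwise already in the backend's syntax and that backslash sequences can be copied through untouched. The function name matches the one under review in `src/sed/compiler.rs`, but the body is an illustrative guess, not the code in this PR. Because Rust's `regex` crate reads `[` inside a class as the start of a nested class, each literal `[` is rewritten to `\[`, while `[:name:]`-style constructs are left alone.

```rust
/// Sketch (assumed body, not the PR's code): escape literal `[` characters
/// inside bracket expressions. `[:name:]`, `[.x.]`, and `[=x=]` constructs are
/// copied unchanged (the backend may still reject the last two). Backslash
/// sequences are assumed to already be in backend syntax and are copied verbatim.
fn escape_literal_open_brackets_in_classes(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '\\' {
            // An escape outside a class (e.g. `\[`) does not open a class.
            out.push(c);
            if let Some(&next) = chars.get(i + 1) {
                out.push(next);
            }
            i += 2;
            continue;
        }
        if c != '[' {
            out.push(c);
            i += 1;
            continue;
        }
        // Start of a bracket expression: copy `[`, an optional `^`, and a
        // leading `]`, which POSIX treats as a literal member.
        out.push('[');
        i += 1;
        if chars.get(i) == Some(&'^') {
            out.push('^');
            i += 1;
        }
        if chars.get(i) == Some(&']') {
            out.push(']');
            i += 1;
        }
        while i < chars.len() && chars[i] != ']' {
            match (chars[i], chars.get(i + 1)) {
                ('\\', Some(&next)) => {
                    out.push('\\');
                    out.push(next);
                    i += 2;
                }
                ('[', Some(&d)) if matches!(d, ':' | '.' | '=') => {
                    // Copy the construct through its closing `d]`.
                    out.push('[');
                    out.push(d);
                    i += 2;
                    while i < chars.len() && !(chars[i] == d && chars.get(i + 1) == Some(&']')) {
                        out.push(chars[i]);
                        i += 1;
                    }
                    if i < chars.len() {
                        out.push(d);
                        out.push(']');
                        i += 2;
                    }
                }
                // A literal `[` inside the class: escape it for the backend.
                ('[', _) => {
                    out.push_str("\\[");
                    i += 1;
                }
                (other, _) => {
                    out.push(other);
                    i += 1;
                }
            }
        }
        // Copy the closing `]` if the class is terminated.
        if i < chars.len() {
            out.push(']');
            i += 1;
        }
    }
    out
}
```

For example, `[[]` becomes `[\[]` and `[^[]` becomes `[^\[]`, both of which the `regex` crate accepts as a class containing a literal `[`.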

I hit this while installing Python 3.14.5 with Pyenv, which failed with uutils sed on the PATH.

@codecov

codecov Bot commented May 23, 2026

Codecov Report

❌ Patch coverage is 98.24561% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.50%. Comparing base (a9a332a) to head (5dced76).

File                   Patch %   Missing lines
src/sed/compiler.rs    97.77%    2 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #439      +/-   ##
==========================================
+ Coverage   82.18%   82.50%   +0.32%     
==========================================
  Files          13       13              
  Lines        5551     5660     +109     
  Branches      312      320       +8     
==========================================
+ Hits         4562     4670     +108     
- Misses        986      987       +1     
  Partials        3        3              
Flag             Coverage Δ
macos_latest     83.19% <98.24%> (+0.32%) ⬆️
ubuntu_latest    83.28% <98.24%> (+0.31%) ⬆️
windows_latest   0.00% <0.00%> (ø)

Flags with carried forward coverage won't be shown.

@kevinburke kevinburke force-pushed the fix-literal-open-bracket-class branch from a5d4b13 to cdd35dd on May 23, 2026 05:19
@codspeed-hq

codspeed-hq Bot commented May 23, 2026

Merging this PR will degrade performance by 2.07%

❌ 1 regressed benchmark
✅ 10 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark    BASE       HEAD       Efficiency
number_fix   505.9 ms   516.6 ms   -2.07%

Comparing kevinburke:fix-literal-open-bracket-class (5dced76) with main (a9a332a)

Comment thread src/sed/compiler.rs
        result
    }

    fn escape_literal_open_brackets_in_classes(pattern: &str) -> String {
Contributor

please add a rustdoc comment here :)

Author

Done

@kevinburke kevinburke force-pushed the fix-literal-open-bracket-class branch from cdd35dd to 8979d08 on June 5, 2026 18:57
Comment thread src/sed/compiler.rs
        result
    }

    /// Escape literal `[` characters that appear inside a bracket expression.
Contributor

Well it is a bit long now :/

Author

Oh, sorry, ok.

Author

How is this?

@kevinburke kevinburke force-pushed the fix-literal-open-bracket-class branch from 0d140be to 5dced76 on June 5, 2026 19:42

Labels

None yet

Projects

None yet

2 participants