grep: honor whitespace escapes in BRE#55
Conversation
|
GNU grep testsuite comparison: |
Merging this PR will not alter performance
Comparing Footnotes
|
|
I pushed a follow-up for the GNU comparison failure ( The updated version expands BRE I verified the specific GNU test locally ( |
|
GNU grep testsuite comparison: |
lhecker
left a comment
There was a problem hiding this comment.
I personally strongly prefer the broken \S behavior but using SYNTAX_OPERATOR_ESC_S_WHITE_SPACE for a shorter solution.
...there's is no way to get Oniguruma to work like GNU grep's regex engine in the first place. We shouldn't try filling tiny gaps with big workarounds while still leaving gaping holes.
|
@lhecker That direction makes sense. I was optimizing for the I changed the branch back to the direct Validation after the change:
|
825ff67 to
f319232
Compare
|
One follow-up after the rebase/CI run: the direct I looked for an appropriate way to classify that as an accepted engine mismatch, but the only ignore mechanism in this repo is So I think there are three real options:
Given your feedback, I am leaning toward option 3 unless maintainers want either an explicit known-incompatibility mechanism or the narrower workaround despite the extra code. |
|
GNU grep testsuite comparison: |
|
Closing this for now rather than leaving a known-red PR open. After switching back to the direct The only two ways I see to make this mergeable are either the POSIX-class rewrite workaround, which you reasonably pushed back on because it adds parser code for a small compatibility gap, or a broader project-level mechanism for accepted deterministic GNU incompatibilities. Without one of those, keeping this open would not be useful. I will leave BRE |
Fixes #31.
GNU grep accepts
\sand\Sas whitespace shorthands in basic regular expressions. Extended mode already recognizes those spellings throughSyntax::gnu_regex(), but the default BRE path treated them as literals/SbecauseSyntax::grep()does not enable that GNU extension by default.This now enables Oniguruma's
SYNTAX_OPERATOR_ESC_S_WHITE_SPACEfor BRE. That keeps the implementation small and uses the regex engine's own syntax support instead of rewriting patterns before compilation.One tradeoff is intentionally left in place: Oniguruma's direct
\Sbehavior can differ from GNU grep on lone invalid UTF-8 bytes. I had previously worked around that by expanding\s/\Sinto POSIX classes, but that was a disproportionately large local workaround while other GNU/Oniguruma semantic gaps still remain. This version fixes the common valid-input shorthand behavior and BRE repetition support without adding a custom mini parser.Validation:
cargo fmt --all -- --checkcargo test bre_whitespace_shorthandscargo testcargo clippy --all-targets --workspace -puu_grep -- -D warningsgit diff --check