fix(regex): stop false-rejecting valid char classes & bounded quantifiers (semver/joi/winston) #5277
Open
proggeramlug wants to merge 1 commit into
…iers (semver/joi/winston)
Three independent false-rejections in JS->Rust regex translation made
valid patterns throw "Invalid regular expression: ...: invalid pattern":
1. Compiled-size cap (semver): the regex crate caps a compiled program at
10 MiB and rejects larger ones as CompiledTooBig. semver's ReDoS-hardened
safeRe rewrites (\d{1,256}, [...]{0,250}, ...) exceed that. JS has no such
limit, so raise the budget to 64 MiB for both the regex crate and the
fancy-regex delegate (build_std_regex / build_fancy_regex helpers).
2. Class hyphen adjacent to a shorthand (joi): inside a class, a '-' next to
a \d/\w/\s shorthand or \p{...} property is a LITERAL hyphen in JS (a
shorthand can't bound a range), but the regex crate reads it as a range and
errors with ClassRangeLiteral. Escape such hyphens to \- during translation.
3. Trivial char classes / ANSI patterns (winston via @colors/colors): covered
by the above plus a regression test for [0m] and escapeStringRegexp output.
Adds unit regression tests for all three. No version/CHANGELOG/Cargo.lock edits
(maintainer folds version + changelog at merge).
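
The class-hyphen rule in item 2 can be illustrated with a minimal, standard-library-only sketch. This is not the PR's actual code: the function names `escape_class_hyphens` and `starts_with_shorthand` are hypothetical, and the sketch assumes the translator walks the pattern once, treating every backslash escape as a single unit (which is what keeps `\\d` from being mistaken for a shorthand).

```rust
/// Hypothetical helper (not from the PR): true if `rest` begins with a class
/// shorthand (\d \w \s or their negations) or a \p / \P property escape.
fn starts_with_shorthand(rest: &[char]) -> bool {
    matches!(rest, ['\\', 'd' | 'D' | 'w' | 'W' | 's' | 'S' | 'p' | 'P', ..])
}

/// Sketch of the translation step: inside a character class, rewrite a `-`
/// that sits directly before or after a shorthand into `\-`, so a Rust regex
/// engine does not try to parse it as a range operator.
fn escape_class_hyphens(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len() + 4);
    let mut in_class = false;
    let mut prev_shorthand = false; // was the previous class atom a shorthand?
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '\\' && i + 1 < chars.len() {
            // Consume the escape as one unit, so `\\d` is read as an escaped
            // backslash followed by a literal `d`, not as the shorthand `\d`.
            let is_shorthand = starts_with_shorthand(&chars[i..]);
            out.push(c);
            out.push(chars[i + 1]);
            i += 2;
            // Copy a `{...}` property body verbatim after \p or \P.
            if matches!(chars[i - 1], 'p' | 'P') && i < chars.len() && chars[i] == '{' {
                while i < chars.len() {
                    out.push(chars[i]);
                    i += 1;
                    if chars[i - 1] == '}' {
                        break;
                    }
                }
            }
            prev_shorthand = in_class && is_shorthand;
            continue;
        }
        if !in_class {
            if c == '[' {
                in_class = true;
                prev_shorthand = false;
            }
            out.push(c);
            i += 1;
            continue;
        }
        match c {
            ']' => {
                in_class = false;
                out.push(c);
            }
            // A hyphen touching a shorthand on either side is literal in JS.
            '-' if prev_shorthand || starts_with_shorthand(&chars[i + 1..]) => {
                out.push_str("\\-");
            }
            _ => out.push(c),
        }
        prev_shorthand = false;
        i += 1;
    }
    out
}
```

Under these assumptions, `[\w-.]` is rewritten to `[\w\-.]`, while an ordinary range such as `[a-z]` and the escaped-backslash case `[\\d-z]` pass through unchanged. Escaping a hyphen that is already first or last in the class is redundant but harmless, which keeps the rule simple.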
Problem
Three independent false-rejections in perry's JS→Rust regex translation (`crates/perry-runtime/src/regex*`) made valid JS regex literals throw `SyntaxError: Invalid regular expression: …: invalid pattern`. Surfaced by the npm-corpus native-compile sweep as the top remaining blast-radius pattern (3 packages: semver, joi, winston). This is NOT a categorical engine gap (perry has both `regex` and `fancy-regex`), just translation/cap bugs.

Roots & fixes
1. Compiled-size cap (semver). The `regex` crate caps a compiled program at 10 MiB (`CompiledTooBig`); `fancy-regex` delegates to the same backend. semver's ReDoS-hardened `safeRe` rewrites (`\d{1,256}`, `[…]{0,250}`, …) exceed it. JS has no such limit. New `build_std_regex` / `build_fancy_regex` helpers raise the budget to 64 MiB (both engines, in lockstep). Drop-in for `Regex::new` / `fancy_regex::Regex::new` at all compile + validate sites.
2. Class hyphen adjacent to a shorthand (joi). Inside a class, a `-` next to a `\d`/`\w`/`\s` shorthand (or `\p{…}`/`\P{…}` property) is a literal hyphen in JS (a shorthand can't bound a range), but the Rust crate reads it as a range and errors with `ClassRangeLiteral`. Translation now escapes such hyphens to `\-` (with `\\d` literal-backslash and escaped-backslash guards). joi's URI/dataURI validators rely on this.
3. Trivial char classes / ANSI patterns (winston via `@colors/colors`). Covered by the above; added a regression test for `[0m]` and `@colors`'s `escapeStringRegexp` output.

Verification
- `cargo build --release -p perry -p perry-runtime -p perry-stdlib`: clean.
- `cargo test --release -p perry-runtime regex`: 24 passed, 0 failed, incl. 3 new regression tests (`trivial_char_class_compiles_and_matches`, `bounded_quantifier_in_class_not_rejected`, `class_hyphen_adjacent_to_shorthand_is_literal`).
- … `Cannot read properties of undefined (reading 'COMPARATOR')`), i.e. the regex no longer throws.
- … `@colors/colors`): the exact `new RegExp(escapeStringRegexp("\x1b[0m"), 'g')` scenario compiles and matches, byte-identical to `node --experimental-strip-types`. (Full winston run is gated behind the still-open #5276, "fix(cjs): recognize bracket/computed-string-literal export forms (#5275)", the bracket-CJS fix; the regex portion is verified in isolation.)
- … `String.trim` fix.

Notes
No `[workspace.package]` version / `CHANGELOG.md` / `Cargo.lock` edits (maintainer folds version + changelog at merge).