Regex NFA/DFA benchmarks #527
Merged
Previously, patterns and events drew emojis from the same pool, but independently, so events with random emoji pairs rarely matched any of the random pattern pairs (especially at patterns=32/64), tripping the per-iteration b.Fatalf assertion.

Track the (e1, e2) pairs used to build patterns, and sample events from that same set so that every event matches at least one pattern. The benchmark still measures NFA traversal cost on dense multi-byte UTF-8 input; the only thing that changed is the correctness of the match-presence sanity check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds stable baseline benchmarks for representative match-time workloads:
- ExactString: 1 exact pattern
- SingleShellstyle: 1 wildcard pattern
- ManyOverlappingWildcards: N=8..128 overlapping wildcards
- RegexAlternation: 20 regex patterns with alternation
- LiteralInRegex: literal substring inside regex
- QuantifiedCharClass: regex with {n,m} quantifier
- ManyAnchoredRegex: 200 anchored regex patterns
- DeepEpsilonNest: regex with nested alternation/quantifiers
- CacheThrashing: adversarial input over wide state space
- ParallelMatchers: 8..64 goroutines via Copy()
Each benchmark warms the matcher with ~100 iterations before resetting the timer,
so that first-call laziness does not pollute steady-state measurements.
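The warm-up-then-reset pattern can be sketched as follows. The sketch assumes a hypothetical `matcher` interface with a `Match([]byte) bool` method and a toy `lazyMatcher` that simulates one-time setup on its first call. Neither comes from the PR, whose benchmarks drive the real matcher. `testing.Benchmark` is the standard-library entry point for running a benchmark function outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// matcher is a hypothetical stand-in for the matcher under test.
type matcher interface {
	Match(event []byte) bool
}

// warmupIters mirrors the "~100 iterations" warm-up described in the PR.
const warmupIters = 100

// benchmarkMatch warms m before resetting the timer, so lazy first-call
// work is excluded from the timed loop.
func benchmarkMatch(b *testing.B, m matcher, event []byte) {
	for i := 0; i < warmupIters; i++ {
		m.Match(event)
	}
	b.ResetTimer() // discard time (and allocations) spent warming up
	for i := 0; i < b.N; i++ {
		if !m.Match(event) {
			b.Fatalf("expected a match on iteration %d", i)
		}
	}
}

// lazyMatcher simulates first-call laziness: setup happens on the first Match.
type lazyMatcher struct {
	ready bool
	calls int
}

func (l *lazyMatcher) Match(event []byte) bool {
	if !l.ready {
		l.ready = true // expensive one-time setup would happen here
	}
	l.calls++
	return len(event) > 0
}

func main() {
	m := &lazyMatcher{}
	res := testing.Benchmark(func(b *testing.B) {
		benchmarkMatch(b, m, []byte(`{"val": "x"}`))
	})
	fmt.Println(res.N, res.NsPerOp())
}
```

The Go benchmark framework invokes the benchmark function several times with increasing `b.N`. The warm-up therefore runs on each invocation, and `b.ResetTimer()` keeps it out of every timed measurement.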
These are intended as stable workload baselines: subsequent matcher
optimization work can be evaluated by re-running these benchmarks
unchanged and comparing the results.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
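The ParallelMatchers benchmark in the list above runs 8..64 goroutines, each with a matcher obtained via Copy(). Below is a minimal sketch of that fan-out shape. It assumes a hypothetical `copier` interface and a toy `byteMatcher`. The real matcher type, the exact `Copy()` signature, and the benchmark's timing harness are not shown in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"sync/atomic"
)

// copier is a hypothetical interface. The PR says ParallelMatchers obtains
// per-goroutine matchers via Copy(); this exact signature is an assumption.
type copier interface {
	Copy() copier
	Match(event []byte) bool
}

// runParallel starts `goroutines` workers, each holding its own copy of m,
// and has each perform `iters` matches. It returns the total number of matches.
func runParallel(m copier, goroutines, iters int, event []byte) int64 {
	var matched atomic.Int64 // requires Go 1.19+
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		local := m.Copy() // one copy per goroutine, taken before it starts
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				if local.Match(event) {
					matched.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	return matched.Load()
}

// byteMatcher is a toy matcher used only to make the sketch runnable:
// it matches any event containing the byte `needle`.
type byteMatcher struct{ needle byte }

func (b byteMatcher) Copy() copier { return b } // value copy; no shared state
func (b byteMatcher) Match(event []byte) bool {
	return bytes.IndexByte(event, b.needle) >= 0
}

func main() {
	for _, n := range []int{8, 16, 32, 64} {
		fmt.Println(n, runParallel(byteMatcher{'x'}, n, 1000, []byte(`{"val": "x"}`)))
	}
}
```

Taking the copy on the launching goroutine, before each worker starts, keeps each worker's matcher private. This is the property that a per-goroutine Copy() is presumably meant to provide.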
timbray approved these changes on May 30, 2026
```go
for _, sp := range simplePatterns {
	b.Run(sp.name, func(b *testing.B) {
		q, _ := New()
		pattern := fmt.Sprintf(`{"val": [{"shellstyle": %q}]}`, sp.shellstyle)
		// ... (rest of the benchmark body is not shown in this review excerpt)
	})
}
```
Owner
Not a blocker, but let's use wildcard rather than shellstyle in the future.
Contributor (PR author)
Sorry to be an epic nitpicker, but there is a problem here. "shellstyle" is the description of the overall pattern, while "wildcard" is the specific production. I don't care how this distinction is resolved, but it's there.
Owner
On May 29, 2026 at 6:26:02 PM, RS <***@***.***> wrote:

> Sorry to be an epic nitpicker, but there is a problem here. "shellstyle" is the description of the overall pattern, while "wildcard" is the specific production. I don't care how this distinction is resolved, but it's there.

I yield to no one in the pickiness of my nits. What happened was, I implemented shellstyle but stupidly forgot to put in escaping for `*`, and then couldn't change the API, so wildcard is just shellstyle with escaping for `*` and then, of course, `\`. -T
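The owner's explanation says wildcard is shellstyle plus escaping for `*` and `\`. A small byte-level glob matcher can show the difference. This is not the matcher's actual implementation, which per this PR is NFA-based. `token`, `parse`, and `match` are hypothetical names. The sketch also assumes that any escape other than `\*` or `\\` is rejected in wildcard mode.

```go
package main

import (
	"errors"
	"fmt"
)

// token is either a literal byte or a '*' wildcard.
type token struct {
	star bool
	lit  byte
}

// parse turns a pattern into tokens. With escaping=false (shellstyle-like),
// every '*' is a wildcard and '\' is an ordinary byte. With escaping=true
// (wildcard-like), `\*` is a literal '*' and `\\` is a literal '\'; any
// other escape is rejected (an assumption of this sketch).
func parse(pattern string, escaping bool) ([]token, error) {
	var toks []token
	for i := 0; i < len(pattern); i++ {
		c := pattern[i]
		switch {
		case c == '*':
			toks = append(toks, token{star: true})
		case escaping && c == '\\':
			if i+1 >= len(pattern) || (pattern[i+1] != '*' && pattern[i+1] != '\\') {
				return nil, errors.New("invalid escape in pattern")
			}
			i++
			toks = append(toks, token{lit: pattern[i]})
		default:
			toks = append(toks, token{lit: c})
		}
	}
	return toks, nil
}

// match reports whether s matches toks, using greedy matching with
// backtracking to the most recent '*' (which matches any run of bytes).
func match(toks []token, s string) bool {
	ti, si := 0, 0
	starTi, starSi := -1, 0
	for si < len(s) {
		switch {
		case ti < len(toks) && !toks[ti].star && toks[ti].lit == s[si]:
			ti++
			si++
		case ti < len(toks) && toks[ti].star:
			starTi, starSi = ti, si // let '*' match the empty run first
			ti++
		case starTi >= 0:
			starSi++ // let the last '*' absorb one more byte
			si = starSi
			ti = starTi + 1
		default:
			return false
		}
	}
	for ti < len(toks) && toks[ti].star {
		ti++ // trailing stars can match the empty run
	}
	return ti == len(toks)
}

func main() {
	cases := []struct {
		pattern  string
		escaping bool
		input    string
	}{
		{`a*c`, false, "abc"},
		{`a\*c`, true, "a*c"},
		{`a\*c`, true, "abc"},
		{`a\\*`, true, `a\xyz`},
	}
	for _, tc := range cases {
		toks, err := parse(tc.pattern, tc.escaping)
		if err != nil {
			fmt.Println(tc.pattern, "error:", err)
			continue
		}
		fmt.Println(tc.pattern, tc.input, match(toks, tc.input))
	}
}
```

The two syntaxes share one matching routine and differ only in how `parse` treats a backslash. Without escaping, a literal `*` is simply inexpressible, which is the gap the wildcard syntax closes.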
Contributor (PR author)
I am surprised by your strong opinion here, although I do not disagree with it. I'll try to knock out something to cover this issue tomorrow morning. Happy to do it your way; it's just confusing.
These are tests that subsume #492.