NFA => DFA optimization

It turns out that traversing NFAs, which are produced by any instance that has wildcard or regexp patterns, is a lot slower than with DFAs.  More than I'd like, I'm going to have to re-write the performance section of README.md. However, the finite-automaton theory says that any NFA can be turned into a DFA, and I just finished debugging and testing the `nfa2Dfa()` function.  It makes a huge difference, tests where the instance has a significant number of NFA valueMatchers, converting everything to a DFA makes things run at least twice as fast and occasionally much, much faster.  But, we can't just go ahead and blast all the NFAs into DFAs, because:

1. It slows down the `addPattern()` call quite a bit.
2. The state machines occupy a lot more memory.
3. In some cases, the slowdown is O(2^N), ouch. An example is in `TestShellStyleBuildTime`, where if you put 1000 patterns in, it basically never comes back.

So… there are things we could do.

### Heuristics

Put on a hard-wired time limit, for example one second, and if any NFA=>DFA conversion runs longer than that, give up and let it be an NFA. This means that users don't have to think about it and will usually get about the best result.

### Controls

Have a new `Option` on the `New()` function to specify one of three modes: Never optimize, Always optimize, or Optimize-with-limit, where the caller provides the time threshold and Quamina gives up on the DFA conversion if it runs longer than that.  You could imagine an even finer-grained control where you could specify this per-Field but we could do that later.

This is good for people who need flexibility i.e. they're doing a lot of `AddPattern()` calls and want low latency on that. Or on the other end of the scale, are willing to amortize a one-time build cost for a big NFA to buy themselves minimal latency down the pipe.

### Stats

We could add a call to report the current stats of the state machine, i.e. the data that is now reported by `matcherStats()`, you can see the text output quite a bit in the unit-test outputs. This would give callers the option of doing a test run and finding out how much DFA conversions are going to bloat the memory consumption of a Quamina instance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NFA => DFA optimization #481

Heuristics

Controls

Stats

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

NFA => DFA optimization #481

Description

Heuristics

Controls

Stats

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions