kaizen: Embed smalltable by sayrer · Pull Request #535 · timbray/quamina

sayrer · 2026-05-31T19:25:49Z

This one carries #521 further, fixes some bugs that caused buildshellstyle quadratic behavior, and also speeds up Nfa2dfa.

Adds tableShareKey, a stable identifier for the share group of slice-headers inside a smallTable. After smallTable is embedded into faState by value, two states that hold copies of one source smallTable still share the underlying steps/ceilings/epsilons backing arrays — and a key built from SliceData(steps) + len(steps) captures that equivalence. This replaces *smallTable-pointer-identity as the dedup key in the next commit. No behavioral change yet; this commit only adds the helper and unit tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Switches closureBuffers.tables from map[*smallTable]*tableMark to map[tableShareKey]*tableMark. With *smallTable pointer identity still intact (pre-embed), the new key is equivalent: two states that share a *smallTable trivially share their steps backing. No behavioral change. Sets up the next commit, which embeds smallTable in faState by value — at which point pointer-identity goes away but slice-backing identity remains.

faState.table changes from *smallTable to smallTable (inline). This shrinks per-state memory: - faState 64B + smallTable 80B-class = 144B per state pair - embedded faState fits 128B size class = 128B per state - saves 16B/state on workloads without table-pointer sharing vmFields.startTable becomes startState *faState (the start state owns the start table inline). traverseNFA, epsilonClosure, and related helpers take *faState instead of *smallTable. Epsilon-closure dedup continues to work via tableShareKey (added in the previous two commits) — value-copies of one source smallTable still share their slice backings, and SliceData(steps) is the new identity. Size-assertion tests are not yet recalibrated; that follows in the next commit.

Per-state size shrunk (faState now 128B with inline smallTable). Update hand-calibrated constants in TestMcBasicSizes, TestQuaminaMemoryCost, TestMcNfaSizes. Update TestPP s/t numbers and restore trailing whitespace in wanted literal (producer still emits "[s/t X/Y] \n" with the trailing space before the newline). TestTablePointerDedup's tableSharing and totalEntries expectations also needed updating — the dedup metric now uses slice-backing identity (via tableShareKey from earlier commits) rather than *smallTable pointer identity, so value-copies of a source smallTable register as shared. Recalibrated to the new ground-truth values. Minor cleanups: stale "startTable" reference in TestQuaminaMemoryCostSingleton comment (renamed to startState); inline a dead local in copyShellNode. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Port of the build-context-extract fix (6947edf) to the embedded-smallTable design. embed-smalltable inherited the same two regressions from the shared 32dc2a9 ancestor; shellstyle build at 1000 words was ~7x slower than main. 1. closureForNfa's walk dedup was broken. The refactor collapsed two independent counters into a single bufs.gen that closureForState mutates, so the walk's visited check never matched after the first state and the heavily-shared shellstyle graph got re-traversed (O(V*E)). Restored via bufs.walkGen, a snapshot closureForState never touches. 2. Per-call allocation. epsilonClosure allocated fresh maps per call and tableMarkOf heap-allocated a *tableMark per share group. closureBuffers are now pooled (sync.Pool, GC-reclaimable so no steady-state cost), maps are reused via a monotonic generation (no clearing), and tableMark is stored by value. Unlike the *smallTable-keyed bce branch, the walk here dedups by *faState identity rather than tableShareKey. Share-key dedup is unsafe for the walk: distinct states can share a steps backing array yet have different epsilons, and the zero key collapses all no-byte tables. The faState pointer is the natural unique identity now that smallTable is embedded by value. The post-pass table-pointer dedup keeps tableShareKey (collision-safe there, as it re-checks sameFieldTransitions) and now skips the zero key explicitly. Shellstyle build at 1000 words: 1363ms -> 473ms. Full suite passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

nfa2Dfa/intern is slated to go live. Profiling BenchmarkNfa2Dfa showed it ~64% slower than main, entirely inside intern()'s dedup: clear(sl.seen) plus a per-state map assign into a map[*faState]struct{} cost ~600ms where main's faState.closureSetGen generation-counter compare cost ~50ms. That field was removed to shrink steady-state memory, so the map was the fallback. intern already sorts the state set by pointer to build a canonical key, so duplicates are adjacent after the sort. Replacing the map-based dedup with slices.Compact over the sorted buffer removes the map (and its clear()) with no per-faState field and no extra sort — sorting was already happening. Nfa2Dfa vs main (geomean, n=6): time +63.8% -> +2.65%, B/op -1.05% -> -1.65%, allocs/op -7.8% -> -9.6%. Full suite passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts: # epsi_closure.go # memory_cost_test.go # state_lists.go

timbray · 2026-05-31T19:31:08Z

(afk for a while, will take a closer look)

timbray · 2026-05-31T20:14:59Z

OK, checked out 535 and ran the research-main CSV generator. I'll attach the xlsx files and a graph of sec/100-addP-calls

[2026-05-31.xlsx](https://github.com/user-attachments/files/28443966/2026-05-31.xlsx) [2026-05-30.xlsx](https://github.com/user-attachments/files/28443967/2026-05-30.xlsx)

timbray · 2026-05-31T20:16:24Z

Now, granted, all these AddPattern calls are wildcards, and the epsilon merging gets kind of ugly, so probably not a representative workload.

timbray · 2026-05-31T20:17:33Z

I'm just trying to figure out whether I care about this. Don't have an opinion yet.

sayrer · 2026-05-31T21:55:28Z

I just got that workload, seems like I can do some damage to it with some small changes.

timbray · 2026-05-31T22:17:04Z

I gotta say, if those numbers are correct, the permanent memory saving is just not that dramatic and the AddPattern impact is pretty significant. So maybe the thing to do is back out the last PR and then cherrypick the good stuff from the faState/smallTable branch with the add-ons that made nfa2Dfa faster and so on?

sayrer · 2026-05-31T22:20:21Z

I think I can get it with no regression, just measuring.

sayrer · 2026-05-31T22:33:10Z

Even just the first step went well. The closure walk was improved, and had been going over the more space-efficient structures too much.

That's here:
5fe237b

  Incremental epsilon closure — results

  Plan executed (commits on embed-smalltable-research): spec 5fe237b → plan
  534327a → immutability gate 18547f6 → order-independence guard a6c44a9 →
  prune 5976a4f → review nits f96e39a (plus 20c980d research -cpuprofile
  flag).

  The win (10k-pattern research harness):

  ┌──────────────────────┬──────────────┬─────────────────────────────────┐
  │                      │ before prune │           after prune           │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ patterns/sec         │ 133–140      │ 418 (3×)                        │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ wall time            │ ~72 s        │ 24.6 s                          │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ build-scale @ N=1000 │ 473 ms       │ 225 ms (~2.1×)                  │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ closureForNfa flat   │ 8.46 s       │ 0.06 s (walk re-traversal gone) │
  └──────────────────────┴──────────────┴─────────────────────────────────┘

sayrer · 2026-05-31T22:51:39Z

So, this is already getting close on AddPattern, and the footprint improvements improve the Matcher.

timbray · 2026-05-31T23:37:02Z

Is this PR up-to-date with your latest?

sayrer · 2026-06-01T00:39:18Z

No, this is on https://github.com/sayrer/quamina/tree/embed-smalltable-research, which has everything in this PR, your research PR, and further patches to see if the AddPattern cost is really necessary. I didn't want to pile them all in to one review, but I did want to see whether it's possible to avoid any unpleasant trade-offs, and that's looking promising.

timbray · 2026-06-01T00:43:51Z

Oh, ok, I'll pull that and have a look around tomorrow sometime.

sayrer and others added 7 commits May 30, 2026 14:33

Merge remote-tracking branch 'origin/main' into embed-smalltable

885141d

# Conflicts: # epsi_closure.go # memory_cost_test.go # state_lists.go

sayrer changed the title ~~Embed smalltable~~ kaizen: Embed smalltable May 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kaizen: Embed smalltable#535

kaizen: Embed smalltable#535
sayrer wants to merge 7 commits into
timbray:mainfrom
sayrer:embed-smalltable

sayrer commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

sayrer commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

sayrer commented May 31, 2026

Uh oh!

sayrer commented May 31, 2026 •

edited

Loading

Uh oh!

sayrer commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

sayrer commented Jun 1, 2026

Uh oh!

timbray commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sayrer commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

sayrer commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

sayrer commented May 31, 2026

Uh oh!

sayrer commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sayrer commented May 31, 2026

Uh oh!

timbray commented May 31, 2026

Uh oh!

sayrer commented Jun 1, 2026

Uh oh!

timbray commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sayrer commented May 31, 2026 •

edited

Loading