perf(attribution): speed up the Authors overlay producer (~30× at 10k changes) #225

Merged: shikokuchuo merged 4 commits into main from perf/char-to-byte-uint32array on May 21, 2026

@shikokuchuo shikokuchuo commented May 21, 2026

What

Four commits, each landing a discrete win on the JS-side attribution producer (buildRunListAttribution / useAttribution, the code that builds the payload behind the Authors overlay).

  1. Replay history via A.applyChanges instead of per-step A.diff (7a6bc8a4). The old loop called A.diff(doc, prevHeads, currHeads) per history step, which scales super-linearly with doc state. The new loop pre-indexes changes by hash and forward-replays through applyChanges + patchCallback, which is O(1) per change.

    | N | diff() | applyChanges | speedup |
    |---:|---:|---:|---:|
    | 500 | 22 ms | 8 ms | 2.6× |
    | 2 000 | 197 ms | 22 ms | 9.0× |
    | 10 000 | 4 121 ms | 116 ms | 35.5× |
    | 30 000 | 36 325 ms | 375 ms | 96.8× |

    Runs verified equivalent across append-only, mixed-edit, and multi-actor (sequential + diamond-DAG) fixtures.

  2. Drop one yield per build (abee1827). Moved the chunk-yield from the top of the loop to between chunks. A history that fits in one chunk (≤ CHUNK_SIZE) now never pays an idle-callback round-trip, so short builds finish synchronously.

  3. Uint32Array for the char→byte map (a35e33e0). The map is rebuilt on every debounced payload update; switching from a boxed Array<number> to a single contiguous Uint32Array cuts allocations and halves the per-code-unit heap footprint. Builds are 5–30 % faster for documents of ~10 KB and up; on smaller documents the difference is within noise.

  4. CHUNK_SIZE doc comment (a5081334). Refreshed to reflect the new applyChanges cost profile (≈15 µs/entry, roughly constant in N) instead of the obsolete A.diff numbers.
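
To make the shape of commit 1 concrete, here is a minimal sketch of "index changes by hash, then forward-replay once". It is a toy model, not the PR's code: `ToyChange`, `Run`, `dependencyOrder`, and `replayRuns` are hypothetical names, and a change here simply appends text, standing in for what the real producer learns from `A.applyChanges` with `patchCallback`.

```typescript
// Toy stand-in for a decoded change; NOT the Automerge type.
interface ToyChange {
  hash: string;
  deps: string[]; // hashes of the changes this one depends on
  actor: string;
  text: string; // toy operation: append this text to the document
}

interface Run {
  actor: string;
  start: number;
  length: number;
}

// Build the hash index once so every dependency lookup is a Map hit.
function indexByHash(changes: readonly ToyChange[]): Map<string, ToyChange> {
  const byHash = new Map<string, ToyChange>();
  for (const change of changes) byHash.set(change.hash, change);
  return byHash;
}

// Order changes so that each one comes after all of its dependencies.
// Iterative depth-first walk, so long linear histories cannot overflow the stack.
function dependencyOrder(changes: readonly ToyChange[]): ToyChange[] {
  const byHash = indexByHash(changes);
  const done = new Set<string>();
  const order: ToyChange[] = [];
  for (const root of changes) {
    const stack: ToyChange[] = [root];
    while (stack.length > 0) {
      const top = stack[stack.length - 1]!;
      if (done.has(top.hash)) {
        stack.pop();
        continue;
      }
      const pending = top.deps.filter((dep) => !done.has(dep));
      if (pending.length === 0) {
        done.add(top.hash);
        order.push(top);
        stack.pop();
        continue;
      }
      for (const dep of pending) {
        const change = byHash.get(dep);
        if (change === undefined) throw new Error(`missing dependency ${dep}`);
        stack.push(change);
      }
    }
  }
  return order;
}

// Forward replay: visit each change exactly once and record who wrote what,
// merging adjacent spans written by the same actor into a single run.
function replayRuns(changes: readonly ToyChange[]): Run[] {
  const runs: Run[] = [];
  let docLength = 0;
  for (const change of dependencyOrder(changes)) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last.actor === change.actor) {
      last.length += change.text.length;
    } else {
      runs.push({ actor: change.actor, start: docLength, length: change.text.length });
    }
    docLength += change.text.length;
  }
  return runs;
}
```

The total work is one index build plus one pass over the changes, which is the property the table above measures; the per-step `A.diff` loop it replaces has no equivalent in this toy.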

Why

Cold start of the Authors overlay on long histories was dominated by wall-clock time spent in A.diff, which reloads historical doc state on every step. The rewrite makes the build linear in the number of changes; the smaller wins remove the leftover papercuts, so the producer stays fast even on a 30 k-change document (375 ms vs the old 36 s).

The char→byte map is rebuilt on every debounced payload update; switching
its storage from boxed `Array<number>` to a contiguous `Uint32Array` is
5-30% faster for documents beyond ~10KB (V8 microbench across
ASCII/mixed/CJK) and halves the per-codeunit heap footprint. Smaller docs
are unchanged.
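
As an illustration of the data structure, the sketch below builds such a map in a single `Uint32Array`. It assumes the map goes from UTF-16 code-unit index to UTF-8 byte offset, which the PR implies but does not spell out; `buildCharToByteMap` is a hypothetical name, not the function in the codebase.

```typescript
// Map each UTF-16 code unit index to the UTF-8 byte offset where its code
// point starts. The extra final slot holds the total encoded length.
function buildCharToByteMap(text: string): Uint32Array {
  // One contiguous, zero-initialised buffer: no boxed numbers, no regrowth.
  const map = new Uint32Array(text.length + 1);
  let byteOffset = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    map[i] = byteOffset;
    if (unit < 0x80) {
      byteOffset += 1; // ASCII
    } else if (unit < 0x800) {
      byteOffset += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Valid surrogate pair: both code units point at the same 4-byte sequence.
        map[i + 1] = byteOffset;
        byteOffset += 4;
        i++; // skip the low surrogate
      } else {
        byteOffset += 3; // lone high surrogate, encoded as U+FFFD
      }
    } else {
      byteOffset += 3; // rest of the BMP (including CJK) and lone surrogates
    }
  }
  map[text.length] = byteOffset;
  return map;
}
```

For example, `buildCharToByteMap("aé漢😀")` holds `0, 1, 3, 6, 6, 10`: one byte for `a`, two for `é`, three for `漢`, and four for the emoji, whose two code units share offset 6. A `Uint32Array` caps offsets at 2^32 - 1, which this sketch assumes is ample for a document.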

Move the chunk yield from the top of the loop to between chunks,
guarded by `chunkEnd < history.length`. A history that fits in one
chunk now never pays an idle-callback round-trip; longer builds
save one yield as well. The runs output is unchanged; existing tests pass.
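
A minimal sketch of that loop shape, assuming nothing about the real producer beyond the `chunkEnd < history.length` guard quoted above. The generator marks the yield points and a separate driver decides how to wait; `processInChunks`, `runWithIdleYields`, the injected `idle` function, and the `CHUNK_SIZE` value of 4 are all hypothetical.

```typescript
// Hypothetical value; the PR does not state the real CHUNK_SIZE.
const CHUNK_SIZE = 4;

// Process the history chunk by chunk, yielding only BETWEEN chunks.
// Nothing runs until the first next() call, because generators are lazy.
function* processInChunks<T>(
  history: readonly T[],
  step: (entry: T) => void,
  chunkSize: number = CHUNK_SIZE,
): Generator<void, void, void> {
  for (let start = 0; start < history.length; start += chunkSize) {
    const chunkEnd = Math.min(start + chunkSize, history.length);
    for (let i = start; i < chunkEnd; i++) step(history[i]!);
    // The guard: no yield after the final chunk, so a history that fits in
    // one chunk completes without ever handing control back.
    if (chunkEnd < history.length) yield;
  }
}

// Driver: waits on the injected idle() at each yield point. In a browser,
// idle() could wrap requestIdleCallback; that wiring is left to the caller.
async function runWithIdleYields<T>(
  history: readonly T[],
  step: (entry: T) => void,
  idle: () => Promise<void>,
  chunkSize: number = CHUNK_SIZE,
): Promise<void> {
  const chunks = processInChunks(history, step, chunkSize);
  while (!chunks.next().done) await idle();
}
```

With a chunk size of 4, a 3-entry history triggers no `idle()` call at all, and a 10-entry history triggers two (after entries 4 and 8) rather than the three a top-of-loop yield would cost.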

buildRunListAttribution now pre-indexes changes by hash and forward-replays
them via `A.applyChanges` with `patchCallback`. This eliminates the
super-linear `A.diff` cost: ~30x faster at N=10000 changes (4.1s → 142ms in
Node bench). Runs verified equivalent across append-only, mixed-edit, and
multi-actor (sequential + diamond-DAG) fixtures.

The RunListAttribution state carries an internal `_workDoc` so
incremental updates apply only new changes; hand-built states without
it raise HistoryCompactedError, which triggers a full rebuild.
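
The fallback path can be sketched as follows. This is a toy under stated assumptions, not the producer's code: `ToyState`, `fullRebuild`, `incrementalUpdate`, and `update` are hypothetical, changes are plain strings, and only the names `_workDoc` and `HistoryCompactedError` come from the commit message.

```typescript
class HistoryCompactedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "HistoryCompactedError";
  }
}

// Checking the name (rather than instanceof) also survives transpilation
// targets where subclassing Error breaks the prototype chain.
function isHistoryCompacted(err: unknown): boolean {
  return err instanceof Error && err.name === "HistoryCompactedError";
}

interface ToyState {
  runs: string[]; // stand-in for the attribution runs
  appliedCount: number; // how many changes are already folded in
  _workDoc?: string[]; // working copy; absent on hand-built states
}

// Replay everything from scratch and attach a fresh working copy.
function fullRebuild(changes: readonly string[]): ToyState {
  const workDoc = changes.slice();
  return { runs: workDoc.slice(), appliedCount: changes.length, _workDoc: workDoc };
}

// Apply only the changes the state has not seen yet.
function incrementalUpdate(state: ToyState, changes: readonly string[]): ToyState {
  if (state._workDoc === undefined) {
    throw new HistoryCompactedError("state has no working document");
  }
  const fresh = changes.slice(state.appliedCount);
  const workDoc = state._workDoc.concat(fresh);
  return { runs: workDoc.slice(), appliedCount: changes.length, _workDoc: workDoc };
}

// Try the cheap path first; fall back to a full rebuild on the known error.
function update(
  state: ToyState,
  changes: readonly string[],
): { state: ToyState; path: "incremental" | "rebuild" } {
  try {
    return { state: incrementalUpdate(state, changes), path: "incremental" };
  } catch (err) {
    if (isHistoryCompacted(err)) return { state: fullRebuild(changes), path: "rebuild" };
    throw err; // anything else is a genuine failure
  }
}
```

A state produced by `fullRebuild` takes the incremental path on the next update, while a literal such as `{ runs: ["c1"], appliedCount: 1 }` takes the rebuild path and comes back with a `_workDoc` attached.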
@shikokuchuo shikokuchuo merged commit 49617d4 into main May 21, 2026
5 checks passed
@shikokuchuo shikokuchuo deleted the perf/char-to-byte-uint32array branch May 21, 2026 11:27

cscheid commented May 21, 2026

Excellent!!
