fork-instrument: reduce synthetic frame-state locals #713
Phase B-1 matrix build status:

| Package | Arch | Status | Sha |
|---|---|---|---|
| bc | wasm32 | built | 3a78fb67 |
| bzip2 | wasm32 | built | 2ddc287d |
| coreutils | wasm32 | built | 2a18f004 |
| curl | wasm32 | built | a1ab622d |
| dash | wasm32 | built | a4ab83b1 |
| diffutils | wasm32 | built | 89790993 |
| dinit | wasm32 | built | b793ac0f |
| fbdoom | wasm32 | built | d9296a33 |
| file | wasm32 | built | 6a1455c3 |
| findutils | wasm32 | built | 1237e4d0 |
| gawk | wasm32 | built | 57c5d5cf |
| git | wasm32 | built | 7c88a387 |
| grep | wasm32 | built | 235ddcd5 |
| gzip | wasm32 | built | 481eb543 |
| kandelo-sdk | wasm32 | built | 18869e79 |
| kernel | wasm32 | built | 7f3002ad |
| less | wasm32 | built | a4f09d75 |
| lsof | wasm32 | built | cbd163cd |
| m4 | wasm32 | built | f0aff0d3 |
| make | wasm32 | built | 250d1d25 |
| mariadb | wasm32 | built | bccf7260 |
| mariadb | wasm64 | built | 3df390c0 |
| modeset | wasm32 | built | 703cfd80 |
| msmtpd | wasm32 | built | ec5df629 |
| nano | wasm32 | built | 1cea23a0 |
| ncurses | wasm32 | built | 8c54f1c2 |
| netcat | wasm32 | built | 361622f8 |
| nginx | wasm32 | built | d3f35370 |
| php | wasm32 | built | c81ee5be |
| posix-utils-lite | wasm32 | built | 26ecd803 |
| sed | wasm32 | built | 03befd20 |
| tar | wasm32 | built | bede734f |
| tcl | wasm32 | built | 9070fddb |
| unzip | wasm32 | built | 6591e7ef |
| userspace | wasm32 | built | c3d7e154 |
| vim | wasm32 | built | 0d02b67d |
| wget | wasm32 | built | 04bb22d8 |
| xz | wasm32 | built | f54f1e93 |
| zip | wasm32 | built | 19152752 |
| zstd | wasm32 | built | 45fa0e20 |
| bash | wasm32 | built | 74f8e39c |
| mariadb-test | wasm32 | built | 76d2883a |
| mariadb-vfs | wasm32 | built | 37cbbded |
| mariadb-vfs | wasm64 | built | a37c83f6 |
| nethack | wasm32 | built | 51827b74 |
| vim-browser-bundle | wasm32 | built | 63be56eb |
| nethack-browser-bundle | wasm32 | built | 6f83697f |
| rootfs | wasm32 | built | bed032dc |
| shell | wasm32 | built | bc60e6a2 |
| lamp | wasm32 | built | 5fe3410c |
| node-vfs | wasm32 | built | f286a3c5 |
| wordpress | wasm32 | built | 33e16f14 |
Auto-generated; replaced on each push. Raw data in the publish-status workflow artifact.
Updated this draft with the deeper frame-header state change in
Fresh local MRE context:
Local validation run:
Broader kernel/host/libc/POSIX/browser suites were not run because this patch changes
PR #701's recursive MRE showed that wasm-fork-instrument's synthetic locals reduce V8's survivable recursion depth. PR #713 already removed unused catch-state locals from no-catch functions, but fork-path functions still declared frame_ptr and call_idx locals just to cache state already present in the continuation frame header.

Reuse the existing frame header instead. REWIND now moves the save-buffer cursor to the active frame and loads frame.call_index from offset +4 for top-level and nested switch dispatch. UNWIND call sites write their call index into that same header field before branching to the shared postamble. The wpk_fork_* exports, save-buffer header, and per-frame offsets stay unchanged.

Measured with /tmp/kandelo-pr701-mre on Node v24.15.0 after rebuilding tools/bin/wasm-fork-instrument from this branch:
- baseline locals=4/max_survived=9154
- PR #713 catch-state-only locals=10/max_survived=6865
- deeper frame-header-state locals=8/max_survived=7846

Historical bead notes recorded original instrumented locals=12/max_survived=6639 and PR #713 catch-state-only locals=10/max_survived=7469; the PR table labels run context because historical and fresh rows are not strictly cross-run comparable.

Also tighten PR/reporting guidance so nontrivial runtime, ABI-adjacent, generated-code, package-artifact, or measurement-sensitive work carries explanatory commit bodies and before/after evidence tables with comparability notes.
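As a rough reading aid for the fresh-run numbers quoted above, the following sketch expresses each max_survived value as a fraction of the baseline run. The dictionary keys and the helper name relative_depth are illustrative only; the numbers are copied from the commit message and are comparable only within that one fresh run.

```python
# Fresh-run MRE numbers quoted in the commit message (Node v24.15.0).
# Key names are illustrative labels, not identifiers from the repository.
fresh_runs = {
    "baseline": {"locals": 4, "max_survived": 9154},
    "catch-state-only": {"locals": 10, "max_survived": 6865},
    "frame-header-state": {"locals": 8, "max_survived": 7846},
}


def relative_depth(name, reference="baseline"):
    """Return max_survived for `name` as a fraction of the reference run."""
    return fresh_runs[name]["max_survived"] / fresh_runs[reference]["max_survived"]


for name, row in fresh_runs.items():
    print(
        f"{name:20s} locals={row['locals']:2d} "
        f"max_survived={row['max_survived']} "
        f"({relative_depth(name):.1%} of baseline)"
    )

# Depth recovered by going from 10 to 8 declared locals within the same run.
recovered = (
    fresh_runs["frame-header-state"]["max_survived"]
    - fresh_runs["catch-state-only"]["max_survived"]
)
print("frames recovered:", recovered)
```

The historical rows (locals=12 and the older catch-state-only measurement) are deliberately left out, since the commit message says they are not strictly comparable with the fresh rows.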
Validation run:
- bash scripts/dev-shell.sh bash scripts/build-fork-instrument-tool.sh
- focused switch_dispatch, instrument, and large_dispatcher tests on aarch64-apple-darwin
- bash scripts/dev-shell.sh cargo test -p fork-instrument --target aarch64-apple-darwin
- bash scripts/dev-shell.sh bash scripts/ci-run-test-suite.sh fork-instrument
- bash scripts/dev-shell.sh bash scripts/check-abi-version.sh
- git diff --check for the touched fork-instrument and reporting-guidance files

Not run: broader kernel, host, libc, POSIX, Sortix, and browser suites, because this patch is isolated to wasm-fork-instrument generated Wasm shape/tests/docs/reporting guidance and does not change runtime, syscall, libc, host, package, VFS, or browser behavior. cargo fmt was not run because cargo fmt is not installed in the dev shell.
Force-pushed from 93e8827 to 7546a7b.
Build-size examples for the
These are raw
Tool build command for each checkout:
Expected tradeoff: PR #713's first catch-state-only commit usually trims a small number of bytes by deleting unused locals/state handling in no-catch functions. The deeper frame-header-state commit reduces declared Wasm locals and improves the PR #701 stack-depth MRE, but it can increase raw
Additional candidate measured but not used as an “affected size” example:
Package artifact note: local package outputs such as
Some notes from the agent about further improvement possibilities. I'm currently reviewing this and deciding what might be reasonable to do now.

Wasm Fork PR713 Design Options
Date: 2026-06-18
Context: PR #713 reduces
1. Keep Instrumented Code Size Low
2. Further Reduce Stack Pressure
3. Dedicated Tables For References Restored During Rewind
Constraint:
Practical Sequence
Keep PR #713's frame-header reuse narrow by replacing the two no-catch postamble zero stores for catch_region_id and exnref_slot with one i64.store at frame offset +8. Catch-capable functions still write the dynamic fields separately, preserving the documented frame layout and catch resume state. Tests cover the generated WAT shape for both paths and update the postamble store-count expectations.
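To make the packed-store idea concrete, here is a minimal sketch that models only the bytes written to a flat little-endian memory. It assumes catch_region_id sits at frame offset +8 and exnref_slot at +12; the +8 start comes from the commit message, while the +12 position, the 16-byte toy frame, and the function names are assumptions for illustration. It says nothing about how an engine compiles or executes either shape.

```python
import struct

FRAME_SIZE = 16          # hypothetical: only the header bytes modeled here
CATCH_REGION_ID_OFF = 8  # from the commit message: the packed store starts at +8
EXNREF_SLOT_OFF = 12     # assumption: the second 32-bit field follows at +12


def zero_with_two_i32_stores(frame):
    """Model two 32-bit zero stores, one per header field."""
    struct.pack_into("<I", frame, CATCH_REGION_ID_OFF, 0)
    struct.pack_into("<I", frame, EXNREF_SLOT_OFF, 0)
    return frame


def zero_with_one_i64_store(frame):
    """Model one 64-bit zero store covering both header fields."""
    struct.pack_into("<Q", frame, CATCH_REGION_ID_OFF, 0)
    return frame


# Start from a non-zero frame so the effect of each store is visible.
dirty = bytes([0xAA]) * FRAME_SIZE
scalar = zero_with_two_i32_stores(bytearray(dirty))
packed = zero_with_one_i64_store(bytearray(dirty))

print("same bytes written:", scalar == packed)
print("bytes before +8 untouched:", scalar[:8] == dirty[:8])
```

In this memory-only model the two shapes are indistinguishable, which is why the change looks like a pure code-size cleanup on paper; any behavioral difference would have to come from outside the model.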
kd-7d1 PR update is published in Scope:
PR #701 MRE evidence (
Verification:
The browser staging failure for PR #713 reproduced locally from the failed CI test-workspace artifact: the shell demo timed out waiting for KANDELO_BASH_OK after running the same bash array/prompt command seen in CI.

The regression boundary was the no-catch postamble change that packed catch_region_id and exnref_slot zeroes into a single i64.store offset=8. The prior green commit already preserved frame-header reuse and avoided synthetic frame_ptr/call_idx locals, so keep that intent but emit the two 32-bit zero stores used by the working shape. This leaves catch-capable functions on dynamic 32-bit catch header stores, does not change ABI layout, and adds tests that reject the packed no-catch i64 store.

Validation:
- focused browser Playwright repro failed before the fix with the CI symptom
- cargo test -p fork-instrument --target $HOST_TARGET --test instrument passed
- bash scripts/dev-shell.sh bash scripts/ci-run-test-suite.sh fork-instrument passed
- bash scripts/dev-shell.sh bash scripts/check-abi-version.sh passed
- git diff --check passed

Post-fix browser verification is deferred to PR CI because it requires rebuilt staged package artifacts.
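The idea of a test that rejects the packed shape can be sketched as a text check over generated WAT. This is not the project's test code, which lives in the Rust test suite and is not reproduced here. The WAT snippets, the +12 offset for the second field, and the helper name check_no_catch_postamble are all hypothetical.

```python
import re

# Packed shape to reject: one 64-bit store at header offset +8.
PACKED_STORE = re.compile(r"\bi64\.store\s+offset=8\b")
# Scalar shape to require: 32-bit stores at +8 and (assumed) +12.
SCALAR_STORE = re.compile(r"\bi32\.store\s+offset=(8|12)\b")


def check_no_catch_postamble(wat):
    """Return True if the WAT text uses the two scalar stores and no packed store."""
    if PACKED_STORE.search(wat):
        return False
    offsets = {match.group(1) for match in SCALAR_STORE.finditer(wat)}
    return offsets == {"8", "12"}


# Hypothetical postamble fragments; real generated code will differ.
scalar_shape = """
global.get 0
i32.const 0
i32.store offset=8
global.get 0
i32.const 0
i32.store offset=12
"""

packed_shape = """
global.get 0
i64.const 0
i64.store offset=8
"""

print("scalar shape accepted:", check_no_catch_postamble(scalar_shape))
print("packed shape accepted:", check_no_catch_postamble(packed_shape))
```

A check like this pins the generated shape rather than the runtime behavior, so it guards against reintroducing the packed store but cannot replace the browser run.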
prepare-merge: test-gate passed against the synthetic PR merge and
Summary
The PR #701 reproducer shows that fork instrumentation increases V8 Wasm stack-frame pressure enough to reduce recursive survival depth. This PR reduces synthetic frame-state locals in wasm-fork-instrument while preserving the existing wpk_fork_* exports and continuation-frame layout.

Current PR head: 64b8977d5bf14f5c70c60beb32f210ec45153e05.

Implemented reductions:
- 97f51fd90: no-catch functions no longer declare unused catch_region_id and exnref_slot locals.
- 7546a7b75: top-level and nested switch dispatch reuse the continuation frame header instead of declaring synthetic frame_ptr and call_idx locals.
- a876580fa: tried packing no-catch zero stores into one i64.store offset=8; this reduced generated bytes but caused the browser staging shell demo regression.
- 64b8977d: keeps the frame-header reuse/local-count result, but restores the no-catch header zeroes to two scalar i32.store instructions because the packed i64.store was the browser regression boundary.

The current PR result is therefore the local-pressure reduction from frame-header reuse, not the packed-store code-size cleanup.
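The frame-header reuse described above can be pictured with a toy model. This is a sketch under stated assumptions, not the generated Wasm. Only the +4 offset for call_index comes from this PR; the SaveBuffer class, its method names, the buffer size, and the dispatch helper are hypothetical.

```python
import struct

CALL_INDEX_OFF = 4  # from the PR text: frame.call_index lives at offset +4


class SaveBuffer:
    """Toy linear memory holding one continuation frame (hypothetical layout)."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.cursor = 0  # points at the active frame

    def unwind_store_call_index(self, call_index):
        # UNWIND call site: record the in-flight call in the header itself.
        struct.pack_into("<I", self.mem, self.cursor + CALL_INDEX_OFF, call_index)

    def rewind_load_call_index(self):
        # REWIND: read the index back from the header instead of a cached local.
        (call_index,) = struct.unpack_from(
            "<I", self.mem, self.cursor + CALL_INDEX_OFF
        )
        return call_index


def dispatch(call_index, targets):
    """Stand-in for the switch dispatch that resumes at the saved call site."""
    return targets[call_index]


buf = SaveBuffer()
buf.unwind_store_call_index(2)
resumed_at = dispatch(buf.rewind_load_call_index(), ["call_a", "call_b", "call_c"])
print("resumed at:", resumed_at)
```

The point of the model is that the header already holds everything dispatch needs, so a per-function local that mirrors it adds frame pressure without adding information.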
PR #701 MRE Results

MRE: /tmp/kandelo-pr701-mre, function benchmark_walk, Node v24.15.0. Lower declared locals are a proxy for lower V8 per-frame pressure. max_survived is the largest recursive depth completed before V8 stack overflow.

Local-count progression

walk locals
97f51fd90
7546a7b75 frame_ptr and call_idx locals by reusing the frame header.
a876580fa
64b8977d

Size/depth measurements

walk locals, max_survived
0x0aec
0x0cc8 7546a7b75 generated shape; staging run was green.
0x0cbe a876580fa; browser staging later failed.
0x0cc8 64b8977d restores the scalar no-catch generated shape from 7546a7b75; no new depth claim beyond the last same-shape measurement.

Comparability note: V8 stack-depth numbers are only strictly comparable within the same measurement context. The durable local-count result is stable: original forkinstr 12 -> catch-state-only 10 -> current head 8.

Browser CI Regression And Fix
The packed-store commit a876580fa caused the PR #713 staging browser failure in apps/browser-demos/test/kandelo-merge-gate.spec.ts while running the shell demo. The failing job timed out waiting for KANDELO_BASH_OK after the bash array/prompt command was echoed but did not complete.

The failure reproduced locally from the failed CI test-workspace artifact. The regression boundary was the no-catch postamble change from two scalar 32-bit zero stores (for catch_region_id and exnref_slot) to one packed i64.store offset=8.

64b8977d restores the two scalar no-catch stores while keeping frame-header reuse and avoiding synthetic frame_ptr/call_idx locals. Tests now reject the packed no-catch i64.store offset=8 shape.

CI status at this update:
- 7546a7b75 staging run: green.
- a876580fa staging run: browser failure reproduced locally.
- 64b8977d staging run 27803748679, attempt 1: cancelled before the browser test gate completed, so it did not prove the post-fix browser result.
- 64b8977d staging run 27803748679, attempt 2: in progress as of this update. Current visible status is 52 successful jobs, 3 skipped jobs, and 3 queued package-matrix jobs (zstd, node-vfs, lamp); the merge-gate context remains pending. This PR body does not claim current-head browser green status yet.

Validation
Passed for the frame-header reuse work:
- bash scripts/dev-shell.sh bash scripts/build-fork-instrument-tool.sh
- bash scripts/dev-shell.sh cargo test -p fork-instrument --target aarch64-apple-darwin --test switch_dispatch
- bash scripts/dev-shell.sh cargo test -p fork-instrument --target aarch64-apple-darwin --test instrument
- bash scripts/dev-shell.sh cargo test -p fork-instrument --target aarch64-apple-darwin --test large_dispatcher
- bash scripts/dev-shell.sh cargo test -p fork-instrument --target aarch64-apple-darwin
- bash scripts/dev-shell.sh bash scripts/ci-run-test-suite.sh fork-instrument
- bash scripts/dev-shell.sh bash scripts/check-abi-version.sh
- git diff --check for touched fork-instrument, docs, and reporting-guidance files

Passed after the current-head scalar-store fix:
- cargo test -p fork-instrument --target $HOST_TARGET --test instrument
- bash scripts/dev-shell.sh bash scripts/ci-run-test-suite.sh fork-instrument
- bash scripts/dev-shell.sh bash scripts/check-abi-version.sh
- git diff --check -- crates/fork-instrument/src/instrument.rs crates/fork-instrument/tests/instrument.rs

Not run or not claimed:
- 27803748679 was cancelled before the browser test gate completed and attempt 2 is still in progress.
- /Users/brandon/src/kandelo-kd-fbt, but wasm32posix-cc could not use that worktree's incomplete sysroot (sysroot/lib/libc.a missing). The current-head table therefore uses the last measured same generated-code shape for size/depth rather than claiming a new depth sweep.
- cargo test -p kandelo --target aarch64-apple-darwin --lib: not run because this change is limited to the fork-instrument code generator, generated Wasm shape, docs, and reporting guidance; it does not change kernel/libc/syscall behavior.
- cd host && npx vitest run: not run because no host runtime or browser/Node adapter code changed.
- scripts/run-libc-tests.sh: not run because no libc, syscall, or POSIX runtime behavior changed.
- scripts/run-posix-tests.sh: not run because no kernel/POSIX semantics changed.
- cargo fmt --check -p fork-instrument: not run successfully because this dev shell has no cargo fmt subcommand installed.