Skip to content

Add standalone fork-instrument crash reproducer#701

Draft
mho22 wants to merge 1 commit into
mainfrom
benchmark-fork-instrument
Draft

Add standalone fork-instrument crash reproducer#701
mho22 wants to merge 1 commit into
mainfrom
benchmark-fork-instrument

Conversation

@mho22

@mho22 mho22 commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • add benchmarks/fork/chain.c, a recursive walker shaped after Zend's zend_compile_short_circuiting so wasm-fork-instrument's per-frame cost is measurable on a binary apart from PHP itself
  • add benchmarks/fork/build.sh to produce baseline and fork-instrumented variants from one chain.c source
  • add benchmarks/fork/run.mjs to binary-search the maximum recursion depth that survives V8's call-stack budget on each variant
  • exit non-zero from run.mjs when forkinstr's max depth is not strictly less than baseline's — that invariant is what the benchmark exists to verify

Why

PR #669 asks for a standalone test case apart from PHP that reproduces the V8 RangeError: Maximum call stack size exceeded on deep recursion through fork-instrumented frames. Iterating on candidate fork-instrument changes via the full PHP build+run cycle takes ~45 minutes; this benchmark drops that to ~10 seconds.

The PHP crash composes two causes: unbounded recursion in zend_compile_short_circuiting (fixed PHP-side by #669) and ~8 extra wasm locals per instrumented frame from wasm-fork-instrument's switch-dispatch state machine. `chain.c` bounds recursion by its `depth` argument so it reproduces only the kandelo-side cause in isolation; any candidate reduction of the per-frame cost can be validated by watching the forkinstr depth move.

Notes

  • ABI impact: none. Adds a benchmark harness only; no kernel, host, or program changes.

  • Empirical numbers on macOS arm64 / Node v24.13.0, deterministic across 4 runs:

    variant `walk` declared locals max depth before `RangeError` % of baseline
    baseline 4 9 972 100%
    forkinstr 12 6 648 66.7%

    Differential = 3 324 frames lost (33% drop) attributable to wasm-fork-instrument's switch-dispatch state machine adding ~8 locals per frame on the recursive function. The ratio (~1.50×) is the load-bearing finding; absolute numbers are platform/tier-dependent.

Testing

  • bash scripts/dev-shell.sh bash benchmarks/fork/build.sh produces identical wasm sizes (3882 / 4587 bytes) and walk-locals counts (4 / 12) across runs
  • node --experimental-wasm-exnref benchmarks/fork/run.mjs exits 0 with baseline 9 972 / forkinstr 6 648
  • with the wasms manually swapped so forkinstr > baseline, the same command exits 1 with INVARIANT FAILED: ... — confirms the self-test fires on regression
  • bash scripts/check-abi-version.sh exits 0 (benchmark touches no kernel/host/programs/ABI)

## Summary
- add `benchmarks/fork/chain.c`, a recursive walker shaped after Zend's
  `zend_compile_short_circuiting` so wasm-fork-instrument's per-frame
  cost is measurable on a binary apart from PHP itself
- add `benchmarks/fork/build.sh` to produce baseline and fork-
  instrumented variants from one chain.c source
- add `benchmarks/fork/run.mjs` to binary-search the maximum recursion
  depth that survives V8's call-stack budget on each variant
- exit non-zero from `run.mjs` when forkinstr's max depth is not
  strictly less than baseline's — that invariant is what the benchmark
  exists to verify

## Why
PR #669 asks for a standalone test case apart from
PHP that reproduces the V8 `RangeError: Maximum call stack size
exceeded` on deep recursion through fork-instrumented frames. Iterating
on candidate fork-instrument changes via the full PHP build+run cycle
takes ~45 minutes; this benchmark drops that to ~10 seconds.

The PHP crash composes two causes: unbounded recursion in
`zend_compile_short_circuiting` (fixed PHP-side by #669) and ~8 extra
wasm locals per instrumented frame from wasm-fork-instrument's
switch-dispatch state machine. `chain.c` bounds recursion by its
`depth` argument so it reproduces only the kandelo-side cause in
isolation; any candidate reduction of the per-frame cost can be
validated by watching the forkinstr depth move.

## Testing
- `bash scripts/dev-shell.sh bash benchmarks/fork/build.sh` produces
  identical wasm sizes (3882 / 4587 bytes) and walk-locals counts
  (4 / 12) across runs
- `node --experimental-wasm-exnref benchmarks/fork/run.mjs` is
  deterministic at baseline 9 972 / forkinstr 6 648 across 4 runs on
  macOS arm64 / Node v24.13.0; exits 0
- with the wasms manually swapped so forkinstr > baseline, the same
  command exits 1 with `INVARIANT FAILED: ...` — confirms the
  self-test fires on regression
- `bash scripts/check-abi-version.sh` exits 0 (benchmark touches no
  kernel/host/programs/ABI)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
brandonpayton added a commit that referenced this pull request Jun 18, 2026
PR #701's recursive MRE showed that wasm-fork-instrument's synthetic locals reduce V8's survivable recursion depth. PR #713 already removed unused catch-state locals from no-catch functions, but fork-path functions still declared frame_ptr and call_idx locals just to cache state already present in the continuation frame header.

Reuse the existing frame header instead. REWIND now moves the save-buffer cursor to the active frame and loads frame.call_index from offset +4 for top-level and nested switch dispatch. UNWIND call sites write their call index into that same header field before branching to the shared postamble. The wpk_fork_* exports, save-buffer header, and per-frame offsets stay unchanged.

Measured with /tmp/kandelo-pr701-mre on Node v24.15.0 after rebuilding tools/bin/wasm-fork-instrument from this branch: baseline locals=4/max_survived=9154, PR #713 catch-state-only locals=10/max_survived=6865, deeper frame-header-state locals=8/max_survived=7846. Historical bead notes recorded original instrumented locals=12/max_survived=6639 and PR #713 catch-state-only locals=10/max_survived=7469; the PR table labels run context because historical and fresh rows are not strictly cross-run comparable.

Also tighten PR/reporting guidance so nontrivial runtime, ABI-adjacent, generated-code, package-artifact, or measurement-sensitive work carries explanatory commit bodies and before/after evidence tables with comparability notes.

Validation run:

- bash scripts/dev-shell.sh bash scripts/build-fork-instrument-tool.sh

- focused switch_dispatch, instrument, and large_dispatcher tests on aarch64-apple-darwin

- bash scripts/dev-shell.sh cargo test -p fork-instrument --target aarch64-apple-darwin

- bash scripts/dev-shell.sh bash scripts/ci-run-test-suite.sh fork-instrument

- bash scripts/dev-shell.sh bash scripts/check-abi-version.sh

- git diff --check for the touched fork-instrument and reporting-guidance files

Not run: broader kernel, host, libc, POSIX, Sortix, and browser suites, because this patch is isolated to wasm-fork-instrument generated Wasm shape/tests/docs/reporting guidance and does not change runtime, syscall, libc, host, package, VFS, or browser behavior. cargo fmt was not run because cargo fmt is not installed in the dev shell.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant