fix(web): _withInnerWebClient re-entrancy — depth-tracked inline run#152
fix(web): _withInnerWebClient re-entrancy — depth-tracked inline run#152WiktorStarczewski wants to merge 3 commits into
Conversation
SantiagoPittella
left a comment
There was a problem hiding this comment.
What about a "trusted handle" instead of a flag? Essentially replacing the "flip a global flag while fn runs" logic with "hand fn a different object whose methods don't go through _serializeWasmCall at all":
// In MidenClient
_withInnerWebClient(fn) {
return this.#inner._serializeWasmCall(() => {
const trusted = createTrustedHandle(this.#inner); // bypasses chain
return fn(trusted);
});
}
createTrustedHandle(inner) would return an object that exposes the same method surface as the proxy-wrapped inner, but each method calls the underlying wasm-bindgen client directly, skipping _serializeWasmCall entirely. The serialization is provided once, by the outer _serializeWasmCall slot that wraps the whole fn.
With this approach we can remove the shared counter and global flag, the external callers don't need to know about this.
If you prefer to keep the existing approach, I think that the new tests aren't running in the CI, would you mind to fix that?
Summary
_withInnerWebClient(fn)deadlocks any timefncalls back into a proxied async wasm method on the sameinnerclient — which is exactly the entire point of the escape hatch. Every Miden Wallet usage of the local-prove (offscreen-doc) pipeline on Chrome MV3 hangs indefinitely because of this.Fix is a depth-tracked re-entrancy on
_serializeWasmCall: while_withInnerWebClient's callback is running, the inner WebClient's_withInnerLockDepthis> 0and_serializeWasmCallruns its callback inline instead of enqueueing it on the same chain that the outer slot is already awaiting. External callers still queue behind the outer slot — no behavior change for any other code path.The deadlock
_withInnerWebClient(fn)wrapsfnin_serializeWasmCall(() => fn(inner)). The intent: hold the chain slot for the entire callback so all inner wasm calls run atomically.Inside
fn, any call to a proxied async wasm method —inner.getInputNote(...)(proxy-fallback wraps it),inner.executeTransaction(...)(wrapper class method wraps it),inner.submitProvenTransaction(...),inner.applyTransaction(...)— also calls_serializeWasmCall(fnB). Since the chain is currently held by the outer slot (fnA),fnBenqueues afterfnA. ButfnAis awaitingfnB's result.fnBis waiting forfnAto settle. Classic re-entrant-lock deadlock with no timeout.Diagnostic trace from the wallet's
proveLocallyViaOffscreenreproducing it:getInputNotenever returns. Downstream, the wallet's SWSyncManagertimes out onwithWasmClientLockwhile the consume holds the wallet's outer mutex, and surfaces a misleading "cannot reach the miden node" banner.Fix
External callers still queue behind the outer slot — they hit
_serializeWasmCallwith_withInnerLockDepth === 0and go through the chain. The chain still serializes against them; the outer slot only resolves afterfn(including every inline re-entry) settles.Safety contract (documented inline)
Re-entrancy is safe only if the caller of
_withInnerWebClientholds an external mutex preventing concurrent access to this same client instance via other code paths duringfn. The chain serializes against external callers but if an external task runs during one offn's awaits and calls into the SDK, it will see_withInnerLockDepth > 0and run inline, racing wasm-bindgen's borrow check. The wallet's ownwithWasmClientLockdiscipline satisfies this; the method is@internalso callers should already be aware.Tests
Added
crates/web-client/test/with_inner_web_client_reentrancy.test.tswith four cases against the live SDK:fn(inner.getInputNote(...)) doesn't deadlock.fn(inner.syncState()) doesn't deadlock — exercises the re-entry path through wrapper-defined methods._withInnerWebClientslot (ordering invariant).fnreleases the depth counter so subsequent calls work.Each test wraps the call in
Promise.racewith an explicit timeout so the deadlock case fails loud withTIMEOUTinstead of hanging the suite.Empirical verification with Miden Wallet stress harness
Patched the Miden Wallet to consume this branch via
file:deps, enabled local proving on both stress wallets, ran the 5-tx stress against testnet.Pre-fix: every consume hung at
getInputNote; zero sends completed; 5-minute Playwright timeout.Post-fix: 5 of 5 sends completed end-to-end through the offscreen-doc prove pipeline:
sendLatencyMs10–18 s/send is the expected offscreen-doc MT-WASM prove window.
[mt-offscreen-prove] tx_completed prove_ms=…fires for every op.Remaining stress failures are downstream of
miden-vm#3144(BasicFungibleFaucet procedure-digest drift across miden-assembly patch versions) — the test faucet renders as "Unknown" instead of "TST", which the stress harness'sgetByText('TST', exact: true)selector can't match. Unrelated to this fix.Test plan
scripts/dev-with-web-sdk-pr.sh, enable local proving in Settings, send a tx → should complete in ~10 s instead of hanging