Wasm merge timeout enforcement#2000
Wasm merge timeout enforcement#2000cursor[bot] wants to merge 1 commit intofeat/sync-wasm-merge-callbackfrom
Conversation
Implement timeout enforcement for WASM merge callbacks to prevent buggy or malicious merge functions from blocking sync indefinitely. Changes: - Wrap store and instance in Arc to enable thread sharing - Execute WASM merge calls in background threads with configurable timeout - Return MergeError::WasmTimeout when timeout expires (default: 5 seconds) - Remove unused helper methods (get_memory, write_to_wasm, read_from_wasm) - Remove #[allow(dead_code)] from timeout_ms field (now actively used) Note: Due to Wasmer limitations, the underlying WASM execution may continue in the background thread after timeout. This is documented in the module-level docs.
|
Cursor Agent can help with this pull request. Just |
|
Your PR title does not adhere to the Conventional Commits convention: Common errors to avoid:
|
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 94% | Review time: 130.8s
🔴 1 critical, 🟡 3 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-f77127a9
| local_len = local.len(), | ||
| remote_len = remote.len(), | ||
| timeout_ms, | ||
| "Calling WASM merge function with timeout" |
There was a problem hiding this comment.
🔴 Mutex held indefinitely after timeout causes subsequent calls to block
When timeout occurs, the background thread still holds store.lock() and continues execution; subsequent call_merge_function calls will block on the mutex until the WASM completes, defeating the timeout purpose.
Suggested fix:
Consider using a try_lock with retry, or create a new WASM instance after timeout, or use a per-call store clone if Wasmer supports it.
| })?; | ||
| debug!( | ||
| target: "calimero_runtime::merge", | ||
| export_name, |
There was a problem hiding this comment.
🟡 Thread resource leak on timeout - spawned threads never terminate
After timeout, the spawned thread keeps running indefinitely if WASM is in an infinite loop; repeated timeouts will accumulate orphaned threads consuming CPU and memory.
Suggested fix:
Document this limitation clearly; consider tracking active threads with a counter/metric; investigate Wasmer's interrupt mechanisms for future improvement.
| &local_owned, | ||
| &remote_owned, | ||
| ); | ||
| // Send result; ignore error if receiver dropped (timeout occurred) |
There was a problem hiding this comment.
🟡 WASM instance state undefined after timeout - subsequent merges may corrupt data
After timeout, the WASM instance continues modifying its memory; subsequent merge calls on the same instance may read partially-written or corrupted state.
Suggested fix:
Mark the RuntimeMergeCallback as poisoned after timeout and return an error on subsequent calls, or create a fresh instance.
| timeout_ms, | ||
| "Calling WASM merge function with timeout" | ||
| ); | ||
|
|
There was a problem hiding this comment.
🟡 Unbounded thread spawning enables resource exhaustion DoS
Each merge call spawns an unbounded thread that continues running after timeout, holding Arc references to Store and Instance; an attacker triggering many slow WASM merge operations could exhaust thread/memory resources.
Suggested fix:
Consider using a bounded thread pool, adding a semaphore to limit concurrent merge calls, or exploring WASM metering/fuel-based cancellation in future iterations.
| })?; | ||
| debug!( | ||
| target: "calimero_runtime::merge", | ||
| export_name, |
There was a problem hiding this comment.
💡 Thread spawning overhead on every merge call
Creating a new OS thread for each merge operation adds overhead; for high-frequency merges this could become a bottleneck.
Suggested fix:
Consider using a thread pool or async task spawning if merge frequency is expected to be high.
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
💡 Long function could be decomposed for readability
The execute_merge_call function is ~110 lines; extracting helper closures or local functions for memory write/read operations would improve readability.
Suggested fix:
Extract local helper closures like `write_to_memory` and `read_from_memory` within the function to reduce repetition and clarify intent.
|
This pull request has been automatically marked as stale. If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated. |
runtime: Enforce WASM merge function timeout
Description
This PR fixes a high-severity logic bug (bug_id: 381f5661-ec7b-4a91-b616-57c22661471b) where the WASM merge function could block indefinitely, leading to permanent stalls in sync processing. The
RuntimeMergeCallback::call_merge_functionnow executes the WASM merge call in a separate thread with a timeout, using thetimeout_msfield and returningMergeError::WasmTimeoutif the timeout is exceeded. This prevents untrusted app merge logic from becoming a denial-of-service vector.Test plan
The following tests were run to verify the changes:
cargo testwithin thecrates/runtimedirectory. All existing tests passed.cargo clippyandcargo fmtwere run to ensure code style and catch potential issues.No new end-to-end tests were added as the change is internal to the runtime's merge callback mechanism and covered by existing unit tests.
Documentation update
No public or internal documentation updates are required for this change.