fix(web): wire up the full version-vector catch-up path (#57, #59) by pvg13 · Pull Request #58 · pvg13/WaveSyncDB

pvg13 · 2026-05-13T12:43:10Z

Closes #57 and its follow-up issue #59.

The fix lands in three commits, in the order the bugs surface:

fix(web): trigger version-vector catch-up on swarm ConnectionEstablished (#57) — Adds the queue-based trigger so the web actually asks peers for catch-up data on connect. Without this, the web only ever saw real-time Push traffic.
chore(fmt): run rustfmt on parse_table_name — CI fmt nit from the earlier round.
fix(web): apply ChangesetResponse + bump snapshot timeout to 30s (#59) — Handles the responses (the previous arm was a no-op so the catch-up data was dropped on the floor) and matches the native side's 30s with_request_timeout so catch-up against a populated peer doesn't time out at 10s.

Together these make a fresh WebSyncClient converge with native peers within one round-trip on connect, even with months of historical data.

#57 — version-vector trigger

run_swarm now triggers a VersionVector catch-up request when a non-relay peer's ConnectionEstablished fires, paralleling run_loopback's offline→online edge. The trigger lives on a new pending_version_vectors: Vec<LibPeerId> queue drained at the top of each loop iteration (mirroring pending_announces), because a direct synchronous call from handle_event could re-enter wasm_bindgen_futures::Inner::run while the swarm borrow is still held.
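The queue-then-drain shape described above can be sketched without libp2p. This is a minimal illustration, not the PR's code: `PeerId`, `Event`, `Loop`, and `run` are hypothetical stand-ins, and the "send" is just a push onto a vector so the ordering is observable.

```rust
// Hypothetical stand-in; the real code uses libp2p's PeerId and swarm events.
type PeerId = u32;

enum Event {
    ConnectionEstablished { peer: PeerId, is_relay: bool },
    Other,
}

#[derive(Default)]
struct Loop {
    /// Peers that still need a VersionVector request.
    pending_version_vectors: Vec<PeerId>,
    /// Records which peers were "sent" a request, in order.
    sent: Vec<PeerId>,
}

impl Loop {
    /// Event handler: only records intent, never sends.
    fn handle_event(&mut self, event: Event) {
        if let Event::ConnectionEstablished { peer, is_relay: false } = event {
            self.pending_version_vectors.push(peer);
        }
    }

    /// Runs at the top of each loop iteration, outside any event handler.
    fn drain_pending(&mut self) {
        for peer in std::mem::take(&mut self.pending_version_vectors) {
            // Stand-in for the real send_request call on the swarm.
            self.sent.push(peer);
        }
    }
}

/// Drives the loop over a fixed list of events and returns the send order.
fn run(events: Vec<Event>) -> Vec<PeerId> {
    let mut state = Loop::default();
    for event in events {
        state.drain_pending();
        state.handle_event(event);
    }
    state.drain_pending(); // flush whatever the last event enqueued
    state.sent
}
```

The handler never touches the send path, so whatever borrow the event source holds during `handle_event` has ended by the time `drain_pending` runs.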

#59 sub-bug 1 — apply the response

handle_snapshot_event's Message::Response { .. } arm was a no-op marked "PushAck etc. — no-op for now", so ChangesetResponse arrived and was silently dropped. The loopback path doesn't hit this because its responder sends a fresh Push request through out_tx instead of send_response, which routes through the existing Push handler that already calls apply_remote_changeset.

The new handler mirrors the native engine's handle_changeset_response (engine/sync_handler.rs:52-93): verify topic, verify HMAC against bytes that include the real your_last_db_version value, reconstruct the same SyncChangeset shape the inbound-Push path builds, broadcast on inbound_tx, then call apply_remote_changeset — which already persists each winning change, broadcasts on resolved_tx, and updates peer_versions[peer].
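The order of checks, and why the tag must cover the real your_last_db_version, can be sketched as follows. Everything here is an assumption made for illustration: the struct layout, `toy_tag`, and `handle_response` are hypothetical, and `toy_tag` is a dependency-free keyed hash standing in for the real HMAC, not a secure MAC.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified response; the real message carries changes.
struct ChangesetResponse {
    topic: String,
    your_last_db_version: u64,
    payload: Vec<u8>,
    tag: u64,
}

/// Toy keyed tag, NOT a real HMAC. It only stands in for the group key's
/// MAC so that the sketch needs no external crates.
fn toy_tag(key: &[u8], topic: &str, your_last_db_version: u64, payload: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    topic.hash(&mut h);
    your_last_db_version.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

#[derive(Debug, PartialEq)]
enum Outcome {
    WrongTopic,
    BadTag,
    Applied,
}

/// Checks run cheapest-first: topic, then tag, and only then apply.
fn handle_response(our_topic: &str, key: &[u8], resp: &ChangesetResponse) -> Outcome {
    if resp.topic != our_topic {
        return Outcome::WrongTopic;
    }
    // Recompute the tag over the value actually carried in the response.
    let expected = toy_tag(key, &resp.topic, resp.your_last_db_version, &resp.payload);
    if expected != resp.tag {
        return Outcome::BadTag;
    }
    // The real handler would rebuild the SyncChangeset, broadcast it,
    // and call apply_remote_changeset here.
    Outcome::Applied
}
```

If the verifier substituted a placeholder for your_last_db_version, the recomputed tag would differ from the sender's and every genuine response would be rejected.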

#59 sub-bug 2 — 30s snapshot timeout

request_response::Config::default() carries a 10s request timeout. Field repro: against a phone with weeks of synced data, the responder's get_changes_since(0) scan + JSON serialization + HMAC + circuit-relay hop exceeded 10s and surfaced as OutboundFailure: Timeout while waiting for a response. The native side already uses with_request_timeout(Duration::from_secs(30)) on its snapshot behaviour (engine/behaviour.rs:72); web now matches.

Invariants preserved

HMAC mandatory on ALL message paths. The new outbound request is signed when group_key is configured; the new inbound response handler drops unauthenticated ChangesetResponse silently. Mirrors what the native engine does in both directions.
db_version=0 semantics. get_peer_version returns 0 for peers never seen before, so a fresh tab's first VersionVector carries your_last_db_version=0 and asks for full history — the only mechanism for initial state transfer.
Swarm not held across awaits. The drain phase takes &mut swarm for the synchronous send_request, but get_peer_version().await and db_version.lock().await happen before any swarm borrow. The response handler awaits apply_remote_changeset but only with &state + &peer, releasing the swarm borrow first.
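The db_version=0 invariant comes down to a default on lookup. A minimal sketch, assuming an in-memory map in place of the persisted peer_versions store; `PeerVersions` and `record_applied` are hypothetical names, and the real get_peer_version is async.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the persisted peer_versions map.
#[derive(Default)]
struct PeerVersions {
    versions: HashMap<String, u64>,
}

impl PeerVersions {
    /// Unknown peers report 0; sent as your_last_db_version, that value
    /// asks the responder for its full history.
    fn get_peer_version(&self, peer: &str) -> u64 {
        self.versions.get(peer).copied().unwrap_or(0)
    }

    /// Records the db_version most recently applied from a peer.
    fn record_applied(&mut self, peer: &str, db_version: u64) {
        self.versions.insert(peer.to_string(), db_version);
    }
}
```

A fresh tab has an empty map, so its first request to every peer carries 0. Later requests carry the recorded version and ask only for newer changes.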

Test plan

Automated:

cargo check --target wasm32-unknown-unknown -p wavesyncdb --features web,dioxus — clean
cargo check --target wasm32-unknown-unknown -p example-qr-pairing -p wavesync-website — clean
cargo clippy -p wavesyncdb -p wavesyncdb_derive -p wavesync_relay -- -D warnings — clean
cargo test -p wavesyncdb --lib — 167 unit tests pass
cargo fmt --check — clean

Manual repro (requires a browser + relay + at least one peer holding history):

Boot wavesync_relay.
Boot a native test peer (tests-e2e/test-peer) joined to the same relay+topic+passphrase. Submit a few rows so the peer has history.
Open a browser tab on clean IndexedDB and run WebSyncClient::connect_via_relay(...).
Before: use_synced_table::<E>(handle) stays empty.
After: within one round-trip of relay_connected = true, the console logs swarm: requesting catch-up from <peer> since db_version=0 (we are at 0) followed by WebSyncClient: received ChangesetResponse from <peer> with N changes, and the rows materialize.

Known followup (deliberately not in this PR)

Reconnect-storm dedup: the queue can fire multiple VersionVectors for the same peer if ConnectionClosed+ConnectionEstablished cycle inside a single loop iteration. Wasteful, not incorrect.
Response chunking: if real-data catch-ups ever exceed even 30s, fan the response out as multiple Push requests instead of one fat ChangesetResponse. This also avoids a RAM cliff on very large histories.
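For the reconnect-storm item, one possible shape is a set-backed queue, as the #57 commit message suggests. This is a sketch only: `PendingVersionVectors` and the `u32` peer id are hypothetical, and sorting in `drain` exists purely to make the sketch's output deterministic.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for libp2p's PeerId.
type PeerId = u32;

#[derive(Default)]
struct PendingVersionVectors {
    peers: HashSet<PeerId>,
}

impl PendingVersionVectors {
    /// Returns true if the peer was newly enqueued, false if already pending.
    fn enqueue(&mut self, peer: PeerId) -> bool {
        self.peers.insert(peer)
    }

    /// Takes all pending peers, leaving the set empty.
    fn drain(&mut self) -> Vec<PeerId> {
        let mut out: Vec<PeerId> = self.peers.drain().collect();
        out.sort_unstable(); // deterministic order for the sketch only
        out
    }
}
```

This collapses duplicates within one loop iteration only. Requests already in flight from an earlier iteration would still need the per-peer tracker mentioned as the alternative.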

…hed (#57)

A fresh `WebSyncClient` joining a relay-mediated mesh was receiving only *future* writes from peers. Historical state already on the peers (rows written before this tab existed) never landed in IndexedDB because the swarm path never sent a `VersionVector` request when a peer connected.

The loopback path already did this correctly on every offline→online transition (`send_version_vector`, called from `run_loopback`'s `was_online` edge), but `run_swarm`'s `ConnectionEstablished` arm only inserted the peer into `connected` and pushed a status update. The persistence side was always ready (`BrowserStore` writes `peer_versions[peer]` on every successful inbound Push and exposes `get_peer_version()`), but nothing read it.

After this commit, every non-relay `ConnectionEstablished` enqueues the peer into a new `pending_version_vectors: Vec<LibPeerId>` queue that's drained at the top of each `run_swarm` loop iteration, paralleling the existing `pending_announces` pattern. The drain calls the new `send_version_vector_swarm` helper which mirrors the loopback variant but routes the request via `swarm.behaviour_mut().snapshot.send_request(&peer, req)`.

Why deferred via queue instead of called synchronously from `handle_event`: same wasm_bindgen_futures executor-reentrancy hazard the relay branch already documents at length. Calling `request_response.send_request` synchronously inside `ConnectionEstablished` can wake a task whose poll re-enters `Inner::run` while we still hold the outer borrow from `swarm.select_next_some()`. The queue → drain pattern keeps all async send sites at the top of the loop, between selects.

Invariants preserved:

- HMAC mandatory on ALL request paths. When `group_key` is configured the new request is signed with `gk.mac(&bytes)` before being sent, identical to the loopback variant. Peers silently drop unauthenticated `VersionVector` requests.
- `db_version=0` is the new-peer onboarding signal. `get_peer_version` returns 0 for peers never seen before, so a fresh tab's first `VersionVector` to each peer carries `your_last_db_version=0`, asking for full history. This is the only mechanism for initial state transfer over the swarm path.
- Swarm cannot be held across awaits. The drain phase takes `&mut swarm` for `send_version_vector_swarm`'s own `swarm.behaviour_mut().snapshot.send_request` call, but the `get_peer_version().await` and `state.db_version.lock().await` happen before any swarm borrow: the swarm is only touched at the final synchronous `send_request` line.

Validation:

- `cargo check --target wasm32-unknown-unknown -p wavesyncdb --features web,dioxus`: clean
- `cargo check --target wasm32-unknown-unknown -p example-qr-pairing -p wavesync-website`: clean
- `cargo clippy -p wavesyncdb -p wavesyncdb_derive -p wavesync_relay -- -D warnings`: clean
- `cargo test -p wavesyncdb --lib`: 167 unit tests pass
- `cargo fmt --check`: clean

No automated test for this path: it requires a real browser dialing a real relay with at least one connected peer holding history. Manual repro plan is in the PR body.

Known followup (deliberately not in this PR): on rapid reconnect storms the queue can fire multiple `VersionVector`s for the same peer if `ConnectionClosed`+`ConnectionEstablished` cycle inside a single loop iteration. The receiver responds idempotently so this is wasteful rather than incorrect. Switch the queue to a `HashSet` or add a per-peer in-flight tracker if production traffic shows this matters.

Closes #57

Follow-up to the version-vector trigger from #57. With the trigger in place the web sends the request and the native peer responds, but two issues remained that left the catch-up still broken in the field.

## Sub-bug 1: response was dropped on the floor

`handle_snapshot_event`'s `Message::Response { .. }` arm was a no-op ("PushAck etc. — no-op for now"), so the `ChangesetResponse` carrying the actual catch-up data ran straight to `/dev/null`. The loopback path doesn't hit this because its responder sends a fresh `Push` request through `out_tx` instead of a `send_response`, which routes through the regular Push handler that already calls `apply_remote_changeset`. The swarm path had neither apply logic nor a "Push back" fallback.

Fix: the response arm now matches on `SyncResponse::ChangesetResponse`, mirrors the native engine's `handle_changeset_response` (`engine/sync_handler.rs:52-93`):

- verify the response's `topic` against ours,
- verify the HMAC against bytes that include the *real* `your_last_db_version` value (the sender computed the tag over the real value; using a placeholder would always fail verification),
- log the count + responder's db_version for diagnostic visibility,
- reconstruct the same `SyncChangeset` shape the inbound-Push path builds, broadcast it on `inbound_tx`, and call `apply_remote_changeset`, which already persists each winning change, broadcasts on `resolved_tx`, and updates `peer_versions[peer]`. No separate `set_peer_version` is needed.

`PushAck` and `IdentityAck` keep their no-op semantics.

## Sub-bug 2: 10s default timeout was too short for full history

`request_response::Config::default()` carries a 10s request timeout. For a fresh tab pulling months of history against a populated peer over a circuit relay, the responder's `get_changes_since(0)` scan + JSON serialization + HMAC + circuit-relay hop frequently exceeded 10s, surfacing as `OutboundFailure: Timeout while waiting for a response`.

The native side already uses `with_request_timeout(Duration::from_secs(30))` on its snapshot behaviour (`engine/behaviour.rs:72`). The web was asymmetrically slower than the native side's tolerance; matching the 30s value keeps the two ends symmetric. If real-world catch-ups ever need more, chunking the response into multiple Pushes is the next lever (followup, not this PR).

Invariants preserved (same as #57):

- HMAC mandatory on ALL message paths: the response handler drops unauthenticated `ChangesetResponse` silently.
- `db_version=0` semantics still drive full-history onboarding.
- Swarm not held across awaits: `apply_remote_changeset` is awaited but only `&state` (and `&peer`) are passed; the swarm borrow ends before the await.

Closes #59

pvg13 force-pushed the fix/web-version-vector-catchup branch from a1f8dfc to 522ea5a Compare May 13, 2026 12:43

pvg13 changed the title ~~fix(web): trigger version-vector catch-up on swarm ConnectionEstablished (#57)~~ fix(web): wire up the full version-vector catch-up path (#57, #59) May 13, 2026

pvg13 merged commit e2a166c into main May 13, 2026
1 check passed

pvg13 deleted the fix/web-version-vector-catchup branch May 13, 2026 13:00

pvg13 merged 2 commits into main from fix/web-version-vector-catchup
