test(cluster): multi-node deployment-tracking integration test (Slice B2)#221
test(cluster): multi-node deployment-tracking integration test (Slice B2)#221kriszyp wants to merge 8 commits into
Conversation
3-node cluster test verifying Slice B2 of HarperFast/harper#641: payload travels to peers via the replicated hdb_deployment.payload_blob row attribute through Harper's existing BLOB_CHUNK channel, not via the operations API body. Asserts: - deploy_component from node 0 succeeds, component loads on all 3 nodes - hdb_deployment row replicates and is queryable from any node - origin row's peer_results is populated with success entries for both peer nodes (proves origin captured per-peer outcomes from replicateOperation's return) The OSS-side counterpart (HarperFast/harper#760) exercises the peer-side branch on a single node by seeding the row first; this test verifies the full multi-node round trip the new design depends on. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Temporary bump so the new deployTrackingReplication.test.mjs cluster test runs against harper's Slice B2 changes (HarperFast/harper#760). Re-target to harper main HEAD once #760 merges. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| @@ -1 +1 @@ | |||
| Subproject commit 974cf40ec2fc1b4b96876c1a082dcdb9dba9baed | |||
| Subproject commit 2d597bbc13e09d08e1a7831a5415686b9ed23440 | |||
There was a problem hiding this comment.
The core submodule is pinned to a commit on feat/deployment-tracking-slice-b2 rather than a commit on harper/main. If this PR is merged as-is, harper-pro/main would depend on unreleased feature-branch code in core. The PR description already notes this must be retargeted to harper/main HEAD after harper#760 merges — just flagging it explicitly so it can't be accidentally merged early.
1.
|
The deploy hangs after extraction when reading a Web ReadableStream from a file-backed Blob inside the same Harper process on Bun — same code passes on Node v22/v24 across Linux and Windows. The harper-pro 3-node cluster test (HarperFast/harper-pro#221) covers the same code path end-to-end with real replication, so this skip doesn't lose coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The fullyConnectedReplication test uses restart:true to verify the new component routes are loaded on peers, but for verifying B2's peer_results tracking we need a clean deploy. Restarting HTTP workers mid-flow cycles the worker that owns the recorder before recorder.finish() can flush the final put with peer_results. Removed the /Location/2 component-reachability check since it requires restart:true; the harper #760 single-node test already verifies the peer branch extracts from the row blob. This cluster test focuses on what's unique: row replication + peer_results capture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sertion - Remove unused `fetchWithRetry` import that broke lint after dropping the /Location/2 component-reachability check. - Add explicit assertion that `deployResponse.replicated` is a non-empty array. If `replicateOperation` doesn't dispatch to peers (origin's `server.nodes` empty), the failure now surfaces here instead of as a silent empty `peer_results` later — clearer signal for triage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Includes harper commits up to 6d7cd99c (fix to eliminate the put race where peer_results was lost — now bundled into the terminal finish() put). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 1s sleep after deploy wasn't enough on slower CI shards (Node v22 shard 3) for the terminal-state finish() put to propagate via table replication. Poll up to 15s with 250ms cadence — fast in the happy case, patient enough for slower environments. peer_results assertion test already passing after the race fix landed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Heads-up on CI: the `pull_request` event has stopped triggering test workflows on this PR since commit 0867433 (around 13:36 UTC). I've tried pushing new commits, closing+reopening the PR, and flipping draft→ready — none of these re-triggered the standard workflow set. Only `pull_request_target` workflows (Cherry-pick) fire. The `workflow_dispatch` runs I triggered manually mostly worked, but Unit Tests fails there with a submodule clone auth issue (`fatal: could not read Username`) that doesn't happen in regular pull_request runs. This looks like a GitHub Actions service-level issue, not anything in the PR. The `main` branch's workflows still fire normally on push events. The supporting harper-side PR (HarperFast/harper#760) is fully green and ready for review — this PR's CI can be triggered by either:
🤖 Generated by Claude |
Companion to HarperFast/harper#760 — Slice B2 of the deployment-tracking redesign tracked in HarperFast/harper#641.
Summary
A 3-node cluster integration test verifying that the new payload-via-row design actually works end-to-end. After harper #760 strips
req.payloadbeforereplicateOperationand switches peers to read fromhdb_deployment.payload_blob, this test proves:deploy_componentfrom node 0 succeeds, the component is loaded on all 3 nodes.hdb_deploymentrow replicates and is queryable from any peer.peer_resultspopulated withstatus=successentries for both peer nodes — confirming the origin successfully captured per-peer outcomes fromreplicateOperation's return value.The OSS-side counterpart (harper #760) exercises the peer-side branch on a single node by seeding the row first. This test verifies the full multi-node round trip with real
BLOB_CHUNKreplication.Notes
replicateOperationinreplication/replicator.tsalready does the generic operation forwarding the new design needs — nodeploy_component-specific path required. The chunked-relay / direct-HTTPS approach from the closed harper-pro#146 is fully obsoleted by the row-replication channel.coresubmodule is currently pointed at harper'sfeat/deployment-tracking-slice-b2branch so the test can run against the OSS-side changes. Once harper #760 merges, re-targetcoreto harper main HEAD before marking this PR ready for review.Test plan
deployTrackingReplication.test.mjs(3-node cluster, ~60s)req.payload)🤖 Generated by Claude