Skip to content

feat(deploy): Slice B2 — peer-side blob reads + install-line streaming#760

Draft
kriszyp wants to merge 4 commits into
mainfrom
feat/deployment-tracking-slice-b2
Draft

feat(deploy): Slice B2 — peer-side blob reads + install-line streaming#760
kriszyp wants to merge 4 commits into
mainfrom
feat/deployment-tracking-slice-b2

Conversation

@kriszyp
Copy link
Copy Markdown
Member

@kriszyp kriszyp commented May 23, 2026

Slice B2 of the deployment-tracking redesign in #641. Closes #758.

Summary

What changed

Origin side (components/operations.js):

  • Strip req.payload before replicateOperation so peers don't receive a consumed Readable.
  • After replicateOperation returns, normalize the per-peer outcomes and write them to row.peer_results via the new DeploymentRecorder.recordPeers().

Peer side (components/operations.js + components/deploymentRecorder.ts):

  • New branch in deployComponent: when req._deploymentId is set and req.payload is absent, look up hdb_deployment[deployment_id] via the new awaitDeploymentRow() helper, then extract from row.payload_blob.stream(). The Blob API already handles in-flight BLOB_CHUNK writes by blocking until chunks arrive.
  • awaitDeploymentRow polls with exponential backoff (5ms → 100ms) so the fast path — replication has already caught up — sees no human-perceptible latency.

Install-line streaming (components/Application.ts):

  • nonInteractiveSpawn gains an optional onLine(stream, line) callback with proper line buffering and StringDecoder so multi-byte UTF-8 characters split across chunk boundaries are reassembled, not corrupted into U+FFFD.
  • installApplication threads an onLine closure through all three install paths (custom command, devEngines packageManager, default npm).
  • deployComponent wires onInstallLine to emit('install', {manager, stream, line}) so the SSE channel carries npm install output as it happens.

Defensive write (components/deploymentRecorder.ts):

  • recordPeers swallows audit-write failures with a warn log — peer_results is observability, not critical path. A disk-full on the audit table shouldn't fail a deploy that successfully replicated.

Where to look

  • components/operations.js lines ~420 (peer-side branch + strip) and ~480 (peer_results capture)
  • components/deploymentRecorder.tsrecordPeers, awaitDeploymentRow, normalizePeerResult
  • components/Application.tscreateLineSplitter, nonInteractiveSpawn onLine param, three install paths threading

Cross-model review

agy (Google's model) reviewed the diff and flagged three real issues that are addressed in this PR:

  1. UTF-8 multi-byte char split across chunks was being corrupted by naive chunk.toString(). Fixed with StringDecoder.
  2. recordPeers would have crashed the deploy on an audit table write error. Now logged-and-swallowed.
  3. awaitDeploymentRow had a 100ms fixed polling interval that introduced a latency floor on the fast path. Now exponential backoff starting at 5ms.

Test plan

  • unitTests/components/applicationSpawn.test.js — 6 line-buffering scenarios including UTF-8 split reassembly
  • unitTests/components/deploymentRecorder.test.js — 12 scenarios (recordPeers normalization, finish-then-recordPeers no-op, audit-write resilience, awaitDeploymentRow timeout + fast path)
  • integrationTests/deploy/deploy-tracking-peer-branch.test.ts — exercises the peer-side branch on a single node by seeding an hdb_deployment row first
  • No regression on existing deploy-tracking.test.ts (11 tests) and deploy-tracking-events.test.ts (5 tests)
  • True 3-node verification lives in the matching harper-pro PR — replicateOperation is implemented there

Follow-ups (out of scope)

  • Slice B3: delete_deployment_payload operation + 3-node cluster integration test in harper-pro
  • Slice C: rollback (deploy_component {rollback_from}) + onStorageReclamation blob pruning
  • Cross-node install line forwarding (peers' npm install lines visible on origin SSE) — not yet wired; peers have no emitter

🤖 Generated by Claude

Companion PR

  • harper-pro#221: 3-node cluster integration test that verifies the full round trip. No code changes there — the existing replicateOperation already does generic operation forwarding (no deploy_component-specific path needed). The closed harper-pro#146's chunked-relay was working around a constraint the row-replication channel doesn't have.

Slice B2 of the deployment-tracking redesign in #641. Makes multi-node
deploy_component work after Slice A's payload_blob-on-replicated-row design.

Origin side (components/operations.js):
- Strip req.payload before replicateOperation so peers receive a clean
  operation body (the consumed Readable would otherwise leak).
- Capture per-peer outcomes from replicateOperation's `replicated` return
  into the origin row's peer_results via DeploymentRecorder.recordPeers().

Peer side (components/operations.js + deploymentRecorder.ts):
- When req._deploymentId is set and there's no req.payload, look up the
  hdb_deployment row (await replication arrival with exponential-backoff
  polling, 5ms → 100ms), then extract from row.payload_blob.stream().
- awaitDeploymentRow encapsulates the polling with a 30s default timeout.

Install-line streaming (components/Application.ts):
- nonInteractiveSpawn gains an optional onLine(stream, line) callback
  with proper line buffering (StringDecoder so a multi-byte UTF-8 char
  split across chunks is reassembled instead of corrupted into U+FFFD).
- installApplication threads an onLine closure through all three install
  paths (custom command, devEngines packageManager, default npm).
- deployComponent wires this to emit('install', {manager, stream, line})
  so the SSE channel carries `npm install` output live as it streams.
  Finishes the residual install-streaming work from the closed #531.

DeploymentRecorder.recordPeers swallows audit-write failures so an
hdb_deployment write error doesn't fail a deploy that successfully
replicated to peers (peer_results is observability, not critical path).

Tests:
- unit: 5 line-buffering scenarios including UTF-8 split-byte reassembly
- unit: 11 recordPeers / awaitDeploymentRow scenarios
- integration: peer-side branch via seeded row (single-node simulation
  of the operation shape origin produces for peers)

The true 3-node cluster verification lives in the matching harper-pro
PR — replicateOperation is implemented there.

Cross-model reviewed: agy flagged the UTF-8 split-chunk bug, the
recordPeers crash-on-DB-error, and the polling latency floor — all
addressed in this commit.

Closes #758

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 23, 2026

Reviewed; no blockers found.

The deploy hangs after extraction when reading a Web ReadableStream from
a file-backed Blob inside the same Harper process on Bun — same code
passes on Node v22/v24 across Linux and Windows. The harper-pro 3-node
cluster test (HarperFast/harper-pro#221) covers the same code path
end-to-end with real replication, so this skip doesn't lose coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kriszyp kriszyp requested review from Ethan-Arrowood and heskew May 23, 2026 04:00
kriszyp and others added 2 commits May 23, 2026 08:39
recordPeers was issuing its own put() concurrent with coalesced
emitter-triggered puts from phase events. Each put captures `this.record`
as the serializer sees it, and we were observing peer_results=[] in the
persisted row even though recordPeers mutated the in-memory state to a
populated array. The race: an earlier scheduleFlush's put (called when
peer_results was still []) sometimes completes AFTER recordPeers' direct
put, overwriting our write with the stale snapshot.

Fix: recordPeers no longer puts. It stashes the input on the recorder
and mutates the in-memory record (so live SSE readers still see the
update immediately). finish() re-applies the stash right before the
terminal status-transition put, bundling peer_results with status=success
into one atomic write. No concurrent puts after this point.

Unit test rewritten to reflect the new contract: recordPeers populates
the in-memory record but persistence happens at finish().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The .catch in recordPeers' put was removed when peer_results moved into
finish(). The logger import is no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deployment tracking — Slice B2: peer-side blob reads + install-line streaming

1 participant