docs(adr): WebSocket transport contract + resilience (ADR-0032, ADR-0033) #62

Merged
NotAProfDev merged 5 commits into main from docs/adr-ws-transport-stack
Jul 1, 2026
@NotAProfDev NotAProfDev commented Jul 1, 2026

Owner

Closes #61.

Designs the WebSocket transport — the streaming sibling of the HTTP pair (ADR-0030 + ADR-0031) that ADR-0029 deferred. Documentation only; no crate implementation. Grounded in IBKR's Client Portal WS and cross-checked against Binance and Coinbase so the generic transport carries no IBKR-shaped assumption (all venue semantics verified against live docs, not recalled).

ADR-0032 — WebSocket transport contract

Untyped duplex frame channel (grammar/demux stay adapter-side); asymmetric Stream recv / RPITIT one-shot send, split owned halves; minimal Frame enum (default stack delivers only data frames); epoch-stamped lifecycle channel (feed-down first-class for ADR-0004/0022); uniform no-silent-drop backpressure guarantee; WsConnector leaf over tokio-tungstenite + rustls; per-transport AuthSource, one shared IbkrAuthSource.
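
As a rough illustration of the "minimal Frame enum" and the rule that the default stack delivers only data frames, the sketch below filters control frames out before anything reaches the adapter. The variant names and the `deliver_to_adapter` helper are assumptions made for this example; the PR description does not spell out the actual enum.

```rust
/// Hypothetical minimal frame type. Variant names are assumed for
/// illustration and are not taken from ADR-0032.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Frame {
    Text(String),
    Binary(Vec<u8>),
    Ping(Vec<u8>),
    Pong(Vec<u8>),
    Close,
}

impl Frame {
    /// Data frames carry venue payload; everything else is transport control.
    pub fn is_data(&self) -> bool {
        matches!(self, Frame::Text(_) | Frame::Binary(_))
    }
}

/// Stand-in for the recv side of the default stack: control frames are
/// consumed by the transport, so the adapter only ever sees data frames.
pub fn deliver_to_adapter(wire: Vec<Frame>) -> Vec<Frame> {
    wire.into_iter().filter(Frame::is_data).collect()
}
```

Because the channel is untyped, the adapter would still have to parse the `Text`/`Binary` payloads itself; only the control-frame handling is lifted out of its way.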

ADR-0033 — WebSocket resilience stack (the ADR-0031 analogue)

  • Two-seam composition — uniform WsConnector inside / richer ReconnectingConnection out (the tower Layer→Service split); ServiceBuilder reuse is assembly ergonomics, not the resilience abstraction.
  • Reconnect = spawned actor over a new runtime-neutral Spawn seam (mirrors Timer; backend supplies the tokio impl); channel-backed stable sink, per-(re)connect auth re-inject, epoch bump, capped-exp backoff + connection-attempt-rate cap.
  • Two-axis stack (connect-time chain + per-frame recv/send pipelines), not ADR-0031's single line; ordering invariants preserved (Auth-inside-Reconnect, Heartbeat socket-side of Buffer, Tracing outermost).
  • Heartbeat = transport-liveness only (auto-Pong mandatory, passive idle-timeout + active keepalive-when-idle probe → Stale); session keepalive is adapter-side (venue grammar).
  • Lifecycle = watch of an epoch-stamped LifecycleSnapshot (runtime-neutral async-watch) — lossless for the feed-down edge, never blocks the actor; all fields level-or-cumulative.
  • Buffer = actor-owned drop-oldest ring, dual min(count, bytes) bound (Coinbase multi-MB frames); total_lagged is a coarse grammar-blind hint → adapter conservative reconcile.
  • Circuit breaker inverts for transient, survives for permanent: transient → retry forever + risk-layer halt (ADR-0022); permanent → terminal Unrecoverable (avoid IP ban).
  • Send-axis SendRateLimit (Binance ~5 msg/s → disconnect/ban); force_reconnect on a control handle, not the sink.
  • Construction: net-ws-api::stack() / net-ws-tungstenite::build() split; non-generic WsConfig (no RateKey); new dev-only net-ws-mock with a deterministic manually-pumped MockSpawn.
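
The "capped-exp backoff" in the reconnect bullet can be sketched as a pure function of the attempt number. This is only an illustration: `backoff_delay`, `base` and `cap` are hypothetical names, the values used are not from ADR-0033, and the separate connection-attempt-rate cap (and any jitter) is not modelled here.

```rust
use std::time::Duration;

/// Capped exponential backoff: `base * 2^attempt`, saturating at `cap`.
/// `base` and `cap` are hypothetical tuning knobs, not ADR-0033 values.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // checked_shl returns None once the shift reaches the width of u32,
    // in which case the largest factor is used and the cap takes over.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // checked_mul returns None on overflow; treat that as "at the cap".
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}
```

With a 500 ms base and a 30 s cap, the delay doubles per attempt until it reaches the cap and then stays there, so a "retry forever" policy never waits longer than `cap` between attempts.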

ADR-0032 amended in place (unmerged, same PR): smd 10→15 min silent per-topic expiry; Unrecoverable added to ConnState; watch/LifecycleSnapshot delivery form and cumulative total_lagged resolved.
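
To make the "all fields level-or-cumulative" property concrete, here is a minimal sketch of how a consumer might compare two snapshots it happened to observe. The field layout, the `Connecting` variant, the unit-style variants and the `diff` helper are assumptions for this example; only `epoch`, `total_lagged`, `Stale` and `Unrecoverable` are named in the PR text.

```rust
/// Assumed shape of the connection state; the ADR's real enum may carry
/// data in its variants and include others.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Connecting, // assumed variant
    Connected,
    Stale,
    Unrecoverable,
}

/// Assumed snapshot shape: one canonical epoch, a level state, and a
/// cumulative lag counter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LifecycleSnapshot {
    pub epoch: u64,
    pub state: ConnState,
    pub total_lagged: u64,
}

/// What changed between two observed snapshots (hypothetical helper).
#[derive(Debug, PartialEq, Eq)]
pub struct Delta {
    pub reconnected: bool,
    pub newly_lagged: u64,
    pub terminal: bool,
}

pub fn diff(prev: &LifecycleSnapshot, next: &LifecycleSnapshot) -> Delta {
    Delta {
        // An epoch change reveals a reconnect even if the intermediate
        // snapshots were overwritten before the consumer looked.
        reconnected: next.epoch != prev.epoch,
        // A cumulative counter makes dropped frames recoverable by subtraction.
        newly_lagged: next.total_lagged.saturating_sub(prev.total_lagged),
        terminal: next.state == ConnState::Unrecoverable,
    }
}
```

The point of the sketch is that a watch-style channel may coalesce updates, yet the consumer still learns that a reconnect happened and how many frames were dropped since it last looked.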

Numbering: the net-http construction-surface amendments (separate workstream) take ADR-0034, not 0032.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added new ADRs covering the WebSocket transport contract (duplex frame transport, drop-oldest backpressure, control frame bypass) and an epoch-based connection lifecycle model.
    • Documented WebSocket resilience behavior, including reconnect supervision, per-attempt auth refresh, heartbeat/control handling, and circuit-breaker/backoff semantics.
  • Chores
    • Updated the unreleased changelog entry to reflect the new WebSocket design and its validation against major exchange WebSocket semantics.

NotAProfDev and others added 2 commits June 30, 2026 18:16
Fix the oath-adapter-net-ws-api contract that ADR-0029 deferred as "a
deliberate later session", mirroring ADR-0030 for HTTP and grounded in
IBKR's Client Portal WebSocket:

- untyped duplex frame channel; subscription grammar + demux adapter-side
- asymmetric shape: Stream recv / RPITIT one-shot send, split owned halves
- minimal Frame enum; default stack delivers only data frames to the adapter
- separate epoch-stamped lifecycle channel (feed-down is first-class)
- recovery split: transport reconnects, adapter replays + differential reconcile
- uniform no-silent-drop backpressure guarantee; per-stream policy adapter-side
- WsConnector leaf over tokio-tungstenite + rustls
- per-transport AuthSource trait, one shared IbkrAuthSource impl

Resilience layer stack follows in ADR-0033.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add ADR-0033, the WS analogue of ADR-0031 (HTTP resilience), completing the
pair ADR-0032 deferred. Grounded in IBKR's Client Portal WS and cross-checked
against Binance and Coinbase so the generic transport carries no IBKR-shaped
assumption.

Decisions: two-seam composition (uniform WsConnector inside / richer
ReconnectingConnection out, the tower Layer->Service split); reconnect as a
spawned actor over a new runtime-neutral Spawn seam (mirrors Timer); a two-axis
layer stack (connect-time chain + per-frame recv/send pipelines) with the
0031 ordering invariants; transport-liveness vs adapter session-keepalive split
(mandatory auto-Pong, passive idle + active keepalive-when-idle probe); lifecycle
as a watch of an epoch-stamped LifecycleSnapshot (lossless for the feed-down edge,
never blocks the actor); dual count+byte drop-oldest buffer; a circuit breaker
that retries transient loss forever but surfaces permanent failure as
Unrecoverable; a send-axis rate limit; and force_reconnect on a control handle.
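
The inverted circuit breaker described above (retry transient loss forever, surface permanent failure as Unrecoverable) comes down to a classification step. The sketch below is an assumption-laden illustration: `ConnectFailure`, `Next` and `on_connect_failure` are hypothetical names, and which concrete errors count as permanent is not stated in this PR.

```rust
/// Hypothetical classification of a failed connect attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectFailure {
    /// Assumed examples: network drop, timeout.
    Transient,
    /// Assumed example: the venue rejects the credentials outright.
    Permanent,
}

/// What the reconnect loop does next (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Next {
    RetryAfterBackoff { attempt: u32 },
    Unrecoverable,
}

pub fn on_connect_failure(failure: ConnectFailure, attempt: u32) -> Next {
    match failure {
        // No attempt ceiling: transient loss is retried indefinitely and
        // halting trading is left to the risk layer.
        ConnectFailure::Transient => Next::RetryAfterBackoff {
            attempt: attempt.saturating_add(1),
        },
        // Hammering a venue that has rejected us risks an IP ban, so stop.
        ConnectFailure::Permanent => Next::Unrecoverable,
    }
}
```

Note that the attempt counter never trips the breaker on its own; in this sketch it only feeds the backoff calculation.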

Amends ADR-0032 in place (same PR, unmerged): smd 10->15min silent per-topic
expiry; Unrecoverable added to ConnState; watch/LifecycleSnapshot delivery form
and cumulative total_lagged resolved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jul 1, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c7d1b672-423b-4cf3-92de-036d4e44a7d3

📥 Commits

Reviewing files that changed from the base of the PR and between cd92798 and f123bb3.

📒 Files selected for processing (1)
  • CHANGELOG.md

📝 Walkthrough

Walkthrough

This PR adds a changelog entry and two ADRs: ADR-0032 defines the WebSocket transport contract, and ADR-0033 defines the resilience stack around reconnects, lifecycle reporting, buffering, circuit breaking, and send limiting. No implementation code is added.
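
For the send-limiting part, one possible shape is a sliding-window limiter driven by an injected clock. This is a sketch under stated assumptions: the struct layout, `try_acquire` and the millisecond clock are invented for the example, and the 5-per-second figure merely echoes the "Binance ~5 msg/s" note in the PR description.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window send limiter; time is passed in as
/// milliseconds so the behaviour is deterministic under test.
pub struct SendRateLimit {
    max_per_window: usize,
    window_ms: u64,
    sent_at_ms: VecDeque<u64>,
}

impl SendRateLimit {
    pub fn new(max_per_window: usize, window_ms: u64) -> Self {
        Self { max_per_window, window_ms, sent_at_ms: VecDeque::new() }
    }

    /// Ok(()) if a send at `now_ms` is allowed, otherwise Err(ms to wait).
    pub fn try_acquire(&mut self, now_ms: u64) -> Result<(), u64> {
        // Forget sends that have aged out of the window.
        while let Some(&oldest) = self.sent_at_ms.front() {
            if now_ms.saturating_sub(oldest) >= self.window_ms {
                self.sent_at_ms.pop_front();
            } else {
                break;
            }
        }
        if self.sent_at_ms.len() < self.max_per_window {
            self.sent_at_ms.push_back(now_ms);
            Ok(())
        } else {
            // Window full: wait until the oldest recorded send expires.
            let wait = self
                .sent_at_ms
                .front()
                .map_or(self.window_ms, |&o| (o + self.window_ms).saturating_sub(now_ms));
            Err(wait)
        }
    }
}
```

Returning the wait time instead of dropping the frame keeps the limiter compatible with a no-silent-drop guarantee: the caller decides whether to sleep or report backpressure.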

Changes

WebSocket ADR documentation

| Layer / File(s) | Summary |
|---|---|
| ADR-0032 scope and core contract decision: CHANGELOG.md, docs/adr/0032-websocket-transport-contract-duplex-frames-lifecycle.md | Adds a changelog bullet and defines ADR-0032's scope, IBKR grounding, and the untyped duplex frame channel decision. |
| ADR-0032 channel shape, Frame enum, and lifecycle channel: docs/adr/0032-websocket-transport-contract-duplex-frames-lifecycle.md | Specifies asymmetric send/recv halves, the minimal Frame enum, and the epoch-stamped Lifecycle/ConnState signaling plane. |
| ADR-0032 recovery, backpressure, seams, and rationale: docs/adr/0032-websocket-transport-contract-duplex-frames-lifecycle.md | Documents recovery responsibilities, no-silent-drop backpressure, WsConnector/AuthSource seams, rejected alternatives, consequences, and ADR relationships. |
| ADR-0033 scope, seams, actor, and stack ordering: docs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md | Defines ADR-0033 scope, grounding cases, the two-seam composition, the spawned reconnect actor, two-axis stack ordering, and heartbeat/keepalive split. |
| ADR-0033 lifecycle watch, backpressure, circuit breaker, rate limit: docs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md | Specifies the LifecycleSnapshot watch, dual-bound drop-oldest buffering, circuit breaker behavior, SendRateLimit, and force_reconnect. |
| ADR-0033 construction, mock testability, alternatives, relationships: docs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md | Documents stack()/build() construction, the mock crate, considered alternatives, consequences, and relationships to other ADRs. |
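
The heartbeat/keepalive split mentioned in the ADR-0033 rows can be pictured as a small liveness check over "time since last inbound frame". The sketch below is hypothetical: `Heartbeat`, `Liveness`, both thresholds and the millisecond clock are assumptions, and the mandatory auto-Pong reply is not modelled.

```rust
/// Outcome of a liveness check (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum Liveness {
    Healthy,
    /// Idle long enough that an active keepalive probe should be sent.
    SendProbe,
    /// Idle past the timeout: the transport would report Stale.
    Stale,
}

pub struct Heartbeat {
    probe_after_ms: u64,
    stale_after_ms: u64,
    last_rx_ms: u64,
}

impl Heartbeat {
    pub fn new(probe_after_ms: u64, stale_after_ms: u64, now_ms: u64) -> Self {
        Self { probe_after_ms, stale_after_ms, last_rx_ms: now_ms }
    }

    /// Any inbound frame (data, or the Pong answering a probe) resets the idle clock.
    pub fn on_frame(&mut self, now_ms: u64) {
        self.last_rx_ms = now_ms;
    }

    pub fn check(&self, now_ms: u64) -> Liveness {
        let idle = now_ms.saturating_sub(self.last_rx_ms);
        if idle >= self.stale_after_ms {
            Liveness::Stale
        } else if idle >= self.probe_after_ms {
            Liveness::SendProbe
        } else {
            Liveness::Healthy
        }
    }
}
```

Nothing here knows about venue messages; session-level keepalives would live in the adapter, as the ADR summary describes.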

Estimated code review effort: 3 (Moderate) | ~25 minutes

Possibly related PRs

  • NotAProfDev/oath#56: Extends the earlier transport-split/layering and circuit-breaker ideas into a WebSocket-specific contract and resilience design.

Suggested labels: documentation, architecture

Suggested reviewers: NotAProfDev

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title uses a valid Conventional Commits docs scope and accurately summarizes the WebSocket ADR additions. |
| Linked Issues check | ✅ Passed | The ADRs cover the requested WebSocket contract and resilience design, including lifecycle, buffering, reconnect, and venue cross-checks. |
| Out of Scope Changes check | ✅ Passed | The PR stays within documentation-only ADR and changelog updates, with no implementation or unrelated feature changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/adr/0032-websocket-transport-contract-duplex-frames-lifecycle.md`:
- Around line 60-63: Make WsSink::close terminal by updating the contract so
once close is called, the sink cannot be used for send again. Clarify the
lifecycle in the WsSink trait and any related notes in the duplex frames section
so shutdown is one-way and final, and adjust the close/send semantics to reflect
that the sink becomes unusable after close is requested.

In `@docs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md`:
- Around line 203-210: The “dual bound” description in the ADR is misleading
because the policy is not a strict byte cap if an oversized frame is still
retained. Reword the buffer policy language around the byte budget to describe
it as a soft memory budget with an exception for a single oversized newest
frame, and keep the explanation aligned with the websocket ring-buffer behavior
described in the same section.
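
To illustrate the behaviour this comment describes, here is a minimal sketch of a drop-oldest ring with a count bound, a byte budget, and the single-oversized-newest-frame exception. `FrameRing` and its methods are hypothetical names and the bounds are arbitrary; the ADR's actual buffer is actor-owned and not shown in this PR text.

```rust
use std::collections::VecDeque;

/// Hypothetical drop-oldest ring bounded by frame count and by bytes.
pub struct FrameRing {
    max_frames: usize,
    max_bytes: usize,
    bytes: usize,
    frames: VecDeque<Vec<u8>>,
    total_lagged: u64,
}

impl FrameRing {
    pub fn new(max_frames: usize, max_bytes: usize) -> Self {
        Self { max_frames, max_bytes, bytes: 0, frames: VecDeque::new(), total_lagged: 0 }
    }

    pub fn push(&mut self, frame: Vec<u8>) {
        self.bytes += frame.len();
        self.frames.push_back(frame);
        // Evict from the front while either bound is exceeded, but never
        // evict the frame just pushed (len > 1). A single oversized frame
        // is therefore retained, which makes the byte bound a soft budget.
        while self.frames.len() > 1
            && (self.frames.len() > self.max_frames || self.bytes > self.max_bytes)
        {
            if let Some(old) = self.frames.pop_front() {
                self.bytes -= old.len();
                self.total_lagged += 1; // cumulative, never reset
            }
        }
    }

    pub fn pop(&mut self) -> Option<Vec<u8>> {
        let frame = self.frames.pop_front()?;
        self.bytes -= frame.len();
        Some(frame)
    }

    pub fn len(&self) -> usize { self.frames.len() }
    pub fn bytes(&self) -> usize { self.bytes }
    pub fn total_lagged(&self) -> u64 { self.total_lagged }
}
```

With a budget of 10 bytes, pushing one 50-byte frame empties the backlog behind it but keeps the frame itself, so `bytes()` can temporarily exceed `max_bytes`.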
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1fa54f2e-2d70-4a1a-ae1e-b44186f66de4

📥 Commits

Reviewing files that changed from the base of the PR and between 19f7b56 and 6aec463.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • docs/adr/0032-websocket-transport-contract-duplex-frames-lifecycle.md
  • docs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md

Comment thread docs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md Outdated
NotAProfDev and others added 3 commits July 1, 2026 18:44
Review + CodeRabbit fixes on the unmerged pair:

- ADR-0032 §4: drop `Lagged { count }` from the `ConnState` enum — lag is not a
  connection phase (a socket can be `Connected` *and* lagging); it is carried as
  the cumulative `total_lagged` field on the ADR-0033 §5 `LifecycleSnapshot`.
  Reconciles §4 with §5 (which already omitted it) and fixes the §6 `Lagged{count}`
  literal to reference the cumulative signal.
- ADR-0033 §1/§9: define the `ReconnectingConnector` trait `stack()`/`build()`
  return (`impl ReconnectingConnector`) — previously undefined and easily read as
  a typo of `ReconnectingConnection`; add it to the glossary.
- ADR-0033 §7: stop referencing an undefined `Failed{}` state; describe the
  optional `max_attempts` cap as a distinct optional terminal outcome not added to
  the core `ConnState`.
- ADR-0033 §5: mark the top-level snapshot `epoch` canonical (the epoch echoed in
  `Connected`/`Resumed` is the same value) — one source of truth.
- CodeRabbit: `WsSink::close(self)` (terminal, one-way shutdown) instead of
  `close(&mut self)`; reword the §6 buffer bound as a soft backlog budget with a
  never-drop-newest exception, not a hard memory cap.
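
The `close(self)` change in the last bullet relies on ownership to make shutdown one-way. A simplified, synchronous sketch of that idea follows; the ADR's sink is async (RPITIT `send`), which is dropped here for brevity, and `SendError`, `RecordingSink` and `send_all_then_close` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct SendError;

/// Simplified, synchronous stand-in for the sink half.
pub trait WsSink {
    fn send(&mut self, frame: &str) -> Result<(), SendError>;
    /// Takes `self` by value: once called, the sink no longer exists.
    fn close(self) -> Result<(), SendError>;
}

/// In-memory sink used only for this illustration.
pub struct RecordingSink {
    pub sent: Vec<String>,
}

impl WsSink for RecordingSink {
    fn send(&mut self, frame: &str) -> Result<(), SendError> {
        self.sent.push(frame.to_string());
        Ok(())
    }

    fn close(self) -> Result<(), SendError> {
        Ok(())
    }
}

pub fn send_all_then_close<S: WsSink>(mut sink: S, frames: &[&str]) -> Result<usize, SendError> {
    for frame in frames {
        sink.send(frame)?;
    }
    sink.close()?;
    // A `sink.send("late")` here would fail to compile with
    // "borrow of moved value: `sink`", so send-after-close cannot happen.
    Ok(frames.len())
}
```

With `close(&mut self)` the same misuse would compile and have to be caught at runtime; consuming `self` moves that check to the type system.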

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Quality pass over the review-fix commit — no design change, only trims
redundancy the fixes introduced:

- ADR-0033 §1: drop the ReconnectingConnector explanatory paragraph (the inline
  comment + the following Layer->Service paragraph already carry the
  usage-vs-composition-seam distinction); fold the -or/-ion note into the block.
- ADR-0033 §6: drop the closing clause restating "not a hard ceiling".
- ADR-0033 §7: drop the parenthetical re-defining Unrecoverable.
- ADR-0033 §5: tighten the canonical-epoch comment (no all-caps / mechanism noise).
- ADR-0032 §4: tighten the Lagged-not-a-phase comment; make line 131 say the
  buffer layer "advances total_lagged (the Lagged signal)" rather than "emits
  Lagged", so it no longer reads as a ConnState variant parallel to Resumed/Stale.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Design the WebSocket transport stack: contract (ADR-0032) + resilience (ADR-0033)

1 participant