docs(adr): WebSocket transport contract + resilience (ADR-0032, ADR-0033)#62
Conversation
Fix the oath-adapter-net-ws-api contract that ADR-0029 deferred as "a deliberate later session", mirroring ADR-0030 for HTTP and grounded in IBKR's Client Portal WebSocket: - untyped duplex frame channel; subscription grammar + demux adapter-side - asymmetric shape: Stream recv / RPITIT one-shot send, split owned halves - minimal Frame enum; default stack delivers only data frames to the adapter - separate epoch-stamped lifecycle channel (feed-down is first-class) - recovery split: transport reconnects, adapter replays + differential reconcile - uniform no-silent-drop backpressure guarantee; per-stream policy adapter-side - WsConnector leaf over tokio-tungstenite + rustls - per-transport AuthSource trait, one shared IbkrAuthSource impl Resilience layer stack follows in ADR-0033. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add ADR-0033, the WS analogue of ADR-0031 (HTTP resilience), completing the pair ADR-0032 deferred. Grounded in IBKR's Client Portal WS and cross-checked against Binance and Coinbase so the generic transport carries no IBKR-shaped assumption. Decisions: two-seam composition (uniform WsConnector inside / richer ReconnectingConnection out, the tower Layer->Service split); reconnect as a spawned actor over a new runtime-neutral Spawn seam (mirrors Timer); a two-axis layer stack (connect-time chain + per-frame recv/send pipelines) with the 0031 ordering invariants; transport-liveness vs adapter session-keepalive split (mandatory auto-Pong, passive idle + active keepalive-when-idle probe); lifecycle as a watch of an epoch-stamped LifecycleSnapshot (lossless for the feed-down edge, never blocks the actor); dual count+byte drop-oldest buffer; a circuit breaker that retries transient loss forever but surfaces permanent failure as Unrecoverable; a send-axis rate limit; and force_reconnect on a control handle. Amends ADR-0032 in place (same PR, unmerged): smd 10->15min silent per-topic expiry; Unrecoverable added to ConnState; watch/LifecycleSnapshot delivery form and cumulative total_lagged resolved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds a changelog entry and two ADRs: ADR-0032 defines the WebSocket transport contract, and ADR-0033 defines the resilience stack around reconnects, lifecycle reporting, buffering, circuit breaking, and send limiting. No implementation code is added. ChangesWebSocket ADR documentation
Estimated code review effort: 3 (Moderate) | ~25 minutes Possibly related PRs
Suggested labels: documentation, architecture Suggested reviewers: NotAProfDev 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/adr/0032-websocket-transport-contract-duplex-frames-lifecycle.md`:
- Around line 60-63: Make WsSink::close terminal by updating the contract so
once close is called, the sink cannot be used for send again. Clarify the
lifecycle in the WsSink trait and any related notes in the duplex frames section
so shutdown is one-way and final, and adjust the close/send semantics to reflect
that the sink becomes unusable after close is requested.
In `@docs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md`:
- Around line 203-210: The “dual bound” description in the ADR is misleading
because the policy is not a strict byte cap if an oversized frame is still
retained. Reword the buffer policy language around the byte budget to describe
it as a soft memory budget with an exception for a single oversized newest
frame, and keep the explanation aligned with the websocket ring-buffer behavior
described in the same section.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1fa54f2e-2d70-4a1a-ae1e-b44186f66de4
📒 Files selected for processing (3)
CHANGELOG.mddocs/adr/0032-websocket-transport-contract-duplex-frames-lifecycle.mddocs/adr/0033-websocket-resilience-reconnect-actor-watch-lifecycle.md
Review + CodeRabbit fixes on the unmerged pair:
- ADR-0032 §4: drop `Lagged { count }` from the `ConnState` enum — lag is not a
connection phase (a socket can be `Connected` *and* lagging); it is carried as
the cumulative `total_lagged` field on the ADR-0033 §5 `LifecycleSnapshot`.
Reconciles §4 with §5 (which already omitted it) and fixes the §6 `Lagged{count}`
literal to reference the cumulative signal.
- ADR-0033 §1/§9: define the `ReconnectingConnector` trait `stack()`/`build()`
return (`impl ReconnectingConnector`) — previously undefined and easily read as
a typo of `ReconnectingConnection`; add it to the glossary.
- ADR-0033 §7: stop referencing an undefined `Failed{}` state; describe the
optional `max_attempts` cap as a distinct optional terminal outcome not added to
the core `ConnState`.
- ADR-0033 §5: mark the top-level snapshot `epoch` canonical (the epoch echoed in
`Connected`/`Resumed` is the same value) — one source of truth.
- CodeRabbit: `WsSink::close(self)` (terminal, one-way shutdown) instead of
`close(&mut self)`; reword the §6 buffer bound as a soft backlog budget with a
never-drop-newest exception, not a hard memory cap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Quality pass over the review-fix commit — no design change, only trims redundancy the fixes introduced: - ADR-0033 §1: drop the ReconnectingConnector explanatory paragraph (the inline comment + the following Layer->Service paragraph already carry the usage-vs-composition-seam distinction); fold the -or/-ion note into the block. - ADR-0033 §6: drop the closing clause restating "not a hard ceiling". - ADR-0033 §7: drop the parenthetical re-defining Unrecoverable. - ADR-0033 §5: tighten the canonical-epoch comment (no all-caps / mechanism noise). - ADR-0032 §4: tighten the Lagged-not-a-phase comment; make line 131 say the buffer layer "advances total_lagged (the Lagged signal)" rather than "emits Lagged", so it no longer reads as a ConnState variant parallel to Resumed/Stale. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes #61.
Designs the WebSocket transport — the streaming sibling of the HTTP pair (ADR-0030 + ADR-0031) that ADR-0029 deferred. Documentation only; no crate implementation. Grounded in IBKR's Client Portal WS and cross-checked against Binance and Coinbase so the generic transport carries no IBKR-shaped assumption (all venue semantics verified against live docs, not recalled).
ADR-0032 — WebSocket transport contract
Untyped duplex frame channel (grammar/demux stay adapter-side); asymmetric
Streamrecv / RPITIT one-shot send, split owned halves; minimalFrameenum (default stack delivers only data frames); epoch-stamped lifecycle channel (feed-down first-class for ADR-0004/0022); uniform no-silent-drop backpressure guarantee;WsConnectorleaf over tokio-tungstenite + rustls; per-transportAuthSource, one sharedIbkrAuthSource.ADR-0033 — WebSocket resilience stack (the ADR-0031 analogue)
WsConnectorinside / richerReconnectingConnectionout (the towerLayer→Servicesplit);ServiceBuilderreuse is assembly ergonomics, not the resilience abstraction.Spawnseam (mirrorsTimer; backend supplies the tokio impl); channel-backed stable sink, per-(re)connect auth re-inject, epoch bump, capped-exp backoff + connection-attempt-rate cap.Stale); session keepalive is adapter-side (venue grammar).watchof an epoch-stampedLifecycleSnapshot(runtime-neutralasync-watch) — lossless for the feed-down edge, never blocks the actor; all fields level-or-cumulative.min(count, bytes)bound (Coinbase multi-MB frames);total_laggedis a coarse grammar-blind hint → adapter conservative reconcile.Unrecoverable(avoid IP ban).SendRateLimit(Binance ~5 msg/s → disconnect/ban);force_reconnecton a control handle, not the sink.net-ws-api::stack()/net-ws-tungstenite::build()split; non-genericWsConfig(noRateKey); new dev-onlynet-ws-mockwith a deterministic manually-pumpedMockSpawn.ADR-0032 amended in place (unmerged, same PR):
smd10→15 min silent per-topic expiry;Unrecoverableadded toConnState;watch/LifecycleSnapshotdelivery form and cumulativetotal_laggedresolved.Numbering: the net-http construction-surface amendments (separate workstream) take ADR-0034, not 0032.
🤖 Generated with Claude Code
Summary by CodeRabbit