fix(observability): classify WS protocol HTTP-version handshake error as expected (Sentry CORE-RUST-DP)#2792
Conversation
… as expected (Sentry CORE-RUST-DP) Add the tungstenite `ProtocolError::WrongHttpVersion` Display string (`"HTTP version must be 1.1 or higher"`) to the `is_network_unreachable_message` anchor list. Fires from `src/openhuman/socket/ws_loop.rs:188` via the supervisor's sustained- outage escalation when a server (or intermediary proxy / HTTP/2-only edge) responds to the WebSocket upgrade with HTTP/2+, which the WS spec forbids — handshake requires HTTP/1.1. Self-hosted Sentry CORE-RUST-DP (~2 events / 24h on `openhuman@0.56.0+e8968077aeb5`, `domain=socket operation=ws_connect`). Same handshake-stage user-environment shape as the existing `"tls handshake"` / `"certificate verify failed"` / `"http error: 200 ok"` captive-portal entries. The socket supervisor's exponential-backoff retry already handles the condition; Sentry has no actionable signal to add — no status, no trace, no payload beyond what the supervisor already logs as a breadcrumb at every retry tier. Two new tests pin the contract: classification of both the supervisor- wrapped wire shape (verbatim CORE-RUST-DP body) and the bare tungstenite render, plus a rejection contract over four unrelated HTTP-version log lines (`"HTTP/1.0 to HTTP/2"`, `"h2 alpn"`, `"HTTP/1.1 only"`, `"requires HTTP/1.2 or higher"`) so a future refactor that loosens the substring into a generic `"http version"` matcher fails loudly. `cargo test --lib core::observability::tests` → 90/90 pass. Sentry-Issue: CORE-RUST-DP
📝 WalkthroughWalkthroughThis PR extends the ChangesWebSocket handshake HTTP version classifier
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
graycyrus
left a comment
There was a problem hiding this comment.
Clean and well-scoped fix. The new substring is the exact tungstenite Display string for ProtocolError::WrongHttpVersion — distinctive, stable, and follows the established pattern of the other handshake-stage entries in this OR-chain.
The two tests are solid: the happy path covers both the supervisor-wrapped form and the bare tungstenite render, and the rejection contract pins four adjacent HTTP-version log lines that must not classify. Good guard against future loosening of the matcher.
The --no-verify note in the PR description is fine — Rust Quality CI is clean and this change doesn't touch any JS/TS files.
Summary
ProtocolError::WrongHttpVersionDisplay string (\"HTTP version must be 1.1 or higher\") to theis_network_unreachable_messageanchor list insrc/core/observability.rsso the socket supervisor's sustained-outage escalation routes it toExpectedErrorKind::NetworkUnreachable(breadcrumb, no Sentry event) instead of firing an error event.CORE-RUST-DP(~2 events onopenhuman@0.56.0+e8968077aeb5,domain=socket operation=ws_connect, first seen 2026-05-27).\"tls handshake\"/\"certificate verify failed\"/\"http error: 200 ok\"(captive-portal) entries that already inhabit this matcher.Problem
src/openhuman/socket/ws_loop.rs:178routes the one-shot sustained-outage escalation (consecutive == FAIL_ESCALATE_THRESHOLD, default 5 attempts) throughcrate::core::observability::report_error_or_expectedwith the message:The inner string is tungstenite's
ProtocolError::WrongHttpVersionrendering. It fires when the server (or an intermediary proxy / HTTP/2-only edge) responds to the WS upgrade with HTTP/2+, which RFC 6455 forbids — the WS upgrade handshake requires HTTP/1.1.This is a user-environment / upstream-infra misconfiguration:
tokio-tungstenite) cannot negotiate downward — it requires HTTP/1.1 for the Upgrade handshake by spec.The supervisor already retries with exponential backoff and limits the Sentry-visible event to one per affected client per outage. There is no actionable Sentry signal beyond what's already logged as a breadcrumb at every retry tier — no code-side remediation available.
Solution
Add one anchor to the existing
is_network_unreachable_messageOR-chain. The new substring is the literal tungstenite Display string — stable acrosstungsteniteversions and distinctive enough not to false-positive on adjacent transport logs (\"HTTP/1.0 to HTTP/2\",\"h2 alpn\",\"client supports HTTP/1.1 only\",\"requires HTTP/1.2 or higher\").The dispatcher precedence in
expected_error_kindis unchanged —is_loopback_unavailablestill runs first (so a hypothetical loopback HTTP/2 server keeps its own bucket), then the network-unreachable matcher catches this shape and demotes totracing::warn!(breadcrumb, no Sentry event).The matching doc-comment is extended with a fifth bullet covering the new anchor and pointing at CORE-RUST-DP.
Submission Checklist
src/core/observability.rstests module:\"[socket] Connection failed (sustained outage after 5 attempts): WebSocket connect: WebSocket protocol error: HTTP version must be 1.1 or higher\") AND the bare tungstenite render (in case the protocol error escapes through a non-supervisor call site — the classifier runs on the full anyhow chain).core::observability::testsmodule runs 90/90.core::observabilitymodule; not a tracked feature row in `docs/TEST-COVERAGE-MATRIX.md`.Impact
socket::ws_loop:188now classifies asExpectedErrorKind::NetworkUnreachablefor HTTP/2+ handshake responses, demoting totracing::warn!(no Sentry event). No behavioural change to the supervisor's restart loop, the WS reconnect logic, orhealth.busescalation.core-rustproject for users behind HTTP/2-only edges. ~2 events / 24h today, but the count is bound to grow as more proxies default to HTTP/2.tracing::warn!breadcrumb in the local log, never as a Sentry event. The supervisor's per-attempt logs (below threshold and after threshold) are unchanged and remain available for local diagnosis.Notes for reviewers
fix/observability-operation-timed-out): both PRs add a new substring to the sameis_network_unreachable_messageOR-chain. Either merge order produces a trivial 1-line concurrent edit; git will auto-merge.Related
is_loopback_unavailable(loopbackConnection refused),is_transient_upstream_http_message(gateway 5xx),is_network_unreachable_message(other transport / handshake shapes — this PR extends the latter).log_connection_failurefor the routing site.Summary by CodeRabbit
Bug Fixes
Tests