fix(observability): demote backend Cloudflare anti-bot wrap (Sentry TAURI-RUST-34H) #2692
…AURI-RUST-34H)

The composio backend endpoint (e.g. `/agent-integrations/composio/connections`) wraps an upstream Cloudflare anti-bot challenge as `Backend returned 500 Internal Server Error … 403 <!DOCTYPE html>…<title>Just a moment...</title>…`. The 500 escapes the 4xx-only `is_backend_user_error_message` classifier and floods Sentry with ~8.9k events / 14d on self-hosted `tauri-rust` (TAURI-RUST-34H; siblings -32G / -34J / -323 share the same cascade).

The CF interstitial is keyed by the user's network reputation / geo / cookie state, so there is nothing in `openhuman_core` that can act on it. Backend ops or the user's network is the remediation path; Sentry has no signal.

Add a double-anchor body-shape arm to `is_provider_user_state_message`: `"just a moment..."` AND `"cloudflare"` must both be present to demote. The double-anchor avoids colliding with unrelated bodies that merely mention either phrase in a different context.

Tests:
- Positive: canonical TAURI-RUST-34H wire shape (with full HTML body) classifies as `ExpectedErrorKind::ProviderUserState`.
- Positive: minimal `"Just a moment...\ncloudflare\n"` body classifies.
- Negative: half-anchor only (`"Just a moment, while we restart..."` or `"Powered by Cloudflare"` alone) does NOT classify.
- Negative: genuine backend 500 (`"database connection pool exhausted"`) does NOT classify; it stays a real backend bug that reaches Sentry.

Sentry-Issue: TAURI-RUST-34H

The real root fix belongs in `tinyhumansai/backend`: the IntegrationClient should propagate Cloudflare's 403 as 403, not wrap it as 500. That follow-up will be tracked separately.
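The double-anchor arm described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from `src/core/observability.rs`: the helper name `is_cloudflare_antibot_body` is hypothetical, the `ExpectedErrorKind` enum is a one-variant stand-in, and the real `expected_error_kind` has additional arms that are not modeled here.

```rust
/// Hypothetical stand-in for the crate's real enum; only the variant named
/// in this PR is modeled.
#[derive(Debug, PartialEq, Eq)]
pub enum ExpectedErrorKind {
    ProviderUserState,
}

/// Sketch of the double-anchor check (helper name is an assumption):
/// both tokens must appear in the lowercased message for it to count as a
/// Cloudflare anti-bot interstitial.
pub fn is_cloudflare_antibot_body(message: &str) -> bool {
    let lower = message.to_lowercase();
    lower.contains("just a moment...") && lower.contains("cloudflare")
}

/// Simplified classifier with a single arm. Anything that does not match
/// returns `None`, which in this sketch means "still reaches Sentry".
pub fn expected_error_kind(message: &str) -> Option<ExpectedErrorKind> {
    if is_cloudflare_antibot_body(message) {
        Some(ExpectedErrorKind::ProviderUserState)
    } else {
        None
    }
}
```

Lowercasing once before both `contains` calls keeps the match case-insensitive, so `Cloudflare` in an HTML footer and `cloudflare` in a script URL would both satisfy the second anchor under this sketch.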
📝 Walkthrough
Error classification is extended to recognize Cloudflare anti-bot interstitial bodies as expected errors. A double-anchor check for both "just a moment..." and "cloudflare" tokens in lowercased messages demotes those errors from Sentry capture. Comprehensive tests cover full HTML, minimal bodies, single-anchor discrimination, and non-Cloudflare backend failures.
Changes
Cloudflare Anti-Bot Error Demotion
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 5 passed
🧹 Nitpick comments (1)
src/core/observability.rs (1)
2354-2377: 💤 Low value

Consider adding a true half-anchor test for the first anchor.

`half_a` uses a comma (`"Just a moment,"`) rather than three dots, so it doesn't actually contain the anchor `"just a moment..."`. This tests a "neither anchor" scenario rather than a true half-anchor case. Consider adding a message that contains the exact anchor to verify the double-anchor logic:

```rust
// True half-anchor: has "just a moment..." but lacks "cloudflare"
let true_half_a = "Just a moment... while we check your connection";
assert_ne!(
    expected_error_kind(true_half_a),
    Some(ExpectedErrorKind::ProviderUserState),
    "`Just a moment...` alone (no `cloudflare`) must NOT match the CF anti-bot arm"
);
```

The current test is still valid for real-world scenarios (similar-sounding phrases), but the additional case would ensure the double-anchor conjunction is explicitly verified.
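To make the distinction between "neither anchor" and "true half-anchor" concrete, the small diagnostic below reports which anchors a message contains. It is only a sketch: `anchors_present` is a hypothetical helper that does not exist in the PR, and it assumes the arm lowercases the message before matching, as the walkthrough describes.

```rust
/// Hypothetical diagnostic helper (not part of the PR). Returns
/// (contains "just a moment...", contains "cloudflare") for the
/// lowercased message.
pub fn anchors_present(message: &str) -> (bool, bool) {
    let lower = message.to_lowercase();
    (
        lower.contains("just a moment..."),
        lower.contains("cloudflare"),
    )
}
```

Run against the existing `half_a` string, a helper like this would report that neither anchor is present, whereas the proposed `true_half_a` string carries the first anchor only, which is the case the conjunction needs to reject.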
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/core/observability.rs` around lines 2354 - 2377, In the test does_not_classify_half_anchor_cloudflare_messages_as_user_state add a true half-anchor case: create a new string (e.g., true_half_a) that contains the exact anchor "Just a moment..." but omits "cloudflare", then call expected_error_kind(true_half_a) and assert_ne it against Some(ExpectedErrorKind::ProviderUserState) to ensure the double-anchor logic doesn't match when only the first anchor is present; keep the same assertion style as the existing half_a/half_b checks using expected_error_kind and ExpectedErrorKind::ProviderUserState.
ℹ️ Review info
📒 Files selected for processing (1)
src/core/observability.rs
graycyrus left a comment
Clean, focused fix — double-anchored body-shape matcher for the CF anti-bot interstitial is the right approach. Tests are solid (canonical wire shape, minimal body, both half-anchor negatives, genuine 500 negative). No new dispatch arms needed, existing precedence preserved.
Good call on the follow-up note about the backend root fix (propagating 403 instead of wrapping as 500) — that's where the real fix belongs long-term.
@sanil-23 check on the CI checks
Summary
- Demotes the backend-wrapped Cloudflare anti-bot body (`Backend returned 500 … <title>Just a moment...</title> … cloudflare …`) from Sentry errors to `tracing::info!` breadcrumbs via a new arm in `is_provider_user_state_message`.
- Reuses the existing `ExpectedErrorKind::ProviderUserState` info-tier: no new kind, no new dispatch arm.
- … `tauri-rust` (Sentry TAURI-RUST-34H); sibling IDs in the same cascade (`-32G`, `-34J`, `-323`) may also collapse under this fix.

Problem
Self-hosted Sentry's #5 unresolved issue by event count on `tauri-rust` (8,851 events / 14 d) is the wire shape `Backend returned 500 … <title>Just a moment...</title> … cloudflare …`.

This is the same cascade as `BACKEND-NODEJS-2` (679 user impact; the Cloudflare 403 as seen by the backend itself). The composio backend endpoint's upstream is behind a Cloudflare zone with bot-challenge enabled and trips on cold caches / specific geos.

Two layers wrong:
- `tinyhumansai/backend`: `IntegrationClient` wraps the upstream CF 403 as a 500 and embeds the CF HTML body. Should propagate the 403 as a 403.
- The 500 escapes the 4xx-only `is_backend_user_error_message`. The body is unmistakably a Cloudflare anti-bot interstitial, and Sentry has no remediation path (the CF challenge is keyed by the user's network reputation / geo / cookie state, not anything `openhuman_core` can act on).

Solution
Add a body-shape arm to `is_provider_user_state_message` in `src/core/observability.rs` that double-anchors on `"just a moment..."` AND `"cloudflare"`. Both must be present to demote; half-anchors (daemon-restart spinner blurb, unrelated CF Workers footer) still reach Sentry.

The new arm sits in the existing `is_provider_user_state_message` chain so it routes through the established `ExpectedErrorKind::ProviderUserState` dispatch (info-tier breadcrumb only, no Sentry event). No new variant, no new dispatch arm.

Tests cover:
- The canonical TAURI-RUST-34H wire shape (with full HTML body) classifies as `ExpectedErrorKind::ProviderUserState`.
- Minimal `"Just a moment...\ncloudflare\n"` classifies, which guards against future renderings with stripped HTML / alternate caller wrappers.
- Half-anchor only: `"Just a moment, while we restart the daemon"` alone OR `"Powered by Cloudflare"` alone does NOT classify.
- Genuine backend 500: `"Backend returned 500 … database connection pool exhausted"` still classifies as `None` (reaches Sentry as an actionable signal).

Submission Checklist
- Tests added in `src/core/observability.rs` covering both positive shapes (full wire + minimal) and both negative shapes (half-anchor + genuine 500).
- `pnpm test:coverage` / `pnpm test:rust` not run locally; `cargo test --lib core::observability::tests` passes (92/92).
- … `## Related`: same reason; no matrix rows touched.
- … `Closes #NNN` in the `## Related` section: Sentry-only fix, no GitHub issue.

Impact
- … `openhuman_core`.
- … `tauri-rust` (TAURI-RUST-34H). Sibling IDs `-32G`, `-34J`, `-323` are believed to share the same cascade and may also collapse under this fix; this is not verified in this PR and is tracked in retrospect.

Related
Sentry-Issue: TAURI-RUST-34H

Follow-ups:
- `tinyhumansai/backend`: `IntegrationClient` should propagate Cloudflare's 403 as 403, not wrap it as 500. Tracked as a separate PR against the backend repo.
- … `TAURI-RUST-*` prefix (current regex is anchored to `OPENHUMAN-(TAURI|REACT|CORE)`). Without it, the post-merge Sentry-resolve sweep will skip self-hosted `tauri-rust` IDs.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch
- Branch: `fix/sentry-34h-cloudflare-antibot-wrap`
- Commit SHA: `889744797ffb4803782c170c524bba17c9016c6c`

Validation Run
- `pnpm --filter openhuman-app format:check`: Rust-only change, no frontend touched.
- `pnpm typecheck`: Rust-only change.
- `cargo test --lib core::observability::tests`: 92 passed / 0 failed, including 4 new TAURI-RUST-34H tests.
- `cargo fmt --check` clean; `cargo check --manifest-path Cargo.toml` clean; `cargo clippy --lib --manifest-path Cargo.toml` = 168 warnings (matches `main` baseline, no new warnings introduced).
- `app/src-tauri` not touched.

Validation Blocked
- `command:` N/A
- `error:` N/A
- `impact:` N/A

Behavior Changes
- Error messages carrying both anchors (`"just a moment..."` and `"cloudflare"`) are demoted from Sentry errors to `tracing::info!` breadcrumbs.
- … `tauri-rust` project's "what's burning" view.

Parity Contract
- Existing arms of `is_provider_user_state_message` left untouched; the new arm is appended at the end of the chain so existing matches keep their precedence. Existing `is_backend_user_error_message` 4xx coverage unchanged (the new arm only catches 500-wrapped CF bodies that were never in scope of the 4xx classifier).

Duplicate / Superseded PR Handling
Summary by CodeRabbit
Bug Fixes
Tests