Skip to content

fix(observability): demote backend Cloudflare anti-bot wrap (Sentry TAURI-RUST-34H)#2692

Merged
senamakel merged 1 commit into
tinyhumansai:mainfrom
oxoxDev:fix/sentry-34h-cloudflare-antibot-wrap
May 28, 2026
Merged

fix(observability): demote backend Cloudflare anti-bot wrap (Sentry TAURI-RUST-34H)#2692
senamakel merged 1 commit into
tinyhumansai:mainfrom
oxoxDev:fix/sentry-34h-cloudflare-antibot-wrap

Conversation

@oxoxDev
Copy link
Copy Markdown
Contributor

@oxoxDev oxoxDev commented May 26, 2026

Summary

  • Demote backend-wrapped Cloudflare anti-bot challenge interstitials (Backend returned 500 … <title>Just a moment...</title> … cloudflare …) from Sentry errors to tracing::info! breadcrumbs via a new arm in is_provider_user_state_message.
  • Reuses existing ExpectedErrorKind::ProviderUserState info-tier — no new kind, no new dispatch arm.
  • Drops ~8.9 k events / 14 d on self-hosted tauri-rust (Sentry TAURI-RUST-34H); sibling IDs in the same cascade (-32G, -34J, -323) may also collapse under this fix.

Problem

Self-hosted Sentry's #5 unresolved issue by event count on tauri-rust (8,851 events / 14 d) is the wire shape:

Backend returned 500 Internal Server Error for GET https://api.tinyhumans.ai/agent-integrations/composio/connections: 403 <!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title>...Powered by Cloudflare...

This is the same cascade as BACKEND-NODEJS-2 (679 user impact — the Cloudflare 403 as seen by the backend itself). The composio backend endpoint's upstream is behind a Cloudflare zone with bot-challenge enabled and trips on cold caches / specific geos.

Two layers wrong:

  1. Backend root (separate follow-up in tinyhumansai/backend): IntegrationClient wraps the upstream CF 403 as a 500 and embeds the CF HTML body. Should propagate the 403 as a 403.
  2. Client classifier: a 500 escapes the 4xx-only is_backend_user_error_message. The body is unmistakably a Cloudflare anti-bot interstitial — Sentry has no remediation path (the CF challenge is keyed by the user's network reputation / geo / cookie state, not anything openhuman_core can act on).

Solution

Add a body-shape arm to is_provider_user_state_message in src/core/observability.rs that double-anchors on "just a moment..." AND "cloudflare". Both must be present to demote — half-anchors (daemon-restart spinner blurb, unrelated CF Workers footer) still reach Sentry.

The new arm sits in the existing is_provider_user_state_message chain so it routes through the established ExpectedErrorKind::ProviderUserState dispatch (info-tier breadcrumb only, no Sentry event). No new variant, no new dispatch arm.

Tests cover:

  • Positive (full wire shape): canonical TAURI-RUST-34H body with embedded HTML classifies as ExpectedErrorKind::ProviderUserState.
  • Positive (minimal): "Just a moment...\ncloudflare\n" classifies — guards against future renderings with stripped HTML / alternate caller wrappers.
  • Negative (half-anchor): "Just a moment, while we restart the daemon" alone OR "Powered by Cloudflare" alone does NOT classify.
  • Negative (real backend bug): "Backend returned 500 … database connection pool exhausted" still classifies as None (reaches Sentry as an actionable signal).

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy — 4 new tests in src/core/observability.rs covering both positive shapes (full wire + minimal) and both negative shapes (half-anchor + genuine 500).
  • Diff coverage ≥ 80% — every changed line in this PR is the new classifier arm (covered by the 4 new tests) or test code itself. Full pnpm test:coverage / pnpm test:rust not run locally; cargo test --lib core::observability::tests passes (92/92).
  • N/A: Coverage matrix updated — behaviour-only change in an observability classifier; no new user-facing feature row.
  • N/A: All affected feature IDs from the matrix are listed in the PR description under ## Related — same reason; no matrix rows touched.
  • No new external network dependencies introduced (mock backend used per Testing Strategy) — pure in-process classifier change, no network surface.
  • N/A: Manual smoke checklist updated if this touches release-cut surfaces — no UI / RPC surface change; the only visible effect is Sentry event volume.
  • N/A: Linked issue closed via Closes \#NNN in the ## Related section — Sentry-only fix, no GitHub issue.

Impact

  • Runtime/platform: desktop (all OSes). No mobile/web/CLI surface change.
  • Performance: zero — one extra substring check inside an already-cold error-classification path that only runs on the unexpected-error route.
  • Security: zero — does not silence any actionable error; the CF interstitial is keyed by network / geo / cookie state with no remediation path inside openhuman_core.
  • Migration / compatibility: none.
  • Sentry signal: drops ~8.9 k events / 14 d on self-hosted tauri-rust (TAURI-RUST-34H). Sibling IDs -32G, -34J, -323 are believed to share the same cascade and may also collapse under this fix — not verified in this PR; tracked in retrospect.

Related

Sentry-Issue: TAURI-RUST-34H

  • Believed-similar (same cascade, not verified): TAURI-RUST-32G, TAURI-RUST-34J, TAURI-RUST-323
  • Backend cascade twin: BACKEND-NODEJS-2 (679 user impact — same CF 403, raw at the backend layer)

Follow-ups:

  • Root fix belongs in tinyhumansai/backend: IntegrationClient should propagate Cloudflare's 403 as 403, not wrap it as 500. Tracked as a separate PR against the backend repo.
  • Phase 7 Step 4.5 grep regex needs widening to accept the TAURI-RUST-* prefix (current regex is anchored to OPENHUMAN-(TAURI|REACT|CORE)). Without it, the post-merge Sentry-resolve sweep will skip self-hosted tauri-rust IDs.

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

  • Key: N/A — Sentry-only fix, no Linear ticket.
  • URL: N/A

Commit & Branch

  • Branch: fix/sentry-34h-cloudflare-antibot-wrap
  • Commit SHA: 889744797ffb4803782c170c524bba17c9016c6c

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — Rust-only change, no frontend touched.
  • N/A: pnpm typecheck — Rust-only change.
  • Focused tests: cargo test --lib core::observability::tests — 92 passed / 0 failed, including 4 new TAURI-RUST-34H tests.
  • Rust fmt/check (if changed): cargo fmt --check clean; cargo check --manifest-path Cargo.toml clean; cargo clippy --lib --manifest-path Cargo.toml = 168 warnings (matches main baseline, no new warnings introduced).
  • N/A: Tauri fmt/check (if changed) — app/src-tauri not touched.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: Backend-wrapped Cloudflare anti-bot interstitials (HTTP 500 body containing both "just a moment..." and "cloudflare") are demoted from Sentry errors to tracing::info! breadcrumbs.
  • User-visible effect: None on the desktop client (the call still fails; the UI surfaces its existing error toast); Sentry-side, ~8.9 k events / 14 d disappear from the tauri-rust project's "what's burning" view.

Parity Contract

  • Legacy behavior preserved: All existing arms in is_provider_user_state_message left untouched; the new arm is appended at the end of the chain so existing matches keep their precedence. Existing is_backend_user_error_message 4xx coverage unchanged (the new arm only catches 500-wrapped CF bodies that were never in scope of the 4xx classifier).
  • Guard/fallback/dispatch parity checks: Negative tests pin both half-anchors and the genuine-backend-500 surface so a future narrowing or broadening of the matcher surfaces here instead of silently re-bucketing.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • Bug Fixes

    • Improved error classification for Cloudflare anti-bot interstitial messages, now treated as expected errors to reduce noise in error reporting.
  • Tests

    • Added comprehensive test cases for error handling including Cloudflare interstitials, minimal payloads, and edge cases to ensure accurate error discrimination.

Review Change Stack

…AURI-RUST-34H)

The composio backend endpoint (e.g. `/agent-integrations/composio/connections`)
wraps an upstream Cloudflare anti-bot challenge as `Backend returned 500
Internal Server Error … 403 <!DOCTYPE html>…<title>Just a moment...</title>…`.
The 500 escapes the 4xx-only `is_backend_user_error_message` classifier and
floods Sentry with ~8.9k events / 14d on self-hosted `tauri-rust`
(TAURI-RUST-34H, sibling -32G / -34J / -323 share the same cascade).

The CF interstitial is keyed by the user's network reputation / geo / cookie
state — there is nothing in `openhuman_core` that can act on it. Backend ops
or the user's network is the remediation path; Sentry has no signal.

Add a double-anchor body-shape arm to `is_provider_user_state_message`:
`"just a moment..."` AND `"cloudflare"` must both be present to demote.
The double-anchor avoids colliding with unrelated bodies that merely mention
either phrase in a different context.

Tests:
- Positive: canonical TAURI-RUST-34H wire shape (with full HTML body)
  classifies as `ExpectedErrorKind::ProviderUserState`.
- Positive: minimal `"Just a moment...\ncloudflare\n"` body classifies.
- Negative: half-anchor only (`"Just a moment, while we restart..."` or
  `"Powered by Cloudflare"` alone) does NOT classify.
- Negative: genuine backend 500 (`"database connection pool exhausted"`)
  does NOT classify — stays a real backend bug that reaches Sentry.

Sentry-Issue: TAURI-RUST-34H

The real root fix belongs in `tinyhumansai/backend`: the IntegrationClient
should propagate Cloudflare's 403 as 403, not wrap it as 500. That follow-up
will be tracked separately.
@oxoxDev oxoxDev requested a review from a team May 26, 2026 10:51
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

📝 Walkthrough

Walkthrough

Error classification is extended to recognize Cloudflare anti-bot interstitial bodies as expected errors. A double-anchor check for both "just a moment..." and "cloudflare" tokens in lowercased messages demotes those errors from Sentry capture. Comprehensive tests cover full HTML, minimal bodies, single-anchor discrimination, and non-Cloudflare backend failures.

Changes

Cloudflare Anti-Bot Error Demotion

Layer / File(s) Summary
Cloudflare anti-bot classification logic
src/core/observability.rs
is_provider_user_state_message now classifies errors containing both "just a moment..." and "cloudflare" tokens (case-insensitive) as ProviderUserState, demoting them from Sentry capture via the expected error reporting path.
Cloudflare classification test suite
src/core/observability.rs
Four test cases verify the double-anchor logic: full backend-wrapped Cloudflare HTML, minimal "Just a moment...\ncloudflare\n" bodies, rejection when only one anchor is present, and confirmation that genuine backend 500 errors without Cloudflare content are not demoted.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2090: Both extend is_provider_user_state_message to demote specific provider error patterns (Cloudflare anti-bot vs Kimi access_terminated 403).
  • tinyhumansai/openhuman#2239: Both demote provider-derived error messages from Sentry capture via ExpectedErrorKind classification (Cloudflare anti-bot vs provider config-rejection).
  • tinyhumansai/openhuman#1795: Both extend ProviderUserState classification logic to recognize provider-shaped error bodies and envelopes.

Suggested labels

bug

Suggested reviewers

  • graycyrus
  • senamakel

Poem

A Cloudflare wall blocks the weary, with just a moment's plea,
No more shall Sentry capture sighs of anti-bot majesty,
Double anchors demote the interstitial cry,
Tests confirm the logic—no false alarms nearby. 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: detecting and demoting backend-wrapped Cloudflare anti-bot messages to prevent Sentry noise.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the bug label May 26, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/core/observability.rs (1)

2354-2377: 💤 Low value

Consider adding a true half-anchor test for the first anchor.

half_a uses a comma ("Just a moment,") rather than three dots, so it doesn't actually contain the anchor "just a moment...". This tests a "neither anchor" scenario rather than a true half-anchor case. Consider adding a message that contains the exact anchor to verify the double-anchor logic:

// True half-anchor: has "just a moment..." but lacks "cloudflare"
let true_half_a = "Just a moment... while we check your connection";
assert_ne!(
    expected_error_kind(true_half_a),
    Some(ExpectedErrorKind::ProviderUserState),
    "`Just a moment...` alone (no `cloudflare`) must NOT match the CF anti-bot arm"
);

The current test is still valid for real-world scenarios (similar-sounding phrases), but the additional case would ensure the double-anchor conjunction is explicitly verified.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/observability.rs` around lines 2354 - 2377, In the test
does_not_classify_half_anchor_cloudflare_messages_as_user_state add a true
half-anchor case: create a new string (e.g., true_half_a) that contains the
exact anchor "Just a moment..." but omits "cloudflare", then call
expected_error_kind(true_half_a) and assert_ne it against
Some(ExpectedErrorKind::ProviderUserState) to ensure the double-anchor logic
doesn't match when only the first anchor is present; keep the same assertion
style as the existing half_a/half_b checks using expected_error_kind and
ExpectedErrorKind::ProviderUserState.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/core/observability.rs`:
- Around line 2354-2377: In the test
does_not_classify_half_anchor_cloudflare_messages_as_user_state add a true
half-anchor case: create a new string (e.g., true_half_a) that contains the
exact anchor "Just a moment..." but omits "cloudflare", then call
expected_error_kind(true_half_a) and assert_ne it against
Some(ExpectedErrorKind::ProviderUserState) to ensure the double-anchor logic
doesn't match when only the first anchor is present; keep the same assertion
style as the existing half_a/half_b checks using expected_error_kind and
ExpectedErrorKind::ProviderUserState.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4b001216-6f6c-4058-b16c-856745313afe

📥 Commits

Reviewing files that changed from the base of the PR and between b616f48 and 8897447.

📒 Files selected for processing (1)
  • src/core/observability.rs

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, focused fix — double-anchored body-shape matcher for the CF anti-bot interstitial is the right approach. Tests are solid (canonical wire shape, minimal body, both half-anchor negatives, genuine 500 negative). No new dispatch arms needed, existing precedence preserved.

Good call on the follow-up note about the backend root fix (propagating 403 instead of wrapping as 500) — that's where the real fix belongs long-term.

@graycyrus
Copy link
Copy Markdown
Contributor

@sanil-23 check on the CI checks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants