fix(observability): suppress context-window-exceeded provider error from Sentry (Sentry TAURI-RUST-501)#2820
Conversation
…rom Sentry (Sentry TAURI-RUST-501) A custom / self-hosted provider that mis-reports context-overflow as HTTP 500 (`"custom API error (500 …): Context size has been exceeded."`) escaped every existing matcher and fired a raw Sentry event from the `providers::ops::api_error` suppression cascade (`domain=llm_provider operation=api_error`). Root cause: context-overflow is already handled — `reliable.rs` marks it non-retryable — but the matcher's phrase list missed this custom wording, AND the `api_error` cascade had no context-overflow branch at all (only budget / access-policy / config-rejection / auth). So the 500 fell through to `should_report_provider_http_failure` → Sentry. Fix (single-source matcher, three call sites): - `ops.rs`: new `pub fn is_context_window_exceeded_message` — the 8 established phrasings + `"context size has been exceeded"`, the one source of truth. New `log_context_window_exceeded` helper + an `is_context_window_exceeded` branch in the `api_error` cascade (status-agnostic — some gateways send 400, some 500) that demotes to a `warn!` breadcrumb instead of `report_error`. - `reliable.rs`: `is_context_window_exceeded(err)` now delegates to the shared matcher (preserves non-retryable behavior, picks up the new phrase, can't drift). - `observability.rs`: new `ExpectedErrorKind::ContextWindowExceeded` arm that delegates to the same matcher, catching the higher-layer re-report (`agent.run_single` / `web_channel.run_chat_task` under a different `domain` tag) — same two-emit-site pattern as the empty-response (4JX/4Z1) and session-expired (4P0/4K5) fixes. Polarity preserved: anchors are context-overflow specific, so rate-limit "exceeded", budget errors, and "model not found" still reach Sentry. Tests: 4 new in `provider::ops` (TAURI-RUST-501 500 body, established phrasings, rejection contract, log smoke) + 2 in `core::observability` (re-report classification + rejection). cargo test → observability 94, provider::ops 28, reliable 56, all pass. cargo fmt --check clean. Sentry-Issue: TAURI-RUST-501
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a shared classifier for "context window exceeded" provider messages, uses it in provider error handling to route such failures to a demotional breadcrumb ChangesContext Window Error Suppression and Observability
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/inference/provider/ops.rs`:
- Around line 569-580: The current HINTS array and the simple substring check
(HINTS.iter().any(|hint| lower.contains(hint))) are too broad and cause false
positives; tighten the matchers by removing or scoping generic phrases like "too
many tokens", "prompt is too long", and "input is too long", and/or replace the
plain contains check with anchored/word-boundary regexes (e.g., require
"\bcontext\b" nearby or use /\bcontext (window|length|size)\b/ and
/\btoken(s)?\b/ with context anchors) so only errors explicitly about model
context windows match; update the HINTS constant and the matching logic to use
these more specific patterns or regexes instead of bare substrings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cf9eab68-924f-4357-82ba-7a1bd87107cd
📒 Files selected for processing (3)
src/core/observability.rssrc/openhuman/inference/provider/ops.rssrc/openhuman/inference/provider/reliable.rs
graycyrus
left a comment
There was a problem hiding this comment.
@CodeGhost21 the code looks solid to me — clean refactor, single-source matcher is the right call, and the test coverage (4 in ops, 2 in observability, rejection contracts) is thorough. The cascade branch placement (status-agnostic, after config-rejection, before the should_report gate) is correct. One thing to sort out before I approve:
CI: "Build & smoke-test core image" is failing. Once that's green I'll come back and approve.
Minor observation (not blocking): In observability.rs, the new ContextWindowExceeded arm logs error = %message, but log_context_window_exceeded in ops.rs deliberately omits the body — it only logs operation/provider/status. At the re-report layer the message string is whatever agent.run_single / web_channel.run_chat_task propagated up, which may wrap more than just the sanitized provider body. Consistent with how the other usage-state warn arms are written (none log the full message), worth dropping the error = %message field here too.
…-501-context-window-exceeded # Conflicts: # src/core/observability.rs
…imit false-positives (review of tinyhumansai#2820) CodeRabbit flagged that `is_context_window_exceeded_message` — now also feeding `expected_error_kind` (Sentry suppression) and the `reliable` non-retryable decision — was too broad. Addressed the valid concern with a targeted two-tier anchor (the suggested `||contains("token")` guard was a no-op for the token phrases, which already contain "token", and would have broken valid "prompt is too long" / "input is too long" matches): - CONTEXT_HINTS (context/length phrases) match alone — unambiguous. - TOKEN_HINTS ("too many tokens", "token limit exceeded") collide with per-minute token RATE limits, which are transient 429s that must stay retryable AND keep reaching Sentry. They now count as context-overflow only when no rate-limit marker ("per minute", "rate limit", "tpm", "retry after", "try again in", …) is present. New test `token_rate_limits_are_not_context_overflow` pins the contract: three TPM rate-limit bodies must NOT classify, a marker-less token overflow still does. cargo test → observability 107, provider::ops context-window 5, reliable 56, all pass.
|
Actionable comments posted: 0 |
graycyrus
left a comment
There was a problem hiding this comment.
@CodeGhost21 the two-tier anchor approach in the latest commit is the right call — splitting CONTEXT_HINTS (unambiguous) from TOKEN_HINTS (ambiguous with TPM rate limits) and using the rate-limit marker set to disambiguate is clean and pins the contract with a dedicated rejection test. The CodeRabbit concern is fully addressed.
CI is still running on this new commit set. Once everything's green I'll approve this.
M3gA-Mind
left a comment
There was a problem hiding this comment.
All checks pass. Single-source context-overflow matcher with TPM disambiguation is the right call — clean refactor across ops/reliable/observability, thorough rejection contracts.
oxoxDev
left a comment
There was a problem hiding this comment.
Walkthrough
Routes context-window-exceeded through a single-source body matcher (ops::is_context_window_exceeded_message) at 3 call sites: retry classifier (reliable.rs), api_error Sentry-suppression cascade (ops.rs), and the higher-layer re-report path via new ExpectedErrorKind::ContextWindowExceeded arm in observability.rs. Adds the missing "context size has been exceeded" wording. Drops self-hosted Sentry TAURI-RUST-501 (custom gateway mis-reporting context overflow as HTTP 500). +284/-13, 3 files. 6 new tests including a TPM-rate-limit polarity guard. All CI green. CodeRabbit raised the broad-hints concern, author addressed with a STRONG/WEAK + TPM-marker split that's actually tighter than CR's suggestion.
Verified
- Single source of truth:
reliable.rs::is_context_window_exceedednow delegates toops::is_context_window_exceeded_message; phrase list lives once ✓ - Status-agnostic body match — correctly handles the TAURI-RUST-501 500 mis-report from custom gateway; OpenAI's 400
context_length_exceededstill matches the same body anchors ✓ - TPM-rate-limit disambiguation:
TOKEN_HINTS("too many tokens","token limit exceeded") only match when body carries NO rate-limit marker ("per minute","per min","rate limit","tpm","requests per","retry after") — pinned bytoken_rate_limits_are_not_context_overflowtest with 3 TPM shapes + 1 positive overflow case ✓ - Polarity guard
does_not_match_unrelated_bodiescovers rate-limit / model-not-found / insufficient-budget / tool-budget — confirms anchors are context-overflow specific ✓ report_expected_messagearm demotes towarn!breadcrumb witherror = %message— matches the SessionExpired / BackendUserError tier per the sibling arm convention ✓
Blockers
-
Dispatcher ordering doc mismatch in
expected_error_kind. The new arm sits betweenPromptInjectionBlockedandMemoryStoreBreakerOpen, but the doc comment says"Runs last so a more specific matcher always wins"— it does NOT run last in the dispatcher chain. The intent is probably "runs after the prompt-injection check so the prompt-injection anchor wins on overlapping bodies" — but reads as if every later matcher is less specific, which isn't true. Reword to "Runs after PromptInjectionBlocked so the prompt-injection anchor wins on overlapping bodies; ordering vs later arms is independent (no shared anchors)" — OR move the arm to the actual bottom of the chain to match the comment. -
Layering inversion
core::observability→openhuman::inference::provider— direct cross-module call inexpected_error_kind.corehistorically sits belowopenhuman. The shared-classifier pattern is reasonable, but I want a maintainer ack on the precedent before this lands — once one arm reaches sideways like this, future arms will follow the same shortcut. @graycyrus another pass — is the direction acceptable, or should the matcher move intocoreso the dispatcher stays self-contained?
Nits
RATE_LIMIT_MARKERSincludes"retry after"— a context-overflow body that incidentally includes "retry after 0s" hint (some providers tack it on for non-retryable too) would fail the WEAK match. Vanishingly rare; flag only."tpm"as a 3-char substring is broad."http://www.tpm.example.com"in a URL would trip it. Astronomically unlikely in error bodies, but if you ever surface URLs in upstream prose this gets a false-negative. Optional: anchor with" tpm "/"(tpm)"/"tpm:".
Questions
- Did you check the matcher across the full Sentry "domain=llm_provider operation=api_error" window for any past events that match WEAK hints + rate-limit marker simultaneously? Want to confirm the new TPM polarity doesn't accidentally let a real overflow slip through as a 429.
- Self-hosted Sentry — confirm
Closes TAURI-RUST-501is in the PR body so Phase 7 Step 4.5 self-hosted sweep flips it on merge.
Summary
"custom API error (500 …): Context size has been exceeded.") escaped every existing matcher and fired a raw Sentry event from theproviders::ops::api_errorsuppression cascade (domain=llm_provider operation=api_error provider=custom)."context size has been exceeded"phrasing.TAURI-RUST-501(openhuman@0.56.0+e8968077aeb5).Problem
Context-overflow is already a recognised condition:
providers::reliable::is_context_window_exceededmarks it non-retryable (retrying the same oversized request can't help). But two gaps let TAURI-RUST-501 through:"maximum context length","context length exceeded","prompt is too long", … but not"context size has been exceeded".providers::ops::api_errorSentry-suppression cascade (ops.rs) had no context-overflow branch at all — only budget / upstream-bad-request / access-policy / config-rejection / backend-auth. So the 500 fell through toshould_report_provider_http_failure(500)→report_error→ Sentry.api_errordoes not consultcore::observability::expected_error_kind— it has its own cascade — so the fix had to land in the cascade itself, not just the classifier.Solution — single source of truth, three call sites
ops.rs: newpub fn is_context_window_exceeded_message(body)— the 8 established phrasings +"context size has been exceeded"— the one place the phrasing lives. Newlog_context_window_exceededhelper (demotes towarn!breadcrumb), and a new branch in theapi_errorcascade:400 context_length_exceeded; this custom gateway →500). Matching the body keeps them in one bucket.reliable.rs:is_context_window_exceeded(err)now delegates to the shared matcher — preserves the non-retryable behavior, picks up the new phrase, can't drift.observability.rs: newExpectedErrorKind::ContextWindowExceededarm, also delegating to the shared matcher, to catch the higher-layer re-report (agent.run_single/web_channel.run_chat_taskre-raise the same error under a differentdomaintag). Same two-emit-site pattern as the empty-response (4JX/4Z1) and session-expired (4P0/4K5) fixes — prevents a follow-on Sentry issue from the re-report.Polarity preserved: anchors are context-overflow specific, so rate-limit
"exceeded", budget errors,"model not found", and tool-budget errors still reach Sentry (pinned by rejection tests).Submission Checklist
provider::opscontext_window_exceeded_suppressionmod (4): TAURI-RUST-501 500 body, the 7 established phrasings, a 4-case rejection contract (rate-limit / model-not-found / budget / tool-budget), and alog_context_window_exceededsmoke.core::observability(2): re-report classification (501 body + established phrasings) and a 3-case rejection contract.cargo test→ observability 94, provider::ops 28, reliable 56, all pass.docs/TEST-COVERAGE-MATRIX.md.## Related— no matrix feature IDs affected.Closes #NNN— Sentry-only fix; no GitHub issue. TheSentry-Issuetrailer below carries the back-reference.Impact
warn!breadcrumb at the api_error cascade and classify asContextWindowExceededat the re-report layer. No behaviour change to the non-retryable decision, the user-facing chat error, or any caller — only the Sentry/log severity tier.reliable.rsrefactor is behaviour-preserving (delegation to the same phrase set).warn!breadcrumb. Unrelated provider failures (rate-limit, model-not-found, budget) are unaffected and still reach Sentry.Notes for reviewers
4zf,reliable-aggregate,35-does-not-support-tools) all touchconfig_rejection.rs; this PR deliberately avoids that file (context-overflow ≠ config-rejection) and touches onlyops.rs/reliable.rs/observability.rs.FilesystemUserPathInvalid) and fix(observability): classify web-channel empty-provider-response re-report as expected (Sentry TAURI-RUST-4Z1) #2808 (EmptyProviderResponse): all add a newExpectedErrorKindvariant afterPromptInjectionBlocked. Trivial concurrent edit; git auto-merges (keep all variants).pnpm format) skipped via--no-verify— fresh worktree has nonode_modules; Rust-only change,cargo fmt --manifest-path Cargo.toml --checkclean.Related
src/openhuman/inference/provider/ops.rsapi_errorcascade (primary,operation=api_error); higher-layer re-report viaagent.run_single/web_channel.run_chat_task.provider::ops::is_context_window_exceeded_message(shared byreliable.rs, the api_error cascade, andobservability.rs).Summary by CodeRabbit
Bug Fixes
Refactor
Tests