Skip to content

fix(observability): demote "Config loading timed out" out of Sentry#2920

Merged
graycyrus merged 4 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/sentry-config-load-timed-out
May 29, 2026
Merged

fix(observability): demote "Config loading timed out" out of Sentry#2920
graycyrus merged 4 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/sentry-config-load-timed-out

Conversation

@YellowSnnowmann
Copy link
Copy Markdown
Contributor

@YellowSnnowmann YellowSnnowmann commented May 29, 2026

Summary

  • Classify the "Config loading timed out" wrapper string as an expected, non-actionable error so it no longer fires Sentry events.
  • Add ExpectedErrorKind::ConfigLoadTimedOut variant + is_config_load_timed_out_message matcher, wired into the central expected_error_kind dispatcher.
  • Demote the condition to warn! (breadcrumb only) in report_expected_message, matching the existing DiskFull treatment.
  • Add unit tests covering the canonical string, the RPC-wrapped shape, and negative cases (network/HTTP/cron timeouts must NOT match).

Problem

Sentry issue TAURI-RUST-5X (Config loading timed out, ~318 events, ongoing across openhuman@0.54.0 and openhuman@0.56.0) is pure noise. The string originates from load_config_with_timeout / reload_config_snapshot_with_timeout in src/openhuman/config/ops.rs:48,70 when tokio::time::timeout(CONFIG_LOAD_TIMEOUT, …) elapses around Config::load_or_init.

The timeout fires when the user's local filesystem stalls — cloud-sync daemons (iCloud Drive / OneDrive) holding ~/.openhuman, antivirus scanning the config file on open, a nearly-full or network-mounted home directory. None of these are operator-actionable; Sentry has no remediation path. Events surface across many RPC methods (openhuman.config_get, inference_status, local_ai_downloads_progress, …), all wrapped by rpc.invoke_methodreport_error_or_expected.

Solution

Followed the established ExpectedErrorKind pattern in src/core/observability.rs:

  • New enum variant ConfigLoadTimedOut with a doc-comment citing the emit sites and TAURI-RUST-5X.
  • is_config_load_timed_out_message anchors on the full lowercase phrase "config loading timed out" — substring match so the RPC-wrapped form still classifies, but tight enough that bare "… timed out" / "operation timed out" (network) keep flowing to Sentry.
  • Matcher slotted after is_disk_full_message and before is_memory_store_pii_rejection in expected_error_kind.
  • Demote arm emits tracing::warn! (breadcrumb survives for local diagnosis; no Sentry error event).

No behavior change beyond Sentry routing: the error string is unchanged, still returned to the caller, still surfaced in the UI. Classification is the only effect.

Submission Checklist

  • Tests added or updated — classifies_config_load_timed_out (happy + RPC-wrapped) and does_not_classify_unrelated_timeouts_as_config_load_timed_out (network/HTTP/cron negatives).
  • Diff coverage ≥ 80% — Rust cargo-llvm-cov; changed lines are matcher + dispatch arm, all exercised by the new tests.
  • Coverage matrix updated — N/A: observability-only routing change, no feature row.
  • All affected feature IDs listed — N/A: no matrix feature touched.
  • No new external network dependencies — N/A: classifier-only change.
  • Manual smoke checklist — N/A: no release-cut surface touched.
  • Linked issue closed — Sentry issue, no GitHub issue.

Impact

  • Platform: desktop core (Rust). No UI, schema, or RPC contract change.
  • Performance: one extra substring check per already-failing RPC error path — negligible.
  • Security/migration/compat: none. The returned error string is identical; only Sentry severity routing changes.

Related

Summary by CodeRabbit

  • Bug Fixes

    • Classify "config loading timed out" messages as expected warnings (now recorded as a warning breadcrumb rather than a full error event).
    • Simplified handling of session/invalid-token messages to streamline classification and avoid mislabeling.
  • Tests

    • Added tests ensuring the new config-load timeout phrase (including wrapped forms) is recognized and that other timeout/gateway messages are not misclassified.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3badb1e0-e9a0-4db7-b557-4e1311da4755

📥 Commits

Reviewing files that changed from the base of the PR and between 1c7f18b and f481ef8.

📒 Files selected for processing (1)
  • src/core/observability.rs

📝 Walkthrough

Walkthrough

Adds ExpectedErrorKind::ConfigLoadTimedOut and a matcher for the case-insensitive literal "config loading timed out", wires it into expected_error_kind and report_expected_message (emit tracing::warn! breadcrumb), condenses a session-expiry matcher expression, and adds unit tests for positive and negative classification cases.

Changes

Config Load Timeout Observability Classification

Layer / File(s) Summary
Config timeout variant and matcher
src/core/observability.rs
Adds ExpectedErrorKind::ConfigLoadTimedOut enum variant and implements is_config_load_timed_out_message() to detect the exact phrase "config loading timed out".
Classification and reporting integration
src/core/observability.rs
Wires the matcher into expected_error_kind to return ConfigLoadTimedOut and adds a report_expected_message arm that emits a tracing::warn! breadcrumb tagged kind = "config_load_timed_out".
Session matcher condense & comments
src/core/observability.rs
Condenses boolean expression in is_session_expired_message and applies non-functional comment/doc reflows around updater transient phrase matching and the enum region.
Tests
src/core/observability.rs
Adds unit tests: positive case for "Config loading timed out" (and RPC-wrapped form) and negative cases ensuring other timeout/gateway/supervisor messages are not classified as ConfigLoadTimedOut.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2808: Similar extensions to src/core/observability.rs adding message predicates and breadcrumb-only handling for new ExpectedErrorKind variants.
  • tinyhumansai/openhuman#2928: Related modifications to expected_error_kind classification and tests that may interact with this change.
  • tinyhumansai/openhuman#2830: Other observability classifier updates introducing different expected-error variants.

Suggested reviewers

  • graycyrus
  • oxoxDev

Poem

🐰 A tiny hop for code so bright,
I sniffed the timeout in the night.
"Config loading timed out" — I knew,
I whispered warn, not Sentry's view.
Tests secure the path I drew.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: demotion of 'Config loading timed out' errors out of Sentry error capture.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review May 29, 2026 09:46
@YellowSnnowmann YellowSnnowmann requested a review from a team May 29, 2026 09:46
@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage working A PR that is being worked on by the team. labels May 29, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 29, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 29, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YellowSnnowmann the code looks good to me — tight matcher anchored on the exact canonical phrase, proper dispatch placement relative to DiskFull/MemoryStorePiiRejection, and solid test coverage (positive + RPC-wrapped positive + three meaningful negatives). CI is still pending so I'll hold the formal approval until those are green, but I don't expect any blockers there. A couple minor things while I had the diff open:

Missing doc comment on ConfigLoadTimedOut

Your PR description explicitly says the new variant includes a doc-comment citing the emit sites and TAURI-RUST-5X, but the diff just shows a bare ConfigLoadTimedOut, with no preceding /// block. Every other variant in the enum has one — ChannelSupervisorRestart, DiskFull, MemoryStorePiiRejection, etc. all document the Sentry issue ID, event count, and affected emit sites. Worth adding for consistency and so the next maintainer doesn't have to dig through git blame to understand why this is expected.

Bundled doc comment deletions

The PR deletes substantial doc comment blocks from LoopbackUnavailable, PromptInjectionBlocked, ContextWindowExceeded, the UPDATER_TRANSIENT_MESSAGE_PHRASES const, and the three is_session_expired_message arms. Those comments carried Sentry issue IDs (OPENHUMAN-TAURI-R5/R6, OPENHUMAN-TAURI-140, TAURI-RUST-501, TAURI-RUST-CD, 4P0, 4K5, 1EE), event counts, and the specific reasoning for conjunctive vs. substring anchoring. None of that is in git history in a way that's easy to surface later. The cleanup makes the code shorter but strips future-maintainer context. If the goal is readability, the issue IDs and emit-site references are the parts worth keeping even if the prose gets trimmed.

Neither of these blocks the merge — the classifier logic itself is correct and the fix is exactly the right call for TAURI-RUST-5X. Once CI goes green I'll approve.

Copy link
Copy Markdown
Contributor

@oxoxDev oxoxDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new matcher itself is clean — is_config_load_timed_out_message is tightly anchored on the literal "config loading timed out" phrase, and the negative tests (network/HTTP timeouts, bare "timed out") are solid. But the diff has a blocking side effect.

Blocker — large collateral deletion of existing doc comments (stale base)

Beyond adding the ConfigLoadTimedOut arm, this diff deletes ~80 lines of doc comments and Sentry-ID annotations that other merged PRs added:

  • UPDATER_TRANSIENT_MESSAGE_PHRASES — the tauri-plugin-updater rationale + Drops TAURI-RUST-CD note
  • LoopbackUnavailable — the OPENHUMAN-TAURI-R5/R6 annotation
  • PromptInjectionBlocked — the OPENHUMAN-TAURI-140 rationale
  • ContextWindowExceeded — the full single-source-matcher explanation + TAURI-RUST-501 note
  • is_session_expired_message — the per-arm 4P0 / 4K5 / 1EE anchor comments (logic preserved, but the explanatory blocks are stripped)

I confirmed those comments still exist on current main, so merging this as-is would revert that documentation. This looks like a stale base / bad rebase rather than an intentional change.

Please rebase on the latest tinyhumansai/openhuman:main and re-push so the diff contains only the ConfigLoadTimedOut addition. Once it's just the matcher + tests, this is a straightforward approve.

@graycyrus graycyrus merged commit 1144274 into tinyhumansai:main May 29, 2026
35 of 57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants