[Bug] Safe retry failure notice does not explain provider response timeout #963

@Astro-Han

What happened?

When a model stream times out before the provider sends any output, PawWork currently surfaces the terminal state as a generic safe-retry failure notice:

Recovery failed. Try again later or switch models.

In Chinese this appears as:

恢复失败。你可以稍后再试,或换一个模型。

That wording is misleading for non-technical users. It says PawWork failed to recover, but it does not explain the actual situation: the selected model provider never started responding, PawWork waited and retried safely, and the user can resend or switch models.

This came up with xiaomi-token-plan-cn/mimo-v2.5-pro, while another provider/model in the same environment (opencode-go/deepseek-v4-pro) returned normally. The core failure is upstream/provider responsiveness, but the product bug is that PawWork's terminal UI does not make that understandable or actionable.

Related but separate: #943 tracks a grouping/visual-boundary issue where the terminal notice can look attached to reasoning output. This issue is about the failure copy, action buttons, and compact card design for the terminal state itself.

Which area seems affected?

UI or design system; Model harness, prompts, tools, or session mechanics

How much does this affect you?

Makes a workflow harder, but there is a workaround

Steps to reproduce

  1. Use PawWork with provider xiaomi-token-plan-cn and model mimo-v2.5-pro.
  2. Send a prompt at a time when the provider fails to produce the first streaming output frame (the failure is intermittent).
  3. Let PawWork run through its watchdog timeout and safe retry.
  4. Observe the final user-facing notice.

Observed from a session export:

  • App version: 0.0.0-prod-202605271646
  • Provider/model: xiaomi-token-plan-cn/mimo-v2.5-pro
  • User prompt: Hi
  • The assistant message had no text, reasoning, tool call, or visible output before failure.
  • PawWork attempted safe recovery once, for two attempts in total.
  • Both attempts saw no provider progress:
    • attempt 1: connect_timeout_ms = 60000, provider_progress_seen = false
    • attempt 2: connect_timeout_ms = 120000, provider_progress_seen = false
  • Final classification: external_stream_disconnect
  • Terminal cause: watchdog_timeout / connect
  • Stream error: LLM stream connection timed out after 120000ms without provider progress
  • Notice part written: type: "notice", kind: "safe_retry_failed"
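
To make the condition concrete, here is a minimal sketch of how the "no provider progress" case could be told apart from other safe-retry failures. The interface and function names are hypothetical; only the field values come from the session export above, and the real PawWork types may be shaped differently.

```typescript
// Hypothetical shapes for illustration only. Field values mirror the
// session export in this issue; the actual PawWork types may differ.
interface StreamAttempt {
  connect_timeout_ms: number;
  provider_progress_seen: boolean;
}

interface SafeRetryFailure {
  classification: string; // e.g. "external_stream_disconnect"
  terminalCause: string;  // e.g. "watchdog_timeout"
  terminalPhase: string;  // e.g. "connect"
  attempts: StreamAttempt[];
}

// Returns true when the watchdog fired during connect and no attempt
// ever saw output from the provider.
function isFirstOutputTimeout(failure: SafeRetryFailure): boolean {
  return (
    failure.terminalCause === "watchdog_timeout" &&
    failure.terminalPhase === "connect" &&
    failure.attempts.length > 0 &&
    failure.attempts.every((attempt) => !attempt.provider_progress_seen)
  );
}

// The failure observed in this report.
const observedFailure: SafeRetryFailure = {
  classification: "external_stream_disconnect",
  terminalCause: "watchdog_timeout",
  terminalPhase: "connect",
  attempts: [
    { connect_timeout_ms: 60000, provider_progress_seen: false },
    { connect_timeout_ms: 120000, provider_progress_seen: false },
  ],
};

const shouldShowTimeoutCard = isFirstOutputTimeout(observedFailure);
```

A check along these lines would let the notice renderer keep the generic copy for other safe_retry_failed causes and use the timeout-specific copy only here.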

What did you expect to happen?

The terminal UI should explain the user-visible problem directly and provide clear next actions.

Suggested Chinese copy:

  • Title: 响应超时
  • Description: 小米 MiMo 暂未开始回复。重新发送试试,或切换到其他模型继续。
  • Buttons: 重新发送 / 切换模型
  • Technical details: 模型:mimo-v2.5-pro · 服务商:xiaomi-token-plan-cn · 原因:等待首次输出超时 · 爪印已自动重试一次

Suggested English copy:

  • Title: Response Timeout
  • Description: Xiaomi MiMo hasn't started replying. Resend your message or switch to another model to continue.
  • Buttons: Resend / Switch Model
  • Technical details: Model: mimo-v2.5-pro · Provider: xiaomi-token-plan-cn · Reason: Timeout waiting for first output · PawWork auto-retried once
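
As a rough illustration of how the English copy could be stored with the model name as a parameter rather than hard-coded, here is a small sketch. The key names and the `{{name}}` placeholder syntax are assumptions, not the existing conventions in packages/ui/src/i18n/en.ts.

```typescript
// Hypothetical key names and placeholder syntax; the real i18n files
// may use a different structure.
const enTimeoutCopy = {
  "notice.responseTimeout.title": "Response Timeout",
  "notice.responseTimeout.description":
    "{{modelName}} hasn't started replying. Resend your message or switch to another model to continue.",
  "notice.responseTimeout.resend": "Resend",
  "notice.responseTimeout.switchModel": "Switch Model",
} as const;

// Replaces each {{key}} with its value; unknown keys are left untouched.
function formatCopy(template: string, params: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    key in params ? params[key] : match,
  );
}

const description = formatCopy(
  enTimeoutCopy["notice.responseTimeout.description"],
  { modelName: "Xiaomi MiMo" },
);
```

Keeping the model display name as a parameter means the same strings work for any provider that fails to start responding.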

Design direction:

  • Prefer a compact terminal-state card rather than a plain text notice.
  • Use the existing RateLimitCard architecture as a reference: pure UI component in packages/ui, app-layer wiring for side effects in packages/app.
  • Do not reuse RateLimitCard directly; this is not a quota/exhaustion decision card.
  • Primary action: resend the last user message.
  • Secondary action: open or focus model switching.
  • Put auto-retry details in a technical/details area, not the main sentence.
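
The split between presentation and side effects might look roughly like the sketch below: a pure function builds the card's view data, and the app layer injects the handlers. All names here are hypothetical and framework-agnostic; this does not reflect how RateLimitCard or its wiring is actually implemented.

```typescript
// Hypothetical, framework-agnostic sketch of the UI/app split.
interface TimeoutCardInput {
  model: string;
  provider: string;
  autoRetries: number;
}

// Side effects are injected by the app layer (packages/app in this proposal).
interface TimeoutCardHandlers {
  onResend: () => void;
  onSwitchModel: () => void;
}

interface CardAction {
  label: string;
  primary: boolean;
  run: () => void;
}

interface TimeoutCardView {
  title: string;
  actions: CardAction[];
  details: string;
}

// Pure: builds what the card shows without performing any side effect itself.
function buildTimeoutCard(
  input: TimeoutCardInput,
  handlers: TimeoutCardHandlers,
): TimeoutCardView {
  const retryText =
    input.autoRetries === 1
      ? "PawWork auto-retried once"
      : `PawWork auto-retried ${input.autoRetries} times`;
  return {
    title: "Response Timeout",
    actions: [
      { label: "Resend", primary: true, run: handlers.onResend },
      { label: "Switch Model", primary: false, run: handlers.onSwitchModel },
    ],
    // Retry facts live in the details line, not in the main sentence.
    details: [
      `Model: ${input.model}`,
      `Provider: ${input.provider}`,
      "Reason: Timeout waiting for first output",
      retryText,
    ].join(" · "),
  };
}

// Example wiring with stand-in handlers that only count invocations.
let resendCalls = 0;
let switchCalls = 0;
const card = buildTimeoutCard(
  { model: "mimo-v2.5-pro", provider: "xiaomi-token-plan-cn", autoRetries: 1 },
  { onResend: () => { resendCalls += 1; }, onSwitchModel: () => { switchCalls += 1; } },
);
```

In a real component the handlers would resend the last user message and open or focus the model switcher; here they are stubs so the shape of the boundary is visible.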

PawWork version

0.0.0-prod-202605271646

OS version

macOS / Darwin 25.5.0

Can you reproduce it again?

Sometimes

Diagnostics

Likely files and surfaces:

  • packages/ui/src/components/message-part/parts/notice.tsx
  • packages/ui/src/components/session-retry.tsx
  • packages/ui/src/components/rate-limit-card.tsx and .css as design/architecture references
  • packages/app/src/components/rate-limit-card-wiring.tsx as the app-layer side-effect wiring reference
  • packages/app/e2e/snap/fixtures/safe-retry-snap-fixture.tsx
  • packages/app/e2e/snap/safe-retry.snap.ts
  • packages/ui/src/i18n/en.ts
  • packages/ui/src/i18n/zh.ts
  • packages/ui/src/i18n/zht.ts

Acceptance criteria:

  • A safe-retry-failed terminal state caused by no provider progress no longer renders as only Recovery failed / 恢复失败.
  • The UI explains that the model did not start replying and offers Resend plus Switch Model actions.
  • Chinese and English copy are both updated; other locales may fall back to English if that is the current project convention.
  • The technical details preserve provider/model/reason/retry facts without crowding the main message.
  • The safe-retry snap target is updated or expanded to cover the new terminal card state.
  • The implementation keeps presentation in packages/ui and side effects in packages/app.

Metadata

Labels

  • P2: Medium priority
  • bug: Something isn't working
  • harness: Model harness, prompts, tool descriptions, and session mechanics
  • ui: Design system and user interface
