Summary
Backend should debug and fix the rate-limit response path so chat receives structured retry metadata and can show a clear recovery action instead of a vague “brief wait” message.
Problem
When an upstream AI provider rate-limits a request, the app currently shows:
“Your AI provider is rate-limiting requests. This is a transient upstream limit, not a thread-level block — you can retry in this thread.”
and:
“Rate limit exceeded. Please retry after a brief wait.”
Expected behavior: the backend should classify the rate-limit source and return structured metadata the frontend can use, such as provider/source, whether the error is retryable, retry-after timing when available, and whether fallback provider/model routing is possible.
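One possible shape for that metadata is sketched below. The field names `retryable`, `source`, `provider`, and `retry_after_ms` are the ones this issue names; `fallback_available`, `code`, the enum values, and the class names are hypothetical and only illustrate the idea, not the existing backend types.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class RateLimitSource(str, Enum):
    """Who imposed the limit (value names are illustrative assumptions)."""
    UPSTREAM_PROVIDER = "upstream_provider"
    OPENHUMAN_BUDGET = "openhuman_budget"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class RateLimitError:
    source: RateLimitSource
    provider: Optional[str]             # None when the provider is not known
    retryable: bool
    retry_after_ms: Optional[int]       # None when upstream gave no timing
    fallback_available: Optional[bool]  # None means "not known"
    message: str                        # human-readable copy, kept for display

    def to_payload(self) -> dict:
        """Serialize to a JSON-friendly dict for the chat/tool response."""
        payload = asdict(self)
        payload["source"] = self.source.value
        payload["code"] = "rate_limited"  # hypothetical discriminator field
        return payload
```

Keeping unknown values as `None` rather than guessing lets the frontend tell "no retry time was provided" apart from "retry immediately".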
Actual behavior: the user only gets generic copy. There is no concrete retry time, no clear provider/source, no fallback instruction, and no structured recovery action.
Impact: users may think the thread is broken, retry too quickly, or abandon the conversation. This also makes it hard for frontend to build a proper retry/countdown UI because the backend does not appear to expose enough structured detail.
Steps to reproduce:
- Trigger an upstream model/provider rate-limit response during chat.
- Observe the backend/provider error mapping.
- Confirm the chat receives a generic rate-limit message instead of structured retry metadata.
- Confirm the UI cannot show a countdown, retry button state, provider/source, or fallback option.
Version / platform: desktop app, screenshot captured May 25, 2026. Exact app version unknown.
Scope (backend)
Backend developer should:
- Trace which provider/backend layer produces this rate-limit response.
- Preserve upstream Retry-After or equivalent cooldown metadata when present.
- Normalize rate-limit errors into a typed response shape for chat/tool execution.
- Distinguish upstream provider throttling from OpenHuman budget/rate limits.
- Indicate whether the same thread can retry and whether fallback routing is available.
- Add backend logs with provider, model/workload, retry metadata, and request correlation ID, without logging secrets or prompt contents.
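The normalization and logging items above could look roughly like the sketch below. It assumes the upstream limit arrives as an HTTP 429 with an optional Retry-After header (either a number of seconds or an HTTP-date). The function names, the `origin` argument, and the dict keys other than `retryable`, `source`, `provider`, and `retry_after_ms` are hypothetical, and treating every 429 as retryable is an assumption to verify per provider.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_retry_after_ms(value: Optional[str], now: Optional[datetime] = None) -> Optional[int]:
    """Convert a Retry-After value to milliseconds; None if absent or unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():  # delay-seconds form
        return int(value) * 1000
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(int((when - current).total_seconds() * 1000), 0)


def normalize_rate_limit(status: int, headers: Mapping[str, str], provider: Optional[str],
                         origin: str, fallback_available: Optional[bool] = None) -> Optional[dict]:
    """Map a raw 429 into a typed dict; returns None for non-rate-limit statuses.

    `origin` is a hypothetical tag set by whichever layer saw the error:
    "upstream" for provider throttling, "internal" for OpenHuman budget limits.
    """
    if status != 429:
        return None
    lowered = {name.lower(): val for name, val in headers.items()}
    source = {"upstream": "upstream_provider", "internal": "openhuman_budget"}.get(origin, "unknown")
    return {
        "code": "rate_limited",
        "source": source,
        "provider": provider,
        "retryable": True,  # assumption: confirm per provider and per budget policy
        "retry_after_ms": parse_retry_after_ms(lowered.get("retry-after")),
        "fallback_available": fallback_available,
    }


def log_fields(error: dict, model: str, correlation_id: str) -> dict:
    """Build a log record from metadata only; no secrets or prompt contents."""
    return {
        "event": "rate_limited",
        "source": error["source"],
        "provider": error["provider"],
        "model": model,
        "retry_after_ms": error["retry_after_ms"],
        "correlation_id": correlation_id,
    }
```

Header names are lowercased before lookup because HTTP header names are case-insensitive, and the log record is built from an explicit allowlist of fields so request bodies cannot leak into it by accident.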
Frontend follow-up can then use the structured response to render countdown/retry/fallback UI.
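To illustrate how a client might consume the structured response, the decision logic is sketched here in Python for consistency with the other examples; the real frontend would implement it in its own language. The function name, the action labels, and the `fallback_available` and `elapsed_ms` parameters are hypothetical.

```python
from typing import Optional


def choose_recovery_action(retryable: bool, retry_after_ms: Optional[int],
                           fallback_available: Optional[bool],
                           elapsed_ms: int = 0) -> dict:
    """Pick what the chat UI should offer for a rate-limit error.

    elapsed_ms is the time since the error was received, so a countdown
    can be recomputed on each UI tick.
    """
    offer_fallback = bool(fallback_available)  # unknown (None) is treated as "no"
    if not retryable:
        return {"action": "show_error", "offer_fallback": offer_fallback}
    if retry_after_ms is None:
        # No timing from the backend: let the user retry manually.
        return {"action": "retry_manual", "offer_fallback": offer_fallback}
    remaining = max(retry_after_ms - elapsed_ms, 0)
    if remaining > 0:
        return {"action": "countdown", "remaining_ms": remaining,
                "offer_fallback": offer_fallback}
    return {"action": "retry_now", "offer_fallback": offer_fallback}
```

For example, a response with `retry_after_ms` of 12000 checked 2000 ms after arrival yields a countdown with 10000 ms remaining, and the retry button can be enabled once the result switches to `retry_now`.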
Acceptance criteria
- … retryable, source, provider, retry_after_ms/equivalent, and fallback availability when known.
- … .github/workflows/coverage.yml).
Related
Screenshot 2026-05-25 at 11.16.05 AM.png