The LLM and embedding fetchers classify HTTP 429 as transient and retry it with exponential backoff, the same path used for 5xx. The Retry-After header is never read.
Evidence
File: apps/memos-local-plugin/core/llm/fetcher.ts
Line 58:
const transient = resp.status >= 500 || resp.status === 429;
backoff(), line 236:
const ms = base * 2 ** (attempt - 1) + jitter;
Retry-After is not read anywhere on this path. The same pattern appears in apps/memos-local-plugin/core/embedding/fetcher.ts, line 48.
Consequence
Under provider rate limiting, retries fire before the upstream-requested cooldown expires. Premature retries extend rate limiting, increase failed requests, and waste paid API calls.
Suggested fix
In both fetchers, read Retry-After before computing the backoff delay. Support both formats the header allows: integer seconds and HTTP-date. Use the header value when present, and fall back to the existing exponential backoff when it is absent or unparseable.
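A minimal sketch of the suggested fix, not a patch against the actual files. The names parseRetryAfterMs and computeDelayMs are hypothetical, and the base and jitter defaults are assumptions that only mirror the shape of the existing `base * 2 ** (attempt - 1) + jitter` formula. It also assumes the fetchers use a standard fetch Response, so the header would be read with `resp.headers.get("retry-after")`. HTTP-date parsing relies on `Date.parse`, which in practice accepts the IMF-fixdate format ("Wed, 21 Oct 2015 07:28:00 GMT") in mainstream JS engines.

```typescript
// Hypothetical helpers; names and default values are assumptions, not existing code.

/**
 * Convert a Retry-After header value into a delay in milliseconds.
 * Accepts delta-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
 * Returns undefined when the header is missing or cannot be parsed.
 */
export function parseRetryAfterMs(
  value: string | null,
  now: number = Date.now(),
): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (trimmed === "") return undefined;

  // delta-seconds: a non-negative integer number of seconds
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Every HTTP-date form contains day/month names; requiring a letter keeps
  // lenient Date.parse implementations from accepting values like "1.5" or "-5".
  if (!/[A-Za-z]/.test(trimmed)) return undefined;

  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;

  // A date already in the past means the server allows an immediate retry.
  return Math.max(0, dateMs - now);
}

/**
 * Prefer the server-requested delay; otherwise use exponential backoff with jitter.
 * base and maxJitterMs are assumed defaults for illustration.
 */
export function computeDelayMs(
  attempt: number,
  retryAfter: string | null,
  base = 500,
  maxJitterMs = 250,
  now: number = Date.now(),
): number {
  const fromHeader = parseRetryAfterMs(retryAfter, now);
  if (fromHeader !== undefined) return fromHeader;

  const jitter = Math.random() * maxJitterMs;
  return base * 2 ** (attempt - 1) + jitter;
}
```

In a retry loop this would be called as `computeDelayMs(attempt, resp.headers.get("retry-after"))`. One design choice worth considering is capping the header-derived delay at some maximum, so a misconfigured upstream cannot stall a request for hours; whether to cap, and at what value, is left open here.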
Related pattern
retry-after-ignored-under-concurrency
Corpus reference: https://github.com/SirBrenton/pitstop-truth
cc @CaralHsi @Ki-Seki