
feat: API timeout config, Retry-After support, and configurable retry#2816

Open
TheArchitectit wants to merge 10 commits into ultraworkers:main from TheArchitectit:feat/api-timeout-retry


TheArchitectit commented Apr 27, 2026

Summary

  • Add TimeoutConfig to HTTP client builder with connect_timeout (30s default) and request_timeout (5min default)
  • Configurable via CLAW_API_CONNECT_TIMEOUT and CLAW_API_REQUEST_TIMEOUT env vars
  • Add with_timeout() builder to both AnthropicClient and OpenAiCompatClient
  • Parse the Retry-After header on 429 responses and prefer it over exponential backoff
  • Add apiTimeout config block to ~/.claw/settings.json with connectTimeout, requestTimeout, and maxRetries fields
  • Add retry_after field to ApiError::Api for propagating rate-limit backoff hints
  • Add is_retryable_400() to detect transient gateway 400 errors (not real bad requests)
  • Add "no parseable body" to CONTEXT_WINDOW_ERROR_MARKERS

Problem this solves

Hung API calls

Before: a request to a slow or unresponsive API endpoint could block indefinitely. There was no timeout and no way to configure one.

Fix: TimeoutConfig with sensible defaults (30s connect, 5min request) and env var / settings.json overrides.
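A minimal sketch of how such a config might resolve its values, assuming the environment variables hold whole seconds and that a missing or unparseable value falls back to the default. TimeoutConfig and the two variable names come from the PR; from_lookup() is a hypothetical helper that takes the lookup as a closure so the logic can be tested without touching the process environment.

```rust
use std::time::Duration;

/// Hypothetical sketch of a TimeoutConfig; the field names and defaults
/// follow the PR description, the parsing rules are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeoutConfig {
    pub connect_timeout: Duration,
    pub request_timeout: Duration,
}

impl Default for TimeoutConfig {
    fn default() -> Self {
        // Defaults stated in the PR: 30s to connect, 5 minutes per request.
        Self {
            connect_timeout: Duration::from_secs(30),
            request_timeout: Duration::from_secs(5 * 60),
        }
    }
}

impl TimeoutConfig {
    /// Resolves overrides through `lookup`. Values are assumed to be whole
    /// seconds; missing or unparseable values keep the default.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let defaults = Self::default();
        let secs = |key: &str, fallback: Duration| {
            lookup(key)
                .and_then(|v| v.trim().parse::<u64>().ok())
                .map(Duration::from_secs)
                .unwrap_or(fallback)
        };
        Self {
            connect_timeout: secs("CLAW_API_CONNECT_TIMEOUT", defaults.connect_timeout),
            request_timeout: secs("CLAW_API_REQUEST_TIMEOUT", defaults.request_timeout),
        }
    }

    /// Reads the environment variables named in the PR.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}

fn main() {
    // Simulate CLAW_API_CONNECT_TIMEOUT=10 with no request-timeout override.
    let cfg = TimeoutConfig::from_lookup(|key| {
        (key == "CLAW_API_CONNECT_TIMEOUT").then(|| "10".to_string())
    });
    println!("{:?}", cfg);
    println!("{:?}", TimeoutConfig::from_env());
}
```

If the client is built on reqwest (an assumption; the PR does not name its HTTP library), the two fields would map onto `ClientBuilder::connect_timeout` and `ClientBuilder::timeout`.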

Ignored Retry-After headers

Before: on 429 (rate-limited) responses, the client always used exponential backoff, ignoring the provider's Retry-After header. This caused unnecessary delays or premature retries.

Fix: parse_retry_after() extracts the header value, and the retry loop respects it over exponential backoff when present.
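The selection rule can be sketched as follows, assuming Retry-After carries the delta-seconds form (e.g. `Retry-After: 12`); the HTTP-date form that HTTP also allows is not handled here and simply yields no hint. BASE_DELAY, MAX_DELAY, exponential_backoff() and retry_delay() are hypothetical; the PR does not state its backoff constants.

```rust
use std::time::Duration;

/// Hypothetical backoff constants; the PR does not state the real ones.
const BASE_DELAY: Duration = Duration::from_millis(500);
const MAX_DELAY: Duration = Duration::from_secs(60);

/// Parses a Retry-After value in delta-seconds form ("12").
/// The HTTP-date form is not handled here and yields None.
fn parse_retry_after(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// BASE_DELAY * 2^attempt, capped at MAX_DELAY.
fn exponential_backoff(attempt: u32) -> Duration {
    // checked_shl returns None once the shift reaches 32 bits.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}

/// Uses the server's Retry-After hint when present, otherwise backs off.
fn retry_delay(attempt: u32, retry_after: Option<Duration>) -> Duration {
    retry_after.unwrap_or_else(|| exponential_backoff(attempt))
}

fn main() {
    let hint = parse_retry_after("7");
    for attempt in 0..3 {
        println!(
            "attempt {}: with hint {:?}, without {:?}",
            attempt,
            retry_delay(attempt, hint),
            retry_delay(attempt, None)
        );
    }
}
```

In this sketch a server-supplied delay is used as-is; a caller may want to cap it (for example at the request timeout) so an unexpectedly large hint cannot stall the retry loop.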

Transient gateway 400 errors treated as fatal

Before: some providers (especially OpenAI-compat backends like glm-5.1-fast) return 400 with bodies like "HTTP 400 from backend (no parseable body)" or "connection reset by peer" — these are not real bad requests. They're transient gateway errors caused by the backend being overwhelmed or unable to parse an oversized payload. The client treated all 400s as fatal, immediately failing the request.

Fix: is_retryable_400() checks 400 response bodies for transient gateway error signatures and marks them as retryable, so the retry loop can attempt the request again.
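A sketch of the classification, assuming a case-insensitive substring match on the response body. The two phrases are the ones quoted in this PR; the real signature list in is_retryable_400() may be longer, and the (status, body) parameter shape is an assumption.

```rust
/// Transient gateway signatures quoted in the PR; the real list may differ.
const TRANSIENT_400_MARKERS: &[&str] = &["no parseable body", "connection reset"];

/// True when a 400 response body looks like a gateway hiccup rather than
/// a genuinely malformed request (case-insensitive substring match).
fn is_retryable_400(status: u16, body: &str) -> bool {
    if status != 400 {
        return false;
    }
    let body = body.to_ascii_lowercase();
    TRANSIENT_400_MARKERS.iter().any(|marker| body.contains(*marker))
}

fn main() {
    for body in ["HTTP 400 from backend (no parseable body)", "invalid model name"] {
        println!("{:?} -> retryable: {}", body, is_retryable_400(400, body));
    }
}
```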

Context window overflow disguised as 400

Before: when a request exceeds the model's context window, some OpenAI-compat backends can't even parse the oversized payload and return 400 "no parseable body" instead of a proper context_length_exceeded error. Without recognizing this as a context overflow, is_context_window_error() returns false and the auto-compact retry loop (#2808) never triggers — the user sees an opaque 400 with no recovery path.

Fix: Added "no parseable body" to CONTEXT_WINDOW_ERROR_MARKERS so is_context_window_error() correctly identifies these disguised context overflow errors. This enables the progressive auto-compact retry loop (PR #2808) to kick in and shrink the session until it fits.
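A sketch of the marker check, assuming a case-insensitive substring match on the error message. Only "context_length_exceeded" and "no parseable body" appear in this PR, so the list below is a hypothetical, partial stand-in for the real CONTEXT_WINDOW_ERROR_MARKERS.

```rust
/// Partial, hypothetical marker list: only these two strings appear in the PR.
const CONTEXT_WINDOW_ERROR_MARKERS: &[&str] = &["context_length_exceeded", "no parseable body"];

/// True when an error message looks like a context-window overflow,
/// including the disguised "no parseable body" form.
fn is_context_window_error(message: &str) -> bool {
    let message = message.to_ascii_lowercase();
    CONTEXT_WINDOW_ERROR_MARKERS
        .iter()
        .any(|marker| message.contains(*marker))
}

fn main() {
    let err = "HTTP 400 from backend (no parseable body)";
    if is_context_window_error(err) {
        println!("context overflow detected; auto-compact retry would run");
    }
}
```

Because "no parseable body" also appears among the transient-400 signatures checked by is_retryable_400(), a caller that consults both checks decides by their order whether the request is retried unchanged or compacted first; this sketch does not settle that order.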

After

  • Requests time out after 5 minutes by default (configurable)
  • 429 responses with Retry-After header use the provider's suggested delay
  • Transient gateway 400s are retried instead of failing immediately
  • Context overflow disguised as 400 is correctly detected for auto-compact retry
  • Users can configure timeouts (in seconds) and the retry limit in ~/.claw/settings.json:
{
  "apiTimeout": {
    "connectTimeout": 30,
    "requestTimeout": 300,
    "maxRetries": 8
  }
}
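A sketch of layering an apiTimeout block over base values, assuming every field is optional so a partial block overrides only what it names, and that the timeouts are in seconds. The struct is loosely modeled on the ApiTimeoutConfig mentioned in the commits, but its field types, the EffectiveApiConfig type and the base max_retries value are hypothetical. The PR does not say how these settings interact with the CLAW_API_* environment variables, so the sketch leaves env vars out.

```rust
use std::time::Duration;

/// Hypothetical mirror of the `apiTimeout` block; absent keys are None.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct ApiTimeoutConfig {
    connect_timeout: Option<u64>, // seconds (assumed)
    request_timeout: Option<u64>, // seconds (assumed)
    max_retries: Option<u32>,
}

/// Hypothetical resolved configuration the client would actually use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EffectiveApiConfig {
    connect_timeout: Duration,
    request_timeout: Duration,
    max_retries: u32,
}

impl EffectiveApiConfig {
    /// Each field present in the settings replaces the base value.
    fn with_settings(self, s: ApiTimeoutConfig) -> Self {
        Self {
            connect_timeout: s.connect_timeout.map(Duration::from_secs).unwrap_or(self.connect_timeout),
            request_timeout: s.request_timeout.map(Duration::from_secs).unwrap_or(self.request_timeout),
            max_retries: s.max_retries.unwrap_or(self.max_retries),
        }
    }
}

fn main() {
    let base = EffectiveApiConfig {
        connect_timeout: Duration::from_secs(30),
        request_timeout: Duration::from_secs(300),
        max_retries: 2, // placeholder: the PR does not state a default
    };
    // The same values as the settings.json example above.
    let settings = ApiTimeoutConfig {
        connect_timeout: Some(30),
        request_timeout: Some(300),
        max_retries: Some(8),
    };
    println!("{:?}", base.with_settings(settings));
}
```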

Test plan

  • cargo test --workspace: all tests pass except one pre-existing, environment-specific failure in lsp_discovery
  • cargo build --release — clean build
  • Tested end-to-end with bloated session against OpenAI-compat provider — progressive auto-compact retry triggered and completed successfully
  • Verify timeout triggers on slow endpoints
  • Verify Retry-After header is respected on 429
  • Verify settings.json apiTimeout overrides defaults
  • Verify is_retryable_400() correctly classifies transient gateway 400s

💘 Generated with Crush

TheArchitectit and others added 10 commits May 10, 2026 21:26
…e retry

- Add TimeoutConfig to HTTP client builder with connect_timeout (30s)
  and request_timeout (5min) defaults, configurable via
  CLAW_API_CONNECT_TIMEOUT and CLAW_API_REQUEST_TIMEOUT env vars
- Add with_timeout() builder to both AnthropicClient and
  OpenAiCompatClient for per-client timeout configuration
- Parse Retry-After header on 429 responses and use it to override
  exponential backoff delay when present
- Add ApiTimeoutConfig to runtime config with apiTimeout settings
  in ~/.claw/settings.json (connectTimeout, requestTimeout, maxRetries)
- Add retry_after field to ApiError::Api for propagating rate limit
  backoff hints through the retry pipeline

Some providers/proxies return HTTP 400 with bodies like "no parseable
body" or "connection reset" during transient network blips. These are
not real bad requests — they're gateway errors wearing a 400 mask.
Detect known gateway error phrases in 400 response bodies and mark
them as retryable so the existing exponential backoff handles them.

Some OpenAI-compat backends (e.g. glm-5.1-fast) return 400 with
"no parseable body" when the request payload is too large to parse,
rather than a proper context_length_exceeded error. Without this marker,
is_context_window_error() returns false and the auto-compact retry
loop never triggers — the user just sees an opaque 400 error.

💘 Generated with Crush

Assisted-by: GLM 5.1 FP8 via Crush <crush@charm.land>

Some OpenAI-compatible providers (e.g., GLM-5) omit the `id` field in
streaming and non-streaming responses. Adding #[serde(default)] allows
the parser to accept these responses instead of failing with
"missing field `id`".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds scripts/install.sh that builds the release binary and links it
to ~/.local/bin/claw. Run after code changes to update the CLI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When a provider returns HTML (e.g., error page, wrong endpoint) instead
of JSON in an SSE stream, provide a clear error message instead of
hanging or failing with a cryptic parse error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When a provider returns a JSON error (e.g., {"error":{"message":"..."}})
without SSE framing (no "data:" prefix), the SSE parser was silently
ignoring it and hanging. Now detects and surfaces these errors.

Also handles HTML responses that lack SSE framing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
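The detection described in the two commits above can be sketched as a check that runs when a response body contains no `data:` lines at all. The function names and the heuristics (a leading `<` means HTML, a leading `{` containing an "error" key means a JSON error) are assumptions for illustration, not the parser's actual rules.

```rust
/// True when at least one line uses SSE `data:` framing.
fn has_sse_framing(body: &str) -> bool {
    body.lines().any(|line| line.trim_start().starts_with("data:"))
}

/// Returns a readable error for bodies that arrive without SSE framing and
/// look like an HTML page or a bare JSON error object; None otherwise.
fn unframed_error(body: &str) -> Option<String> {
    if has_sse_framing(body) {
        return None;
    }
    let trimmed = body.trim();
    if trimmed.starts_with('<') {
        Some("provider returned HTML instead of an SSE stream (error page or wrong endpoint?)".to_string())
    } else if trimmed.starts_with('{') && trimmed.contains("\"error\"") {
        Some(format!("provider returned a JSON error without SSE framing: {}", trimmed))
    } else {
        None
    }
}

fn main() {
    let body = r#"{"error":{"message":"invalid api key"}}"#;
    match unframed_error(body) {
        Some(msg) => println!("{}", msg),
        None => println!("no unframed error detected"),
    }
}
```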
Some providers (GLM, DeepSeek) emit reasoning tokens in `reasoning_content`
or nested `thinking.content` fields instead of `content`. Added support
for these fields so reasoning models work correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
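A sketch of the field fallback, assuming `reasoning_content` takes priority over `thinking.content` when both are present and that empty strings count as absent. The struct shapes and the reasoning_text() method are hypothetical; only the field names come from the commit message.

```rust
/// Hypothetical delta shapes; only the field names come from the commit.
#[derive(Debug, Default)]
struct Thinking {
    content: Option<String>,
}

#[derive(Debug, Default)]
struct Delta {
    content: Option<String>,
    reasoning_content: Option<String>,
    thinking: Option<Thinking>,
}

/// Treats an empty string the same as a missing field.
fn non_empty(field: &Option<String>) -> Option<&str> {
    field.as_deref().filter(|s| !s.is_empty())
}

impl Delta {
    /// Visible answer text from `content`.
    fn visible_text(&self) -> Option<&str> {
        non_empty(&self.content)
    }

    /// Reasoning text from `reasoning_content`, falling back to the nested
    /// `thinking.content` (the priority order is an assumption).
    fn reasoning_text(&self) -> Option<&str> {
        non_empty(&self.reasoning_content)
            .or_else(|| self.thinking.as_ref().and_then(|t| non_empty(&t.content)))
    }
}

fn main() {
    let delta = Delta {
        thinking: Some(Thinking { content: Some("checking the units".to_string()) }),
        ..Default::default()
    };
    println!("reasoning: {:?}, text: {:?}", delta.reasoning_text(), delta.visible_text());
}
```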
The final streaming chunk from some providers contains only finish_reason
and usage, with no delta field. Made `delta` optional to prevent parse errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When preserve_recent_messages == 0, raw_keep_from equals messages.len(),
so accessing session.messages[k] indexes out of bounds and panics.

Added a k >= session.messages.len() check to prevent the panic.

Reason: Compaction with preserve_recent_messages=0 triggered OOB access
when checking for tool-use/tool-result pair preservation at boundary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
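A sketch of where the guard sits, using a hypothetical message model. The Kind enum, raw_keep_from() and adjust_keep_from() are illustrative; only the k >= messages.len() check and the tool-use/tool-result pairing rule come from the commit message. With preserve_recent_messages == 0 the boundary equals the slice length, so without the guard the messages[k] lookup would panic.

```rust
/// Hypothetical message kinds; the real session type is richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    User,
    ToolUse,
    ToolResult,
}

/// First index of the preserved tail. With preserve_recent == 0 this equals
/// len, i.e. one past the last valid index.
fn raw_keep_from(len: usize, preserve_recent: usize) -> usize {
    len.saturating_sub(preserve_recent)
}

/// Moves the boundary back one step so a tool result is never kept without
/// the tool use that produced it.
fn adjust_keep_from(messages: &[Kind], k: usize) -> usize {
    // k >= messages.len() is the guard described in the fix: nothing is
    // preserved, and messages[k] would panic. k == 0 additionally protects
    // the messages[k - 1] lookup below.
    if k == 0 || k >= messages.len() {
        return k;
    }
    if messages[k] == Kind::ToolResult && messages[k - 1] == Kind::ToolUse {
        k - 1
    } else {
        k
    }
}

fn main() {
    let messages = [Kind::User, Kind::ToolUse, Kind::ToolResult, Kind::User];
    for preserve in [0, 1, 2] {
        let k = adjust_keep_from(&messages, raw_keep_from(messages.len(), preserve));
        println!("preserve {} -> keep from index {}", preserve, k);
    }
}
```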
TheArchitectit force-pushed the feat/api-timeout-retry branch from 9ab2ecb to 1c54c0d on May 10, 2026 21:26