
feat: Rate-limit detection for OpenAI/Codex #47

@lis186

Problem

ccxray already detects Claude's rate-limit headers and feeds them to the quota ticker. Codex traffic, however, is primarily WebSocket, so rate-limit detection will likely need WS frame parsing in ws-proxy or extraction of the WS upgrade response headers; the HTTP-only path in ratelimit-log.js is not enough on its own.

Scope

  • Identify where OpenAI rate-limit info appears (WS upgrade response headers? per-frame metadata?)
  • Extend rate-limit detection to cover WS transport
  • Feed into quota-ticker UI

Before / After UI

BEFORE:
┌─────────────────────────────────────────────────────────┐
│  ccxray dashboard — quota ticker                        │
│                                                         │
│  Claude (opus-4):                                       │
│  ███████████████░░░░░  60,000 / 80,000 tokens           │
│  resets 2026-06-05T10:00:00Z                            │
│  ✓ "Can parallelize — 75% remaining"                    │
│                                                         │
│  Codex (o3):                                            │
│  ░░░░░░░░░░░░░░░░░░░░  — / — tokens                     │
│  (no rate-limit info available)                         │
│  ⚠ No data                                              │
└─────────────────────────────────────────────────────────┘

AFTER:
┌─────────────────────────────────────────────────────────┐
│  ccxray dashboard — quota ticker                        │
│                                                         │
│  Claude (opus-4):                                       │
│  ███████████████░░░░░  60,000 / 80,000 tokens           │
│  resets 2026-06-05T10:00:00Z                            │
│  ✓ "Can parallelize — 75% remaining"                    │
│                                                         │
│  Codex (o3):                                            │
│  ██████████████░░░░░░  7,200 / 10,000 RPM               │
│  resets 2026-06-05T09:01:00Z                            │
│  ✓ "Healthy — 72% remaining"                            │
└─────────────────────────────────────────────────────────┘
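
The bars in the mockup follow directly from remaining / limit. A minimal sketch of that arithmetic, assuming a hypothetical `renderTickerRow()` helper and the 20-cell bar drawn above; the real `public/quota-ticker.js` may round or label differently:

```javascript
// Hypothetical sketch: turn a normalized { remaining, limit, unit } snapshot
// into the pieces of one quota-ticker row. BAR_WIDTH matches the 20-cell bars
// in the mockup; public/quota-ticker.js may compute this differently.
const BAR_WIDTH = 20;

// Insert thousands separators without depending on Intl/ICU availability.
function withCommas(n) {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

function renderTickerRow({ remaining, limit, unit }) {
  // No usable data: the caller should show the "No data" state instead.
  if (!Number.isFinite(remaining) || !Number.isFinite(limit) || limit <= 0) {
    return null;
  }
  // Clamp so a stale or inconsistent sample never overflows the bar.
  const fraction = Math.min(Math.max(remaining / limit, 0), 1);
  const filled = Math.round(fraction * BAR_WIDTH);
  return {
    bar: "█".repeat(filled) + "░".repeat(BAR_WIDTH - filled),
    text: `${withCommas(remaining)} / ${withCommas(limit)} ${unit}`,
    pct: Math.round(fraction * 100),
  };
}

// Example: the Codex row from the AFTER mockup.
console.log(renderTickerRow({ remaining: 7200, limit: 10000, unit: "RPM" }));
```

With the AFTER numbers, 7,200 / 10,000 gives 14 filled cells and 72%; 60,000 / 80,000 gives 15 cells and 75%.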

Architecture

Detection surface by provider

Claude (HTTP) — working today:

Claude Code
  → POST /v1/messages
  → Anthropic API responds with HTTP headers:
      anthropic-ratelimit-tokens-limit: 80000
      anthropic-ratelimit-tokens-remaining: 60000
      anthropic-ratelimit-tokens-reset: 2026-06-05T10:00:00Z
  → server/ratelimit-log.js captures headers from proxyRes
  → SSE broadcast to dashboard
  → public/quota-ticker.js renders progress bar
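
The capture step amounts to reading three headers off `proxyRes.headers` (Node lower-cases incoming header names). A sketch of normalizing them into a provider-neutral snapshot; `parseAnthropicRateLimit` and the snapshot fields are hypothetical names, not ccxray's existing code:

```javascript
// Hypothetical sketch of header normalization for the HTTP path.
// proxyRes.headers can be passed in directly because Node's IncomingMessage
// lower-cases header names.
function parseAnthropicRateLimit(headers) {
  const limit = Number(headers["anthropic-ratelimit-tokens-limit"]);
  const remaining = Number(headers["anthropic-ratelimit-tokens-remaining"]);
  // Missing headers become NaN, meaning "no rate-limit info on this response".
  if (!Number.isFinite(limit) || !Number.isFinite(remaining)) return null;

  // The reset header is an RFC 3339 timestamp; keep it as epoch milliseconds.
  const resetAt = Date.parse(headers["anthropic-ratelimit-tokens-reset"] ?? "");
  return {
    provider: "anthropic",
    dimension: "tokens",
    limit,
    remaining,
    resetAt: Number.isNaN(resetAt) ? null : resetAt,
  };
}

const snapshot = parseAnthropicRateLimit({
  "anthropic-ratelimit-tokens-limit": "80000",
  "anthropic-ratelimit-tokens-remaining": "60000",
  "anthropic-ratelimit-tokens-reset": "2026-06-05T10:00:00Z",
});
console.log(snapshot);
```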

Codex (WS) — needs investigation:

Codex CLI
  → GET /v1/responses (Upgrade: websocket)
  → OpenAI API responds with 101 Switching Protocols
      ┌─────────────────────────────────────────────────┐
      │ WS upgrade response headers?                    │
      │   x-ratelimit-limit-requests: 10000             │
      │   x-ratelimit-remaining-requests: 7200          │
      │   x-ratelimit-reset-requests: 1s                │
      │   (unconfirmed — needs wire capture)            │
      └─────────────────────────────────────────────────┘
  → Per-frame metadata in WS messages?
      ┌─────────────────────────────────────────────────┐
      │ response.usage.rate_limit_info? (unconfirmed)   │
      │ response.done event metadata? (unconfirmed)     │
      └─────────────────────────────────────────────────┘
  → server/ws-proxy.js would need to extract from one or both
  → Feed into server/ratelimit-log.js (same capture/sample pattern)
  → public/quota-ticker.js renders (same UI, different labels)
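
If the 101 response does carry the same `x-ratelimit-*` headers as OpenAI's REST responses (still unconfirmed), ws-proxy.js could read them from whichever hook exposes the upstream upgrade response, for example the `upgrade` event of an outgoing `http.ClientRequest`, which receives an `IncomingMessage`. One wrinkle: OpenAI expresses resets as durations such as `1s` or `6m0s`, while Anthropic sends timestamps. A sketch that normalizes both dimensions and converts the duration to an absolute time; all function names are hypothetical:

```javascript
// Hypothetical sketch for the WS path. Header names mirror OpenAI's REST
// rate-limit headers; whether a 101 Switching Protocols response carries
// them is exactly what still needs a wire capture.
const UNIT_MS = { h: 3600000, m: 60000, s: 1000, ms: 1 };

// Convert a duration such as "1s", "6m0s" or "20ms" to milliseconds.
// Returns null unless the string is made up entirely of <number><unit> pairs.
function parseResetDuration(value) {
  if (typeof value !== "string" || value.length === 0) return null;
  // Sticky flag: each match must start exactly where the previous one ended.
  // "ms" is tried before "m" so "20ms" parses as milliseconds.
  const re = /(\d+(?:\.\d+)?)(ms|h|m|s)/gy;
  let total = 0;
  let consumed = 0;
  let match;
  while ((match = re.exec(value)) !== null) {
    total += Number(match[1]) * UNIT_MS[match[2]];
    consumed = re.lastIndex;
  }
  return consumed === value.length ? total : null;
}

// Normalize the requests/tokens dimensions; `now` is injectable for testing.
function parseOpenAIRateLimit(headers, now = Date.now()) {
  const dims = {};
  for (const dim of ["requests", "tokens"]) {
    const limit = Number(headers[`x-ratelimit-limit-${dim}`]);
    const remaining = Number(headers[`x-ratelimit-remaining-${dim}`]);
    if (!Number.isFinite(limit) || !Number.isFinite(remaining)) continue;
    const resetMs = parseResetDuration(headers[`x-ratelimit-reset-${dim}`]);
    // A relative reset ("1s") becomes an absolute epoch-ms time, like Anthropic's.
    dims[dim] = { limit, remaining, resetAt: resetMs === null ? null : now + resetMs };
  }
  return Object.keys(dims).length > 0 ? { provider: "openai", dims } : null;
}

// Example with the header values sketched in the diagram above.
const codexSnapshot = parseOpenAIRateLimit({
  "x-ratelimit-limit-requests": "10000",
  "x-ratelimit-remaining-requests": "7200",
  "x-ratelimit-reset-requests": "1s",
});
console.log(JSON.stringify(codexSnapshot));
```

Passing `now` explicitly keeps the conversion testable; in the proxy, the default `Date.now()` is the moment the upgrade response is processed.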

Key question: WHERE does OpenAI expose rate-limit info for WebSocket connections?

  • WS upgrade (101) response headers?
  • Per-response metadata in WS frames?
  • Separate REST endpoint (e.g. /v1/rate_limits)?
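
For the frame-metadata option the field names are unknown, so a first pass could be exploratory: parse each text frame as JSON and list any key that looks rate-limit related. A hypothetical discovery helper for a wire capture; the `rate_limit_info` key in the sample is the guess from the diagram above, not a confirmed field:

```javascript
// Hypothetical discovery helper for a wire capture: no OpenAI frame field is
// confirmed to carry rate-limit data, so this only surfaces candidate keys.
const RATE_LIMIT_KEY = /rate[_-]?limit/i;

// Depth-first walk that records every key matching RATE_LIMIT_KEY with a
// JSONPath-like location, e.g. "$.response.usage.rate_limit_info".
function findRateLimitPaths(node, path = "$", out = []) {
  if (node === null || typeof node !== "object") return out;
  for (const [key, value] of Object.entries(node)) {
    const childPath = Array.isArray(node) ? `${path}[${key}]` : `${path}.${key}`;
    if (RATE_LIMIT_KEY.test(key)) out.push({ path: childPath, value });
    findRateLimitPaths(value, childPath, out);
  }
  return out;
}

// Text frames are expected to be JSON; anything else yields no candidates.
function probeFrame(rawText) {
  try {
    return findRateLimitPaths(JSON.parse(rawText));
  } catch {
    return [];
  }
}

// Sample input built from the unconfirmed guesses in the diagram above.
console.log(probeFrame(
  '{"type":"response.done","response":{"usage":{"rate_limit_info":{"remaining":5}}}}',
));
```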

Files involved:

File                      Role
server/ratelimit-log.js   Capture + sample rate-limit data (currently HTTP-only)
server/ws-proxy.js        WS transport proxy (extraction point for Codex)
public/quota-ticker.js    Dashboard UI rendering
server/config.js          UPSTREAM_PROFILES (could gain a rateLimitSource field)

Value

For users

  • Know when approaching Codex rate limits before hitting them
  • Pace adjustment recommendations ("Can parallelize" / "Slow down") work for Codex too
  • Unified rate-limit visibility across both providers in one dashboard

For developers

  • UPSTREAM_PROFILES could gain a rateLimitSource field per provider family
  • ratelimit-log.js already has the capture/sample pattern — extend to WS frames
  • Clean separation: detection (server) vs rendering (client) already exists
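
A sketch of what a `rateLimitSource` field could look like, assuming it names which transport observations each provider family should be captured from. The profile object is a stand-in: the real `UPSTREAM_PROFILES` shape in server/config.js is not shown in this issue, and the source names are hypothetical:

```javascript
// Stand-in for UPSTREAM_PROFILES: only the proposed rateLimitSource field is
// shown, and the source names below are hypothetical.
const PROFILES_SKETCH = {
  anthropic: { rateLimitSource: "http-headers" },
  // Codex: enable every candidate until a wire capture shows which one has data.
  openai: { rateLimitSource: ["http-headers", "ws-upgrade-headers", "ws-frames"] },
};

// Which transport observations each source is allowed to consume.
const SOURCE_ACCEPTS = {
  "http-headers": new Set(["http-response"]),
  "ws-upgrade-headers": new Set(["ws-upgrade"]),
  "ws-frames": new Set(["ws-frame"]),
};

// Meant to be consulted by both the HTTP path and ws-proxy before parsing.
function shouldCapture(family, observationKind, profiles = PROFILES_SKETCH) {
  const raw = profiles[family]?.rateLimitSource ?? [];
  const sources = Array.isArray(raw) ? raw : [raw];
  return sources.some((source) => SOURCE_ACCEPTS[source]?.has(observationKind) === true);
}

console.log(shouldCapture("openai", "ws-upgrade")); // true
console.log(shouldCapture("anthropic", "ws-frame")); // false
```

Accepting either a string or an array lets Codex keep several candidate sources enabled while the wire capture is still pending, without special-casing the provider in ws-proxy.js.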

Side Effects

  • WS upgrade headers may not contain rate-limit info (needs wire capture investigation first)
  • OpenAI rate-limit semantics may differ from Anthropic:
    • Per-minute windows (RPM/TPM) vs the longer rolling windows tracked for Anthropic
    • Org-level vs project-level vs user-level limits
    • Separate limits for requests vs tokens vs images
  • quota-ticker.js assumptions about token windows (5h rolling) may not apply to OpenAI
  • ChatGPT-OAuth vs API-key Codex users may have different rate-limit visibility
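
Because OpenAI may report several dimensions at once (requests and tokens, each with its own reset), the ticker needs a rule for which one to show. One option, sketched under the assumption that snapshots look like `{ limit, remaining, resetAt }` per dimension, is to display the most constrained dimension; `pickBottleneck` is a hypothetical name:

```javascript
// Hypothetical: choose the dimension with the smallest remaining fraction,
// i.e. the most constrained limit at the time of the sample.
function pickBottleneck(dims) {
  let best = null;
  for (const [name, d] of Object.entries(dims)) {
    // Skip dimensions without a usable limit/remaining pair.
    if (!(d.limit > 0) || !Number.isFinite(d.remaining)) continue;
    const fraction = d.remaining / d.limit;
    if (best === null || fraction < best.fraction) best = { ...d, name, fraction };
  }
  return best;
}

// Illustrative numbers only: requests are 72% free, tokens only 25%.
console.log(pickBottleneck({
  requests: { limit: 10000, remaining: 7200, resetAt: null },
  tokens: { limit: 200000, remaining: 50000, resetAt: null },
}).name); // logs: tokens
```

The reset shown would then be that dimension's own `resetAt`, which avoids forcing OpenAI's per-minute windows into the 5h rolling model.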

Open Questions

  • Does Codex CLI itself surface rate-limit info anywhere (env var, stderr, config)?
  • Is there a /v1/rate_limits endpoint for OpenAI API key users?
  • For ChatGPT-OAuth Codex users, are rate limits even exposed via headers/frames?
  • Do WS upgrade 101 responses carry the same x-ratelimit-* headers as regular REST responses?
  • Should we do a wire capture (CCXRAY_WS_DEBUG=1) to observe actual headers/frames?
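
A wire capture could answer the 101-header question directly by logging only rate-limit-looking headers, which also keeps bearer tokens and cookies out of the log. A sketch, assuming `CCXRAY_WS_DEBUG=1` is the switch; what that flag currently logs is not described here, and `captureUpgrade` is hypothetical:

```javascript
// Hypothetical capture hook: call it with the upstream 101 status and headers.
// Only rate-limit-looking headers are kept, so Authorization and cookies
// never reach the log.
const INTERESTING_HEADER = /ratelimit|retry-after/i;

function pickRateLimitHeaders(headers) {
  return Object.fromEntries(
    Object.entries(headers).filter(([name]) => INTERESTING_HEADER.test(name)),
  );
}

// Assumes CCXRAY_WS_DEBUG=1 turns capture on; env and log are injectable.
function captureUpgrade(statusCode, headers, env = process.env, log = console.log) {
  if (env.CCXRAY_WS_DEBUG !== "1") return null;
  const record = { statusCode, rateLimitHeaders: pickRateLimitHeaders(headers) };
  log(JSON.stringify(record));
  return record;
}

captureUpgrade(
  101,
  { authorization: "Bearer sk-example", "x-ratelimit-remaining-requests": "7200" },
  { CCXRAY_WS_DEBUG: "1" },
);
```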

Metadata

  • Assignees: No one assigned
  • Labels: codex-parity (Codex dashboard parity gaps), enhancement (New feature or request)
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
