Problem
ccxray detects Claude rate-limit headers and displays them in the quota ticker. Codex traffic is primarily WebSocket, so the HTTP-path ratelimit-log.js alone is not enough: rate-limit detection likely needs ws-proxy frame parsing or extraction of the WS upgrade response headers.
Scope
- Identify where OpenAI rate-limit info appears (WS upgrade response headers? per-frame metadata?)
- Extend rate-limit detection to cover WS transport
- Feed into quota-ticker UI
Before / After UI
BEFORE:
┌─────────────────────────────────────────────────────────┐
│ ccxray dashboard — quota ticker │
│ │
│ Claude (opus-4): │
│ ████████████░░░░░░░░ 60,000 / 80,000 tokens │
│ resets 2026-06-05T10:00:00Z │
│ ✓ "Can parallelize — 75% remaining" │
│ │
│ Codex (o3): │
│ ░░░░░░░░░░░░░░░░░░░░ — / — tokens │
│ (no rate-limit info available) │
│ ⚠ No data │
└─────────────────────────────────────────────────────────┘
AFTER:
┌─────────────────────────────────────────────────────────┐
│ ccxray dashboard — quota ticker │
│ │
│ Claude (opus-4): │
│ ████████████░░░░░░░░ 60,000 / 80,000 tokens │
│ resets 2026-06-05T10:00:00Z │
│ ✓ "Can parallelize — 75% remaining" │
│ │
│ Codex (o3): │
│ ██████████████░░░░░░ 7,200 / 10,000 RPM │
│ resets 2026-06-05T09:01:00Z │
│ ✓ "Healthy — 72% remaining" │
└─────────────────────────────────────────────────────────┘
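The Codex row in the AFTER mock can be reproduced with a small bar helper. This is a minimal sketch, not the actual public/quota-ticker.js code: the function name `renderBar` and the 20-cell width are assumptions taken from the mock's appearance, and the bar is assumed to show the remaining share.

```javascript
// Hypothetical helper: draw a fixed-width bar showing the remaining share.
// Missing or invalid data yields an empty bar (the BEFORE state for Codex).
function renderBar(remaining, limit, width = 20) {
  if (!Number.isFinite(remaining) || !Number.isFinite(limit) || limit <= 0) {
    return '░'.repeat(width);
  }
  // Clamp to [0, 1] so a bad sample can never overflow the bar.
  const ratio = Math.min(1, Math.max(0, remaining / limit));
  const filled = Math.round(ratio * width);
  return '█'.repeat(filled) + '░'.repeat(width - filled);
}

console.log(renderBar(7200, 10000));          // 14 filled cells, 6 empty
console.log(renderBar(undefined, undefined)); // 20 empty cells
```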
Architecture
Detection surface by provider
Claude (HTTP) — working today:
Claude Code
→ POST /v1/messages
→ Anthropic API responds with HTTP headers:
anthropic-ratelimit-tokens-limit: 80000
anthropic-ratelimit-tokens-remaining: 60000
anthropic-ratelimit-tokens-reset: 2026-06-05T10:00:00Z
→ server/ratelimit-log.js captures headers from proxyRes
→ SSE broadcast to dashboard
→ public/quota-ticker.js renders progress bar
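The capture step in the Claude flow could look roughly like the following. It is a sketch only: the header names come from the flow above, while the function name `parseAnthropicRateLimit` and the shape of the returned sample are assumptions, not the actual contents of server/ratelimit-log.js.

```javascript
// Hypothetical parser for the three token headers listed above.
// Node lowercases incoming header names, so lowercase keys are used.
function parseAnthropicRateLimit(headers) {
  const limit = Number(headers['anthropic-ratelimit-tokens-limit']);
  const remaining = Number(headers['anthropic-ratelimit-tokens-remaining']);
  const reset = headers['anthropic-ratelimit-tokens-reset'];
  // A missing header becomes NaN; treat that as "no sample".
  if (!Number.isFinite(limit) || !Number.isFinite(remaining)) return null;
  return {
    provider: 'anthropic',
    unit: 'tokens',
    limit,
    remaining,
    resetAt: typeof reset === 'string' ? reset : null,
  };
}

const sample = parseAnthropicRateLimit({
  'anthropic-ratelimit-tokens-limit': '80000',
  'anthropic-ratelimit-tokens-remaining': '60000',
  'anthropic-ratelimit-tokens-reset': '2026-06-05T10:00:00Z',
});
console.log(sample);
```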
Codex (WS) — needs investigation:
Codex CLI
→ POST /v1/responses (Upgrade: websocket)
→ OpenAI API responds with 101 Switching Protocols
┌─────────────────────────────────────────────────┐
│ WS upgrade response headers? │
│ x-ratelimit-limit-requests: 10000 │
│ x-ratelimit-remaining-requests: 7200 │
│ x-ratelimit-reset-requests: 1s │
│ (unconfirmed — needs wire capture) │
└─────────────────────────────────────────────────┘
→ Per-frame metadata in WS messages?
┌─────────────────────────────────────────────────┐
│ response.usage.rate_limit_info? (unconfirmed) │
│ response.done event metadata? (unconfirmed) │
└─────────────────────────────────────────────────┘
→ server/ws-proxy.js would need to extract from one or both
→ Feed into server/ratelimit-log.js (same capture/sample pattern)
→ public/quota-ticker.js renders (same UI, different labels)
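If the upgrade response does turn out to carry the x-ratelimit-* headers shown above (still unconfirmed), the extraction might be sketched as follows. The function names, the returned shape, and the assumption that reset values are durations such as `1s` or `6m0s` are all hypothetical until a wire capture confirms them.

```javascript
// Hypothetical: convert a duration like "1s", "6m0s" or "250ms" to milliseconds.
function parseResetDuration(text) {
  if (typeof text !== 'string') return null;
  const unitMs = { h: 3600000, m: 60000, s: 1000, ms: 1 };
  let total = 0;
  let matched = false;
  // "ms" is listed before "m" so that "250ms" is not read as 250 minutes.
  for (const [, value, unit] of text.matchAll(/(\d+(?:\.\d+)?)(ms|h|m|s)/g)) {
    total += Number(value) * unitMs[unit];
    matched = true;
  }
  return matched ? total : null;
}

// Hypothetical: build a sample from (unconfirmed) upgrade response headers.
function parseOpenAIRateLimit(headers, now = Date.now()) {
  const limit = Number(headers['x-ratelimit-limit-requests']);
  const remaining = Number(headers['x-ratelimit-remaining-requests']);
  if (!Number.isFinite(limit) || !Number.isFinite(remaining)) return null;
  const resetMs = parseResetDuration(headers['x-ratelimit-reset-requests']);
  return {
    provider: 'openai',
    unit: 'requests',
    limit,
    remaining,
    resetAt: resetMs === null ? null : new Date(now + resetMs).toISOString(),
  };
}

const codexSample = parseOpenAIRateLimit(
  {
    'x-ratelimit-limit-requests': '10000',
    'x-ratelimit-remaining-requests': '7200',
    'x-ratelimit-reset-requests': '1s',
  },
  Date.parse('2026-06-05T09:00:59Z'),
);
console.log(codexSample);
```

If server/ws-proxy.js builds its upstream connection with the `ws` package (an assumption), the client's `upgrade` event delivers the handshake response, and its `headers` object could be passed to a function like this one.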
Key question: WHERE does OpenAI expose rate-limit info for WebSocket connections?
- WS upgrade (101) response headers?
- Per-response metadata in WS frames?
- Separate REST endpoint (e.g. /v1/rate_limits)?
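To answer the per-frame part of this question without knowing the field names in advance, an investigation helper could scan each JSON frame for any key that looks rate-limit related. This is a sketch for exploration only; `probeFrame` and `findRateLimitKeys` are hypothetical names, and the sample frame below is invented to exercise the code, not an observed OpenAI payload.

```javascript
// Hypothetical: collect dotted paths of keys that resemble "rate limit".
function findRateLimitKeys(value, path = '', hits = []) {
  if (value === null || typeof value !== 'object') return hits;
  for (const [key, child] of Object.entries(value)) {
    const childPath = path ? `${path}.${key}` : key;
    // Matches rate_limit, rateLimit, ratelimit, rate-limit, etc.
    if (/rate.?limit/i.test(key)) hits.push(childPath);
    findRateLimitKeys(child, childPath, hits);
  }
  return hits;
}

// Hypothetical: parse one text frame; non-JSON frames yield no hits.
function probeFrame(rawFrame) {
  let parsed;
  try {
    parsed = JSON.parse(rawFrame);
  } catch {
    return [];
  }
  return findRateLimitKeys(parsed);
}

// Invented frame, shaped after the unconfirmed guess in the diagram above.
const frame = JSON.stringify({
  type: 'response.done',
  response: { usage: { rate_limit_info: { remaining: 1 } } },
});
console.log(probeFrame(frame)); // [ 'response.usage.rate_limit_info' ]
```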
Files involved:
| File | Role |
| --- | --- |
| server/ratelimit-log.js | Capture + sample rate-limit data (currently HTTP-only) |
| server/ws-proxy.js | WS transport proxy; extraction point for Codex |
| public/quota-ticker.js | Dashboard UI rendering |
| server/config.js | UPSTREAM_PROFILES: could gain a rateLimitSource field |
Value
For users
- Know when approaching Codex rate limits before hitting them
- Pace adjustment recommendations ("Can parallelize" / "Slow down") work for Codex too
- Unified rate-limit visibility across both providers in one dashboard
For developers
- UPSTREAM_PROFILES could gain a rateLimitSource field per provider family
- ratelimit-log.js already has the capture/sample pattern, which can be extended to WS frames
- Clean separation: detection (server) vs rendering (client) already exists
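One way the rateLimitSource idea might be wired up is a per-profile field that selects where the detector looks. The real shape of UPSTREAM_PROFILES in server/config.js is not shown here, so the object below, the source names (`http-headers`, `ws-upgrade-headers`, `ws-frames`), and the context fields are all assumptions for illustration.

```javascript
// Hypothetical stand-in for UPSTREAM_PROFILES; the real object may differ.
const UPSTREAM_PROFILES_SKETCH = {
  anthropic: { rateLimitSource: 'http-headers' },
  openai: { rateLimitSource: 'ws-upgrade-headers' },
};

// Hypothetical: each source knows which part of the exchange to hand over.
const RATE_LIMIT_INPUTS = {
  'http-headers': (ctx) => ctx.responseHeaders ?? null,
  'ws-upgrade-headers': (ctx) => ctx.upgradeHeaders ?? null,
  'ws-frames': (ctx) => ctx.lastFrame ?? null,
};

// Pick the raw input for a provider, or null when nothing is configured.
function selectRateLimitInput(profileName, ctx) {
  const profile = UPSTREAM_PROFILES_SKETCH[profileName];
  const pick = profile && RATE_LIMIT_INPUTS[profile.rateLimitSource];
  return pick ? pick(ctx) : null;
}

const picked = selectRateLimitInput('openai', {
  upgradeHeaders: { 'x-ratelimit-remaining-requests': '7200' },
});
console.log(picked);
```

Keeping the choice in configuration means that switching Codex from upgrade headers to frame parsing, once the wire capture settles the question, would be a one-line change in this sketch.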
Side Effects
- WS upgrade headers may not contain rate-limit info (needs wire capture investigation first)
- OpenAI rate-limit semantics may differ from Anthropic:
  - Per-minute (RPM/TPM) vs Anthropic's per-day with rolling window
  - Org-level vs project-level vs user-level limits
  - Separate limits for requests vs tokens vs images
- quota-ticker.js assumptions about token windows (5h rolling) may not apply to OpenAI
- ChatGPT-OAuth vs API-key Codex users may have different rate-limit visibility
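The differences in semantics listed above suggest that each sample should carry its own unit and window instead of letting the ticker assume them. A minimal sketch of that idea follows; the sample fields, the function name `describeQuota`, and the 75% / 25% thresholds are assumptions chosen to match the labels in the mock, not existing ccxray behavior.

```javascript
// Hypothetical: turn a provider-neutral sample into ticker text.
// The sample states its own unit and window, so nothing here assumes
// a 5h token window.
function describeQuota(sample) {
  if (!sample || !(sample.limit > 0) || !Number.isFinite(sample.remaining)) {
    return { percentRemaining: null, advice: 'No data' };
  }
  const percentRemaining = Math.round((sample.remaining / sample.limit) * 100);
  let advice;
  if (percentRemaining >= 75) advice = 'Can parallelize'; // assumed threshold
  else if (percentRemaining >= 25) advice = 'Healthy';    // assumed threshold
  else advice = 'Slow down';
  return { percentRemaining, advice, unit: sample.unit, window: sample.window };
}

console.log(describeQuota({ limit: 80000, remaining: 60000, unit: 'tokens', window: '5h' }));
console.log(describeQuota({ limit: 10000, remaining: 7200, unit: 'requests', window: '1m' }));
console.log(describeQuota(null));
```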
Open Questions
- Does Codex CLI itself surface rate-limit info anywhere (env var, stderr, config)?
- Is there a /v1/rate_limits endpoint for OpenAI API key users?
- For ChatGPT-OAuth Codex users, are rate limits even exposed via headers/frames?
- Do WS upgrade 101 responses carry the same x-ratelimit-* headers as regular REST responses?
- Should we do a wire capture (CCXRAY_WS_DEBUG=1) to observe actual headers/frames?