feat(anthropic): add dedicated Claude-compatible endpoint layer #97
dwnmf wants to merge 16 commits into Soju06:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c79393a2e
```python
if path.startswith("/v1/") or path.startswith("/backend-api/"):
    return "openai"
```
Route /v1/messages validation failures to Anthropic envelope
The fallback format detection still treats every /v1/* path as OpenAI, so requests like malformed JSON to /v1/messages (or /v1/messages/count_tokens) can bypass set_anthropic_error_format and return the OpenAI error shape instead of Anthropic's {"type":"error",...} envelope. Because JSON/body parsing errors are raised before route dependencies run, Anthropic clients can receive an incompatible error contract on exactly the endpoints this change adds.
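One way to address this finding is to check Anthropic-style paths before the generic `/v1/*` branch. This is an illustrative sketch, not the PR's actual code; the function name `detect_error_format` and the path prefixes are assumptions:

```python
# Hypothetical fallback format detection that routes /v1/messages
# (and /v1/messages/count_tokens) to the Anthropic error envelope
# before the generic /v1/* branch claims them as OpenAI.

def detect_error_format(path: str) -> str:
    """Pick the error envelope before route dependencies run."""
    if path.startswith("/anthropic/") or path.startswith("/v1/messages"):
        return "anthropic"
    if path.startswith("/v1/") or path.startswith("/backend-api/"):
        return "openai"
    # Default for unrecognized paths.
    return "openai"
```

Because body-parsing errors fire before route dependencies, this kind of path-based check is the only signal available at that point.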
…ases
- Return stop_reason="stop_sequence" when the upstream response includes a stop_sequence value, instead of incorrectly falling through to "end_turn" (affects both streaming and non-streaming paths)
- Skip camelCase cache key aliases (promptCacheKey, promptCacheRetention) in outbound payloads when the snake_case canonical form already exists, preventing duplicate fields that can cause upstream validation errors
Define Phase 5 re-review loop: after fixes are committed, re-run Codex review until 0 findings. Safety limits: max 3 iterations, recurring findings marked wont_fix, user can stop early via HITL override.
- Add x-api-key to IGNORE_INBOUND_HEADERS to prevent client credentials from leaking to upstream OpenAI requests via the Anthropic endpoint
- Validate model access and enforce limits against the original requested model name (payload.model) instead of the remapped upstream model, so API keys restricted to Claude model names work correctly
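The header-filtering part of that change amounts to a case-insensitive deny list applied before forwarding. A minimal sketch, where the exact contents of `IGNORE_INBOUND_HEADERS` beyond `x-api-key` are assumptions:

```python
# Hypothetical inbound-header filter: drop client credentials and
# hop-by-hop headers before the request is replayed upstream.

IGNORE_INBOUND_HEADERS = {"host", "authorization", "content-length", "x-api-key"}

def filter_inbound_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return only headers that are safe to forward upstream."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IGNORE_INBOUND_HEADERS
    }
```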
Write passthrough cache keys using their canonical snake_case form (prompt_cache_key, prompt_cache_retention) instead of preserving the original camelCase input, so upstream fields are consistently named.
Prevent _merge_passthrough_cache_extras from re-adding raw model_extra values for keys already handled by the normalization pipeline (prompt_cache_key, prompt_cache_retention). This avoids forwarding malformed values (e.g. whitespace-only) that normalization rejected.
Read promptCacheRetention (camelCase) as fallback in _extract_prompt_cache_retention, matching the pattern used by _extract_explicit_prompt_cache_key. Prevents silent loss of retention config when clients use camelCase field names.
info.md
Please either move this document into OpenSpec or remove it.
app/modules/anthropic_compat/api.py
```python
logger = logging.getLogger(__name__)

_CLAUDE_CODE_FORCED_MODEL = "gpt-5.3-codex"
```
Aliasing a model like claude-* to gpt-5.3-codex is probably not a good idea.
Following the approach in https://code.claude.com/docs/en/llm-gateway, configure Claude Code through its environment variables instead.
codex-lb should only provide a transport layer.
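In practice, the gateway approach the reviewer points to means the client chooses the endpoint via environment variables rather than the proxy rewriting models. A hedged example, assuming the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` variables described in the linked Claude Code gateway docs; the URL and token values are placeholders:

```shell
# Point Claude Code at a local codex-lb instance acting as a
# transport-only gateway. Values below are illustrative.
export ANTHROPIC_BASE_URL="http://localhost:8000/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-gateway-key"
```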
Sure. It was an approach from the development phase, when I was trying to eliminate cache issues.
Service/dependency layer architecture violation
Hi @Soju06, thanks for the review. Changes made:
- Removed the forced Claude model aliasing and the hardcoded reasoning override, so the Anthropic-compatible layer now stays transport-only.
- Moved the API key reservation/release flow out of the route layer into the service layer.
- Removed the out-of-band info.md notes and synced the related updates into OpenSpec.
- Added support for mapping reasoningEffort / reasoning.effort, including a server default via CODEX_LB_ANTHROPIC_DEFAULT_REASONING_EFFORT.
- Updated and re-ran the relevant tests.

One caveat: prompt caching is still not working correctly right now.
This PR adds a dedicated Anthropic/Claude-compatible API surface on top of the existing OpenAI-compatible backend.
What it does: Claude Code clients expect a different contract than OpenAI-compatible ones. Rather than pushing that complexity onto clients, this adds a translation layer that keeps all the mapping logic in one place — new endpoints, format translation, stream adaptation, the whole thing.
New endpoints:

- `POST /v1/messages` and `POST /anthropic/v1/messages`
- `POST /v1/messages/count_tokens` and `POST /anthropic/v1/messages/count_tokens`

What's in the module: Request/response schemas for messages and tool blocks, a translator from Anthropic message format to Responses payload, stream adaptation back to Anthropic event format, and Claude model aliases that normalize to the configured upstream lane. Claude requests get forced to `gpt-5.3-codex` with `xhigh` reasoning; conflicting sampling fields are stripped automatically.

Prompt caching: Cache key extraction for Anthropic payloads, a dedicated lane for `count-tokens` flows, and safer handling of metadata/cache-control fields that were previously causing upstream `400` errors.

Observability: Request logs now include hashed Codex session and conversation IDs. Test coverage extended for migrations, request logs, translator behavior, and Anthropic compatibility.

Known issue: cache hit rate is intermittent, not guaranteed per request. Example from local logs: