
feat(anthropic): add dedicated Claude-compatible endpoint layer #97

Open
dwnmf wants to merge 16 commits into Soju06:main from dwnmf:chore/uncommitted-snapshot-20260222

Conversation

dwnmf (Contributor) commented Feb 22, 2026

This PR adds a dedicated Anthropic/Claude-compatible API surface on top of the existing OpenAI-compatible backend.

What it does: Claude Code clients expect a different contract than OpenAI-compatible ones. Rather than pushing that complexity onto clients, this adds a translation layer that keeps all the mapping logic in one place — new endpoints, format translation, stream adaptation, the whole thing.

New endpoints:

  • POST /v1/messages and POST /anthropic/v1/messages
  • POST /v1/messages/count_tokens and POST /anthropic/v1/messages/count_tokens

What's in the module: Request/response schemas for messages and tool blocks, a translator from Anthropic message format to Responses payload, stream adaptation back to Anthropic event format, and Claude model aliases that normalize to the configured upstream lane. Claude requests get forced to gpt-5.3-codex with xhigh reasoning — conflicting sampling fields are stripped automatically.
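As a rough illustration of the translation step described above, here is a minimal sketch that flattens Anthropic-style messages into OpenAI-style chat messages. It omits tool blocks and the actual Responses payload shape, and the function name is hypothetical, not the module's real API:

```python
def anthropic_to_openai_messages(payload: dict) -> list[dict]:
    """Flatten an Anthropic /v1/messages payload into OpenAI-style chat messages.

    Illustrative only: the real translator targets the Responses payload and
    also handles tool-use/tool-result blocks, which this sketch skips.
    """
    messages: list[dict] = []
    # Anthropic carries the system prompt as a top-level field, not a message.
    if system := payload.get("system"):
        messages.append({"role": "system", "content": system})
    for msg in payload.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; keep only text blocks.
        if isinstance(content, list):
            content = "".join(
                block.get("text", "")
                for block in content
                if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return messages
```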

Prompt caching: Cache key extraction for Anthropic payloads, a dedicated lane for count-tokens flows, and safer handling of metadata/cache-control fields that were previously causing upstream 400 errors.

Observability: Request logs now include hashed Codex session and conversation IDs. Test coverage extended for migrations, request logs, translator behavior, and Anthropic compatibility.

Known issue — cache hit rate is intermittent. Not guaranteed per request. Example from local logs:

14:11:25 -> input_tokens=15532, cached_input_tokens=0
14:11:32 -> input_tokens=15603, cached_input_tokens=15104
14:11:43 -> input_tokens=15629, cached_input_tokens=0

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c79393a2e


Comment on lines 58 to 59
if path.startswith("/v1/") or path.startswith("/backend-api/"):
    return "openai"


P2: Route /v1/messages validation failures to the Anthropic envelope

The fallback format detection still treats every /v1/* path as OpenAI, so requests like malformed JSON to /v1/messages (or /v1/messages/count_tokens) can bypass set_anthropic_error_format and return the OpenAI error shape instead of Anthropic's {"type":"error",...} envelope. Because JSON/body parsing errors are raised before route dependencies run, Anthropic clients can receive an incompatible error contract on exactly the endpoints this change adds.
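A minimal sketch of the fix this finding suggests — checking Anthropic-specific prefixes before the generic /v1/ fallback. The function name and return values are illustrative, not the project's actual API:

```python
# Paths served by the Anthropic-compatible layer must be classified before
# the generic /v1/* bucket, or they inherit the OpenAI error envelope.
ANTHROPIC_PREFIXES = ("/v1/messages", "/anthropic/")

def detect_error_format(path: str) -> str:
    """Classify a request path so even pre-route JSON/body parsing errors
    can be wrapped in the right error envelope. Sketch only."""
    if any(path.startswith(prefix) for prefix in ANTHROPIC_PREFIXES):
        return "anthropic"
    if path.startswith("/v1/") or path.startswith("/backend-api/"):
        return "openai"
    return "unknown"
```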


dwnmf and others added 11 commits February 22, 2026 15:52
…ases

- Return stop_reason="stop_sequence" when upstream response includes a
  stop_sequence value, instead of incorrectly falling through to
  "end_turn" (affects both streaming and non-streaming paths)
- Skip camelCase cache key aliases (promptCacheKey, promptCacheRetention)
  in outbound payloads when the snake_case canonical form already exists,
  preventing duplicate fields that can cause upstream validation errors
Define Phase 5 re-review loop: after fixes are committed, re-run Codex
review until 0 findings. Safety limits: max 3 iterations, recurring
findings marked wont_fix, user can stop early via HITL override.
- Add x-api-key to IGNORE_INBOUND_HEADERS to prevent client credentials
  from leaking to upstream OpenAI requests via the Anthropic endpoint
- Validate model access and enforce limits against the original requested
  model name (payload.model) instead of the remapped upstream model,
  so API keys restricted to Claude model names work correctly
Write passthrough cache keys using their canonical snake_case form
(prompt_cache_key, prompt_cache_retention) instead of preserving the
original camelCase input, so upstream fields are consistently named.
Prevent _merge_passthrough_cache_extras from re-adding raw model_extra
values for keys already handled by the normalization pipeline
(prompt_cache_key, prompt_cache_retention). This avoids forwarding
malformed values (e.g. whitespace-only) that normalization rejected.
Read promptCacheRetention (camelCase) as fallback in
_extract_prompt_cache_retention, matching the pattern used by
_extract_explicit_prompt_cache_key. Prevents silent loss of
retention config when clients use camelCase field names.
info.md Outdated
Owner


Please either pass the document through OpenSpec or remove it.

Contributor Author


Done


logger = logging.getLogger(__name__)

_CLAUDE_CODE_FORCED_MODEL = "gpt-5.3-codex"
Owner


Aliasing a model like claude-* to gpt-5.3-codex is probably not a good idea.
Following the approach in https://code.claude.com/docs/en/llm-gateway, modify and use Claude Code's environment variables instead.
codex-lb should only provide a transport layer.

Contributor Author


Sure. It was an approach from the development phase, when I was trying to eliminate cache issues.

Owner


Service/dependency layer architecture violation

dwnmf (Contributor, Author) commented Feb 26, 2026

Hi @Soju06, thanks for the review. Changes since the last round:

  • Removed the forced Claude model aliasing and the hardcoded reasoning override, so the Anthropic-compatible layer now stays transport-only
  • Moved the API key reservation/release flow out of the route layer into the service layer
  • Removed the out-of-band info.md notes and synced the related updates into OpenSpec
  • Added support for mapping reasoningEffort / reasoning.effort, including a server default via CODEX_LB_ANTHROPIC_DEFAULT_REASONING_EFFORT
  • Updated and re-ran the relevant tests

One caveat: prompt caching is still not working correctly right now.


Labels

enhancement New feature or request
