This description was written and checked for this PR. The code changes were AI-assisted and reviewed locally before submission.
📝 Description
Adds gateway-side Anthropic prompt cache control injection for Claude relay requests.
The relay now adds a top-level cache_control field only when the client request does not already define Anthropic cache control. The default TTL comes from ANTHROPIC_PROMPT_CACHE_TTL, and callers can override it per request with the x-anthropic-prompt-cache-ttl header. A TTL of auto selects 1h for evaluation, benchmark, batch, pipeline, and long-running workloads, and 5m otherwise.
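The policy above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the names resolveTTL, hasCacheControl, injectCacheControl, and longRunningHints are hypothetical, the keyword match used for auto is an assumption (the description lists the workload categories but not how they are detected), and treating an empty TTL as "do not inject" is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// longRunningHints is an assumed keyword list; the PR names the workload
// categories but does not say how a request is classified.
var longRunningHints = []string{"eval", "benchmark", "batch", "pipeline", "long-running"}

// resolveTTL lets the per-request override win over the configured default.
// "auto" resolves to 1h for long-running workloads and to 5m otherwise.
func resolveTTL(envDefault, headerOverride, workloadHint string) string {
	ttl := strings.ToLower(strings.TrimSpace(headerOverride))
	if ttl == "" {
		ttl = strings.ToLower(strings.TrimSpace(envDefault))
	}
	if ttl != "auto" {
		return ttl
	}
	hint := strings.ToLower(workloadHint)
	for _, kw := range longRunningHints {
		if strings.Contains(hint, kw) {
			return "1h"
		}
	}
	return "5m"
}

// hasCacheControl reports whether any object in the decoded JSON tree
// carries a cache_control key (top-level or nested).
func hasCacheControl(v any) bool {
	switch t := v.(type) {
	case map[string]any:
		if _, ok := t["cache_control"]; ok {
			return true
		}
		for _, child := range t {
			if hasCacheControl(child) {
				return true
			}
		}
	case []any:
		for _, child := range t {
			if hasCacheControl(child) {
				return true
			}
		}
	}
	return false
}

// injectCacheControl adds a top-level cache_control object unless the client
// already supplied one, in which case the body is returned unchanged.
func injectCacheControl(body []byte, ttl string) ([]byte, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	if ttl == "" || hasCacheControl(req) {
		return body, nil
	}
	req["cache_control"] = map[string]any{"type": "ephemeral", "ttl": ttl}
	return json.Marshal(req)
}

func main() {
	ttl := resolveTTL("auto", "", "nightly-benchmark")
	out, err := injectCacheControl([]byte(`{"model":"claude","messages":[]}`), ttl)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The nested check is deliberately conservative: any existing cache_control key, at any depth, leaves the client request untouched.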
Gateway-only cache policy headers are stripped before upstream conversion so they do not leak to providers.
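Header stripping of this kind could look like the sketch below. The function name stripGatewayHeaders and the gatewayOnlyHeaders list are hypothetical; only x-anthropic-prompt-cache-ttl is named in this description, so the list contains just that one entry.

```go
package main

import (
	"fmt"
	"net/http"
)

// gatewayOnlyHeaders lists headers that are meaningful to the gateway only.
// Only this one name appears in the PR description; a real list may be longer.
var gatewayOnlyHeaders = []string{"x-anthropic-prompt-cache-ttl"}

// stripGatewayHeaders returns a copy of the inbound headers without the
// gateway-only entries, leaving the inbound header map untouched.
func stripGatewayHeaders(in http.Header) http.Header {
	out := in.Clone()
	for _, name := range gatewayOnlyHeaders {
		out.Del(name) // Del canonicalizes the name, so case does not matter
	}
	return out
}

func main() {
	in := http.Header{}
	in.Set("x-anthropic-prompt-cache-ttl", "1h")
	in.Set("anthropic-version", "2023-06-01")

	out := stripGatewayHeaders(in)
	fmt.Println(out.Get("x-anthropic-prompt-cache-ttl") == "", out.Get("anthropic-version"))
}
```

Working on a clone keeps the policy header available to the gateway's own logic while guaranteeing it is absent from the set forwarded upstream.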
go test ./relay ./relay/channel ./relay/channel/claude -run 'TestApplyAnthropicPromptCacheControl|TestProcessHeaderOverride_PassthroughSkipsAnthropicPromptCachePolicyHeaders|TestConvertOpenAIRequest.*PromptCacheControl'
Ran on 2026-06-28 from branch feat/anthropic-cache-control-injection:
go test -count=1 ./relay ./relay/channel ./relay/channel/claude -run 'TestApplyAnthropicPromptCacheControl|TestProcessHeaderOverride_PassthroughSkipsAnthropicPromptCachePolicyHeaders|TestConvertOpenAIRequest.*PromptCacheControl|TestClaudeAdaptorE2EInjectsPromptCacheControlAndForwardsUsage'
Result: passed (ok github.com/QuantumNous/new-api/relay, ok github.com/QuantumNous/new-api/relay/channel, ok github.com/QuantumNous/new-api/relay/channel/claude).
The E2E test uses a mock Anthropic upstream, so it validates gateway request/response behavior without spending real Anthropic tokens. It verifies top-level cache_control: {"type":"ephemeral","ttl":"1h"} is sent for long/eval workloads, gateway-only cache policy headers are stripped before upstream, client-supplied cache control is preserved, and Anthropic cache usage fields flow back through response handling.
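For readers unfamiliar with the approach, a mock upstream of this kind can be built with net/http/httptest. The sketch below is not the test in this PR: captured, mockAnthropic, and sendUpstream are hypothetical names, and sendUpstream merely stands in for the gateway's outbound call. The usage field names cache_creation_input_tokens and cache_read_input_tokens follow Anthropic's Messages API response format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// captured records what the mock upstream received.
type captured struct {
	Header http.Header
	Body   map[string]any
}

// mockAnthropic records the incoming request and replies with a canned
// response that carries Anthropic cache usage fields.
func mockAnthropic(c *captured) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c.Header = r.Header.Clone()
		raw, _ := io.ReadAll(r.Body)
		_ = json.Unmarshal(raw, &c.Body)
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"usage":{"input_tokens":10,"cache_creation_input_tokens":100,"cache_read_input_tokens":0}}`)
	})
}

// sendUpstream stands in for the gateway's outbound request (hypothetical).
// It calls the handler in-process, so no socket or real API key is needed.
func sendUpstream(h http.Handler, body string) (map[string]any, error) {
	req := httptest.NewRequest(http.MethodPost, "/v1/messages", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	var resp map[string]any
	err := json.Unmarshal(rec.Body.Bytes(), &resp)
	return resp, err
}

func main() {
	var c captured
	body := `{"messages":[],"cache_control":{"type":"ephemeral","ttl":"1h"}}`
	resp, err := sendUpstream(mockAnthropic(&c), body)
	if err != nil {
		panic(err)
	}
	cc := c.Body["cache_control"].(map[string]any)
	usage := resp["usage"].(map[string]any)
	fmt.Println(cc["ttl"], c.Header.Get("x-anthropic-prompt-cache-ttl") == "", usage["cache_creation_input_tokens"])
}
```

A test built this way asserts on both sides of the exchange: what the upstream saw (body and headers) and what the gateway extracted from the response.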
🚀 Type of change
🔗 Related Issue
✅ Checklist
Bug fix: I have filed or linked the corresponding Issue, and I am not classifying design trade-offs, mismatched expectations, or misunderstandings as bugs.
📸 Proof of Work
Local verification passed: