feat: add Anthropic prompt caching for cost and latency reduction by ckrough · Pull Request #20 · ckrough/retriever

ckrough · 2026-01-10T21:40:46Z

Summary

Enable Anthropic prompt caching via OpenRouter for cost and latency optimization
Add llm_enable_prompt_caching config option (default: true)
Log cache hit/write metrics for observability

Changes

src/config.py: Add llm_enable_prompt_caching setting
src/infrastructure/llm/openrouter.py: Add cache_control to system messages, log cache metrics
src/web/routes.py: Pass caching config to provider
.env.example: Document new config option
tests/test_llm_provider.py: Add comprehensive tests for caching behavior

Benefits

Up to 90% reduction in input token costs for cached prompts
Up to 85% latency reduction for repeated system prompts
Observability via structured cache metrics logging

Test plan

All existing tests pass (330 passed)
Coverage maintained at 80.61%
mypy strict mode passes
ruff format/check passes
bandit security scan passes

Closes: retriever-ghs

Enable prompt caching via OpenRouter for Anthropic models (Claude Sonnet/Haiku). This can reduce input token costs by up to 90% and latency by up to 85% for repeated system prompts. Changes: - Add llm_enable_prompt_caching config option (default: true) - Add cache_control to system messages in OpenRouterProvider - Log cache hit/write metrics for observability - Update .env.example with new config option Closes: retriever-ghs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Anthropic prompt caching for cost and latency reduction#20

feat: add Anthropic prompt caching for cost and latency reduction#20
ckrough wants to merge 1 commit intomainfrom
zen-goldberg

ckrough commented Jan 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ckrough commented Jan 10, 2026

Summary

Changes

Benefits

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant