Commit 1495800

docs(ai-chat): add prompt caching guide (#3951)
## Summary

New `/ai-chat/prompt-caching` guide covering how to cache a chat agent's prompt prefix with Anthropic prompt caching: the system prompt, the conversation history (a `prepareMessages` breakpoint), and how caching interacts with compaction. It also shows how to verify cache hits via usage and the dashboard, the prefix-stability footguns, and an "Other providers" section (OpenAI and Google cache automatically; Amazon Bedrock uses `cachePoint` through `systemProviderOptions`).

Registered under Features in the AI Agents nav, next to Compaction.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Eric Allam <ericallam@users.noreply.github.com>
1 parent cf4aa7e commit 1495800

2 files changed

Lines changed: 207 additions & 0 deletions

docs/ai-chat/prompt-caching.mdx

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
---
title: "Prompt caching"
sidebarTitle: "Prompt caching"
description: "Cache the stable prefix of your agent's prompt with Anthropic prompt caching to cut token cost and latency on every turn."
---

import RcBanner from "/snippets/ai-chat-rc-banner.mdx";

<RcBanner />

**Prompt caching lets a provider reuse the unchanged prefix of your prompt across requests, billing it at a fraction of the input price and skipping re-processing.** With Anthropic, cache reads cost ~10% of base input tokens, so a long, stable system prompt or a growing conversation history pays the cache-write price once (1.25× base input on the 5-minute cache) and reads cheaply on every turn after.

Caching is a **byte-exact prefix match**: any change in the prefix invalidates everything after it. A multi-turn agent is the ideal case: the system prompt, tools, and earlier turns are identical turn over turn, so the cacheable prefix only grows. `chat.agent` is built to keep that prefix stable across turns, suspends, and resumes; this page shows how to place the cache breakpoints and verify they're hitting.

Caching is provider-specific. This guide covers Anthropic (`@ai-sdk/anthropic`), where you opt in per breakpoint with `providerOptions.anthropic.cacheControl`. Other providers cache differently, and most cache automatically; see [Other providers](#other-providers).
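
To make the byte-exact prefix rule concrete, here is a minimal sketch that models a prompt as an ordered list of rendered blocks and counts how many leading blocks two consecutive requests share. The function name `sharedPrefixBlocks` and the block strings are illustrative assumptions; real providers match on the serialized request, not on strings like these.

```typescript
// Illustrative model only: each array entry stands for one rendered block
// (tools, system, then messages). Real matching happens on the provider side.
function sharedPrefixBlocks(previous: string[], next: string[]): number {
  const limit = Math.min(previous.length, next.length);
  let i = 0;
  while (i < limit && previous[i] === next[i]) i++;
  return i;
}

const turn1 = ["tools:v1", "system:You are a support agent.", "user:Hi"];

// Appending a turn leaves every earlier block identical,
// so the whole previous prompt is a reusable prefix.
const turn2 = [...turn1, "assistant:Hello!", "user:Where is my order?"];

// Interpolating a per-turn value into the system block changes that block,
// so only the tools block ahead of it still matches.
const turn2WithTimestamp = [
  "tools:v1",
  "system:You are a support agent. Now: 2025-01-01T00:00:00Z",
  "user:Hi",
  "assistant:Hello!",
  "user:Where is my order?",
];

const reusableAppendOnly = sharedPrefixBlocks(turn1, turn2);
const reusableWithTimestamp = sharedPrefixBlocks(turn1, turn2WithTimestamp);
```

In this toy model the append-only turn reuses all three earlier blocks, while the timestamped system prompt reuses only the tools block, which is why per-turn values belong in a late message rather than in the prefix.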

## What you cache, and where

A request renders as `tools` → `system` → `messages`. There are three prefix regions worth caching:

| Region | How to cache it | Stability |
| --- | --- | --- |
| System prompt (+ tools) | `cacheControl` / `systemProviderOptions` on `chat.toStreamTextOptions()`, or `providerOptions` on `chat.prompt.set()` | Set once, never changes; the highest-value target |
| Conversation history | `prepareMessages` adds a breakpoint to the last message | Grows append-only across turns |
| Tool definitions | Covered by the system breakpoint, since tools render ahead of the system block | Stable as long as your tool set doesn't change between turns; they render at position 0, so changing them invalidates everything |

`chat.agent` preserves `providerOptions` through message persistence and rehydration, so a breakpoint you place survives a suspend/resume or a page refresh. The recommended way to place message breakpoints is `prepareMessages` (below) rather than baking `cacheControl` into stored messages: `prepareMessages` runs on every prompt-assembly path, including after compaction, so the breakpoint is always in the right place.

## Cache the system prompt

The system prompt (your `chat.prompt` text plus any skills preamble) is usually the largest stable block, so it's the first thing to cache. `chat.toStreamTextOptions()` returns `system` as a plain string by default; opt into caching and it returns a structured system message carrying the cache breakpoint instead.

<Note>
  System-prompt caching needs AI SDK v6 or later, where the `system` parameter accepts a structured message. On AI SDK v5 `system` is a plain string, so these options won't apply a breakpoint to the system block; cache the conversation via `prepareMessages` instead.
</Note>

Three ways to opt in, depending on where you'd rather express it.

**`cacheControl` at the `streamText` call site**, the Anthropic-flavored one-liner:

```ts /trigger/chat.ts
import { chat } from "@trigger.dev/sdk/ai";
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

export const myChat = chat.agent({
  id: "my-chat",
  onChatStart: async () => {
    chat.prompt.set(SYSTEM_PROMPT); // a large, stable instruction block
  },
  run: async ({ messages, signal }) => {
    return streamText({
      model: anthropic("claude-sonnet-4-6"),
      // Caches the system block with a 5-minute breakpoint.
      ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
      messages,
      abortSignal: signal,
    });
  },
});
```

**`systemProviderOptions`** is the provider-agnostic form. Pass the raw `providerOptions` so it composes with any provider:

```ts /trigger/chat.ts
return streamText({
  model: anthropic("claude-sonnet-4-6"),
  ...chat.toStreamTextOptions({
    systemProviderOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  }),
  messages,
  abortSignal: signal,
});
```

**`providerOptions` on `chat.prompt.set()`** co-locates the intent with where the prompt is defined. It carries through to `toStreamTextOptions()` with no call-site change:

```ts /trigger/chat.ts
onChatStart: async () => {
  chat.prompt.set(SYSTEM_PROMPT, {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
},
run: async ({ messages, signal }) => {
  return streamText({
    model: anthropic("claude-sonnet-4-6"),
    ...chat.toStreamTextOptions(), // already cached
    messages,
    abortSignal: signal,
  });
},
```

If more than one is set, the call-site option wins: `systemProviderOptions` overrides `cacheControl`, and both override `chat.prompt.set`'s `providerOptions`. There's no deep merge; the most specific option replaces the rest.
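
The precedence rule can be sketched as a small pure function. This is not the SDK's implementation: `resolveSystemProviderOptions`, `SystemCachingInputs`, and the field `promptProviderOptions` are hypothetical names used only to illustrate "most specific wins, no deep merge" as described above.

```typescript
// Hypothetical types and names; a sketch of the documented precedence only.
type ProviderOptions = Record<string, Record<string, unknown>>;

interface SystemCachingInputs {
  // Stands in for providerOptions passed to chat.prompt.set().
  promptProviderOptions?: ProviderOptions;
  // Stands in for the Anthropic shorthand at the call site.
  cacheControl?: { type: "ephemeral"; ttl?: "5m" | "1h" };
  // Stands in for the provider-agnostic call-site option.
  systemProviderOptions?: ProviderOptions;
}

function resolveSystemProviderOptions(inputs: SystemCachingInputs): ProviderOptions | undefined {
  // The winner replaces the others wholesale; nothing is merged.
  if (inputs.systemProviderOptions) return inputs.systemProviderOptions;
  if (inputs.cacheControl) return { anthropic: { cacheControl: inputs.cacheControl } };
  return inputs.promptProviderOptions;
}

// The prompt asks for a 1-hour cache, but the call site passes the plain shorthand.
const resolved = resolveSystemProviderOptions({
  promptProviderOptions: { anthropic: { cacheControl: { type: "ephemeral", ttl: "1h" } } },
  cacheControl: { type: "ephemeral" },
});
```

Here `resolved` carries only the call-site `cacheControl`; the `ttl: "1h"` set alongside the prompt is dropped rather than merged in, which is the practical consequence of "no deep merge".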

<Note>
  Use the 1-hour cache for prefixes that sit idle longer than 5 minutes between turns: `cacheControl: { type: "ephemeral", ttl: "1h" }`. Writes cost more (2× base input, vs 1.25× for the 5-minute cache), so it pays off only when reads span the longer window.
</Note>
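
As a worked example of that trade-off, the sketch below prices a fixed prefix over several turns using only the multipliers quoted on this page (reads at about 0.1×, 5-minute writes at 1.25×, 1-hour writes at 2×). The model is deliberately simplified and the names (`prefixCost`, `rewrites`) are assumptions: it ignores the growing message tail and treats every turn as either a full cache read or a full rewrite.

```typescript
// Multipliers relative to the base input-token price, as quoted on this page.
const READ = 0.1;
const WRITE_5M = 1.25;
const WRITE_1H = 2;

// Cost, in base-input-token units, of sending the same prefix `turns` times
// when `rewrites` of those turns miss the cache and have to write it again.
function prefixCost(prefixTokens: number, turns: number, rewrites: number, writeMultiplier: number): number {
  const reads = turns - rewrites;
  return prefixTokens * (rewrites * writeMultiplier + reads * READ);
}

const prefixTokens = 10_000;
const turns = 10;

// No caching: the prefix is billed at the base price on every turn.
const uncached = prefixTokens * turns;

// Busy chat: every turn arrives within 5 minutes, so the cache is written once.
const busy5m = prefixCost(prefixTokens, turns, 1, WRITE_5M);

// Slow chat: every gap exceeds 5 minutes, so the 5-minute cache expires
// and is rewritten on each turn.
const slow5m = prefixCost(prefixTokens, turns, turns, WRITE_5M);

// Same slow chat on the 1-hour cache (gaps under an hour): written once.
const slow1h = prefixCost(prefixTokens, turns, 1, WRITE_1H);
```

Under these assumptions the busy chat costs roughly 21,500 token-units against 100,000 uncached. The slow chat on the 5-minute cache costs 125,000, which is worse than not caching at all, while the 1-hour cache brings it back to roughly 29,000.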

## Cache the conversation history

Place a breakpoint on the last message and the entire conversation prefix up to that point is cached, so the next turn reads it back instead of re-processing it. Do this in [`prepareMessages`](/ai-chat/reference#chatagentoptions): it transforms model messages once, and `chat.agent` applies it on every path that builds a prompt (each turn, and both compaction rebuild paths), so the breakpoint always lands on the real last message.

```ts /trigger/chat.ts
import { chat } from "@trigger.dev/sdk/ai";
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

export const myChat = chat.agent({
  id: "my-chat",
  prepareMessages: async ({ messages }) => {
    if (messages.length === 0) return messages;
    const last = messages[messages.length - 1];
    // Re-create only the last message with a breakpoint; earlier messages are untouched.
    return [
      ...messages.slice(0, -1),
      {
        ...last,
        providerOptions: {
          ...last.providerOptions,
          // Merge so any Anthropic options already on the message survive.
          anthropic: {
            ...last.providerOptions?.anthropic,
            cacheControl: { type: "ephemeral" },
          },
        },
      },
    ];
  },
  run: async ({ messages, signal }) => {
    return streamText({
      model: anthropic("claude-sonnet-4-6"),
      ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
      messages,
      abortSignal: signal,
    });
  },
});
```

The system breakpoint and the conversation breakpoint compose: the system block is cached once for the life of the chat, and each turn extends the cached message prefix.

<Note>
  Anthropic allows **at most 4** cache breakpoints per request, and a prefix must be at least ~1024 tokens (model-dependent) to cache at all; shorter prefixes silently don't cache. One system breakpoint plus one rolling message breakpoint is the typical setup and leaves headroom.
</Note>
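
If you place breakpoints in more than one spot, a small guard can catch going over the limit before the request is sent. The sketch below is an assumption-laden illustration: `CacheableBlock`, `countBreakpoints`, and `assertBreakpointBudget` are hypothetical names, and the block shape is a minimal local stand-in rather than the AI SDK's message types.

```typescript
// Minimal local shape; real AI SDK messages carry many more fields.
interface CacheableBlock {
  providerOptions?: { anthropic?: { cacheControl?: { type: "ephemeral"; ttl?: string } } };
}

// The per-request limit stated in the note above.
const MAX_ANTHROPIC_BREAKPOINTS = 4;

function countBreakpoints(blocks: CacheableBlock[]): number {
  return blocks.filter((block) => block.providerOptions?.anthropic?.cacheControl !== undefined).length;
}

function assertBreakpointBudget(blocks: CacheableBlock[]): void {
  const count = countBreakpoints(blocks);
  if (count > MAX_ANTHROPIC_BREAKPOINTS) {
    throw new Error(
      `Found ${count} cache breakpoints; Anthropic allows at most ${MAX_ANTHROPIC_BREAKPOINTS} per request.`
    );
  }
}

const marked: CacheableBlock = { providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } } };
const plain: CacheableBlock = {};

// One system breakpoint plus one rolling message breakpoint: 2 of 4 used.
const typicalSetup = [marked, plain, plain, marked];
assertBreakpointBudget(typicalSetup);
```

Note that this only counts breakpoints; it cannot tell whether a prefix meets the minimum token length, which depends on the model's tokenizer.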

## Caching and compaction

Compaction rewrites the conversation prefix (it replaces earlier turns with a summary), so it necessarily invalidates the cached message prefix at that point. That's a one-time reset, not a regression: because `prepareMessages` also runs on the compaction rebuild and result paths, the new (shorter) prefix gets a fresh breakpoint and re-warms on the next turn. Your system-prompt cache is unaffected, since compaction never touches the system block. See [Compaction](/ai-chat/compaction) for how the summary is produced.
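
The following sketch shows why a transform applied at prompt-assembly time keeps the breakpoint in the right place across compaction. Everything here is a local stand-in: `SketchMessage`, `withRollingBreakpoint`, and the toy `compact` function are assumptions for illustration, not the SDK's types or its compaction algorithm.

```typescript
// Local message shape for illustration; not the AI SDK's ModelMessage type.
interface SketchMessage {
  role: "user" | "assistant";
  content: string;
  providerOptions?: Record<string, Record<string, unknown>>;
}

// Same idea as a prepareMessages transform: mark only the last message.
function withRollingBreakpoint(messages: SketchMessage[]): SketchMessage[] {
  const last = messages[messages.length - 1];
  if (last === undefined) return messages;
  return [
    ...messages.slice(0, -1),
    {
      ...last,
      providerOptions: {
        ...last.providerOptions,
        anthropic: { ...last.providerOptions?.["anthropic"], cacheControl: { type: "ephemeral" } },
      },
    },
  ];
}

// Toy compaction: replace all but the most recent `keep` messages with one summary.
function compact(messages: SketchMessage[], keep: number): SketchMessage[] {
  if (messages.length <= keep) return messages;
  const dropped = messages.length - keep;
  const summary: SketchMessage = { role: "user", content: `Summary of ${dropped} earlier messages` };
  return [summary, ...messages.slice(messages.length - keep)];
}

const hasBreakpoint = (m: SketchMessage): boolean =>
  m.providerOptions?.["anthropic"]?.["cacheControl"] !== undefined;

// Stored history never carries a breakpoint; it is added when the prompt is built.
const stored: SketchMessage[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello!" },
  { role: "user", content: "Where is my order?" },
  { role: "assistant", content: "It shipped yesterday." },
  { role: "user", content: "Thanks, and the invoice?" },
];

const beforeCompaction = withRollingBreakpoint(stored);
const afterCompaction = withRollingBreakpoint(compact(stored, 2));
```

Because the stored messages stay unmarked, the compacted prompt ends up with exactly one breakpoint on its new last message and no stale markers left over from earlier turns.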

## Other providers

Caching is provider-specific, and most providers don't use per-block breakpoints at all:

- **OpenAI** and **Google Gemini** cache automatically. OpenAI caches prompt prefixes of at least 1024 tokens; Gemini 2.5 caches implicitly (1024 tokens on Flash, 2048 on Pro). Neither needs a breakpoint, so the system-caching options above are a no-op for them. `chat.agent` already gives automatic caching exactly what it needs: a byte-stable prefix that only grows across turns. Keep the system prompt frozen and the prefix over the model's minimum and reads happen on their own. (OpenAI's optional `providerOptions.openai.promptCacheKey` improves hit-routing across requests; it's a top-level option, not a system-block breakpoint.)

- **Anthropic** and **Amazon Bedrock** take an explicit breakpoint on the system block: Anthropic via `cacheControl`, Bedrock via `cachePoint`. Both go through the provider-agnostic `systemProviderOptions`:

```ts /trigger/chat.ts
// Amazon Bedrock
return streamText({
  // model: your Bedrock model (omitted in this snippet)
  ...chat.toStreamTextOptions({
    systemProviderOptions: { bedrock: { cachePoint: { type: "default" } } },
  }),
  messages,
});
```

The `cacheControl` shorthand is Anthropic-only; `systemProviderOptions` (and `chat.prompt.set`'s `providerOptions`) is the form to reach for on any other breakpoint-based provider.

Usage reporting is normalized. Each provider reports cache tokens under its own provider-specific field, but the AI SDK maps them into the same `inputTokenDetails.cacheReadTokens` / `cacheWriteTokens` that `previousTurnUsage` and `totalUsage` carry and the dashboard shows, so the [verify step](#verify-caching-is-working) is the same regardless of provider.
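
If one agent can run against several providers, the provider split described in this section can be captured in a single lookup. This is a sketch under assumptions: `systemBreakpointFor` and the provider union types are hypothetical, and the only option shapes used are the two shown on this page.

```typescript
// Hypothetical helper; option shapes are the ones shown in this section.
type BreakpointProvider = "anthropic" | "bedrock";
type AutomaticProvider = "openai" | "google";
type ProviderOptions = Record<string, Record<string, unknown>>;

function systemBreakpointFor(provider: BreakpointProvider | AutomaticProvider): ProviderOptions | undefined {
  switch (provider) {
    case "anthropic":
      return { anthropic: { cacheControl: { type: "ephemeral" } } };
    case "bedrock":
      return { bedrock: { cachePoint: { type: "default" } } };
    case "openai":
    case "google":
      // These cache automatically, so there is no system breakpoint to set.
      return undefined;
  }
}

const bedrockOptions = systemBreakpointFor("bedrock");
const openaiOptions = systemBreakpointFor("openai");
```

The result is meant to be handed to `systemProviderOptions`; whether that option accepts `undefined` is an assumption here, so guard the call site if it does not.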

## Verify caching is working

The turn's usage carries cache token counts. `chat.agent` accumulates them across turns and hands them to `run` as `previousTurnUsage` (last turn) and `totalUsage` (whole chat), both `LanguageModelUsage`:

```ts /trigger/chat.ts
run: async ({ messages, signal, previousTurnUsage }) => {
  // previousTurnUsage describes the turn before this one. Turn 1 writes the cache,
  // so on a stable prefix cacheReadTokens should be > 0 for every turn after it.
  console.log("cache read", previousTurnUsage?.inputTokenDetails?.cacheReadTokens);
  console.log("cache write", previousTurnUsage?.inputTokenDetails?.cacheWriteTokens);

  return streamText({
    model: anthropic("claude-sonnet-4-6"),
    ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
    messages,
    abortSignal: signal,
  });
},
```

The first turn writes the cache (`cacheWriteTokens > 0`, `cacheReadTokens` is 0). Every turn after, on an unchanged prefix, reads it (`cacheReadTokens > 0`). The dashboard surfaces the same numbers on the AI span as **Cache write** and **Cache read**, so you can confirm hits per run without logging.

If `cacheReadTokens` stays 0 across turns even though the prefix should be identical, something is silently changing its bytes; see the warning below.
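
The write-then-read pattern described above can be turned into a small classifier for logs or alerts. This is a sketch, not part of the SDK: `CacheUsageSketch` models only the two fields named on this page, and `classifyCacheOutcome` and its labels are hypothetical.

```typescript
// Only the two fields this page names; the real LanguageModelUsage type has more.
interface CacheUsageSketch {
  inputTokenDetails?: { cacheReadTokens?: number; cacheWriteTokens?: number };
}

type CacheOutcome = "hit" | "write" | "none";

function classifyCacheOutcome(usage: CacheUsageSketch | undefined): CacheOutcome {
  const read = usage?.inputTokenDetails?.cacheReadTokens ?? 0;
  const write = usage?.inputTokenDetails?.cacheWriteTokens ?? 0;
  if (read > 0) return "hit"; // a hit can also write the newly appended tail
  if (write > 0) return "write"; // cold prefix: cached now, readable next turn
  return "none"; // nothing cached: prefix too short, no breakpoint, or nothing reported
}

// Example values only, shaped like a first turn and a later turn.
const firstTurn = classifyCacheOutcome({ inputTokenDetails: { cacheReadTokens: 0, cacheWriteTokens: 4200 } });
const laterTurn = classifyCacheOutcome({ inputTokenDetails: { cacheReadTokens: 4200, cacheWriteTokens: 310 } });
const noUsageYet = classifyCacheOutcome(undefined);
```

Calling it with `previousTurnUsage` inside `run` gives one label per turn; a run of `"write"` or `"none"` labels after the first turn is the signal that the prefix is not stable.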

<Warning>
  Anything that changes the prefix between turns silently kills the cache. Keep the system prompt **byte-stable**: never interpolate a timestamp, request ID, or per-turn value into `chat.prompt`. Don't change the **model** or the **tool set** mid-conversation (tools render at position 0, so adding one invalidates everything after). Inject dynamic per-turn context as a late message via [pending messages](/ai-chat/pending-messages) or [background injection](/ai-chat/background-injection), not into the cached prefix.
</Warning>

## Next steps

<CardGroup cols={2}>
  <Card title="Compaction" icon="compress" href="/ai-chat/compaction">
    Keep long conversations within token limits, and re-warm the cache after.
  </Card>
  <Card title="Fast starts" icon="bolt" href="/ai-chat/fast-starts">
    Cut cold-start latency so a cached prefix is the only thing between a message and a reply.
  </Card>
  <Card title="chat.agent reference" icon="book" href="/ai-chat/reference#chatagentoptions">
    Full option surface, including `prepareMessages` and `toStreamTextOptions`.
  </Card>
  <Card title="Building agents: backend" icon="server" href="/ai-chat/backend">
    The three ways to build a chat backend and when to reach for each.
  </Card>
</CardGroup>

docs/docs.json

Lines changed: 1 addition & 0 deletions
@@ -123,6 +123,7 @@
 "ai/prompts",
 "ai-chat/fast-starts",
 "ai-chat/compaction",
+"ai-chat/prompt-caching",
 "ai-chat/pending-messages",
 "ai-chat/background-injection",
 "ai-chat/actions",
