Is your feature request related to a problem? Please describe.
Long-running conversations can silently exceed a model's context window, causing requests to fail or be truncated. The SDK currently reports token usage only after a request completes and does not prevent or warn about context overflow. In production applications, this leads to unpredictable behavior and wasted API calls.
Describe the solution you'd like
Add automatic context window management to the SDK:
- Pre-request token counting: Count the total tokens in a request before sending it, to detect whether it will exceed the model's context limit
- Automatic message pruning: When the context approaches its limit (e.g., 85% full), remove the oldest messages to make room for new requests
- Conversation summarization: Optionally replace pruned messages with AI-generated summaries to retain context
- Per-model context windows: Enforce model-specific limits (e.g., GPT-4o: 128k, Claude 3.5 Sonnet: 200k, Gemini 1.5 Flash: 1M tokens)
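A minimal sketch of how pre-request counting, threshold-based pruning, and per-model limits could fit together. Everything here is hypothetical: `Message`, `CONTEXT_WINDOWS`, `estimateTokens`, and `pruneToFit` are illustrative names rather than existing SDK APIs, the model keys are simplified, and the characters-divided-by-four estimate is only a rough stand-in for the provider-specific tokenizers (such as OpenAI's tiktoken) that a real implementation would call.

```typescript
// Hypothetical message shape; the SDK's real types may differ.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Illustrative per-model context windows in tokens (simplified model keys).
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-4o": 128_000,
  "claude-3-5-sonnet": 200_000,
  "gemini-1.5-flash": 1_000_000,
};

// Crude stand-in for a provider tokenizer: about 4 characters per token,
// plus a small fixed overhead per message.
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4) + 4, 0);
}

// Removes the oldest non-system messages until the estimate is at or below
// `threshold` times the model's context window. System messages and the
// latest message are never removed.
function pruneToFit(
  messages: Message[],
  model: string,
  threshold = 0.85,
): { kept: Message[]; pruned: Message[] } {
  const limit = CONTEXT_WINDOWS[model];
  if (limit === undefined) {
    throw new Error(`Unknown context window for model "${model}"`);
  }
  const budget = Math.floor(limit * threshold);
  const kept = [...messages];
  const pruned: Message[] = [];
  while (estimateTokens(kept) > budget) {
    // Oldest message that is neither a system message nor the last one.
    const idx = kept.findIndex((m, i) => m.role !== "system" && i < kept.length - 1);
    if (idx === -1) break; // nothing left that is safe to remove
    pruned.push(...kept.splice(idx, 1));
  }
  return { kept, pruned };
}
```

Keeping the system prompt and the latest message means pruning never drops the instructions or the request being sent. If those alone exceed the budget, the function returns them as-is, and the caller would still need to fail fast.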
Describe alternatives you've considered
- Manual truncation: Developers manage message limits themselves (high friction, error-prone)
- Fail fast: Reject requests that exceed the limit (poor UX, no recovery)
- Message count limits: Cap the number of messages instead of tokens (ignores per-model limits and tokenizer differences)
- Hard truncation: Drop messages without summarization (loses context)
Additional context
- Should work across all providers (OpenAI, Anthropic, Gemini, Ollama) with their respective token counters
- Should integrate with the existing conversationId tracking and conversation history
- Should emit events for monitoring when pruning or summarization occurs
- Configuration: enable/disable, pruning strategy, summary trigger threshold
- Use cases: chatbots, long-running agent tasks, multi-turn workflows
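The configuration, monitoring events, and optional summarization described above could fit together roughly as follows. This is an assumption-laden sketch: `ContextWindowConfig`, `ContextEvent`, `manageContext`, and the injected `countTokens`, `summarize`, and `emit` callbacks are hypothetical names, and the SDK's real conversation-history types may differ.

```typescript
// Hypothetical API sketch: none of these names exist in the SDK today.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

type PruningStrategy = "drop-oldest" | "summarize";

interface ContextWindowConfig {
  enabled: boolean;
  strategy: PruningStrategy;
  /** Fraction of the context window at which pruning starts, e.g. 0.85. */
  triggerThreshold: number;
  /** Context window of the target model, in tokens. */
  contextWindow: number;
}

interface ContextEvent {
  type: "pruned" | "summarized";
  conversationId: string;
  removed: number;
  tokensBefore: number;
  tokensAfter: number;
}

// Injected so each provider can plug in its own tokenizer and summarization call.
type TokenCounter = (messages: Message[]) => number;
type Summarizer = (messages: Message[]) => Promise<string>;

async function manageContext(
  conversationId: string,
  history: Message[],
  config: ContextWindowConfig,
  countTokens: TokenCounter,
  summarize: Summarizer,
  emit: (event: ContextEvent) => void,
): Promise<Message[]> {
  const tokensBefore = countTokens(history);
  const budget = Math.floor(config.contextWindow * config.triggerThreshold);
  if (!config.enabled || tokensBefore <= budget) return history;

  // System messages are always kept (and moved to the front); other messages
  // are removed oldest-first, but the most recent one is never removed.
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  const removed: Message[] = [];
  while (rest.length > 1 && countTokens([...system, ...rest]) > budget) {
    removed.push(rest.shift()!);
  }
  if (removed.length === 0) return history;

  let result: Message[] = [...system, ...rest];
  const didSummarize = config.strategy === "summarize";
  if (didSummarize) {
    // The summary is not re-checked against the budget in this sketch.
    const summary = await summarize(removed);
    result = [...system, { role: "system", content: `Summary of earlier conversation: ${summary}` }, ...rest];
  }

  emit({
    type: didSummarize ? "summarized" : "pruned",
    conversationId,
    removed: removed.length,
    tokensBefore,
    tokensAfter: countTokens(result),
  });
  return result;
}
```

Injecting the token counter and summarizer keeps the core logic provider-agnostic, so OpenAI, Anthropic, Gemini, and Ollama adapters could each supply their own tokenizer and summarization model.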