## Problem Statement

*What problem does this feature solve?*
The current SummarizeCompressor implementation has a significant performance and cost issue. Every time compression is triggered, it re-summarizes the entire message history from scratch, even if portions of that history have been previously summarized.
**Specific issues:**

- Performance: Lines 97-111 in `src/strategies/summarize.ts` always process the complete message range, leading to redundant AI model calls
- Cost: Repeated summarization of the same content increases API costs unnecessarily
- Inefficiency: The original goal of summarization (reducing hallucination and saving costs) is undermined by this approach
**Current behavior:**

```typescript
// Every compression call processes ALL messages in range
const messagesToSummarize = messages.slice(summarizeStart, keepTailStart);
const conversationText = messagesToSummarize
  .map((msg) => `${msg.role}: ${msg.content}`)
  .join('\n---\n');
```
## Proposed Solution

*High-level approach to solving the problem*
Introduce a data store interface that allows caching of previously generated summaries, following the library's "Bring Your Own Model" (BYOM) pattern with "Bring Your Own Store" (BYOS).
**Key components:**

- `SlimContextStore` interface for storage abstraction
- Enhanced message identification system with thread/conversation IDs
- Intelligent cache key strategy using conversation context + message ranges
- Modified `SummarizeCompressor` that checks the cache before generating new summaries
- Optional `InMemoryStore` implementation for testing and learning purposes
## Technical Details

*Implementation considerations*
**New interfaces needed:**

```typescript
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Conversation wrapper to avoid repeating threadId on each message
interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
  metadata?: Record<string, unknown>;
}

// Keep SlimContextMessage clean and focused (no repetitive threadId)
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
  metadata?: Record<string, unknown>;
  id?: string; // Optional message identifier
  index?: number; // Position within conversation
}

interface CacheKey {
  threadId: string;
  type: 'summary' | 'message';
  startIndex: number;
  endIndex?: number; // For range summaries
}
```
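The optional `InMemoryStore` mentioned earlier could be little more than a `Map` behind the async interface. A minimal sketch (illustrative only, not the library's actual code; the interface is repeated so the snippet stands alone):

```typescript
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Map-backed reference store; methods are async to match the interface,
// even though every operation resolves immediately.
class InMemoryStore implements SlimContextStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.data.delete(key);
  }
}
```

Because the interface is async throughout, swapping this out for a Redis- or database-backed store requires no changes to calling code.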
**Cache key strategy:**

- Format: `"thread_{threadId}:summary:{startIndex}-{endIndex}"`
- Example: `"thread_123:summary:5-15"` (summary of messages 5-15 in thread 123)
- Avoids using message content as keys (inefficient for long messages)
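A key builder following this format is a one-liner; a sketch with a hypothetical helper name:

```typescript
// Hypothetical helper that produces keys in the
// "thread_{threadId}:summary:{startIndex}-{endIndex}" format described above.
function buildSummaryKey(threadId: string, startIndex: number, endIndex: number): string {
  return `thread_${threadId}:summary:${startIndex}-${endIndex}`;
}

// buildSummaryKey('123', 5, 15) → 'thread_123:summary:5-15'
```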
**Backward Compatibility Strategy:**

```typescript
// Enhanced compressor interface with overloaded call signatures
interface SlimContextCompressor {
  // Existing method - maintains full backward compatibility
  compress(messages: SlimContextMessage[]): Promise<SlimContextMessage[]>;
  // New method - accepts conversation wrapper for enhanced functionality
  compress(conversation: SlimContextConversation): Promise<SlimContextConversation>;
}

// Utility functions for format conversion
declare function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation;
declare function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[];
```
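The conversion utilities are thin by design; plausible implementations (a sketch using a simplified message type, not the full interface):

```typescript
// Simplified message shapes for illustration
interface SlimContextMessage {
  role: string;
  content: string;
}

interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
}

// Wrap a legacy message array in a conversation with the given thread ID
function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation {
  return { threadId, messages };
}

// Unwrap back to the legacy message-array shape
function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[] {
  return conversation.messages;
}
```

Since wrapping and unwrapping are structural no-ops, the legacy code path pays essentially no overhead for the new conversation type.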
**Integration points:**

- Modify `SummarizeCompressor.compress()` to check the cache before summarizing
- Add method overloading to support both message arrays and conversation wrappers
- Add store configuration to `SummarizeConfig`
- Update token estimation to account for cached summaries
- Cache key generation uses `conversation.threadId` instead of per-message repetition
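The cache check itself could follow a standard read-through pattern. A sketch with hypothetical names (`summarizeWithCache` is not part of the library; the store parameter is narrowed to the two methods used):

```typescript
type Store = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
};

// Read-through cache: return the cached summary if present,
// otherwise call the model once and store the result for next time.
async function summarizeWithCache(
  store: Store,
  threadId: string,
  start: number,
  end: number,
  summarize: () => Promise<string>, // the expensive AI model call
): Promise<string> {
  const key = `thread_${threadId}:summary:${start}-${end}`;
  const cached = await store.get(key);
  if (cached !== null) return cached; // cache hit: no model call
  const summary = await summarize(); // cache miss: exactly one model call
  await store.set(key, summary);
  return summary;
}
```

Repeated compressions of the same range then cost one `store.get` instead of a model round trip.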
## Implementation Considerations

*Open questions and design decisions*
### 1. Summary Combination Strategy
When extending a cached summary (e.g., have summary 5-15, need summary 5-25):
**Option A: AI-Driven Combination**

- Send the model the existing summary (5-15) + the new messages (16-25) → combined summary (5-25)
- Pros: intelligent merging, better context preservation, can resolve contradictions
- Cons: more expensive, potentially slower, risk of AI hallucination
**Option B: Client-Side Concatenation**

- Send the model only the new messages (16-25) → new summary (16-25)
- Concatenate: summary(5-15) + summary(16-25) = combined(5-25)
- Pros: cost-effective, faster, predictable behavior
- Cons: potential fragmentation, no cross-segment awareness
**Option C: Hybrid Configurable**

- Allow users to choose a strategy based on cost/quality tradeoffs
- Default to client-side concatenation, with AI-driven combination as an option
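Option C could be surfaced as a small config flag dispatching between the two paths; a sketch with illustrative names:

```typescript
type CombineStrategy = 'concat' | 'ai';

// existing:  cached summary (e.g. of messages 5-15)
// extension: freshly generated summary of the new range (e.g. 16-25)
// aiMerge:   model call that merges two summaries; only invoked for 'ai'
async function combineSummaries(
  strategy: CombineStrategy,
  existing: string,
  extension: string,
  aiMerge: (a: string, b: string) => Promise<string>,
): Promise<string> {
  if (strategy === 'concat') {
    return `${existing}\n\n${extension}`; // Option B: cheap, predictable
  }
  return aiMerge(existing, extension); // Option A: one extra model call
}
```

The `'concat'` default keeps the common path free of model calls, while callers who need cross-segment coherence opt into `'ai'`.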
### 2. Cache Invalidation Strategy
- Should cache entries expire?
- How to handle message updates/edits?
- Thread-based vs global cache management
### 3. Store Interface Scope
- Keep minimal (get/set/delete) or add advanced features (batch operations, TTL)?
- Async vs sync interface design?
### 4. Thread ID Management
- Who provides the thread ID? User application or library?
- Default behavior when no thread ID provided?
- Should we auto-generate thread IDs for backward compatibility?
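One possible answer to the auto-generation question is a simple fallback helper (`ensureThreadId` is hypothetical, not an existing library function):

```typescript
import { randomUUID } from 'node:crypto';

// Use the caller-provided thread ID when present; otherwise generate one.
// Auto-generated IDs are unique per call, so legacy message-array callers
// would get caching within one conversation object but not across sessions.
function ensureThreadId(threadId?: string): string {
  return threadId ?? `auto_${randomUUID()}`;
}
```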
### 5. Conversation Wrapper Benefits

- Eliminates repetition: no `threadId` duplication across messages
- Cleaner API: separates conversation context from individual message data
- Better performance: reduces memory usage and serialization overhead
- Extensible: easy to add conversation-level metadata without touching messages
## Acceptance Criteria

*Definition of done*

- [ ] `SlimContextStore` interface defined in `src/interfaces.ts`
- [ ] `SlimContextConversation` wrapper interface implemented
- [ ] Format conversion utilities (`wrapMessages`/`unwrapMessages`)
- [ ] `SummarizeCompressor` updated to use the store for caching
- [ ] `InMemoryStore` reference implementation
- [ ] Backward compatibility preserved (existing `compress(messages[])` unchanged)
## Additional Context

*Supporting information*
**Design principles alignment:**

- Model-agnostic: the store interface doesn't depend on any specific storage technology
- Framework-independent: works with any storage backend (Redis, database, filesystem, memory)
- BYOM/BYOS pattern: users provide their own store implementation, just as they provide their own model
- Zero runtime dependencies: the core library remains dependency-free
**Potential store implementations users might provide:**
- Redis for distributed caching
- Database tables for persistence
- File system for local caching
- Cloud storage (S3, etc.) for serverless environments
**Performance impact:**
- Should significantly reduce AI model calls for repeated conversation compression
- Cache hits avoid expensive summarization operations
- Memory usage increases with cached summaries (acceptable tradeoff)