## Problem Statement

*What problem does this feature solve?*
The current SummarizeCompressor implementation has a significant performance and cost issue. Every time compression is triggered, it re-summarizes the entire message history from scratch, even if portions of that history have been previously summarized.
**Specific issues:**

- Performance: Lines 97-111 in `src/strategies/summarize.ts` always process the complete message range, leading to redundant AI model calls
- Cost: Repeated summarization of the same content increases API costs unnecessarily
- Inefficiency: The original goal of summarization (reducing hallucination and saving costs) is undermined by this approach
**Current behavior:**

```typescript
// Every compression call processes ALL messages in range
const messagesToSummarize = messages.slice(summarizeStart, keepTailStart);
const conversationText = messagesToSummarize
  .map((msg) => `${msg.role}: ${msg.content}`)
  .join('\n---\n');
```
## Proposed Solution

*High-level approach to solving the problem*
Introduce a data store interface that allows caching of previously generated summaries, following the library's "Bring Your Own Model" (BYOM) pattern with "Bring Your Own Store" (BYOS).
**Key components:**

- `SlimContextStore` interface for storage abstraction
- Enhanced message identification system with thread/conversation IDs
- Intelligent cache key strategy using conversation context + message ranges
- Modified `SummarizeCompressor` that checks the cache before generating new summaries
- Optional `InMemoryStore` implementation for testing and learning purposes
## Technical Details

*Implementation considerations*
**New interfaces needed:**

```typescript
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Conversation wrapper to avoid repeating threadId on each message
interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
  metadata?: Record<string, unknown>;
}

// Keep SlimContextMessage clean and focused (no repetitive threadId)
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
  metadata?: Record<string, unknown>;
  id?: string; // Optional message identifier
  index?: number; // Position within conversation
}

interface CacheKey {
  threadId: string;
  type: 'summary' | 'message';
  startIndex: number;
  endIndex?: number; // For range summaries
}
```
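The optional `InMemoryStore` mentioned earlier could be little more than a `Map` behind the async interface. A minimal sketch (illustrative only, not the library's actual code; the interface is repeated so the snippet stands alone):

```typescript
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Map-backed reference store; methods are async to match the interface,
// even though every operation resolves immediately.
class InMemoryStore implements SlimContextStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.data.delete(key);
  }
}
```

Because the interface is async throughout, swapping this out for a Redis- or database-backed store requires no changes to calling code.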
**Cache key strategy:**

- Format: `"thread_{threadId}:summary:{startIndex}-{endIndex}"`
- Example: `"thread_123:summary:5-15"` (summary of messages 5-15 in thread 123)
- Avoids using message content as keys (inefficient for long messages)
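A key builder following this format is a one-liner; a sketch with a hypothetical helper name:

```typescript
// Hypothetical helper that produces keys in the
// "thread_{threadId}:summary:{startIndex}-{endIndex}" format described above.
function buildSummaryKey(threadId: string, startIndex: number, endIndex: number): string {
  return `thread_${threadId}:summary:${startIndex}-${endIndex}`;
}

// buildSummaryKey('123', 5, 15) → 'thread_123:summary:5-15'
```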
**Backward Compatibility Strategy:**

```typescript
// Enhanced compressor interface with overloaded call signatures
interface SlimContextCompressor {
  // Existing method - maintains full backward compatibility
  compress(messages: SlimContextMessage[]): Promise<SlimContextMessage[]>;
  // New method - accepts conversation wrapper for enhanced functionality
  compress(conversation: SlimContextConversation): Promise<SlimContextConversation>;
}

// Utility functions for format conversion
declare function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation;
declare function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[];
```
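The conversion utilities are thin by design; plausible implementations (a sketch using a simplified message type, not the full interface):

```typescript
// Simplified message shapes for illustration
interface SlimContextMessage {
  role: string;
  content: string;
}

interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
}

// Wrap a legacy message array in a conversation with the given thread ID
function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation {
  return { threadId, messages };
}

// Unwrap back to the legacy message-array shape
function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[] {
  return conversation.messages;
}
```

Since wrapping and unwrapping are structural no-ops, the legacy code path pays essentially no overhead for the new conversation type.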
**Integration points:**

- Modify `SummarizeCompressor.compress()` to check the cache before summarizing
- Add method overloading to support both message arrays and conversation wrappers
- Add store configuration to `SummarizeConfig`
- Update token estimation to account for cached summaries
- Cache key generation uses `conversation.threadId` instead of per-message repetition
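The cache check itself could follow a standard read-through pattern. A sketch with hypothetical names (`summarizeWithCache` is not part of the library; the store parameter is narrowed to the two methods used):

```typescript
type Store = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
};

// Read-through cache: return the cached summary if present,
// otherwise call the model once and store the result for next time.
async function summarizeWithCache(
  store: Store,
  threadId: string,
  start: number,
  end: number,
  summarize: () => Promise<string>, // the expensive AI model call
): Promise<string> {
  const key = `thread_${threadId}:summary:${start}-${end}`;
  const cached = await store.get(key);
  if (cached !== null) return cached; // cache hit: no model call
  const summary = await summarize(); // cache miss: exactly one model call
  await store.set(key, summary);
  return summary;
}
```

Repeated compressions of the same range then cost one `store.get` instead of a model round trip.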
## Implementation Considerations

*Open questions and design decisions*
### 1. Summary Combination Strategy
When extending a cached summary (e.g., have summary 5-15, need summary 5-25):
**Option A: AI-Driven Combination**

- Send the model the existing summary (5-15) + the new messages (16-25) → combined summary (5-25)
- Pros: intelligent merging, better context preservation, can resolve contradictions
- Cons: more expensive, potentially slower, risk of AI hallucination
**Option B: Client-Side Concatenation**

- Send the model only the new messages (16-25) → new summary (16-25)
- Concatenate: summary(5-15) + summary(16-25) = combined(5-25)
- Pros: cost-effective, faster, predictable behavior
- Cons: potential fragmentation, no cross-segment awareness
**Option C: Hybrid Configurable**

- Allow users to choose a strategy based on cost/quality tradeoffs
- Default to client-side concatenation, with AI-driven combination as an option
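Option C could be surfaced as a small config flag dispatching between the two paths; a sketch with illustrative names:

```typescript
type CombineStrategy = 'concat' | 'ai';

// existing:  cached summary (e.g. of messages 5-15)
// extension: freshly generated summary of the new range (e.g. 16-25)
// aiMerge:   model call that merges two summaries; only invoked for 'ai'
async function combineSummaries(
  strategy: CombineStrategy,
  existing: string,
  extension: string,
  aiMerge: (a: string, b: string) => Promise<string>,
): Promise<string> {
  if (strategy === 'concat') {
    return `${existing}\n\n${extension}`; // Option B: cheap, predictable
  }
  return aiMerge(existing, extension); // Option A: one extra model call
}
```

The `'concat'` default keeps the common path free of model calls, while callers who need cross-segment coherence opt into `'ai'`.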
### 2. Cache Invalidation Strategy
- Should cache entries expire?
- How to handle message updates/edits?
- Thread-based vs global cache management
### 3. Store Interface Scope
- Keep minimal (get/set/delete) or add advanced features (batch operations, TTL)?
- Async vs sync interface design?
### 4. Thread ID Management
- Who provides the thread ID? User application or library?
- Default behavior when no thread ID provided?
- Should we auto-generate thread IDs for backward compatibility?
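One possible answer to the auto-generation question is a simple fallback helper (`ensureThreadId` is hypothetical, not an existing library function):

```typescript
import { randomUUID } from 'node:crypto';

// Use the caller-provided thread ID when present; otherwise generate one.
// Auto-generated IDs are unique per call, so legacy message-array callers
// would get caching within one conversation object but not across sessions.
function ensureThreadId(threadId?: string): string {
  return threadId ?? `auto_${randomUUID()}`;
}
```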
### 5. Conversation Wrapper Benefits

- Eliminates repetition: no `threadId` duplication across messages
- Cleaner API: separates conversation context from individual message data
- Better performance: reduces memory usage and serialization overhead
- Extensible: easy to add conversation-level metadata without touching messages
## Acceptance Criteria

*Definition of done*

- [ ] `SlimContextStore` interface defined in `src/interfaces.ts`
- [ ] `SlimContextConversation` wrapper interface implemented
- [ ] Format conversion utilities (`wrapMessages`/`unwrapMessages`)
- [ ] `SummarizeCompressor` updated to use the store for caching
- [ ] `InMemoryStore` reference implementation
- [ ] Backward compatibility preserved (existing `compress(messages[])` unchanged)
## Additional Context

*Supporting information*
**Design principles alignment:**

- Model-agnostic: the store interface doesn't depend on any specific storage technology
- Framework-independent: works with any storage backend (Redis, database, filesystem, memory)
- BYOM/BYOS pattern: users provide their own store implementation, just as they provide their own model
- Zero runtime dependencies: the core library remains dependency-free
**Potential store implementations users might provide:**
- Redis for distributed caching
- Database tables for persistence
- File system for local caching
- Cloud storage (S3, etc.) for serverless environments
**Performance impact:**
- Should significantly reduce AI model calls for repeated conversation compression
- Cache hits avoid expensive summarization operations
- Memory usage increases with cached summaries (acceptable tradeoff)