[FEATURE] Conversation compaction to manage context limits #48

@jwesleye

Description

Feature Description

Automatically compact or summarize older parts of long conversations to stay within context window limits while preserving important information.

Problem/Motivation

AWS Strands agents have context window limits (e.g., 200K tokens for Claude Sonnet 4.5). During long sessions:

  • Conversations can exceed the context limit
  • Users may lose access to older parts of the conversation
  • No warning when approaching limits
  • No automatic management of conversation length

Proposed Solution

Core Features

  1. Context monitoring - Track current conversation token usage
  2. Limit warnings - Alert users when approaching context limits (80%, 90%, 95%)
  3. Manual compaction - a compact command to summarize older messages on demand
  4. Auto-compaction - Optional automatic compaction when reaching threshold
  5. Smart summarization - Preserve key information while reducing token count
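Features 1 and 2 could be combined into a single monitor object. The sketch below is a minimal illustration of that idea; the class name, method names, and message format are all hypothetical, not an existing API.

```python
# Hypothetical sketch: track conversation token usage and fire each
# configured warning threshold once as usage crosses it.

class ContextMonitor:
    def __init__(self, max_tokens=200_000, thresholds=(0.8, 0.9, 0.95)):
        self.max_tokens = max_tokens
        self.thresholds = sorted(thresholds)
        self.used = 0
        self._warned = set()  # thresholds already reported, to avoid repeats

    def add(self, tokens):
        """Record tokens for a new message; return any newly crossed warnings."""
        self.used += tokens
        ratio = self.used / self.max_tokens
        warnings = []
        for t in self.thresholds:
            if ratio >= t and t not in self._warned:
                self._warned.add(t)
                warnings.append(
                    f"Context usage: {self.used // 1000}K / "
                    f"{self.max_tokens // 1000}K tokens ({ratio:.0%})"
                )
        return warnings
```

Firing each threshold only once avoids spamming the user with the same warning on every message.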

Implementation Options

Option A: Rolling Window

  • Keep last N messages in full detail
  • Summarize or remove older messages
  • Simple, predictable behavior
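Option A can be expressed in a few lines. This is a minimal sketch assuming messages are oldest-first role/content dicts; the function name and placeholder format are illustrative only.

```python
# Hypothetical sketch of the rolling-window strategy: keep the last N
# messages and replace everything older with a single placeholder stub.

def rolling_window(messages, keep_last=30):
    """Trim `messages` (oldest first) to the last `keep_last`, prefixing
    a stub that records how many older messages were dropped."""
    if len(messages) <= keep_last:
        return list(messages)
    dropped = len(messages) - keep_last
    stub = {"role": "system",
            "content": f"[{dropped} older messages removed by compaction]"}
    return [stub] + list(messages[-keep_last:])
```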

Option B: Intelligent Summarization

  • Use the agent itself to summarize old conversation chunks
  • Preserve important context (code, decisions, key facts)
  • More sophisticated but higher quality

Option C: Hybrid

  • Keep recent messages (last 20-30) in full
  • Summarize middle sections in chunks
  • Drop very old messages beyond threshold
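The hybrid option combines all three behaviors. The sketch below shows one possible shape, with the agent-driven summarization abstracted behind a caller-supplied callback; every name here is an assumption, not the actual design.

```python
# Hypothetical sketch of Option C (hybrid): keep the newest messages in
# full, summarize the middle chunk, drop anything beyond a hard limit.

def hybrid_compact(messages, summarize, keep_recent=30, drop_beyond=200):
    """Compact `messages` (oldest first) into [summary] + recent tail.

    `summarize` is a callback (e.g. a call to the agent itself) that maps
    a list of messages to a summary string.
    """
    recent = messages[-keep_recent:]
    # Messages between the drop threshold and the recent window get summarized;
    # anything older than `drop_beyond` messages is discarded outright.
    middle = messages[-drop_beyond:-keep_recent]
    if not middle:
        return list(recent)
    summary = {"role": "system", "content": summarize(middle)}
    return [summary] + list(recent)
```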

User Experience

# During conversation - warning appears
⚠️  Context usage: 180K / 200K tokens (90%)
Consider using 'compact' command to summarize older messages.

# Manual compaction
You: compact
Compacting conversation (keeping last 30 messages, summarizing 50 older messages)...
✓ Reduced from 180K to 95K tokens (47% reduction)
✓ Preserved 30 recent messages + summary of earlier conversation

# View context status
You: context
Current usage: 95K / 200K tokens (47%)
Messages: 80 total (30 full + 1 summary block)
Oldest message: 2 hours ago

Configuration

# In ~/.chatrc
context:
  max_tokens: 200000          # Model's context limit
  warning_thresholds: [0.8, 0.9, 0.95]  # Show warnings at 80%, 90%, 95%
  auto_compact: false         # Enable automatic compaction
  auto_compact_threshold: 0.85  # Compact at 85% if auto enabled
  preserve_recent: 30         # Always keep last N messages in full
  compaction_method: hybrid   # rolling | summarize | hybrid
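Once the `context:` section of ~/.chatrc is parsed, the settings above might be modeled like this. The dataclass, field defaults, and validation are assumptions mirroring the keys shown; the YAML parsing itself is omitted.

```python
# Hypothetical model for the proposed context settings, with defaults
# matching the sample config above and basic validation.

from dataclasses import dataclass, field

@dataclass
class ContextConfig:
    max_tokens: int = 200_000
    warning_thresholds: list = field(default_factory=lambda: [0.8, 0.9, 0.95])
    auto_compact: bool = False
    auto_compact_threshold: float = 0.85
    preserve_recent: int = 30
    compaction_method: str = "hybrid"  # rolling | summarize | hybrid

    def __post_init__(self):
        if self.compaction_method not in ("rolling", "summarize", "hybrid"):
            raise ValueError(f"unknown compaction_method: {self.compaction_method}")

def load_context_config(raw: dict) -> ContextConfig:
    """Build a config from the parsed `context:` mapping, keeping defaults
    for any key the user did not set and ignoring unknown keys."""
    known = {k: v for k, v in raw.items()
             if k in ContextConfig.__dataclass_fields__}
    return ContextConfig(**known)
```

Ignoring unknown keys (rather than erroring) keeps older configs working as new options are added.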

Benefits

  • ✅ Never hit context limits unexpectedly
  • ✅ Maintain usable conversation history
  • ✅ Clear visibility into context usage
  • ✅ User control over compaction strategy
  • ✅ Preserve important information

Related Commands

  • context - Show current context usage and statistics
  • compact - Manually trigger compaction
  • compact --preview - Preview what would be compacted
  • compact --method=<rolling|summarize> - Choose compaction strategy
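The flags above could be parsed inside the chat REPL with a small argparse wrapper. This is a sketch under the assumption that commands arrive as a single input line; the function name is hypothetical.

```python
# Hypothetical sketch: parse the in-chat `compact` command and its flags
# using argparse on a tokenized input line.

import argparse
import shlex

def parse_compact(line: str) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="compact", add_help=False)
    parser.add_argument("--preview", action="store_true",
                        help="show what would be compacted without doing it")
    parser.add_argument("--method", choices=["rolling", "summarize"],
                        default=None, help="override the configured strategy")
    # Drop the leading 'compact' word; parse only the flags.
    return parser.parse_args(shlex.split(line)[1:])
```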

Technical Considerations

Token Counting:

  • Need accurate token counting for current conversation
  • Use tiktoken for GPT models; for Claude, tiktoken counts are only approximate, so prefer Anthropic's token counting where available
  • Track running total as conversation progresses
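A counter with graceful degradation (per the backward-compatibility note below) might look like this. Note that tiktoken is OpenAI's tokenizer, so its counts are only an approximation for Claude; the chars/4 heuristic is a common rough fallback, and this whole function is a sketch, not the project's implementation.

```python
# Hypothetical sketch: count tokens with tiktoken when installed,
# falling back to a rough characters-per-token heuristic otherwise.

def count_tokens(text: str) -> int:
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        # Heuristic fallback: English text averages ~4 characters per token.
        return max(1, len(text) // 4) if text else 0
```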

Summarization Quality:

  • Test different summarization prompts
  • Ensure key information preserved (code, decisions, facts)
  • Include metadata (timestamp, message count) in summaries
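The metadata point could be handled by wrapping the agent-produced summary text in a standard header. The message shape and function name below are guesses at a typical design, not the actual one.

```python
# Hypothetical sketch: wrap a summary with the metadata the issue asks
# for (message count, timestamp range) before inserting it into history.

from datetime import datetime

def build_summary_message(summary_text: str, num_messages: int,
                          oldest_ts: datetime, newest_ts: datetime) -> dict:
    header = (f"[Summary of {num_messages} messages, "
              f"{oldest_ts.isoformat()} to {newest_ts.isoformat()}]")
    return {"role": "system", "content": f"{header}\n{summary_text}"}
```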

Backward Compatibility:

  • Make all features opt-in initially
  • Graceful degradation if token counting unavailable
  • Don't break existing sessions

Edge Cases:

  • Very first message after compaction (context reset)
  • Session resume after compaction
  • Saving/loading compacted sessions

Priority

  • High - Prevents frustrating context limit errors

Dependencies

  • Token counting library (tiktoken or the Anthropic tokenizer)
  • Optional: summarization prompt engineering
  • Configuration system (already exists)

Testing Plan

  1. Test with conversations of various lengths
  2. Verify token counting accuracy
  3. Test summarization quality
  4. Ensure session save/resume works with compaction
  5. Performance test with very long conversations
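Item 1 of the plan lends itself to a parametrized bounds check. The sketch below uses an inline stand-in compactor (the real implementation would be imported instead); names and the keep-30 budget are assumptions taken from the examples above.

```python
# Hypothetical pytest-style check: compaction output never exceeds the
# recent-message budget plus one summary block, at any conversation length.

def compact(messages, keep_recent=30):
    # Stand-in for the real compactor: summary stub + recent tail.
    if len(messages) <= keep_recent:
        return list(messages)
    return [{"role": "system", "content": "summary"}] + messages[-keep_recent:]

def test_compaction_bounds():
    for n in (0, 1, 29, 30, 31, 100, 10_000):
        msgs = [{"role": "user", "content": str(i)} for i in range(n)]
        out = compact(msgs)
        assert len(out) <= 31              # keep_recent + 1 summary block
        if n:                              # the newest message always survives
            assert out[-1]["content"] == str(n - 1)
```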

Future Enhancements

  • Semantic chunking (group related messages)
  • Important message pinning (never compact)
  • Compaction history tracking
  • Export compacted conversations with summaries

Metadata

  • Labels: enhancement (New feature or request)
  • Assignees: none
  • Milestone: none