Compaction keeps conversations within the model's context window by summarizing older messages. This document is the reference spec for the full subsystem.
```mermaid
graph TD
    Agent["Agent<br/><small>maybe_auto_compact</small>"]
    Agent --> Estimate["estimate_tokens<br/><small>hybrid: actual + heuristic</small>"]
    Agent --> Compact["Opal.Session.Compaction.compact/2"]
    Compact --> Cut["find_cut_point<br/><small>walk backwards, cut at turn boundary</small>"]
    Compact --> Split["detect_split_turn<br/><small>clean or mid-turn cut</small>"]
    Compact --> Summary["build_summary<br/><small>LLM summarize or truncate fallback</small>"]
    Compact --> Replace["Session.replace_path_segment<br/><small>swap old messages for summary</small>"]
    Agent --> Broadcast["broadcast compaction_start/end<br/><small>before/after message counts</small>"]
    Overflow["Overflow recovery<br/><small>separate path</small>"]
    Overflow --> Reject["Provider rejects request"]
    Reject --> Detect["Overflow.context_overflow?/1"]
    Detect --> Emergency["emergency compact → retry turn"]
```
Key files:
- `lib/opal/session/compaction.ex` — cut point, summarization, file-op tracking
- `lib/opal/agent/usage_tracker.ex` — auto-compact trigger, hybrid estimation
- `lib/opal/agent/overflow.ex` — overflow pattern matching and emergency compaction
- `lib/opal/agent/token.ex` — token estimation heuristics
Each API call sends the entire conversation. The provider returns `prompt_tokens` (how many tokens it read) and `completion_tokens` (how many it generated).
| Metric | Definition | Compaction-relevant? |
|---|---|---|
| `currentContextTokens` | `prompt_tokens` from the last API call | Yes — actual context size |
| `totalTokens` | Cumulative prompt + completion across all turns | No — billing metric |
`totalTokens` double-counts because each turn re-reads the full history:

```
Turn 1: sends [sys, user1]                             → prompt=10k
Turn 2: sends [sys, user1, asst1, user2]               → prompt=14k
Turn 3: sends [sys, user1, asst1, user2, asst2, user3] → prompt=19k

totalTokens          = (10+2) + (14+3) + (19+1) = 49k  ← cumulative
currentContextTokens = 19k                              ← actual context size
```

The CLI compaction bar uses `currentContextTokens / contextWindow`.
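As a sanity check, the arithmetic above can be reproduced in a few lines (Python used purely for illustration; the actual tracking lives in the Elixir usage tracker):

```python
# Illustrative sketch: totalTokens accumulates across calls, while
# currentContextTokens only reflects the most recent prompt size.
turns = [  # (prompt_tokens, completion_tokens) per API call
    (10_000, 2_000),
    (14_000, 3_000),
    (19_000, 1_000),
]

total_tokens = sum(p + c for p, c in turns)  # billing metric
current_context_tokens = turns[-1][0]        # last prompt size

print(total_tokens)            # 49000
print(current_context_tokens)  # 19000
```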
Trigger: start of each turn, when estimated context ≥ 80% of context window.
Estimation (`estimate_tokens`) uses a hybrid approach:

- If `last_prompt_tokens > 0`: use it as a calibrated base and add a heuristic (~4 chars/token) for messages added since that usage report.
- If no usage data yet (first turn): full heuristic via `Token.estimate_context`.
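The hybrid estimation can be sketched as follows (an illustrative Python sketch, not the actual `estimate_tokens` implementation; the message shape and helper names are assumptions):

```python
CHARS_PER_TOKEN = 4  # rough heuristic used when no usage data is available

def estimate_chars(messages):
    """Crude character count over a list of message dicts."""
    return sum(len(m.get("content", "")) for m in messages)

def estimate_tokens(messages, last_prompt_tokens=0, last_usage_msg_index=0):
    if last_prompt_tokens > 0:
        # Calibrated base from the provider's last usage report, plus a
        # heuristic for messages appended since that report.
        new_messages = messages[last_usage_msg_index:]
        return last_prompt_tokens + estimate_chars(new_messages) // CHARS_PER_TOKEN
    # First turn: pure heuristic over the whole context.
    return estimate_chars(messages) // CHARS_PER_TOKEN
```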
Compaction flow:
- Broadcast `compaction_start`.
- Call `Compaction.compact/2` with `keep_recent_tokens = contextWindow / 4`.
- Broadcast `compaction_end` with before/after message counts.
- Reset `last_prompt_tokens` and `last_usage_msg_index` to 0.
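The trigger condition and budget amount to something like this sketch (Python for illustration; `compact` is a callback standing in for `Compaction.compact/2`, and its shape is an assumption):

```python
def maybe_auto_compact(estimated_tokens, context_window, compact):
    """Fire compaction when the estimated context reaches 80% of the
    context window, passing a keep_recent_tokens budget of window / 4."""
    if estimated_tokens >= 0.8 * context_window:
        compact(context_window // 4)
        return True
    return False
```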
Walk backwards from the newest message, accumulating estimated characters (4 chars ≈ 1 token). When the accumulated total exceeds `keep_recent_tokens`, find the nearest user message boundary at or after that point. If a clean boundary is unavailable, split-turn handling (below) preserves continuity.

Default `keep_recent_tokens`: 20,000. Auto-compaction passes `contextWindow / 4`. If no valid cut point exists and `force: true`, compact all but the last 2 messages.
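The backward walk can be sketched like so (an illustrative Python sketch under the 4-chars-per-token heuristic; behavior exactly at the budget boundary is an assumption):

```python
CHARS_PER_TOKEN = 4

def find_cut_point(messages, keep_recent_tokens):
    """Walk backwards accumulating ~4 chars/token. Once the budget is
    exceeded, return the index of the nearest user-message boundary at
    or after that point (the kept tail starts there). Returns None when
    everything fits or no user boundary exists in the kept region."""
    budget_chars = keep_recent_tokens * CHARS_PER_TOKEN
    accumulated = 0
    boundary = None
    for i in range(len(messages) - 1, -1, -1):
        accumulated += len(messages[i]["content"])
        if messages[i]["role"] == "user":
            boundary = i  # most recent (oldest-so-far) user boundary seen
        if accumulated > budget_chars:
            return boundary
    return None  # everything fits; nothing to compact
```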
If the cut point lands inside a multi-message turn (e.g. a user message followed by 30 assistant+tool messages), the kept portion doesn't start with a user message. When detected and the turn prefix is ≥ 5 messages, dual summaries are generated:
- History summary — everything before the split turn.
- Turn-prefix summary — the compacted portion of the in-progress turn, with instructions to focus on what was attempted and intermediate results.
Prefixes < 5 messages are treated as clean cuts (not worth a separate summary).
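A minimal sketch of the split-turn check (Python for illustration; the return shape is an assumption, not the actual `detect_split_turn` API):

```python
MIN_PREFIX = 5  # prefixes shorter than this are treated as clean cuts

def detect_split_turn(messages, cut_index):
    """If the kept region doesn't start with a user message, the cut
    landed mid-turn. Returns (history, turn_prefix) when the prefix is
    large enough to deserve its own summary, else None (clean cut)."""
    if messages[cut_index]["role"] == "user":
        return None  # clean cut at a turn boundary
    # Walk back to the user message that opened this turn.
    start = cut_index
    while start > 0 and messages[start]["role"] != "user":
        start -= 1
    prefix = messages[start:cut_index]  # compacted portion of the turn
    if len(prefix) < MIN_PREFIX:
        return None  # too small: treat as a clean cut
    return messages[:start], prefix
```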
Strategies:
- `:summarize` (default when a provider is available) — LLM-powered. Asks the model for a structured summary with: Goal, Constraints, Progress (Done / In Progress), Key Decisions, Next Steps, Critical Context, plus `<read-files>` and `<modified-files>` sections.
- `:truncate` (fallback) — No LLM. Produces `[Compacted N messages: ...]` with role frequencies and file-op tags.
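The `:truncate` fallback's output can be approximated with a short sketch (the exact string format and tag layout here are assumptions, not the real implementation):

```python
from collections import Counter

def truncate_summary(messages, read_files=(), modified_files=()):
    """No-LLM fallback: role frequencies plus file-op tags."""
    roles = Counter(m["role"] for m in messages)
    counts = ", ".join(f"{n} {role}" for role, n in sorted(roles.items()))
    tags = ""
    if read_files:
        tags += f" <read-files>{', '.join(read_files)}</read-files>"
    if modified_files:
        tags += f" <modified-files>{', '.join(modified_files)}</modified-files>"
    return f"[Compacted {len(messages)} messages: {counts}]{tags}"
```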
Anti-continuation measures:
- Transcript wrapped in `<conversation>` tags (models treat it as data, not dialogue).
- System prompt restricts the model to a summarizer role only.
- User prompt includes explicit "do NOT continue the conversation" rules.
Iterative updates: When the first message being compacted is itself a previous compaction summary (detected by the `[Conversation summary` prefix), the LLM is given the `@update_summary_prompt`, which asks it to merge new information rather than re-summarize from scratch. This prevents progressive information loss across multiple compaction cycles.
The old messages are replaced with a single summary message (role: `:user`) via `Session.replace_path_segment`. The summary message carries structured metadata:

```elixir
%{type: :compaction_summary, read_files: [...], modified_files: [...]}
```

File operations (read/write/edit tool calls) are tracked across compaction cycles. Each cycle:
- Extracts file ops from the previous summary's metadata.
- Extracts file ops from tool calls in the current batch.
- Merges and deduplicates — files that were read then later modified are promoted to modified-only.
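The merge-and-promote rule reduces to simple set algebra (an illustrative sketch; the real implementation works on the metadata maps described above):

```python
def merge_file_ops(prev_read, prev_modified, new_read, new_modified):
    """Merge file ops across compaction cycles. A file that was read and
    later modified appears only in the modified set (promotion)."""
    modified = set(prev_modified) | set(new_modified)
    read = (set(prev_read) | set(new_read)) - modified  # promote to modified-only
    return sorted(read), sorted(modified)
```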
When the provider rejects a request due to context overflow:
- `Overflow.context_overflow?/1` pattern-matches against 17 known error strings from OpenAI, Anthropic, Google, and generic proxies.
- If matched, emergency compaction fires with `keep_recent_tokens = contextWindow / 5` and `force: true`.
- The turn is retried immediately. This is not counted as a retry attempt.
A second detection path (`Overflow.usage_overflow?/2`) checks whether reported `input_tokens` exceed the context window (silent truncation). This flags `overflow_detected` so `finalize_response` can compact before the next turn. If no session process exists, overflow is surfaced as an error (can't compact without a session).
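Both detection paths can be sketched together (Python for illustration; the marker strings below are a hypothetical subset, not the actual 17 patterns):

```python
# Hypothetical subset of overflow markers; the real module matches
# 17 known provider error strings.
OVERFLOW_MARKERS = [
    "context length exceeded",
    "maximum context length",
    "input is too long",
]

def context_overflow(error_message):
    """Substring match against known overflow error strings."""
    msg = error_message.lower()
    return any(marker in msg for marker in OVERFLOW_MARKERS)

def usage_overflow(input_tokens, context_window):
    """Second detection path: the provider silently truncated the input."""
    return input_tokens > context_window
```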
Looked up from LLMDB at agent initialization via `Opal.Provider.Registry.context_window/1`. Falls back to 128k if the model isn't found. For Copilot models, it queries the `github_copilot` LLMDB provider; for direct providers, it queries the upstream provider (e.g., `:anthropic`).

The `context_window` value is included in every `usage_update` and `agent_end` event, and is available via the `agent/state` RPC from the moment the session starts.
Users can trigger compaction via `/compact` in the CLI. The RPC handler (`session/compact`) accepts `session_id` (required) and `keep_recent` (an optional token budget override). After compacting, it syncs the agent's message list with the compacted session path.