ranxianglei/opencode-acp

Active Context Pruning for OpenCode
The model decides when and what to compress — not a hard limit.



opencode plugin opencode-acp@latest --global


Why ACP

ACP hands all context-management authority to the model itself, instead of relying on external models or a complex external mechanism. It is, to date, the best context-management implementation on the market.

This brings two concrete effects:

  • It saves about two-thirds of tokens. A model with a 1,000,000-token context window effectively runs in the 200,000–300,000 token range.
  • It supports ultra-long sessions without losing key content: 500M-token-level cumulative context and 100,000 messages per session.

Proven at scale

Real engineering context, in practice.

Supports 500M-token-level cumulative context, with p95 context usage around 30% of the window and an average prompt-cache hit ratio above 85%. (That figure is an average, not a per-session value. Impact on Prompt Caching explains it and shows why it saves far more tokens than traditional compression.)

| Metric | Session 1 | Session 2 |
| --- | --- | --- |
| Messages | 3,024 | 2,028 |
| Total tokens processed | 582 M | 463 M |
| Prompt-cache hit ratio | 86.2% | 89.0% |
| Context p50 (median) | 1.2 K (<1%) | 1.8 K (<1%) |
| Context p75 | 2.8 K | 3.5 K |
| Context p90 | 108 K (11%) | 58 K (6%) |
| Context p95 | 251 K (25%) | 335 K (34%) |
| Context p99 | 425 K (43%) | 442 K (44%) |
| Peak | 488 K (49%) | 769 K (77%) |

(Context percentages are of the 1M window.)
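
The README does not say how these percentiles were computed. As a rough sketch, assuming a nearest-rank percentile over per-request context sizes (the function names here are hypothetical, not part of ACP), the figures in the table could be derived like this:

```typescript
// Hypothetical helper: nearest-rank percentile over per-request context sizes.
// The percentile method is an assumption; the README does not specify one.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Share of the context window, rounded to a whole percent.
function percentOfWindow(tokens: number, windowTokens: number): number {
  return Math.round((tokens * 100) / windowTokens);
}

const WINDOW = 1_000_000; // the 1M window used in the table

// p95 context sizes taken from the table above.
const p95Session1 = percentOfWindow(251_000, WINDOW); // 25
const p95Session2 = percentOfWindow(335_000, WINDOW); // 34
```

The percentages in parentheses in the table are consistent with this rounding, for example 251 K of 1M gives 25% and 335 K gives 34%.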


Installation

opencode plugin opencode-acp@latest --global

Or add to your opencode config:

{
  "plugin": {
    "opencode-acp": "latest"
  }
}

How It Works

ACP hands the context-compression tool directly to the model. The model is 100% responsible for context compression. The model's available tools are mainly: compress, decompress, and delete (mark_block / unmark_block).

Lifecycle

Three operations: compress, decompress, and delete. Content loops between raw and compressed, and eventually terminates in deletion:

stateDiagram-v2
    Raw --> Compressed : compress
    Compressed --> Raw : decompress
    Compressed --> Deleted : delete
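
The same lifecycle can be written as a small transition table. This is only an illustrative sketch: the type and function names are hypothetical and do not come from ACP's source.

```typescript
type BlockState = "raw" | "compressed" | "deleted";
type Operation = "compress" | "decompress" | "delete";

// Allowed transitions, mirroring the diagram. "deleted" is terminal.
const TRANSITIONS: Record<BlockState, Partial<Record<Operation, BlockState>>> = {
  raw: { compress: "compressed" },
  compressed: { decompress: "raw", delete: "deleted" },
  deleted: {},
};

// Returns the next state, or throws if the diagram has no such transition.
function applyOperation(state: BlockState, op: Operation): BlockState {
  const next = TRANSITIONS[state][op];
  if (next === undefined) {
    throw new Error(`cannot ${op} content in state ${state}`);
  }
  return next;
}

// Content can loop between raw and compressed before it is finally deleted.
let state: BlockState = "raw";
state = applyOperation(state, "compress");
state = applyOperation(state, "decompress");
state = applyOperation(state, "compress");
state = applyOperation(state, "delete");
```

In this sketch, as in the diagram, raw content cannot be deleted directly; it has to be compressed first.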

Compression strategy

The system injects a prompt telling the model the current context ratio, the compression ratio, whether context is idle, and compression suggestions. When the trigger ratio is hit, content is compressed in priority order:

  1. Agent/subagent review & consultation results (largest block of uncompressed content)
  2. Verbose command output (build/test runs, git diff/log/status, directory listings)
  3. Exploration that led nowhere (failed approaches, dead-end searches)
  4. Redundant tool results (reading the same file repeatedly, repeated status checks)
  5. Intermediate steps of completed multi-step tasks
  6. Resolved discussion threads (once a decision is recorded)
  7. Large file contents already used

After compression, the original content is replaced by a short block that references the original (recoverable via decompress).
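
In ACP the model applies this priority order itself, guided by the injected prompt. Purely to make the ordering explicit, here is a sketch with hypothetical category names and a hypothetical Candidate type; the tie-break by size is an assumption, not something the README states.

```typescript
// Categories in the priority order listed above (index 0 is compressed first).
// The identifiers are hypothetical labels for the seven list items.
const COMPRESSION_PRIORITY = [
  "agent-result",
  "verbose-command-output",
  "dead-end-exploration",
  "redundant-tool-result",
  "completed-intermediate-step",
  "resolved-discussion",
  "used-file-content",
] as const;

type Category = (typeof COMPRESSION_PRIORITY)[number];

interface Candidate {
  id: string;
  category: Category;
  tokens: number;
}

// Returns a new array sorted by category priority.
function orderForCompression(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => {
    const byPriority =
      COMPRESSION_PRIORITY.indexOf(a.category) - COMPRESSION_PRIORITY.indexOf(b.category);
    // Assumption: within one category, larger content goes first.
    return byPriority !== 0 ? byPriority : b.tokens - a.tokens;
  });
}

const ordered = orderForCompression([
  { id: "file-read", category: "used-file-content", tokens: 40_000 },
  { id: "test-run", category: "verbose-command-output", tokens: 12_000 },
  { id: "subagent-review", category: "agent-result", tokens: 8_000 },
  { id: "build-log", category: "verbose-command-output", tokens: 30_000 },
]);
// Order of ids: subagent-review, build-log, test-run, file-read
```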

Decompression strategy

The model decides when to decompress. When the context grows large enough to interfere with the model's self-attention, the short replacement blocks let the model compress some content first, handle the urgent matter, and then decompress whatever it needs for later work.

Deletion strategy

To handle the accumulation of many small historical blocks, the new version adds a deletion strategy. The model decides whether to delete. Once deleted, content is irrecoverable. This replaces the original forced GC, so that forced garbage collection no longer deletes things the model considers important.


Impact on Prompt Caching

Historically, ACP has fixed many of the low cache-hit-rate problems caused by DCP. The overall cache hit rate is now about 87%.

Traditional compression triggers only at 80–90% context usage and, once it runs, forces 100% of the context to be re-processed without cache hits. Compared to that, ACP's effective hit rate is higher.

Additionally, ACP keeps total context at about 30% most of the time, versus the traditional 50–80%, so total token savings are far higher than with traditional compression.

Conclusion: ACP simultaneously raises the overall cache hit rate and ensures key context information is not lost.
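
To put the hit ratios in token terms, here is a small worked calculation using the two sessions from the Proven at scale table. It assumes the prompt-cache hit ratio applies to the total tokens processed, which the README does not state explicitly; the helper name is hypothetical.

```typescript
// Split processed tokens into cache hits and fresh (uncached) tokens.
// Assumption: hitRatio is the fraction of total processed tokens served from cache.
function splitByCacheHit(
  totalTokens: number,
  hitRatio: number,
): { cached: number; fresh: number } {
  const cached = Math.round(totalTokens * hitRatio);
  return { cached, fresh: totalTokens - cached };
}

// Session 1: 582 M tokens processed at an 86.2% hit ratio.
const session1 = splitByCacheHit(582_000_000, 0.862);
// Session 2: 463 M tokens processed at an 89.0% hit ratio.
const session2 = splitByCacheHit(463_000_000, 0.89);
```

Under that assumption, roughly 80 M of the 582 M tokens in session 1 and roughly 51 M of the 463 M tokens in session 2 would have been processed without a cache hit.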


Commands

ACP provides an /acp slash command (also accepts /dcp for backward compatibility):

| Command | Description |
| --- | --- |
| /acp | Shows available ACP commands |
| /acp context | Token usage breakdown by category (system, user, assistant, tools, etc.) and how much has been saved through pruning |
| /acp stats | Cumulative pruning statistics across all sessions |
| /acp sweep [n] | Prunes all tools since the last user message. Optional count: /acp sweep 10 prunes the last 10 tools. Respects commands.protectedTools |
| /acp manual [on\|off] | Toggle manual mode. When on, the AI will not autonomously use context management tools |
| /acp compress [focus] | Trigger a single compress tool execution. Optional focus text directs what content to compress, following the active compress.mode |
| /acp decompress <n> | Restore a specific active compression by ID. Running without an argument shows available compression IDs, token sizes, and topics |
| /acp recompress <n> | Re-apply a user-decompressed compression by ID. Running without an argument shows recompressible IDs, token sizes, and topics |

Configuration

ACP uses its own config file, searched in order:

  1. Global: ~/.config/opencode/acp.jsonc (or acp.json), created automatically on first run
  2. Custom config directory: $OPENCODE_CONFIG_DIR/acp.jsonc (or acp.json), if OPENCODE_CONFIG_DIR is set
  3. Project: .opencode/acp.jsonc (or acp.json) in your project's .opencode directory

If no acp.jsonc is found, ACP falls back to dcp.jsonc / dcp.json (for backward compatibility with existing DCP installations) and auto-migrates on first write.

Each level overrides the previous, so project settings take priority over global. Restart OpenCode after making config changes.
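
The layering can be pictured as a merge in which later levels win. The sketch below assumes a deep, key-by-key merge with arrays replaced whole; the README only says that each level overrides the previous one, so the exact merge semantics and all names here are assumptions.

```typescript
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Later layers win. Nested objects are merged key by key, arrays are replaced whole.
function mergeConfig(base: ConfigObject, override: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = result[key];
    result[key] =
      isObject(existing) && isObject(value) ? mergeConfig(existing, value) : value;
  }
  return result;
}

// Hypothetical contents of the global and project config files.
const globalConfig: ConfigObject = {
  debug: false,
  compress: { maxContextLimit: "55%", minContextLimit: "45%" },
};
const projectConfig: ConfigObject = {
  compress: { maxContextLimit: "40%" },
};

// Project settings take priority over global ones.
const effectiveConfig = mergeConfig(globalConfig, projectConfig);
```

With these inputs the project-level maxContextLimit of "40%" replaces the global "55%", while minContextLimit and debug keep their global values.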

Important

Disable OpenCode's built-in auto-compaction. ACP handles context management itself — OpenCode's compaction conflicts with ACP and can cause issues (re-expanded messages, lost compression state). Add to your opencode.json:

{
  "compaction": {
    "auto": false
  }
}

Or set the environment variable: OPENCODE_DISABLE_AUTOCOMPACT=1

Note

If you use models with smaller context windows, such as GitHub Copilot models or local models, lower compress.minContextLimit and compress.maxContextLimit in your configuration to match the available context.
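
To see what these limits mean in absolute tokens, the sketch below resolves a limit given as a number or as "X%" of the model context window, then classifies current usage against the two thresholds. The function names, the rounding, and the 128,000-token window are illustrative assumptions, not ACP's actual implementation.

```typescript
type Limit = number | string; // absolute token count, or "X%" of the context window

// Resolve a limit to an absolute token count for a given model context window.
function resolveLimit(limit: Limit, contextWindow: number): number {
  if (typeof limit === "number") return limit;
  if (!limit.endsWith("%")) throw new Error(`invalid limit: ${limit}`);
  const percent = Number(limit.slice(0, -1));
  if (!Number.isFinite(percent)) throw new Error(`invalid limit: ${limit}`);
  return Math.floor((contextWindow * percent) / 100);
}

type Zone = "below-min" | "between" | "above-max";

// Below min: reminders off. At/above min: reminders on. Above max: strong nudges.
function classifyUsage(
  usedTokens: number,
  contextWindow: number,
  min: Limit,
  max: Limit,
): Zone {
  if (usedTokens > resolveLimit(max, contextWindow)) return "above-max";
  if (usedTokens >= resolveLimit(min, contextWindow)) return "between";
  return "below-min";
}

// Default limits ("45%" / "55%") on a hypothetical 128,000-token model.
const smallWindow = 128_000;
const minTokens = resolveLimit("45%", smallWindow); // 57600
const maxTokens = resolveLimit("55%", smallWindow); // 70400
const zone = classifyUsage(60_000, smallWindow, "45%", "55%"); // "between"
```

An absolute limit such as 120000 stays fixed regardless of the model, which is why it may need lowering by hand when the context window is smaller.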

Default Configuration
{
    "$schema": "https://raw.githubusercontent.com/ranxianglei/opencode-acp/master/dcp.schema.json",
    // Enable or disable the plugin
    "enabled": true,
    // Automatically update npm-installed ACP when a newer npm latest is available.
    // Version-locked plugin specs are not updated.
    "autoUpdate": true,
    // Enable debug logging to ~/.config/opencode/logs/acp/
    "debug": false,
    // Notification display: "off", "minimal", or "detailed"
    "pruneNotification": "detailed",
    // Notification type: "chat" (in-conversation) or "toast" (system toast)
    "pruneNotificationType": "chat",
    // Slash commands configuration
    "commands": {
        "enabled": true,
        // Additional tools to protect from pruning via commands (e.g., /acp sweep)
        "protectedTools": [],
    },
    // Manual mode: disables autonomous context management,
    // tools only run when explicitly triggered via /acp commands
    "manualMode": {
        "enabled": false,
        // When true, automatic cleanup (deduplication, purgeErrors)
        // still runs even in manual mode
        "automaticStrategies": true,
    },
    // Protect from pruning for <turns> message turns past tool invocation
    "turnProtection": {
        "enabled": false,
        "turns": 4,
    },
    // Experimental settings
    "experimental": {
        // Allow ACP processing in subagent sessions
        "allowSubAgents": false,
        // Enable user-editable prompt overrides under dcp-prompts directories
        // When false (default), prompt override files/directories are ignored
        "customPrompts": false,
    },
    // Protect file operations from pruning via glob patterns
    // Patterns match tool parameters.filePath (e.g. read/write/edit)
    "protectedFilePatterns": [],
    // Unified context compression tool and behavior settings
    "compress": {
        // Compression mode: "range" (compress spans into block summaries)
        // or experimental "message" (compress individual raw messages)
        "mode": "range",
        // Permission mode: "allow" (no prompt), "ask" (prompt), "deny" (tool not registered)
        "permission": "allow",
        // Show compression content in a chat notification
        "showCompression": true,
        // Let active summary tokens extend the effective maxContextLimit
        "summaryBuffer": true,
        // Soft upper threshold: above this, ACP keeps injecting strong
        // compression nudges (based on nudgeFrequency), so compression is
        // much more likely. Accepts: number or "X%" of model context window.
        "maxContextLimit": "55%",
        // Soft lower threshold for reminder nudges: below this, turn/iteration
        // reminders are off (compression less likely). At/above this, reminders
        // are on. Accepts: number or "X%" of model context window.
        "minContextLimit": "45%",
        // Optional per-model override for maxContextLimit by providerID/modelID.
        // If present, this wins over the global maxContextLimit.
        // Accepts: number or "X%".
        // Example:
        // "modelMaxLimits": {
        //     "openai/gpt-5.3-codex": 120000,
        //     "anthropic/claude-sonnet-4.6": "80%"
        // },
        // Optional per-model override for minContextLimit.
        // If present, this wins over the global minContextLimit.
        // "modelMinLimits": {
        //     "openai/gpt-5.3-codex": 50000,
        //     "anthropic/claude-sonnet-4.6": "25%"
        // },
        // How often the context-limit nudge fires (1 = every fetch, 5 = every 5th)
        "nudgeFrequency": 5,
        // Start adding compression reminders after this many
        // messages have happened since the last user message
        "iterationNudgeThreshold": 15,
        // Controls how likely compression is after user messages
        // ("strong" = more likely, "soft" = less likely)
        "nudgeForce": "soft",
        // Tool names whose completed outputs are appended to the compression
        "protectedTools": [],
        // Preserve text wrapped in <protect>...</protect> when compressed
        "protectTags": false,
        // Preserve your messages during compression.
        // Warning: large copy-pasted prompts will never be compressed away
        "protectUserMessages": false,
    },
    // Automatic pruning strategies
    "strategies": {
        // Remove duplicate tool calls (same tool with same arguments)
        "deduplication": {
            "enabled": true,
            // Additional tools to protect from pruning
            "protectedTools": [],
        },
        // Prune tool inputs for errored tools after X turns
        "purgeErrors": {
            "enabled": true,
            // Number of turns before errored tool inputs are pruned
            "turns": 4,
            // Additional tools to protect from pruning
            "protectedTools": [],
        },
    },
    // Garbage collection and batch cleanup
    "gc": {
        "algorithm": "truncate",
        // young → old generation promotion after this many survivals
        "promotionThreshold": 5,
        // deactivate a block after this many survivals
        "maxBlockAge": 15,
        // truncate old-gen summaries exceeding this length (chars)
        "maxOldGenSummaryLength": 3000,
        // run major GC when context usage exceeds this
        "majorGcThresholdPercent": "100%",
        // Three-tier batch merge-cleanup for blocks flagged via mark_block.
        // Accepts a number or "X%" of the model context window.
        "batchCleanup": {
            // At/above this usage, remind the model about marked blocks
            "lowThreshold": "60%",
            // At/above this usage, auto merge-compress all marked blocks into one
            "highThreshold": "75%",
            // At/above this usage, force-merge all old-gen blocks (before GC)
            "forceThreshold": "90%",
        },
    },
}
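
The three batchCleanup thresholds form an escalating ladder. Below is a minimal sketch of how the defaults above could map context usage to an action, assuming each tier applies at or above its threshold as the comments describe; the action names and function are hypothetical.

```typescript
type CleanupAction = "none" | "remind" | "merge-marked" | "force-merge-old-gen";

interface BatchCleanupThresholds {
  lowPercent: number; // remind the model about marked blocks
  highPercent: number; // auto merge-compress all marked blocks into one
  forcePercent: number; // force-merge all old-gen blocks (before GC)
}

// Defaults from the configuration above: "60%", "75%", "90%".
const DEFAULT_THRESHOLDS: BatchCleanupThresholds = {
  lowPercent: 60,
  highPercent: 75,
  forcePercent: 90,
};

// Pick the highest tier whose threshold the current usage has reached.
function cleanupAction(
  usedTokens: number,
  contextWindow: number,
  thresholds: BatchCleanupThresholds = DEFAULT_THRESHOLDS,
): CleanupAction {
  const usagePercent = (usedTokens * 100) / contextWindow;
  if (usagePercent >= thresholds.forcePercent) return "force-merge-old-gen";
  if (usagePercent >= thresholds.highPercent) return "merge-marked";
  if (usagePercent >= thresholds.lowPercent) return "remind";
  return "none";
}

// On a 1M window: 650K tokens is 65% usage, which only triggers a reminder.
const action = cleanupAction(650_000, 1_000_000); // "remind"
```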

Prompt Overrides

ACP exposes six editable prompts:

  • system
  • compress-range
  • compress-message
  • context-limit-nudge
  • turn-nudge
  • iteration-nudge

This feature is disabled by default. Set experimental.customPrompts to true in your ACP config to activate it.

When enabled, managed defaults are written to ~/.config/opencode/acp-prompts/defaults/ as plain-text prompt files. A single README.md in that directory explains each prompt and how to create overrides.

To customize behavior, add a file with the same name under an overrides directory and edit it as plain text.

To reset an override, delete the matching file from your overrides directory.

Protected Tools

By default, these tools are always protected from pruning: task, skill, todowrite, todoread, compress, decompress, mark_block, unmark_block, batch, plan_enter, plan_exit, write, edit

The protectedTools arrays in commands and strategies add to this default list.

For the compress tool, compress.protectedTools ensures specific tool outputs are appended to the compressed summary. By default it includes task, skill, todowrite, todoread, and decompress.
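
Put differently, the effective protection list is the union of the built-in defaults and whatever the config adds. A minimal sketch, with hypothetical function names and "bash" used only as an example of an extra tool name:

```typescript
// Default protected tools, as listed above.
const DEFAULT_PROTECTED_TOOLS: string[] = [
  "task", "skill", "todowrite", "todoread",
  "compress", "decompress", "mark_block", "unmark_block",
  "batch", "plan_enter", "plan_exit", "write", "edit",
];

// Union of the defaults and the extra names from a protectedTools array.
function effectiveProtectedTools(extra: string[] = []): Set<string> {
  return new Set([...DEFAULT_PROTECTED_TOOLS, ...extra]);
}

function isProtected(toolName: string, extra: string[] = []): boolean {
  return effectiveProtectedTools(extra).has(toolName);
}

const writeProtected = isProtected("write"); // true: part of the defaults
const bashProtected = isProtected("bash"); // false: not in the defaults
const bashProtectedByConfig = isProtected("bash", ["bash"]); // true: added via config
```

Because the config arrays only add to the defaults, an empty protectedTools array leaves the default list in force.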


Migrating from DCP

ACP is a drop-in replacement for DCP. To migrate:

  1. Remove the old DCP plugin from your opencode.json
  2. Install ACP: opencode plugin install opencode-acp@latest --global
  3. Restart OpenCode

What's preserved:

  • Session state (compression blocks, message ID mappings) -- auto-migrated from plugin/dcp/ to ~/.local/share/opencode/storage/plugin/acp/
  • Config file ~/.config/opencode/dcp.jsonc -- ACP auto-migrates to acp.jsonc
  • Prompt overrides in ~/.config/opencode/dcp-prompts/ -- auto-migrates to acp-prompts/

What changes:

  • Storage directory: plugin/dcp/ to plugin/acp/ (auto-migrated on first launch)
  • Log directory: logs/dcp/ to logs/acp/
  • Slash command: /dcp to /acp (both work for backward compatibility)
  • Notification headers: DCP to ACP
  • Context usage label: DCP threshold to ACP threshold

ACP auto-migrates config from dcp.jsonc to acp.jsonc and prompts from dcp-prompts/ to acp-prompts/ on first launch.


Bug Fixes (38 total) -- applied on top of DCP v3.1.11
| # | Severity | Summary |
| --- | --- | --- |
| 1 | CRITICAL | State not persisted across restarts -- messageIds, block deactivation, save errors silently lost |
| 2 | CRITICAL | resetOnCompaction() clears all compression blocks -- undoes all pruning work |
| 3 | CRITICAL | prune silently drops summary -- DATA LOSS when no user message precedes anchor |
| 4 | CRITICAL | getCurrentTokenUsage returns 0 -- prevents nudge from ever triggering |
| 5 | HIGH | loadPruneMessagesState duplicates activeBlockIds + reasoning-strip undefined guard |
| 6 | HIGH | Synthetic summary messages get mNNNN refs but are invisible to boundary lookup |
| 7 | HIGH | State not persisted across restarts -- messageIds, block deactivation, and save errors silently lost |
| 8 | HIGH | isMessageCompacted() inconsistent with compaction summary message handling |
| 9 | HIGH | Compressed block summaries retain stale mNNNN message ID tags -- model copies stale IDs |
| 10 | HIGH | Model uses stale mNNNN IDs from nudges/summaries -- compress fails with "startId not available" |
| 11 | HIGH | Major GC skips legacy blocks without generation field -- oversized blocks never collected |
| 12 | HIGH | Percentage-based thresholds calculated against effective input context instead of full model context window |
| 13 | HIGH | Context window leaks -- compressed messages reappear after /compact |
| 14 | HIGH | Compression notifications write full block summaries to DB -- can reach 150KB+ per notification |
| 15 | HIGH | npm auto-install overwrites fork with upstream package |
| 16 | HIGH | Summary mNNNN refs in compress output -- model copies stale message IDs |
| 17 | HIGH | Synthetic messages not in messageIdToBlockId -- compress fails to find them |
| 18 | HIGH | Compress stops model from responding after compression completes |
| 19 | HIGH | Dynamic block guidance breaks API prefix cache |
| 20 | HIGH | GC never deactivates old blocks -- dead-weight accumulates indefinitely |
| 21 | HIGH | Logger + tokenizer 20-50s per-turn latency (268x slowdown) |
| 22 | HIGH | compress throws hard error on reversed block boundaries -- model gives up |
| 23--34 | MEDIUM | Various fixes for dedup, purge errors, schema validation, hook timing, etc. |
| 35 | HIGH | Aging warnings shown at low context usage (<50%) -- triggers unnecessary compress, wastes tokens |
| 36 | HIGH | Compression summary emitted as a standalone user message before the user's real turn -- model reads its own prior assistant output as user input, causing dialog role confusion / self-Q&A loops |
| 37 | HIGH | Message-transform pipeline runs on OpenCode's hidden title/summary/compaction agent requests -- corrupts the request and shared session state, breaking session title generation |
| 38 | CRITICAL | pruneToolOutputs/pruneToolInputs/pruneToolErrors mutate existing messages in-place -- invalidates LLM prefix cache, causing 89% of fresh input tokens to be wasted on cache-invalidating re-sends |

For the complete list with root cause analysis, see the bug tracker.


License

AGPL-3.0-or-later -- This project is a fork of @tarquinen/opencode-dcp. Original copyright belongs to the original author. Modifications and bug fixes by ranxianglei.
