Cut your OpenClaw token costs without losing technical accuracy.
Every message you send to an LLM costs tokens — and most of that cost comes from the conversation history that gets sent along with it. As a conversation grows, the model receives your entire history on every single request. Caveman Claw reduces that cost in two ways:
- Output mode — tells the AI to respond in a terse, fragment-based style. Same technical accuracy, fewer words, fewer tokens billed.
- Auto-compression — automatically trims older messages in your conversation history before each request. You never notice it happening, but you pay for less.
Requirements:

- OpenClaw ≥ 2026.4.0
- Node.js ≥ 22
git clone https://github.com/mCo0L/caveman-claw
openclaw plugins install ./caveman-claw
openclaw plugins enable caveman-claw

Once published to npm, the install will be:
openclaw plugins install @mco0l/caveman-claw
openclaw plugins enable caveman-claw

That's it. After enabling, caveman mode activates automatically for every new session (controlled by `globalByDefault`, which is on by default).
Since Caveman Claw isn't on ClawHub yet, updates are manual:
Option 1 — if you cloned the repo locally:
cd /path/to/your/caveman-claw
git pull
openclaw plugins install ./caveman-claw
openclaw plugins disable caveman-claw
openclaw plugins enable caveman-claw

Option 2 — pull directly in the plugins directory:
cd ~/.openclaw/plugins/caveman-claw
git pull
openclaw plugins disable caveman-claw
openclaw plugins enable caveman-claw

Once installed with default settings:
- Every new session gets caveman mode automatically — the AI responds in terse, fragment-based style from the first message
- Every LLM request has older conversation history algorithmically compressed before it's sent — articles stripped, filler cut, synonyms shortened
- Token savings are tracked and available at any time with `/cc stats`
You don't need to type anything to get the benefits. It just runs.
You can also control caveman mode manually from chat:
| Command | What it does |
|---|---|
| `/cc` | Activate caveman mode (full intensity) |
| `/cc lite` | Mild mode — drops filler, keeps full sentences |
| `/cc ultra` | Aggressive mode — abbreviates fn, param, cfg, err, etc. |
| `/cc compress <file>` | Compress a context file like SOUL.md or SKILL.md |
| `/cc stats` | Show how many tokens and dollars you've saved |
| `normal` | Turn off caveman mode for this session |
| `stop caveman` | Same as above |
- `lite` — removes filler phrases ("Sure!", "Of course", "I'd be happy to") and stops there. Full sentences preserved.
- `full` (default) — everything in `lite`, plus: fragments instead of full sentences, hedging removed ("might", "possibly", "appears to"), common words shortened ("utilize" → "use", "however" → "but", "additionally" → "also").
- `ultra` — everything in `full`, plus: technical terms abbreviated (function → fn, parameter → param, configuration → cfg, error → err, repository → repo).
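The three levels can be sketched as layered lists of regex replacement rules, each level extending the previous one. This is an illustrative sketch only; the rule names and patterns here are assumptions, not the plugin's actual tables.

```typescript
// Layered compression passes, one rule list per intensity level.
// Rule sets are illustrative, not the plugin's actual tables.

type Rule = [RegExp, string];

const liteRules: Rule[] = [
  [/\b(Sure!|Of course[,!]?|I'd be happy to)\s*/g, ""], // filler phrases
];

const fullRules: Rule[] = [
  ...liteRules,
  [/\b(might|possibly|appears to)\b\s*/g, ""], // hedging
  [/\butilize\b/g, "use"],                     // verbose synonyms
  [/\bhowever\b/g, "but"],
  [/\badditionally\b/g, "also"],
];

const ultraRules: Rule[] = [
  ...fullRules,
  [/\bfunction\b/g, "fn"],       // technical abbreviations
  [/\bparameter\b/g, "param"],
  [/\bconfiguration\b/g, "cfg"],
  [/\berror\b/g, "err"],
  [/\brepository\b/g, "repo"],
];

function compress(text: string, rules: Rule[]): string {
  return rules.reduce((out, [re, sub]) => out.replace(re, sub), text);
}
```

For example, `compress("you might utilize the configuration", ultraRules)` yields `"you use the cfg"`.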
If you have large context files (SOUL.md, SKILL.md, project docs) that get loaded on every request, you can compress them permanently:
/cc compress SOUL.md
This rewrites the file in caveman format and saves a .original.md backup first. Code blocks, URLs, and numbers are never modified.
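The protection step can be sketched as an extract-and-restore pass: protected spans are swapped for placeholder tokens before compression and restored verbatim afterwards. The patterns below are simplified assumptions, not the plugin's real ones.

```typescript
// Protect code fences, URLs, and numbers: mask them before compression,
// then restore each span exactly. Patterns are simplified for illustration.

const PROTECT = /```[\s\S]*?```|https?:\/\/\S+|\b\d[\d.,]*\b/g;

function compressSafely(text: string, compress: (s: string) => string): string {
  const saved: string[] = [];
  const masked = text.replace(PROTECT, (m) => {
    saved.push(m);
    return `\u0000${saved.length - 1}\u0000`; // opaque placeholder
  });
  // Restore every protected span byte-for-byte.
  return compress(masked).replace(/\u0000(\d+)\u0000/g, (_, i) => saved[Number(i)]);
}
```

Because the URL is masked before the rules run, even a rule that would otherwise rewrite part of it leaves the link untouched.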
/cc stats
Caveman Claw savings:
Session: -12,400 tokens (~$0.04 saved)
Today: -89,200 tokens (~$0.27 saved)
All time: -1.2M tokens (~$3.60 saved)
Model: openrouter/anthropic/claude-haiku-4-5 · $0.80/1M tokens
Note: tracks input compression only. Output savings not measured.
Pricing is read automatically from your OpenClaw config (~/.openclaw/openclaw.json) and model catalog — no manual setup needed.
After installing, you can edit plugin.json in the Caveman Claw plugin directory to change defaults:
{
"config": {
"autoCompress": true,
"historyDepth": 2,
"intensity": "full",
"globalByDefault": true
}
}

| Option | Default | Description |
|---|---|---|
| `globalByDefault` | `true` | Automatically activate caveman mode for every new session |
| `autoCompress` | `true` | Compress older conversation history before each LLM call |
| `historyDepth` | `2` | How many of the most recent assistant turns to leave uncompressed |
| `intensity` | `"full"` | Compression level: `lite`, `full`, or `ultra` |
Setting `globalByDefault: false` makes caveman mode opt-in: you activate it manually per session with `/cc`.
When a request is sent, Caveman Claw processes the conversation history before it reaches the model:
- User messages are never compressed
- System messages are never compressed
- The most recent N assistant turns are left uncompressed (controlled by `historyDepth`)
- Older assistant messages are compressed: articles removed, filler phrases stripped, synonyms shortened, hedging cut
- Code blocks, URLs, and numbers are extracted and restored exactly — never touched
When compression reduces the history, a short system note is injected before your latest message letting the model know the older context was pre-compressed and to respond with matching brevity.
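The selection logic above can be sketched as follows (a minimal sketch: the `Message` shape and function name are assumptions, and `compress` stands in for the real rule engine):

```typescript
// User/system messages pass through untouched; the newest `historyDepth`
// assistant turns stay intact; older assistant turns are compressed.

interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

function compressHistory(
  history: Message[],
  historyDepth: number,
  compress: (s: string) => string,
): Message[] {
  // Positions of assistant turns, oldest first.
  const assistantIdx = history
    .map((m, i) => (m.role === "assistant" ? i : -1))
    .filter((i) => i >= 0);
  // The most recent `historyDepth` assistant turns are exempt.
  const keep = new Set(historyDepth > 0 ? assistantIdx.slice(-historyDepth) : []);

  return history.map((m, i) =>
    m.role === "assistant" && !keep.has(i)
      ? { ...m, content: compress(m.content) }
      : m,
  );
}
```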
- Auto-compression requires `context_before_request` hook support in OpenClaw (issue #33017). This hook isn't available yet; the plugin registers it in advance and will activate automatically once OpenClaw ships it. In the meantime, `globalByDefault: true` (the default) gives you caveman output mode on every session, and `/cc compress` handles manual context file compression.
- Savings tracking covers input tokens only; output token reduction from shorter responses is not measured.
- Token estimation is approximate: it uses a character-based heuristic (~4 chars/token). Accurate enough for tracking savings trends; not suitable for billing.
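The heuristic, and the dollar estimate derived from it, fit in a few lines (function names and the price argument are illustrative; `/cc stats` reads the real per-model price from your OpenClaw config):

```typescript
// ~4 characters per token: coarse, but stable enough for trend tracking.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Dollars saved = tokens saved × price per million input tokens.
function dollarsSaved(tokensSaved: number, pricePerMTok: number): number {
  return (tokensSaved / 1_000_000) * pricePerMTok;
}
```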
`/cc` not showing up in chat

Run `openclaw plugins list` to confirm the plugin is enabled. If it shows as disabled, run `openclaw plugins enable caveman-claw`.

Stats show $0 savings

Pricing requires `~/.openclaw/openclaw.json` (for the active model) and `~/.openclaw/agents/main/agent/models.json` (for per-model pricing). These are created by OpenClaw automatically; if they're missing, token counts will still be tracked, just without dollar estimates.

Caveman mode feels too aggressive

Switch to lite mode: set `"intensity": "lite"` in `plugin.json`, or type `/cc lite` in chat for the current session only.
For contributors and curious users:
- Compression is purely algorithmic — regex-based rules strip articles, filler phrases, hedging words, and replace verbose synonyms with shorter ones. Code blocks, URLs, and numbers are extracted before processing and restored exactly afterward.
- Hook registration uses `api.registerHook({ name: 'context_before_request', handler })`, a forward-compatible registration that activates once OpenClaw ships the hook.
- Savings tracking writes to `data/savings.json` using atomic temp-file-then-rename writes to prevent corruption on concurrent requests.
- Model pricing is read from `~/.openclaw/openclaw.json` (active model) and `~/.openclaw/agents/main/agent/models.json` (per-model costs), the same files OpenClaw uses for its own dashboard.
- Session activation for `globalByDefault` listens to `session:patch` and `command:new` events and records activated session IDs in `data/activated-sessions.json`.
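The temp-file-then-rename pattern mentioned above can be sketched like this (the filename and payload are illustrative):

```typescript
// Crash-safe JSON write: write to a temp file, then rename into place.
// rename() is atomic on POSIX filesystems when source and destination are
// on the same filesystem, so readers never observe a half-written file.

import { writeFileSync, renameSync } from "node:fs";

function atomicWriteJson(path: string, data: unknown): void {
  const tmp = `${path}.tmp-${process.pid}`; // per-writer temp file
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  renameSync(tmp, path);
}
```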
Caveman Claw was directly inspired by caveman by Julius Brussee — the original token-saving caveman-speak plugin for Claude Code. The core idea, compression philosophy, and intensity levels all trace back to that project.
Caveman Claw is a ground-up reimplementation for the OpenClaw ecosystem: TypeScript compression library, native plugin registration, token savings tracking against OpenClaw's own model pricing, and an auto-compression hook. Credit for the concept belongs entirely to Julius.
Issues and PRs welcome at github.com/mCo0L/caveman-claw.
MIT