TCMG-v1/dnd-context-compressor

dnd-context-compressor

Context compression pipeline for running LLM-powered D&D multiplayer sessions on a Raspberry Pi 4 (8GB).

1 LLM call per round, no matter how many players.

The Problem

Running a 5-player D&D session with local LLMs (Ollama) on a Pi 4 means:

  • Small usable context windows (~1,100 tokens configured for phi3:mini)
  • Slow generation (~12s per call)
  • 5 sequential LLM calls = 60 seconds per round = unplayable

The Solution

This library compresses everything so the DM makes one merged LLM call per round:

Players act → Collect all → Compress → 1 LLM call → Pre-gen A/B → Broadcast → Whispers → Wait

5 Compression Strategies (layered, cheapest first)

| # | Strategy  | Cost             | Description                                  |
|---|-----------|------------------|----------------------------------------------|
| 1 | TRIM      | Free             | Sliding window: drop old context             |
| 2 | SKELETON  | Free             | Strip narration, keep state-changing facts   |
| 3 | SUMMARIZE | 1 cheap LLM call | Condense old exchanges to 1-2 sentences      |
| 4 | MERGE     | Free             | Combine N player actions into 1 paragraph    |
| 5 | FORGET    | Free             | Nuclear reset: keep only the current scene   |
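
To make three of the free strategies concrete, here is a minimal sketch of TRIM, MERGE, and FORGET as plain functions. The function names, signatures, and the list-of-strings history format are assumptions for illustration, not the library's actual API.

```python
from typing import List, Tuple

def trim(history: List[str], keep_last: int) -> List[str]:
    """TRIM: sliding window that keeps only the most recent entries."""
    return history[-keep_last:] if keep_last > 0 else []

def merge(actions: List[Tuple[str, str]]) -> str:
    """MERGE: fold N (player, action) pairs into one paragraph."""
    return " ".join(f"{name}: {text.rstrip('.')}." for name, text in actions)

def forget(current_scene: str) -> List[str]:
    """FORGET: drop all history and keep only the current scene."""
    return [current_scene]

# Hypothetical session data, for illustration only.
history = ["The party enters the cave.", "A goblin ambush!", "Kael is wounded."]
print(trim(history, 2))
print(merge([("Kael", "I swing my greatsword"), ("Thalia", "I cast Shield.")]))
print(forget("Goblin cave, mid-combat"))
```

SKELETON and SUMMARIZE are left out because they depend on how state-changing facts are detected and on a model call, neither of which this README specifies.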

Token Budget

| Component                                  | Tokens |
|--------------------------------------------|--------|
| Context (compressed history + actions)     | ~800   |
| Generation (DM response)                   | ~300   |
| Total                                      | ~1,100 |

Components

ContextCompressor — Token budget engine

Applies compression strategies automatically to stay within budget.
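
The idea of escalating through strategies until the context fits can be sketched as follows. This is an illustrative sketch, not the ContextCompressor implementation: the 4-characters-per-token estimate is a rough rule of thumb, and `trim_half`, `keep_scene`, and `compress` are hypothetical names.

```python
from typing import Callable, List, Sequence

CONTEXT_BUDGET = 800  # tokens reserved for context, per the Token Budget table

def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)

def total_tokens(entries: List[str]) -> int:
    return sum(estimate_tokens(entry) for entry in entries)

def trim_half(entries: List[str]) -> List[str]:
    """TRIM-like step: drop the older half of the history."""
    return entries[len(entries) // 2:]

def keep_scene(entries: List[str]) -> List[str]:
    """FORGET-like step: keep only the most recent entry."""
    return entries[-1:]

def compress(
    entries: List[str],
    budget: int = CONTEXT_BUDGET,
    strategies: Sequence[Callable[[List[str]], List[str]]] = (trim_half, keep_scene),
) -> List[str]:
    """Apply strategies in order, stopping as soon as the budget is met."""
    for strategy in strategies:
        if total_tokens(entries) <= budget:
            break
        entries = strategy(entries)
    return entries

# 20 entries of roughly 100 tokens each, well over the 800-token budget.
long_history = ["x" * 400 for _ in range(20)]
print(len(compress(long_history)), "entries kept")
```

Ordering the strategies from least to most destructive means a short session passes through untouched and the hard reset only runs when nothing milder fits.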

TurnCollector — Async action batching

Waits for all players (60s timeout), auto-passes stragglers, merges everything into one prompt.
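
A minimal sketch of the batching behaviour described above, using only `asyncio`. `MiniCollector` is a hypothetical stand-in written for illustration. It mirrors the method names used in the multiplayer example further down, but its internals and the `"(passes)"` placeholder are assumptions.

```python
import asyncio
from typing import Dict, List

class MiniCollector:
    """Hypothetical stand-in for TurnCollector, for illustration only."""

    def __init__(self, timeout: float = 60.0):
        self.timeout = timeout
        self.expected: List[str] = []
        self.actions: Dict[str, str] = {}
        self._all_in = asyncio.Event()

    def start_round(self, players: List[str]) -> None:
        self.expected = list(players)
        self.actions = {}
        self._all_in.clear()

    def submit_action(self, player: str, text: str) -> None:
        if player in self.expected:
            self.actions[player] = text
        if len(self.actions) == len(self.expected):
            self._all_in.set()  # everyone has acted; stop waiting early

    async def wait_for_all(self) -> Dict[str, str]:
        try:
            await asyncio.wait_for(self._all_in.wait(), self.timeout)
        except asyncio.TimeoutError:
            pass  # timed out: fall through and auto-pass stragglers
        return {p: self.actions.get(p, "(passes)") for p in self.expected}

async def demo() -> Dict[str, str]:
    collector = MiniCollector(timeout=0.05)  # short timeout for the demo
    collector.start_round(["Kael", "Thalia", "Gruk"])
    collector.submit_action("Kael", "attacks the goblin")
    collector.submit_action("Thalia", "casts Shield")
    return await collector.wait_for_all()  # Gruk never answers

result = asyncio.run(demo())
print(result)
```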

DMPipeline — Main orchestrator

The full round loop: compress → DM → pre-gen → broadcast → whispers → wait.

PreGenerator — A/B options

Pre-generates choice options while players read narration. Hides LLM latency.
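
The latency-hiding idea can be illustrated with a background task that runs while players read. This is a sketch under stated assumptions: `fake_pregen` simulates the slow model call with a sleep, and none of these names come from the library.

```python
import asyncio
from typing import Tuple

async def fake_pregen(narration: str, delay: float = 0.05) -> Tuple[str, str]:
    """Stand-in for the slow option-generating call (no real LLM involved)."""
    await asyncio.sleep(delay)
    return (f"Press on after: {narration}", f"Fall back after: {narration}")

async def play_round(narration: str, read_time: float = 0.2):
    # Kick off option generation in the background as soon as narration exists.
    pregen = asyncio.create_task(fake_pregen(narration))
    # Simulate players reading the narration while the task runs.
    await asyncio.sleep(read_time)
    ready_before_needed = pregen.done()
    option_a, option_b = await pregen
    return option_a, option_b, ready_before_needed

option_a, option_b, ready = asyncio.run(play_round("The goblin chief falls."))
print(f"A) {option_a}")
print(f"B) {option_b}")
print("Options were ready when players finished reading:", ready)
```

As long as reading takes longer than generation, players never wait on the options.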

WhisperChannel — Private messages

Player secrets, patron voices, trap notices, DM notes — all routed privately.
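
One way to picture private routing is a function that fans public messages out to everyone and delivers whispers to a single inbox. The `Message` type and `route` function below are hypothetical and only illustrate the concept. The real channel delivers through the `whisper_fn` callback shown in the multiplayer example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Message:
    text: str
    recipient: Optional[str] = None  # None means public

def route(messages: List[Message], players: List[str]) -> Dict[str, List[str]]:
    """Return each player's inbox: public messages plus their own whispers."""
    inbox: Dict[str, List[str]] = {player: [] for player in players}
    for message in messages:
        targets = players if message.recipient is None else [message.recipient]
        for player in targets:
            if player in inbox:  # ignore whispers to unknown players
                inbox[player].append(message.text)
    return inbox

inbox = route(
    [
        Message("Kael hesitates, examining the treasure hoard..."),
        Message("You pocket the enchanted gem unseen.", recipient="Kael"),
    ],
    ["Kael", "Thalia"],
)
print(inbox["Kael"])
print(inbox["Thalia"])
```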

Quick Start

pip install dnd-context-compressor

Solo Play

from dnd_compressor import DMPipeline, TurnCollector, PlayerAction, ActionType

pipeline = DMPipeline()
pipeline.initialize_sync()

# Player acts
action = PlayerAction(
    player_name="Kael",
    player_class="Fighter",
    action_type=ActionType.ATTACK,
    action_text="I swing my greatsword at the goblin",
    target="Goblin Chief",
)

result = pipeline.process_round_sync([action])
print(result.dm_narration)
print(f"A) {result.option_a}")
print(f"B) {result.option_b}")

Multiplayer (async server)

import asyncio
from dnd_compressor import DMPipeline, TurnCollector, PlayerAction, ActionType

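async def main():
    pipeline = DMPipeline()
    # send_to_all_players and send_private_message are callables you supply
    # (for example, thin wrappers around your websocket server).
    await pipeline.initialize(
        broadcast_fn=send_to_all_players,  # your websocket broadcast
        whisper_fn=send_private_message,   # your private msg sender
    )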

    collector = pipeline.collector
    collector.start_round(["Kael", "Thalia", "Gruk"])

    # Players submit actions via websocket...
    # collector.submit_action(action)

    # Wait for all (60s timeout, auto-pass stragglers)
    actions = await collector.wait_for_all()

    # One LLM call for everything
    result = await pipeline.process_round(actions)
    # Broadcast + whispers happen automatically via callbacks

asyncio.run(main())

Secret Actions

from dnd_compressor import PlayerAction, ActionType

# Rogue tries to steal something; only the DM knows
secret = PlayerAction(
    player_name="Kael",
    player_class="Rogue",
    action_type=ActionType.WHISPER,
    action_text="I secretly pocket the enchanted gem",
    is_private=True,
)

# DM processes it privately, other players see:
# "Kael hesitates, examining the treasure hoard..."

Configuration

from dnd_compressor.pipeline import LLMConfig, DMPipeline

config = LLMConfig(
    base_url="http://localhost:11434",  # Ollama
    model="phi3:mini",                  # Best for Pi 4
    max_gen_tokens=300,
    temperature=0.8,
    num_ctx=1100,                       # 800 context + 300 gen
)

pipeline = DMPipeline(llm_config=config)
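
For orientation, these settings plausibly translate into an Ollama `/api/generate` request like the one built below. The endpoint and the `num_ctx`, `num_predict`, and `temperature` options are part of Ollama's REST API. The mapping of `max_gen_tokens` to `num_predict`, and the helper name `build_generate_request`, are assumptions about how the library uses them.

```python
import json
from typing import Any, Dict, Tuple

def build_generate_request(
    base_url: str,
    model: str,
    prompt: str,
    max_gen_tokens: int,
    temperature: float,
    num_ctx: int,
) -> Tuple[str, Dict[str, Any]]:
    """Build the URL and JSON body for a non-streaming Ollama generate call."""
    url = f"{base_url.rstrip('/')}/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": num_ctx,             # total window: context + generation
            "num_predict": max_gen_tokens,  # assumed mapping from max_gen_tokens
            "temperature": temperature,
        },
    }
    return url, payload

url, payload = build_generate_request(
    base_url="http://localhost:11434",
    model="phi3:mini",
    prompt="Narrate the round.",
    max_gen_tokens=300,
    temperature=0.8,
    num_ctx=1100,
)
print(url)
print(json.dumps(payload, indent=2))
```

The sketch only builds the request; sending it requires a running Ollama server.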

Supported Models (Pi 4 8GB)

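| Model         | Speed     | Quality | Recommended         |
|---------------|-----------|---------|---------------------|
| phi3:mini     | ~12s/call | Good    | Default             |
| mistral:7b-q4 | ~18s/call | Great   | For narrative-heavy |
| llama3:8b-q4  | ~20s/call | Best    | If you can wait     |
| tinyllama     | ~5s/call  | Basic   | Speed priority      |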

Architecture

┌─────────────────────────────────────────────────┐
│                  DMPipeline                      │
│                                                  │
│  ┌──────────┐  ┌───────────┐  ┌──────────────┐ │
│  │  Turn     │  │  Context  │  │    Pre-      │ │
│  │ Collector │→ │Compressor │→ │  Generator   │ │
│  └──────────┘  └───────────┘  └──────────────┘ │
│       ↑              │              │            │
│       │              ↓              ↓            │
│  Players        Ollama (1       A/B Options     │
│  (1-5)          call/round)     (async)         │
│       ↑              │              │            │
│       │              ↓              │            │
│  ┌──────────┐  ┌───────────┐       │            │
│  │ Whisper  │← │ Broadcast │←──────┘            │
│  │ Channel  │  │  (public) │                    │
│  └──────────┘  └───────────┘                    │
└─────────────────────────────────────────────────┘

Development

git clone https://github.com/TCMG-v1/dnd-context-compressor.git
cd dnd-context-compressor
pip install -e ".[dev]"
pytest

License

MIT — Created by TCMG-v1 · Co-created with Claude, Grok, Perplexity
