Storyteller tells you a story — and pulls you into it. The narrator listens, asks what you do, weaves your answers and the world-facts you drop along the way back into the tale — while a planner LLM keeps a multi-act arc intact behind the scenes. You're not just answering prompts; you're co-building a world.
Worlds can be picked, generated from a one-page brief (1–3 min — full setting, characters, items, glossary, blueprint, random tables), and replayed with structurally different arcs thanks to multi- blueprint worlds.
Two ways to run it — same engine, same worlds, same save-state:
- As an appliance — a Raspberry Pi 4 with a ReSpeaker mic and LED ring, boots into voice mode, "Hey Jarvis" to wake. The original prototype hid the Pi inside a hollow book at a kid's birthday party; that's still the design intent.
- As a self-hosted web service — run the backend on any machine
(a laptop, a NAS, a Mini-PC), and kids reach it in any browser at
https://your-host/. Text mode or tap-to-talk voice, no Pi or mic-array needed.
Plus a PC text REPL (CLI) for prompt-engineering / debugging the same engine without the audio layer.
Under the hood: a LangGraph story engine with a planner, an anti- spoiler curator gate, and per-session RAG over world content + player- introduced facts. The engine talks to any OpenAI-compatible endpoint — the OpenAI API by default, or self-hosted backends like Ollama / vLLM / llama.cpp (LLM + embeddings), faster-whisper (STT), and Piper (Wyoming/TCP) or XTTS v2 for TTS. Endpoints are configurable per role (story / planner / gen / gate / STT / TTS / embeddings), so you can run fully local, fully cloud, or mix and match. A daily cost cap plus per-world / per-model spend rollup prevents a forgotten cloud key from running into a surprise bill, and endpoint health monitoring lets the appliance differentiate "internet is down" from "my local server is off" from "API key is wrong" — surfaced both via spoken prompt and in the admin UI. Localized for German and English.
Want to go fully local with no API key at all? The repo ships a
turn-key Windows + NVIDIA-GPU server stack in
local_llm_servers_win/ (Ollama + Faster-Whisper
- XTTS v2). Point your Storyteller client (Pi or PC) at it over the LAN — guide in docs/SETUP_LOCAL_AI_SERVER.md. Requires a capable NVIDIA GPU with enough VRAM (≥ 24 GB recommended).
➡ Architecture, decisions & roadmap: PLAN.md ➡ Conventions for working on the codebase: AGENTS.md
This is a uv workspace.
One .venv at the root, one uv.lock, every member built together.
packages/
core/ storyteller_core story engine (LangGraph), worlds, config, persistence
voice/ storyteller_voice STT/TTS/FX (API-coupled, no hardware I/O)
hardware/ storyteller_hardware audio backends, LEDs, ReSpeaker, voice menu, net
apps/
cli/ storyteller_cli PC text REPL (chat / info / worlds / seed / history)
pi/ storyteller_pi Pi voice loop (ReSpeaker + wake word + LEDs)
web-ui/ backend/ + frontend/ Player-facing web app (FastAPI + SvelteKit)
web-admin/ backend/ + frontend/ Admin web app (FastAPI + SvelteKit)
Python: always uv, never pip. Web frontends: always yarn 4.x,
never npm/npx/pnpm. See AGENTS.md.
uv sync # install everything
uv run --package storyteller-cli storyteller-cli seed # write seed worlds (once)
uv run --package storyteller-cli storyteller-cli chat --world sternenfahrtThe CLI uses the same LangGraph engine as the web/Pi apps; per-session state
is checkpointed in data/checkpoints.db. Use --new to start a fresh branch,
/undo mid-chat to roll back one turn, /state to inspect.
Each web app is a FastAPI backend that also serves its SvelteKit UI as a static SPA — one process, one port, no Node at runtime. Build the frontends once, then run the backend:
bash scripts/build_frontends.sh # Node 20 + yarn 4; writes apps/*/frontend/build/
uv run --package storyteller-web-ui-backend storyteller-web-ui # player -> :8090
uv run --package storyteller-web-admin-backend storyteller-web-admin # admin -> :8080Open the player UI at http://localhost:8090 and the admin UI at
http://localhost:8080. The SPA talks to the backend on its own origin, so
the same build works via localhost, the Pi's IP, or a hostname.
- Player: pick a world, play by text, or
/voicefor tap-to-talk — click or spacebar starts the recording, the next click/spacebar stops and sends (browserMediaRecorder→ WS/ws/voice/{thread_id}→ server STT/TTS). - Admin: structured world editor (core fields, tone sliders, blueprint, content lists with ✨ per-piece LLM suggest, random tables), world generation (async job + polling), RAG reindex, transcripts viewer, and settings — per-role models + temperatures and per-purpose custom OpenAI-compatible endpoints (self-hosted LLM/STT/TTS/embeddings), audio backend, moderation thresholds. Full walkthrough: docs/ADMIN_GUIDE.md.
Frontend development (hot reload, talks to the running backend via
VITE_BACKEND): cd apps/web-ui/frontend && yarn dev (port 5173;
admin: 5174). Rebuild for production with scripts/build_frontends.sh.
If you don't want to install Python, Node and Caddy on the host — the player UI + admin UI also ship as a Docker stack:
cd docker/
cp .env.example .env # fill OPENAI_API_KEY + admin token
docker compose up --buildPlayer UI at https://localhost/, admin UI at https://localhost:8443/.
Persistence is a bind-mount on ./data/ (created empty on first run;
generate worlds via the admin UI). x86_64-only, single image reused
for both services. Full walkthrough: docker/README.md.
The Pi voice-loop, CLI and local-AI server stacks are out of scope
for Docker — they're host-coupled (audio devices, GPIO, GPU). The web
containers can co-host with a Pi on the same machine by bind-mounting
the Pi's data/ directory; see the Docker README.
Browsers gate navigator.mediaDevices behind HTTPS or localhost — so
opening the player web-ui's voice page from a phone over plain LAN
HTTP fails with a TypeError. scripts/install_https.sh sets up a
local CA (via mkcert) plus Caddy as a TLS reverse proxy in front of
both backends:
bash scripts/install_https.sh # one-shot, idempotentAfter installing the generated /etc/storyteller/mkcert/rootCA.pem on
each remote device once (per-device steps in
docs/SETUP_HTTPS.md), you get:
https://story.local/→ player text/voice//createhttps://story.local:8443/→ adminhttp://story.local/→ 301 → https
The plain-HTTP listeners on 8080/8090 stay bound (so existing curl / smoke / tunnel workflows aren't disturbed).
The same gameplay features are available in all three player-facing
entry points — storyteller-pi run (voice), storyteller-cli chat
(text REPL) and the player web UI at / + /voice:
All gameplay commands are localised — examples below show English first, with the German equivalent in parentheses where the user- facing label differs.
| Feature | Pi voice | CLI | Web UI |
|---|---|---|---|
| Existing-world selection | ✓ voice menu | ✓ numbered picker | ✓ dropdown |
| World generation from a player brief | ✓ voice interview + "Generate" (de: "Generieren") | ✓ /create <prompt> |
✓ /create page (text) |
| Manage worlds (copy / rename / delete) | ✓ voice ("manage" → copy / rename / delete + yes/no; de: "verwalten" → kopieren / umbenennen / löschen + ja/nein) | — | ✓ buttons in worlds list (Admin) |
| Multi-blueprint variants (replay value: same world, structurally different arc) | engine + planner pick per new substory | engine + planner pick per new substory | ✓ sub-tabs in Tone & arc / Ton & Bogen tab (Admin) |
| Player-introduced facts → RAG | ✓ "Note: …" (de: "Vermerken: …") | ✓ /note <text> |
✓ "+ Note" / "+ Notiz" button (text + voice pages) |
| Repeat last narration (TTS only, no LLM call) | ✓ "Repeat" / "Wiederhole" | scroll up | ✓ "Repeat" / "Wiederhole" in voice page (STT-matched) |
| End story → back to world picker | ✓ voice command "End story" / "Geschichte beenden" | ✓ /end |
✓ "End story" / "Geschichte beenden" button |
| Daily cost cap pause + player message | ✓ daily_cap_pause prompt |
✓ rich-text cap notice | ✓ red banner + WS daily_cap_exceeded |
| Wait-sound under LLM thinking | ✓ per-world / generic ambient | — (text) | ✓ voice page loops generic_waiting.wav |
| Barge-in / interrupt | ✓ GPIO button long-press | ✓ Ctrl-C | ✓ ⏹ button (voice page) |
storyteller-pi run is the full voice appliance. Boot flow:
- Short spoken greeting ("Hello, I'm your storyteller. When you're
ready, wake me with Hey Jarvis." / DE: "Hallo, ich bin dein
Erzähler. Wenn du bereit bist, weck mich mit Hey Jarvis.") —
toggleable in the system menu (
intro). - Idle wait for the "Hey Jarvis" wake word — the Pi stays silent until you call it.
- On wake-word: "Would you like to get started?" — yes continues, no / silent / unclear (after one re-ask) goes back to idle.
- "Existing world or new one?" — branches into either:
- the voice world-menu (free-form phrasing, LLM-classified) → load the picked world, or
- a guided voice-mode world design: a short Q&A interview drives generate_world live on-device. The interview obeys the same idle / silence contract as the story loop (silence → "say Hey Jarvis" + replay last question on wake) and accepts short cancel commands ("cancel / stop / end" — DE: "abbrechen / stopp / beenden"). Player ends the interview by saying "Generate" (DE: "Generieren"); a neutral wait-sound covers the 1–3 min generation, then the new world is saved and started.
- Once a world is picked, before the first narration the in-story
command briefing plays (Note / Repeat / Menu / End story / Shutdown
— DE: Vermerken / Wiederhole / Menü / Geschichte beenden / Schluss
— plus the wake-hint). Toggleable in the system menu
(
commands info).
In-session: wake word with a follow-up window, speech capture that ends
on a pause, a wait-sound loop under TTS, and a spoken system menu
(save / end story / shutdown / undo / reset world / audio / intro /
commands info / close). Voice commands during play: "Note" (DE:
"Vermerken") to add a player-introduced world fact (RAG-indexed
live), "Repeat / Say that again" (DE: "Wiederhole / Sag das
nochmal") to re-play the last narration (TTS only, no LLM call),
"End story" (DE: "Geschichte beenden") to save + return to the
wake-word idle for the next world, "Shutdown / Power off" (DE:
"Schluss / Ausschalten") to power the device off. Per-world session
state auto-resumes across restarts (LangGraph checkpointer, thread_id = pi-<world>); --new starts a fresh branch.
Resuming a saved world plays a short spoken recap of where you are.
Barge-in — the narrator can be interrupted any time:
- Pi — an optional GPIO push-button (see docs/SETUP_PI.md);
off by default, configurable per-button via
[hardware]inconfig.toml. - Web — a Stopp button on the voice page pauses playback.
- CLI —
Ctrl+Cduring a turn aborts that turn.
When interrupted, the system stops, listens, and decides what you want (system menu vs. a new story input).
sudo bash scripts/setup_system.sh # udev rule (LED ring & DSP tuning), once
bash scripts/install_wakeword.sh # openWakeWord (re-run after every `uv sync`)
uv run --package storyteller-pi storyteller-pi run # full voice loop
uv run --package storyteller-pi storyteller-pi run --text # keyboard, no mic
uv run --package storyteller-pi storyteller-pi run --ptt # push-to-talk (no wake word)storyteller-pi netcheck opens the Wi-Fi captive portal if offline at boot.
See docs/SETUP_PI.md for the full Pi setup.
Three long-running services (unit files in scripts/):
| Service | Command | Purpose |
|---|---|---|
storyteller.service |
storyteller-pi run |
Pi voice loop |
storyteller-admin.service |
storyteller-web-admin |
admin backend + UI, :8080 |
storyteller-web-ui.service |
storyteller-web-ui |
player backend + UI, :8090 |
storyteller-netcheck.service |
storyteller-pi netcheck |
Wi-Fi onboarding at boot |
uv sync # workspace venv
bash scripts/install_wakeword.sh # re-run: uv sync prunes the wake-word stack
bash scripts/build_frontends.sh # build both SPAs
sudo bash scripts/install_services.sh # install + enable the long-running units
sudo bash scripts/install_netcheck.sh # optional: Wi-Fi onboarding serviceAfter pulling changes: re-run uv sync + install_wakeword.sh (Python),
build_frontends.sh (web), then sudo systemctl restart storyteller storyteller-admin storyteller-web-ui.
The narrator actively involves the player (free speech, no menus), follows a macro arc and a dynamically planned substory. Substory resolved → the architect (planner LLM) plans the next one in the same turn (LangGraph replan node). An abstract story dynamic (new antagonist, unforeseen event …) spices planning and play without derailing the arc.
Per-session state — memory, substory plan, known facts, character status,
synopsis, cost — lives in the LangGraph SqliteSaver at
data/checkpoints.db, keyed by thread_id. Branching ("what if I had
done X?") works natively via engine.history() / engine.rewind_to(cp_id).
Pre-narrator phase (moderation, RAG retrieval, substory ensure, dynamics
roll) fans out in parallel — wall-clock latency = max() instead of sum.
Moderation is skipped for trivially short benign inputs (Ja, Vielen Dank) so those turns don't pay the OpenAI round-trip. TTS chunks are
fetched in parallel and played in a streaming pipeline — the player
starts speaking the first chunk while later ones are still rendering, so
most of the TTS latency happens behind the audio rather than before it.
Anti-spoiler narration gate — a small per-turn LLM call (gate_llm,
defaults to the same endpoint as the planner) curates which authored
reveals (fragments, history, substory resolution, future beats) the
narrator may weave in this turn. Player-driven improvisation and
spontaneous new facts stay free — only the pre-written plot points are
gated. Toggle: [story] narration_gate_enabled in config.toml. See
docs/ADMIN_GUIDE.md.
Storymodus — soft plot-pressure — instead of a hard "with plot vs.
free play" toggle the engine carries a continuous plot-pressure
value (0–1) that gradually fades the planner, curator, beat-nudges
and substory-tools in and out as the player's behaviour drifts.
A tiny per-turn heuristic (tools fired + lexical match against the
arc + explicit phrases like "lass uns einfach…" or "was war
nochmal die Mission?") drives an EMA-smoothed target; an optional
cheap planner-LLM tiebreaker handles the ambiguous zone. The active
substory isn't discarded when pressure drops — it's parked as
dormant_substory so the next pressure spike can pick it up
exactly where it was. An admin-pinnable Storymodus setting
(auto | planner | frei) overrides the heuristic when needed;
on the Pi the sysmenu has a "Storymodus" entry and the bare phrase
"Storymodus frei / Plan / Auto" works mid-story too. The admin
transcripts viewer surfaces every pressure decision as a
[pressure] marker line so operators can see what the heuristic
chose and why.
Per-session cost cap (graceful wrap-up). Follow-up questions get short answers and don't advance beats.
config [general] locale = de | en (per-run via --locale). Controls
narration language, voice-prompt audio, menu keywords, STT language and
world content. German prompts are kept verbatim; English equivalents
added. Worlds exist in both languages (data/worlds/<id>.json for de,
<id>.en.json for en) with isolated RAG.
Every player input is sent to the OpenAI moderation model
(omni-moderation-latest) before the narrator answers; on a threshold
hit the turn is politely refused. Thresholds are editable in the admin UI
(/settings). Played stories are recorded as transcripts (player input,
moderation result, every LLM tool call + result, narrator replies) and
viewable in the admin UI (/transcripts).
I had my "bam, AI is real" moment in 2023. So I built a python storyteller bot on a Raspberry Pi with a ReSpeaker mic, hid it inside a dummy book, and let it loose on my son's Harry-Potter themed birthday party. The "magical book" played a ghost trapped inside, dropping plot hooks and silly side-quests for the kids all afternoon. Under the hood: gpt-3.5-turbo with tool calls. The kids were the experience; the model was just the engine.
A year later it was a Star-Wars-themed birthday, and the same hardware became a "rebel field radio" the kids could call into to talk to "headquarters" and pick up mission updates. Same north star: the experience of the kids comes first; the tech disappears.
In 2026 I rewrote the whole thing from scratch with Claude Opus 4.7 as my pair-programmer. The new system is what you see in this repo: a proper LangGraph engine, multi-step world generation, per-role LLM endpoints, a structured admin UI, voice + text + Pi-as- appliance. At my side throughout: my son Justus (10) — tireless tester, ruthless feedback-giver, "feature wisher" and "world builder". Many of the rough edges that got smoothed out here got caught because Justus said "das ist doof" at exactly the right moment.
- ARCHITECTURE.md — how the system is built (engine, apps, data)
- AGENTS.md — monorepo conventions (uv, yarn, package boundaries)
- PLAN.md — what's still open
- docs/SETUP_PI.md — Raspberry Pi + ReSpeaker
- docs/SETUP_PC.md — PC setup (no Pi/no ReSpeaker)
- docker/README.md — self-host the web services in Docker (admin + player + Caddy, x86_64)
- docs/SETUP_LOCAL_AI_SERVER.md — optional fully-local AI backends (Ollama + Whisper + XTTS) on a Windows + NVIDIA GPU host
- docs/USER_GUIDE.md — how to play
- docs/ADMIN_GUIDE.md — admin frontend (worlds, generation, transcripts, model/endpoint settings)
- THIRD_PARTY_NOTICES.md — dependency licenses
This project's own source is MIT (LICENSE). All required Python dependencies are permissive (MIT/BSD/Apache/MPL). Notable caveats — see THIRD_PARTY_NOTICES.md:
pedalboardis GPLv3 — only the optionalaudiofxextra (voice reverb), dynamically imported with a pass-through fallback. Without it the project stays MIT-clean; enabling it for distribution triggers GPLv3 for that combined work.- openWakeWord code is Apache-2.0, but its default pretrained models (e.g. "hey jarvis") are CC-BY-NC-SA 4.0 (non-commercial). Models are downloaded at install, not shipped; for commercial use train/replace them.
- Vendored ReSpeaker drivers (
storyteller_hardware/pixel_ring_v2.py,tuning.py) are from Seeed Studio under Apache-2.0, retained with attribution (not relicensed).
