A Person of Interest-themed web front-end for the llmem-gw AI service.
Streams LLM responses word-by-word in the Samaritan UI style with full voice I/O — speak to Samaritan and hear it speak back.
samaritan-webfe is a Python web service that provides a browser-based AI chat client styled
after the Samaritan interface from the CBS television series Person of Interest (2011-2016).
It acts as a front-end proxy to the llmem-gw
AI agent service, streaming responses token-by-token.
The Samaritan UI (index.html) operates in two modes:
- `#mode default`: The main voice/text chat with Samaritan's word-flash animation, terminal panel display, full-voice hands-free loop, and pluggable TTS/STT providers
- `#mode cognitive`: A real-time monitoring dashboard that polls llmem-gw's cognitive engine at 10-second resolution, displaying goals, beliefs, plans, tools, and live timer countdowns
This repo contains three independent frontend UIs served by the same samaritan.py FastAPI server. They share the same llmem-gw backend and auth cookie, but each is a self-contained single-file HTML/CSS/JS application with no shared code between them:
| Frontend | Route | Description | Docs |
|---|---|---|---|
| Samaritan Voice UI | `/` | Person of Interest-themed voice interface (this README) | - |
| Chat | `/chat` | Claude-style scrolling chat with markdown, LaTeX, memory display | docs/CHAT.md |
| Chat-GED | `/chat-ged` | GED exam prep tutor with subject isolation, score tracking, Mermaid charts | docs/CHAT-GED.md |
If you want to use any of these separately, you would need to pull them apart — each HTML file is standalone but relies on the samaritan.py proxy for auth, API key management, and SSE/WebSocket proxying. I keep them together as a frontend portfolio demonstrating different approaches to the same backend.
- Python 3.10+
- llmem-gw running on the same host (default port 8767)
- `openssl` (for self-signed cert generation; usually pre-installed on Linux/macOS)
- At least one voice provider API key (required for FULL VOICE mode; see Configuration)
- Clone the repo

      git clone https://github.com/derezed88/samaritan-webfe.git
      cd samaritan-webfe

- Copy and edit the environment file

      cp .env.example .env
      # Edit .env and set SAMARITAN_API_KEY plus at least one voice provider key (see Configuration below)

- Run the start script (creates venv, installs deps, generates TLS cert, starts server)

      chmod +x start.sh
      ./start.sh

- Open in your browser
  - Local network: `https://<your-host-ip>:8800`
  - Pinggy tunnel: `https://<assigned-pinggy-url>` (see Remote Access)
The default mode is the Person of Interest-styled interface. Short responses (< 10 words) animate center-screen in the show's word-flash style — one word at a time. Longer responses use a typewriter terminal panel at the top of the screen.
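The routing rule above can be sketched in a few lines. The real logic lives in the JavaScript of `static/index.html`; this Python version is only an illustration, and the function name `choose_display` is hypothetical. The threshold value mirrors the `LONG_RESPONSE_THRESHOLD` constant documented under Configuration.

```python
# Illustrative sketch only: the actual UI implements this in JavaScript.
LONG_RESPONSE_THRESHOLD = 10  # words; responses at or above this use the terminal panel


def choose_display(response: str) -> str:
    """Return 'flash' for short responses and 'terminal' for longer ones."""
    word_count = len(response.split())
    return "terminal" if word_count >= LONG_RESPONSE_THRESHOLD else "flash"
```

A two-word reply such as `CONNECTION ESTABLISHED` would take the word-flash path under this rule, while a ten-word reply would go to the terminal panel.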
Key features:
- Samaritan visual style — white radial-gradient background, ALL-CAPS monospace font, red accent triangle, scanline overlay
- Word-by-word token animation; longer responses use a typewriter terminal panel
- Full-voice hands-free loop — speak, hear the response, mic reopens automatically
- Pluggable TTS providers (Deepgram Aura, Inworld, xAI, xAI Persistent) switchable at runtime
- Deepgram STT with multiple mic modes (barge-in, speaker-safe, diarization)
- Dark/light theme via `#screen_mode dark|light` with a 3-second crossfade
- Keyword-triggered Samaritan-style cards (INITIATIVE, ASSET, TASK, THREAT)
- Tap the MIC button — the label changes to LIVE and the interface listens
- Speak your query — it auto-submits when you finish speaking
- The response streams word-by-word on screen
- When the response finishes, the mic restarts automatically for the next turn
- Tap D / I / X (TTS provider button) to cycle through Deepgram Aura, Inworld, xAI, or xAI Persistent
- Tap FULL VOICE to enable spoken responses, then tap MIC to start listening
- Speak your query — Samaritan responds in text and speaks the response aloud via AI voice
- After the audio finishes, the mic reopens automatically — the loop continues hands-free indefinitely
- Works over remote access (Pinggy tunnel) from any device with a browser and microphone
Note: Voice responses require at least one TTS provider API key in `.env` (see Configuration). Tap the provider button in the control bar to cycle providers at any time without reloading.
- Tap the keyboard button to open the text input
- Type your query and press Enter or tap SEND
- The input panel closes while the response streams, then reopens ready for the next message
- If FULL VOICE is active, typed prompts also receive a spoken response
After IDLE_TIMEOUT_SEC seconds (default 300 / 5 minutes), the screen clears and returns to the blinking CONNECTION ESTABLISHED. state.
Type #mode cognitive to switch to the real-time monitoring dashboard. This mode polls llmem-gw's cognitive engine and displays the internal state of the AI agent.
Layout — 4-column display:
| Column | Width | Content |
|---|---|---|
| Left data stack | 220px | Goals, beliefs, prospective memory, plans (auto-scrolling cards) |
| Center top | flex | Dashboard with live timer table |
| Center bottom | flex | Chat log with input field |
| Right | 352px | Samaritan-style countdown timer cards |
Live data cards (refreshed every 30 seconds):
- `!cogn goals`: Current LLM goals
- `!cogn flags`: Belief flags
- `!plan`: Current plan breakdown
- `!cogn`: Full cognition state (prospective memory)
- `!toolstats`: Tool usage and availability
- `!memstats`: Memory pool statistics
Timer cards (right column, refreshed every second):
- Parsed from the `!timers` command
- Each card shows timer name, live countdown, status, last/next run, duration, run count
- Multiple timers auto-cycle through groups of 3
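As a rough illustration of the group cycling described in the last bullet, the sketch below splits a timer list into groups of 3 and picks one group per display tick. It is not taken from `index.html`; the function names and the idea of a tick counter are assumptions made for the example.

```python
from typing import List, Sequence

GROUP_SIZE = 3  # the README states timers cycle in groups of 3


def timer_groups(timers: Sequence[str], size: int = GROUP_SIZE) -> List[List[str]]:
    """Split the timer list into consecutive groups of at most `size` items."""
    return [list(timers[i:i + size]) for i in range(0, len(timers), size)]


def group_for_tick(timers: Sequence[str], tick: int, size: int = GROUP_SIZE) -> List[str]:
    """Pick the group to show on a given display tick, wrapping around at the end."""
    groups = timer_groups(timers, size)
    if not groups:
        return []
    return groups[tick % len(groups)]
```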
Chat input:
The center-bottom panel has a live input field for interacting with the agent while monitoring. Type #mode default to return to the main Samaritan UI.
Samaritan uses a pluggable TTS provider architecture. Switch at runtime via the VOICE button in the control bar, or set the default via TTS_PROVIDER in the JS config block of index.html.
| Provider | Button | API Key | Audio Format | Notes |
|---|---|---|---|---|
| Inworld AI | I | `INWORLD_API_KEY` | Streaming NDJSON, base64 WAV chunks | Server proxy at `/api/tts/inworld`. Voice: Evelyn. Model: `inworld-tts-1.5-mini`. 44-byte RIFF header stripped per chunk. |
| xAI Realtime | X | `XAI_API_KEY` | Per-turn WebSocket | Ephemeral token minted server-side per response. Voices: Eve, Ara, Rex, Sal, Leo. Sentence-level streaming. |
All API keys are kept server-side — they are never sent to the browser.
Configurable constants (top of JS in index.html):

    let TTS_PROVIDER = 'inworld';   // default: 'inworld' | 'xai'
    const XAI_VOICE = 'ara';        // Eve | Ara | Rex | Sal | Leo
    const INWORLD_VOICE = 'Evelyn';
    const INWORLD_MODEL = 'inworld-tts-1.5-mini';

Audio pipeline: All providers feed into a shared Web Audio API pipeline (`scheduleAudioChunk`) for gapless playback. Barge-in support stops all queued audio immediately via `stopAllAudio()`.
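The Inworld row in the provider table mentions NDJSON lines carrying base64 WAV chunks with a 44-byte RIFF header stripped per chunk. A minimal sketch of that decoding step might look like the following. The JSON field name `audio` is a placeholder assumption (the real response shape is not documented here), and the `RIFF` prefix check is a defensive addition rather than something the project is known to do.

```python
import base64
import json

RIFF_HEADER_BYTES = 44  # canonical WAV header size cited in the provider table


def pcm_from_ndjson_line(line: str, field: str = "audio") -> bytes:
    """Decode one NDJSON line and return raw PCM with the WAV header removed.

    ASSUMPTION: the base64 audio lives under `field`; adjust to the real payload.
    """
    payload = json.loads(line)
    wav = base64.b64decode(payload[field])
    # Only strip when the chunk actually starts with a RIFF header.
    if wav[:4] == b"RIFF":
        return wav[RIFF_HEADER_BYTES:]
    return wav
```

Stripping the header from every chunk matters for gapless playback: if the header bytes were scheduled as samples, each chunk boundary would produce an audible click.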
| Provider | Model | API | Notes |
|---|---|---|---|
| Deepgram Flux | `flux-general-en` | v2 `/listen` | Default STT. Native turn detection via `TurnInfo` events. EOT threshold: 0.8. Do NOT send `language` or `punctuate` params. |
| Deepgram Nova-3 (Diarize) | `nova-3` | v1 `/listen` | Used in DI mic mode only. `diarize=true`, labels each speaker as `[Speaker N]: text`. |
Audio capture uses an AudioWorklet (PCM16-LE) with ScriptProcessorNode fallback for iOS Safari. PCM is buffered to ~80ms chunks before sending (Flux requirement). The server-side WebSocket proxy (/api/stt-proxy) injects the Deepgram Authorization header so the API key never reaches the browser.
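To make the ~80ms buffering concrete, here is a small sketch of a chunker for PCM16 audio. The browser does this in JavaScript; this Python version only illustrates the arithmetic. The 16 kHz sample rate is an assumption for the example (the README does not state the capture rate), and `PcmChunker` is a hypothetical name.

```python
BYTES_PER_SAMPLE = 2  # PCM16 = 16 bits per mono sample


def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int = 80) -> int:
    """Number of bytes in one chunk of mono PCM16 audio."""
    return sample_rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


class PcmChunker:
    """Accumulate PCM bytes and release them in fixed-duration chunks."""

    def __init__(self, sample_rate_hz: int = 16000, chunk_ms: int = 80) -> None:
        # ASSUMPTION: 16 kHz mono; substitute the real capture rate.
        self._size = chunk_size_bytes(sample_rate_hz, chunk_ms)
        self._buf = bytearray()

    def feed(self, pcm: bytes) -> list[bytes]:
        """Add captured bytes; return every complete chunk now available."""
        self._buf.extend(pcm)
        chunks = []
        while len(self._buf) >= self._size:
            chunks.append(bytes(self._buf[:self._size]))
            del self._buf[:self._size]
        return chunks
```

At 16 kHz, one 80ms chunk is 1280 samples, or 2560 bytes. Leftover bytes stay in the buffer until the next call to `feed`.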
The rightmost button cycles through four STT modes:
| Mode | Icon | Behavior |
|---|---|---|
| Off | mic icon | Mic disabled in full-voice mode |
| Barge-in | B (red) | Incoming speech during TTS immediately stops playback — best for headphones/AirPods |
| Speaker | S (amber) | Mic stays on but transcripts are suppressed during TTS — safe for phone speaker use. 1500ms cooldown after audio ends. |
| Diarize | DI (blue/red pulse) | Deepgram nova-3 + speaker diarization; each turn prefixed with [Speaker N]: for multi-person conversations |
If you are using a phone or tablet without headphones, use S or DI mode to prevent feedback loops.
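The Speaker-mode behavior in the table (suppress transcripts during TTS, then wait out a 1500ms cooldown) can be expressed as a small gate function. This is a sketch of the rule as described, not the project's actual code; the mode strings and the function name are assumptions.

```python
from typing import Optional

SPEAKER_COOLDOWN_MS = 1500  # cooldown after audio ends, per the table above


def accept_transcript(mode: str, tts_playing: bool, ms_since_tts_end: Optional[int]) -> bool:
    """Decide whether an incoming transcript should be acted on.

    `ms_since_tts_end` is None when no TTS audio has played yet.
    """
    if mode != "speaker":
        # Other modes (for example barge-in) let speech through during playback.
        return True
    if tts_playing:
        return False
    return ms_since_tts_end is None or ms_since_tts_end >= SPEAKER_COOLDOWN_MS
```

The cooldown exists because the tail of the response can still be echoing through the device speaker for a moment after playback reports that it has finished.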
These commands are typed in the input field (keyboard mode in default, or the chat input in cognitive mode):
| Command | Mode | Effect |
|---|---|---|
| `#mode default` | Any | Switch to the Samaritan voice UI |
| `#mode cognitive` | Any | Switch to the cognitive monitoring dashboard |
| `#mode chat` | Default | Redirect to `/chat` (Chat UI) |
| `#mode chat-ged` | Default | Redirect to `/chat-ged` (GED Study UI) |
| `#screen_mode dark` | Default | Switch to dark theme (3s crossfade) |
| `#screen_mode light` | Default | Switch to light theme |
| `#inworld_voice <name>` | Default | Change Inworld TTS voice at runtime |
| `#db <name>` | Cognitive | Switch active database across all cognitive sessions |
All configuration lives in `.env` in the project root. A template is provided; copy it and fill in your values:

    cp .env.example .env

`.env` is listed in `.gitignore` and must never be committed. The variables:
| Variable | Required | Description |
|---|---|---|
| `SAMARITAN_API_KEY` | Yes | Access password for the web UI. Set to any strong secret string. Must not end with `!` (iOS autofill strips it). |
| `LLMEM_GW_API_KEY` | No | Bearer token forwarded to llmem-gw. Leave blank if llmem-gw has no key set. |
| `LLMEM_GW_URL` | No | Base URL of the llmem-gw service. Default: `http://localhost:8767`. |
| `DEEPGRAM_API_KEY` | For STT + Deepgram TTS | Used server-side for STT WebSocket proxy and TTS streaming. Never sent to browser. console.deepgram.com |
| `XAI_API_KEY` | For xAI voice | Used server-side to mint ephemeral WebSocket tokens. Never sent to browser. console.x.ai |
| `INWORLD_API_KEY` | For Inworld voice | Base64-encoded credential from Inworld Portal (Settings > API Keys). Never sent to browser. |
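For readers wiring up their own server, the rules in this table could be checked at startup along the following lines. This is a sketch, not the logic in `samaritan.py`: the function name `load_config` and the returned dictionary keys are hypothetical, and it reads from a plain mapping so it works with or without a `.env` loader.

```python
import os
from typing import Mapping, Optional

VOICE_KEY_NAMES = ("DEEPGRAM_API_KEY", "XAI_API_KEY", "INWORLD_API_KEY")


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate and collect settings from an environment-style mapping."""
    env = os.environ if env is None else env
    api_key = env.get("SAMARITAN_API_KEY", "")
    if not api_key:
        raise ValueError("SAMARITAN_API_KEY is required")
    if api_key.endswith("!"):
        # Documented restriction: iOS autofill strips a trailing '!'.
        raise ValueError("SAMARITAN_API_KEY must not end with '!'")
    return {
        "api_key": api_key,
        "gw_url": env.get("LLMEM_GW_URL") or "http://localhost:8767",
        "gw_api_key": env.get("LLMEM_GW_API_KEY") or None,
        # Only the voice providers that actually have a key configured.
        "voice_keys": {n: env[n] for n in VOICE_KEY_NAMES if env.get(n)},
    }
```

Failing fast on a password ending in `!` avoids a confusing situation where login works on desktop but silently fails on iOS.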
Additional JS constants at the top of static/index.html:

    const IDLE_TIMEOUT_SEC = 300;        // seconds before idle screen
    const WORD_FADE = 180;               // ms opacity transition per word
    const WORD_HOLD = 380;               // ms each word is visible
    const WORD_GAP = 60;                 // ms gap between words
    const LONG_RESPONSE_THRESHOLD = 10;  // words; responses >= this use terminal display

This repo includes two additional chat-style frontends. Each is a self-contained single-file HTML application with its own feature set:
- Chat UI (`/chat`): Claude-style scrolling chat with markdown rendering, KaTeX math, database sidebar, model selection, Inworld TTS, and Deepgram STT
- Chat-GED (`/chat-ged`): GED exam prep tutor with 5-subject isolation, score tracking, progress dashboards, Mermaid diagram rendering, and quiz analytics
See the linked docs for full feature descriptions and usage instructions.
Pinggy provides an SSH-based tunnel that terminates TLS on its end, so the browser sees a valid HTTPS URL (browsers only grant microphone access to pages served from a secure context).
The app listens on HTTP port 8801 specifically for the tunnel (no TLS — pinggy handles it):

    ssh -p 443 \
      -R0:localhost:8801 \
      -o StrictHostKeyChecking=no \
      -o ServerAliveInterval=10 \
      -o ServerAliveCountMax=6 \
      -o TCPKeepAlive=yes \
      -o ExitOnForwardFailure=yes \
      -o ConnectTimeout=30 \
      -t YOUR_TOKEN@pro.pinggy.io "k:4YOUR_KEY" x:https

Pinggy prints the assigned public URL on connect. Share only with trusted users; the `SAMARITAN_API_KEY` auth prompt is the access gate.
| Port | Protocol | Purpose |
|---|---|---|
| 8800 | HTTPS | Local network access (self-signed cert) |
| 8801 | HTTP | Pinggy tunnel endpoint (pinggy provides TLS) |
- Cookie-based auth is enforced on every route including `/`. Unauthorized clients are redirected to `/login`.
- `POST /login` validates the password against `SAMARITAN_API_KEY` and sets an `HttpOnly; SameSite=lax` cookie (30 days).
- Auth also accepts: `Bearer` token header or `?token=` query param (for SSE streams).
- `LLMEM_GW_API_KEY` and all voice provider keys are server-side secrets, never exposed to the browser.
- The self-signed cert on port 8800 will trigger a browser warning on first visit; accept it once.
- Cookie persists in iOS PWA (WKWebView) across launches.
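The three accepted credential sources (cookie, `Bearer` header, `?token=` query param) suggest a lookup order like the one sketched below. This is an illustration under stated assumptions, not the code in `samaritan.py`: the cookie name `samaritan_auth` is hypothetical, the precedence order is a guess, and the sketch treats the cookie value as the secret itself, which the real server may not do.

```python
import hmac
from typing import Mapping, Optional

COOKIE_NAME = "samaritan_auth"  # hypothetical; the real cookie name is not documented here


def presented_credential(
    cookies: Mapping[str, str],
    headers: Mapping[str, str],   # assumed to use lowercase header names
    query: Mapping[str, str],
) -> Optional[str]:
    """Return the first credential found: cookie, then Bearer header, then ?token=."""
    if cookies.get(COOKIE_NAME):
        return cookies[COOKIE_NAME]
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[len("bearer "):].strip()
    return query.get("token")


def is_authorized(credential: Optional[str], expected: str) -> bool:
    """Compare in constant time so response timing does not leak the secret."""
    if not credential:
        return False
    return hmac.compare_digest(credential.encode(), expected.encode())
```

The query-param fallback exists because the browser `EventSource` API cannot set custom request headers, so an SSE stream needs another way to carry a token.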
I created this front end because I wanted to combine the memory capabilities of llmem-gw, which lets me choose any mainstream or locally hosted LLM, with the voice providers of my choice.
As an example, let's say you like the Grok app's ability to handle text and voice in the same chat. What's really going on in the backend is that when in text mode, Grok is using models that have a much bigger context window than when in live voice mode — in live voice mode, as of this writing the Voice Agent is used and that is backed by a model with a much smaller 32k-token context window. There are good reasons for that, and the biggest reason I can see is optimizing for voice quality: your voice and the model handling the voice are the same, or at least together, reducing API turns.
This project is therefore a workaround, with some performance hit and possible cost implications. If you want to send voice to e.g. grok-4-1-fast-reasoning (or any other model that llmem-gw supports — and that includes all mainstream models and any OpenAI-compatible or llama/ollama-hosted model), then you need to process STT (your voice speech-to-text) and TTS (the model's text response to voice), with the LLM of your choice in the middle. I first started with simple Web Speech API for input and text response only. That wasn't good enough for me, so I went with Deepgram for STT and xAI and Inworld for TTS. I don't have enough resources to locally host models to do it on my own, so cloud APIs it is for me. Of course if you want to go keyboard and text only, that works too.
The performance implication: API turns for the LLM (plus possible tool calls), for voice input, and for voice response. I am seeing about 8-10 seconds for voice input to voice response — and for the enhanced memory, I'm okay with that.
The cost implication: API and token usage for everything. Figure that out based on your perceived amount of use.
Platform note: Since the frontend is a web browser talking to a Python service, it can run from just about anywhere — macOS, iPhones with Safari, etc. The downside of browser mode is speaker-phone use: you should run voice input in the S (speaker) mic mode so that barge-in is suppressed and the speaker doesn't pick up the audio output of the response.
The project has two layers:
| Layer | File | Role |
|---|---|---|
| Browser UI | `static/index.html` | Single-file HTML/CSS/JS: all rendering, SSE parsing, TTS/STT logic |
| Python proxy | `samaritan.py` | FastAPI server: auth gate, API key management, stream translation |
samaritan.py exists primarily to keep API keys out of the browser. It translates between the browser's expectations and whatever backend you wire up.
| Service | Where coupled | How to swap |
|---|---|---|
| llmem-gw (LLM backend) | `samaritan.py` routes + `index.html` SSE parser | See Swapping the LLM Backend below |
| Deepgram (STT) | `samaritan.py` WebSocket proxy (`/api/stt-proxy`), `index.html` AudioWorklet | Replace proxy + browser WS client |
| xAI Realtime (TTS) | `samaritan.py` `/api/tts/xai`, `index.html` `ttsProviders.xai` | Implement new provider object + server route |
| Inworld AI (TTS) | `samaritan.py` `/api/tts/inworld`, `index.html` `ttsProviders.inworld` | Implement new provider object + server route |
The frontend and backend share an internal SSE contract. As long as samaritan.py emits these events, index.html needs no changes:
| Event | Payload | Meaning |
|---|---|---|
| `tok` | `{"type":"tok","text":"..."}` | One token/word to display |
| `flush` | `{"type":"flush","text":"..."}` | Intermediate checkpoint (tool call done, more coming); resets TTS buffer |
| `done` | `{"type":"done"}` | Turn complete: trigger TTS and re-open mic |
| `error` | `{"type":"error","text":"..."}` | Stream error |
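A server that wants to honor this contract needs a way to serialize the four payloads. The sketch below shows one plausible framing. It assumes each payload is sent as a standard SSE `data:` line followed by a blank line, with no separate `event:` field; check the parser in `index.html` before relying on that. The helper name `sse_event` is hypothetical.

```python
import json
from typing import Optional

EVENT_TYPES = {"tok", "flush", "done", "error"}


def sse_event(kind: str, text: Optional[str] = None) -> str:
    """Serialize one contract event as an SSE frame.

    ASSUMPTION: frames are plain `data:` lines carrying the JSON payload.
    """
    if kind not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {kind}")
    payload = {"type": kind}
    if text is not None:
        payload["text"] = text
    body = json.dumps(payload, separators=(",", ":"))
    return f"data: {body}\n\n"
```

For example, `sse_event("tok", "HELLO")` yields a frame whose payload matches the first row of the table, and `sse_event("done")` omits the `text` field as the table shows.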
To replace llmem-gw with a different LLM (OpenAI, Anthropic, Ollama, etc.), rewrite only samaritan.py:
- Submit route (`POST /api/submit`): translate `{text, client_id}` into your backend's request format and start a streaming response.
- Stream route (`GET /api/stream/{client_id}`): parse your backend's streaming format and emit `tok` / `flush` / `done` / `error` SSE events to the browser.
- Session management: llmem-gw correlates a submitted request to its SSE stream via `client_id`. If your backend streams directly in the POST response body, you can simplify or eliminate the separate stream route.
- Health check (`GET /api/health`): point at your backend's health endpoint.
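The stream-route step above boils down to a translation loop. The following sketch shows that loop in isolation, without FastAPI or any particular vendor SDK: it takes an iterable of text deltas (a stand-in for whatever your backend streams) and yields frames for the browser. The function name `translate_stream` and the `data:` framing are assumptions, not code from `samaritan.py`.

```python
import json
from typing import Iterable, Iterator


def _frame(payload: dict) -> str:
    # ASSUMPTION: the browser expects plain SSE `data:` lines with a JSON body.
    return "data: " + json.dumps(payload, separators=(",", ":")) + "\n\n"


def translate_stream(deltas: Iterable[str]) -> Iterator[str]:
    """Turn backend text deltas into tok/done/error frames for the browser."""
    try:
        for delta in deltas:
            if delta:  # skip empty keep-alive deltas
                yield _frame({"type": "tok", "text": delta})
        yield _frame({"type": "done"})
    except Exception as exc:
        # Surface backend failures to the UI instead of dropping the stream.
        yield _frame({"type": "error", "text": str(exc)})
```

In a real route, the resulting generator would typically be handed to a streaming response object; a backend that runs tool calls mid-turn would also need to emit `flush` at those checkpoints, which this sketch leaves out.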
- `AudioContext` must be created/unlocked during a direct user gesture
- `initAudioCtx()` called on every tap handler; also listens for `statechange` to auto-resume after iOS notification interruptions
- iOS kills the hardware mic when Safari backgrounds; on return, `visibilitychange` saves state to `sessionStorage` and triggers `location.reload()` to recover
- State persisted across reload: `micMode`, `ttsProvider`, `fullVoiceMode`, `SESSION_ID`
- Cookie persists in iOS PWA (WKWebView) across launches
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.md for more information.
Mark Jimenez - @properTweetment - xb12pilot@gmail.com
Project Link: https://github.com/derezed88/samaritan-webfe
- othneildrew — Best-README-Template — The structure and shield/badge conventions used in this README are based on this template.
- Samaritan UI style - inspired by the Person of Interest television series (CBS/Warner Bros., 2011-2016), created by Jonathan Nolan. The colour scheme (white radial gradient, red `#fe2d2d` accents, inverted black highlight), ALL-CAPS typography, animated triangle marker, and word-flash animation are a fan recreation for personal/educational use. No assets from the show are included.
- Share Tech Mono typeface - Carrois Apostrophe, licensed under the SIL Open Font License 1.1. Served via Google Fonts.
- CSS scanline overlay technique — adapted from public domain CSS snippets widely shared in the retro/CRT aesthetic community (no single original author identified).
- Person of Interest web UI demo — phresh-it.hu/demos/poi-web-ui/ — card designs for ASSET, THREAT, and other keyword pop-ups were adapted from this demo.
- FastAPI — ASGI web framework (MIT License)
- uvicorn — ASGI server (BSD License)
- httpx — async HTTP client (BSD License)
- python-dotenv - `.env` file loader (BSD License)
- Deepgram - streaming speech-to-text via WebSocket; proxied server-side to keep the API key out of the browser. Standard mode uses Flux (`flux-general-en`) on the v2 API for low-latency turn detection; Diarize mode uses Nova-3 (`nova-3`) on the v1 API with `diarize=true` for speaker identification.
- Web Speech API - browser-native speech recognition used as ScriptProcessorNode fallback for iOS Safari AudioWorklet gaps (W3C specification, implemented by browser vendors)
- Web Audio API — browser-native PCM audio scheduling for real-time TTS playback
- xAI Realtime API — WebSocket-based AI voice synthesis
- Inworld AI TTS API — Streaming AI voice synthesis
- Pinggy — SSH-based HTTPS tunnel service
- Interface design, architecture, and implementation assisted by Claude (Anthropic, claude-opus-4-6).
