Local-first transcription pipeline that copies recordings off a Sony ICD-UX570 (or any audio source), transcribes them with Whisper into a searchable SQLite/FTS5 archive, and adds opt-in, per-call Claude/Ollama enrichment plus an MCP server so Claude can search your archive directly.
- What: Plug in a UX570, run
whisperlog watch, and recordings flow into a local archive — copied off the device, transcribed, optionally summarized, and indexed for full-text search. Audio never leaves your machine. - How:
ingestcopies/dedupes files →transcriberuns faster-whisper → text is indexed in SQLite FTS5 → optionalenrich/agentpasses summarize via local Ollama or Claude → an MCP server exposes the archive to Claude Desktop/Code. - Stack: Python 3.11+,
faster-whisper,typerCLI,pydantic-settings, SQLite FTS5; optionalanthropic+keyring,pyyaml, andmcpextras. Managed withuv. - Run:
uv pip install -e .thenwhisperlog watch(local mode, no network calls). Add Claude withuv pip install -e '.[cloud]'andwhisperlog config set-key anthropic.
Plug in a UX570, run whisperlog watch, and your recordings flow into a local archive: copied off the device, transcribed with Whisper, optionally summarized with a local LLM, and indexed for full-text search. Claude features (summaries, action items, multi-step agent workflows, MCP) are opt-in and per-call — never silently triggered.
The pipeline is generic — the UX570 is just the default. Pass --source <dir> to ingest or watch to point it at any USB drive, network share, or local folder. If the Sony REC_FILE/FOLDER01..05 layout is detected the device convention is used; otherwise the directory is walked recursively for any common audio format (.mp3 .wav .m4a .mp4 .flac .ogg .aac .opus .webm).
There are two modes. The default never makes a network call.
| Mode | Audio | Transcript | Summarization |
|---|---|---|---|
| Local (default) | stays on disk | stays on disk | local Ollama |
| Cloud (opt-in) | stays on disk | sent to Claude API or Claude CLI | Anthropic |
- Audio never leaves your machine, in either mode. Even agents see only the transcript text.
- API keys are stored in the OS keychain (Keychain on macOS, Secret Service on Linux) via
keyring, never in.env, never logged. - Audit log at
~/.whisperlog/audit.logrecords every cloud call: timestamp, backend, model, transcript SHA-256, first 80 characters, token counts, cost. The transcript content itself is not written. - Fully offline mode is real. With
DEFAULT_ENRICH_BACKEND=ollamaand--backend ollama, theanthropicSDK is lazily imported and never loaded.
- Language: Python 3.11+ (
requires-python = ">=3.11"), managed withuv. - Core deps:
faster-whisper(transcription),typer+rich(CLI),pydantic/pydantic-settings(config),httpx,platformdirs. - Storage: SQLite with FTS5 full-text search at
~/.whisperlog/index.db; archive files under~/Documents/whisperlog-archive/. - Optional extras:
cloud(anthropic,keyring),agent(addspyyaml),mcp(mcp),all, anddev(pytest,ruff,mypy). - Entry points:
whisperlog(CLI) andwhisperlog-mcp(stdio MCP server).
- Python 3.11+ and
uv(or pip) - ffmpeg — for audio decoding (
brew install ffmpeg/apt install ffmpeg)
Optional, depending on which features you want:
- Ollama — local LLM for summarization.
brew install ollama, thenollama serveandollama pull qwen2.5:7b. claudeCLI — Claude Code CLI, for theclaude-clienrichment backend and in-repo agent proposals.- Anthropic API key — for the
claude-apibackend and most agent workflows.
git clone https://github.com/BaruchEric/whisperlog.git
cd whisperlog
uv pip install -e .
# Copy the env template and tweak as needed.
cp .env.example .env
# Plug in the UX570, then:
whisperlog ingest # copy new recordings to the archive
whisperlog transcribe 'archive/**/audio.mp3'
# Or, in one shot:
whisperlog watch # daemon: ingest+transcribe whenever the device is plugged in
whisperlog watch --enrich # also runs the default Ollama summarizer
# Any other audio source — a folder, USB drive, network share, etc.:
whisperlog ingest --source ~/Dropbox/voice-memos
whisperlog watch --source ~/Dropbox/voice-memos --onceSearch later:
whisperlog search "sarah AND project"The archive lives at ~/Documents/whisperlog-archive/<YYYY>/<YYYY-MM-DD>/<HH-MM>_<sha8>/ and contains:
audio.mp3— the original file (read-only)transcript.txt— plain texttranscript.srt— subtitlestranscript.md— metadata + transcript + appended enrichments
whisperlog --help for the full list; key subcommands:
| Command | What it does |
|---|---|
ingest [--source DIR] [--once] |
Copy + dedupe recordings off the device (or any source) into the archive. |
transcribe '<glob>' |
Run faster-whisper over matching audio, writing transcript.{txt,srt,md}. |
watch [--source DIR] [--enrich] [--once] |
Daemon that ingests + transcribes (and optionally enriches) on plug-in. |
search '<FTS5 query>' |
Full-text search across all transcripts. |
enrich <path> [--backend ...] [--task ...] [--model ...] [--yes] |
Run one enrichment pass and append to the transcript .md. |
redact <path> [--ollama] [--out ...] |
Strip PII via regex (+ optional local Ollama pass) before sending anything out. |
stats [--days N] |
Show cloud spend over a window. |
config set-key anthropic / config show |
Store the Anthropic key in the OS keychain / print effective config. |
agent meeting-debrief <path> |
Notes + action items + follow-up email + .ics, from one transcript. |
agent code-review <path> [--repo ...] |
Extract decisions/questions, optionally propose code changes via Claude CLI. |
agent custom <path> --workflow file.yaml |
Run a YAML-defined sequence of prompts. |
version |
Print the version. |
Claude features are an extra, paid layer. Install the extras and store your key:
uv pip install -e '.[cloud]'
whisperlog config set-key anthropic # prompts; key goes to OS keychainThen pick a backend per call:
whisperlog enrich archive/2026/2026-04-27/14-30_abcd1234/ \
--backend claude-api \
--task meeting_notesDefaults use claude-sonnet-4-6 (cheaper/faster). Agent workflows default to claude-opus-4-7 (higher quality, ~5× the cost). The daily cap (MAX_DAILY_CLAUDE_USD, default $5.00) is enforced before any call is sent — hitting it raises an error, with no silent failover. whisperlog stats --days 30 shows your spend.
Regex catches email/phone/SSN/credit-card patterns; a local Ollama pass also strips names and addresses (better recall, still imperfect):
whisperlog redact archive/.../transcript.txt --ollama --out cleaned.txt
whisperlog enrich cleaned.txt --backend claude-api --task summarizeRegex alone is not real DLP. Run with --ollama for anything sensitive.
All settings load from .env (see .env.example) via pydantic-settings; CLI flags override per call. Notable keys:
| Key | Default | Purpose |
|---|---|---|
WHISPERLOG_ARCHIVE_DIR |
~/Documents/whisperlog-archive |
Where the archive lives. |
WHISPERLOG_STATE_DIR |
~/.whisperlog |
DB, audit log, enrich log. |
WHISPER_MODEL |
small.en |
Whisper model size (see guide below). |
WHISPER_DEVICE / WHISPER_COMPUTE_TYPE |
auto / int8 |
float16 only on Nvidia GPUs. |
DEFAULT_ENRICH_BACKEND |
ollama |
ollama | claude-api | claude-cli. |
OLLAMA_HOST / OLLAMA_MODEL |
http://localhost:11434 / qwen2.5:7b |
Local LLM. |
CLAUDE_MODEL / CLAUDE_AGENT_MODEL |
claude-sonnet-4-6 / claude-opus-4-7 |
Enrich vs. agent default models. |
MAX_DAILY_CLAUDE_USD / COST_CONFIRM_USD |
5.00 / 0.10 |
Daily cap and per-call confirmation threshold. |
The Anthropic key is never read from .env — it lives in the OS keychain.
| Model | Disk | RAM | Speed (M-series) | Speed (CPU) | Quality |
|---|---|---|---|---|---|
tiny.en |
75 MB | ~1 GB | ~30× real-time | ~10× | rough |
base.en |
142 MB | ~1 GB | ~16× | ~5× | usable |
small.en |
466 MB | ~2 GB | ~6× | ~2× | good default |
medium.en |
1.5 GB | ~5 GB | ~2× | <1× | great on M-series |
large-v3 |
2.9 GB | ~10 GB | ~1× | painful | best |
Prompts. Drop a file in prompts/<task>.md using {{transcript}} as the placeholder. The CLI auto-discovers it: whisperlog enrich /path/to/transcript.txt --task my-custom-task. Bundled prompts include summarize, meeting_notes, action_items, followup_email, calendar_ics, coding_session, and identify_speakers.
Agent workflows. A workflow is a YAML file with a steps list; each step has a name, an output filename, and either a template (referencing prompts/<template>.md) or an inline prompt:
steps:
- name: tldr
template: summarize
output: tldr.md
- name: facts-only
output: facts.md
prompt: |
Extract only verifiable factual claims from the transcript below.
Output a numbered list. Skip opinions and intentions.
---
{{transcript}}Run it: whisperlog agent custom archive/.../transcript.txt --workflow my_workflow.yaml.
Expose your transcript archive to Claude Desktop or Claude Code:
uv pip install -e '.[mcp]'Add this to your MCP config:
{
"mcpServers": {
"whisperlog": { "command": "whisperlog-mcp" }
}
}Tools the server provides:
search_transcripts(query, limit)— FTS5 searchget_transcript(recording_id)— full textlist_recent(limit)— recent recordingslist_enrichments(recording_id)— past summaries/agent outputs
Now you can ask Claude "What did I talk about with Sarah last week?" and it queries this archive directly.
src/whisperlog/ holds the package:
ingest.py— copy/dedupe recordings into the archive (safe_copy, source detection).transcribe.py— faster-whisper runner producing.txt/.srt/.md.archive.py/db.py— SQLite FTS5 index, recording records, enrichment records, search.enrich/— pluggable backends (base.py,ollama.py,claude_api.py,claude_cli.py) + prompt loader and audit logging.agent.py— multi-step workflows (meeting_debrief,code_review,custom_workflow).mcp_server.py— stdio MCP server (whisperlog-mcp).config.py,secrets.py,ledger.py,redact.py,watch.py,utils.py,cli.py— settings, keychain, spend cap, PII redaction, the watch daemon, helpers, and the Typer entry point.
State lives in three places: the archive (~/Documents/whisperlog-archive/), ~/.whisperlog/ (DB, audit log, enrich log), and OS keychain entries under whisperlog.
- Device not detected. On macOS, look for
/Volumes/IC RECORDER.diskutil listshows whether the OS sees the recorder. - Ollama call fails. Confirm
ollama serveis running andollama listshows theOLLAMA_MODEL. There's no silent fallback — intentional. - "No Anthropic API key in OS keychain." Run
whisperlog config set-key anthropic; re-run to rotate. - Cost-cap error mid-day. Raise
MAX_DAILY_CLAUDE_USDor wait — the cap is per local calendar day.
uv pip install -e '.[dev,all]'
ruff check .
pytestTests live in tests/ (redaction, ingest dedup, transcribe outputs, enrich backends, spend cap, archive search, SRT utils) and use a tmp-dir-isolated SQLite DB — they never touch your real archive or keychain.
- The core pipeline (ingest → transcribe → search), local Ollama enrichment, Claude API/CLI enrichment, agent workflows, redaction, spend cap, and the MCP server are all implemented and covered by the test suite.
src/ux570_transcribe/is a stale artifact from the project rename (commitd5071ea,ux570-transcribe → whisperlog). It contains only__pycache__,.DS_Store, and an emptyenrich/subfolder — no live code. The wheel build target issrc/whisperlog, so it does not ship. Safe to delete.
MIT © Eric Baruch