whisperlog

Local-first transcription pipeline that copies recordings off a Sony ICD-UX570 (or any audio source), transcribes them with Whisper into a searchable SQLite/FTS5 archive, and adds opt-in, per-call Claude/Ollama enrichment plus an MCP server so Claude can search your archive directly.

TL;DR

What: Plug in a UX570, run whisperlog watch, and recordings flow into a local archive — copied off the device, transcribed, optionally summarized, and indexed for full-text search. Audio never leaves your machine.
How: ingest copies/dedupes files → transcribe runs faster-whisper → text is indexed in SQLite FTS5 → optional enrich/agent passes summarize via local Ollama or Claude → an MCP server exposes the archive to Claude Desktop/Code.
Stack: Python 3.11+, faster-whisper, typer CLI, pydantic-settings, SQLite FTS5; optional anthropic + keyring, pyyaml, and mcp extras. Managed with uv.
Run: uv pip install -e . then whisperlog watch (local mode, no network calls). Add Claude with uv pip install -e '.[cloud]' and whisperlog config set-key anthropic.

What this is

Plug in a UX570, run whisperlog watch, and your recordings flow into a local archive: copied off the device, transcribed with Whisper, optionally summarized with a local LLM, and indexed for full-text search. Claude features (summaries, action items, multi-step agent workflows, MCP) are opt-in and per-call — never silently triggered.

The pipeline is generic — the UX570 is just the default. Pass --source <dir> to ingest or watch to point it at any USB drive, network share, or local folder. If the Sony REC_FILE/FOLDER01..05 layout is detected the device convention is used; otherwise the directory is walked recursively for any common audio format (.mp3 .wav .m4a .mp4 .flac .ogg .aac .opus .webm).

Privacy model

There are two modes. The default never makes a network call.

Mode	Audio	Transcript	Summarization
Local (default)	stays on disk	stays on disk	local Ollama
Cloud (opt-in)	stays on disk	sent to Claude API or Claude CLI	Anthropic

Audio never leaves your machine, in either mode. Even agents see only the transcript text.
API keys are stored in the OS keychain (Keychain on macOS, Secret Service on Linux) via keyring, never in .env, never logged.
Audit log at ~/.whisperlog/audit.log records every cloud call: timestamp, backend, model, transcript SHA-256, first 80 characters, token counts, cost. The transcript content itself is not written.
Fully offline mode is real. With DEFAULT_ENRICH_BACKEND=ollama and --backend ollama, the anthropic SDK is lazily imported and never loaded.

Tech stack

Language: Python 3.11+ (requires-python = ">=3.11"), managed with uv.
Core deps: faster-whisper (transcription), typer + rich (CLI), pydantic / pydantic-settings (config), httpx, platformdirs.
Storage: SQLite with FTS5 full-text search at ~/.whisperlog/index.db; archive files under ~/Documents/whisperlog-archive/.
Optional extras: cloud (anthropic, keyring), agent (adds pyyaml), mcp (mcp), all, and dev (pytest, ruff, mypy).
Entry points: whisperlog (CLI) and whisperlog-mcp (stdio MCP server).

Prerequisites

Python 3.11+ and uv (or pip)
ffmpeg — for audio decoding (brew install ffmpeg / apt install ffmpeg)

Optional, depending on which features you want:

Ollama — local LLM for summarization. brew install ollama, then ollama serve and ollama pull qwen2.5:7b.
claude CLI — Claude Code CLI, for the claude-cli enrichment backend and in-repo agent proposals.
Anthropic API key — for the claude-api backend and most agent workflows.

Quick start (local mode)

git clone https://github.com/BaruchEric/whisperlog.git
cd whisperlog
uv pip install -e .

# Copy the env template and tweak as needed.
cp .env.example .env

# Plug in the UX570, then:
whisperlog ingest         # copy new recordings to the archive
whisperlog transcribe 'archive/**/audio.mp3'

# Or, in one shot:
whisperlog watch          # daemon: ingest+transcribe whenever the device is plugged in
whisperlog watch --enrich # also runs the default Ollama summarizer

# Any other audio source — a folder, USB drive, network share, etc.:
whisperlog ingest --source ~/Dropbox/voice-memos
whisperlog watch  --source ~/Dropbox/voice-memos --once

Search later:

whisperlog search "sarah AND project"

The archive lives at ~/Documents/whisperlog-archive/<YYYY>/<YYYY-MM-DD>/<HH-MM>_<sha8>/ and contains:

audio.mp3 — the original file (read-only)
transcript.txt — plain text
transcript.srt — subtitles
transcript.md — metadata + transcript + appended enrichments

Commands

whisperlog --help for the full list; key subcommands:

Command	What it does
`ingest [--source DIR] [--once]`	Copy + dedupe recordings off the device (or any source) into the archive.
`transcribe '<glob>'`	Run faster-whisper over matching audio, writing `transcript.{txt,srt,md}`.
`watch [--source DIR] [--enrich] [--once]`	Daemon that ingests + transcribes (and optionally enriches) on plug-in.
`search '<FTS5 query>'`	Full-text search across all transcripts.
`enrich <path> [--backend ...] [--task ...] [--model ...] [--yes]`	Run one enrichment pass and append to the transcript `.md`.
`redact <path> [--ollama] [--out ...]`	Strip PII via regex (+ optional local Ollama pass) before sending anything out.
`stats [--days N]`	Show cloud spend over a window.
`config set-key anthropic` / `config show`	Store the Anthropic key in the OS keychain / print effective config.
`agent meeting-debrief <path>`	Notes + action items + follow-up email + `.ics`, from one transcript.
`agent code-review <path> [--repo ...]`	Extract decisions/questions, optionally propose code changes via Claude CLI.
`agent custom <path> --workflow file.yaml`	Run a YAML-defined sequence of prompts.
`version`	Print the version.

Adding Claude

Claude features are an extra, paid layer. Install the extras and store your key:

uv pip install -e '.[cloud]'
whisperlog config set-key anthropic     # prompts; key goes to OS keychain

Then pick a backend per call:

whisperlog enrich archive/2026/2026-04-27/14-30_abcd1234/ \
  --backend claude-api \
  --task meeting_notes

Cost expectations

Defaults use claude-sonnet-4-6 (cheaper/faster). Agent workflows default to claude-opus-4-7 (higher quality, ~5× the cost). The daily cap (MAX_DAILY_CLAUDE_USD, default $5.00) is enforced before any call is sent — hitting it raises an error, with no silent failover. whisperlog stats --days 30 shows your spend.

Redact before sending

Regex catches email/phone/SSN/credit-card patterns; a local Ollama pass also strips names and addresses (better recall, still imperfect):

whisperlog redact archive/.../transcript.txt --ollama --out cleaned.txt
whisperlog enrich cleaned.txt --backend claude-api --task summarize

Regex alone is not real DLP. Run with --ollama for anything sensitive.

Configuration

All settings load from .env (see .env.example) via pydantic-settings; CLI flags override per call. Notable keys:

Key	Default	Purpose
`WHISPERLOG_ARCHIVE_DIR`	`~/Documents/whisperlog-archive`	Where the archive lives.
`WHISPERLOG_STATE_DIR`	`~/.whisperlog`	DB, audit log, enrich log.
`WHISPER_MODEL`	`small.en`	Whisper model size (see guide below).
`WHISPER_DEVICE` / `WHISPER_COMPUTE_TYPE`	`auto` / `int8`	`float16` only on Nvidia GPUs.
`DEFAULT_ENRICH_BACKEND`	`ollama`	`ollama` \| `claude-api` \| `claude-cli`.
`OLLAMA_HOST` / `OLLAMA_MODEL`	`http://localhost:11434` / `qwen2.5:7b`	Local LLM.
`CLAUDE_MODEL` / `CLAUDE_AGENT_MODEL`	`claude-sonnet-4-6` / `claude-opus-4-7`	Enrich vs. agent default models.
`MAX_DAILY_CLAUDE_USD` / `COST_CONFIRM_USD`	`5.00` / `0.10`	Daily cap and per-call confirmation threshold.

The Anthropic key is never read from .env — it lives in the OS keychain.

Whisper model size guide

Model	Disk	RAM	Speed (M-series)	Speed (CPU)	Quality
`tiny.en`	75 MB	~1 GB	~30× real-time	~10×	rough
`base.en`	142 MB	~1 GB	~16×	~5×	usable
`small.en`	466 MB	~2 GB	~6×	~2×	good default
`medium.en`	1.5 GB	~5 GB	~2×	<1×	great on M-series
`large-v3`	2.9 GB	~10 GB	~1×	painful	best

Custom prompts & agent workflows

Prompts. Drop a file in prompts/<task>.md using {{transcript}} as the placeholder. The CLI auto-discovers it: whisperlog enrich /path/to/transcript.txt --task my-custom-task. Bundled prompts include summarize, meeting_notes, action_items, followup_email, calendar_ics, coding_session, and identify_speakers.

Agent workflows. A workflow is a YAML file with a steps list; each step has a name, an output filename, and either a template (referencing prompts/<template>.md) or an inline prompt:

steps:
  - name: tldr
    template: summarize
    output: tldr.md
  - name: facts-only
    output: facts.md
    prompt: |
      Extract only verifiable factual claims from the transcript below.
      Output a numbered list. Skip opinions and intentions.

      ---

      {{transcript}}

Run it: whisperlog agent custom archive/.../transcript.txt --workflow my_workflow.yaml.

MCP server

Expose your transcript archive to Claude Desktop or Claude Code:

uv pip install -e '.[mcp]'

Add this to your MCP config:

{
  "mcpServers": {
    "whisperlog": { "command": "whisperlog-mcp" }
  }
}

Tools the server provides:

search_transcripts(query, limit) — FTS5 search
get_transcript(recording_id) — full text
list_recent(limit) — recent recordings
list_enrichments(recording_id) — past summaries/agent outputs

Now you can ask Claude "What did I talk about with Sarah last week?" and it queries this archive directly.

Architecture & data model

src/whisperlog/ holds the package:

ingest.py — copy/dedupe recordings into the archive (safe_copy, source detection).
transcribe.py — faster-whisper runner producing .txt/.srt/.md.
archive.py / db.py — SQLite FTS5 index, recording records, enrichment records, search.
enrich/ — pluggable backends (base.py, ollama.py, claude_api.py, claude_cli.py) + prompt loader and audit logging.
agent.py — multi-step workflows (meeting_debrief, code_review, custom_workflow).
mcp_server.py — stdio MCP server (whisperlog-mcp).
config.py, secrets.py, ledger.py, redact.py, watch.py, utils.py, cli.py — settings, keychain, spend cap, PII redaction, the watch daemon, helpers, and the Typer entry point.

State lives in three places: the archive (~/Documents/whisperlog-archive/), ~/.whisperlog/ (DB, audit log, enrich log), and OS keychain entries under whisperlog.

Troubleshooting

Device not detected. On macOS, look for /Volumes/IC RECORDER. diskutil list shows whether the OS sees the recorder.
Ollama call fails. Confirm ollama serve is running and ollama list shows the OLLAMA_MODEL. There's no silent fallback — intentional.
"No Anthropic API key in OS keychain." Run whisperlog config set-key anthropic; re-run to rotate.
Cost-cap error mid-day. Raise MAX_DAILY_CLAUDE_USD or wait — the cap is per local calendar day.

Development

uv pip install -e '.[dev,all]'
ruff check .
pytest

Tests live in tests/ (redaction, ingest dedup, transcribe outputs, enrich backends, spend cap, archive search, SRT utils) and use a tmp-dir-isolated SQLite DB — they never touch your real archive or keychain.

Status

The core pipeline (ingest → transcribe → search), local Ollama enrichment, Claude API/CLI enrichment, agent workflows, redaction, spend cap, and the MCP server are all implemented and covered by the test suite.
src/ux570_transcribe/ is a stale artifact from the project rename (commit d5071ea, ux570-transcribe → whisperlog). It contains only __pycache__, .DS_Store, and an empty enrich/ subfolder — no live code. The wheel build target is src/whisperlog, so it does not ship. Safe to delete.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.claude/plans		.claude/plans
prompts		prompts
scripts		scripts
src/whisperlog		src/whisperlog
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README.md.bak		README.md.bak
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

whisperlog

TL;DR

What this is

Privacy model

Tech stack

Prerequisites

Quick start (local mode)

Commands

Adding Claude

Cost expectations

Redact before sending

Configuration

Whisper model size guide

Custom prompts & agent workflows

MCP server

Architecture & data model

Troubleshooting

Development

Status

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

whisperlog

TL;DR

What this is

Privacy model

Tech stack

Prerequisites

Quick start (local mode)

Commands

Adding Claude

Cost expectations

Redact before sending

Configuration

Whisper model size guide

Custom prompts & agent workflows

MCP server

Architecture & data model

Troubleshooting

Development

Status

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages