whisperlog

License: MIT | Python 3.11+

Local-first transcription pipeline that copies recordings off a Sony ICD-UX570 (or any audio source), transcribes them with Whisper into a searchable SQLite/FTS5 archive, and adds opt-in, per-call Claude/Ollama enrichment plus an MCP server so Claude can search your archive directly.

TL;DR

  • What: Plug in a UX570, run whisperlog watch, and recordings flow into a local archive — copied off the device, transcribed, optionally summarized, and indexed for full-text search. Audio never leaves your machine.
  • How: ingest copies/dedupes files → transcribe runs faster-whisper → text is indexed in SQLite FTS5 → optional enrich/agent passes summarize via local Ollama or Claude → an MCP server exposes the archive to Claude Desktop/Code.
  • Stack: Python 3.11+, faster-whisper, typer CLI, pydantic-settings, SQLite FTS5; optional anthropic + keyring, pyyaml, and mcp extras. Managed with uv.
  • Run: uv pip install -e . then whisperlog watch (local mode, no network calls). Add Claude with uv pip install -e '.[cloud]' and whisperlog config set-key anthropic.

What this is

Plug in a UX570, run whisperlog watch, and your recordings flow into a local archive: copied off the device, transcribed with Whisper, optionally summarized with a local LLM, and indexed for full-text search. Claude features (summaries, action items, multi-step agent workflows, MCP) are opt-in and per-call — never silently triggered.

The pipeline is generic — the UX570 is just the default. Pass --source <dir> to ingest or watch to point it at any USB drive, network share, or local folder. If the Sony REC_FILE/FOLDER01..05 layout is detected the device convention is used; otherwise the directory is walked recursively for any common audio format (.mp3 .wav .m4a .mp4 .flac .ogg .aac .opus .webm).
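
The source-detection rule above can be sketched roughly as follows. This is a minimal illustration, not the project's actual ingest.py: the function name `find_audio_files` is hypothetical, and it assumes that in the Sony layout recordings sit directly inside each FOLDER01..05 directory.

```python
from pathlib import Path

# Extensions listed in the README; everything else here is a hypothetical sketch.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4", ".flac", ".ogg", ".aac", ".opus", ".webm"}
SONY_FOLDERS = [f"FOLDER{n:02d}" for n in range(1, 6)]  # FOLDER01..FOLDER05


def find_audio_files(source: Path) -> list[Path]:
    """Return audio files under source, preferring the Sony layout when present."""
    rec_root = source / "REC_FILE"
    sony_dirs = [rec_root / name for name in SONY_FOLDERS if (rec_root / name).is_dir()]
    if sony_dirs:
        # Device convention (assumed): look only inside the recorder's folders.
        candidates = [p for d in sony_dirs for p in d.iterdir()]
    else:
        # Generic source: walk the whole tree recursively.
        candidates = list(source.rglob("*"))
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS)
```

Matching on the lower-cased suffix means files such as MEMO.MP3 are picked up as well.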

Privacy model

There are two modes. The default never makes a network call.

| Mode | Audio | Transcript | Summarization |
| --- | --- | --- | --- |
| Local (default) | stays on disk | stays on disk | local Ollama |
| Cloud (opt-in) | stays on disk | sent to Claude API or Claude CLI | Anthropic |

  • Audio never leaves your machine, in either mode. Even agents see only the transcript text.
  • API keys are stored in the OS keychain (Keychain on macOS, Secret Service on Linux) via keyring, never in .env, never logged.
  • Audit log at ~/.whisperlog/audit.log records every cloud call: timestamp, backend, model, transcript SHA-256, first 80 characters, token counts, cost. The full transcript is never written.
  • Fully offline mode is real. With DEFAULT_ENRICH_BACKEND=ollama and --backend ollama, the anthropic SDK is never loaded, because it is imported lazily.
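
To make the audit-log bullet concrete, here is a minimal sketch of what one entry could look like. The field names, the JSON-lines format, and the function `audit_record` are assumptions for illustration (the README lists only which facts are recorded), and the 80-character preview is assumed to come from the transcript.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(backend: str, model: str, transcript: str,
                 input_tokens: int, output_tokens: int, cost_usd: float) -> str:
    """Build one JSON line describing a cloud call without the full transcript."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "backend": backend,
        "model": model,
        # The hash identifies the transcript without storing its content.
        "transcript_sha256": hashlib.sha256(transcript.encode("utf-8")).hexdigest(),
        "preview": transcript[:80],  # assumed: first 80 characters of the transcript
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost_usd, 6),
    }
    return json.dumps(record)
```

Storing a hash plus a short preview lets you match a log entry to a recording later without keeping a second copy of the text.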

Tech stack

  • Language: Python 3.11+ (requires-python = ">=3.11"), managed with uv.
  • Core deps: faster-whisper (transcription), typer + rich (CLI), pydantic / pydantic-settings (config), httpx, platformdirs.
  • Storage: SQLite with FTS5 full-text search at ~/.whisperlog/index.db; archive files under ~/Documents/whisperlog-archive/.
  • Optional extras: cloud (anthropic, keyring), agent (adds pyyaml), mcp (mcp), all, and dev (pytest, ruff, mypy).
  • Entry points: whisperlog (CLI) and whisperlog-mcp (stdio MCP server).

Prerequisites

  • Python 3.11+ and uv (or pip)
  • ffmpeg — for audio decoding (brew install ffmpeg / apt install ffmpeg)

Optional, depending on which features you want:

  • Ollama — local LLM for summarization. brew install ollama, then ollama serve and ollama pull qwen2.5:7b.
  • claude CLI — Claude Code CLI, for the claude-cli enrichment backend and in-repo agent proposals.
  • Anthropic API key — for the claude-api backend and most agent workflows.

Quick start (local mode)

git clone https://github.com/BaruchEric/whisperlog.git
cd whisperlog
uv pip install -e .

# Copy the env template and tweak as needed.
cp .env.example .env

# Plug in the UX570, then:
whisperlog ingest         # copy new recordings to the archive
whisperlog transcribe 'archive/**/audio.mp3'

# Or, in one shot:
whisperlog watch          # daemon: ingest+transcribe whenever the device is plugged in
whisperlog watch --enrich # also runs the default Ollama summarizer

# Any other audio source — a folder, USB drive, network share, etc.:
whisperlog ingest --source ~/Dropbox/voice-memos
whisperlog watch  --source ~/Dropbox/voice-memos --once

Search later:

whisperlog search "sarah AND project"
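
The query string is FTS5 syntax, so operators such as AND and OR work as they do in SQLite. The sketch below shows the idea with an in-memory database; the table name, columns, and `search` helper are hypothetical and do not reflect the project's real schema in db.py. It also assumes your Python's SQLite build includes FTS5, which is the common case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per recording, with the transcript text indexed.
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(recording_id UNINDEXED, body)")
conn.executemany(
    "INSERT INTO transcripts (recording_id, body) VALUES (?, ?)",
    [
        ("rec-1", "Sarah gave an update on the project timeline"),
        ("rec-2", "Sarah is on vacation next week"),
        ("rec-3", "The project budget was approved"),
    ],
)


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    """Return recording ids matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT recording_id FROM transcripts WHERE transcripts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


print(search(conn, "sarah AND project"))
```

The default FTS5 tokenizer is case-insensitive, so "sarah" matches "Sarah"; only the first row contains both terms.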

The archive lives at ~/Documents/whisperlog-archive/<YYYY>/<YYYY-MM-DD>/<HH-MM>_<sha8>/ and contains:

  • audio.mp3 — the original file (read-only)
  • transcript.txt — plain text
  • transcript.srt — subtitles
  • transcript.md — metadata + transcript + appended enrichments
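
The directory naming scheme can be illustrated with a short sketch. The helper `archive_dir` is hypothetical, and it assumes that <sha8> is the first eight hex characters of a SHA-256 over the audio bytes; the README does not say which hash is used.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def archive_dir(archive_root: Path, recorded_at: datetime, audio_bytes: bytes) -> Path:
    """Build <root>/<YYYY>/<YYYY-MM-DD>/<HH-MM>_<sha8> for one recording."""
    # Assumption: sha8 is a truncated SHA-256 of the file content.
    sha8 = hashlib.sha256(audio_bytes).hexdigest()[:8]
    return (
        archive_root
        / f"{recorded_at:%Y}"
        / f"{recorded_at:%Y-%m-%d}"
        / f"{recorded_at:%H-%M}_{sha8}"
    )
```

Including a content hash in the folder name keeps two recordings made in the same minute from colliding.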

Commands

Run whisperlog --help for the full list. Key subcommands:

| Command | What it does |
| --- | --- |
| ingest [--source DIR] [--once] | Copy + dedupe recordings off the device (or any source) into the archive. |
| transcribe '<glob>' | Run faster-whisper over matching audio, writing transcript.{txt,srt,md}. |
| watch [--source DIR] [--enrich] [--once] | Daemon that ingests + transcribes (and optionally enriches) on plug-in. |
| search '<FTS5 query>' | Full-text search across all transcripts. |
| enrich <path> [--backend ...] [--task ...] [--model ...] [--yes] | Run one enrichment pass and append to the transcript .md. |
| redact <path> [--ollama] [--out ...] | Strip PII via regex (+ optional local Ollama pass) before sending anything out. |
| stats [--days N] | Show cloud spend over a window. |
| config set-key anthropic / config show | Store the Anthropic key in the OS keychain / print effective config. |
| agent meeting-debrief <path> | Notes + action items + follow-up email + .ics, from one transcript. |
| agent code-review <path> [--repo ...] | Extract decisions/questions, optionally propose code changes via Claude CLI. |
| agent custom <path> --workflow file.yaml | Run a YAML-defined sequence of prompts. |
| version | Print the version. |

Adding Claude

Claude features are an extra, paid layer. Install the extras and store your key:

uv pip install -e '.[cloud]'
whisperlog config set-key anthropic     # prompts; key goes to OS keychain

Then pick a backend per call:

whisperlog enrich archive/2026/2026-04-27/14-30_abcd1234/ \
  --backend claude-api \
  --task meeting_notes

Cost expectations

Enrichment defaults to claude-sonnet-4-6 (cheaper/faster). Agent workflows default to claude-opus-4-7 (higher quality, ~5× the cost). The daily cap (MAX_DAILY_CLAUDE_USD, default $5.00) is enforced before any call is sent: hitting it raises an error, with no silent failover. Run whisperlog stats --days 30 to see your spend.

Redact before sending

Regex catches email/phone/SSN/credit-card patterns; a local Ollama pass also strips names and addresses (better recall, still imperfect):

whisperlog redact archive/.../transcript.txt --ollama --out cleaned.txt
whisperlog enrich cleaned.txt --backend claude-api --task summarize

Regex alone is not real data loss prevention (DLP). Run with --ollama for anything sensitive.
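
To show why the regex pass is only a first line of defense, here is a minimal sketch of pattern-based redaction. The patterns and the name `redact_with_regex` are illustrative assumptions, not the rules in the project's redact.py, and they cover US-style formats only.

```python
import re

# Order matters: more specific patterns run before the looser phone pattern.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PHONE": re.compile(r"(?<!\d)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"),
}


def redact_with_regex(text: str) -> str:
    """Replace each matched pattern with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Email jane.doe@example.com or call 555-123-4567. SSN 123-45-6789."
print(redact_with_regex(sample))
```

Patterns like these cannot recognize names or street addresses, which is the gap the optional Ollama pass is meant to narrow.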

Configuration

All settings load from .env (see .env.example) via pydantic-settings; CLI flags override per call. Notable keys:

| Key | Default | Purpose |
| --- | --- | --- |
| WHISPERLOG_ARCHIVE_DIR | ~/Documents/whisperlog-archive | Where the archive lives. |
| WHISPERLOG_STATE_DIR | ~/.whisperlog | DB, audit log, enrich log. |
| WHISPER_MODEL | small.en | Whisper model size (see guide below). |
| WHISPER_DEVICE / WHISPER_COMPUTE_TYPE | auto / int8 | float16 only on Nvidia GPUs. |
| DEFAULT_ENRICH_BACKEND | ollama | One of ollama, claude-api, claude-cli. |
| OLLAMA_HOST / OLLAMA_MODEL | http://localhost:11434 / qwen2.5:7b | Local LLM. |
| CLAUDE_MODEL / CLAUDE_AGENT_MODEL | claude-sonnet-4-6 / claude-opus-4-7 | Enrich vs. agent default models. |
| MAX_DAILY_CLAUDE_USD / COST_CONFIRM_USD | 5.00 / 0.10 | Daily cap and per-call confirmation threshold. |

The Anthropic key is never read from .env — it lives in the OS keychain.

Whisper model size guide

| Model | Disk | RAM | Speed (M-series) | Speed (CPU) | Quality |
| --- | --- | --- | --- | --- | --- |
| tiny.en | 75 MB | ~1 GB | ~30× real-time | ~10× | rough |
| base.en | 142 MB | ~1 GB | ~16× | ~5× | usable |
| small.en | 466 MB | ~2 GB | ~6× | ~2× | good default |
| medium.en | 1.5 GB | ~5 GB | ~2× | <1× | great on M-series |
| large-v3 | 2.9 GB | ~10 GB | ~1× | painful | best |
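
The speed columns are multiples of real time, so a rough transcription time is the audio length divided by the factor. The sketch below uses the approximate M-series figures from the guide above; the helper name is hypothetical and actual throughput will vary by machine.

```python
# Approximate real-time factors from the size guide (M-series column).
M_SERIES_SPEED = {
    "tiny.en": 30,
    "base.en": 16,
    "small.en": 6,
    "medium.en": 2,
    "large-v3": 1,
}


def estimated_minutes(audio_minutes: float, model: str = "small.en") -> float:
    """Estimate wall-clock minutes to transcribe audio of the given length."""
    return audio_minutes / M_SERIES_SPEED[model]


print(estimated_minutes(60))  # a one-hour recording with small.en: about 10 minutes
```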

Custom prompts & agent workflows

Prompts. Drop a file in prompts/<task>.md using {{transcript}} as the placeholder. The CLI auto-discovers it: whisperlog enrich /path/to/transcript.txt --task my-custom-task. Bundled prompts include summarize, meeting_notes, action_items, followup_email, calendar_ics, coding_session, and identify_speakers.

Agent workflows. A workflow is a YAML file with a steps list; each step has a name, an output filename, and either a template (referencing prompts/<template>.md) or an inline prompt:

steps:
  - name: tldr
    template: summarize
    output: tldr.md
  - name: facts-only
    output: facts.md
    prompt: |
      Extract only verifiable factual claims from the transcript below.
      Output a numbered list. Skip opinions and intentions.

      ---

      {{transcript}}

Run it: whisperlog agent custom archive/.../transcript.txt --workflow my_workflow.yaml.
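
Conceptually, each step resolves to a prompt with the {{transcript}} placeholder filled in. The sketch below illustrates that resolution on an already-parsed workflow (the dict shape that a YAML loader would return for the file above). The function names are hypothetical, and this is not the project's agent.py.

```python
from pathlib import Path


def render_step(step: dict, transcript: str, prompts_dir: Path) -> str:
    """Resolve a step to its final prompt text."""
    if "template" in step:
        # Template steps read prompts/<template>.md.
        prompt = (prompts_dir / f"{step['template']}.md").read_text(encoding="utf-8")
    else:
        # Inline steps carry the prompt directly.
        prompt = step["prompt"]
    return prompt.replace("{{transcript}}", transcript)


def plan_workflow(workflow: dict, transcript: str, prompts_dir: Path) -> list[tuple[str, str]]:
    """Return (output filename, rendered prompt) pairs in step order."""
    return [
        (step["output"], render_step(step, transcript, prompts_dir))
        for step in workflow["steps"]
    ]
```

Keeping rendering separate from the model call makes it easy to inspect exactly what text would be sent before spending anything.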

MCP server

Expose your transcript archive to Claude Desktop or Claude Code:

uv pip install -e '.[mcp]'

Add this to your MCP config:

{
  "mcpServers": {
    "whisperlog": { "command": "whisperlog-mcp" }
  }
}
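
If your MCP config already lists other servers, the whisperlog entry should be merged in rather than pasted over the file. A minimal sketch of that merge follows; the helper name is hypothetical, and the location of the config file depends on your client, so reading and writing the file is left to the caller.

```python
import json


def add_whisperlog_server(config_text: str) -> str:
    """Return config JSON with the whisperlog server added, keeping other entries."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["whisperlog"] = {"command": "whisperlog-mcp"}
    return json.dumps(config, indent=2)
```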

Tools the server provides:

  • search_transcripts(query, limit) — FTS5 search
  • get_transcript(recording_id) — full text
  • list_recent(limit) — recent recordings
  • list_enrichments(recording_id) — past summaries/agent outputs

Now you can ask Claude "What did I talk about with Sarah last week?" and it queries this archive directly.

Architecture & data model

src/whisperlog/ holds the package:

  • ingest.py — copy/dedupe recordings into the archive (safe_copy, source detection).
  • transcribe.py — faster-whisper runner producing .txt/.srt/.md.
  • archive.py / db.py — SQLite FTS5 index, recording records, enrichment records, search.
  • enrich/ — pluggable backends (base.py, ollama.py, claude_api.py, claude_cli.py) + prompt loader and audit logging.
  • agent.py — multi-step workflows (meeting_debrief, code_review, custom_workflow).
  • mcp_server.py — stdio MCP server (whisperlog-mcp).
  • config.py, secrets.py, ledger.py, redact.py, watch.py, utils.py, cli.py — settings, keychain, spend cap, PII redaction, the watch daemon, helpers, and the Typer entry point.

State lives in three places: the archive (~/Documents/whisperlog-archive/), ~/.whisperlog/ (DB, audit log, enrich log), and OS keychain entries under whisperlog.

Troubleshooting

  • Device not detected. On macOS, look for /Volumes/IC RECORDER. diskutil list shows whether the OS sees the recorder.
  • Ollama call fails. Confirm ollama serve is running and ollama list shows the OLLAMA_MODEL. There's no silent fallback — intentional.
  • "No Anthropic API key in OS keychain." Run whisperlog config set-key anthropic; re-run to rotate.
  • Cost-cap error mid-day. Raise MAX_DAILY_CLAUDE_USD or wait — the cap is per local calendar day.

Development

uv pip install -e '.[dev,all]'
ruff check .
pytest

Tests live in tests/ (redaction, ingest dedup, transcribe outputs, enrich backends, spend cap, archive search, SRT utils) and use a tmp-dir-isolated SQLite DB — they never touch your real archive or keychain.

Status

  • The core pipeline (ingest → transcribe → search), local Ollama enrichment, Claude API/CLI enrichment, agent workflows, redaction, spend cap, and the MCP server are all implemented and covered by the test suite.
  • src/ux570_transcribe/ is a stale artifact from the project rename (commit d5071ea, ux570-transcribe → whisperlog). It contains only __pycache__, .DS_Store, and an empty enrich/ subfolder — no live code. The wheel build target is src/whisperlog, so it does not ship. Safe to delete.

License

MIT © Eric Baruch
