Record your day, let AI organize it.
An open-source desktop app that transcribes, summarizes, and auto-organizes your recordings into structured notes — then lets you search across everything with natural language. Runs entirely on your machine. No cloud. No subscription.
You record a lecture, a meeting, a casual conversation — and it all just sits there as audio you'll never revisit.
VoiceVault changes that. It transcribes everything with on-device Whisper, then uses AI to:
- Generate concise summaries every minute
- Classify each recording (lecture, meeting, conversation, memo)
- Let you search across all past recordings with natural-language questions
- Export everything as clean, organized Markdown notes for Obsidian
All of this runs locally on your machine — no cloud, no API keys required (unless you opt in).
See your words appear as text while you speak. VoiceVault uses whisper-cli (on-device, via Bun.spawn subprocess) — no internet required.
Every minute, an AI summary of what was said appears automatically. Long recordings become clean timelines of key points instead of hours of raw audio.
| What you recorded | What VoiceVault creates |
|---|---|
| A university lecture | A structured lecture note |
| A team meeting | A meeting summary with action items |
| Coffee with a friend | A conversation log |
| Thinking out loud | A personal memo |
Classification is fully offline using local LLM via llama-cli. Custom templates are JSON files in templates/.
Ask a question in plain English (or any language):
"What did the professor say about transformer architecture last week?"
VoiceVault searches across all your recordings and gives you a grounded answer with exact sources and timestamps.
Export any recording as an Obsidian-compatible Markdown file with YAML frontmatter, auto-generated [[wikilinks]] to related recordings, and a clean timeline — ready for your vault.
- 100% offline — Whisper and LLM run locally; no data leaves your machine by default
- No accounts, no sign-ups
- Open source — inspect every line of code
| | VoiceVault | Clova Note | Otter.ai | Built-in Voice Memo |
|---|---|---|---|---|
| Price | Free | Paid | Paid | Free |
| Works offline | ✅ | ✗ | ✗ | ✅ |
| Auto-summarize | ✅ | Partial | ✅ | ✗ |
| Auto-classify | ✅ | ✗ | ✗ | ✗ |
| Search past recordings | Natural language (RAG) | Text only | Text only | ✗ |
| Custom templates | ✅ | ✗ | ✗ | ✗ |
| Obsidian / PKM export | ✅ | ✗ | ✗ | ✗ |
| Privacy | Local-only | Cloud | Cloud | Local |
| Open source | ✅ | ✗ | ✗ | ✗ |
Download the latest release from GitHub Releases.
- Download `stable-macos-arm64-VoiceVault.dmg`
- Open the DMG and drag VoiceVault to Applications
- On first launch, macOS Gatekeeper will show a warning (the app is unsigned). Right-click the app → Open to bypass, or run:

  ```bash
  xattr -cr /Applications/VoiceVault.app
  ```

- Grant microphone permission when prompted
Requirements: macOS 14+ (Sonoma), Apple Silicon (M1/M2/M3/M4)
- Download `stable-linux-x64-VoiceVault-Setup.tar.gz`
- Extract and run the installer:

  ```bash
  tar -xzf stable-linux-x64-VoiceVault-Setup.tar.gz
  ./installer
  ```

  The installer places VoiceVault in `~/.local/share/VoiceVault/` and creates a desktop shortcut.
- Install system dependencies (if not already present):

  ```bash
  sudo apt install libwebkit2gtk-4.1-dev   # Ubuntu/Debian
  ```
Requirements: Linux x64, glibc 2.35+ (Ubuntu 22.04+), GTK 4 + WebKitGTK
On first launch, VoiceVault will prompt you to download the Whisper speech-to-text model (~75 MB). This is the only time an internet connection is required.
For local LLM summarization, download a GGUF model and place it in ~/.voicevault/models/:
```bash
# Example: Whisper base model (manual download)
mkdir -p ~/.voicevault/models
wget -O ~/.voicevault/models/ggml-base.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```

Cloud LLM providers (Claude, OpenAI, Google Gemini) are also supported — add your API key in Settings to enable them. No API key is required for fully offline use.
VoiceVault is a standalone Electrobun desktop app — a single binary that ships Bun (runtime) + Zig (launcher) + the system WebView. No Electron. No Python. No Docker.
```
VoiceVault/
├── src/
│ ├── main/ # Electrobun main process (Bun Worker)
│ │ ├── main.ts # Entry — DB init, RPC server, BrowserWindow
│ │ ├── http-rpc.ts # HTTP RPC server (port 50100)
│ │ ├── rpc/ # Domain handlers: audio, whisper, LLM, export…
│ │ ├── services/
│ │ │ ├── db.ts # bun:sqlite WAL database
│ │ │ ├── settings.ts # Settings (bun:sqlite-backed)
│ │ │ ├── registry.ts # ServiceRegistry singleton
│ │ │ └── subprocess/
│ │ │ ├── WhisperSubprocess.ts # Bun.spawn whisper-cli
│ │ │ └── LlmSubprocess.ts # Bun.spawn llama-cli
│ │ └── utils/
│ │ ├── subprocess.ts # resolveBinary / resolveModel / downloadFile
│ │ └── validate.ts # assertFiniteId / assertNonEmptyString / …
│ ├── renderer/ # React 19 + Vite (port 5173)
│ │ └── src/
│ │ ├── lib/
│ │ │ └── electrobun-bridge.ts # Routes window.api.* → HTTP RPC
│ │ ├── components/ # UI (shadcn/ui + Tailwind CSS v4)
│ │ └── pages/ # Route-level pages
│ └── shared/ # Types + IPC channel constants
│
├── plugin/ # Obsidian community plugin (TypeScript + esbuild)
├── scripts/ # dev-electrobun.sh, test-whisper.sh
├── templates/ # Classification template JSON files
├── tests/
│ ├── unit/ # Vitest (renderer components, i18n, format utils)
│ └── e2e/ # Playwright (app-launch smoke test)
└── electrobun.config.ts
```
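For a concrete picture of how the renderer talks to the main process, here is a minimal sketch of the bridge idea behind `electrobun-bridge.ts` and `http-rpc.ts`: the renderer wraps each call into a `{ channel, params }` POST to the local RPC server on port 50100. The channel name comes from the data flow below; the helper name, error handling, and response shape are assumptions, not the app's actual API.

```typescript
// Hypothetical sketch: the real routing lives in src/renderer/src/lib/electrobun-bridge.ts
// and src/main/http-rpc.ts; only the { channel, params } envelope and port come from this README.
const RPC_URL = "http://127.0.0.1:50100/rpc";

async function callRpc<T>(channel: string, params: Record<string, unknown>): Promise<T> {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ channel, params }),
  });
  if (!res.ok) throw new Error(`RPC ${channel} failed with HTTP ${res.status}`);
  return (await res.json()) as T;
}

// Usage: ask the main process to transcribe a saved audio file (path is illustrative).
const segments = await callRpc("whisper:transcribe-file", { filePath: "/tmp/recording.wav" });
```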
```
Microphone (browser MediaRecorder)
│ audio blob
▼
HTTP RPC POST /rpc { channel: "whisper:transcribe-file", params: { filePath } }
│
▼
WhisperSubprocess → Bun.spawn whisper-cli
│ transcript segments
▼
LlmSubprocess → Bun.spawn llama-cli (or Claude / OpenAI API)
│ summary / classification
▼
bun:sqlite (~/.voicevault/voicevault.db)
│
▼
Obsidian Export → Markdown + YAML frontmatter
```
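The transcription hop in this flow is essentially one `Bun.spawn` of `whisper-cli`. A minimal sketch under stated assumptions (the flags shown are common whisper.cpp options and the output handling is simplified; the actual logic lives in `WhisperSubprocess.ts`):

```typescript
// Sketch only: flag choices and output parsing are assumptions, not the app's implementation.
import { homedir } from "node:os";
import { join } from "node:path";

async function transcribeFile(filePath: string): Promise<string> {
  const model = join(homedir(), ".voicevault", "models", "ggml-base.bin");
  const proc = Bun.spawn(["whisper-cli", "-m", model, "-f", filePath, "--no-prints"], {
    stdout: "pipe",
    stderr: "pipe",
  });
  const [stdout, exitCode] = await Promise.all([new Response(proc.stdout).text(), proc.exited]);
  if (exitCode !== 0) throw new Error(`whisper-cli exited with code ${exitCode}`);
  // stdout holds timestamped segments; the app parses these into transcript rows.
  return stdout;
}
```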
- Linux x64 or macOS (Windows: untested)
- A working microphone
- ~2 GB free disk space (AI models)
- Bun (`~/.bun/bin/bun`)
- pnpm (`npm install -g pnpm`)
- Linuxbrew (Linux) or Homebrew (macOS) — for `whisper-cli` and `llama-cli`
```bash
git clone https://github.com/PJH720/VoiceVault.git
cd VoiceVault
pnpm install
```

```bash
# Whisper (speech-to-text)
brew install whisper-cpp

# llama.cpp (local LLM — for summarization and classification)
brew install llama.cpp
```

Both install as `whisper-cli` and `llama-cli` in your Linuxbrew/Homebrew bin directory.

```bash
# Whisper model (~75 MB)
mkdir -p ~/.voicevault/models
wget -O ~/.voicevault/models/ggml-tiny.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin

# LLM model (~2 GB — Gemma 3 or similar GGUF)
# Download from HuggingFace and place in ~/.voicevault/models/
```

```bash
cp .env.example .env
# Edit .env — no changes are needed for fully offline use.
# Add API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY) to enable cloud LLM providers.
```

```bash
pnpm dev
# Starts: Vite renderer (port 5173) + Electrobun launcher
```

Click the record button — your words appear as text in real time. Every minute, an AI summary is generated automatically.
Stop recording: VoiceVault classifies the content and presents organized summaries. Browse the timeline and see how your session was categorized.
Go to RAG Search and ask anything:
You: "When is the project deadline?"
VoiceVault: "Based on your recording from Feb 8 (conversation with Sarah), the project deadline is next Friday, February 14th. [Source: rec-2026-02-08, 00:12:30]"
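Under the hood this is a retrieve-then-answer loop over the local SQLite database: pull matching transcript segments with their recording IDs and timestamps, then hand them to the LLM as grounding. A naive sketch of the idea (table and column names are hypothetical, and the real retrieval is likely smarter than a `LIKE` match):

```typescript
// Sketch only: schema and prompt wording are illustrative, not the app's actual code.
import { Database } from "bun:sqlite";
import { homedir } from "node:os";
import { join } from "node:path";

const db = new Database(join(homedir(), ".voicevault", "voicevault.db"));

interface Segment { recording_id: string; start_time: string; text: string }

function retrieve(query: string, limit = 5): Segment[] {
  // Keyword retrieval for illustration; a real pipeline would rank by semantic relevance.
  return db
    .query("SELECT recording_id, start_time, text FROM segments WHERE text LIKE $q LIMIT $limit")
    .all({ $q: `%${query}%`, $limit: limit }) as Segment[];
}

function buildPrompt(question: string): string {
  const context = retrieve(question)
    .map((s) => `[${s.recording_id} ${s.start_time}] ${s.text}`)
    .join("\n");
  // The prompt (with sources inline) goes to llama-cli or a cloud provider,
  // which answers only from the excerpts and cites them, as in the example above.
  return `Answer using only the excerpts below and cite their sources.\n\n${context}\n\nQ: ${question}`;
}
```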
Select any recording and export it as an Obsidian Markdown file — metadata, tags, and cross-links included.
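The exported file is plain Markdown: YAML frontmatter on top, the summary timeline below, and `[[wikilinks]]` to related recordings. A rough sketch of how such a file could be assembled (the frontmatter field names are assumptions; check the real export output for the exact schema):

```typescript
// Sketch only: field names are illustrative, not the app's actual frontmatter schema.
interface RecordingExport {
  title: string;
  date: string;       // ISO date, e.g. "2026-02-08"
  type: string;       // lecture | meeting | conversation | memo | ...
  tags: string[];
  related: string[];  // titles of related recordings, rendered as [[wikilinks]]
  summary: string;    // the generated timeline / summary body
}

function toObsidianMarkdown(r: RecordingExport): string {
  const frontmatter = [
    "---",
    `title: "${r.title}"`,
    `date: ${r.date}`,
    `type: ${r.type}`,
    `tags: [${r.tags.join(", ")}]`,
    "---",
  ].join("\n");
  const links = r.related.map((t) => `[[${t}]]`).join(" ");
  return `${frontmatter}\n\n${r.summary}\n\nRelated: ${links}\n`;
}
```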
Fully offline. No API key needed. Uses `llama-cli` via `Bun.spawn`.

```bash
# Download a GGUF model (e.g. Gemma 3)
# Place in ~/.voicevault/models/ and set LLM_MODEL in .env
```

For higher-quality summaries, opt into a cloud provider by adding keys to `.env`:

```
LLM_PROVIDER=claude
ANTHROPIC_API_KEY=your-key-here
```
Get a Claude API key at console.anthropic.com.
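One plausible way the provider switch could work, shown only as a sketch (the provider names, fallback rule, and env handling here are assumptions; the app's real logic lives in its settings and LLM services):

```typescript
// Sketch only: values and fallback behaviour are assumptions, not the app's actual logic.
type Provider = "local" | "claude" | "openai";

function apiKeyFor(p: Provider): string | undefined {
  if (p === "claude") return process.env.ANTHROPIC_API_KEY;
  if (p === "openai") return process.env.OPENAI_API_KEY;
  return undefined;
}

function pickProvider(): Provider {
  const p = (process.env.LLM_PROVIDER ?? "local") as Provider;
  // If a cloud provider is requested without a key, stay on the offline llama-cli path.
  if (p !== "local" && !apiKeyFor(p)) return "local";
  return p;
}
```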
VoiceVault ships with seven built-in classification templates:
- Lecture — key concepts and definitions
- Meeting — agenda items, decisions, action items
- Conversation — participants, topics, memorable moments
- Memo — personal thoughts and ideas
- Person — contact notes
- English Vocabulary — vocabulary study entries
- Incident — incident report documentation
Add your own by dropping a JSON file into templates/. See the existing files for the format.
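As a starting point before reading the shipped files, a custom template might look roughly like this, written here as a TypeScript object for brevity (the field names are guesses; mirror whatever the JSON files in `templates/` actually use):

```typescript
// Hypothetical template shape: field names are illustrative, not the real schema.
const bookClubTemplate = {
  id: "book-club",
  name: "Book Club",
  description: "Discussion of a book chapter with friends",
  // Sections the LLM is asked to fill when a recording matches this template.
  fields: [
    { key: "book", label: "Book & chapter" },
    { key: "highlights", label: "Memorable passages" },
    { key: "next_meeting", label: "Next meeting" },
  ],
};

// Saved as templates/book-club.json, e.g.:
// await Bun.write("templates/book-club.json", JSON.stringify(bookClubTemplate, null, 2));
```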
| Question | Answer |
|---|---|
| Where is data stored? | ~/.voicevault/ on your machine |
| Does anything go to the cloud? | Only if you opt into Claude / OpenAI API |
| Can I delete my data? | Yes — delete ~/.voicevault/ |
| What format are exports? | Standard Markdown (.md) |
No transcription appearing
- Check that your browser has microphone permission
- Verify `whisper-cli` is installed: `which whisper-cli`
- Run the smoke test: `pnpm test:whisper`
LLM summaries not working
- Verify `llama-cli` is installed: `which llama-cli`
- Check that your GGUF model path is correct in `.env`
App window doesn't open
- Verify GTK WebKit is installed (Linux): `apt install libwebkit2gtk-4.1-dev`
- Check `scripts/dev-electrobun.sh` for the build artifact path
For more, see wiki/FAQ-&-Troubleshooting.md or open an issue.
```bash
pnpm dev              # Vite renderer (5173) + Electrobun launcher
pnpm build            # vite build + bun build → out/
pnpm test             # Vitest unit tests (tests/unit/)
pnpm test:watch       # Vitest watch mode
pnpm test:e2e         # Playwright (tests/e2e/app-launch.test.ts)
pnpm test:whisper     # Whisper HTTP RPC smoke test
pnpm lint             # ESLint
pnpm typecheck        # tsc (renderer, tsconfig.web.json)
pnpm typecheck:bun    # tsc (main process, tsconfig.node.json)
pnpm package:linux    # pnpm build + electrobun build --env=stable
pnpm package:mac      # pnpm build + electrobun build --env=stable
```

Stack:
- Runtime: Electrobun 1.15 (Bun + Zig + system WebView)
- UI: React 19 · Vite 7 · Tailwind CSS v4 · shadcn/ui
- Main process: Bun Worker · HTTP RPC (port 50100) · `bun:sqlite` WAL (see the sketch below)
- Speech-to-Text: `whisper-cli` via `Bun.spawn`
- LLM: `llama-cli` via `Bun.spawn` (local GGUF) or Claude / OpenAI API
- Testing: Vitest · Playwright
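A minimal sketch of the `bun:sqlite` WAL setup referenced above, using the database path from the FAQ (the actual schema and pragmas live in `src/main/services/db.ts`):

```typescript
import { Database } from "bun:sqlite";
import { homedir } from "node:os";
import { join } from "node:path";

const db = new Database(join(homedir(), ".voicevault", "voicevault.db"), { create: true });
// WAL lets the recorder keep writing while summarization and search read concurrently.
db.exec("PRAGMA journal_mode = WAL;");
```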
See CLAUDE.md for contributor guidance and architectural decisions.
- Real-time transcription (Whisper)
- 1-minute auto-summarization
- Zero-shot classification with templates
- RAG search across recordings
- Obsidian Markdown export
- Hourly hierarchical summaries
- Cross-boundary time range extraction
- Electrobun desktop migration (v0.7.0 — Electron fully removed)
- Obsidian community plugin (embedded UI + RAG search)
- Speaker diarization (who said what)
- Mobile companion app
MIT License — free for personal and commercial use. See LICENSE.
VoiceVault — Record your day, let AI organize it.
Built with care for Sogang University Runnerthon 2026.