Meeting transcription with speaker diarization using remote speaches API servers. Offloads VAD, speaker embeddings, and transcription to GPU servers — the client stays lightweight with no ML dependencies.
- Remote processing: VAD, speaker embeddings, and transcription via speaches API (OpenAI-compatible)
- Web UI: Browser-based interface with step-by-step workflow
- User authentication: Login/password auth with team-scoped access
- Multi-track processing: Handle video files with multiple audio tracks or individual audio files
- Speaker enrollment: Register speakers with voice samples for automatic identification
- Speaker diarization: Automatically separate and identify speakers without enrollment
- Multi-team support: Separate speaker databases and sessions per team
- Parallel transcription: Distribute chunks across multiple servers
- Flexible input: Video files, audio files, directories, or glob patterns
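The parallel transcription feature above can be pictured as a round-robin assignment of audio chunks to servers. This is a minimal sketch under assumptions, not MeetScribe's actual scheduler: the function names `assign_round_robin` and `transcribe_all`, the server URLs, and the `transcribe_chunk` callback are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Callable, Sequence


def assign_round_robin(
    chunks: Sequence[str], servers: Sequence[str]
) -> list[tuple[str, str]]:
    """Pair each chunk with a server, cycling through the server list."""
    if not servers:
        raise ValueError("at least one server is required")
    return list(zip(chunks, cycle(servers)))


def transcribe_all(
    chunks: Sequence[str],
    servers: Sequence[str],
    transcribe_chunk: Callable[[str, str], str],
) -> list[str]:
    """Run transcribe_chunk(chunk, server) concurrently, one worker per server.

    transcribe_chunk is a hypothetical callback standing in for the real
    request to a speaches server.
    """
    assignments = assign_round_robin(chunks, servers)
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # map() returns results in input order, so the transcript stays in sequence.
        return list(pool.map(lambda pair: transcribe_chunk(*pair), assignments))
```

Round-robin is only one possible policy; a real scheduler might instead hand the next chunk to whichever server becomes free first.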
- Python 3.12+
- FFmpeg (for audio extraction and conversion)
- One or more speaches API servers
FFmpeg is required for processing audio and video files.
Windows:
winget install "FFmpeg (Shared)"

macOS:
brew install ffmpeg

Linux (Debian/Ubuntu):
sudo apt install ffmpeg

After installation, restart your terminal to update PATH.
uv venv
uv pip install -e ".[web]"

The [web] extra installs FastAPI, Uvicorn, Jinja2, and other web dependencies. Omit it for CLI-only usage.
MeetScribe uses two configuration files:
| File | Purpose |
|---|---|
| .env | Data directory path and environment settings |
| data/config.yaml | Servers, pipeline parameters, web UI settings |
# 1. Set up environment
cp .env.example .env
# 2. Set up config
cp config.example.yaml data/config.yaml
# Edit data/config.yaml — set your server URL
# 3. Initialize database and create admin user
meetscribe team create default
meetscribe user create admin --team default --admin

The .env file controls where MeetScribe stores its data. See .env.example.
| Variable | Description | Default |
|---|---|---|
| MEETSCRIBE_DATA_DIR | Root directory for DB, logs, sessions, samples | Platform-specific (see below) |
| MEETSCRIBE_TMP_DIR | Temp files directory | DATA_DIR/tmp |
| MEETSCRIBE_MAX_UPLOAD_SIZE | Max upload size in bytes | 4294967296 (4 GiB) |
Default data directory without MEETSCRIBE_DATA_DIR:
- Windows: %LOCALAPPDATA%/meetscribe
- macOS: ~/Library/Application Support/meetscribe
- Linux: ~/.local/share/meetscribe
Setting MEETSCRIBE_DATA_DIR=./data keeps everything in the project directory — convenient for development and debugging.
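The lookup rules described above (environment variable first, then a platform-specific default, with the temp directory and upload limit falling back to their documented defaults) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the Windows fallback used when LOCALAPPDATA is unset is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import Mapping

DEFAULT_MAX_UPLOAD_SIZE = 4 * 1024**3  # 4294967296 bytes


def default_data_dir(platform: str, env: Mapping[str, str], home: Path) -> Path:
    """Return the documented per-platform default data directory."""
    if platform == "win32":
        base = env.get("LOCALAPPDATA")
        # Assumption: fall back to <home>/AppData/Local if LOCALAPPDATA is unset.
        root = Path(base) if base else home / "AppData" / "Local"
        return root / "meetscribe"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "meetscribe"
    return home / ".local" / "share" / "meetscribe"


def resolve_settings(env: Mapping[str, str], platform: str, home: Path) -> dict:
    """Apply the environment variables from the table, then the defaults."""
    data_dir = (
        Path(env["MEETSCRIBE_DATA_DIR"])
        if env.get("MEETSCRIBE_DATA_DIR")
        else default_data_dir(platform, env, home)
    )
    tmp_dir = (
        Path(env["MEETSCRIBE_TMP_DIR"])
        if env.get("MEETSCRIBE_TMP_DIR")
        else data_dir / "tmp"
    )
    max_upload = int(env.get("MEETSCRIBE_MAX_UPLOAD_SIZE", DEFAULT_MAX_UPLOAD_SIZE))
    return {"data_dir": data_dir, "tmp_dir": tmp_dir, "max_upload_size": max_upload}
```

In a real process the arguments would typically be `os.environ`, `sys.platform`, and `Path.home()`; passing them in explicitly keeps the sketch easy to test.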
config.yaml is located at MEETSCRIBE_DATA_DIR/config.yaml (by default ./data/config.yaml) and holds all application settings in one file. See config.example.yaml for a fully documented example.
Sections:
- servers — List of speaches API servers (URL + name)
- vad — Voice Activity Detection: server, timeout, silence/speech thresholds
- embeddings — Speaker embeddings: server, model, identification thresholds, AHC clustering parameters
- transcription — Speech-to-text: servers, model, language, timeout, segment merging
- web — Web UI: host, port, session TTL
Start the web interface:
meetscribe web
meetscribe web --host 0.0.0.0 --port 8080

Host and port can also be set in config.yaml under the web section. CLI arguments take priority.
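The precedence rule above (CLI arguments over the web section of config.yaml) might be implemented along these lines. This is a sketch, not the project's code: `resolve_web_bind` and the fallback host and port are hypothetical, since the README does not state the built-in defaults.

```python
from typing import Optional

# Hypothetical fallbacks; the real defaults are not stated in this README.
FALLBACK_HOST = "127.0.0.1"
FALLBACK_PORT = 8000


def resolve_web_bind(
    cli_host: Optional[str], cli_port: Optional[int], config: dict
) -> tuple[str, int]:
    """Pick host and port: CLI argument first, then config.yaml, then fallback."""
    web = config.get("web") or {}
    host = cli_host if cli_host is not None else web.get("host", FALLBACK_HOST)
    port = cli_port if cli_port is not None else web.get("port", FALLBACK_PORT)
    return host, int(port)
```

Checking for `None` rather than truthiness keeps an explicitly passed value from being silently replaced by the config value.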
Create an admin user via CLI before using the web UI:
meetscribe user create admin --team default --admin

The admin can then register other users through the web UI at /register.
The web UI guides you through a 6-step process:
- Upload — Upload video or audio files
- Configure — Assign speakers to tracks or enable auto-diarization
- Extract — Extract speaker samples via VAD + embeddings
- Samples — Review and organize extracted speaker samples
- Enroll — Register speakers from samples
- Transcribe — Generate transcript with speaker attribution
- Each user belongs to a team
- Sessions are visible only to users in the same team
- Only admin users can register new users (in their own team)
- Authentication uses HttpOnly cookies (works with SSE streaming)
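The team-scoping rules listed above reduce to two small predicates. The sketch below is illustrative only; the `User` and `Session` classes and both function names are hypothetical and do not reflect MeetScribe's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    team: str
    is_admin: bool = False


@dataclass(frozen=True)
class Session:
    id: str
    team: str


def can_view_session(user: User, session: Session) -> bool:
    """Sessions are visible only to users in the same team."""
    return user.team == session.team


def can_register_user(actor: User, new_user_team: str) -> bool:
    """Only admins may register users, and only within their own team."""
    return actor.is_admin and actor.team == new_user_team
```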
MeetScribe supports multiple teams, each with its own set of enrolled speakers, voice samples, and sessions. This enables separate speaker databases for different projects, clients, or departments.
All commands accept the -t/--team flag to specify the team (defaults to default):
meetscribe -t sales enroll "John Doe" ./samples/john/
meetscribe -t sales transcribe meeting.mp4 -o output.md
meetscribe -t sales list-speakers

meetscribe team create sales
meetscribe team list
meetscribe team delete sales

Team data is stored in teams/<name>/samples/ under the data directory. Voiceprints are stored in a shared SQLite database (meetscribe.db), scoped per team.
# Create an admin user
meetscribe user create admin --team default --admin
# Create a regular user
meetscribe user create john --team sales
# List all users
meetscribe user list
# Delete a user
meetscribe user delete john

Transcribe a meeting with speaker diarization:
meetscribe transcribe meeting.mp4 -o output.md --track1 "Host"
meetscribe transcribe path/to/tracks/ -o output.md --track1 "Host"
meetscribe transcribe track1.wav track2.wav -o output.md --track1 "Host"

Tracks without a --trackN assignment are diarized automatically.
| Option | Description | Default |
|---|---|---|
| -t, --team | Team to use for speaker identification | default |
| -o, --output | Output file or directory | required |
| -l, --language | Language code (overrides config.yaml) | from config |
| --trackN | Assign speaker name to track N | diarize |
Register known speakers for automatic identification:
meetscribe enroll "John Doe" ./samples/john/
meetscribe -t my-team enroll "John Doe" ./samples/john/

Extract audio tracks from a video file:
meetscribe extract meeting.mp4 -o output_dir/

Extract audio samples from unknown speakers for later enrollment:
meetscribe extract-samples meeting.mp4

Show enrolled speakers:
meetscribe list-speakers
meetscribe -t my-team list-speakers

Start the web UI server:
meetscribe web
meetscribe web --host 0.0.0.0 --port 8080

Display data directories, configuration, and settings:
meetscribe info

MeetScribe supports multiple input formats:
- Video files (.mp4, .mkv, .avi, .mov, .webm): audio tracks are extracted automatically
- Audio files (.wav, .mp3, .flac, .ogg, .m4a): used directly as tracks
- Directories: all audio files in the directory are used as tracks
- Glob patterns: matched audio files are used as tracks
For video files with multiple audio tracks (e.g., track 1 = host, track 2 = guests), use --trackN to assign speaker names.
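The input rules above can be sketched as a small resolver that sorts command-line inputs into video files and audio tracks. This is an assumption-laden illustration: `expand_inputs` is a hypothetical name, and details such as sorting and case-insensitive extension matching are choices made here, not documented behavior.

```python
import glob
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def expand_inputs(inputs: list[str]) -> tuple[list[Path], list[Path]]:
    """Return (videos, audio_tracks) for a list of paths, directories, or globs."""
    videos: list[Path] = []
    audio: list[Path] = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Directories: every audio file inside becomes a track.
            audio += sorted(p for p in path.iterdir() if p.suffix.lower() in AUDIO_EXTS)
        elif any(ch in item for ch in "*?["):
            # Glob patterns: keep only the matched audio files.
            matches = (Path(m) for m in glob.glob(item))
            audio += sorted(p for p in matches if p.suffix.lower() in AUDIO_EXTS)
        elif path.suffix.lower() in VIDEO_EXTS:
            videos.append(path)
        elif path.suffix.lower() in AUDIO_EXTS:
            audio.append(path)
        else:
            raise ValueError(f"unsupported input: {item}")
    return videos, audio
```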
uv venv
uv pip install -e ".[dev,web]"
uv run pytest
uv run ruff check src/
uv run ruff format src/
uv run mypy src/

Unit, functional, and integration tests cover the pipeline, database, config, and web services:
uv run pytest # all tests
uv run pytest --cov # with coverage
uv run pytest tests/test_models.py # single file

GitHub Actions runs ruff, mypy, pytest, and bandit on every push and PR, on Python 3.12 and 3.13.
MIT