# slower-whisper

Know who said what, when, and how they said it — with streaming, topic structure, and stable contracts. Local-first, from any audio file, on your machine.

Transcription gives you words. slower-whisper gives you the whole conversation: speakers, timing, tone, emotion, and topic structure — as stable, versioned JSON you can build on. Local-first. No cloud, no API keys, no per-minute billing.


```python
from transcription import TranscriptionConfig, transcribe_file

cfg = TranscriptionConfig(model="base", device="auto", language="en")
transcript = transcribe_file("meeting.wav", root=".", config=cfg)

print(transcript.full_text)
print(transcript.segments[0].speaker)       # "spk_0"
print(transcript.segments[0].audio_state)   # prosody, emotion, timing
```
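Segments like the ones above can be rendered into a speaker-attributed transcript. A minimal sketch, assuming each segment exposes `speaker`, `start`, `end` (seconds), and `text`; a stand-in `Segment` dataclass is used here so the snippet runs without any audio or the library installed:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Stand-in for a transcript segment; field names assumed from
    # the snippet above (speaker label, start/end seconds, text).
    speaker: str
    start: float
    end: float
    text: str

def format_transcript(segments):
    """Render segments as 'speaker [start-end]: text' lines."""
    return "\n".join(
        f"{s.speaker} [{s.start:.1f}-{s.end:.1f}s]: {s.text}"
        for s in segments
    )

segments = [
    Segment("spk_0", 0.0, 2.4, "Morning, everyone."),
    Segment("spk_1", 2.6, 5.1, "Morning. Shall we start?"),
]
print(format_transcript(segments))
```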

## Install

```bash
uv add slower-whisper
# or: pip install slower-whisper
```

Add what you need:

| Extra | What it adds |
| --- | --- |
| `enrich-basic` | DSP stack (numpy, librosa, soundfile) |
| `enrich-prosody` | Praat-based prosody (praat-parselmouth) |
| `emotion` | Emotion models (torch, torchaudio, transformers) |
| `diarization` | Speaker diarization (pyannote.audio) |
| `full` | Everything above |
| `api` | FastAPI service runtime |
| `integrations` | LangChain + LlamaIndex adapters |

```bash
uv sync                              # transcription only
uv sync --extra full                 # all enrichment
uv sync --extra full --extra dev     # contributor toolchain
```

## Package Map

| Package | What it is |
| --- | --- |
| `transcription` | Core Python API — batch, file, streaming, enrichment, post-processing |
| `slower_whisper` | faster-whisper drop-in replacement (`WhisperModel`, `Segment`, `Word`) |
| `slower-whisper` | CLI — transcribe, enrich, benchmark, export, validate |

## faster-whisper Migration

Change one import:

```python
# Before
from faster_whisper import WhisperModel

# After
from slower_whisper import WhisperModel

model = WhisperModel("base", device="auto")
segments, info = model.transcribe("audio.wav", word_timestamps=True)

# slower-whisper extension — full transcript with enrichment
transcript = model.last_transcript
```

Full option mapping: [docs/FASTER_WHISPER_MIGRATION.md](docs/FASTER_WHISPER_MIGRATION.md)

## CLI Quickstart

```bash
# Transcribe (reads raw_audio/, writes whisper_json/)
uv run slower-whisper transcribe --root .

# Enrich with prosody + emotion
uv run slower-whisper enrich --root .

# Benchmark against baselines
uv run slower-whisper benchmark run --track asr --dataset smoke
uv run slower-whisper benchmark compare --track asr --dataset smoke --gate
```
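The `transcribe` command reads from a `raw_audio/` directory under `--root` and writes JSON next to it. A minimal project-layout sketch (directory names taken from the comments above; the file name is illustrative, and `touch` stands in for copying real audio):

```shell
mkdir -p project/raw_audio
touch project/raw_audio/meeting.wav   # placeholder for real input audio
# uv run slower-whisper transcribe --root project
# -> transcripts land in project/whisper_json/
ls project/raw_audio
```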

## What You Get

- **Local-first** — all processing on your hardware. No accounts, no rate limits, no per-minute billing.
- **Beyond words** — speaker diarization, prosody, emotion, and semantic annotation as opt-in layers.
- **Stable contracts** — JSON schema v2 with `schema_version`, `receipt`, and `run_id`. Same input + config = same output.
- **Streaming** — WebSocket/SSE with event envelopes, a resume protocol, and backpressure.
- **Post-processing** — topic segmentation (TF-IDF), turn-taking policies, domain presets for call centers and meetings.
- **Benchmarks** — ASR WER, diarization DER, emotion, streaming latency — with baseline gating in CI.
- **faster-whisper compatible** — change one import line.
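Those stable-contract fields can be checked defensively when consuming transcript JSON. A minimal sketch, assuming only the three documented top-level fields (`schema_version`, `receipt`, `run_id`); everything else in the sample payload, including the field values, is illustrative:

```python
import json

REQUIRED = ("schema_version", "receipt", "run_id")

def check_contract(payload: str) -> dict:
    """Parse transcript JSON and verify the documented contract fields."""
    doc = json.loads(payload)
    missing = [k for k in REQUIRED if k not in doc]
    if missing:
        raise ValueError(f"not a v2 transcript, missing: {missing}")
    return doc

sample = json.dumps({
    "schema_version": 2,           # documented field; value illustrative
    "receipt": {"config": "..."},  # documented field; shape assumed
    "run_id": "run-0001",          # documented field; value illustrative
    "segments": [],                # illustrative
})
doc = check_contract(sample)
print(doc["schema_version"])
```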

## Documentation

| Start here | Then |
| --- | --- |
| Quickstart | CLI Reference |
| Python API | Configuration |
| faster-whisper Migration | Streaming Architecture |
| Post-Processing | Benchmarks |

Full docs index: [docs/INDEX.md](docs/INDEX.md) | Vision and strategy: [VISION.md](VISION.md) | Changelog: [CHANGELOG.md](CHANGELOG.md) | Roadmap: [ROADMAP.md](ROADMAP.md)

## Community & Support

## License

Dual-licensed under Apache 2.0 or MIT, at your option. See LICENSE-APACHE and LICENSE-MIT.
