Voice Transcribe

A high-performance audio transcription service built with Python that implements multiple Voice model strategies for speech-to-text conversion.

Overview

Voice Transcribe provides a FastAPI server that exposes endpoints for transcribing audio files and streaming audio through WebSockets. The project supports multiple Whisper model implementations:

Currently Implemented

whisper - Original OpenAI Whisper implementation with support for:
- Multiple model sizes (large-v3, turbo)
- Real-time transcription via WebSocket
- WAV file processing (currently using temporary file storage)
- Language detection

Future Implementations

whisper_cpp - C++ implementation of OpenAI's Whisper model
faster_whisper - Optimized version of Whisper for improved performance
whisperx - Enhanced Whisper model with additional features
Enhanced streaming capabilities for lower latency

Backstory

This project was developed to provide enterprise-grade transcription capabilities in datacenter environments, leveraging open-source resources rather than relying on proprietary cloud services.

Features

FastAPI REST API for audio transcription
WebSocket support for real-time transcription
Multiple Whisper model strategies (currently implementing OpenAI Whisper)
Support for WAV file transcription
Comprehensive API documentation
Example code for quick integration
Language detection and transcription
Model size selection (large-v3, turbo)

Requirements

Python 3.9.6+
UV package manager
OpenAI Whisper model files (automatically downloaded on first run)

Usage

Development Server

Run the FastAPI development server:

cd voice_transcribe &&
export PYTHONPATH="$PYTHONPATH:." &&
uvicorn main:create_app --reload

Production Server

Run the FastAPI application with Gunicorn and Uvicorn workers:

cd voice_transcribe &&
export PYTHONPATH="$PYTHONPATH:." &&
gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:create_app

API Endpoints

The API will be available at http://localhost:8000.

POST /transcribe - Transcribe a WAV audio file
WebSocket /ws - Stream audio for real-time transcription
Swagger /docs - Detailed API documentation and examples are available in the /docs directory.

Configuration

The service can be configured using YAML files in the configs directory:

dev.yaml - Development environment settings
stage.yaml - Staging environment settings
prod.yaml - Production environment settings

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Additional Resources

Medium Blog Post

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github		.github
example/live_call		example/live_call
voice_transcribe		voice_transcribe
.gitignore		.gitignore
.python-version		.python-version
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice Transcribe

Overview

Currently Implemented

Future Implementations

Backstory

Features

Requirements

Usage

Development Server

Production Server

API Endpoints

Configuration

Contributing

License

Additional Resources

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Voice Transcribe

Overview

Currently Implemented

Future Implementations

Backstory

Features

Requirements

Usage

Development Server

Production Server

API Endpoints

Configuration

Contributing

License

Additional Resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Packages