Audio Transcriber

An audio transcription tool leveraging OpenAI Whisper and WhisperX for accurate and efficient transcriptions.

About

Audio Transcriber is a Python-based command-line and scriptable tool for transcribing audio files into text. It integrates:

OpenAI Whisper for baseline transcription.
WhisperX for forced alignment and improved timestamp accuracy.
Optional use of DeepSeek or other transcription services.

The tool supports MP3 and MP4 input formats and outputs verbatim transcripts with timestamps.
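
The segments returned by openai-whisper's `transcribe()` are dicts with `start` and `end` times in seconds plus the segment `text`. A minimal sketch of turning them into timestamped lines might look like this. The `format_timestamp` and `format_segments` helpers and the `[start --> end] text` layout are hypothetical, since the tool's actual output format is not documented here.

```python
from __future__ import annotations


def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to HH:MM:SS.mmm (hypothetical output format)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: list[dict]) -> str:
    """Render Whisper-style segments ({"start", "end", "text"}) as one line each."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text usually starts with a space, so strip it.
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines)
```

For example, a segment spanning 0.0 to 2.5 seconds with the text " Hello there." becomes `[00:00:00.000 --> 00:00:02.500] Hello there.`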

Features

High accuracy with WhisperX forced alignment.
Timestamped transcriptions.
Batch processing of multiple audio files.
Simple CLI and Python API.
Customizable model selection and performance parameters.
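
Batch processing starts by collecting the files to transcribe. The sketch below assumes a non-recursive scan that keeps only the MP3 and MP4 inputs listed above; `find_media_files` and `SUPPORTED_SUFFIXES` are hypothetical names, not part of this repository.

```python
from __future__ import annotations

from pathlib import Path

# Assumed from the README: MP3 and MP4 are the supported input formats.
SUPPORTED_SUFFIXES = {".mp3", ".mp4"}


def find_media_files(directory: str | Path) -> list[Path]:
    """Return supported media files directly inside `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    # Compare suffixes case-insensitively so "Talk.MP4" is included too.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Matching suffixes case-insensitively means a file named `Lecture.MP3` is picked up as well; other files in the directory are ignored.
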

This repository also includes a Gradio app that accepts audio and video files and produces transcriptions. To use it:

Create a new transcription:

Select the "Transcribe" tab
Enter the language spoken in the audio or video
Select an audio or video file on your local drive to upload
Select "Submit" to load the audio followed by "Transcribe" to begin the transcription
The results are displayed in the UI and written to an SQLite database.
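
The storage step could look roughly like the following. This is a sketch using Python's built-in `sqlite3` module; the `transcriptions` table, its columns, and the `save_transcription` helper are assumptions, since the app's real schema is not documented here.

```python
import sqlite3

# Hypothetical schema: the app's actual table and column names may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcriptions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    language TEXT NOT NULL,
    transcript TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def save_transcription(conn: sqlite3.Connection, title: str,
                       language: str, transcript: str) -> int:
    """Insert one transcription and return its row id."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(SCHEMA)
        cur = conn.execute(
            "INSERT INTO transcriptions (title, language, transcript) VALUES (?, ?, ?)",
            (title, language, transcript),
        )
    return cur.lastrowid
```

Using `?` placeholders keeps titles and transcript text from being interpreted as SQL.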

View available transcriptions:

Select the "View Transcriptions" tab
From the dropdown, select the title of an audio or video file
The transcription results are displayed
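
Populating the dropdown and showing a transcript amount to two queries. The sketch below assumes a `transcriptions` table with `title` and `transcript` columns; the table layout and the `list_titles` and `get_transcript` helpers are hypothetical.

```python
from __future__ import annotations

import sqlite3


def list_titles(conn: sqlite3.Connection) -> list[str]:
    """Distinct titles in alphabetical order, suitable for a dropdown."""
    rows = conn.execute(
        "SELECT DISTINCT title FROM transcriptions ORDER BY title"
    ).fetchall()
    return [title for (title,) in rows]


def get_transcript(conn: sqlite3.Connection, title: str) -> str | None:
    """Most recently inserted transcript for `title`, or None if absent."""
    row = conn.execute(
        "SELECT transcript FROM transcriptions WHERE title = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (title,),
    ).fetchone()
    return row[0] if row else None
```

If the same title was transcribed more than once, `get_transcript` returns the latest row.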

Prerequisites

Python 3.8+
pip package manager
FFmpeg installed and available in your PATH.
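
A quick way to confirm these prerequisites before installing is a small check using only the standard library (`sys.version_info` and `shutil.which`). `check_prerequisites` is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 8)) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

An empty list means the Python version and FFmpeg look fine; otherwise each entry describes one problem to fix.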

Installation

git clone https://github.com/krpopkin/Audio_Transcriber.git
cd Audio_Transcriber
pip install -r requirements.txt

Usage

Command-Line

python transcriber1.py --input /path/to/audio.mp3 --output transcript.txt

Common options:

--model: Whisper model to use (e.g., base, small, medium, large).
--align: Enable WhisperX forced alignment.
--language: Specify language code (e.g., en, es).
--batch-dir: Process all audio files in a directory.
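
For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in `transcriber1.py`: the `small` default, the optional `--language`, and making `--input` and `--batch-dir` mutually exclusive are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options; defaults are assumptions."""
    parser = argparse.ArgumentParser(
        description="Transcribe audio files with Whisper and WhisperX."
    )
    # Assumed: exactly one source, either a single file or a directory.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="Path to a single MP3 or MP4 file.")
    source.add_argument("--batch-dir", help="Transcribe every audio file in this directory.")
    parser.add_argument("--output", help="File to write the transcript to.")
    parser.add_argument("--model", default="small",
                        help="Whisper model, e.g. base, small, medium, large.")
    parser.add_argument("--align", action="store_true",
                        help="Enable WhisperX forced alignment.")
    parser.add_argument("--language", default=None,
                        help="Language code such as en or es (auto-detect if omitted).")
    return parser
```

argparse converts `--batch-dir` to the attribute `batch_dir` on the parsed namespace.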

Python API

from transcriber1 import transcribe

result = transcribe(
    input_path="audio.mp3",
    model="small",
    align=True,
    language="en"
)
print(result.text)
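
Internally, a function like `transcribe` would run Whisper first and, when `align=True`, pass the segments to WhisperX for forced alignment. The sketch below assumes the `openai-whisper` and `whisperx` packages are installed and uses their documented `whisper.load_model`, `whisperx.load_audio`, `whisperx.load_align_model`, and `whisperx.align` calls; the `TranscriptionResult` class and the `device` parameter are hypothetical additions. Whisper returns a plain dict, so the result is wrapped to expose the `.text` attribute used in the example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TranscriptionResult:
    """Hypothetical result type exposing `.text` and per-segment timings."""
    text: str
    segments: list[dict] = field(default_factory=list)


def transcribe(input_path: str, model: str = "small", align: bool = False,
               language: str | None = None, device: str = "cpu") -> TranscriptionResult:
    # Lazy imports: the module loads even if the model packages are absent.
    import whisper

    whisper_model = whisper.load_model(model, device=device)
    # language=None lets Whisper auto-detect the spoken language.
    result = whisper_model.transcribe(input_path, language=language)
    segments = result["segments"]

    if align:
        import whisperx

        audio = whisperx.load_audio(input_path)
        align_model, metadata = whisperx.load_align_model(
            language_code=result["language"], device=device
        )
        # Forced alignment refines segment (and word) timestamps.
        aligned = whisperx.align(segments, align_model, metadata, audio, device,
                                 return_char_alignments=False)
        segments = aligned["segments"]

    return TranscriptionResult(text=result["text"].strip(), segments=segments)
```

Deferring the imports into the function keeps the module importable without the heavy model dependencies.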

Configuration

Configuration options are set via command-line flags or environment variables:

WHISPER_MODEL
USE_WHISPERX
FFMPEG_PATH
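
One way these settings might be resolved, with command-line flags taking precedence over environment variables, is sketched below. The defaults (`small`, `ffmpeg`) and the accepted truthy values for `USE_WHISPERX` are assumptions; `load_config` and `Config` are hypothetical names.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Assumed spellings accepted as "on" for USE_WHISPERX.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass
class Config:
    model: str
    use_whisperx: bool
    ffmpeg_path: str


def load_config(cli_model: str | None = None, cli_align: bool | None = None,
                env: Mapping[str, str] | None = None) -> Config:
    """CLI values win; otherwise fall back to environment variables, then defaults."""
    env = os.environ if env is None else env
    model = cli_model or env.get("WHISPER_MODEL", "small")
    if cli_align is not None:
        use_whisperx = cli_align
    else:
        use_whisperx = env.get("USE_WHISPERX", "").strip().lower() in _TRUTHY
    ffmpeg_path = env.get("FFMPEG_PATH", "ffmpeg")
    return Config(model=model, use_whisperx=use_whisperx, ffmpeg_path=ffmpeg_path)
```

Passing `cli_align=None` (for example, from an argparse flag declared with `action="store_true", default=None`) lets `USE_WHISPERX` decide, while `True` or `False` overrides it.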

Project Structure

Audio_Transcriber/
├── transcriber1.py           # Main transcription script
├── transcription_service.py  # Whisper and WhisperX integration
├── requirements.txt
├── examples/                 # Example audio files and usage
│   └── sample_audio.mp3
└── README.md                 # Project README (this file)

Author

Ken Popkin
GitHub: @krpopkin
Email: krpopkin@gmail.com
