
Davenads/TubeScript


TubeScript

TubeScript is an application for generating speaker-labeled transcripts of YouTube videos, complete with timestamps and punctuation, using OpenAI's Whisper for transcription and pyannote.audio for speaker diarization.

Features

  • Extract audio from YouTube videos
  • Perform speaker diarization (who spoke when)
  • Generate accurate transcriptions with proper punctuation
  • Label speakers and timestamps
  • Interactive UI for reviewing and editing transcripts
  • Export in multiple formats (.txt, .srt, .vtt)
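The three export formats mainly differ in how each cue is laid out: SRT numbers its cues and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period. The sketch below is a minimal illustration, not TubeScript's actual exporter. It assumes transcript segments are stored as hypothetical `(start_seconds, end_seconds, speaker, text)` tuples, and the function names `to_txt`, `to_srt` and `to_vtt` are likewise made up for this example.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds, speaker, text)
Segment = Tuple[float, float, str, str]


def format_timestamp(seconds: float, decimal_sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def to_txt(segments: List[Segment]) -> str:
    # One line per segment: "[HH:MM:SS] Speaker: text" (milliseconds dropped).
    lines = []
    for start, _end, speaker, text in segments:
        stamp = format_timestamp(start, ".").split(".")[0]
        lines.append(f"[{stamp}] {speaker}: {text}")
    return "\n".join(lines) + "\n"


def to_srt(segments: List[Segment]) -> str:
    # SRT cues are numbered from 1 and separated by a blank line.
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{speaker}: {text}\n")
    return "\n".join(blocks)


def to_vtt(segments: List[Segment]) -> str:
    # WebVTT files begin with a "WEBVTT" header; <v Name> is the voice (speaker) tag.
    blocks = ["WEBVTT\n"]
    for start, end, speaker, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n<v {speaker}>{text}\n")
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 2.5, "SPEAKER_00", "Hello")])` yields a single cue numbered `1` with the timing line `00:00:00,000 --> 00:00:02,500`.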

Requirements

  • Python 3.9+
  • GPU with CUDA support (NVIDIA RTX 4070 Super or better recommended)
  • FFmpeg installed and accessible via system PATH
  • HuggingFace account and API token (for accessing the pyannote.audio models, which are gated: you must also accept each model's user conditions on the HuggingFace Hub)
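A quick way to confirm the local requirements before installing is a small script like the one below. It is a hypothetical helper, not shipped with TubeScript. It checks the Python version, whether `ffmpeg` is on the PATH, and whether PyTorch (a dependency of both Whisper and pyannote.audio) can see a CUDA GPU; if PyTorch is not installed yet, the CUDA check simply reports `False`.

```python
import shutil
import sys


def check_requirements(min_python=(3, 9)):
    """Return a dict mapping requirement name -> bool (hypothetical helper)."""
    results = {
        "python": sys.version_info[:2] >= min_python,
        # FFmpeg must be callable from the system PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # installed transitively by Whisper / pyannote.audio
        results["cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        results["cuda"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:<7} {'OK' if ok else 'MISSING'}")
```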

Installation

1. Clone the repository

git clone https://github.com/Davenads/TubeScript.git
cd TubeScript

2. Set up the Python backend

cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Configure environment variables

Create a .env file in the backend directory with the following content:

HUGGINGFACE_TOKEN=your_huggingface_token_here

You can obtain your HuggingFace token from: https://huggingface.co/settings/tokens
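The backend has to read `HUGGINGFACE_TOKEN` from this file at startup. Many Python projects use the `python-dotenv` package for that, but this README does not say which loader TubeScript uses. The dependency-free sketch below shows the idea under that assumption; `load_env_file` and `get_hf_token` are hypothetical names. It only handles simple `KEY=VALUE` lines and leaves already-set environment variables untouched.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ (existing values win)."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def get_hf_token():
    """Return the token, failing loudly if it is missing or still the placeholder."""
    token = os.environ.get("HUGGINGFACE_TOKEN")
    if not token or token == "your_huggingface_token_here":
        raise RuntimeError("Set HUGGINGFACE_TOKEN in backend/.env")
    return token
```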

4. Preload AI models (recommended)

python preload_models.py

5. Set up the frontend

cd ../frontend
npm install

Usage

1. Using the start script (Windows)

For convenience, a start script is included that launches both backend and frontend servers:

start.bat

2. Manual startup

Start the backend server

cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
python app.py

The API will be available at http://localhost:8000

Start the frontend development server (in a second terminal, from the repository root)

cd frontend
npm run dev

The web interface will be available at http://localhost:5173
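To confirm both servers are running before opening the browser, a plain TCP check against the two default ports is enough. This is a hypothetical helper, not part of TubeScript, and it only verifies that something is listening on each port, not that it is the TubeScript API or UI.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default ports from this README: backend API 8000, frontend dev server 5173.
    for name, port in (("backend API", 8000), ("frontend", 5173)):
        state = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```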

3. Process a YouTube video

  1. Enter a YouTube URL in the input field
  2. Click "Process Video"
  3. Wait for the processing to complete
  4. Review the transcript and rename speakers if desired
  5. Export the transcript in your preferred format
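A malformed link in step 1 can be rejected early, before any download starts. The sketch below is a hypothetical pre-check, not TubeScript's actual validation. It extracts the video ID from the common URL shapes (`watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`, `/live/`) and assumes the usual 11-character ID alphabet. yt-dlp itself accepts many more URL forms, so this should be treated as a convenience filter only.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL forms, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _WATCH_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/", "/live/")):
            candidate = parsed.path.split("/")[2]
    # Assumption: IDs are 11 characters drawn from [A-Za-z0-9_-].
    if candidate and _ID_PATTERN.fullmatch(candidate):
        return candidate
    return None
```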

How It Works

  1. YouTube Audio Extraction: Downloads and extracts audio using yt-dlp
  2. Speaker Diarization: Uses pyannote.audio to identify different speakers
  3. Transcription: Applies OpenAI's Whisper to generate accurate text with punctuation
  4. Transcript Assembly: Combines speaker information with transcribed text
  5. Frontend Display: Shows an interactive transcript with editing capabilities
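Step 4 can be implemented in several ways, and this README does not specify how TubeScript does it. One common approach, shown here as an assumption, assigns each transcribed segment to the diarization speaker whose turns overlap it the most. Both inputs are plain `(start, end, ...)` tuples in seconds. In practice they could be built from Whisper's `result["segments"]` (each with `start`, `end` and `text`) and from iterating a pyannote `Annotation` with `itertracks(yield_label=True)`. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (start, end, text) from the transcriber; (start, end, speaker) from diarization
TextSegment = Tuple[float, float, str]
SpeakerTurn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[TextSegment], turns: List[SpeakerTurn], unknown: str = "UNKNOWN"
) -> List[Tuple[float, float, str, str]]:
    """Label each text segment with the speaker who overlaps it the most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        # Sum overlap per speaker, since one speaker may have several turns.
        totals: Dict[str, float] = defaultdict(float)
        for turn_start, turn_end, speaker in turns:
            totals[speaker] += overlap(seg_start, seg_end, turn_start, turn_end)
        best = max(totals, key=totals.get, default=None)
        speaker = best if best is not None and totals[best] > 0 else unknown
        labeled.append((seg_start, seg_end, speaker, text.strip()))
    return labeled
```

Summing overlap per speaker, rather than matching only the single closest turn, keeps the assignment stable when a speaker's turn is split into several short pieces.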

License

MIT License

Acknowledgements

About

A local AI-powered YouTube transcription tool that extracts audio, identifies speakers, and generates accurate, punctuated transcripts with timestamps. It performs speaker diarization with pyannote.audio and transcription with Whisper, and provides a web-based interface for editing transcripts and exporting them in multiple formats (TXT, SRT, VTT).
