TubeScript is an application that generates speaker-labeled, timestamped, and punctuated transcripts from YouTube videos, using pyannote.audio for speaker diarization and OpenAI's Whisper for transcription.
- Extract audio from YouTube videos
- Perform speaker diarization (who spoke when)
- Generate accurate transcriptions with proper punctuation
- Label speakers and timestamps
- Review and edit transcripts in an interactive UI
- Export in multiple formats (.txt, .srt, .vtt)
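
To illustrate the export feature listed above, here is a minimal sketch of rendering labeled segments as `.srt`. It is not TubeScript's actual exporter. The segment fields (`start`, `end`, `speaker`, `text`) and both function names are hypothetical, chosen only for this example.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as SRT cues, numbered from 1.

    Each segment is assumed to be a dict with hypothetical keys
    'start' and 'end' (seconds), 'speaker', and 'text'.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['speaker']}: {seg['text']}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(cues)
```

A `.vtt` export would follow the same shape, except that WebVTT files start with a `WEBVTT` header line and use a period instead of a comma before the milliseconds.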
- Python 3.9+
- GPU with CUDA support (NVIDIA RTX 4070 Super or better recommended)
- FFmpeg installed and accessible via system PATH
- HuggingFace account and API token (for accessing pyannote.audio models)
git clone https://github.com/yourusername/TubeScript.git
cd TubeScript
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the backend directory with the following content:
HUGGINGFACE_TOKEN=your_huggingface_token_here
You can obtain your HuggingFace token from: https://huggingface.co/settings/tokens
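
How the backend reads this token is not shown here. As a rough sketch using only the standard library, the helpers below (both hypothetical, not part of TubeScript) prefer the `HUGGINGFACE_TOKEN` environment variable and fall back to parsing a simple `KEY=value` `.env` file.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_huggingface_token(path: str = ".env") -> str:
    """Return the token from the environment, else from the .env file."""
    token = os.environ.get("HUGGINGFACE_TOKEN") or load_env_file(path).get("HUGGINGFACE_TOKEN")
    if not token:
        raise RuntimeError("HUGGINGFACE_TOKEN is not set; add it to backend/.env")
    return token
```

Failing early with a clear error keeps a missing token from surfacing later as an opaque model-download failure.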
python preload_models.py
cd ../frontend
npm install
For convenience, a start script is included that launches both backend and frontend servers:
start.bat
cd backend
source venv/bin/activate # On Windows: venv\Scripts\activate
python app.py
The API will be available at http://localhost:8000
cd frontend
npm run dev
The web interface will be available at http://localhost:5173
- Enter a YouTube URL in the input field
- Click "Process Video"
- Wait for the processing to complete
- Review the transcript and rename speakers if desired
- Export the transcript in your preferred format
- YouTube Audio Extraction: Downloads and extracts audio using yt-dlp
- Speaker Diarization: Uses pyannote.audio to identify different speakers
- Transcription: Applies OpenAI's Whisper to generate accurate text with punctuation
- Transcript Assembly: Combines speaker information with transcribed text
- Frontend Display: Shows an interactive transcript with editing capabilities
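
The transcript assembly step in the pipeline above can be sketched as follows. This is one common approach, assumed here for illustration rather than taken from TubeScript's code: each transcribed segment gets the speaker whose diarization turn overlaps it the most. The dict fields and function names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, speaker_turns, unknown="Unknown"):
    """Label each transcript segment with its best-overlapping speaker.

    transcript_segments: dicts with 'start', 'end', 'text' (assumed shape).
    speaker_turns: dicts with 'start', 'end', 'speaker' (assumed shape).
    Segments that overlap no turn are labeled with `unknown`.
    """
    labeled = []
    for seg in transcript_segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in speaker_turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Choosing by maximum overlap is simple but assigns one speaker to a whole segment, so a segment that spans a speaker change is attributed entirely to whoever spoke longer within it.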
MIT License
- Pyannote Audio for speaker diarization
- OpenAI Whisper for transcription
- yt-dlp for YouTube downloading