This project implements a web application for transcribing podcast episodes, leveraging OpenAI's Whisper model for speech-to-text conversion and Algolia for efficient search indexing. The architecture comprises a Python Quart backend for core logic and data management, and a React frontend for user interaction.
- Podcast RSS Feed Ingestion: Parses provided RSS feed URLs to extract podcast episode metadata.
- Audio Sample Extraction: Downloads and processes audio, extracting user-defined random samples (in minutes) to optimize transcription time and resource consumption.
- Speech-to-Text Transcription: Utilizes OpenAI's Whisper model for high-accuracy audio transcription, using a user-provided OpenAI API key.
- Local Data Persistence: Stores episode titles and transcripts in a SQLite database.
- Algolia Integration: Indexes transcribed content with Algolia for powerful search capabilities, using user-provided Algolia Application ID and Write API Key.
- Interactive Frontend: A React application for submitting RSS URLs, configuring the number of episodes to transcribe, displaying transcription results, and tracking each step of the process in real time.
- Algolia Dashboard Link: Provides a direct link to the Algolia dashboard's index explorer, allowing users to view their indexed transcripts (requires Algolia login).
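To make the audio sampling feature concrete, here is a minimal sketch of how a random sample window of a user-defined length in minutes could be chosen within an episode. The function name `pick_sample_window` and its parameters are hypothetical and are not taken from download_audio.py; the fallback to the whole episode when it is shorter than the requested sample is also an assumption.

```python
import random


def pick_sample_window(duration_s, sample_minutes, rng=None):
    """Return (start, end) in seconds for a random window of sample_minutes.

    Hypothetical helper: the project's real sampling logic may differ.
    """
    rng = rng or random.Random()
    sample_s = sample_minutes * 60.0
    if sample_s <= 0:
        raise ValueError("sample_minutes must be positive")
    # Assumption: if the episode is shorter than the sample, use all of it.
    if sample_s >= duration_s:
        return 0.0, float(duration_s)
    # Choose a start so the whole window fits inside the episode.
    start = rng.uniform(0.0, duration_s - sample_s)
    return start, start + sample_s


if __name__ == "__main__":
    # A 5-minute sample from a 60-minute episode.
    print(pick_sample_window(3600, 5))
```

Transcribing only this window rather than the full episode is what reduces transcription time and resource consumption.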
The project adheres to a standard two-tier directory structure:
backend/
- app.py: Quart application entry point, API routes, and a health check
- podcast_workflow.py: Orchestration logic for the transcription pipeline
- fetch_rss.py: RSS feed parsing utility
- download_audio.py: Audio downloading and sampling logic
- transcribe.py: Whisper model integration
- database.py: SQLite database abstraction
- upload_algolia.py: Algolia API integration for indexing
- requirements.txt: Python package dependencies
frontend/
- public/: Static assets
- src/: React source code (e.g., App.js, index.js)
- package.json: Node.js dependencies and project scripts
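As an illustration of what a SQLite abstraction for episode titles and transcripts might look like, the sketch below uses Python's standard sqlite3 module. The table name `episodes`, its columns, and the functions `init_db`, `save_transcript`, and `get_transcript` are assumptions made for this example; the actual schema in database.py may differ.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) episodes table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "title TEXT NOT NULL UNIQUE, "
        "transcript TEXT NOT NULL)"
    )
    conn.commit()


def save_transcript(conn, title, transcript):
    """Insert a transcript, replacing any earlier one with the same title."""
    conn.execute(
        "INSERT OR REPLACE INTO episodes (title, transcript) VALUES (?, ?)",
        (title, transcript),
    )
    conn.commit()


def get_transcript(conn, title):
    """Return the stored transcript for a title, or None if absent."""
    row = conn.execute(
        "SELECT transcript FROM episodes WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    init_db(conn)
    save_transcript(conn, "Episode 1", "Hello and welcome.")
    print(get_transcript(conn, "Episode 1"))
```

Parameterized queries (the `?` placeholders) keep titles containing quotes or other special characters from breaking the SQL.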
Make sure the following tools are installed:
- Git: Version control
- Python 3.8+: Backend runtime (pip will be available)
- Node.js 14+ & npm 6+: Frontend runtime and package manager
Clone the project:
git clone https://github.com/dwariyar/podcast_transcriber.git
cd podcast_transcriber
Note: The most recent commit, which involves Redis and RQ, will likely not work locally. To load the previous version, run the following command in the root directory of the project:
git checkout <COMMIT_HASH>
Replace <COMMIT_HASH> with the 7-character short hash of the desired commit. For this project, 0225d34 should give you a stable version to work with.
Navigate to the backend/ directory:
cd backend
Create and activate a Python virtual environment, then install the required packages:
python3 -m venv venv # Create venv (use `py -m venv venv` on Windows)
source venv/bin/activate # Activate venv (use `.\venv\Scripts\activate.bat` or `.\venv\Scripts\Activate.ps1` on Windows)
pip install -r requirements.txt
Note: The OpenAI API key and the Algolia credentials (Application ID, Write API Key) are now provided directly via the frontend UI when you run the application. You do not need to configure them in backend environment variables (such as a .env file) for the backend to function. Ensure your Algolia Write API Key has the addObject, deleteObject, listIndexes, and settings permissions for full functionality.
If you encounter a UnicodeEncodeError or 'ascii' codec can't encode character error during transcription (especially with podcast titles or content containing special characters like curly quotes “ ”), it's likely due to your terminal's default character encoding.
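The failure can be reproduced in isolation. The sketch below is not part of the project: `can_encode` and the sample title are hypothetical. It reports the encoding Python selected for standard output and checks whether a title containing curly quotes can be encoded as ASCII or as UTF-8.

```python
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample title with curly quotes (U+201C and U+201D).
title = "Episode 12: \u201cHello\u201d"

# The encoding Python picked for stdout, derived from the locale settings.
print("stdout encoding:", sys.stdout.encoding)
print("encodable as ascii:", can_encode(title, "ascii"))
print("encodable as utf-8:", can_encode(title, "utf-8"))
```

If the reported stdout encoding is an ASCII variant rather than UTF-8, printing or logging such a title is what raises the error.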
To ensure your Python environment correctly handles all Unicode characters, set your locale environment variables to UTF-8 before running your Quart backend:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
Navigate to the frontend/ directory:
cd ../frontend
Install the necessary Node.js packages:
npm install
Bootstrap is used for frontend styling. Ensure its CSS is imported in your React application. The recommended approach is via src/index.js.
Verify frontend/src/index.js includes:
import 'bootstrap/dist/css/bootstrap.min.css';
From the backend/ directory, ensure your Python virtual environment is active and that you have set the locale environment variables (as described above), then start the backend:
cd podcast_transcriber/backend # Adjust the path if you cloned the repository elsewhere
source venv/bin/activate # Or Windows equivalent
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export PYTHONPATH=$PYTHONPATH:/path/to/backend
export QUART_APP=app.py
quart run --host 0.0.0.0 --port 5001
From the frontend/ directory:
cd podcast_transcriber/frontend # Adjust the path if you cloned the repository elsewhere
npm start
The React development server will usually open your browser at http://localhost:3000.
With both services running, you can now interact with the Podcast Transcriber application via your browser.
Before initiating a transcription, enter your OpenAI API Key, Algolia Application ID, and Algolia Write API Key in the fields provided in the frontend UI.