UofTHacks 13 Winner - MLH Best Use of ElevenLabs
Authors: James Weng, Ryan Li, David Yang
Pipeline: Upload a product name and an audio file containing speech → the OpenAI API generates the ad text → ElevenLabs clones the voice(s) and generates a human-like ad read → the system finds the optimal insertion point based on syntactic + semantic context and stitches the ad into the final audio.
- User input for podcast audio and product name (product details are optional)
- AI-recommended insertion timestamps using semantic + syntactic analysis
- ElevenLabs TTS for realistic sponsor reads (single speaker or multi-way conversation)
- Preview insertions before rendering final output
- Export monetized episodes with loudness matching + crossfades
- Frontend: React + TypeScript + Vite, plain CSS with CSS custom properties (Syne + DM Mono + DM Sans fonts)
- Backend API: Node.js + Express
- Audio pipeline: Python (pydub, librosa, pyloudnorm)
- AI services: OpenAI (ad generation + placement), ElevenLabs (TTS/voice cloning)
- Media tools: ffmpeg/ffprobe
.
├── README.md
├── backend/                # Node API + Python ad_inserter pipeline
│   ├── ad_inserter/
│   ├── audio_tests/
│   ├── index.mjs
│   ├── package.json
│   └── requirements.txt
├── frontend/               # React UI
│   ├── src/
│   ├── index.html
│   ├── package.json
│   └── vite.config.ts
├── docs/
└── venv/
- Node.js (for frontend/ and backend/)
- Python (for backend/ad_inserter)
- ffmpeg + ffprobe
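Before installing anything, it can help to confirm that the prerequisites above are on PATH. The following is a minimal sketch, not part of the repository; the helper name missing_tools and the executable names (node, python3) are assumptions and may differ on your platform.

```python
import shutil


def missing_tools(tools=("node", "python3", "ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; the executable names are
    assumptions (e.g. Python may be installed as `python` on Windows).
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```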
cd backend
npm install
pip install -r requirements.txt
Set env vars (examples):
export ELEVENLABS_API_KEY="..."
export OPENAI_API_KEY="..."
Run the API:
npm run dev
The API listens on http://localhost:3001.
cd frontend
npm install
npm run dev
The UI runs on http://localhost:5173 and calls the backend.
- __init__.py exposes the package modules (analysis, llm, mix) and version
- analysis.py handles audio analysis: ffmpeg check, loading/standardizing audio, silence-based candidate detection for podcasts, beat/RMS analysis for songs, optional Whisper transcription, and building candidate payloads
- analyze_cli.py exposes a CLI helper that runs analysis and returns JSON for the Node API
- cli.py provides the single-speaker CLI workflow: parse args, pick candidates, call the LLM to write the promo and choose the insertion point, apply loudness matching + room tone + crossfade, and export the output (plus debug artifacts)
- insert_ad.py handles two-speaker insertion (A/B/DUO), optional diarization, and optional voice cloning
- llm.py builds the prompt and calls OpenAI to generate promo text and choose the insertion index; parses the JSON response into LLMResult
- mix.py provides audio mixing utilities: LUFS measurement, loudness matching, looping room tone, ducking, crossfade insertion, and context window extraction
- tts.py builds sponsor reads with ElevenLabs (single or multi-statement blocks)
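To make the loudness matching and crossfade insertion steps concrete, here is a minimal NumPy sketch. It is not the code in mix.py: the function names are hypothetical, audio is assumed to be mono float arrays, and plain RMS is used as a simplified stand-in for LUFS measurement.

```python
import numpy as np


def match_rms(promo, reference):
    """Scale `promo` so its RMS level equals that of `reference`.

    Simplification: RMS stands in for a proper LUFS measurement.
    """
    promo = np.asarray(promo, dtype=float)
    reference = np.asarray(reference, dtype=float)
    promo_rms = np.sqrt(np.mean(promo ** 2))
    ref_rms = np.sqrt(np.mean(reference ** 2))
    if promo_rms == 0.0:
        return promo  # silent promo: nothing to scale
    return promo * (ref_rms / promo_rms)


def insert_with_crossfade(main, promo, at, fade):
    """Insert `promo` into `main` at sample index `at`.

    The last `fade` samples of main before `at` are linearly crossfaded
    with the start of the promo, and the end of the promo is crossfaded
    with the `fade` samples of main that follow `at`.
    """
    main = np.asarray(main, dtype=float)
    promo = np.asarray(promo, dtype=float)
    if not (fade <= at <= len(main) - fade) or len(promo) < 2 * fade:
        raise ValueError("fade is too long for the given audio or position")

    ramp_up = np.linspace(0.0, 1.0, fade, endpoint=False)
    ramp_down = 1.0 - ramp_up

    head = main[:at - fade]
    # Join 1: main fades out while the promo fades in.
    join_in = main[at - fade:at] * ramp_down + promo[:fade] * ramp_up
    body = promo[fade:len(promo) - fade]
    # Join 2: the promo fades out while main fades back in.
    join_out = promo[len(promo) - fade:] * ramp_down + main[at:at + fade] * ramp_up
    tail = main[at + fade:]
    return np.concatenate([head, join_in, body, join_out, tail])
```

Because each crossfade overlaps `fade` samples, the output is `len(main) + len(promo) - 2 * fade` samples long.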
Semantic context:
- Uses Whisper locally (if installed) to transcribe short context windows around candidate insertion points before evaluating topic transitions and sentence boundaries
- If Whisper is not available, falls back to silence-based insertion
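The silence-based fallback can be sketched as follows. This is an illustrative NumPy version, not the logic in analysis.py; the function name, frame size, threshold, and minimum pause length are all assumed values, and samples are assumed to be mono floats in [-1, 1].

```python
import numpy as np


def silence_candidates(samples, sr, frame_ms=20, threshold_db=-40.0,
                       min_silence_ms=300):
    """Return midpoints (in seconds) of pauses at least `min_silence_ms` long.

    A frame counts as silent when its RMS level falls below `threshold_db`
    relative to full scale. All defaults are assumptions for illustration.
    """
    frame = max(1, int(sr * frame_ms / 1000))
    n_frames = len(samples) // frame
    if n_frames == 0:
        return []

    frames = np.asarray(samples[:n_frames * frame], dtype=np.float64)
    frames = frames.reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20 * np.log10(np.maximum(rms, 1e-10))  # avoid log(0)
    silent = level_db < threshold_db

    min_frames = max(1, int(np.ceil(min_silence_ms / frame_ms)))
    candidates = []
    start = None
    # A trailing False closes any silent run that reaches the end.
    for i, is_silent in enumerate(np.append(silent, False)):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if i - start >= min_frames:
                mid_frame = (start + i) / 2
                candidates.append(mid_frame * frame / sr)
            start = None
    return candidates
```

Using the midpoint of each pause leaves room on both sides of the insertion for a crossfade.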
Rhythmic/syntactic context:
- Uses librosa to estimate tempo and beat times
- Finds low-energy (RMS) valleys, snaps to the nearest beat, and inserts the promo there
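The valley-then-snap step can be illustrated with plain NumPy. This sketch assumes the per-frame RMS values, frame times, and beat times have already been computed (for example with librosa.feature.rms and librosa.beat.beat_track); the function name and the choice of the single deepest valley are assumptions, not necessarily what analysis.py does.

```python
import numpy as np


def pick_insertion_beat(rms, frame_times, beat_times):
    """Return the beat time closest to the deepest RMS valley.

    `rms` and `frame_times` are per-frame arrays of equal length;
    `beat_times` holds beat positions in seconds.
    """
    rms = np.asarray(rms, dtype=float)
    frame_times = np.asarray(frame_times, dtype=float)
    beat_times = np.asarray(beat_times, dtype=float)
    if len(rms) == 0 or len(beat_times) == 0:
        raise ValueError("rms and beat_times must be non-empty")

    # Local minima: lower than the left neighbour, no higher than the right.
    interior = np.arange(1, len(rms) - 1)
    is_valley = (rms[interior] < rms[interior - 1]) & (rms[interior] <= rms[interior + 1])
    valleys = interior[is_valley]
    if len(valleys) == 0:
        # No interior valley: fall back to the global minimum.
        valleys = np.array([int(np.argmin(rms))])

    deepest = valleys[np.argmin(rms[valleys])]
    valley_time = frame_times[deepest]

    # Snap the valley to the nearest beat.
    nearest = int(np.argmin(np.abs(beat_times - valley_time)))
    return float(beat_times[nearest])
```

Snapping to a beat keeps the promo entry aligned with the song's rhythm even when the quietest frame falls between beats.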
Run from the backend/ directory so python -m ad_inserter.cli can find the package.
Podcast example:
python -m ad_inserter.cli \
--main path/to/main.mp3 \
--promo-audio path/to/promo.wav \
--product-name "Sparrow Notes" \
--product-desc "A calmer note-taking app for busy teams" \
--product-url "https://sparrow.example" \
--mode podcast \
--out output.mp3 \
--debug-dir debug
Song example:
python -m ad_inserter.cli \
--main path/to/song.mp3 \
--promo-audio path/to/promo.mp3 \
--product-name "Pulse Water" \
--product-desc "Electrolytes without the sugar crash" \
--mode song \
--out song_with_ad.mp3
The insert_ad workflow inserts an AI-written ad into a two-person conversation. The ad can be read by Speaker A, by Speaker B, or as a short back-and-forth between the two.
- OPENAI_API_KEY for ad script generation (unless --llm-provider none)
- ELEVENLABS_API_KEY for TTS
- ELEVENLABS_VOICE_ID_A and ELEVENLABS_VOICE_ID_B for speaker mapping, or set ELEVENLABS_DEFAULT_VOICE_ID as a fallback
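One plausible reading of the voice ID variables above is that a speaker-specific ID wins and the default is used only when it is missing. The sketch below encodes that assumed precedence; resolve_voice_id is a hypothetical helper, not a function from insert_ad.py.

```python
import os


def resolve_voice_id(speaker, env=None):
    """Pick the ElevenLabs voice ID for speaker "A" or "B".

    Assumed precedence: ELEVENLABS_VOICE_ID_<speaker> first, then
    ELEVENLABS_DEFAULT_VOICE_ID. `env` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    if speaker not in ("A", "B"):
        raise ValueError("speaker must be 'A' or 'B'")
    env = os.environ if env is None else env

    voice_id = (env.get(f"ELEVENLABS_VOICE_ID_{speaker}")
                or env.get("ELEVENLABS_DEFAULT_VOICE_ID"))
    if not voice_id:
        raise RuntimeError(
            f"Set ELEVENLABS_VOICE_ID_{speaker} or ELEVENLABS_DEFAULT_VOICE_ID"
        )
    return voice_id
```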
Optional diarization (enables DUO mode and voice cloning):
- Install pyannote.audio separately
- Set HUGGINGFACE_TOKEN (or PYANNOTE_TOKEN) for model access
python -m ad_inserter.insert_ad \
--input path/to/conversation.mp3 \
--product-name "Notion" \
--product-blurb "AI-powered productivity workspace" \
--ad-style casual \
--ad-mode DUO \
--out out.mp3
Optional voice cloning (requires diarization + ElevenLabs API key):
python -m ad_inserter.insert_ad \
--input path/to/conversation.mp3 \
--product-name "Notion" \
--product-blurb "AI-powered productivity workspace" \
--ad-style casual \
--ad-mode A_ONLY \
--clone-voices \
--out out.mp3
curl -X POST http://localhost:3001/ad/insert \
-F "audio=@path/to/conversation.mp3" \
-F "productName=Notion" \
-F "productBlurb=AI-powered productivity workspace" \
-F "adStyle=casual" \
-F "adMode=DUO" \
--output out.mp3
- --llm-provider openai (default: openai)
- Set OPENAI_API_KEY in your environment