🎙️ EchoScribe

Turn your voice (or music) into text. A beginner-friendly Speech-to-Text project built with Python, Flask, and Whisper.

Start simple with microphone recording or file uploads, then upgrade to lyrics mode with Demucs, karaoke highlights, or speaker diarization. EchoScribe now also includes Live Dictation (real-time speech → text).

✨ Features

🖥️ Web UI with Flask (HTML/CSS/JS frontend)

🎤 Voice → Text: Record mic audio or upload a file → transcribe with faster-whisper

📹 Video → Text: Extract audio from video via ffmpeg → transcribe

📝 Live Dictation (new!): Speak into your mic and see text appear in real time (Web Speech API MVP, offline Whisper streaming planned)

🎶 Lyrics mode: Isolate vocals with Demucs → Whisper for lyric transcription

⚡ Visualizer: Animated equalizer while recording

📜 Export: Copy, clear, expand view, or save transcripts as plain text / .srt

🔊 CLI mic tool: Record + transcribe directly from Python
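Exporting to .srt mostly means numbering each timed segment and formatting its start and end times as `HH:MM:SS,mmm`. The sketch below is minimal and hypothetical, not necessarily how EchoScribe's exporter works. `srt_timestamp`, `to_srt`, and `transcribe_to_srt` are illustrative names, and the model size and `compute_type` are assumptions. faster-whisper's `WhisperModel.transcribe()` returns a lazy generator of segments plus an info object, and each segment carries `start`, `end`, and `text`.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start, end, text) tuples: index, time range, text, blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_size: str = "small") -> str:
    """Hypothetical helper: transcribe a file with faster-whisper and return SRT text."""
    from faster_whisper import WhisperModel  # imported lazily; only needed for real transcription

    # Assumed settings: CPU inference with int8 weights keeps memory use modest.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return to_srt((seg.start, seg.end, seg.text) for seg in segments)
```

Saving the result is then a single write, e.g. `open("transcript.srt", "w", encoding="utf-8").write(transcribe_to_srt("recording.wav"))`. Keeping `to_srt` independent of faster-whisper means it can also format segments from any other source.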

🛠️ Tech Stack

Backend

Flask → web server & API

faster-whisper → Whisper STT engine

FFmpeg → decode/convert mic uploads & video to 16 kHz WAV

Werkzeug → safe file uploads
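The README doesn't spell out the exact FFmpeg call that `app.py` makes. Here is a plausible sketch, assuming the target is 16 kHz mono 16-bit WAV. `-vn` drops any video stream, so the same command covers both mic uploads and video files. `-ac 1` downmixes to mono, and `-ar 16000` resamples. The function names are hypothetical.

```python
import subprocess
from typing import List


def ffmpeg_to_wav_cmd(src: str, dst: str, rate: int = 16_000) -> List[str]:
    """Build an ffmpeg command that turns any audio/video input into mono 16-bit WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input: browser mic upload (e.g. webm/ogg), audio file, or video
        "-vn",                # ignore any video stream
        "-ac", "1",           # downmix to a single (mono) channel
        "-ar", str(rate),     # resample to the target rate (16 kHz by default)
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM, the usual WAV codec
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True, capture_output=True)
```

Building the argument list separately keeps it testable and avoids `shell=True`, so filenames are never interpreted by a shell. Uploaded names should still pass through Werkzeug's `secure_filename()` before they are used as paths.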

Frontend

HTML + CSS (gaming-inspired UI)

Vanilla JavaScript → mic recording, file uploads, transcription, export

Web Audio API → live input visualizer (equalizer)

Web Speech API → Live Dictation MVP (browser-based speech recognition)

Optional / Future

sounddevice + scipy → CLI mic recorder

demucs (PyTorch) → Lyrics mode (vocal separation)

flask-sock + simple-websocket + numpy + soundfile → planned Offline Live Dictation (real-time Whisper over WebSockets)

fluent-ffmpeg (Node) → experimental video/audio processing
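A minimal CLI mic recorder might capture a fixed number of seconds with sounddevice and save them as WAV. This sketch swaps scipy for Python's built-in `wave` module to write the file and records straight to 16-bit integers. The function names, the 16 kHz rate, and the usage duration are assumptions, not what the project's CLI tool necessarily does.

```python
import wave

SAMPLE_RATE = 16_000  # assumed: 16 kHz mono, matching what the web app converts uploads to


def write_wav(path: str, pcm16: bytes, rate: int = SAMPLE_RATE) -> None:
    """Write raw 16-bit little-endian mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(rate)
        wf.writeframes(pcm16)


def record(seconds: float, rate: int = SAMPLE_RATE) -> bytes:
    """Record from the default input device and return 16-bit PCM bytes."""
    import sounddevice as sd  # imported lazily: only needed when actually recording

    frames = sd.rec(int(seconds * rate), samplerate=rate, channels=1, dtype="int16")
    sd.wait()  # block until the recording has finished
    return frames.tobytes()
```

Calling `write_wav("recording.wav", record(5))` records five seconds from the default microphone. The resulting file can then go through the same faster-whisper transcription as an upload.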

🔹 Installation

Clone the repo and set up a virtual environment:

git clone https://github.com/irinaholler/EchoScribe.git
cd EchoScribe
python -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Make sure FFmpeg is installed and available on your system PATH. Check with:

ffmpeg -version

(macOS: brew install ffmpeg)
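To run the same check from Python, for example at app startup, use the standard library's `shutil.which`, which searches PATH much as your shell does. A small sketch; `ffmpeg_version` is a hypothetical helper, not part of EchoScribe:

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")  # full path to the executable, or None if not found
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

For instance, `if ffmpeg_version() is None: raise SystemExit("FFmpeg not found on PATH")` fails fast with a clear message instead of erroring on the first upload.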

🚀 Run

Start the Flask dev server:

python app.py

Open your browser at: 👉 http://127.0.0.1:5000

📂 Modes

Voice → Text: Upload or record → /stt

Video → Text: Upload video → /video

Live Dictation: Real-time speech → /live

Lyrics → Text: Isolate vocals with Demucs → /lyrics
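The README lists the endpoints but not their request or response formats. As a sketch, assume `/stt` accepts a multipart form upload under a field named `file`. That field name is hypothetical, so check `app.py` for the real one. A client built only on the standard library could then look like this:

```python
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random boundary; vanishingly unlikely to occur in the audio bytes
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_audio(path: str, url: str = "http://127.0.0.1:5000/stt", field: str = "file") -> str:
    """POST an audio file to the (assumed) /stt endpoint and return the raw response body."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(field, os.path.basename(path), f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the dev server running, `print(post_audio("recording.wav"))` shows whatever `/stt` returns, whether plain text or JSON depending on how `app.py` responds.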

⚠️ Notes

Live Dictation (MVP) uses the browser's Web Speech API (requires an internet connection; supported in Chrome, Edge, and Safari).

Offline Live Dictation (with faster-whisper streaming) is planned and will require WebSockets and extra Python libraries.

Demucs is optional and heavy (PyTorch dependency). Only install it if you want Lyrics mode.
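Offline Live Dictation is still planned, so there is no implementation to show yet. One common way to feed a non-streaming model like Whisper in near real time is to buffer incoming audio chunks (for example, from a WebSocket) and transcribe fixed-length windows as they fill. The class below is a hypothetical sketch of just that buffering step. It assumes raw 16-bit mono PCM at 16 kHz, and its name and 3-second window are assumptions.

```python
from typing import List


class AudioWindowBuffer:
    """Hypothetical buffer: collects raw PCM chunks and releases fixed-length windows."""

    def __init__(self, rate: int = 16_000, window_s: float = 3.0, bytes_per_sample: int = 2):
        # Number of bytes in one window of mono audio at the given rate and sample width.
        self.window_bytes = int(rate * window_s) * bytes_per_sample
        self._buf = bytearray()

    def push(self, chunk: bytes) -> List[bytes]:
        """Add a chunk; return every complete window now available (possibly none)."""
        self._buf.extend(chunk)
        windows = []
        while len(self._buf) >= self.window_bytes:
            windows.append(bytes(self._buf[: self.window_bytes]))
            del self._buf[: self.window_bytes]
        return windows

    def flush(self) -> bytes:
        """Return and clear whatever partial window is left (e.g. when the stream ends)."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest
```

Each returned window could be handed to a Whisper model as it arrives. Fixed windows are the simplest choice, and real systems often add overlap or silence detection so words are not cut at window edges.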

📜 License

Permission is granted to view and use this code for personal learning purposes only.

🔹 Needed for lyrics mode (vocals → Whisper)

demucs → separates the vocals from the music so Whisper can pick out the lyrics more reliably.

⚠️ Heavy dependency: installs PyTorch, a large download with a slower install.
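Demucs can be driven from its command line. A minimal sketch, assuming a Demucs 4.x install: `--two-stems vocals` splits a track into just `vocals` and `no_vocals`, and `-o` sets the output folder (default `separated`). The output layout `<out>/<model>/<track name>/vocals.wav` and the default model name `htdemucs` match recent Demucs versions but may differ in yours. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def demucs_cmd(track: str, out_dir: str = "separated") -> List[str]:
    """Build the demucs CLI call that separates only the vocal stem."""
    return ["demucs", "--two-stems", "vocals", "-o", out_dir, track]


def vocals_path(track: str, out_dir: str = "separated", model: str = "htdemucs") -> Path:
    """Where the vocal stem is expected to land (assumed Demucs 4.x layout)."""
    return Path(out_dir) / model / Path(track).stem / "vocals.wav"


def isolate_vocals(track: str, out_dir: str = "separated") -> Path:
    """Run demucs and return the path of the separated vocals, ready for Whisper."""
    subprocess.run(demucs_cmd(track, out_dir), check=True)
    path = vocals_path(track, out_dir)
    if not path.exists():
        raise FileNotFoundError(f"Demucs output not found at {path}")
    return path
```

The returned vocals file can then be passed to the same faster-whisper transcription used for regular uploads.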
