A production-ready Flask backend that transcribes meeting audio, translates speech to Hindi, and generates AI-powered meeting summaries using Vosk and Google Gemini. The core AI engine of the Shulker meeting ecosystem.
💡 Made by Vasu Goel
A production-ready Python + Flask backend for speech processing and summarization of uploaded meeting audio:
- Transcribes audio files to English text using Vosk (offline, no API cost)
- Translates the transcription to Hindi via Google Translate (through the unofficial googletrans library)
- Generates structured meeting summaries with key points and action items using Gemini
- Combines transcription + summarization in a single endpoint
- Converts uploaded audio in any FFmpeg-supported format to WAV before processing
- Deployed via Docker on Render
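The FFmpeg conversion step can be sketched as follows. This is a minimal sketch, assuming the backend shells out to the ffmpeg CLI and targets the 16 kHz, mono, 16-bit PCM WAV that the bundled Vosk model expects; the helper names build_ffmpeg_cmd and convert_to_wav are hypothetical and not taken from api.py.

```python
import subprocess
from typing import List

# Assumed target format: 16 kHz mono 16-bit PCM, matching the 16000 Hz
# sample rate of vosk-model-small-en-us-0.15. Not confirmed from api.py.
TARGET_RATE = 16000


def build_ffmpeg_cmd(src: str, dst: str, rate: int = TARGET_RATE) -> List[str]:
    """Return an ffmpeg argument list that re-encodes src as a mono PCM WAV (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input: any container/codec ffmpeg can decode
        "-ac", "1",              # downmix to a single channel
        "-ar", str(rate),        # resample to the model's sample rate
        "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM samples
        dst,                     # the .wav extension selects the WAV muxer
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if conversion fails."""
    subprocess.run(
        build_ffmpeg_cmd(src, dst),
        check=True,                 # non-zero exit -> CalledProcessError
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,     # keep ffmpeg's error text on the exception
    )
```

A caller could map CalledProcessError (and FileNotFoundError, raised when the ffmpeg binary is missing) to the documented 400 "audio conversion failed" response.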
| Category | Technologies Used |
|---|---|
| Backend | Python, Flask 3.0.3 |
| Speech Recognition | Vosk 0.3.45 (vosk-model-small-en-us-0.15) |
| Audio Processing | FFmpeg, Python wave module, Vosk KaldiRecognizer |
| Translation | googletrans 4.0.0-rc1 |
| AI Summarization | Google Gemini (gemini-flash-latest) |
| CORS | Flask-Cors |
| Environment | python-dotenv |
| Containerization | Docker (python:3.12.4-slim) |
| Deployment | Render (Docker web service) |
| Production Server | Gunicorn 23.0.0 |
Shulker_AI/
├── api.py # Flask app - routes, Vosk recognition, Gemini summarization
├── requirements.txt # Python dependencies
├── dockerfile # Docker build config
├── render.yaml # Render deployment config
├── Procfile # Gunicorn process definition
├── runtime.txt # Python 3.12.4
├── .env # API key (not committed)
├── .gitignore # Ignores venv, .env, __pycache__
├── README.md # Project documentation
└── vosk-model-small-en-us-0.15/ # Offline Vosk English model
├── am/ # Acoustic model
├── graph/ # Language graph (FST)
├── ivector/ # Speaker adaptation vectors
└── conf/ # MFCC and model config
1. Clone the repository
git clone https://github.com/vasug27/Shulker_AI.git
cd Shulker_AI
2. Create and activate a virtual environment
python -m venv myenv
# macOS/Linux
source myenv/bin/activate
# Windows
myenv\Scripts\activate
3. Install FFmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
# Windows
Download from https://ffmpeg.org/download.html
4. Install dependencies
pip install -r requirements.txt
5. Configure environment
Create a .env file in the root directory:
GEMINI_API_KEY=your_google_gemini_api_key_here
Get your API key from Google AI Studio
6. Start the server
# Development
python api.py
# Production
gunicorn api:app --bind 0.0.0.0:5000
Server runs at http://localhost:5000
Or run with Docker
# Build the image
docker build -t shulker-ai .
# Run the container
docker run -p 5000:5000 --env-file .env shulker-ai
Server runs at http://localhost:5000
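The GEMINI_API_KEY from step 5 reaches the app either through the .env file or, under Docker, through --env-file. A minimal sketch of reading it at startup, assuming python-dotenv's load_dotenv() (listed in the tech stack) is used; get_gemini_key is a hypothetical helper name, and failing fast on a missing key is a design choice rather than confirmed behavior of api.py.

```python
import os

try:
    # python-dotenv copies KEY=value pairs from a .env file into os.environ.
    from dotenv import load_dotenv
except ImportError:  # without python-dotenv, rely on real environment variables only
    load_dotenv = None


def get_gemini_key() -> str:
    """Return GEMINI_API_KEY, loading .env first when python-dotenv is available (hypothetical helper)."""
    if load_dotenv is not None:
        # load_dotenv() does not override variables that are already set,
        # so a key injected by Docker (--env-file) or Render takes precedence.
        load_dotenv()
    key = os.getenv("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Raising at startup surfaces a missing key immediately instead of as a 500 on the first /summarize call.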
🌐 Live: https://shulker-ai.onrender.com
This repo includes vosk-model-small-en-us-0.15 - a lightweight offline English speech recognition model.
| Property | Detail |
|---|---|
| Size | Small (mobile-optimized) |
| Sample Rate | 16000 Hz |
| Word Error Rate | 10.38% (TED-LIUM) / 9.85% (LibriSpeech) |
| Speed | Real-time factor of ~0.11 on desktop (roughly 9x faster than real time) |
| Latency | ~0.15s right context |
No internet required for transcription - Vosk runs fully offline.
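Offline recognition with the bundled model follows the usual Vosk pattern: feed fixed-size chunks of PCM frames to a KaldiRecognizer and collect results as utterances complete. The sketch below assumes that pattern; the function names transcribe_wav and make_recognizer and the 4000-frame chunk size are illustrative choices, not taken from api.py. The loop accepts any object with Vosk's AcceptWaveform/Result/FinalResult methods, so the vosk import is deferred to make_recognizer.

```python
import json
import wave
from typing import List, Tuple


def transcribe_wav(source, recognizer, chunk_frames: int = 4000) -> Tuple[List[str], str]:
    """Stream a mono 16-bit PCM WAV (path or binary file object) through a Vosk-style recognizer.

    Returns (segments, full_text): each completed utterance reported by
    Result(), plus whatever FinalResult() flushes at the end of input.
    """
    segments: List[str] = []
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("expected mono 16-bit PCM WAV (convert with FFmpeg first)")
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns true when Vosk detects an utterance boundary.
            if recognizer.AcceptWaveform(data):
                text = json.loads(recognizer.Result()).get("text", "")
                if text:
                    segments.append(text)
    # Flush the last, possibly incomplete, utterance.
    tail = json.loads(recognizer.FinalResult()).get("text", "")
    if tail:
        segments.append(tail)
    return segments, " ".join(segments)


def make_recognizer(model_dir: str = "vosk-model-small-en-us-0.15", rate: int = 16000):
    """Build a real Vosk recognizer; imported lazily so transcribe_wav has no hard dependency."""
    from vosk import KaldiRecognizer, Model  # pip install vosk

    return KaldiRecognizer(Model(model_dir), rate)
```

Usage would be along the lines of segments, english = transcribe_wav("meeting.wav", make_recognizer()). Whether the API's "partials" field holds completed segments like these or PartialResult() snapshots is not stated in this README.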
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Health check - lists available routes |
| POST | /recognize | Transcribe audio file to English + Hindi |
| POST | /summarize | Generate meeting summary from plain text |
| POST | /recognize-and-summarize | Transcribe audio and summarize in one call |
POST /recognize
- Content-Type: multipart/form-data
- Body: audio file (any format; converted to WAV via FFmpeg)
- Response:
{
  "partials": ["partial transcript chunks"],
  "final": {
    "english": "full transcribed text",
    "hindi": "हिंदी अनुवाद"
  }
}
- Errors: 400 no file uploaded · 400 audio conversion failed
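A stdlib-only client for POST /recognize might look like the sketch below. The multipart field name "file" is an assumption (this README does not name the form field), and build_recognize_request and recognize are hypothetical helper names.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_recognize_request(base_url: str, audio_path: str, field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST for /recognize (the field name is an assumption)."""
    boundary = uuid.uuid4().hex
    filename = os.path.basename(audio_path)
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    with open(audio_path, "rb") as fh:
        payload = fh.read()
    # One file part, framed by the boundary, followed by the closing boundary.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {ctype}\r\n\r\n".encode(),
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/recognize",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def recognize(base_url: str, audio_path: str) -> dict:
    """Send the request and decode the JSON body ({"partials": [...], "final": {...}})."""
    req = build_recognize_request(base_url, audio_path)
    # Long recordings can take a while to transcribe, hence the generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, recognize("http://localhost:5000", "meeting.mp3")["final"]["english"] would return the English transcript if the field name matches the server's.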
POST /summarize
- Content-Type: text/plain
- Body: raw meeting transcript text
- Response:
{
  "summary": "Short paragraph + numbered action items",
  "input_length": 1024
}
- Errors: 400 empty body · 500 Gemini generation failed
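The summarization step can be sketched as a prompt builder plus a pluggable generate callable. Only the model name gemini-flash-latest and the response keys come from this README; the prompt wording, the helper names, reading input_length as the transcript's character count, and the use of the google-generativeai SDK (genai.configure, GenerativeModel.generate_content) are all assumptions for illustration.

```python
from typing import Callable, Dict


def build_summary_prompt(transcript: str) -> str:
    """Hypothetical prompt asking for a short paragraph plus numbered action items."""
    return (
        "Summarize the following meeting transcript in one short paragraph, "
        "then list the action items as a numbered list.\n\n"
        f"Transcript:\n{transcript.strip()}"
    )


def summarize(transcript: str, generate: Callable[[str], str]) -> Dict[str, object]:
    """Return the documented /summarize payload; raises ValueError on an empty body (-> HTTP 400)."""
    if not transcript.strip():
        raise ValueError("empty body")
    summary = generate(build_summary_prompt(transcript))
    # Assumption: input_length is the character length of the submitted text.
    return {"summary": summary, "input_length": len(transcript)}


def gemini_generate(prompt: str, api_key: str, model_name: str = "gemini-flash-latest") -> str:
    """One possible Gemini backend via the google-generativeai package (assumed; imported lazily)."""
    import google.generativeai as genai  # pip install google-generativeai

    genai.configure(api_key=api_key)
    response = genai.GenerativeModel(model_name).generate_content(prompt)
    return response.text
```

Injecting generate keeps the payload logic testable without network access; in the app it might be wired as summarize(text, lambda p: gemini_generate(p, api_key=key)), with any exception from the Gemini call mapped to the documented 500.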
POST /recognize-and-summarize
- Content-Type: multipart/form-data
- Body: audio file
- Response:
{
  "recognized_text": "full transcribed text",
  "summary": "Short paragraph + numbered action items"
}
- Errors: 400 no file uploaded · 400 audio conversion failed
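The combined endpoint chains conversion, transcription, and summarization. Below is a framework-agnostic sketch of that flow and its documented 400 cases; the function name, the injected convert/transcribe/summarize callables, and the {"error": ...} body shape are hypothetical, and the (body, status) tuple mirrors what a Flask view is allowed to return.

```python
import subprocess
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[Dict[str, str], int]


def recognize_and_summarize(
    audio: Optional[bytes],
    convert: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], str],
) -> Response:
    """Transcribe uploaded audio, then summarize it, mirroring the documented error cases."""
    if not audio:
        # Hypothetical error body; the README only documents the status code.
        return {"error": "no file uploaded"}, 400
    try:
        wav = convert(audio)  # e.g. an FFmpeg call that raises on failure
    except (subprocess.CalledProcessError, ValueError):
        return {"error": "audio conversion failed"}, 400
    text = transcribe(wav)
    return {"recognized_text": text, "summary": summarize(text)}, 200
```

Passing the steps in as callables keeps the error contract checkable without FFmpeg, Vosk, or a Gemini key.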
- Fork the repository
- Create a new branch (feature/new-feature)
- Commit changes & push
- Open a PR 🎉
Vasu Goel
Built for Shulker (AI Video Conferencing Assistant) - a quiz generation microservice extending this repo lives at Shulker_RAG.