A powerful real-time meeting transcription and speaker identification system built with Python (FastAPI backend) and JavaScript (frontend). The system provides live transcription, multi-speaker detection, conversation analytics, and real-time meeting insights.
- Real-Time Transcription: Live speech-to-text using Whisper AI
- Multi-Speaker Detection: Advanced voice embeddings for speaker identification
- Voice Activity Detection: Multiple VAD methods for accurate speech detection
- Live Analytics: Real-time speaking time, word count, and participation metrics
- WebSocket Communication: Live updates without page refreshes
- Conversation Insights: Detailed meeting analytics and statistics
- Robust Audio Processing: Enhanced noise handling and audio preprocessing
- FastAPI: Modern Python web framework
- Whisper: OpenAI's speech recognition model
- librosa: Audio feature extraction
- sounddevice: Real-time audio capture
- webrtcvad: Voice activity detection
- numpy/scipy: Scientific computing
- asyncio: Asynchronous processing
- Vanilla JavaScript: No framework dependencies
- WebSocket API: Real-time communication
- Responsive Design: Works on desktop and mobile
- Chart.js: Data visualization for meeting analytics
- Python 3.8 or higher
- Node.js (optional, for development server)
- Microphone access
- Modern web browser with WebSocket support
**Clone the repository**

```bash
git clone https://github.com/yourusername/meeting-assistant.git
cd meeting-assistant
```
**Create and activate a virtual environment**

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
**Install Python dependencies**

```bash
cd backend
pip install -r requirements.txt
```
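The pinned versions live in `requirements.txt`. Based on the stack listed above, it will contain roughly the following packages (these are the usual PyPI names, not verified against the repository):

```text
fastapi
uvicorn
openai-whisper
librosa
sounddevice
webrtcvad
numpy
scipy
```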
**Create an environment file (optional)**

```bash
cp .env.example .env
# Edit .env with your configuration if needed
```
**Run the backend server**

```bash
python server.py
```

Or using uvicorn directly:

```bash
uvicorn server:app --host 0.0.0.0 --port 8001 --reload
```

The backend API will be available at `http://localhost:8001`.
**Navigate to the frontend directory**

```bash
cd frontend
```
**Serve the frontend**

Option A: Using Python's built-in server

```bash
python -m http.server 8000
```

Option B: Using Node.js (if installed)

```bash
npx serve -s . -p 8000
```

Option C: Using any static file server (nginx, Apache, etc.) with the document root pointed at the frontend directory.
**Access the application**

Open your browser and navigate to `http://localhost:8000`.
- Open the web application in your browser
- Grant microphone permissions when prompted
- Click "Start Meeting" to begin recording
- The system will automatically detect speakers and transcribe speech
- View real-time statistics and transcriptions on the dashboard
- Live Transcription: See transcriptions appear in real-time
- Speaker Detection: Different speakers are automatically identified
- Speaking Time: Monitor who's talking and for how long
- Voice Activity: Visual indicators show when speech is detected
- Click "Stop Meeting" to end the session
- View final meeting summary and analytics
- Export conversation logs (if implemented)
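Dashboard updates arrive over the WebSocket as JSON. The exact message schema is not documented here, so the sketch below assumes a hypothetical `type` field with `transcription` payloads; adapt the field names to what `/api/ws` actually sends:

```python
import json

def render_message(raw: str) -> str:
    """Turn one raw WebSocket frame into a display line.

    Assumes a hypothetical schema: {"type": "transcription",
    "speaker": "...", "text": "..."}; adjust to the real payload.
    """
    msg = json.loads(raw)
    if msg.get("type") == "transcription":
        return f'[{msg.get("speaker", "?")}] {msg.get("text", "").strip()}'
    return f'<{msg.get("type", "unknown")} event>'

# Example frame in the assumed schema:
frame = '{"type": "transcription", "speaker": "Speaker 1", "text": "Hello everyone "}'
print(render_message(frame))  # → [Speaker 1] Hello everyone
```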
```
GET  /api/health                         # Health check
POST /api/meeting/start                  # Start recording session
POST /api/meeting/stop                   # Stop recording session
GET  /api/meeting/status                 # Get current meeting status
GET  /api/meeting/conversation-insights  # Get detailed analytics
GET  /api/debug/audio                    # Debug audio system status
WS   /api/ws                             # Real-time communication endpoint
```

**Audio settings**
- Sample Rate: 16 kHz (configurable in `server.py`)
- Channels: Mono (1 channel)
- Frame Size: 50ms processing windows
- VAD Sensitivity: Multiple aggressiveness levels
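At 16 kHz mono, a 50 ms processing window corresponds to 16,000 × 0.05 = 800 samples. A minimal sketch of that bookkeeping, with a naive energy gate standing in for the real VAD (webrtcvad uses a trained classifier, not a plain threshold):

```python
SAMPLE_RATE = 16_000  # Hz, matching the default above
FRAME_MS = 50         # processing window length

def frame_size(sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS) -> int:
    """Number of samples in one processing window."""
    return sample_rate * frame_ms // 1000

def is_speech(frame: list[float], threshold: float = 0.01) -> bool:
    """Toy energy gate; the real VAD (webrtcvad) is far more robust."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

print(frame_size())  # → 800
```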
**Speaker identification**
- Similarity Threshold: 0.45 (adjustable for speaker sensitivity)
- Max Speakers: 10 concurrent speakers
- Feature Vector: 200+ dimensional voice embeddings
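Speaker matching of this kind typically compares a new voice embedding against each known speaker's profile with cosine similarity, accepting the best match above the threshold. A minimal sketch (the actual matching logic in `server.py` may differ):

```python
import math

SIMILARITY_THRESHOLD = 0.45  # value from the configuration above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_speaker(embedding: list[float], profiles: dict):
    """Return the best-matching known speaker, or None to enroll a new one."""
    best, best_score = None, SIMILARITY_THRESHOLD
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score > best_score:
            best, best_score = name, score
    return best

# Tiny 2-D vectors for illustration; real embeddings have 200+ dimensions.
profiles = {"Speaker 1": [1.0, 0.0], "Speaker 2": [0.0, 1.0]}
print(match_speaker([0.9, 0.1], profiles))  # → Speaker 1
```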
**Transcription**
- Model: Whisper base.en
- Language: English (configurable)
- Quality Filtering: Automatic low-quality segment removal
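The quality-filtering step drops low-confidence output. Whisper's result segments carry `no_speech_prob` and `avg_logprob` fields; a hedged sketch of such a filter (the cutoff values here are illustrative, not necessarily the ones `server.py` uses):

```python
def keep_segment(segment: dict,
                 max_no_speech: float = 0.6,
                 min_avg_logprob: float = -1.0) -> bool:
    """Drop segments Whisper itself marks as likely silence or low confidence.

    `no_speech_prob` and `avg_logprob` are standard fields on Whisper's
    result segments; the threshold values are illustrative.
    """
    if not segment.get("text", "").strip():
        return False                      # empty transcription
    if segment.get("no_speech_prob", 0.0) > max_no_speech:
        return False                      # probably not speech at all
    return segment.get("avg_logprob", 0.0) >= min_avg_logprob

print(keep_segment({"text": "hello", "no_speech_prob": 0.1, "avg_logprob": -0.3}))  # → True
```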
**Enable debug logging**

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```
**Hot reload during development**

```bash
uvicorn server:app --reload --host 0.0.0.0 --port 8001
```
**Testing endpoints**

```bash
curl http://localhost:8001/api/health
```
**Auto-reload with live server**

Use the VS Code Live Server extension, or any development server with auto-reload.
**WebSocket testing**

Test the WebSocket connection in the browser console:

```javascript
const ws = new WebSocket('ws://localhost:8001/api/ws');
ws.onmessage = (event) => console.log(JSON.parse(event.data));
```
**Audio not detected**
- Check microphone permissions in browser
- Verify microphone is working in other applications
- Check browser console for errors
**No speakers detected**
- Speak louder or closer to microphone
- Adjust similarity threshold in configuration
- Check VAD sensitivity settings
**WebSocket connection fails**
- Ensure backend is running on correct port
- Check firewall/proxy settings
- Verify WebSocket URL in frontend
**Transcription quality issues**
- Reduce background noise
- Ensure clear speech
- Check Whisper model installation
Use the debug endpoint to diagnose issues:
```bash
curl http://localhost:8001/api/debug/audio
```

```
meeting-assistant/
├── backend/
│   ├── server.py            # Main FastAPI application
│   ├── requirements.txt     # Python dependencies
│   └── .env.example         # Environment variables template
├── frontend/
│   ├── index.html           # Main HTML file
│   ├── app.js               # Frontend JavaScript
│   ├── styles.css           # Styling
│   └── assets/              # Static assets
└── README.md                # This file
```
- CPU Usage: Moderate during active transcription
- Memory: ~200-500MB depending on meeting length
- Latency: ~200-500ms for transcription
- Browser Support: Chrome, Firefox, Safari, Edge (latest versions)
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
- OpenAI Whisper for speech recognition
- FastAPI for the excellent web framework
- The open-source audio processing community