A powerful real-time meeting transcription and speaker identification system built with Python (FastAPI backend) and JavaScript (frontend). The system provides live transcription, multi-speaker detection, conversation analytics, and real-time meeting insights.
- Real-Time Transcription: Live speech-to-text using Whisper AI
- Multi-Speaker Detection: Advanced voice embeddings for speaker identification
- Voice Activity Detection: Multiple VAD methods for accurate speech detection
- Live Analytics: Real-time speaking time, word count, and participation metrics
- WebSocket Communication: Live updates without page refreshes
- Conversation Insights: Detailed meeting analytics and statistics
- Robust Audio Processing: Enhanced noise handling and audio preprocessing
- FastAPI: Modern Python web framework
- Whisper: OpenAI's speech recognition model
- librosa: Audio feature extraction
- sounddevice: Real-time audio capture
- webrtcvad: Voice activity detection
- numpy/scipy: Scientific computing
- asyncio: Asynchronous processing
- Vanilla JavaScript: No framework dependencies
- WebSocket API: Real-time communication
- Responsive Design: Works on desktop and mobile
- Chart.js: Data visualization for meeting analytics
- Python 3.8 or higher
- Node.js (optional, for development server)
- Microphone access
- Modern web browser with WebSocket support
**Clone the repository**

```bash
git clone https://github.com/yourusername/meeting-assistant.git
cd meeting-assistant
```
**Create and activate a virtual environment**

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
**Install Python dependencies**

```bash
cd backend
pip install -r requirements.txt
```
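The pinned versions live in `requirements.txt`. Based on the stack listed above, it will contain roughly the following packages (these are the usual PyPI names, not verified against the repository):

```text
fastapi
uvicorn
openai-whisper
librosa
sounddevice
webrtcvad
numpy
scipy
```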
**Create an environment file (optional)**

```bash
cp .env.example .env
# Edit .env with your configuration if needed
```
**Run the backend server**

```bash
python server.py
```

Or using uvicorn directly:

```bash
uvicorn server:app --host 0.0.0.0 --port 8001 --reload
```

The backend API will be available at `http://localhost:8001`.
**Navigate to the frontend directory**

```bash
cd frontend
```
**Serve the frontend**

Option A: Using Python's built-in server

```bash
python -m http.server 8000
```

Option B: Using Node.js (if installed)

```bash
npx serve -s . -p 8000
```

Option C: Using any static file server (nginx, Apache, etc.) with the document root pointed at the frontend directory.
**Access the application**

Open your browser and navigate to `http://localhost:8000`.
- Open the web application in your browser
- Grant microphone permissions when prompted
- Click "Start Meeting" to begin recording
- The system will automatically detect speakers and transcribe speech
- View real-time statistics and transcriptions on the dashboard
- Live Transcription: See transcriptions appear in real-time
- Speaker Detection: Different speakers are automatically identified
- Speaking Time: Monitor who's talking and for how long
- Voice Activity: Visual indicators show when speech is detected
- Click "Stop Meeting" to end the session
- View final meeting summary and analytics
- Export conversation logs (if implemented)
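Dashboard updates arrive over the WebSocket as JSON. The exact message schema is not documented here, so the sketch below assumes a hypothetical `type` field with `transcription` payloads; adapt the field names to what `/api/ws` actually sends:

```python
import json

def render_message(raw: str) -> str:
    """Turn one raw WebSocket frame into a display line.

    Assumes a hypothetical schema: {"type": "transcription",
    "speaker": "...", "text": "..."}; adjust to the real payload.
    """
    msg = json.loads(raw)
    if msg.get("type") == "transcription":
        return f'[{msg.get("speaker", "?")}] {msg.get("text", "").strip()}'
    return f'<{msg.get("type", "unknown")} event>'

# Example frame in the assumed schema:
frame = '{"type": "transcription", "speaker": "Speaker 1", "text": "Hello everyone "}'
print(render_message(frame))  # → [Speaker 1] Hello everyone
```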
```
GET  /api/health                         # Health check
POST /api/meeting/start                  # Start recording session
POST /api/meeting/stop                   # Stop recording session
GET  /api/meeting/status                 # Get current meeting status
GET  /api/meeting/conversation-insights  # Get detailed analytics
GET  /api/debug/audio                    # Debug audio system status
WS   /api/ws                             # Real-time communication endpoint
```

**Audio settings**
- Sample Rate: 16 kHz (configurable in `server.py`)
- Channels: Mono (1 channel)
- Frame Size: 50ms processing windows
- VAD Sensitivity: Multiple aggressiveness levels
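At 16 kHz mono, a 50 ms processing window corresponds to 16,000 × 0.05 = 800 samples. A minimal sketch of that bookkeeping, with a naive energy gate standing in for the real VAD (webrtcvad uses a trained classifier, not a plain threshold):

```python
SAMPLE_RATE = 16_000  # Hz, matching the default above
FRAME_MS = 50         # processing window length

def frame_size(sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS) -> int:
    """Number of samples in one processing window."""
    return sample_rate * frame_ms // 1000

def is_speech(frame: list[float], threshold: float = 0.01) -> bool:
    """Toy energy gate; the real VAD (webrtcvad) is far more robust."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

print(frame_size())  # → 800
```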
**Speaker identification**
- Similarity Threshold: 0.45 (adjustable for speaker sensitivity)
- Max Speakers: 10 concurrent speakers
- Feature Vector: 200+ dimensional voice embeddings
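Speaker matching of this kind typically compares a new voice embedding against each known speaker's profile with cosine similarity, accepting the best match above the threshold. A minimal sketch (the actual matching logic in `server.py` may differ):

```python
import math

SIMILARITY_THRESHOLD = 0.45  # value from the configuration above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_speaker(embedding: list[float], profiles: dict):
    """Return the best-matching known speaker, or None to enroll a new one."""
    best, best_score = None, SIMILARITY_THRESHOLD
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score > best_score:
            best, best_score = name, score
    return best

# Tiny 2-D vectors for illustration; real embeddings have 200+ dimensions.
profiles = {"Speaker 1": [1.0, 0.0], "Speaker 2": [0.0, 1.0]}
print(match_speaker([0.9, 0.1], profiles))  # → Speaker 1
```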
**Transcription**
- Model: Whisper base.en
- Language: English (configurable)
- Quality Filtering: Automatic low-quality segment removal
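The quality-filtering step drops low-confidence output. Whisper's result segments carry `no_speech_prob` and `avg_logprob` fields; a hedged sketch of such a filter (the cutoff values here are illustrative, not necessarily the ones `server.py` uses):

```python
def keep_segment(segment: dict,
                 max_no_speech: float = 0.6,
                 min_avg_logprob: float = -1.0) -> bool:
    """Drop segments Whisper itself marks as likely silence or low confidence.

    `no_speech_prob` and `avg_logprob` are standard fields on Whisper's
    result segments; the threshold values are illustrative.
    """
    if not segment.get("text", "").strip():
        return False                      # empty transcription
    if segment.get("no_speech_prob", 0.0) > max_no_speech:
        return False                      # probably not speech at all
    return segment.get("avg_logprob", 0.0) >= min_avg_logprob

print(keep_segment({"text": "hello", "no_speech_prob": 0.1, "avg_logprob": -0.3}))  # → True
```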
**Enable debug logging**

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```
**Hot reload during development**

```bash
uvicorn server:app --reload --host 0.0.0.0 --port 8001
```
**Testing endpoints**

```bash
curl http://localhost:8001/api/health
```
**Auto-reload with live server**

Use the VS Code Live Server extension, or any development server with auto-reload.
**WebSocket testing**

Test the WebSocket connection in the browser console:

```javascript
const ws = new WebSocket('ws://localhost:8001/api/ws');
ws.onmessage = (event) => console.log(JSON.parse(event.data));
```
**Audio not detected**
- Check microphone permissions in browser
- Verify microphone is working in other applications
- Check browser console for errors
**No speakers detected**
- Speak louder or closer to microphone
- Adjust similarity threshold in configuration
- Check VAD sensitivity settings
**WebSocket connection fails**
- Ensure backend is running on correct port
- Check firewall/proxy settings
- Verify WebSocket URL in frontend
**Transcription quality issues**
- Reduce background noise
- Ensure clear speech
- Check Whisper model installation
Use the debug endpoint to diagnose issues:
```bash
curl http://localhost:8001/api/debug/audio
```

```
meeting-assistant/
├── backend/
│   ├── server.py            # Main FastAPI application
│   ├── requirements.txt     # Python dependencies
│   └── .env.example         # Environment variables template
├── frontend/
│   ├── index.html           # Main HTML file
│   ├── app.js               # Frontend JavaScript
│   ├── styles.css           # Styling
│   └── assets/              # Static assets
└── README.md                # This file
```
- CPU Usage: Moderate during active transcription
- Memory: ~200-500MB depending on meeting length
- Latency: ~200-500ms for transcription
- Browser Support: Chrome, Firefox, Safari, Edge (latest versions)
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
- OpenAI Whisper for speech recognition
- FastAPI for the excellent web framework
- The open-source audio processing community