**Video Quality Requirements:**
- Clear frontal face with visible mouth region
- Consistent lighting with minimal shadows
- Limited head movement for best accuracy
- Input videos are expected to be cropped to the mouth region (46x140 pixels) by default
- Optimal frame rate: 25fps (the model was trained at this rate)
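As a quick sanity check before uploading a clip, you can inspect its frame rate and dimensions with OpenCV (a minimal sketch; `sample.mpg` is a placeholder path):

```python
import cv2

cap = cv2.VideoCapture("sample.mpg")  # placeholder path
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print(f"fps={fps:.1f}, size={width}x{height}")
# If the clip is not 25fps, re-encode it first, e.g.:
#   ffmpeg -i input.mpg -r 25 output.mpg
```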
**Performance Considerations:**
- GPU acceleration significantly improves processing speed
- For CPU-only environments, expect slower inference times
- Batch processing is more efficient for multiple videos
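A quick way to confirm whether TensorFlow can actually see a GPU before running inference:

```python
import tensorflow as tf

# An empty list means inference will fall back to the (slower) CPU path.
gpus = tf.config.list_physical_devices("GPU")
print("Available GPUs:", gpus)

# Optional: stop TensorFlow from reserving all GPU memory up front.
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```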
**Limitations:**
- Vocabulary is limited to words from the training dataset
- English only in the current implementation
- Reduced accuracy with extreme facial angles or poor lighting
This project builds upon research and implementations from:
- *LipNet: End-to-End Sentence-level Lipreading* (original research paper)
MIT License © 2025
Contributions are welcome! Please feel free to submit a Pull Request or open an issue for bugs, questions, or feature requests.

# LipNet Transcription Web Application

A deep learning application that transcribes silent videos by reading the lips of people speaking English, using a custom LipNet architecture.
This application performs automatic lip-reading transcription on silent videos using a 3D CNN + Bidirectional LSTM architecture. The system processes video frames to recognize spoken words without audio, making it useful for accessibility, silent video understanding, and situations where audio is unavailable.
- 🎥 Upload and process silent `.mpg` video files
- 🤖 Advanced lip-reading with a 3D CNN + Bidirectional LSTM architecture
- 📋 Character-level transcription with CTC loss optimization
- 🔄 Frame-by-frame video processing pipeline
- ⚠️ Robust error handling for various input scenarios
- Python 3.8+
- Deep Learning Framework: TensorFlow 2.x
- Computer Vision: OpenCV for video processing
- Web Framework: Flask for deployment interface
- Data Processing: NumPy for numerical operations
- GPU Support: CUDA-compatible GPU recommended for faster inference
Additional dependencies:
- matplotlib (visualization)
- imageio (additional image processing)
- gdown (for model downloading)
```bash
pip install -r requirements.txt
```

The LipNet data pipeline includes several key preprocessing steps:
1. **Video Frame Extraction:**
   - Extract frames at 25fps from `.mpg` videos
   - Convert frames to grayscale for simplified processing
   - Crop to the mouth region (rows 190-236, columns 80-220)
   - Apply mean and standard deviation normalization
2. **Text Processing:**
   - Character-level tokenization with a vocabulary of lowercase letters, numbers, and special characters
   - Convert between text and numerical representations using StringLookup layers
   - Handle alignment files that map frames to phonetic sequences
3. **Data Augmentation:**
   - Batch processing with padded sequences
   - Dataset shuffling and prefetching for training efficiency
The pipeline uses TensorFlow's tf.data API for efficient data loading and preprocessing, with tf.py_function wrappers to integrate custom Python functions into the TensorFlow graph.
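A minimal sketch of this pipeline, assuming the crop coordinates and vocabulary described above; the helper names (`load_alignments`), file paths, and the exact special-character set are illustrative assumptions, not the repo's actual code:

```python
import cv2
import tensorflow as tf

# Character vocabulary (the exact special-character set is an assumption).
vocab = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")
char_to_num = tf.keras.layers.StringLookup(vocabulary=vocab, oov_token="")
num_to_char = tf.keras.layers.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), oov_token="", invert=True)

def load_video(path):
    """Grayscale, mouth-crop (rows 190-236, cols 80-220), and normalize frames."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = tf.image.rgb_to_grayscale(frame)
        frames.append(frame[190:236, 80:220, :])        # -> (46, 140, 1)
    cap.release()
    frames = tf.cast(tf.stack(frames), tf.float32)
    return (frames - tf.math.reduce_mean(frames)) / tf.math.reduce_std(frames)

def load_data(path):
    # load_alignments is a hypothetical helper that parses an alignment
    # file into a sequence of character ids via char_to_num.
    path = path.numpy().decode("utf-8")
    return load_video(path), load_alignments(path)

def mappable(path):
    # tf.py_function bridges the plain-Python loaders into the tf.data graph.
    return tf.py_function(load_data, [path], (tf.float32, tf.int64))

dataset = (
    tf.data.Dataset.list_files("data/s1/*.mpg")          # assumed data layout
    .shuffle(500)
    .map(mappable)
    .padded_batch(2, padded_shapes=([75, None, None, None], [40]))
    .prefetch(tf.data.AUTOTUNE)
)
```

The `padded_batch` shapes mirror the 75-frame inputs and padded label sequences; adjust them to the dataset actually used.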
The LipNet architecture combines spatial and temporal processing:
Input Video Frames → 3D CNNs → Bidirectional LSTMs → Dense Output → CTC Decoder
1. **3D Convolutional Layers:**
   - Input shape: (75, 46, 140, 1), i.e. 75 frames of 46x140 grayscale images
   - Three convolutional blocks with filter depths of 128, 256, and 75
   - Each block: Conv3D + ReLU + MaxPool3D
   - Spatial downsampling through MaxPool3D operations
2. **Recurrent Layers:**
   - Reshape output to sequence format (75 timesteps)
   - Two Bidirectional LSTM layers (128 units each)
   - Dropout (0.5) between LSTM layers for regularization
3. **Output Layer:**
   - Dense layer with softmax activation
   - Output size matches the vocabulary plus a blank character (for CTC)
4. **Loss Function:**
   - Connectionist Temporal Classification (CTC) loss
   - Handles variable-length sequence alignment without exact frame-level labels
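Putting these pieces together, a minimal Keras sketch of the architecture as described above (the kernel sizes, pooling configuration, and the assumed vocabulary size of 40 are illustrative, not pulled from the repo):

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 40  # assumed: characters + OOV token, matching the pipeline sketch

model = tf.keras.Sequential([
    layers.Input(shape=(75, 46, 140, 1)),

    # Three Conv3D blocks with filter depths 128 -> 256 -> 75; pooling only
    # over the spatial dimensions so the 75-frame time axis is preserved.
    layers.Conv3D(128, 3, padding="same", activation="relu"),
    layers.MaxPool3D((1, 2, 2)),
    layers.Conv3D(256, 3, padding="same", activation="relu"),
    layers.MaxPool3D((1, 2, 2)),
    layers.Conv3D(75, 3, padding="same", activation="relu"),
    layers.MaxPool3D((1, 2, 2)),

    # Flatten the spatial dims into a 75-step feature sequence for the LSTMs.
    layers.TimeDistributed(layers.Flatten()),

    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Dropout(0.5),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Dropout(0.5),

    # One unit per vocabulary entry plus the CTC blank.
    layers.Dense(VOCAB_SIZE + 1, activation="softmax"),
])
```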
The model is trained with:
- Adam optimizer with 0.0001 learning rate
- Learning rate scheduling (exponential decay after 30 epochs)
- Checkpoint saving after each epoch
- Custom callback to monitor transcription quality during training
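A sketch of that training setup, assuming the model and dataset from the sketches above; the CTC loss wrapper is a common Keras formulation and the checkpoint path is illustrative:

```python
import tensorflow as tf

def ctc_loss(y_true, y_pred):
    # Keras CTC batch cost; sequence lengths are derived from the padded shapes.
    batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_len = tf.cast(tf.shape(y_pred)[1], "int64") * tf.ones((batch_len, 1), "int64")
    label_len = tf.cast(tf.shape(y_true)[1], "int64") * tf.ones((batch_len, 1), "int64")
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_len, label_len)

def scheduler(epoch, lr):
    # Hold the learning rate flat for 30 epochs, then decay it exponentially.
    return lr if epoch < 30 else lr * tf.math.exp(-0.1)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss=ctc_loss)

callbacks = [
    tf.keras.callbacks.ModelCheckpoint("models/checkpoint.weights.h5",  # assumed path
                                       save_weights_only=True),
    tf.keras.callbacks.LearningRateScheduler(scheduler),
    # The custom transcription-monitoring callback mentioned above is omitted here.
]
# model.fit(dataset, epochs=100, callbacks=callbacks)
```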
1. **Clone this repository:**

   ```bash
   git clone https://github.com/abdullaharif381/LipReader.git
   cd LipReader
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Place your LipNet model in the models folder:** save the `lipnet.keras` model in `models/`

4. **Run the Flask app:**

   ```bash
   flask run
   ```

5. **Access the application:** open your browser and navigate to http://127.0.0.1:5000

6. **Using the application:**
   - Upload a silent `.mpg` video file that clearly shows a person's lips
   - The model processes the video frames and predicts the spoken text
   - Results are displayed in the web interface
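For reference, a minimal sketch of the prediction path behind the web interface; the route, form field name, upload path, and the `load_video`/`num_to_char` helpers from the pipeline sketch above are illustrative assumptions, not the repo's actual code:

```python
import tensorflow as tf
from flask import Flask, request, render_template

app = Flask(__name__)
# compile=False skips the custom CTC loss, which is not needed for inference.
model = tf.keras.models.load_model("models/lipnet.keras", compile=False)

@app.route("/", methods=["GET", "POST"])
def index():
    text = None
    if request.method == "POST":
        f = request.files["video"]               # assumed form field name
        f.save("uploads/clip.mpg")               # assumes an uploads/ directory
        frames = load_video("uploads/clip.mpg")  # (75, 46, 140, 1)
        yhat = model.predict(tf.expand_dims(frames, axis=0))
        # Greedy CTC decoding, then map character ids back to text.
        decoded, _ = tf.keras.backend.ctc_decode(yhat, input_length=[75], greedy=True)
        text = tf.strings.reduce_join(num_to_char(decoded[0][0])).numpy().decode("utf-8")
    return render_template("index.html", transcription=text)
```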
```
LipReader/
│
├── app.py                          # Main Flask application
├── models/
│   └── lipnet.keras                # your model goes here
├── templates/
│   └── index.html                  # HTML front-end
├── static/
│   ├── css/
│   │   └── style.css               # Custom CSS styles
│   └── js/
│       └── script.js               # JavaScript functionality
├── notebooks/
│   ├── syncvsr-preprocessing.ipynb
│   ├── syncvsr-training.ipynb
│   └── lipnet-training.ipynb
└── requirements.txt                # Python dependencies
```