A deep learning project that detects and classifies facial emotions in real-time using a custom Convolutional Neural Network (CNN) and YuNet face detector.
## Table of Contents

- Overview
- Features
- Detected Emotions
- Project Structure
- Model Architecture
- Installation
- Usage
- Dataset
- Performance
- Requirements
- Configuration
- Limitations
## Overview

This project implements a complete pipeline for facial emotion recognition:
- Training: Train a custom CNN on labeled emotion datasets
- Face Detection: Use YuNet (OpenCV DNN) for robust face detection
- Real-Time Inference: Classify emotions from webcam feed
## Features

- Custom CNN architecture with 4 convolutional blocks
- Real-time emotion detection using webcam
- YuNet face detector for accurate face localization
- CUDA acceleration support
- Training visualization (loss, accuracy curves)
- Data augmentation for better generalization
- Model checkpointing with best validation accuracy
## Detected Emotions

The model recognizes 7 different emotions:
| Emotion | Description |
|---|---|
| Angry | Anger, frustration |
| Disgust | Disgust, distaste |
| Fear | Fear, anxiety |
| Happy | Happiness, joy |
| Neutral | Neutral expression |
| Sad | Sadness, sorrow |
| Surprise | Surprise, shock |
## Project Structure

```
face-emotion-recognition/
│
├── model/
│   ├── __init__.py
│   ├── face_emotion_recognition_cnn.py   # CNN architecture
│   └── utils.py                          # Helper functions (load, preprocess)
│
├── checkpoints/
│   ├── emotion_recognition.pth           # Trained emotion model weights
│   └── yunet.onnx                        # YuNet face detection model
│
├── dataset/
│   └── images/
│       ├── train/                        # Training images (80%)
│       │   ├── angry/
│       │   ├── disgust/
│       │   ├── fear/
│       │   ├── happy/
│       │   ├── neutral/
│       │   ├── sad/
│       │   └── surprise/
│       └── validation/                   # Validation images (20%)
│           ├── angry/
│           └── ...
│
├── emotion_recognition_cnn_pytorch.ipynb # Training notebook
├── main.py                               # Real-time inference script
├── requirements.txt                      # Python dependencies
└── Readme.md                             # Project documentation
```
## Model Architecture

The model consists of:
**Convolutional Blocks (4 blocks):**
- Block 1: Conv2D(1→64) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
- Block 2: Conv2D(64→128) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
- Block 3: Conv2D(128→512) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
- Block 4: Conv2D(512→512) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
**Classifier (3 fully connected layers):**
- FC1: 512×3×3 → 256 + BatchNorm + ReLU + Dropout(0.25)
- FC2: 256 → 512 + BatchNorm + ReLU + Dropout(0.25)
- FC3: 512 → 7 (output classes)
- **Input:** Grayscale images of size 48×48
- **Output:** 7-class probability distribution
- **Total Parameters:** ~6.5M
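The architecture above can be sketched in PyTorch as follows. This is a minimal reconstruction from the block descriptions, not the exact code in `model/face_emotion_recognition_cnn.py` — in particular, 3×3 kernels with padding 1 are an assumption:

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """Sketch of the 4-block CNN described above (48x48 grayscale input)."""

    def __init__(self, num_classes: int = 7):
        super().__init__()

        def block(in_ch: int, out_ch: int) -> nn.Sequential:
            # Conv + BatchNorm + ReLU + MaxPool + Dropout; each block halves H and W
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.Dropout(0.25),
            )

        # 48 -> 24 -> 12 -> 6 -> 3 spatial resolution across the four blocks
        self.features = nn.Sequential(
            block(1, 64), block(64, 128), block(128, 512), block(512, 512)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 3 * 3, 256), nn.BatchNorm1d(256),
            nn.ReLU(inplace=True), nn.Dropout(0.25),
            nn.Linear(256, 512), nn.BatchNorm1d(512),
            nn.ReLU(inplace=True), nn.Dropout(0.25),
            nn.Linear(512, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

A batch of shape `(N, 1, 48, 48)` produces logits of shape `(N, 7)`; apply `softmax` to obtain the probability distribution.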
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/face-emotion-recognition.git
   cd face-emotion-recognition
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Download the YuNet face detector:

   - Download `face_detection_yunet_2023mar.onnx` from the OpenCV Zoo
   - Rename it to `yunet.onnx`
   - Place it in the `checkpoints/` directory
5. Set up the trained emotion model:

   - Train your own model using the notebook (see Training)
   - Or download pre-trained weights
   - Place `emotion_recognition.pth` in the `checkpoints/` directory
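Once the weights are in place, loading them for inference looks roughly like this. This is a sketch assuming the checkpoint stores a plain `state_dict`; the helper name `load_emotion_model` is illustrative:

```python
import torch

def load_emotion_model(model: torch.nn.Module, path: str,
                       device: str = "cpu") -> torch.nn.Module:
    """Load checkpoint weights (assumed to be a plain state_dict) into `model`."""
    state = torch.load(path, map_location=device)  # map GPU-saved weights onto `device`
    model.load_state_dict(state)
    return model.to(device).eval()  # eval mode: BatchNorm/Dropout in inference behavior
```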
## Usage

### Training

1. **Prepare the dataset:**

   - Organize images into folders by emotion class
   - Split them into `train/` and `validation/` directories
   - Name each subdirectory after its emotion class

2. **Open the training notebook:**

   ```bash
   jupyter notebook emotion_recognition_cnn_pytorch.ipynb
   ```

3. **Configure the training parameters:**

   ```python
   BATCH_SIZE = 128
   EPOCHS = 60
   LEARNING_RATE = 0.001
   IMG_SIZE = (48, 48)
   ```
4. **Run all cells:**

   - Data loading and visualization
   - Model definition
   - Training loop with validation
   - Performance visualization (loss/accuracy curves)
   - Model checkpoint saving
5. **Monitor training:**

   - Training and validation loss
   - Training and validation accuracy
   - The best model is saved automatically
### Real-Time Inference

Run the main script to start real-time emotion detection:

```bash
python main.py
```

**Controls:**

- Press `q` to quit the application

**What happens:**

- Opens your webcam
- Detects faces using YuNet
- Classifies the emotion of each detected face
- Displays bounding boxes with emotion labels
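The per-face classification step inside that loop can be sketched as follows. The helper names `preprocess_face` and `predict_emotion` are illustrative, and the crop is assumed to be already grayscale and resized to 48×48 (e.g., via `cv2.resize` on the YuNet bounding box):

```python
import numpy as np
import torch

CLASS_NAMES = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def preprocess_face(crop: np.ndarray) -> torch.Tensor:
    """48x48 grayscale uint8 crop -> normalized (1, 1, 48, 48) float tensor."""
    x = torch.from_numpy(crop).float() / 255.0  # scale to [0, 1]
    x = (x - 0.5) / 0.5                         # match training normalization (mean=0.5, std=0.5)
    return x.unsqueeze(0).unsqueeze(0)          # add batch and channel dimensions

def predict_emotion(model: torch.nn.Module, crop: np.ndarray) -> str:
    """Run one face crop through the model (already in eval mode), return the label."""
    with torch.no_grad():
        logits = model(preprocess_face(crop))
    return CLASS_NAMES[int(logits.argmax(dim=1))]
```

The predicted label string is what gets drawn next to each face's bounding box.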
## Dataset

The project uses a dataset organized in the following structure:

```
dataset/images/
├── train/            # 80% of total data
│   ├── angry/
│   ├── disgust/
│   ├── fear/
│   ├── happy/
│   ├── neutral/
│   ├── sad/
│   └── surprise/
└── validation/       # 20% of total data
    └── (same structure)
```
**Recommended Datasets:** datasets in the FER-2013 style (48×48 grayscale images across the seven classes above) fit this layout directly.
**Data Augmentation (Training):**
- Random horizontal flip
- Random rotation (±12°)
- Grayscale conversion
- Normalization (mean=0.5, std=0.5)
**Training Configuration:**

| Parameter | Value |
|---|---|
| Batch Size | 128 |
| Epochs | 60 |
| Optimizer | Adam |
| Learning Rate | 0.001 |
| Scheduler | ReduceLROnPlateau |
| Loss Function | CrossEntropyLoss |
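A minimal training loop matching this configuration, including the best-validation-accuracy checkpointing described under Training, might look like the sketch below. `train` is an illustrative helper, not the notebook's exact code:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, train_loader, val_loader, epochs: int = 60,
          lr: float = 1e-3, device: str = "cpu",
          ckpt_path: str = "checkpoints/emotion_recognition.pth") -> float:
    """Adam + ReduceLROnPlateau + CrossEntropyLoss, checkpoint on best val accuracy."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max")
    best_acc = -1.0  # ensures the first epoch always writes a checkpoint

    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        # Validation accuracy drives both the LR scheduler and checkpointing
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        val_acc = correct / total
        scheduler.step(val_acc)  # reduce LR when validation accuracy plateaus
        if val_acc > best_acc:   # keep only the best-validation weights
            best_acc = val_acc
            torch.save(model.state_dict(), ckpt_path)
    return best_acc
```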
## Performance

- **Minimum:** CPU with 4GB RAM
- **Recommended:** NVIDIA GPU with CUDA support (8GB+ VRAM)
- **Inference Speed:**
  - GPU (CUDA): ~30-60 FPS
## Requirements

```
torch>=2.0.0
torchvision>=0.15.0
opencv-python>=4.8.0
tqdm>=4.65.0
numpy>=1.24.0
matplotlib>=3.7.0
scikit-learn>=1.3.0
```

**Python Version:** 3.8 or higher
## Configuration

Inference parameters (in `main.py`):

```python
# Image size (must match training)
IMG_SIZE = (48, 48)

# Device selection
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Class names (order must match training dataset)
CLASS_NAMES = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

# YuNet parameters
score_threshold = 0.6  # Face detection confidence threshold
nms_threshold = 0.3    # Non-maximum suppression threshold
```

Training parameters (in the notebook):

```python
BATCH_SIZE = 128       # Adjust based on GPU memory
EPOCHS = 60            # Number of training epochs
LEARNING_RATE = 0.001  # Initial learning rate
SEED = 42              # Random seed for reproducibility
```

## Limitations

- **Face Visibility:** Requires clearly visible faces (a frontal view works best)
- **Lighting:** Performance degrades in poor lighting conditions
- **Occlusion:** Partially occluded faces may produce incorrect predictions
- **Distance:** Faces too far from the camera may not be detected
**Third-Party Models:**

- YuNet: from the OpenCV Zoo (subject to its own license)
## Acknowledgments

- OpenCV team for the YuNet face detector
- PyTorch team for the deep learning framework
- Dataset contributors for facial emotion datasets
For questions or feedback, please open an issue on GitHub.
If you find this project helpful, please consider giving it a star!