🎥 NVIDIA Video Classification Project

A deep learning–based multi-class video classification system built end-to-end to understand both spatial and temporal patterns in video data.
This project was developed as Project-1 (Industry-Sponsored by NVIDIA) and trained on NVIDIA GPU servers.

📌 Overview

This system classifies videos into four content categories:

Animation
Gaming
Natural Content
Flat Content

Unlike image classification, video understanding requires modeling motion, temporal dependencies, and long-range context.
To address this, we designed a two-stage architecture combining CNNs, Bi-LSTMs, and self-attention.

🧠 Key Highlights

~93% classification accuracy on the test set (~95% with Test-Time Augmentation)
End-to-end system built from scratch (no plug-and-play repositories)
CNN + Bi-LSTM + Multi-Head Attention architecture
Model ensembling and Test-Time Augmentation (TTA) for robustness
Trained on NVIDIA A100 GPU (MIG partition)

🏗️ System Architecture

Stage 1: Spatial Feature Extraction

Pretrained CNN backbones:
- ResNet-50 / ResNet-101
- EfficientNet-V2-S / EfficientNet-V2-M
Extracts frame-level semantic features
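Feature dimension: 1280 (the native pooled output size of EfficientNet-V2-S/M; ResNet-50/101 pooled features are natively 2048-dimensional)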

Stage 2: Temporal Modeling

Input projection with Layer Normalization
4-layer Bidirectional LSTM for temporal sequence learning
Multi-Head Self-Attention to focus on informative video segments
Attention pooling for sequence aggregation
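The attention-pooling step listed above can be illustrated in plain Python. This is only a minimal sketch under stated assumptions, not the project's implementation: the function name `attention_pool`, the scoring vector `w`, and the toy 2-dimensional features are hypothetical, and in the real model the scores would presumably come from a learned layer applied to the Bi-LSTM / self-attention outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sequence, w):
    """Collapse a sequence of feature vectors into a single vector.

    sequence: list of T vectors, each of length D (e.g. per-frame outputs).
    w: hypothetical learned scoring vector of length D (an assumption).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per time step.
    scores = [sum(x_i * w_i for x_i, w_i in zip(x, w)) for x in sequence]
    weights = softmax(scores)
    dim = len(sequence[0])
    # Weighted sum over time for each feature dimension.
    pooled = [
        sum(weights[t] * sequence[t][d] for t in range(len(sequence)))
        for d in range(dim)
    ]
    return pooled, weights


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence: T=3, D=2
pooled, weights = attention_pool(frames, w=[1.0, 1.0])
```

Time steps with higher scores receive larger weights, so the pooled vector is dominated by the segments the model considers most informative, rather than being a plain average over all frames.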

📊 Dataset

Source: YouTube-8M
Total videos: ~4,000 (a subset of YouTube-8M)
Categories: 4 main classes with 46 subcategories
Split:
- Train: 70%
- Validation: 20%
- Test: 10%
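A 70/20/10 split like the one above is often done per class so that each category keeps the same proportions in every subset. The sketch below shows one way to do that; whether the project stratifies its split is an assumption, and the function name `stratified_split`, the seed, and the toy label list are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, percents=(70, 20, 10), seed=0):
    """Split sample indices into train/val/test, class by class.

    labels: list of class labels, one per video.
    percents: integer percentages for train and validation; the remainder
        of each class goes to test.
    Returns three lists of indices.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * percents[0] // 100
        n_val = len(indices) * percents[1] // 100
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Toy example (assumed sizes): 4 classes x 100 videos each.
classes = ["Animation", "Gaming", "Natural Content", "Flat Content"]
labels = [c for c in classes for _ in range(100)]
train_idx, val_idx, test_idx = stratified_split(labels)
```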

⚙️ Training & Optimization

Framework: PyTorch
Loss Function: Focal Loss with Label Smoothing
Optimizer: AdamW
Learning Rate Scheduler: Cosine Annealing with Warm Restarts
Regularization:
- Dropout
- Gradient Clipping
- Weight Decay
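To make the loss and scheduler choices above concrete, here is a small plain-Python sketch. It assumes one common way of combining focal loss with label smoothing (smoothed targets weighted by the focal term) and the standard cosine-annealing-with-warm-restarts formula; the project may combine or configure them differently. All hyperparameter values (`gamma`, `smoothing`, `base_lr`, `min_lr`, `t_0`, `t_mult`) are illustrative assumptions, not the project's settings.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss_with_smoothing(logits, target, gamma=2.0, smoothing=0.1):
    """Focal loss on label-smoothed targets for a single sample."""
    probs = softmax(logits)
    k = len(logits)
    loss = 0.0
    for c, p in enumerate(probs):
        p = max(p, 1e-12)  # guard against log(0)
        # Smoothed target: most mass on the true class, the rest spread evenly.
        q = smoothing / k + (1.0 - smoothing if c == target else 0.0)
        # (1 - p)^gamma down-weights classes the model already predicts well.
        loss -= q * (1.0 - p) ** gamma * math.log(p)
    return loss


def cosine_warm_restart_lr(epoch, base_lr=1e-3, min_lr=1e-6, t_0=10, t_mult=2):
    """Learning rate at `epoch` under cosine annealing with warm restarts."""
    cycle_len, t_cur = t_0, epoch
    while t_cur >= cycle_len:  # skip over completed cycles
        t_cur -= cycle_len
        cycle_len *= t_mult
    cos_term = 1.0 + math.cos(math.pi * t_cur / cycle_len)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


example_logits = [2.0, 0.5, 0.1, -1.0]
easy = focal_loss_with_smoothing(example_logits, target=0)  # confident, correct
hard = focal_loss_with_smoothing(example_logits, target=3)  # confident, wrong
schedule = [cosine_warm_restart_lr(e) for e in range(30)]
```

With `gamma=0` and `smoothing=0` the loss reduces to ordinary cross-entropy. In the schedule, the learning rate decays along a cosine curve within each cycle and jumps back to `base_lr` at epochs 10 and 30 under these assumed settings.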

🧪 Performance

Test Accuracy: ~93% (95% with TTA)
Balanced class-wise F1 scores (>95% with ensemble + TTA)
Robust performance across visually diverse categories
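For readers who want to reproduce class-wise F1 numbers on their own predictions, the sketch below computes per-class and macro F1 from label lists. The helper name `per_class_f1` and the toy labels are made up for illustration and are not the project's results.

```python
def per_class_f1(y_true, y_pred, classes):
    """Return a dict mapping each class to its F1 score."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


classes = ["Animation", "Gaming", "Natural Content", "Flat Content"]
# Toy labels (illustrative only): one Animation video is misclassified as Gaming.
y_true = ["Animation", "Animation", "Gaming", "Gaming",
          "Natural Content", "Flat Content"]
y_pred = ["Animation", "Gaming", "Gaming", "Gaming",
          "Natural Content", "Flat Content"]

f1 = per_class_f1(y_true, y_pred, classes)
macro_f1 = sum(f1.values()) / len(f1)
```

Comparing the per-class values against the macro average is a quick way to check whether performance is actually balanced across categories or carried by one or two easy classes.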

🚀 Deployment

Backend: Flask
Frontend: HTML, CSS (Tailwind), JavaScript
Inference supports:
- Single-video classification
- Ensemble inference
- Optional Test-Time Augmentation
Fast inference using pre-extracted features
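Ensemble inference and Test-Time Augmentation can both be viewed as averaging class probabilities, across models in one case and across augmented views in the other. The sketch below assumes simple unweighted averaging, which may differ from the project's actual scheme. The names `predict`, `model_a`, and `model_b` are hypothetical, and the stand-in models return fixed probabilities purely for illustration.

```python
def average_probabilities(prob_lists):
    """Element-wise mean of several probability vectors."""
    n = len(prob_lists)
    return [sum(col) / n for col in zip(*prob_lists)]


def predict(models, views, class_names):
    """Classify one video by averaging over models (ensemble) and views (TTA).

    models: callables mapping a view to a list of class probabilities.
    views: augmented versions of the video's pre-extracted features;
        pass a single-element list to disable TTA.
    """
    all_probs = [model(view) for model in models for view in views]
    mean_probs = average_probabilities(all_probs)
    best = max(range(len(mean_probs)), key=mean_probs.__getitem__)
    return class_names[best], mean_probs


# Stand-in models (assumptions): real ones would run the temporal network.
def model_a(view):
    return [0.6, 0.2, 0.1, 0.1]


def model_b(view):
    return [0.3, 0.5, 0.1, 0.1]


class_names = ["Animation", "Gaming", "Natural Content", "Flat Content"]
label, probs = predict([model_a, model_b], ["original", "flipped"], class_names)
```

Passing a single model and a single view reduces this to plain single-video classification, so one code path can serve all three inference modes.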

🖥️ Hardware & Software

Hardware

GPU: NVIDIA A100 (MIG partition, 9.8 GB VRAM)
CPU: Intel Xeon Gold
RAM: 251 GB

Software

PyTorch, Torchvision
OpenCV
NumPy, Pandas
CUDA 11.8, cuDNN

👥 Team

Manas Kulkarni
Samiksha Nalawade
Rajlakshmi Desai

Faculty Guide: Dr. Shripad Bhatlawande

🎯 Conclusion

This project demonstrates a production-ready video classification pipeline that effectively combines deep learning research, large-scale data processing, and real-world deployment considerations. It highlights the challenges and solutions involved in moving from image understanding to full-fledged video intelligence.
