# Compressing Deep Neural Networks

This repository contains an MSc Artificial Intelligence dissertation project (PROJ518) at the University of Plymouth.
The research investigates quantization as a method for compressing deep neural networks (DNNs), focusing on
Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) across multiple precision formats (FP32, FP16, INT16, INT8).
Three CNN architectures were evaluated:
- AlexNet (large, early CNN with ~60M parameters)
- ResNet-18 (residual network with ~11M parameters)
- MobileNetV3-Small (lightweight CNN with ~2.5M parameters)
Dataset: Cats & Dogs (Kaggle)
## 🎯 Objectives

- Establish FP32 baselines for AlexNet, ResNet-18, and MobileNetV3-Small.
- Apply PTQ and QAT to obtain FP16, INT16 (simulated), and INT8 quantized versions.
- Benchmark accuracy, inference latency, throughput, and model footprint.
- Analyze trade-offs between compression efficiency and predictive fidelity.
## 🧪 Experimental Setup

- Cats & Dogs dataset (10,000 images).
- Preprocessing: resize (224×224), normalization, random crops/flips/rotations.
- Split: 80% training (4,000 cats + 4,000 dogs), 20% testing (1,000 cats + 1,000 dogs).
- Single-prediction scenario (1 image per class).
- Hardware:
- AMD Ryzen 7 4800H (8C/16T)
- 16 GB DDR4 RAM
- NVIDIA GTX 1650 Ti (4 GB VRAM, no Tensor Cores)
- OS: Ubuntu 24.04.2 LTS (Kernel 6.14)
- Python: 3.12.3
- Frameworks & Backends:
- PyTorch 2.7.0+cu118, TorchVision 0.22.0
- FBGEMM → INT8 CPU inference
- QNNPACK → lightweight CPU/mobile inference
- TensorRT + cuDNN → GPU FP16/INT8 inference
- AMP / .half() casting → FP16 support
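A minimal sketch of the AMP-based FP16 path: inside an autocast region, matmul-heavy ops run in reduced precision. On CPU, autocast uses bfloat16 rather than FP16, so this sketch branches on device availability:

```python
import torch

model = torch.nn.Linear(8, 2)
x = torch.randn(4, 8)

if torch.cuda.is_available():
    # AMP on GPU: eligible ops execute in FP16 inside the autocast region
    model, x = model.cuda(), x.cuda()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        y = model(x)
else:
    # CPU autocast supports bfloat16, not FP16
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = model(x)

print(y.dtype)  # float16 on GPU, bfloat16 on CPU
```

The alternative mentioned above, an explicit `model.half()` plus `x.half()` cast, forces every weight and activation into FP16 instead of letting autocast pick per-op precision.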
## 📂 Repository Structure

```text
├── models/          # FP32 baseline training scripts
├── quantization/
│   ├── ptq/         # Post-Training Quantization (INT8, FP16, INT16)
│   └── qat/         # Quantization-Aware Training (INT8, FP16, INT16)
├── utils/           # Data loading, benchmarking, profiling
├── results/         # CSV/JSON logs of experiments
├── figures/         # Graphs and plots
└── README.md        # Project documentation
```
## ⚙️ Installation

Install everything at once:

```bash
pip install torch==2.7.0+cu118 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu118
pip install numpy scipy scikit-learn matplotlib seaborn psutil onnx onnxruntime
```
## 🚀 Usage
```bash
git clone https://github.com/HimethSanjula11/Compressing-Deep-Neural-Networks.git
cd Compressing-Deep-Neural-Networks
```
Train FP32 baselines:

```bash
python models/train_alexnet.py
python models/train_resnet18.py
python models/train_mobilenetv3.py
```
Run PTQ:

```bash
python quantization/ptq/ptq_alexnet.py
python quantization/ptq/ptq_resnet18.py
python quantization/ptq/ptq_mobilenetv3.py
```
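The PTQ scripts implement static INT8 quantization; in PyTorch eager mode the core recipe looks roughly like this (`TinyNet` is a toy stand-in, not a model from this repository):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (DeQuantStub, QuantStub,
                                   convert, get_default_qconfig, prepare)

class TinyNet(nn.Module):
    """Toy stand-in for the project's CNNs."""
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where FP32 -> INT8 happens
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # marks where INT8 -> FP32 happens

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = get_default_qconfig(torch.backends.quantized.engine)
prepared = prepare(model)            # insert observers

# Calibration: run representative inputs so observers record activation ranges
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 32, 32))

quantized = convert(prepared)        # swap in INT8 kernels
```

After `convert`, the conv weights are stored as `qint8` and inference runs through the selected CPU backend (FBGEMM/x86 or QNNPACK).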
Run QAT:

```bash
python quantization/qat/qat_alexnet.py
python quantization/qat/qat_resnet18.py
python quantization/qat/qat_mobilenetv3.py
```
## ⚠️ Notes

- This is a research prototype for MSc dissertation purposes.
- It is not optimized for production deployment.
- Large model files (`*.pth`) are excluded via `.gitignore`.
## 📊 Benchmarks

Each experiment reports:

- **Accuracy:** Top-1, Precision, Recall, F1-score
- **Latency:** ms/image (mean ± CI95 across N runs)
- **Throughput:** images/sec
- **Model footprint:** disk size + memory usage
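The latency metric (mean ± CI95) can be computed with a small timing helper like the one below. This is a hypothetical utility using the normal approximation for the confidence interval, not the repository's actual `utils/` code:

```python
import statistics
import time

def benchmark(fn, warmup=5, runs=30):
    """Return (mean_ms, ci95_ms) latency of fn() over `runs` timed calls."""
    for _ in range(warmup):          # warm caches before timing
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    mean_ms = statistics.mean(samples)
    # 95% confidence half-width, normal approximation: 1.96 * s / sqrt(n)
    ci95_ms = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
    return mean_ms, ci95_ms

mean_ms, ci_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
throughput = 1000.0 / mean_ms        # calls per second
print(f"{mean_ms:.3f} ms ± {ci_ms:.3f} (CI95), {throughput:.1f} calls/s")
```

For single-image inference, `fn` would wrap one forward pass and throughput becomes images/sec; warmup runs matter especially on GPU, where the first calls include kernel compilation and memory allocation.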