
🚀 WYT-Net: Wavelet-YOLO-Transformer Network

Lightweight Hybrid YOLOv8 Framework for Real-Time Deepfake Detection on Edge Devices


📖 Overview

WYT-Net (Wavelet-YOLO-Transformer Network) is a lightweight, frequency-aware deep learning architecture designed for real-time deepfake detection on edge devices.

Unlike heavy Vision Transformers or deep CNN-based forensic models, WYT-Net:

  • Integrates Discrete Wavelet Transform (DWT) for frequency-domain artifact detection.
  • Uses SimAM parameter-free attention.
  • Incorporates FastViT-inspired hybrid CNN-Transformer blocks.
  • Maintains a highly efficient footprint of under 2.6 million parameters.
  • Runs efficiently on edge hardware like Raspberry Pi 4 and NVIDIA Jetson Nano.

The framework achieves an optimal balance between accuracy, efficiency, and deployability.


⚠️ Problem Statement

The rapid rise of high-fidelity deepfake media threatens digital integrity, journalism, biometric systems, and public trust.

Existing Limitations:

  • Heavy Models: Architectures like ResNet, ViT, and EfficientNet are computationally expensive.
  • Hardware Dependency: High reliance on server-grade GPUs for inference.
  • Edge Unfriendly: Poor feasibility for deployment on mobile or IoT devices.
  • Spatial Bias: Limited frequency-domain awareness allows subtle generative artifacts to go unnoticed.

Key Insight:

GANs and Diffusion models leave high-frequency artifacts that spatial CNNs struggle to capture. WYT-Net addresses this using wavelet-domain feature engineering combined with a lightweight hybrid architecture.


🎯 Objectives

  • Frequency-Domain Artifact Detection: Expose generative artifacts via wavelet decomposition.
  • Architectural Efficiency: Design a hybrid CNN-Transformer backbone under 2.6M parameters.
  • High Accuracy: Surpass 94% classification accuracy on challenging datasets.
  • Edge Deployment: Guarantee real-time performance on edge devices.
  • Robust Generalization: Ensure resilience against real-world artifacts using comprehensive data augmentation.

🛠️ Methodology

1️⃣ Feature Engineering – 4-Channel Wavelet Input

Instead of standard RGB, WYT-Net uses Daubechies-2 (db2) Wavelet Decomposition:

| Channel | Description |
|---------|-------------|
| LL | Approximation (global spatial features) |
| LH | Horizontal details (high-frequency) |
| HL | Vertical details (high-frequency) |
| HH | Diagonal details (high-frequency) |

This explicit separation exposes forgery-induced noise patterns that are largely invisible in the spatial RGB domain.
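The 4-channel construction above can be sketched with a single-level db2 DWT. The repository's extract_faces.py is the authoritative preprocessing code (and would typically use a wavelet library such as PyWavelets); the minimal NumPy version below, with circular boundary handling and hypothetical helper names, just illustrates the channel split:

```python
import numpy as np

# db2 (Daubechies-2) scaling coefficients from the closed form
s3 = np.sqrt(3.0)
LO = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
HI = np.array([LO[3], -LO[2], LO[1], -LO[0]])  # matching wavelet (high-pass) filter

def dwt_1d(x, filt, axis):
    """Circular convolution along `axis`, then dyadic downsampling by 2."""
    x = np.moveaxis(x, axis, -1).astype(float)
    out = np.zeros_like(x)
    for k, c in enumerate(filt):
        out += c * np.roll(x, -k, axis=-1)
    return np.moveaxis(out[..., ::2], -1, axis)

def wavelet_channels(gray):
    """Single-level 2D db2 DWT of a grayscale face crop -> (4, H/2, W/2)."""
    lo_rows = dwt_1d(gray, LO, axis=0)
    hi_rows = dwt_1d(gray, HI, axis=0)
    ll = dwt_1d(lo_rows, LO, axis=1)  # approximation
    lh = dwt_1d(lo_rows, HI, axis=1)  # horizontal details
    hl = dwt_1d(hi_rows, LO, axis=1)  # vertical details
    hh = dwt_1d(hi_rows, HI, axis=1)  # diagonal details
    return np.stack([ll, lh, hl, hh])
```

On a perfectly flat region the three detail bands vanish (the high-pass filter sums to zero), which is exactly why forgery-induced noise stands out in them.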

2️⃣ Proposed Architecture – WYT-Net

Modified from the YOLOv8n backbone using architectural "surgery":

  • Split Backbone ➡️ Multi-scale feature extraction.
  • SimAM Attention ➡️ Parameter-free 3D attention refinement for localized texture traces.
  • FastViT Block ➡️ Structural reparameterization + token mixing for global structural consistency.
  • Feature Fusion ➡️ Dual-stream Global Average Pooling (GAP) concatenation.
  • Final MLP Head ➡️ Optimized binary classification.
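Of these components, SimAM is the easiest to see in isolation: it weights every activation by an energy-based 3D attention score without any learned parameters. A NumPy sketch of the standard SimAM formulation follows (the `lam` regularizer value is the usual default from the SimAM paper, not taken from this repository):

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a (C, H, W) feature map."""
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)        # per-channel mean
    d = (x - mu) ** 2                              # squared deviation per position
    v = d.sum(axis=(1, 2), keepdims=True) / n      # per-channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5              # inverse energy per neuron
    return x * (1.0 / (1.0 + np.exp(-e_inv)))      # sigmoid-gated refinement
```

Because the gate is a sigmoid of a non-negative energy term, every activation is scaled by a factor in (0.5, 1): distinctive neurons (large deviation from the channel mean) are preserved, while uniform regions are suppressed, with zero added parameters.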

📊 Experimental Results

🔹 Performance Comparison

| Metric | Baseline YOLOv8n | WYT-Net (Proposed) |
|--------|------------------|--------------------|
| Accuracy | 91.45% | 94.44% |
| mAP@0.5 | 0.8521 | 0.9371 |
| F1-Score (Fake) | 0.7145 | 0.8022 |
| Parameters | 2.04 M | 2.57 M |
| GFLOPs | 0.2498 | 0.2841 |

🔹 Detailed Classification Report (WYT-Net)

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| Real | 0.9735 | 0.9619 | 0.9677 | 23,834 |
| Fake | 0.7738 | 0.8328 | 0.8022 | 3,731 |
| Average / Total | 0.9465 | 0.9444 | 0.9453 | 27,565 |

🚀 Edge Deployment Results

🔹 Raspberry Pi 4 (NCNN - FP16)

  • Inference Engine: Tencent NCNN
  • Latency: ~124.5 ms per frame
  • Throughput: ~8 FPS
  • Optimization: Multi-threading (4 cores) + ARM Neon SIMD

🔹 NVIDIA Jetson Nano (TensorRT)

| Precision | Latency |
|-----------|---------|
| FP32 | 28.5 ms |
| FP16 | 14.2 ms |
| INT8 | 8.1 ms |

  • Peak Power Consumption: ~3.1W
  • Performance: Near real-time inference achieved natively on edge hardware.

📂 Repository Structure

```
.
├── Models/
│   ├── yolov8_ai_augmented.py   # Main WYT-Net implementation
│   ├── yolov8_hybrid_ai.py      # Hybrid backbone variant
│   ├── augment_dataset.py       # Data augmentation pipeline
│   └── extract_faces.py         # Face detection & DWT preprocessing
├── Results/
│   ├── results_journal_Yolov8_Ai_augmented/ # Proposed model metrics & plots
│   ├── results_hybrid_no_ai_augmented/      # Ablation study metrics
│   └── results_baseline/                    # Reference benchmark metrics
├── test_model.py                # Inference and validation script
└── requirements.txt             # Project dependencies
```

⚡ Quick Start

🔹 Installation

```shell
git clone https://github.com/codewithyug06/deepfake-detection-pipelines.git
cd deepfake-detection-pipelines
pip install -r requirements.txt
```

📦 Dataset

The models were trained and evaluated on data derived from the Celeb-DF (v2) dataset.

  • Original Dataset: 27,565 images
  • Augmented Dataset: 3× expansion resulting in a more balanced and robust training set

🔹 Augmentations Applied

  • Random Rotation (±15°)
  • Horizontal Flip
  • Color Jittering
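The three augmentations can be reproduced with a small pipeline. The repository's augment_dataset.py is the authoritative version; in the sketch below only the ±15° rotation range comes from the list above, while the flip probability and jitter strength are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

def augment(img):
    """Apply rotation (±15°), random horizontal flip, and color jitter.

    img: H x W x 3 float array with values in [0, 1].
    """
    # Random rotation within ±15°, keeping the original frame size
    out = rotate(img, angle=rng.uniform(-15, 15), axes=(0, 1),
                 reshape=False, mode="nearest")
    # Horizontal flip with probability 0.5 (assumed probability)
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Per-channel multiplicative color jitter (assumed ±20% strength)
    jitter = rng.uniform(0.8, 1.2, size=(1, 1, 3))
    return np.clip(out * jitter, 0.0, 1.0)
```

Applied to each original image multiple times, a pipeline like this yields the 3× expanded training set described above.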

🔬 Ablation Study

| Configuration | Accuracy |
|---------------|----------|
| Baseline YOLOv8n | 91.45% |
| + Wavelet Input | 92.81% |
| + SimAM Attention | 93.56% |
| + FastViT (WYT-Net) | 94.44% |

Key Takeaway:
Wavelet-domain information, when combined with hybrid global reasoning via FastViT, significantly improves deepfake detection reliability.


🧠 Key Contributions

  • First of Its Kind: Lightweight YOLO-based frequency-aware deepfake detector explicitly optimized for edge computing
  • Extreme Efficiency: Highly pruned hybrid CNN-Transformer architecture with fewer than 2.6M parameters
  • Edge-Ready: Proven real-time inference on Raspberry Pi 4 and NVIDIA Jetson Nano
  • Optimal Tradeoff: Superior balance between detection accuracy and computational complexity compared to state-of-the-art forensic models
