WYT-Net (Wavelet-YOLO-Transformer Network) is a lightweight, frequency-aware deep learning architecture designed for real-time deepfake detection on edge devices.
Unlike heavy Vision Transformers or deep CNN-based forensic models, WYT-Net:
- Integrates Discrete Wavelet Transform (DWT) for frequency-domain artifact detection.
- Uses SimAM parameter-free attention.
- Incorporates FastViT-inspired hybrid CNN-Transformer blocks.
- Maintains a highly efficient footprint of < 2.6 Million parameters.
- Runs efficiently on edge hardware like Raspberry Pi 4 and NVIDIA Jetson Nano.
The framework achieves an optimal balance between accuracy, efficiency, and deployability.
The rapid rise of high-fidelity deepfake media threatens digital integrity, journalism, biometric systems, and public trust. Yet most existing forensic detectors share several limitations:
- Heavy Models: Architectures like ResNet, ViT, and EfficientNet are computationally expensive.
- Hardware Dependency: High reliance on server-grade GPUs for inference.
- Edge Unfriendly: Poor feasibility for deployment on mobile or IoT devices.
- Spatial Bias: Limited frequency-domain awareness allows subtle generative artifacts to go unnoticed.
GANs and diffusion models leave high-frequency artifacts that spatial-domain CNNs struggle to capture. WYT-Net addresses this by combining wavelet-domain feature engineering with a lightweight hybrid architecture.
- ✅ Frequency-Domain Artifact Detection: Expose generative artifacts via wavelet decomposition.
- ✅ Architectural Efficiency: Design a hybrid CNN-Transformer backbone under 2.6M parameters.
- ✅ High Accuracy: Surpass 94% classification accuracy on challenging datasets.
- ✅ Edge Deployment: Guarantee real-time performance on edge devices.
- ✅ Robust Generalization: Ensure resilience against real-world artifacts using comprehensive data augmentation.
Instead of operating on standard RGB input, WYT-Net decomposes each image with a Daubechies-2 (db2) wavelet transform into four sub-bands:
| Channel | Description |
|---|---|
| LL | Approximation (Global Spatial Features) |
| LH | Horizontal Details (High-Frequency) |
| HL | Vertical Details (High-Frequency) |
| HH | Diagonal Details (High-Frequency) |
This explicit separation exposes forgery-induced noise patterns that remain hidden in the raw RGB representation.
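As a sketch of this preprocessing step (assuming the PyWavelets package; the exact channel layout used by the repo's `extract_faces.py` may differ), a single-level db2 decomposition of a grayscale face crop looks like:

```python
import numpy as np
import pywt  # PyWavelets

def dwt_channels(gray: np.ndarray) -> np.ndarray:
    """Single-level db2 decomposition into a 4-channel tensor (LL, LH, HL, HH)."""
    LL, (LH, HL, HH) = pywt.dwt2(gray, "db2")
    # Stack the approximation band with the three high-frequency detail bands.
    return np.stack([LL, LH, HL, HH], axis=0)

# A 64x64 face crop becomes a 4-channel input at roughly half resolution.
face = np.random.rand(64, 64).astype(np.float32)
channels = dwt_channels(face)
print(channels.shape)
```

Feeding the stacked sub-bands to the network (rather than RGB) is what makes the high-frequency artifacts an explicit part of the input representation.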
Modified from the YOLOv8n backbone using architectural "surgery":
- Split Backbone ➡️ Multi-scale feature extraction.
- SimAM Attention ➡️ Parameter-free 3D attention refinement for localized texture traces.
- FastViT Block ➡️ Structural reparameterization + token mixing for global structural consistency.
- Feature Fusion ➡️ Dual-stream Global Average Pooling (GAP) concatenation.
- Final MLP Head ➡️ Optimized binary classification.
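To illustrate the parameter-free attention stage, here is a minimal PyTorch sketch of the published SimAM formulation (not necessarily the repo's exact module): each activation is weighted by an inverse energy computed from its channel's spatial statistics, so no learnable parameters are added.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free 3D attention: weights each activation by an inverse
    energy derived from its deviation from the per-channel spatial mean."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        # Squared deviation from the per-channel spatial mean.
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Channel-wise variance estimate over the spatial dimensions.
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy: distinctive (low-energy) neurons get higher weight.
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

feat = torch.randn(1, 16, 8, 8)
out = SimAM()(feat)  # same shape as input, zero added parameters
```

Because the module is parameter-free, it refines localized texture traces without inflating the sub-2.6M parameter budget.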
| Metric | Baseline YOLOv8n | WYT-Net (Proposed) |
|---|---|---|
| Accuracy | 91.45% | 94.44% |
| mAP@0.5 | 0.8521 | 0.9371 |
| F1-Score (Fake) | 0.7145 | 0.8022 |
| Parameters | 2.04 M | 2.57 M |
| GFLOPs | 0.2498 | 0.2841 |
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Real | 0.9735 | 0.9619 | 0.9677 | 23,834 |
| Fake | 0.7738 | 0.8328 | 0.8022 | 3,731 |
| Average / Total | 0.9465 | 0.9444 | 0.9453 | 27,565 |
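The per-class F1 scores in the table follow directly from the harmonic mean of precision and recall, which is easy to verify:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproduce the table's F1 column from its precision/recall columns.
print(round(f1(0.9735, 0.9619), 4))  # Real class -> 0.9677
print(round(f1(0.7738, 0.8328), 4))  # Fake class -> 0.8022
```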
- Inference Engine: Tencent NCNN
- Latency: ~124.5 ms per frame
- FPS: ~8 FPS
- Optimization: Multi-threading (4 cores) + ARM Neon SIMD
| Precision | Latency |
|---|---|
| FP32 | 28.5 ms |
| FP16 | 14.2 ms |
| INT8 | 8.1 ms |
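The quantization gains can be read off the table directly; assuming these latencies are per-frame and single-stream, the relative speedups versus FP32 work out as:

```python
latencies_ms = {"FP32": 28.5, "FP16": 14.2, "INT8": 8.1}

for precision, ms in latencies_ms.items():
    speedup = latencies_ms["FP32"] / ms
    fps = 1000.0 / ms
    print(f"{precision}: {ms} ms -> {fps:.1f} FPS ({speedup:.2f}x vs FP32)")
```

INT8 quantization yields roughly a 3.5x speedup over FP32, which is what pushes the pipeline comfortably past real-time frame rates.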
- Peak Power Consumption: ~3.1W
- Performance: Near real-time inference achieved natively on edge hardware.
```
.
├── Models/
│   ├── yolov8_ai_augmented.py               # Main WYT-Net implementation
│   ├── yolov8_hybrid_ai.py                  # Hybrid backbone variant
│   ├── augment_dataset.py                   # Data augmentation pipeline
│   └── extract_faces.py                     # Face detection & DWT preprocessing
├── Results/
│   ├── results_journal_Yolov8_Ai_augmented/ # Proposed model metrics & plots
│   ├── results_hybrid_no_ai_augmented/      # Ablation study metrics
│   └── results_baseline/                    # Reference benchmark metrics
├── test_model.py                            # Inference and validation script
└── requirements.txt                         # Project dependencies
```
```shell
git clone https://github.com/codewithyug06/deepfake-detection-pipelines.git
cd deepfake-detection-pipelines
pip install -r requirements.txt
```

The models were trained and evaluated on data derived from the Celeb-DF (v2) dataset.
- Original Dataset: 27,565 images
- Augmented Dataset: 3× expansion resulting in a more balanced and robust training set
- Random Rotation (±15°)
- Horizontal Flip
- Color Jittering
| Configuration | Accuracy |
|---|---|
| Baseline YOLOv8n | 91.45% |
| + Wavelet Input | 92.81% |
| + SimAM Attention | 93.56% |
| + FastViT (WYT-Net) | 94.44% |
Key Takeaway:
Wavelet-domain information, when combined with hybrid global reasoning via FastViT, significantly improves deepfake detection reliability.
- First of Its Kind: Lightweight YOLO-based frequency-aware deepfake detector explicitly optimized for edge computing
- Extreme Efficiency: Highly pruned hybrid CNN-Transformer architecture with fewer than 2.6M parameters
- Edge-Ready: Proven real-time inference on Raspberry Pi 4 and NVIDIA Jetson Nano
- Optimal Tradeoff: Superior balance between detection accuracy and computational complexity compared to state-of-the-art forensic models