This repository contains the implementation of our CS607 course project, "Diffusion-based Adversarial Purification over Latent Embeddings" — a novel method for adversarial defense that leverages diffusion models in the latent space.
We use a Pix2Pix-based encoder-decoder architecture to project images into a compact latent space, perform diffusion-based purification to remove adversarial perturbations, and reconstruct clean images for robust classification.
Evaluated on the ImageNet dataset using a ResNet-50 classifier under PGD and FGSM attacks ($\epsilon = 8/255$ and $16/255$), our approach substantially improves robust accuracy over unpurified adversarial images (for example, from 4.7% to 43.4% under PGD at $\epsilon = 16/255$).
Authors:
Bhavik Shangari (12240410), Uday Bhardwaj (12241910), Vedant Marodkar (12240990)
Date: April 27, 2025
Course: CS607 - Adversarial Machine Learning
Repository: GitHub Link
Adversarial attacks introduce small, often imperceptible perturbations to images, leading deep neural networks to make incorrect predictions. Traditional defenses like adversarial training are attack-specific and computationally expensive.
We propose an alternative: adversarial purification using latent diffusion — a process that removes adversarial noise before classification.
Our method runs the diffusion process directly over latent embeddings rather than raw pixels, removing adversarial noise while preserving semantic content and keeping computation low.
- Latent Diffusion: Purification is done in a 512-dimensional latent space, reducing computational overhead.
- Pix2Pix Encoder-Decoder: Skip connections carry encoder features to the decoder, helping preserve semantic content during purification.
- Robustness: Achieves robust accuracies of:
  - 43.4% on PGD attacks ($\epsilon = 16/255$)
  - 41.3% on FGSM attacks ($\epsilon = 16/255$)

  (Compared to 4.7% and 22.1%, respectively, for unpurified adversarial images.)
The purification pipeline consists of three major components:

- Encoder: Maps $64\times64\times3$ images to 512-dimensional latent embeddings using a convolutional network with LeakyReLU activations and batch normalization.
- Diffusion Model: Applies controlled noise to the latent space and denoises it using a feed-forward neural network conditioned on timesteps (DDPM-style scheduling).
- Decoder: Reconstructs purified $64\times64\times3$ images from latent embeddings using a transposed-convolution (deconvolutional) network with skip connections and ReLU activations.
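The encoder and decoder described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the code in `DiffAE.ipynb`. The channel widths, the 4x4 stride-2 kernels, the tanh output (images scaled to $[-1, 1]$), and the class names `Encoder`/`Decoder` are all hypothetical. Only the input and latent sizes, LeakyReLU plus batch norm in the encoder, and ReLU plus skip connections in the decoder come from the description.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Hypothetical encoder: 64x64x3 image -> 512-d latent plus skip feature maps."""

    def __init__(self, latent_dim: int = 512):
        super().__init__()
        chans = [3, 64, 128, 256, 512]
        blocks = []
        for i in range(4):
            # Each stride-2 conv halves the resolution: 64 -> 32 -> 16 -> 8 -> 4
            layers = [nn.Conv2d(chans[i], chans[i + 1], kernel_size=4, stride=2, padding=1)]
            if i > 0:  # Pix2Pix convention: no batch norm on the first layer
                layers.append(nn.BatchNorm2d(chans[i + 1]))
            layers.append(nn.LeakyReLU(0.2))
            blocks.append(nn.Sequential(*layers))
        self.blocks = nn.ModuleList(blocks)
        self.to_latent = nn.Linear(512 * 4 * 4, latent_dim)

    def forward(self, x):
        skips = []
        for block in self.blocks:
            x = block(x)
            skips.append(x)  # shapes: 64x32x32, 128x16x16, 256x8x8, 512x4x4
        return self.to_latent(x.flatten(1)), skips


def up_block(c_in: int, c_out: int) -> nn.Sequential:
    # Transposed conv doubles the resolution; ReLU as in the description
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(),
    )


class Decoder(nn.Module):
    """Hypothetical decoder: 512-d latent + encoder skips -> 64x64x3 image."""

    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.from_latent = nn.Linear(latent_dim, 512 * 4 * 4)
        # Input channels double wherever a skip feature map is concatenated
        self.ups = nn.ModuleList([up_block(1024, 256), up_block(512, 128), up_block(256, 64)])
        self.out = nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1)

    def forward(self, z, skips):
        h = self.from_latent(z).view(-1, 512, 4, 4)
        # Consume skips deepest-first: 512x4x4, 256x8x8, 128x16x16
        for up, skip in zip(self.ups, reversed(skips[1:])):
            h = up(torch.cat([h, skip], dim=1))
        # The shallowest skip (64x32x32) feeds the output layer; tanh assumes [-1, 1] images
        return torch.tanh(self.out(torch.cat([h, skips[0]], dim=1)))
```

In this sketch the skip features come straight from the (possibly adversarial) input and bypass the latent diffusion step, so which resolutions are connected trades detail preservation against how much perturbation can reach the decoder.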
Illustration:
Images → Latent Embeddings → Diffusion Purification → Reconstructed Images → Classification
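The "Diffusion Purification" stage can be sketched as DDPM-style purification of a latent vector: noise the embedding forward to an intermediate step $t^*$ with $z_{t^*} = \sqrt{\bar\alpha_{t^*}}\,z_0 + \sqrt{1-\bar\alpha_{t^*}}\,\epsilon$, then run the learned reverse process back to step 0. This sketch assumes a linear $\beta$ schedule with $T = 1000$ steps, a timestep fraction `t_frac` mapped to $t^* = \mathrm{round}(t_\text{frac} \cdot T)$, and an MLP noise predictor conditioned by appending $t/T$ to its input. `LatentDenoiser`, `purify_latent`, and all layer sizes are hypothetical; the notebook's implementation may differ.

```python
import torch
import torch.nn as nn


def linear_alphas_cumprod(num_steps: int = 1000) -> torch.Tensor:
    # Assumed DDPM linear beta schedule (1e-4 -> 0.02); returns alpha_bar for each step
    betas = torch.linspace(1e-4, 0.02, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)


class LatentDenoiser(nn.Module):
    """Hypothetical feed-forward noise predictor eps_theta(z_t, t) on 512-d latents."""

    def __init__(self, latent_dim: int = 512, hidden: int = 1024, num_steps: int = 1000):
        super().__init__()
        self.num_steps = num_steps
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Timestep conditioning: append t / T as one extra input feature
        t_feat = (t.float() / self.num_steps).unsqueeze(1)
        return self.net(torch.cat([z_t, t_feat], dim=1))


@torch.no_grad()
def purify_latent(z0, denoiser, alphas_cumprod, t_frac):
    """Noise z0 forward to step t* = round(t_frac * T), then denoise back to step 0."""
    t_star = max(1, round(t_frac * alphas_cumprod.shape[0]))
    a_bar = alphas_cumprod[t_star - 1]
    # Forward process q(z_t | z_0): z_t = sqrt(a_bar) * z_0 + sqrt(1 - a_bar) * eps
    z = a_bar.sqrt() * z0 + (1 - a_bar).sqrt() * torch.randn_like(z0)
    for i in reversed(range(t_star)):  # array index i holds DDPM step i + 1
        a_bar_t = alphas_cumprod[i]
        a_bar_prev = alphas_cumprod[i - 1] if i > 0 else alphas_cumprod.new_tensor(1.0)
        alpha_t = a_bar_t / a_bar_prev
        t = torch.full((z.shape[0],), i, dtype=torch.long, device=z.device)
        eps = denoiser(z, t)
        # Posterior mean of p(z_{t-1} | z_t) computed from the predicted noise
        z = (z - (1 - alpha_t) / (1 - a_bar_t).sqrt() * eps) / alpha_t.sqrt()
        if i > 0:  # add posterior noise except on the final step
            var = (1 - a_bar_prev) / (1 - a_bar_t) * (1 - alpha_t)
            z = z + var.sqrt() * torch.randn_like(z)
    return z
```

In the full pipeline, `z0` would be the encoder's embedding of a (possibly adversarial) image and the returned latent would go to the decoder. The denoiser must first be trained with the usual DDPM noise-prediction loss, which is not shown here.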
Download `model_epoch_resnet50_epoch_30.pth` and `model_epoch_50.pth` and place them as shown in the repository layout:
```text
.
├── create_adv_examples.ipynb            # Generate adversarial examples (PGD, FGSM)
├── DiffAE.ipynb                         # Train and evaluate purification pipeline
├── model_epoch_resnet50_epoch_30.pth    # Pretrained ResNet-50 checkpoint
├── outputs/
│   ├── pipeline_checkpoints/            # Saved model checkpoints
│   │   └── model_epoch_50.pth
│   ├── pipeline_plots/                  # Plots during training (optional)
│   └── pipeline_samples/                # Sample images (training & validation)
├── README.md                            # This file
└── train_resnet.py                      # Train ResNet-50 classifier
```

Make sure the following are installed:
- Python 3.8+
- PyTorch 1.9+
- torchvision
- NumPy
- Pillow
- Jupyter Notebook
- tqdm
Install them via:

```bash
pip install torch torchvision numpy pillow jupyter tqdm
```

Train the ResNet-50 classifier:

```bash
python train_resnet.py
```

- Output: `model_epoch_resnet50_epoch_30.pth`
Generate adversarial examples:

- Open the notebook:
  ```bash
  jupyter notebook create_adv_examples.ipynb
  ```
- Configure:
  - Attack type: PGD or FGSM
  - Epsilon values: $8/255$, $16/255$
  - Checkpoint: `model_epoch_resnet50_epoch_30.pth`
- Run the notebook to generate adversarial examples for a 512-image subset.
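For reference, the two attacks can be sketched as below. These are the standard $L_\infty$ FGSM and PGD formulations, assuming pixel values in $[0, 1]$ and a classifier in eval mode. The step count `steps=10` and the random start are assumptions: the README only gives $\epsilon$ and, for PGD, a step size of $\alpha = 4/255$. The notebook may implement the attacks differently.

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """Single-step FGSM: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()


def pgd(model, x, y, eps, alpha, steps=10, random_start=True):
    """L-inf PGD: repeated signed-gradient steps projected back into the eps-ball around x."""
    x_adv = x.clone().detach()
    if random_start:
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project onto the eps-ball around x, then onto the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

With the table's PGD settings this would be called as `pgd(model, x, y, eps=16/255, alpha=4/255)`.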
Train and evaluate the purification pipeline:

- Open:
  ```bash
  jupyter notebook DiffAE.ipynb
  ```
- Set parameters:
  - Epochs: 26
  - Learning rate: 2e-4
  - Diffusion timestep ($t$):
    - $t = 0.1$ for PGD
    - $t = 0.075$ for FGSM
- Outputs are saved in `outputs/pipeline_samples/` every 10 epochs.
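To make the timestep setting concrete: if $t$ is read as a fraction of a $T$-step DDPM schedule (an assumption; the notebook may parameterize it differently), it fixes how much noise is injected before denoising, via $z_{t^*} = \sqrt{\bar\alpha_{t^*}}\,z_0 + \sqrt{1-\bar\alpha_{t^*}}\,\epsilon$ with $t^* = tT$. The pure-Python sketch below assumes a linear $\beta$ schedule from $10^{-4}$ to $0.02$ over $T = 1000$ steps; `noise_level` is a hypothetical helper.

```python
import math


def noise_level(t_frac, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Map a timestep fraction to (step t*, signal weight sqrt(alpha_bar), noise weight).

    Assumes a linear beta schedule over num_steps steps."""
    t_star = max(1, round(t_frac * num_steps))
    alpha_bar = 1.0
    for i in range(t_star):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        alpha_bar *= 1.0 - beta
    return t_star, math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)


for attack, t in [("PGD", 0.1), ("FGSM", 0.075)]:
    step, signal, noise = noise_level(t)
    print(f"{attack}: t={t} -> step {step}, signal {signal:.3f}, noise {noise:.3f}")
```

Under these assumptions the smaller FGSM setting injects slightly less noise and keeps more of the original embedding. In diffusion purification generally, a larger $t$ can wash out stronger perturbations but also drifts further from the image's semantics.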
| Attack | Settings | Standard Acc. | Adversarial Acc. | Purified Acc. |
|---|---|---|---|---|
| PGD | $\epsilon = 16/255$, $\alpha = 4/255$ | 62.5% | 4.7% | 43.4% |
| FGSM | $\epsilon = 16/255$ | 62.5% | 22.1% | 41.3% |
- Clean images: 62.5% standard accuracy
- Adversarial images: accuracy drops to 4.7% (PGD) and 22.1% (FGSM)
- Purified images: accuracy recovers to 43.4% (PGD) and 41.3% (FGSM)
- Train ResNet-50 via `train_resnet.py`.
- Generate adversarial samples via `create_adv_examples.ipynb`.
- Run the purification pipeline via `DiffAE.ipynb`.
- Report accuracies following the same evaluation setup.
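The three accuracy columns can be computed with one evaluation helper applied to clean images, adversarial images, and purified adversarial images. A minimal sketch: `accuracy` is a hypothetical helper, and the optional `transform` stands for whatever callable wraps the encoder, latent diffusion, and decoder.

```python
import torch
from torch.utils.data import DataLoader


@torch.no_grad()
def accuracy(classifier, loader: DataLoader, transform=None, device="cpu") -> float:
    """Top-1 accuracy (%) of classifier on loader, optionally after a preprocessing transform."""
    classifier.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if transform is not None:  # e.g. a purifier applied before classification
            x = transform(x)
        correct += (classifier(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total
```

For example, `accuracy(classifier, clean_loader)` gives the standard column, `accuracy(classifier, adv_loader)` the adversarial column, and `accuracy(classifier, adv_loader, transform=purify)` the purified column, where `clean_loader`, `adv_loader`, and `purify` are hypothetical names.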
We thank the CS607 course instructors for their guidance throughout the project.
Special thanks to the authors of DiffPure for their foundational work on diffusion-based adversarial purification.
This project was developed as part of the Adversarial Machine Learning course at IIT Bhilai.
