36 commits
ef59aba
Update train_classifier.py
aynzabdz Sep 14, 2023
3a5331d
Add files via upload
aynzabdz Sep 14, 2023
be5f23f
Create gitignore
aynzabdz Sep 14, 2023
73003e1
Add files via upload
aynzabdz Sep 14, 2023
edafea5
Update dataloader.py
aynzabdz Sep 14, 2023
e24b3bc
Update celeba.json
aynzabdz Sep 17, 2023
70051fe
Update classify.json
aynzabdz Sep 17, 2023
b7d1df9
Update classify.json
aynzabdz Sep 17, 2023
5ab7681
Update k+1_gan.py
aynzabdz Sep 17, 2023
8088c74
Update data_downloader.py
aynzabdz Sep 17, 2023
a49a96e
Update engine.py
aynzabdz Sep 17, 2023
facd40d
Update k+1_gan.py
aynzabdz Sep 17, 2023
b0235d3
Update engine.py
aynzabdz Sep 17, 2023
283138f
Update engine.py
aynzabdz Sep 17, 2023
98f5eb7
Update engine.py
aynzabdz Sep 17, 2023
56c9e76
Update k+1_gan.py
aynzabdz Sep 17, 2023
fc056a9
Update utils.py
aynzabdz Sep 17, 2023
047cc24
Update README.md
aynzabdz Sep 21, 2023
6215aff
Update README.md
aynzabdz Sep 21, 2023
eb3d635
Update data_downloader.py
aynzabdz Sep 21, 2023
c8395e7
Update data_downloader.py
aynzabdz Sep 21, 2023
82d93ed
Update README.md
aynzabdz Sep 21, 2023
aba9c5d
Update README.md
aynzabdz Sep 21, 2023
fb53e72
Add files via upload
aynzabdz Sep 22, 2023
99fdf40
Update data_downloader.py
aynzabdz Sep 22, 2023
79124ab
Update data_downloader.py
aynzabdz Sep 22, 2023
572991b
Update train_classifier.py
aynzabdz Sep 22, 2023
ba2bdfd
Update k+1_gan.py
aynzabdz Sep 22, 2023
199dcf2
Update recovery.py
aynzabdz Sep 23, 2023
e725c4b
Add files via upload
aynzabdz Sep 23, 2023
51f7ff9
Update calculate_FID.py
aynzabdz Sep 23, 2023
85fc1bc
Add files via upload
aynzabdz Sep 24, 2023
7abc6e7
Update README.md
aynzabdz Sep 24, 2023
cac5dd0
Update README.md
aynzabdz Sep 24, 2023
b50ff55
Update Knowledge Enriched DMI.ipynb
aynzabdz Sep 24, 2023
ad32102
Update README.md
aynzabdz Sep 24, 2023
573 changes: 573 additions & 0 deletions Knowledge Enriched DMI.ipynb

Large diffs are not rendered by default.

167 changes: 132 additions & 35 deletions README.md
@@ -1,41 +1,138 @@
# Knowledge-Enriched-Distributional-Model-Inversion-Attacks

This is a PyTorch implementation of our paper at ICCV2021:

**Knowledge Enriched Distributional Model Inversion Attacks** \[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Chen_Knowledge-Enriched_Distributional_Model_Inversion_Attacks_ICCV_2021_paper.pdf)\] \[[arxiv](https://arxiv.org/abs/2010.04092)\]

We propose a novel **'Inversion-Specific GAN'** that can better distill knowledge useful for performing attacks on private models from public data. Moreover, we propose to *model a private data distribution* for each target class, which we refer to as **'Distributional Recovery'**.

## Requirement
This code has been tested with Python 3.6, PyTorch 1.0, and CUDA 10.0.

## Getting Started
* Install required packages.
* Download relevant datasets, including CelebA, MNIST, and CIFAR-10.
* Get target model prepared or run our code
`python train_classifier.py` <br>
Note that this code provides only three model architectures: VGG16, IR152, and FaceNet. Pretrained checkpoints for the three models can be downloaded at https://drive.google.com/drive/folders/1U4gekn72UX_n1pHdm9GQUQwwYVDvpTfN?usp=sharing.

## Build an inversion-specific GAN
* Modify the configuration in 'celeba.json'.
* Modify the target model path in 'k+1_gan.py' to your customized path.
* Run
`python k+1_gan.py`.
* Model checkpoints and generated image results are saved in the folder `improvedGAN`.
* A general GAN can be obtained as a baseline by running
`python binary_gan.py`.
* Pretrained binary GAN and inversion-specific GAN can be downloaded at https://drive.google.com/drive/folders/1L3frX-CE4j36pe5vVWuy3SgKGS9kkA70?usp=sharing.


## Distributional Recovery
Run
`python recovery.py`

* `--model` chooses the target model to attack.
* `--improved_flag` indicates whether an inversion-specific GAN is used. If False, a general GAN is applied.
* `--dist_flag` indicates if distributional recovery is performed. If False, then optimization is simply applied on a single sample instead of a distribution.
* By setting both `improved_flag` and `dist_flag` to False, we simply use the method proposed in [[1]](#1).
This repository is dedicated to the first assignment of the "Data Protection Techniques" course. In this assignment, I conduct a comprehensive study and numerous experiments based on the paper [Knowledge-Enriched Distributional Model Inversion Attacks](https://openaccess.thecvf.com/content/ICCV2021/papers/Chen_Knowledge-Enriched_Distributional_Model_Inversion_Attacks_ICCV_2021_paper.pdf), which can also be accessed on [arXiv](https://arxiv.org/abs/2010.04092).

## Contents

- [Download Data](#download-data)
- [Training Classifier](#training-classifier)
- [Training Inversion-Specific GAN](#training-inversion-specific-gan)
- [Attacking The Model](#attacking-the-model)
- [Calculating Fréchet Inception Distance (FID)](#calculating-fréchet-inception-distance)
- [Reference](#reference)


## Requirements

- Python 3.6
- PyTorch
- CUDA
- TensorBoardX
- OpenCV
- PIL

## Download Data

This project uses the CelebA dataset, which originally labels images with attributes such as 'smiling' or 'glasses'. We are not concerned with those labels; instead, we are interested in the identity of each datapoint. We therefore work with 1,000 unique identities, for which we have 27,018 training images and 3,009 test images. The images and their corresponding labels are listed in `./data/trainset.txt` for training data and `./data/testset.txt` for test data.

Furthermore, a distinct public dataset, listed in `./data/ganset.txt`, is used to pretrain the GAN. Crucially, this dataset has no class overlap with the private data used to train the classifier, so the identities in the GAN pretraining set are disjoint from those in the classifier's training and test sets.
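The disjointness guarantee above can be sanity-checked in a few lines of Python. This is an illustrative sketch: the `<image_path> <identity_label>` line format is an assumption about the `./data/*.txt` files, and the inline lists stand in for the real files.

```python
# Sanity-check that GAN-pretraining identities are disjoint from the
# private classifier identities. Assumed line format: "<path> <label>".

def load_identities(lines):
    """Collect the set of identity labels from dataset-listing lines."""
    return {int(line.split()[1]) for line in lines if line.strip()}

# Tiny inline stand-ins for ./data/trainset.txt and ./data/ganset.txt.
private_lines = ["img_001.png 0", "img_002.png 1", "img_003.png 999"]
public_lines = ["img_500.png 1200", "img_501.png 1501"]

private_ids = load_identities(private_lines)
public_ids = load_identities(public_lines)

assert private_ids.isdisjoint(public_ids), "identity leakage into GAN set!"
print(f"{len(private_ids)} private / {len(public_ids)} public identities, no overlap")
```

Run against the real listing files, a failing assertion here would mean the public GAN data leaks private identities.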

To download the CelebA dataset, you can run the provided Python script as follows:

```sh
python data_downloader.py
```

> ### NOTE
> There are occasions when downloading the CelebA dataset using torchvision
> results in a Google Drive error. This is due to the dataset being hosted on
> Google Drive, which sometimes restricts the number of download requests.
>
> In such cases, an alternative is to download the CelebA dataset directly
> from Kaggle using the following link:
> [CelebA Dataset on Kaggle](https://www.kaggle.com/datasets/jessicali9530/celeba-dataset)


## Training Classifier
The paper is based on attacking a classifier model and extracting its private training data. To train such a classifier, we take well-known model architectures pretrained on the ImageNet dataset and fine-tune them to classify our celebrity identities. Follow the steps below to train a classifier.

1. To train a `FaceNet` or `IR152` based classifier, make sure to download model backbones from [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J).
2. Run this in your command line:

```sh
python train_classifier.py
```

**NOTES**:

- Training parameters can be accessed and modified from `./config/classify.json`.
- Pretrained checkpoints of these classifiers can also be found [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J).
- The `--model_name` parameter indicates the backbone architecture of the classifier.

The table below shows the details of pretrained backbone models.
| Model | Size (MB) | Parameters | Depth | Time (ms) per inference step (CPU) | Time (ms) per inference step (GPU) |
|-------------|-----------|------------|-------|------------------------------------|------------------------------------|
| Resnet152 | 232 | 60.4M | 311 | 127.4 | 6.5 |
| VGG16 | 528 | 138.4M | 16 | 69.5 | 4.2 |
| FaceNet64 | 98 | 25.6M | 22 | 58.2 | 4.6 |


## Training Inversion-Specific GAN
The assigned paper proposes a novel GAN architecture, the Inversion-Specific GAN. To train it, follow the steps below:

1. For each of the models, make sure the classifier is present at `./target_model/target_ckp`. This classifier can be downloaded from [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J) or trained at the previous stage.
2. For training a GAN against VGG16, make sure the target model is named `VGG16_86.30_allclass.tar`.
3. For training a GAN against FaceNet64, make sure the target model is named `FaceNet64_88.50.tar`.
4. For training a GAN against IR152, make sure the target model is named `IR152_91.16.tar`.
5. Run this in your command line:
```sh
python k+1_gan.py --model_name_T "VGG16"
```

**NOTES**:

- `--model_name_T` specifies the target model being attacked.
- Training parameters can be accessed from `./config/celeba.json`.
- Trained GANs and generated images are in `./improvedGAN`.
- Pretrained checkpoints of these GANs can be accessed [here](https://drive.google.com/drive/folders/1eCuJXdpKlrIAf9jIYxQ1cHviCQ4hxL--?usp=sharing).
- A general binary GAN can be trained by running this in the command line:

```sh
python binary_gan.py
```

## Attacking The Model

This section is where we perform the actual attack. Here, we put the target model and our trained GAN face-to-face, execute the attack, and calculate accuracy and top-5 accuracy. Simply scoring the recovered images with the target model itself would not work, because that approach is heavily biased: since the GAN is trained against the target model, it might generate poor, random patterns of pixels that nonetheless receive high confidence from the target model. To calculate accuracy fairly, we need another model, named the Evaluation Classifier, to act as an oracle. In this case, we use `FaceNet_95.88.tar`, which can be downloaded [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J), as our Evaluation Classifier. Follow these steps:

1. Download the Evaluation Classifier `FaceNet_95.88.tar` from [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J) and place it at `./target_model/target_ckp`.
2. Make sure the classifiers from the previous stage are present, with the correct names and locations.
3. Place the Inversion-Specific GAN or GMI generator in the `./improvedGAN` directory. You can train these GANs or download them from [here](https://drive.google.com/drive/folders/1eCuJXdpKlrIAf9jIYxQ1cHviCQ4hxL--).
4. The names of the GAN generators should be `improved_celeba_G`, `improved_celeba_G_facenet`, and `improved_celeba_G_IR152` for `VGG16`, `FaceNet`, and `IR152`, respectively. For GMI, the name should be `celeba_G.tar`.
5. Ensure you also have the discriminators; their names should match the generator names, with `G` replaced by `D`. They can also be downloaded from [here](https://drive.google.com/drive/folders/1eCuJXdpKlrIAf9jIYxQ1cHviCQ4hxL--).
6. Run the following command in the command line with your specified arguments:
```sh
python recovery.py
```
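The accuracy and top-5 accuracy bookkeeping described above can be sketched as follows; the logits and targets here are synthetic stand-ins for the Evaluation Classifier's output on recovered images.

```python
# Sketch: the evaluation classifier (an oracle) scores recovered images,
# and we count top-1 and top-5 hits against the attacked identities.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_classes = 100, 1000
eval_logits = rng.normal(size=(n_samples, n_classes))  # stand-in oracle output
targets = rng.integers(0, n_classes, size=n_samples)   # attacked identities

top5 = np.argsort(eval_logits, axis=1)[:, -5:]  # 5 highest-scoring identities
top1_acc = np.mean(top5[:, -1] == targets)      # last column is the argmax
top5_acc = np.mean([t in row for t, row in zip(targets, top5)])
print(f"acc: {top1_acc:.2%}, top-5 acc: {top5_acc:.2%}")
```

With random logits both numbers sit near chance level (0.1% and 0.5%); a successful attack pushes them far above that.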

**NOTES**:

- The `--improved_flag` parameter indicates whether the Inversion-Specific GAN proposed in the paper is used for the attack.
- The `--dist_flag` parameter indicates whether distributional recovery is employed.

By setting both `--improved_flag` and `--dist_flag` to False, we simply use the method proposed in [[1]](#1).
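A sketch of how these two flags could be parsed is below. The flag names follow this README; the parser itself is illustrative, not the repository's actual `recovery.py` code.

```python
# Illustrative parser for the two attack flags described above.
import argparse

parser = argparse.ArgumentParser(description="Distributional recovery attack")
parser.add_argument('--improved_flag', action='store_true',
                    help='use the inversion-specific GAN instead of a general GAN')
parser.add_argument('--dist_flag', action='store_true',
                    help='recover a distribution per class instead of a single sample')

# With neither flag set, the attack reduces to the GMI baseline of [1].
args = parser.parse_args([])
print(f"improved: {args.improved_flag}, distributional: {args.dist_flag}")
```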

## Calculating Fréchet Inception Distance
**FID (Fréchet Inception Distance)** is a classic metric employed to quantify the performance of Generative Adversarial Networks (GANs). It has emerged as a widely adopted measure for assessing the quality and realism of images generated by GANs. The fundamental concept behind FID involves comparing the statistical distributions of generated images and real images within a feature space derived from a pre-trained deep neural network, typically an Inception Network.
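As a concrete illustration of that comparison, the closed-form FID between two Gaussian feature distributions, FID = ||μ_r − μ_f||² + Tr(Σ_r + Σ_f − 2(Σ_r Σ_f)^{1/2}), can be computed on small synthetic feature sets; the 8-dimensional features below are stand-ins for real Inception activations.

```python
# Minimal FID computation on synthetic feature statistics.
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))  # stand-in "real" features
fake = rng.normal(0.5, 1.2, size=(500, 8))  # shifted "generated" features

def fid(a, b):
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    sig_a = np.cov(a, rowvar=False)
    sig_b = np.cov(b, rowvar=False)
    covmean = sqrtm(sig_a @ sig_b)
    if np.iscomplexobj(covmean):  # drop negligible imaginary parts
        covmean = covmean.real
    return np.sum((mu_a - mu_b) ** 2) + np.trace(sig_a + sig_b - 2 * covmean)

assert fid(real, real) < 1e-6  # identical distributions: FID ~ 0
print(f"FID(real, fake) = {fid(real, fake):.3f}")
```

Identical feature sets give an FID of essentially zero, and the score grows as the generated statistics drift from the real ones, which is why lower FID indicates a better generator.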

In this project, since the original code provided by the authors lacked a module to calculate FID, I implemented it separately. By executing the provided code, we can compute the FID for both the GMI model and the Inversion-Specific model, enabling a comprehensive comparison of their performance. To calculate FID, follow the steps below:

1. Ensure that the generator for the model you wish to evaluate is located in `./improvedGAN` and is named `improved_celeba_G.tar`.

2. For calculating the FID of the GMI model, ensure that the GMI model is in `./improvedGAN` and named `celeba_G.tar`.

3. Run the following command in your command line:
```sh
python calculate_FID.py
```
**NOTES**:

To calculate FID on the improved generator, use the `--improved_flag`. Omitting it will result in calculating FID for the GMI baseline model.

---
>**Important Note:** If you wish to execute these lines and reproduce my experiments, simply run the code provided for this assignment in the `Knowledge Enriched DMI.ipynb` file. Be sure to consult my documentation if you encounter any issues.
>
>Happy experimenting :)

## Reference
<a id="1">[1]</a>
163 changes: 163 additions & 0 deletions calculate_FID.py
@@ -0,0 +1,163 @@
import torch
import torch.nn
import torch.nn.functional as F
import argparse
import utils
import numpy as np
from generator import Generator
from torchvision.models import inception_v3
from scipy.linalg import sqrtm


def prepare_for_inception_v3(imgs_batch):
    """
    Prepares the given batch of images for the inception_v3 model.

    Parameters:
        imgs_batch (torch.Tensor): A batch of images to prepare.

    Returns:
        torch.Tensor: The prepared batch of images.
    """
    # inception_v3 expects 299x299 inputs scaled to the [-1, 1] range.
    imgs_resized = F.interpolate(imgs_batch, size=(299, 299), mode='bilinear', align_corners=False)
    imgs_normalized = (imgs_resized * 2) - 1
    return imgs_normalized


def get_real_batch(trainloader):
    """
    Gets a batch of real images from the training loader and prepares it for the inception_v3 model.

    Parameters:
        trainloader (DataLoader): The training data loader.

    Returns:
        torch.Tensor: A batch of real images prepared for the inception_v3 model.
    """
    data_iter = iter(trainloader)
    real_images = next(data_iter)[0]
    return prepare_for_inception_v3(real_images)


def generate_fake_batch(generator):
    """
    Generates a batch of fake images using the given generator.

    Parameters:
        generator (Generator): The generator model to generate fake images.

    Returns:
        torch.Tensor: A batch of fake images prepared for the inception_v3 model.
    """
    z = torch.randn(64, 100).to(device)  # device is selected in __main__
    fake_imgs = generator(z)
    return prepare_for_inception_v3(fake_imgs)


def calculate_fid(real_imgs, fake_imgs):
    """
    Calculates the Fréchet Inception Distance (FID) between real and fake images.

    Parameters:
        real_imgs (torch.Tensor): A batch of real images.
        fake_imgs (torch.Tensor): A batch of fake images.

    Returns:
        torch.Tensor: The calculated FID.
    """
    real_imgs, fake_imgs = real_imgs.to(device), fake_imgs.to(device)
    mu_real, sigma_real = compute_statistics(real_imgs)
    mu_fake, sigma_fake = compute_statistics(fake_imgs)
    sum_sq_diff = torch.sum((mu_real - mu_fake) ** 2)
    sigma_sqrt = sqrtm((sigma_real @ sigma_fake).cpu().numpy())

    # sqrtm may return a complex matrix with negligible imaginary parts
    # caused by numerical error; keep only the real component.
    if np.iscomplexobj(sigma_sqrt):
        sigma_sqrt = sigma_sqrt.real

    fid = sum_sq_diff + torch.trace(sigma_real + sigma_fake - 2 * torch.tensor(sigma_sqrt, device=device, dtype=torch.float32))
    return fid


def compute_statistics(imgs):
    """
    Computes the statistics (mean and covariance) for the given batch of images using the inception_v3 model.

    Parameters:
        imgs (torch.Tensor): A batch of images.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: The calculated mean and covariance of the features.
    """
    model = inception_v3(pretrained=True, transform_input=False).to(device)
    model.eval()
    with torch.no_grad():
        features = model(imgs).view(imgs.size(0), -1)
    mu = torch.mean(features, dim=0)
    sigma = torch_cov(features, rowvar=False)
    return mu, sigma


def torch_cov(m, rowvar=False):
    """
    Computes the covariance matrix of a given matrix.

    Parameters:
        m (torch.Tensor): A 2D matrix.
        rowvar (bool): If True, treat the rows as variables; otherwise, treat columns as variables.

    Returns:
        torch.Tensor: The covariance matrix.
    """
    if m.size(0) == 1:
        return torch.zeros((m.size(1), m.size(1)), device=device)
    if not rowvar:
        # Rows are treated as variables internally, so transpose when the
        # input follows the rowvar=False convention (variables in columns).
        m = m.t()
    fact = 1.0 / (m.size(1) - 1)
    m = m - torch.mean(m, dim=1, keepdim=True)  # avoid mutating the caller's tensor in place
    mt = m.t()
    return fact * m.matmul(mt).squeeze()


if __name__ == "__main__":

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    parser = argparse.ArgumentParser(description='Calculate FID for a trained generator')
    parser.add_argument('--improved_flag', action='store_true', default=False,
                        help='use the inversion-specific GAN instead of the GMI baseline')
    args = parser.parse_args()

    file = "./config/classify.json"
    config_args = utils.load_json(json_file=file)
    train_file = config_args['dataset']['train_file_path']

    print(f"using improved model: {args.improved_flag}")

    if args.improved_flag:
        generator_path = "./improvedGAN/improved_celeba_G.tar"
    else:
        generator_path = "./improvedGAN/celeba_G.tar"

    G = Generator(100)
    G = torch.nn.DataParallel(G)
    ckp_G = torch.load(generator_path)
    G.load_state_dict(ckp_G['state_dict'])
    G.eval()

    _, trainloader = utils.init_dataloader(config_args, train_file, mode="train")

    fid_list = []

    for i in range(5):
        real_images, fake_images = get_real_batch(trainloader), generate_fake_batch(G)
        FID = calculate_fid(real_images, fake_images)
        fid_list.append(FID.item())
        print(f"Batch {i+1} - FID: {FID.item():.2f}")

    mean_fid = np.mean(fid_list)
    print(f"Mean FID: {mean_fid:.2f}")
4 changes: 2 additions & 2 deletions config/celeba.json
@@ -3,7 +3,7 @@
"gan_file_path": "./data/ganset.txt",
"model_name": "train_gan - first stage",
"name": "celeba",
"img_path": "./data/img_align_celeba_png",
"img_path": "./data/celeba/img_align_celeba",
"n_classes":1000
},

@@ -29,4 +29,4 @@
}


}
}
6 changes: 3 additions & 3 deletions config/classify.json
@@ -3,11 +3,11 @@
"name":"celeba",
"train_file_path":"./data/trainset.txt",
"test_file_path":"./data/testset.txt",
"img_path": "./data/img_align_celeba_png",
"img_path": "./data/celeba/img_align_celeba",
"model_name":"VGG16",
"mode":"reg",
"n_classes":1000,
"gpus":"0,1,2,3,4,5,6,7"
"gpus":"0"
},
"VGG16":{
"epochs":50,
@@ -68,4 +68,4 @@
}
}


