36 commits
ef59aba
Update train_classifier.py
aynzabdz Sep 14, 2023
3a5331d
Add files via upload
aynzabdz Sep 14, 2023
be5f23f
Create gitignore
aynzabdz Sep 14, 2023
73003e1
Add files via upload
aynzabdz Sep 14, 2023
edafea5
Update dataloader.py
aynzabdz Sep 14, 2023
e24b3bc
Update celeba.json
aynzabdz Sep 17, 2023
70051fe
Update classify.json
aynzabdz Sep 17, 2023
b7d1df9
Update classify.json
aynzabdz Sep 17, 2023
5ab7681
Update k+1_gan.py
aynzabdz Sep 17, 2023
8088c74
Update data_downloader.py
aynzabdz Sep 17, 2023
a49a96e
Update engine.py
aynzabdz Sep 17, 2023
facd40d
Update k+1_gan.py
aynzabdz Sep 17, 2023
b0235d3
Update engine.py
aynzabdz Sep 17, 2023
283138f
Update engine.py
aynzabdz Sep 17, 2023
98f5eb7
Update engine.py
aynzabdz Sep 17, 2023
56c9e76
Update k+1_gan.py
aynzabdz Sep 17, 2023
fc056a9
Update utils.py
aynzabdz Sep 17, 2023
047cc24
Update README.md
aynzabdz Sep 21, 2023
6215aff
Update README.md
aynzabdz Sep 21, 2023
eb3d635
Update data_downloader.py
aynzabdz Sep 21, 2023
c8395e7
Update data_downloader.py
aynzabdz Sep 21, 2023
82d93ed
Update README.md
aynzabdz Sep 21, 2023
aba9c5d
Update README.md
aynzabdz Sep 21, 2023
fb53e72
Add files via upload
aynzabdz Sep 22, 2023
99fdf40
Update data_downloader.py
aynzabdz Sep 22, 2023
79124ab
Update data_downloader.py
aynzabdz Sep 22, 2023
572991b
Update train_classifier.py
aynzabdz Sep 22, 2023
ba2bdfd
Update k+1_gan.py
aynzabdz Sep 22, 2023
199dcf2
Update recovery.py
aynzabdz Sep 23, 2023
e725c4b
Add files via upload
aynzabdz Sep 23, 2023
51f7ff9
Update calculate_FID.py
aynzabdz Sep 23, 2023
85fc1bc
Add files via upload
aynzabdz Sep 24, 2023
7abc6e7
Update README.md
aynzabdz Sep 24, 2023
cac5dd0
Update README.md
aynzabdz Sep 24, 2023
b50ff55
Update Knowledge Enriched DMI.ipynb
aynzabdz Sep 24, 2023
ad32102
Update README.md
aynzabdz Sep 24, 2023
573 changes: 573 additions & 0 deletions Knowledge Enriched DMI.ipynb

Large diffs are not rendered by default.

167 changes: 132 additions & 35 deletions README.md
@@ -1,41 +1,138 @@
# Knowledge-Enriched-Distributional-Model-Inversion-Attacks

This is a PyTorch implementation of our paper at ICCV2021:

**Knowledge Enriched Distributional Model Inversion Attacks** \[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Chen_Knowledge-Enriched_Distributional_Model_Inversion_Attacks_ICCV_2021_paper.pdf)\] \[[arxiv](https://arxiv.org/abs/2010.04092)\]

We propose a novel **'Inversion-Specific GAN'** that can better distill knowledge useful for performing attacks on private models from public data. Moreover, we propose to *model a private data distribution* for each target class, which we refer to as **'Distributional Recovery'**.

## Requirement
This code has been tested with Python 3.6, PyTorch 1.0, and CUDA 10.0.

## Getting Started
* Install required packages.
* Download relevant datasets, including CelebA, MNIST, and CIFAR-10.
* Get target model prepared or run our code
`python train_classifier.py` <br>
Note that this code provides only three model architectures: VGG16, IR152, and FaceNet. Pretrained checkpoints for the three models can be downloaded at https://drive.google.com/drive/folders/1U4gekn72UX_n1pHdm9GQUQwwYVDvpTfN?usp=sharing.

## Build an inversion-specific GAN
* Modify the configuration in 'celeba.json'.
* Modify the target model path in 'k+1_gan.py' to your customized path.
* Run
`python k+1_gan.py`.
* Model checkpoints and generated image results are saved in the folder `improvedGAN`.
* A general GAN can be obtained as a baseline by running
`python binary_gan.py`.
* Pretrained binary GAN and inversion-specific GAN can be downloaded at https://drive.google.com/drive/folders/1L3frX-CE4j36pe5vVWuy3SgKGS9kkA70?usp=sharing.


## Distributional Recovery
Run
`python recovery.py`

* `--model` chooses the target model to attack.
* `--improved_flag` indicates whether an inversion-specific GAN is used. If False, a general GAN is applied.
* `--dist_flag` indicates if distributional recovery is performed. If False, then optimization is simply applied on a single sample instead of a distribution.
* By setting both `improved_flag` and `dist_flag` to False, we simply use the method proposed in [[1]](#1).
This repository is dedicated to the first assignment of the "Data Protection Techniques" course. In this assignment, I conduct a comprehensive study and numerous experiments based on the paper [Knowledge-Enriched Distributional Model Inversion Attacks](https://openaccess.thecvf.com/content/ICCV2021/papers/Chen_Knowledge-Enriched_Distributional_Model_Inversion_Attacks_ICCV_2021_paper.pdf), which can also be accessed on [arXiv](https://arxiv.org/abs/2010.04092).

## Contents

- [Download Data](#download-data)
- [Training Classifier](#training-classifier)
- [Training Inversion-Specific GAN](#training-inversion-specific-gan)
- [Attacking The Model](#attacking-the-model)
- [Calculating Fréchet Inception Distance (FID)](#calculating-fréchet-inception-distance)
- [Reference](#reference)


## Requirements

- Python 3.6
- PyTorch
- CUDA
- TensorBoardX
- OpenCV
- PIL

## Download Data

This project uses the CelebA dataset, which originally labels images with attributes such as 'smiling' or 'glasses'. We are not concerned with those labels; instead, we are interested in the identity of each datapoint. We therefore work with 1,000 unique identities, for which we have 27,018 training images and 3,009 test images. The images and their corresponding labels are listed in `./data/trainset.txt` for training data and `./data/testset.txt` for test data.

Furthermore, a distinct public dataset, listed in `./data/ganset.txt`, is used to pretrain the GAN. Crucially, this dataset has no class overlap with the private data used to train the classifier, so the identities in the GAN pretraining set are disjoint from those in the classifier's training and test sets.
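The disjointness guarantee above can be sanity-checked in a few lines of Python. This is an illustrative sketch: the `<image_path> <identity_label>` line format is an assumption about the `./data/*.txt` files, and the inline lists stand in for the real files.

```python
# Sanity-check that GAN-pretraining identities are disjoint from the
# private classifier identities. Assumed line format: "<path> <label>".

def load_identities(lines):
    """Collect the set of identity labels from dataset-listing lines."""
    return {int(line.split()[1]) for line in lines if line.strip()}

# Tiny inline stand-ins for ./data/trainset.txt and ./data/ganset.txt.
private_lines = ["img_001.png 0", "img_002.png 1", "img_003.png 999"]
public_lines = ["img_500.png 1200", "img_501.png 1501"]

private_ids = load_identities(private_lines)
public_ids = load_identities(public_lines)

assert private_ids.isdisjoint(public_ids), "identity leakage into GAN set!"
print(f"{len(private_ids)} private / {len(public_ids)} public identities, no overlap")
```

Run against the real listing files, a failing assertion here would mean the public GAN data leaks private identities.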

To download the CelebA dataset, you can run the provided Python script as follows:

```sh
python data_downloader.py
```

> ### NOTE
> There are occasions when downloading the CelebA dataset using torchvision
> results in a Google Drive error. This is due to the dataset being hosted on
> Google Drive, which sometimes restricts the number of download requests.
>
> In such cases, an alternative is to download the CelebA dataset directly
> from Kaggle using the following link:
> [CelebA Dataset on Kaggle](https://www.kaggle.com/datasets/jessicali9530/celeba-dataset)


## Training Classifier
The paper is based on attacking a classifier model and extracting its private training data. To train such a classifier, we take well-known model architectures pretrained on the ImageNet dataset and fine-tune them to classify our celebrity identities. Follow the steps below to train a classifier.

1. To train a `FaceNet` or `IR152` based classifier, make sure to download model backbones from [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J).
2. Run this in your command line:

```sh
python train_classifier.py
```

**NOTES**:

- Training parameters can be accessed and modified from `./config/classify.json`.
- Pretrained checkpoints of these classifiers can also be found [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J).
- The `--model_name` parameter indicates the backbone architecture of the classifier.

The table below shows the details of pretrained backbone models.
| Model | Size (MB) | Parameters | Depth | Time (ms) per inference step (CPU) | Time (ms) per inference step (GPU) |
|-------------|-----------|------------|-------|------------------------------------|------------------------------------|
| Resnet152 | 232 | 60.4M | 311 | 127.4 | 6.5 |
| VGG16 | 528 | 138.4M | 16 | 69.5 | 4.2 |
| FaceNet64 | 98 | 25.6M | 22 | 58.2 | 4.6 |


## Training Inversion-Specific GAN
The assigned paper proposes a novel GAN architecture, the Inversion-Specific GAN. To train it, follow the steps below:

1. For each of the models, make sure the classifier is present at `./target_model/target_ckp`. This classifier can be downloaded from [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J) or trained at the previous stage.
2. For training a GAN against VGG16, make sure the target model is named `VGG16_86.30_allclass.tar`.
3. For training a GAN against FaceNet64, make sure the target model is named `FaceNet64_88.50.tar`.
4. For training a GAN against IR152, make sure the target model is named `IR152_91.16.tar`.
5. Run this in your command line:
```sh
python k+1_gan.py --model_name_T "VGG16"
```

**NOTES**:

- `--model_name_T` specifies the target model being attacked.
- Training parameters can be accessed from `./config/celeba.json`.
- Trained GANs and generated images are in `./improvedGAN`.
- Pretrained checkpoints of these GANs can be accessed [here](https://drive.google.com/drive/folders/1eCuJXdpKlrIAf9jIYxQ1cHviCQ4hxL--?usp=sharing).
- A general binary GAN can be trained by running this in the command line:

```sh
python binary_gan.py
```

## Attacking The Model

This section is where we perform the actual attack. Here, we put the target model and our trained GAN face-to-face, execute the attack, and calculate accuracy and top-5 accuracy. Simply scoring the recovered images with the target model itself would not work, because that approach is heavily biased: since the GAN is trained against the target model, it might generate poor, random patterns of pixels that nonetheless receive high confidence from the target model. To calculate accuracy fairly, we need another model, named the Evaluation Classifier, to act as an oracle. In this case, we use `FaceNet_95.88.tar`, which can be downloaded [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J), as our Evaluation Classifier. Follow these steps:

1. Download the Evaluation Classifier `FaceNet_95.88.tar` from [here](https://drive.google.com/drive/folders/1ZTTrRJr-2HOgfyxndP8a9R2Hb_UOgV6J) and place it at `./target_model/target_ckp`.
2. Make sure the classifiers from the previous stage are present, with the correct names and locations.
3. Place the Inversion-Specific GAN or GMI generator in the `./improvedGAN` directory. You can train these GANs or download them from [here](https://drive.google.com/drive/folders/1eCuJXdpKlrIAf9jIYxQ1cHviCQ4hxL--).
4. The names of the GAN generators should be `improved_celeba_G`, `improved_celeba_G_facenet`, and `improved_celeba_G_IR152` for `VGG16`, `FaceNet`, and `IR152`, respectively. For GMI, the name should be `celeba_G.tar`.
5. Ensure you also have the discriminators; their names should match the generator names, with `G` replaced by `D`. They can also be downloaded from [here](https://drive.google.com/drive/folders/1eCuJXdpKlrIAf9jIYxQ1cHviCQ4hxL--).
6. Run the following command in the command line with your specified arguments:
```sh
python recovery.py
```
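The accuracy and top-5 accuracy bookkeeping described above can be sketched as follows; the logits and targets here are synthetic stand-ins for the Evaluation Classifier's output on recovered images.

```python
# Sketch: the evaluation classifier (an oracle) scores recovered images,
# and we count top-1 and top-5 hits against the attacked identities.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_classes = 100, 1000
eval_logits = rng.normal(size=(n_samples, n_classes))  # stand-in oracle output
targets = rng.integers(0, n_classes, size=n_samples)   # attacked identities

top5 = np.argsort(eval_logits, axis=1)[:, -5:]  # 5 highest-scoring identities
top1_acc = np.mean(top5[:, -1] == targets)      # last column is the argmax
top5_acc = np.mean([t in row for t, row in zip(targets, top5)])
print(f"acc: {top1_acc:.2%}, top-5 acc: {top5_acc:.2%}")
```

With random logits both numbers sit near chance level (0.1% and 0.5%); a successful attack pushes them far above that.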

**NOTES**:

- The `--improved_flag` parameter indicates whether the Inversion-Specific GAN proposed in the paper is used for the attack.
- The `--dist_flag` parameter indicates whether distributional recovery is employed.

By setting both `--improved_flag` and `--dist_flag` to False, we simply use the method proposed in [[1]](#1).
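A sketch of how these two flags could be parsed is below. The flag names follow this README; the parser itself is illustrative, not the repository's actual `recovery.py` code.

```python
# Illustrative parser for the two attack flags described above.
import argparse

parser = argparse.ArgumentParser(description="Distributional recovery attack")
parser.add_argument('--improved_flag', action='store_true',
                    help='use the inversion-specific GAN instead of a general GAN')
parser.add_argument('--dist_flag', action='store_true',
                    help='recover a distribution per class instead of a single sample')

# With neither flag set, the attack reduces to the GMI baseline of [1].
args = parser.parse_args([])
print(f"improved: {args.improved_flag}, distributional: {args.dist_flag}")
```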

## Calculating Fréchet Inception Distance
**FID (Fréchet Inception Distance)** is a classic metric employed to quantify the performance of Generative Adversarial Networks (GANs). It has emerged as a widely adopted measure for assessing the quality and realism of images generated by GANs. The fundamental concept behind FID involves comparing the statistical distributions of generated images and real images within a feature space derived from a pre-trained deep neural network, typically an Inception Network.
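As a concrete illustration of that comparison, the closed-form FID between two Gaussian feature distributions, FID = ||μ_r − μ_f||² + Tr(Σ_r + Σ_f − 2(Σ_r Σ_f)^{1/2}), can be computed on small synthetic feature sets; the 8-dimensional features below are stand-ins for real Inception activations.

```python
# Minimal FID computation on synthetic feature statistics.
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))  # stand-in "real" features
fake = rng.normal(0.5, 1.2, size=(500, 8))  # shifted "generated" features

def fid(a, b):
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    sig_a = np.cov(a, rowvar=False)
    sig_b = np.cov(b, rowvar=False)
    covmean = sqrtm(sig_a @ sig_b)
    if np.iscomplexobj(covmean):  # drop negligible imaginary parts
        covmean = covmean.real
    return np.sum((mu_a - mu_b) ** 2) + np.trace(sig_a + sig_b - 2 * covmean)

assert fid(real, real) < 1e-6  # identical distributions: FID ~ 0
print(f"FID(real, fake) = {fid(real, fake):.3f}")
```

Identical feature sets give an FID of essentially zero, and the score grows as the generated statistics drift from the real ones, which is why lower FID indicates a better generator.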

In this project, since the original code provided by the authors lacked a module to calculate FID, I implemented it separately. By executing the provided code, we can compute the FID for both the GMI model and the Inversion-Specific model, enabling a comprehensive comparison of their performance. To calculate FID, follow the steps below:

1. Ensure that the generator for the model you wish to evaluate is located in `./improvedGAN` and is named `improved_celeba_G.tar`.

2. For calculating the FID of the GMI model, ensure that the GMI model is in `./improvedGAN` and named `celeba_G.tar`.

3. Run the following command in your command line:
```sh
python calculate_FID.py
```
**NOTES**:

To calculate FID on the improved generator, use the `--improved_flag`. Omitting it will result in calculating FID for the GMI baseline model.

---
>**Important Note:** If you wish to execute these lines and reproduce my experiments, simply run the code provided for this assignment in the `Knowledge Enriched DMI.ipynb` file. Be sure to consult my documentation if you encounter any issues.
>
>Happy experimenting :)

## Reference
<a id="1">[1]</a>
163 changes: 163 additions & 0 deletions calculate_FID.py
@@ -0,0 +1,163 @@
import torch
import torch.nn
import torch.nn.functional as F
import argparse
import utils
import numpy as np
from generator import Generator
from torchvision.models import inception_v3
from scipy.linalg import sqrtm


def prepare_for_inception_v3(imgs_batch):
    """
    Prepares the given batch of images for the inception_v3 model.

    Parameters:
        imgs_batch (torch.Tensor): A batch of images to prepare.

    Returns:
        torch.Tensor: The prepared batch of images.
    """
    # inception_v3 expects 299x299 inputs scaled to the [-1, 1] range.
    imgs_resized = F.interpolate(imgs_batch, size=(299, 299), mode='bilinear', align_corners=False)
    imgs_normalized = (imgs_resized * 2) - 1
    return imgs_normalized


def get_real_batch(trainloader):
    """
    Gets a batch of real images from the training loader and prepares it for the inception_v3 model.

    Parameters:
        trainloader (DataLoader): The training data loader.

    Returns:
        torch.Tensor: A batch of real images prepared for the inception_v3 model.
    """
    data_iter = iter(trainloader)
    real_images = next(data_iter)[0]
    return prepare_for_inception_v3(real_images)


def generate_fake_batch(generator):
    """
    Generates a batch of fake images using the given generator.

    Parameters:
        generator (Generator): The generator model to generate fake images.

    Returns:
        torch.Tensor: A batch of fake images prepared for the inception_v3 model.
    """
    z = torch.randn(64, 100).to(device)  # device is selected in __main__
    fake_imgs = generator(z)
    return prepare_for_inception_v3(fake_imgs)


def calculate_fid(real_imgs, fake_imgs):
    """
    Calculates the Fréchet Inception Distance (FID) between real and fake images.

    Parameters:
        real_imgs (torch.Tensor): A batch of real images.
        fake_imgs (torch.Tensor): A batch of fake images.

    Returns:
        torch.Tensor: The calculated FID.
    """
    real_imgs, fake_imgs = real_imgs.to(device), fake_imgs.to(device)
    mu_real, sigma_real = compute_statistics(real_imgs)
    mu_fake, sigma_fake = compute_statistics(fake_imgs)
    sum_sq_diff = torch.sum((mu_real - mu_fake) ** 2)
    sigma_sqrt = sqrtm((sigma_real @ sigma_fake).cpu().numpy())

    # sqrtm may return a complex matrix with negligible imaginary parts
    # caused by numerical error; keep only the real component.
    if np.iscomplexobj(sigma_sqrt):
        sigma_sqrt = sigma_sqrt.real

    fid = sum_sq_diff + torch.trace(sigma_real + sigma_fake - 2 * torch.tensor(sigma_sqrt, device=device, dtype=torch.float32))
    return fid


def compute_statistics(imgs):
    """
    Computes the statistics (mean and covariance) for the given batch of images using the inception_v3 model.

    Parameters:
        imgs (torch.Tensor): A batch of images.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: The calculated mean and covariance of the features.
    """
    model = inception_v3(pretrained=True, transform_input=False).to(device)
    model.eval()
    with torch.no_grad():
        features = model(imgs).view(imgs.size(0), -1)
    mu = torch.mean(features, dim=0)
    sigma = torch_cov(features, rowvar=False)
    return mu, sigma


def torch_cov(m, rowvar=False):
    """
    Computes the covariance matrix of a given matrix.

    Parameters:
        m (torch.Tensor): A 2D matrix.
        rowvar (bool): If True, treat the rows as variables; otherwise, treat columns as variables.

    Returns:
        torch.Tensor: The covariance matrix.
    """
    if m.size(0) == 1:
        return torch.zeros((m.size(1), m.size(1)), device=device)
    if not rowvar:
        # Rows are treated as variables internally, so transpose when the
        # input follows the rowvar=False convention (variables in columns).
        m = m.t()
    fact = 1.0 / (m.size(1) - 1)
    m = m - torch.mean(m, dim=1, keepdim=True)  # avoid mutating the caller's tensor in place
    mt = m.t()
    return fact * m.matmul(mt).squeeze()


if __name__ == "__main__":

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    parser = argparse.ArgumentParser(description='Calculate FID for a trained generator')
    parser.add_argument('--improved_flag', action='store_true', default=False,
                        help='use the inversion-specific GAN instead of the GMI baseline')
    args = parser.parse_args()

    file = "./config/classify.json"
    config_args = utils.load_json(json_file=file)
    train_file = config_args['dataset']['train_file_path']

    print(f"using improved model: {args.improved_flag}")

    if args.improved_flag:
        generator_path = "./improvedGAN/improved_celeba_G.tar"
    else:
        generator_path = "./improvedGAN/celeba_G.tar"

    G = Generator(100)
    G = torch.nn.DataParallel(G)
    ckp_G = torch.load(generator_path)
    G.load_state_dict(ckp_G['state_dict'])
    G.eval()

    _, trainloader = utils.init_dataloader(config_args, train_file, mode="train")

    fid_list = []

    for i in range(5):
        real_images, fake_images = get_real_batch(trainloader), generate_fake_batch(G)
        FID = calculate_fid(real_images, fake_images)
        fid_list.append(FID.item())
        print(f"Batch {i+1} - FID: {FID.item():.2f}")

    mean_fid = np.mean(fid_list)
    print(f"Mean FID: {mean_fid:.2f}")
4 changes: 2 additions & 2 deletions config/celeba.json
@@ -3,7 +3,7 @@
"gan_file_path": "./data/ganset.txt",
"model_name": "train_gan - first stage",
"name": "celeba",
"img_path": "./data/img_align_celeba_png",
"img_path": "./data/celeba/img_align_celeba",
"n_classes":1000
},

@@ -29,4 +29,4 @@
}


}
}
6 changes: 3 additions & 3 deletions config/classify.json
@@ -3,11 +3,11 @@
"name":"celeba",
"train_file_path":"./data/trainset.txt",
"test_file_path":"./data/testset.txt",
"img_path": "./data/img_align_celeba_png",
"img_path": "./data/celeba/img_align_celeba",
"model_name":"VGG16",
"mode":"reg",
"n_classes":1000,
"gpus":"0,1,2,3,4,5,6,7"
"gpus":"0"
},
"VGG16":{
"epochs":50,
@@ -68,4 +68,4 @@
}
}


