DPDL logo

Experiment framework for Differentially Private Deep Learning

Installation and usage

Prerequisites

  • Python >= 3.10
  • PyTorch (CPU or GPU build appropriate for your system)
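To check both prerequisites at once (the second command assumes PyTorch is already installed):

python --version
python -c "import torch; print(torch.__version__)"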

Install from source

Clone the repository:

git clone --branch joss --single-branch --depth 1 git@github.com:PROBIC/dpdl.git
cd ./dpdl

Create and activate a virtual environment, then install DPDL.

Note that you may want to use --system-site-packages if you are installing DPDL on a cluster.

python -m venv .venv
source .venv/bin/activate
pip install -U pip

# You might want to install PyTorch for your platform/CUDA/ROCm first.
# See https://pytorch.org/get-started/locally/

pip install -e .
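On a cluster, the environment can instead be created with access to system packages, per the note above (--system-site-packages is a standard venv flag):

python -m venv --system-site-packages .venv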

Some features (--use-steps and --normalize-clipping) require our fork of Opacus:

pip install "git+https://github.com/DPBayes/opacus.git"

Otherwise, the official Opacus can be installed with:

pip install opacus
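Either way, a quick sanity check that Opacus is importable (a convenience, not part of the official instructions):

python -c "import opacus; print(opacus.__version__)"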

Test your installation

Run the CPU-only test suite (uses the fake dataset; no downloads):

pip install -e ".[test]"
pytest -m "not gpu"

To run GPU smoke tests (requires CUDA and a visible GPU):

pytest -m gpu
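If the GPU tests cannot find a device, this plain-PyTorch one-liner shows whether CUDA is visible to the environment:

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"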

Command line usage

The entry point is run.py (also installed as the dpdl CLI).

Example usage

At minimum, specify --epochs (or --use-steps with --total-steps).

Real-world example (CIFAR-10 + ResNetV2; downloads data and weights):

dpdl train --epochs 10 --dataset-name uoft-cs/cifar10 --model-name resnetv2_50x1_bit.goog_in21k --device auto

Quick CPU sanity check (no downloads; uses the fake dataset):

DPDL_FAKE_DATASET=1 dpdl train --epochs 1 --dataset-name fake --model-name resnet18 --device cpu --batch-size 64 --physical-batch-size 32 --num-workers 0
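A step-based variant of the same sanity check, illustrating --use-steps with --total-steps instead of --epochs. This is a sketch: it assumes --use-steps is a boolean flag, --total-steps 100 is an arbitrary illustrative value, and --use-steps requires the Opacus fork mentioned above:

DPDL_FAKE_DATASET=1 dpdl train --use-steps --total-steps 100 --dataset-name fake --model-name resnet18 --device cpu --batch-size 64 --physical-batch-size 32 --num-workers 0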

Architecture

DPDL Architecture

How to use?

Command line help

Run dpdl --help (or python run.py --help).
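Since the CLI is built with Typer, each subcommand should print its own help as well, e.g.:

dpdl train --help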

Creating a Slurm script

There is a tool for creating Slurm run scripts for the LUMI supercomputer:

$ bin/create-run-script.sh
Usage: bin/create-run-script.sh script_name [options...]

script_name               Name of the script to be created.

Options:
  --help                  Show this help message.
  project                 Slurm project (default: project_462000213).
  partition               Slurm partition (default: standard-g).
  gpus                    Number of GPUs (default: 8).
  time                    Time allocation (default: 1:00:00, 00:15:00 for dev-g).
  mem_per_gpu             Memory per GPU (default: 60G).
  cpus_per_task           Number of CPUs per task (default: 7).

Example:
  bin/create-run-script.sh run.sh project_462000213 small-g 1
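The generated script is presumably submitted like any other Slurm batch script:

sbatch run.sh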

Training under DP

Check out an example.
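For illustration, a hypothetical DP training invocation. The training flags are confirmed elsewhere in this README, but the privacy-budget flags --epsilon and --delta are assumed names, not documented ones; verify the actual options with dpdl train --help:

# Hypothetical sketch: --epsilon/--delta are assumed flag names
dpdl train --epochs 10 --dataset-name uoft-cs/cifar10 --model-name resnet18 --device auto --epsilon 8.0 --delta 1e-5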

Training without DP

Check out an example.
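As a hedged sketch, non-DP training presumably uses the same entry point with only the flags confirmed earlier; whether DP must be disabled explicitly is not documented here:

dpdl train --epochs 10 --dataset-name uoft-cs/cifar10 --model-name resnet18 --device auto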

High-level architecture

DPDL Architecture

Entry point

The entry point run.py provides a CLI built with Python's Typer library.

Command-line interface

The CLI implementation is in dpdl/cli.py.

Training

The CLI calls the fit method of the trainer.

Hyperparameter optimization

The CLI calls the optimize_hypers method of hyperparameteroptimizer.

The ranges/options for the different hyperparameters are defined in conf/optuna_hypers.conf.

See the detailed guide: docs/hyperparameter-optimization.md.

Example (optimize learning rate and batch size):

dpdl optimize --target-hypers learning_rate --target-hypers batch_size --n-trials 20 --optuna-config conf/optuna_hypers.conf

Callbacks

Training behavior can be extended through a flexible callback system.

Add a new dataset?

Create a new datamodule.

NB: the code should currently support all Hugging Face image datasets out of the box; pass, for example, --dataset-name cifar100 on the command line.
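For example, training on CIFAR-100 straight from the Hugging Face hub should only require changing the dataset name (the other flags mirror the earlier examples):

dpdl train --epochs 10 --dataset-name cifar100 --model-name resnet18 --device auto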

Add a new model?

Create a new model in dpdl/models and add it to models.py.

Add a new optimizer?

Add a new optimizer in optimizers.

Acknowledgements

We borrow the callback idea from fastai and the datamodule idea from PyTorch Lightning.
