- Python >= 3.10
- PyTorch (CPU or GPU build appropriate for your system)
Clone the repository:

```bash
git clone --branch joss --single-branch --depth 1 git@github.com:PROBIC/dpdl.git
cd ./dpdl
```

Create and activate a virtual environment, then install DPDL.
Note that you might want to use `--system-site-packages` if you are installing DPDL on a cluster.
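In that case, pass the flag when creating the environment (standard `venv` usage; it exposes the system's preinstalled packages, such as a cluster's PyTorch build, inside the environment):

```bash
python -m venv --system-site-packages .venv
```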
```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
# You might want to install PyTorch for your platform/CUDA/ROCm first.
# See https://pytorch.org/get-started/locally/
pip install -e .
```

Some features (`--use-steps` and `--normalize-clipping`) require our fork of Opacus:

```bash
pip install "git+https://github.com/DPBayes/opacus.git"
```

Otherwise, the official Opacus can be installed with:

```bash
pip install opacus
```

Run the CPU-only test suite (uses the fake dataset; no downloads):
```bash
pip install -e ".[test]"
pytest -m "not gpu"
```

To run GPU smoke tests (requires CUDA and a visible GPU):

```bash
pytest -m gpu
```

The entry point is `run.py` (also installed as the `dpdl` CLI).
At minimum, specify `--epochs` (or `--use-steps` with `--total-steps`).
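For example, a step-based run might look like the sketch below; the flag names come from this README, but the values are illustrative, and note that `--use-steps` requires the Opacus fork mentioned above.

```bash
# Illustrative values only; --use-steps requires the DPBayes Opacus fork.
DPDL_FAKE_DATASET=1 dpdl train --use-steps --total-steps 500 --dataset-name fake --model-name resnet18 --device cpu
```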
Real-world example (CIFAR-10 + ResNetV2; downloads data and weights):

```bash
dpdl train --epochs 10 --dataset-name uoft-cs/cifar10 --model-name resnetv2_50x1_bit.goog_in21k --device auto
```

Quick CPU sanity check (no downloads; uses the fake dataset):

```bash
DPDL_FAKE_DATASET=1 dpdl train --epochs 1 --dataset-name fake --model-name resnet18 --device cpu --batch-size 64 --physical-batch-size 32 --num-workers 0
```

Run `dpdl --help` (or `python run.py --help`) to see all available options.
There is a tool for creating Slurm run scripts for LUMI:

```text
$ bin/create-run-script.sh
Usage: bin/create-run-script.sh script_name [options...]

  script_name     Name of the script to be created.

Options:
  --help          Show this help message.
  project         Slurm project (default: project_462000213).
  partition       Slurm partition (default: standard-g).
  gpus            Number of GPUs (default: 8).
  time            Time allocation (default: 1:00:00, 00:15:00 for dev-g).
  mem_per_gpu     Memory per GPU (default: 60G).
  cpus_per_task   Number of CPUs per task (default: 7).

Example:
  bin/create-run-script.sh run.sh project_462000213 small-g 1
```
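The generated script can then be submitted in the usual Slurm way, e.g. (assuming the `run.sh` created in the example above):

```bash
sbatch run.sh
```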
The entrypoint `run.py` provides a CLI built with Python's Typer library. The CLI implementation is in `dpdl/cli.py`. For training, the CLI calls the `fit` method of the trainer; for hyperparameter optimization, it calls the `optimize_hypers` method of the hyperparameter optimizer.
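To make the wiring concrete, here is a minimal sketch of a Typer CLI that dispatches to a trainer and a hyperparameter optimizer. This is illustrative only, not the actual `dpdl/cli.py`: the `Trainer` and `HyperparameterOptimizer` stubs and their constructor arguments are hypothetical; only the `fit` and `optimize_hypers` method names come from the description above.

```python
# Minimal sketch of a Typer CLI wired to a trainer -- illustrative only.
import typer

app = typer.Typer()


class Trainer:
    """Hypothetical stand-in for dpdl's trainer."""

    def __init__(self, epochs: int):
        self.epochs = epochs

    def fit(self) -> None:
        print(f"training for {self.epochs} epochs")


class HyperparameterOptimizer:
    """Hypothetical stand-in for dpdl's hyperparameter optimizer."""

    def __init__(self, n_trials: int):
        self.n_trials = n_trials

    def optimize_hypers(self) -> None:
        print(f"running {self.n_trials} trials")


@app.command()
def train(epochs: int = typer.Option(1, help="Number of training epochs")):
    Trainer(epochs=epochs).fit()


@app.command()
def optimize(n_trials: int = typer.Option(20, help="Number of Optuna trials")):
    HyperparameterOptimizer(n_trials=n_trials).optimize_hypers()


if __name__ == "__main__":
    app()
```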
The ranges/options for the different hyperparameters are defined in `conf/optuna_hypers.conf`. See the detailed guide: `docs/hyperparameter-optimization.md`.
Example (optimize learning rate and batch size):

```bash
dpdl optimize --target-hypers learning_rate --target-hypers batch_size --n-trials 20 --optuna-config conf/optuna_hypers.conf
```
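For intuition, a search space for these two hyperparameters in plain Optuna typically looks like the sketch below. This is standard Optuna usage, not the contents of `conf/optuna_hypers.conf`; the ranges and the dummy objective are illustrative assumptions.

```python
# Sketch of a standard Optuna search space for learning rate and batch
# size -- the actual ranges used by dpdl live in conf/optuna_hypers.conf
# and may differ.
import optuna


def objective(trial: optuna.Trial) -> float:
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128, 256])
    # In dpdl the objective would train a model with these values and
    # return a validation metric; here we return a dummy score.
    return learning_rate * batch_size


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```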
DPDL provides a flexible callback system.
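As a rough illustration of the pattern, a callback exposes hooks that the training loop invokes at fixed points during `fit`. In the sketch below, the `Callback` base class and hook names are hypothetical, not dpdl's actual API:

```python
# Sketch of a fastai-style callback interface -- the base class and hook
# names here are hypothetical illustrations, not dpdl's actual API.
class Callback:
    def before_fit(self, trainer) -> None: ...
    def after_batch(self, trainer) -> None: ...
    def after_fit(self, trainer) -> None: ...


class PrintLossCallback(Callback):
    """Example callback: report the loss after every batch."""

    def after_batch(self, trainer) -> None:
        print(f"batch loss: {trainer.loss:.4f}")
```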
Create a new datamodule. NB: the code should currently support all Hugging Face image datasets, e.g. via the `--dataset-name cifar100` command-line parameter.
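For orientation, a datamodule in the PyTorch Lightning sense bundles dataset setup with dataloader construction. The sketch below uses generic names (`MyDataModule`, `setup`, `train_dataloader`) that are illustrative assumptions, not dpdl's interface:

```python
# Sketch of the PyTorch-Lightning-style datamodule pattern that dpdl
# borrows -- class and method names are generic, not dpdl's interface.
from torch.utils.data import DataLoader, Dataset


class MyDataModule:
    def __init__(self, batch_size: int = 64):
        self.batch_size = batch_size
        self.train_set: Dataset | None = None

    def setup(self) -> None:
        # Load or download the dataset here (e.g. a Hugging Face dataset).
        ...

    def train_dataloader(self) -> DataLoader:
        assert self.train_set is not None, "call setup() first"
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)
```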
Create a new model in `dpdl/models` and add it to `models.py`.
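Purely as an illustration of one common way to register models, the sketch below uses a decorator-based registry; the `MODELS` dict, `register` helper, and `TinyCNN` model are hypothetical names, and dpdl's actual `models.py` mechanism may differ:

```python
# Hypothetical illustration of a decorator-based model registry --
# dpdl's actual registration mechanism in models.py may differ.
import torch.nn as nn

MODELS: dict[str, type[nn.Module]] = {}


def register(name: str):
    def wrap(cls: type[nn.Module]) -> type[nn.Module]:
        MODELS[name] = cls
        return cls
    return wrap


@register("tiny_cnn")
class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```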
Add a new optimizer in `optimizers`.
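For reference, a custom optimizer in PyTorch subclasses `torch.optim.Optimizer`; the minimal plain-SGD sketch below illustrates that general pattern only and is not dpdl's optimizer interface:

```python
# Minimal plain-SGD optimizer, shown only to illustrate the standard
# torch.optim.Optimizer subclassing pattern; dpdl's interface may differ.
import torch


class PlainSGD(torch.optim.Optimizer):
    def __init__(self, params, lr: float = 0.1):
        super().__init__(params, {"lr": lr})

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    # Vanilla gradient descent update: p <- p - lr * grad.
                    p.add_(p.grad, alpha=-group["lr"])
```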
We borrow the callback idea from fastai and the datamodule idea from PyTorch Lightning.

