This repository contains the code used to benchmark differential privacy Python libraries and frameworks. The libraries and frameworks are evaluated on utility and execution time.
- Mean relative error of the sum query for experiments with synthetic datasets.
- Execution time of the tools in the Spark environment, measured on synthetic datasets of varying sizes up to 1 billion data points.
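
The two metrics above can be illustrated with a small, self-contained sketch. It is not the repository's benchmark code: the Laplace mechanism, the clamping bounds `[0, 100]`, `epsilon = 1.0`, the add/remove-one-record sensitivity, and all function names (`laplace_noise`, `dp_sum`, `mean_relative_error`) are assumptions chosen for illustration only.

```python
import random
import statistics
import time


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_sum(values, lower, upper, epsilon, rng):
    """Clamp each value to [lower, upper], sum, and add Laplace noise."""
    clamped_total = sum(min(max(v, lower), upper) for v in values)
    # Assumption: neighbouring datasets differ by adding or removing one record.
    sensitivity = max(abs(lower), abs(upper))
    return clamped_total + laplace_noise(sensitivity / epsilon, rng)


def mean_relative_error(true_value, noisy_values):
    """Average of |noisy - true| / |true| over repeated runs."""
    return statistics.mean(
        abs(x - true_value) / abs(true_value) for x in noisy_values
    )


rng = random.Random(0)
data = [rng.uniform(0.0, 100.0) for _ in range(10_000)]  # synthetic dataset
true_sum = sum(data)

# Time 50 repeated noisy sum queries with a monotonic high-resolution clock.
start = time.perf_counter()
noisy_sums = [dp_sum(data, 0.0, 100.0, epsilon=1.0, rng=rng) for _ in range(50)]
elapsed = time.perf_counter() - start

mre = mean_relative_error(true_sum, noisy_sums)
print(f"mean relative error: {mre:.6f}")
print(f"execution time for 50 runs: {elapsed:.3f} s")
```

Averaging the relative error over repeated runs matters because each differentially private answer is random; a single run says little about a tool's typical accuracy.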
Sample command to run a query on a dataset of a given size using a library/framework:

`python3 run_tool.py --size 100k --query VARIANCE --tool opendp`

| Argument | Description | Type | Default |
|---|---|---|---|
| size | Dataset size to run the query on. Valid values are {1k, 10k, 100k} | str | 10k |
| query | Query to run. Valid values are {count, sum, mean, variance} | str | count |
| tool | Library/framework to use. Valid values are {diffprivlib, opendp, tmlt_ana, pipelinedp_local, pipelinedp_spark} | str | diffprivlib |
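
As a rough illustration of how arguments like these could be parsed, here is a minimal `argparse` sketch. It is hypothetical and does not reproduce the actual `run_tool.py`: the `build_parser` helper and the lowercasing of input (so that `VARIANCE` matches `variance`) are assumptions, and the real script may handle case differently.

```python
import argparse

# Valid values and defaults taken from the argument table.
SIZES = ["1k", "10k", "100k"]
QUERIES = ["count", "sum", "mean", "variance"]
TOOLS = ["diffprivlib", "opendp", "tmlt_ana", "pipelinedp_local", "pipelinedp_spark"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Run a differentially private query with one tool."
    )
    # Assumption: input is lowercased before validation against `choices`.
    parser.add_argument("--size", type=str.lower, choices=SIZES, default="10k",
                        help="Dataset size to run the query on")
    parser.add_argument("--query", type=str.lower, choices=QUERIES, default="count",
                        help="Query to run")
    parser.add_argument("--tool", type=str.lower, choices=TOOLS, default="diffprivlib",
                        help="Library/framework to use")
    return parser


# Parse the sample command's arguments from an explicit list instead of sys.argv.
args = build_parser().parse_args(
    ["--size", "100k", "--query", "VARIANCE", "--tool", "opendp"]
)
print(args.size, args.query, args.tool)

# With no arguments, the documented defaults apply.
defaults = build_parser().parse_args([])
print(defaults.size, defaults.query, defaults.tool)
```

Restricting each argument with `choices` makes an unsupported size, query, or tool fail at parse time with a clear message rather than partway through a run.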
This repository is also referenced by our four-part series of differential privacy articles on DSAID Medium:
- Part 1: Sharing Data with Differential Privacy: A Primer - A beginner’s guide to understanding the fundamental concepts of differential privacy with simplified mathematical interpretation.
- Part 2: Practitioners’ Guide to Accessing Emerging Differential Privacy Tools - Explore the emerging differential privacy tools developed by prominent researchers and institutions, with practical guidance on their adoption for real-world use cases.
- Part 3: Evaluating Differential Privacy Tools’ Performance - A comparative analysis of the accuracy and execution time of differential privacy tools in both standalone and distributed environments, with a focus on common analytical queries.
- Part 4: Getting Started with Scalable Differential Privacy Tools on the Cloud - A step-by-step guide to deploying differential privacy tools in a distributed environment on AWS services, specifically AWS Glue and Amazon EMR, to support the analysis of large datasets.

The following folders are referenced in Part 4:

    |- glue/ - Glue differential privacy examples
    |- emr/ - EMR differential privacy examples

- @anshu-gt (anshu@dsaid.gov.sg)
- @syah-ri (syahri@dsaid.gov.sg)
💪 by DSAID Data Privacy Protection Capability Centre (DPPCC) of GovTech


