Welcome to the consolidated, publication-ready repository for DP-CLA. This top-level README provides the fastest path to installing the package and reproducing the key experiments. For an in-depth walkthrough of every file, see `dpcla/README.md`.
DP-CLA is tested on Python 3.9–3.11. Please do not use 3.12+ yet, as many deep-learning wheels are still missing. A reliable workflow on macOS/Linux is:
```bash
# one-off: ensure the right interpreter is available
pyenv install 3.11.7        # skip if already present

# clone & enter the repo
git clone <your-fork-url> && cd dpcla_release

# create & activate a clean virtual environment
pyenv local 3.11.7          # picks 3.11 inside this folder
python -m venv .venv && source .venv/bin/activate

# install in *editable* mode so CLI entry-points stay up-to-date
pip install -U pip setuptools wheel
pip install -e .            # pulls in numpy, torch, transformers, …
```

After installation, two console scripts become available:
- `dpcla-demo` – run the NumPy toy demonstration
- `dpcla-run-all` – reproduce all experiments end-to-end
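For instance, to run the toy demonstration (assuming the default invocation needs no extra flags; figures are written to `plots/`):

```bash
# NumPy toy demo; figures are written to plots/
dpcla-demo
```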
To reproduce the experiments end-to-end:

```bash
# quick smoke test (the same mode runs in CI)
python run_all_experiments.py --quick

# full reproduction
CUDA_VISIBLE_DEVICES=0 dpcla-run-all   # may take ~1-2 h on one GPU
```

Run a faster variant (the paper's subset of 4 languages and a 500-example cap):

```bash
# --languages 3 5 8 12 selects the TyDiQA IDs used in the paper;
# --subset 500 trims the dataset for a fast smoke test
CUDA_VISIBLE_DEVICES=0 dpcla-run-all --languages 3 5 8 12 --subset 500
```

All artefacts (JSON metrics, model checkpoints, plots) are written to `results/` and `plots/`.
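For a quick look at the metrics, the standard library's `python -m json.tool` pretty-prints any of the emitted files (the file name below is a placeholder; actual names depend on the experiment):

```bash
# pretty-print one metrics file; substitute a real file name from results/
python -m json.tool results/<experiment>.json
```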
Every experiment is driven by a YAML config. Run any script with `--config path/to/file.yaml` to override CLI defaults, e.g.
```bash
dpcla-run-all --config dpcla/config/xnli_dpcla.yaml
```
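A minimal hand-written override might look like the sketch below; the file name is arbitrary and the key names are an assumption (they are presumed to mirror the CLI flags shown above, not copied from the shipped configs):

```bash
# hypothetical override file -- key names assumed to mirror the CLI flags above
cat > my_config.yaml <<'EOF'
languages: [3, 5, 8, 12]   # TyDiQA IDs used in the paper
subset: 500                # cap the dataset for quick runs
EOF
dpcla-run-all --config my_config.yaml
```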
`human_eval_sample.txt` already contains 100 Swahili sentence pairs in the format expected by `human_eval_instructions.md`. If you prefer a different sample, copy the file and replace the sentences manually; no code changes are required.
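For example, to keep the shipped sample intact while preparing your own (the copy's name is arbitrary):

```bash
cp human_eval_sample.txt human_eval_sample_custom.txt
# …then replace the sentences in human_eval_sample_custom.txt by hand
```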
```text
dpcla/                   ← core library (NumPy & PyTorch code)
scripts/                 ← runnable shell helpers for XNLI / TyDiQA
tests/                   ← unit tests (92 % coverage)
.github/workflows/       ← CI: Black, isort, Flake8, pytest-cov
run_all_experiments.py   ← master script (demo + ablation + metrics)
results/                 ← JSON outputs (created on first run)
plots/                   ← figures produced by the toy demo
```
Temporary artefacts (`results/`, `plots/`, and any `checkpoints/` directories) can be deleted at any time; they will be regenerated on the next run.
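For example, to reset the working tree between runs:

```bash
# everything below is regenerated on the next run
rm -rf results/ plots/ checkpoints/
```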
A GitHub Actions workflow (`.github/workflows/ci.yml`) verifies formatting (Black/isort), linting (Flake8), unit tests, and >80 % coverage. The quick mode of `run_all_experiments.py` executes in CI to catch runtime regressions.
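The same gates can be run locally before pushing (assuming the dev tools used by the CI workflow are installed in your environment):

```bash
# mirror the CI checks: formatting, import order, lint, tests + coverage
black --check . && isort --check-only . && flake8
pytest --cov=dpcla
```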
```bibtex
@inproceedings{dpcla2025,
  title     = {Differentially Private Cross-lingual Alignment},
  author    = {Anonymous},
  year      = {2025},
  booktitle = {-}
}
```