
DP-CLA: Differentially Private Cross-lingual Alignment

Welcome to the consolidated, publication-ready repository for DP-CLA. This top-level README provides the fastest path to installing the package and reproducing the key experiments. For an in-depth walkthrough of every file, see dpcla/README.md.


1 Installation & Python version

DP-CLA is tested on Python 3.9–3.11. Avoid Python 3.12+ for now, since prebuilt wheels for many deep-learning dependencies are not yet available. A reliable workflow on macOS/Linux is:

# one-off: ensure the right interpreter is available
pyenv install 3.11.7        # skip if already present

# clone & enter the repo
git clone <your-fork-url> && cd dpcla_release

# activate a clean virtual-environment
pyenv local 3.11.7          # picks 3.11 inside this folder
python -m venv .venv && source .venv/bin/activate

# install in *editable* mode so CLI entry-points stay up-to-date
pip install -U pip setuptools wheel
pip install -e .            # pulls in numpy, torch, transformers, …

After installation two console scripts become available:

  • dpcla-demo – run the NumPy toy demonstration
  • dpcla-run-all – reproduce all experiments end-to-end

2 Running Experiments

Quick sanity check (CPU-friendly)

python run_all_experiments.py --quick

Full pipeline (single GPU, full corpus)

CUDA_VISIBLE_DEVICES=0 dpcla-run-all                # may take ~1-2 h on one GPU

Run a faster variant (the four-language paper subset with a 500-example cap):

# --languages selects the TyDiQA language IDs used in the paper;
# --subset trims each dataset for a fast smoke test
CUDA_VISIBLE_DEVICES=0 dpcla-run-all --languages 3 5 8 12 --subset 500

All artefacts (JSON metrics, model checkpoints, plots) are written to results/ and plots/.
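Since the metrics land in results/ as plain JSON, they can be aggregated with nothing but the standard library. A minimal sketch (the filenames and keys below are illustrative, not part of DP-CLA's documented schema):

```python
import json
from pathlib import Path

def load_metrics(results_dir="results"):
    """Collect every JSON metrics file in results_dir into one dict keyed by file stem."""
    metrics = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        metrics[path.stem] = json.loads(path.read_text())
    return metrics

if __name__ == "__main__":
    # Key names inside each file depend on the experiment that produced it.
    for name, values in load_metrics().items():
        print(name, values)
```

`Path.glob` simply yields nothing when the directory does not exist yet, so the helper is safe to call before the first run.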


3 Configuration

Every experiment is driven by a YAML config. Run any script with --config path/to/file.yaml to override CLI defaults, e.g.

dpcla-run-all --config dpcla/config/xnli_dpcla.yaml
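For orientation, an experiment config has roughly the following shape; every key below is an illustrative assumption, not DP-CLA's actual schema (consult the files under dpcla/config/ for the real fields):

```yaml
# Hypothetical sketch only; key names are assumptions, not the real schema.
dataset: xnli
subset: 500          # cap examples for a fast run; omit for the full corpus
epsilon: 8.0         # differential-privacy budget (assumed field)
output_dir: results/
```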

4 Human-evaluation stub

human_eval_sample.txt already contains 100 Swahili sentence pairs in the format expected by human_eval_instructions.md. If you prefer a different sample, copy the file and replace the sentences manually; no code changes are required.


5 Project layout

dpcla/                 ← core library (NumPy & PyTorch code)
scripts/               ← runnable shell helpers for XNLI / TyDiQA
tests/                 ← unit tests (92 % coverage)
.github/workflows/     ← CI: Black, isort, Flake8, pytest-cov
run_all_experiments.py ← master script (demo + ablation + metrics)
results/               ← JSON outputs (created on first run)
plots/                 ← Figures produced by the toy demo

The results/, plots/, and any checkpoints/ directories hold temporary artefacts and can be deleted at any time; they are regenerated on the next run.

6 CI / reproducibility

A GitHub Actions workflow (.github/workflows/ci.yml) verifies formatting (Black/isort), linting (Flake8), unit tests, and >80 % coverage. CI also executes the quick mode of run_all_experiments.py to catch runtime regressions.
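A workflow covering those checks could look roughly like the sketch below; the actual .github/workflows/ci.yml may differ in step names, action versions, and Python matrix, so treat this as a hedged outline rather than the repository's file:

```yaml
# Illustrative CI sketch; details (versions, step order) are assumptions.
name: ci
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -e . black isort flake8 pytest pytest-cov
      - run: black --check . && isort --check-only . && flake8
      - run: pytest --cov=dpcla --cov-fail-under=80
      - run: python run_all_experiments.py --quick
```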


7 Citing DP-CLA

@inproceedings{dpcla2025,
  title     = {Differentially Private Cross-lingual Alignment},
  author    = {Anonymous},
  year      = {2025},
  booktitle = {-}
}
