
DP-CLA: Differentially Private Cross-lingual Alignment

Welcome to the consolidated, publication-ready repository for DP-CLA. This top-level README provides the fastest path to installing the package and reproducing the key experiments. For an in-depth walkthrough of every file, see dpcla/README.md.


1 Installation & Python version

DP-CLA is tested on Python 3.9–3.11. Avoid Python 3.12+ for now, since prebuilt wheels for many deep-learning dependencies are not yet available. A reliable workflow on macOS/Linux is:

# one-off: ensure the right interpreter is available
pyenv install 3.11.7        # skip if already present

# clone & enter the repo
git clone <your-fork-url> && cd dpcla_release

# activate a clean virtual-environment
pyenv local 3.11.7          # picks 3.11 inside this folder
python -m venv .venv && source .venv/bin/activate

# install in *editable* mode so CLI entry-points stay up-to-date
pip install -U pip setuptools wheel
pip install -e .            # pulls in numpy, torch, transformers, …

After installation two console scripts become available:

  • dpcla-demo – run the NumPy toy demonstration
  • dpcla-run-all – reproduce all experiments end-to-end

2 Running Experiments

Quick sanity check (CPU-friendly)

python run_all_experiments.py --quick

Full pipeline (single GPU, full corpus)

CUDA_VISIBLE_DEVICES=0 dpcla-run-all                # may take ~1-2 h on one GPU

Run a faster variant (the four-language paper subset with a 500-example cap):

# --languages selects the TyDiQA language IDs used in the paper;
# --subset trims each dataset for a fast smoke test
CUDA_VISIBLE_DEVICES=0 dpcla-run-all --languages 3 5 8 12 --subset 500

All artefacts (JSON metrics, model checkpoints, plots) are written to results/ and plots/.
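Since the metrics land in results/ as plain JSON, they can be aggregated with nothing but the standard library. A minimal sketch (the filenames and keys below are illustrative, not part of DP-CLA's documented schema):

```python
import json
from pathlib import Path

def load_metrics(results_dir="results"):
    """Collect every JSON metrics file in results_dir into one dict keyed by file stem."""
    metrics = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        metrics[path.stem] = json.loads(path.read_text())
    return metrics

if __name__ == "__main__":
    # Key names inside each file depend on the experiment that produced it.
    for name, values in load_metrics().items():
        print(name, values)
```

`Path.glob` simply yields nothing when the directory does not exist yet, so the helper is safe to call before the first run.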


3 Configuration

Every experiment is driven by a YAML config. Run any script with --config path/to/file.yaml to override CLI defaults, e.g.

dpcla-run-all --config dpcla/config/xnli_dpcla.yaml
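For orientation, an experiment config has roughly the following shape; every key below is an illustrative assumption, not DP-CLA's actual schema (consult the files under dpcla/config/ for the real fields):

```yaml
# Hypothetical sketch only; key names are assumptions, not the real schema.
dataset: xnli
subset: 500          # cap examples for a fast run; omit for the full corpus
epsilon: 8.0         # differential-privacy budget (assumed field)
output_dir: results/
```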

4 Human-evaluation stub

human_eval_sample.txt already contains 100 Swahili sentence pairs in the format expected by human_eval_instructions.md. If you prefer a different sample, copy the file and replace the sentences manually; no code changes are required.


5 Project layout

dpcla/                 ← core library (NumPy & PyTorch code)
scripts/               ← runnable shell helpers for XNLI / TyDiQA
tests/                 ← unit tests (92 % coverage)
.github/workflows/     ← CI: Black, isort, Flake8, pytest-cov
run_all_experiments.py ← master script (demo + ablation + metrics)
results/               ← JSON outputs (created on first run)
plots/                 ← Figures produced by the toy demo

The results/, plots/, and any checkpoints/ directories hold temporary artefacts and can be deleted at any time; they are regenerated on the next run.

6 CI / reproducibility

A GitHub Actions workflow (.github/workflows/ci.yml) verifies formatting (Black/isort), linting (Flake8), unit tests, and >80 % coverage. CI also executes the quick mode of run_all_experiments.py to catch runtime regressions.
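A workflow covering those checks could look roughly like the sketch below; the actual .github/workflows/ci.yml may differ in step names, action versions, and Python matrix, so treat this as a hedged outline rather than the repository's file:

```yaml
# Illustrative CI sketch; details (versions, step order) are assumptions.
name: ci
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -e . black isort flake8 pytest pytest-cov
      - run: black --check . && isort --check-only . && flake8
      - run: pytest --cov=dpcla --cov-fail-under=80
      - run: python run_all_experiments.py --quick
```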


7 Citing DP-CLA

@inproceedings{dpcla2025,
  title     = {Differentially Private Cross-lingual Alignment},
  author    = {Anonymous},
  year      = {2025},
  booktitle = {-}
}
