A comprehensive Python package for machine unlearning in large language models, enabling efficient removal of unwanted knowledge while preserving model utility.
Unlearun addresses the critical need to remove specific knowledge from trained language models without expensive retraining. This is essential for:
- Privacy Compliance: GDPR "right to be forgotten" requirements
- Copyright Protection: Removing copyrighted content from models
- AI Safety: Eliminating harmful or dangerous knowledge
- Model Correction: Fixing outdated or incorrect information
- 5 State-of-the-Art Methods: GradAscent, GradDiff, DPO, RMU, SimNPO
- Simple High-Level API: Get started with just a few lines of code
- Comprehensive Evaluation: Built-in metrics for forget quality, utility preservation, and privacy
- Flexible Data Loading: Support for JSON, JSONL, HuggingFace datasets, and Python lists
- Production Ready: Extensive test coverage and benchmarking
- HuggingFace Integration: Seamless integration with transformers and accelerate
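The feature list mentions JSON and JSONL inputs. As a minimal sketch, the standard library can produce both kinds of file. It assumes each record uses the question and answer keys that appear in this README's examples. The helper names (`write_json`, `write_jsonl`, `read_jsonl`) are hypothetical, and the exact schema the package's loader expects is an assumption rather than documented behavior.

```python
import json
import tempfile
from pathlib import Path

# Records use the "question"/"answer" keys seen in this README's examples.
forget_records = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"},
]


def write_json(records, path):
    """Write all records as one JSON array (assumed layout for .json inputs)."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


def write_jsonl(records, path):
    """Write one JSON object per line, the usual JSONL convention."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Write both formats into a temporary directory for inspection.
out_dir = Path(tempfile.mkdtemp())
write_json(forget_records, out_dir / "forget_set.json")
write_jsonl(forget_records, out_dir / "forget_set.jsonl")
```

Reading the files back, as `read_jsonl` does, is a cheap way to confirm that the records survive the round trip before a long training run.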
pip install unlearun

Or install from source:
git clone https://github.com/shashuat/unlearun.git
cd unlearun
pip install -e .

from unlearun import Unlearning
# Initialize unlearner with RMU method
unlearner = Unlearning(
method="rmu",
model="gpt2-medium",
output_dir="./unlearned_model"
)
# Load your data
forget_data = [
{"question": "What is the capital of France?", "answer": "Paris"},
{"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"}
]
retain_data = [
{"question": "What is the capital of Germany?", "answer": "Berlin"},
{"question": "Who painted the Mona Lisa?", "answer": "Leonardo da Vinci"}
]
unlearner.load_data(
forget_data=forget_data,
retain_data=retain_data,
max_length=128
)
# Run unlearning
unlearner.run(
batch_size=4,
learning_rate=5e-5,
num_epochs=3
)
# Evaluate results
results = unlearner.evaluate(
metrics=["perplexity", "forget_quality", "model_utility"]
)
print(f"Forget Quality: {results['forget_quality']:.4f}")
print(f"Model Utility: {results['model_utility']:.4f}")

| Method | Description | Reference Model | Best For |
|---|---|---|---|
| RMU | Representation Misdirection for Unlearning | Required | Safety-critical applications, robust forgetting |
| GradDiff | Gradient Difference (ascent on forget, descent on retain) | Optional | Balanced forget/retain trade-off |
| DPO | Direct Preference Optimization | Required | Preference-based unlearning with alternate answers |
| SimNPO | Simple Negative Preference Optimization | Not required | Stable unlearning without reference model |
| GradAscent | Gradient Ascent on forget set | Not required | Simple baseline, quick experiments |
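
To make the GradAscent and GradDiff rows concrete, the sketch below combines scalar losses the way the table describes them: the forget loss enters with a negative sign (ascent) and the retain loss with a positive sign (descent). It is plain-Python illustration, not the package's implementation. The function names are hypothetical, and `gamma` and `alpha` simply mirror the weights named in the usage examples.

```python
def grad_ascent_objective(forget_loss: float) -> float:
    """Objective to minimize for gradient ascent: the negated forget loss."""
    return -forget_loss


def grad_diff_objective(
    forget_loss: float,
    retain_loss: float,
    gamma: float = 1.0,
    alpha: float = 1.0,
) -> float:
    """Objective to minimize for a gradient-difference style update.

    Minimizing the result pushes the forget loss up (weighted by gamma)
    while pulling the retain loss down (weighted by alpha).
    """
    return -gamma * forget_loss + alpha * retain_loss


# Example: equal weights, forget loss 2.0, retain loss 0.5.
combined = grad_diff_objective(forget_loss=2.0, retain_loss=0.5)
```

In a real training loop both inputs would be differentiable loss tensors from separate forget and retain batches; the weighting logic stays the same.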
# For safety-critical unlearning (e.g., removing hazardous knowledge)
unlearner = Unlearning(method="rmu", model="model_name", adaptive=True)
# For balanced forgetting with good retain data
unlearner = Unlearning(method="grad_diff", model="model_name",
gamma=1.0, alpha=1.0)
# When you have alternate acceptable answers
unlearner = Unlearning(method="dpo", model="model_name", beta=1.0)
# For simple, stable unlearning
unlearner = Unlearning(method="simnpo", model="model_name")
# Quick baseline for experiments
unlearner = Unlearning(method="grad_ascent", model="model_name")

from unlearun import Unlearning
# RMU is the most robust method for safety-critical unlearning
unlearner = Unlearning(
method="rmu",
model="gpt2",
output_dir="./rmu_model",
# RMU-specific parameters
steering_coeff=1.0, # Steering strength
target_layer=8, # Which transformer layer to steer
adaptive=True # Use adaptive coefficient (recommended)
)
# Load data from JSON files
unlearner.load_data(
forget_data="forget_set.json",
retain_data="retain_set.json",
max_length=128
)
# Configure training
unlearner.run(
batch_size=4,
learning_rate=1e-5,
num_epochs=3,
gradient_accumulation_steps=2,
warmup_steps=100,
logging_steps=10
)
# Comprehensive evaluation
results = unlearner.evaluate(
metrics=[
"perplexity",
"forget_quality",
"model_utility",
"rouge",
"verbatim_memorization",
"mia"
]
)

from unlearun import Unlearning
# GradDiff with KL divergence for smoother retain preservation
unlearner = Unlearning(
method="grad_diff",
model="gpt2-medium",
output_dir="./graddiff_model",
# GradDiff-specific parameters
gamma=1.0, # Weight for forget loss
alpha=1.0, # Weight for retain loss
retain_loss_type="KL" # Use KL divergence (requires ref model)
)
unlearner.load_data(
forget_data="forget.json",
retain_data="retain.json"
)
unlearner.run(
batch_size=2,
learning_rate=5e-5,
num_epochs=5
)

from datasets import load_dataset
from unlearun import Unlearning
# Load from HuggingFace Hub
forget_dataset = load_dataset("your_username/forget_dataset", split="train")
retain_dataset = load_dataset("your_username/retain_dataset", split="train")
unlearner = Unlearning(
method="simnpo",
model="meta-llama/Llama-2-7b-hf",
output_dir="./unlearned_llama"
)
unlearner.load_data(
forget_data=forget_dataset,
retain_data=retain_dataset,
question_key="prompt", # Specify your column names
answer_key="completion",
max_length=512
)
unlearner.run(batch_size=1, num_epochs=3)

from unlearun import Unlearning
from unlearun.evaluation import (
compute_perplexity,
compute_verbatim_memorization,
compute_mia
)
# After training
unlearner = Unlearning(method="rmu", model="gpt2", output_dir="./model")
unlearner.load_data(forget_data="forget.json", retain_data="retain.json")
unlearner.run(batch_size=2, num_epochs=3)
# Custom evaluation with specific parameters
forget_ppl = compute_perplexity(
model=unlearner.model,
dataset=unlearner.forget_dataset,
tokenizer=unlearner.tokenizer,
batch_size=4
)
# Check for verbatim memorization
verbatim_score = compute_verbatim_memorization(
model=unlearner.model,
forget_dataset=unlearner.forget_dataset,
tokenizer=unlearner.tokenizer,
prefix_length=50,
max_new_tokens=100,
num_samples=100
)
# Membership inference attack
mia_score = compute_mia(
model=unlearner.model,
forget_dataset=unlearner.forget_dataset,
retain_dataset=unlearner.retain_dataset,
tokenizer=unlearner.tokenizer,
batch_size=4
)
print(f"Forget Perplexity: {forget_ppl:.2f}")
print(f"Verbatim Memorization: {verbatim_score:.4f}")
print(f"MIA AUROC: {mia_score:.4f}")

The package includes comprehensive evaluation metrics:
- Perplexity: Measures how "forgotten" the data is (higher perplexity on the forget set indicates stronger forgetting)
- Verbatim Memorization: ROUGE score between generated continuations and the ground-truth text
- Knowledge Retention: QA accuracy on forget topics
- Model Utility: Performance on retain set
- General Knowledge: Evaluation on holdout data
- Task Performance: Accuracy on downstream tasks
- Membership Inference Attack (MIA): Resistance to privacy attacks
- Extraction Attack: Difficulty of extracting forgotten data
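
Two of the metrics above reduce to short formulas, sketched here in plain Python under stated assumptions. Perplexity is the exponential of the mean negative log-likelihood per token, and a membership inference AUROC is the probability that a randomly chosen member example receives a higher attack score than a randomly chosen non-member. The function names and toy inputs are hypothetical; the package's own `compute_perplexity` and `compute_mia` may work differently.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def auroc(member_scores, non_member_scores):
    """Pairwise AUROC: fraction of (member, non-member) pairs ranked correctly.

    Ties count as half a correct pair. A value near 0.5 means the attack
    cannot tell members from non-members.
    """
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for member in member_scores:
        for non_member in non_member_scores:
            if member > non_member:
                wins += 1.0
            elif member == non_member:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Toy inputs: every token assigned probability 0.25, and made-up attack scores.
toy_ppl = perplexity([math.log(0.25)] * 4)
toy_auroc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise AUROC is quadratic in the number of examples, which is fine for a sanity check but too slow for large evaluation sets; a rank-based implementation would scale better.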
unlearun/
├── unlearun/
│   ├── __init__.py          # Package entry point
│   ├── core.py              # High-level Unlearning class
│   ├── methods/             # Unlearning methods
│   │   ├── grad_ascent.py
│   │   ├── grad_diff.py
│   │   ├── dpo.py
│   │   ├── rmu.py
│   │   └── simnpo.py
│   ├── data/                # Data handling
│   │   ├── dataset.py
│   │   └── collators.py
│   ├── trainer/             # Custom trainer
│   │   └── trainer.py
│   ├── utils/               # Utilities
│   │   ├── losses.py
│   │   └── helpers.py
│   └── evaluation/          # Evaluation metrics
│       └── metrics.py
├── tests/                   # Test suite
│   └── test_unlearning.py
├── pyproject.toml           # Package configuration
├── requirements.txt         # Dependencies
└── README.md                # This file
Run the test suite:
# Install dev dependencies
pip install -e ".[dev]"
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=unlearun --cov-report=html
# Skip slow tests
pytest tests/ -v -m "not slow"

- Python ≥ 3.8
- PyTorch ≥ 2.0.0
- Transformers ≥ 4.30.0
- Datasets ≥ 2.12.0
- Accelerate ≥ 0.20.0
See requirements.txt for full dependency list.
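
As a rough sketch, the minimum versions listed above can be checked against the current environment with the standard library's `importlib.metadata`. The distribution names (`torch` for PyTorch, and so on) are the usual PyPI names, and the helpers `version_tuple` and `check_requirements` are hypothetical. The parser only reads the leading numeric components, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata

# Minimums copied from the requirements list; keys are PyPI distribution names.
MINIMUM_VERSIONS = {
    "torch": (2, 0, 0),
    "transformers": (4, 30, 0),
    "datasets": (2, 12, 0),
    "accelerate": (0, 20, 0),
}


def version_tuple(version_text):
    """Parse up to three leading numeric components of a version string."""
    numbers = []
    for piece in version_text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a status string per package: ok, too old, or missing."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= minimum:
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

For strict comparisons, including pre-releases, a dedicated version parser would be a safer choice than this tuple comparison.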
The package is compatible with standard unlearning benchmarks:
- TOFU (Task of Fictitious Unlearning for LLMs)
- WMDP (Weapons of Mass Destruction Proxy)
- MUSE (Machine Unlearning Six-Way Evaluation)
# Example: Evaluate on TOFU benchmark
from datasets import load_dataset
from unlearun import Unlearning
tofu_forget = load_dataset("locuslab/TOFU", "forget01", split="train")
tofu_retain = load_dataset("locuslab/TOFU", "retain99", split="train")
unlearner = Unlearning(method="rmu", model="phi-1.5")
unlearner.load_data(forget_data=tofu_forget, retain_data=tofu_retain)
unlearner.run(batch_size=2, num_epochs=3)
results = unlearner.evaluate()

We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Ensure all tests pass (pytest tests/)
- Format code (black unlearun/ tests/)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
git clone https://github.com/shashuat/unlearun.git
cd unlearun
pip install -e ".[dev]"
pre-commit install  # Optional: for automatic formatting

This project is licensed under the MIT License - see the LICENSE file for details.
If you use Unlearun in your research, please cite:
@software{unlearun2025,
title = {Unlearun: Machine Unlearning for Fine-tuned LLMs},
author = {Your Name},
year = {2025},
url = {https://github.com/shashuat/unlearun},
version = {0.1.0}
}

This package implements methods from:
@inproceedings{li2024wmdp,
title={The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning},
author={Li, Nathaniel and Pan, Alexander and others},
booktitle={ICML},
year={2024}
}
@inproceedings{rafailov2023dpo,
title={Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
author={Rafailov, Rafael and Sharma, Archit and others},
booktitle={NeurIPS},
year={2023}
}
@inproceedings{maini2024tofu,
title={TOFU: A Task of Fictitious Unlearning for LLMs},
author={Maini, Pratyush and Feng, Zhili and others},
booktitle={COLM},
year={2024}
}

- Built on HuggingFace Transformers
- Inspired by research from CMU, Stanford, and other leading institutions
- Thanks to the machine unlearning research community
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: your.email@example.com
- Documentation: Full Documentation
- PyPI: Package on PyPI
- Paper: arXiv (coming soon)
- WMDP Benchmark: https://www.wmdp.ai/
- TOFU Benchmark: https://github.com/locuslab/tofu
Status: Active Development | Version: 0.1.0 | Last Updated: October 2025
Made with ❤️ for AI Safety and Privacy