[ACL'2025 Oral] The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
📢 If you are interested in our work, please star ⭐ our project.
Ensuring awareness of fairness and privacy in Large Language Models (LLMs) is critical. Interestingly, we discover a counter-intuitive trade-off phenomenon: enhancing an LLM's privacy awareness through Supervised Fine-Tuning (SFT) with only a few thousand samples significantly decreases its fairness awareness. To address this issue, inspired by information theory, we introduce a training-free method to Suppress the Privacy and faIrness coupled Neurons (SPIN), which theoretically and empirically decreases the mutual information between fairness and privacy awareness.
- Extensive experimental results demonstrate that SPIN eliminates the trade-off phenomenon and significantly improves LLMs' fairness and privacy awareness simultaneously without compromising general capabilities, e.g., improving Qwen-2-7B-Instruct's fairness awareness by 12.2% and privacy awareness by 14.0%.
- More crucially, SPIN remains robust and effective with limited annotated data or even when only malicious fine-tuning data is available, whereas SFT methods may fail to perform properly in such scenarios.
- Furthermore, we show that SPIN could generalize to other potential trade-off dimensions.
We hope this study provides valuable insights into concurrently addressing fairness and privacy concerns in LLMs and can be integrated into comprehensive frameworks to develop more ethical and responsible AI systems.
SPIN enhances LLM’s awareness of fairness and privacy simultaneously without compromising general capabilities.
SPIN remains robust even when only malicious fine-tuning data is available.
SPIN encourages the model to produce more cautionary language related to fairness and privacy.
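At a high level, SPIN first scores how important each neuron is for fairness awareness and for privacy awareness (Step 1 below), then deactivates the neurons that score highly for both, i.e., the "coupled" neurons. The snippet below is only a minimal, hypothetical sketch of this idea for a single weight matrix: the top-fraction selection rule, the tensor shapes, and the interpretation of the thresholds `p` and `q` are assumptions made for illustration, not the actual implementation in `src/main.py`.

```python
import torch

def coupled_neuron_mask(fair_scores: torch.Tensor,
                        priv_scores: torch.Tensor,
                        p: float, q: float) -> torch.Tensor:
    """Hypothetical selection rule: a neuron is 'coupled' if it falls in the
    top-p fraction of the fairness scores AND the top-q fraction of the
    privacy scores (per-neuron scores are 1-D tensors)."""
    k_fair = max(1, int(p * fair_scores.numel()))
    k_priv = max(1, int(q * priv_scores.numel()))
    fair_mask = torch.zeros_like(fair_scores, dtype=torch.bool)
    priv_mask = torch.zeros_like(priv_scores, dtype=torch.bool)
    fair_mask[fair_scores.topk(k_fair).indices] = True
    priv_mask[priv_scores.topk(k_priv).indices] = True
    return fair_mask & priv_mask

@torch.no_grad()
def suppress_neurons(weight: torch.Tensor, coupled: torch.Tensor) -> None:
    """Training-free suppression: zero the output rows (neurons) flagged as coupled.
    `weight` has shape [out_features, in_features]; `coupled` is a boolean mask
    over the out_features dimension."""
    weight[coupled] = 0.0
```

Applying this kind of row-wise zeroing to the targeted modules (e.g., the MLP projections selected via `--target_module mlp`) modifies the model in place without any gradient updates, which is what makes the method training-free.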
The following packages are required to run the code:
- python==3.11.5
- pytorch==2.1.2
- transformers==4.40.0
- datasets==2.18.0
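For example, one possible way to set up such an environment (assuming conda and pip are used; note that PyTorch is installed via the torch package on pip) is:

```bash
conda create -n spin python=3.11.5
conda activate spin
pip install torch==2.1.2 transformers==4.40.0 datasets==2.18.0
```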
1. Compute and save the importance score
```bash
cd src/
datasets=(
    "beaver_train330k_privacy_safe_1k"
    "beaver_train330k_fairness_safe_1k"
    "alpaca_cleaned_no_safety"
)
for dataset in "${datasets[@]}"; do
    python compute_importance_score.py \
        --model your_model \
        --model_path your_model_path \
        --nsamples 128 \
        --dataset "$dataset"
done
```
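For intuition, per-neuron importance scores of this kind are often built from weight magnitudes combined with activation statistics collected on the calibration samples (e.g., a Wanda-style metric). The sketch below shows one such scheme for a single linear layer; it is an illustrative assumption, not necessarily the formula implemented in compute_importance_score.py.

```python
import torch

@torch.no_grad()
def neuron_importance(weight: torch.Tensor, calib_inputs: torch.Tensor) -> torch.Tensor:
    """Illustrative Wanda-style score: |W| scaled by the L2 norm of each input
    feature over the calibration tokens, then summed per output neuron.
    weight: [out_features, in_features]; calib_inputs: [num_tokens, in_features]."""
    input_norm = calib_inputs.norm(p=2, dim=0)   # [in_features]
    per_weight = weight.abs() * input_norm       # broadcasts over output rows
    return per_weight.sum(dim=1)                 # [out_features]: one score per neuron
```

Running the loop above computes and saves one such set of scores per calibration dataset (privacy, fairness, and general-utility data), which are then used in Step 2.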
2. Run and evaluate SPIN
```bash
cd src/
python main.py \
    --model your_model \
    --model_path your_model_path \
    --nsamples 128 \
    --dataset1 beaver_train330k_privacy_safe_1k \
    --dataset2 beaver_train330k_fairness_safe_1k \
    --target_module mlp \
    --p 5e-7 \
    --q 5e-7
```

Distributed under the Apache-2.0 License. See LICENSE for more information.
Some code in this project is adapted from resources provided by other open-source repositories; we greatly appreciate the contributions of the original authors.
```bibtex
@article{qian2024tug,
  title={The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models},
  author={Qian, Chen and Liu, Dongrui and Zhang, Jie and Liu, Yong and Shao, Jing},
  journal={arXiv preprint arXiv:2410.16672},
  year={2024}
}
```




