Understanding Privacy Norms Around LLM-Based Chatbots: A Contextual Integrity Perspective
This is the data and code repository for the AIES'2025 paper Understanding Privacy Norms Around LLM-Based Chatbots: A Contextual Integrity Perspective, available at http://arxiv.org/abs/2508.06760
Paper Abstract: LLM-driven chatbots like ChatGPT have created unprecedented volumes of conversational data, yet little is known about user privacy expectations for this information. We surveyed 300 US ChatGPT users to understand privacy norms around chatbot data sharing using the contextual integrity framework. Our findings reveal a stark disconnect between user concerns and behavior. While 82% of respondents rated chatbot conversations as sensitive or highly sensitive—more than email or social media posts—nearly half reported discussing health topics and over one-third discussed personal finances with ChatGPT. Participants expressed strong privacy concerns (t(299) = 8.5, p
Preregistered Study: Contextual Integrity and Chatlogs: A Factorial Vignette Survey
Preregistered Study Amendments: Transparent Changes Document
Data additionally made available at: Replication Data for: Understanding Privacy Norms Around LLM-Based Chatbots: A Contextual Integrity Perspective
1. Repository Structure
├── README.md
├── data # PII and demographic information (e.g., political affiliation) have been removed from the data
│   ├── aligned_data
│   │   ├── final_aligned_survey_data.csv # This CSV contains the data for evaluating the factorial vignettes
│   │   └── sensitivity_rankings.csv # This CSV contains the data for evaluating chat data sensitivity
│   ├── clean
│   │   └── cleaned_survey_data.csv
│   └── raw
│       └── raw_survey_data.csv
├── data_analysis
│   ├── chat_data_sensitivity.R # Contains the R code for evaluating 'Chat Data Sensitivity'
│   ├── linear_mixed_model.R # Contains the R code for the linear mixed models used to evaluate the factorial vignettes
│   └── privacy_control_questions.R # Contains the R code for the t-tests used to evaluate privacy attitudes and private data exchange value
├── data_cleaning
│   ├── data_cleaning.py # Script used to create final_aligned_survey_data.csv
│   └── sensitivity_ranking.py # Script used to create sensitivity_rankings.csv
└── paper
    └── Contextual-Integrity-AIES-2025.pdf # Full paper, including appendix
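As a quick orientation to the layout above, the following is a minimal sketch for inspecting the two aligned CSV files from Python, assuming the script is run from the repository root. The helper names `load_csv` and `summarize` are hypothetical and not part of the repository. Because the column names are not documented in this README, the sketch only reports row counts and headers rather than assuming any particular variables.

```python
import csv
from pathlib import Path

# Paths taken from the repository tree, relative to the repository root.
ALIGNED_DIR = Path("data") / "aligned_data"
VIGNETTE_CSV = ALIGNED_DIR / "final_aligned_survey_data.csv"
SENSITIVITY_CSV = ALIGNED_DIR / "sensitivity_rankings.csv"


def load_csv(path):
    """Return (column_names, rows); each row is a dict keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return list(reader.fieldnames or []), rows


def summarize(path):
    """Print the row and column counts of one CSV; return None if it is missing."""
    if not Path(path).exists():
        print(f"{path}: not found (run from the repository root)")
        return None
    columns, rows = load_csv(path)
    print(f"{path}: {len(rows)} rows, {len(columns)} columns")
    print("  columns:", ", ".join(columns))
    return columns, rows


if __name__ == "__main__":
    for csv_path in (VIGNETTE_CSV, SENSITIVITY_CSV):
        summarize(csv_path)
```

All values are read as strings; any numeric conversion depends on the actual column contents, which should be checked against the paper's appendix before analysis.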
2. Paper & Citation
@article{tran2025understanding,
title={Understanding Privacy Norms Around LLM-Based Chatbots: A Contextual Integrity Perspective},
author={Tran, Sarah and Lu, Hongfan and Slaughter, Isaac and Herman, Bernease and Dangol, Aayushi and Fu, Yue and Chen, Lufei and Gebreyohannes, Biniyam and Howe, Bill and Hiniker, Alexis and Weber, Nicholas and Wolfe, Robert},
journal={arXiv preprint arXiv:2508.06760},
year={2025}
}
3. Data Citation
@data{DVN/M6ABJ3_2025,
author={Sarah Tran and Robert Wolfe and Nicholas Weber},
publisher={Harvard Dataverse},
title={{Replication Data for: Understanding Privacy Norms Around LLM-Based Chatbots: A Contextual Integrity Perspective}},
UNF={UNF:6:J5vjU4szG7SZrFY9MPiKaQ==},
year={2025},
version={V1},
doi={10.7910/DVN/M6ABJ3},
url={https://doi.org/10.7910/DVN/M6ABJ3}
}