Transformer-based embedding of posts regarding mental health from cancer patients and their caregivers
Posts from cancer patients and their caregivers on several online platforms were analyzed with an unsupervised transformer-based neural network. The network maps each post to a vector in a high-dimensional embedding space, which can then be visualized to identify similarities and dissimilarities between posts. This analysis may provide insight into the emotional content of cancer patients' posts for a mental health study.
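As a rough illustration of what "similarity between posts" can mean in an embedding space, the sketch below compares toy vectors with cosine similarity. Cosine similarity is an assumption here (this README does not state which measure was used), and the three-dimensional vectors are hypothetical stand-ins for real post embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
post_a = [1.0, 2.0, 0.0]
post_b = [2.0, 4.0, 0.0]  # points in the same direction as post_a
post_c = [0.0, 0.0, 3.0]  # orthogonal to post_a

sim_ab = cosine_similarity(post_a, post_b)  # close to 1: very similar
sim_ac = cosine_similarity(post_a, post_c)  # close to 0: unrelated
```

Values near 1 indicate posts whose embeddings point in nearly the same direction, while values near 0 indicate little shared direction.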
Over 10,000 posts from cancer patients and their caregivers were collected from platforms such as Reddit, Daily Strength, and the Health Board. The dataset and further details are available at: https://www.kaggle.com/datasets/irinhoque/mental-health-insights-vulnerable-cancer-patients. The posts relate to five types of cancer: brain, colon, liver, leukemia, and lung cancer. Two team members scored each post by the emotion it expressed, on a scale from -2 to 1. Posts showing grief or suffering received a negative score (-1 or -2), posts expressing happy emotions such as relief or accomplishment received a positive score (1), and posts with no emotion received a score of 0 and were considered neutral.
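The scoring scheme above can be summarized as a small lookup. This is a minimal sketch: the function name `score_to_label`, the label strings, and the sample scores are all hypothetical and are not taken from the dataset or the project code.

```python
from collections import Counter


def score_to_label(score):
    """Map an annotator score in {-2, -1, 0, 1} to a coarse emotion label."""
    if score in (-2, -1):
        return "negative"  # grief or suffering
    if score == 0:
        return "neutral"   # no emotion expressed
    if score == 1:
        return "positive"  # relief, accomplishment, and similar
    raise ValueError(f"score outside the -2..1 scale: {score!r}")


# Hypothetical scores for seven posts, used only to exercise the mapping.
scores = [-2, -1, 0, 1, 1, 0, -1]
labels = [score_to_label(s) for s in scores]
counts = Counter(labels)
```

Collapsing the two negative scores into one label is a design choice for coloring or grouping points in a plot; keeping -2 and -1 separate preserves the intensity information.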
Wang K, Reimers N, Gurevych I. TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning. In Conference on Empirical Methods in Natural Language Processing, 2021.
Paper link: https://arxiv.org/abs/2104.06979
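TSDAE trains an encoder by corrupting each input sentence and asking a decoder to reconstruct the original from the encoder's fixed-size vector. The sketch below illustrates only the corruption step, using random token deletion. It is a simplified illustration, not the code used in this project: the helper `delete_tokens` is hypothetical, it splits on whitespace rather than using a proper tokenizer, and the default deletion ratio of 0.6 is assumed from the setting reported in the paper. The Sentence Transformers library provides its own `DenoisingAutoEncoderDataset` for this purpose.

```python
import random


def delete_tokens(text, del_ratio=0.6, rng=None):
    """Return `text` with each whitespace token dropped with probability `del_ratio`.

    At least one token is always kept so the corrupted input is never empty.
    """
    rng = rng or random.Random()
    tokens = text.split()
    if not tokens:
        return text
    kept = [tok for tok in tokens if rng.random() >= del_ratio]
    if not kept:
        # Everything was deleted; fall back to one randomly chosen token.
        kept = [rng.choice(tokens)]
    return " ".join(kept)


# Hypothetical example sentence, not taken from the dataset.
original = "the scan results came back clear and we are so relieved"
noisy = delete_tokens(original, rng=random.Random(0))
```

During training, `noisy` would be fed to the encoder and `original` would be the reconstruction target, so the sentence vector has to retain enough meaning to recover the deleted words.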
- PyTorch (https://pytorch.org)
- Sentence Transformers (https://www.sbert.net)