SMIL-SPCRAS/ExPAM

ExPAM: Explainable Personality Assessment Method using Heterogeneous Linguistic Features and Off-the-Shelf LLMs


Elena Ryumina, Dmitry Ryumin, Maxim Markitantov, Alexey Karpov


Abstract

Many organizations increasingly adopt personalization techniques to enhance user satisfaction. However, current systems generally cannot automatically infer and interpret individual personality traits (PTs), which are key drivers of user behavior. Although Large Language Models (LLMs) are widely used, they remain poorly suited to reliable and explainable Personality Assessment (PA). To address this gap, we propose ExPAM, a novel Explainable Personality Assessment Method that combines hybrid feature fusion with in-context learning in off-the-shelf LLMs to predict Big Five PTs from text. ExPAM explicitly grounds its predictions in interpretable linguistic patterns without requiring LLM fine-tuning. Its hybrid fusion is designed to improve both predictive performance and interpretability in PA. Transformer-based embeddings encode local contextual information, whereas features extracted using the Linguistic Inquiry and Word Count (LIWC) dictionary provide complementary global and local linguistic indicators of PTs. These interpretable feature patterns are incorporated into prompts that guide the LLM to produce both PT predictions and human-understandable explanations. ExPAM outperforms multi-task models on the ChaLearn First Impressions v2 (FIv2) corpus and single-task models on the PANDORA corpus that rely on a single feature set. On FIv2, it achieves a mean accuracy (mAC) of 0.891 and a Concordance Correlation Coefficient (CCC) of 0.333. On PANDORA, it achieves a mean Pearson Correlation Coefficient (PCC) of 0.240 and a CCC of 0.101. Prompting the LLM with hybrid global-local patterns further improves CCC by 9.9% on FIv2 and 15.8% on PANDORA. Qualitative interpretability analysis reveals trait-specific linguistic patterns, highlighting the potential of ExPAM for psychological research, computational linguistics, and paralinguistic studies.


Framework Pipeline

Figure 1: Pipeline of ExPAM.


Materials

The project uses the FIv2 and PANDORA corpora. FIv2 includes 10K video recordings of more than 3000 individuals. PANDORA contains approximately 2.8M English Reddit comments from about 1.4K authors.

These corpora are available after registration. Both corpora are annotated with the Big Five (OCEAN) personality traits: Openness (O), Conscientiousness (C), Extraversion (E), Agreeableness (A), and non-Neuroticism (N). For the FIv2 corpus, speech transcriptions are automatically extracted using the Whisper model, and the prepared data is available in src/prepered_dataframes. The PANDORA data can be downloaded from the official repository and prepared in the same way as the FIv2 dataframes, following the fold division proposed by the corpus authors.

Code Information

The codebase is structured as follows:

project_root/
├── llms/ # Testing LLMs
├── figures/ # Visualizations
├── src/
│ ├── prepered_dataframes
│ │ ├── dev_full_with_ASR.csv
│ │ ├── test_full_with_ASR.csv
│ │ ├── train_full_with_ASR.csv
│ ├── datasets.py # Custom PyTorch datasets and collate functions
│ ├── losses.py # Loss functions (LogCoshGL, etc.)
│ ├── measures.py # Evaluation metrics (CCC, MAE)
│ ├── models.py # Model architectures (BiLSTMAtt, MambaAtt, fusion_model)
│ ├── text_preprocessing.py # Embedding extraction and LIWC feature generation
│ ├── training_utils.py # Training loops, early stopping, checkpointing
│ └── utils.py # Helper functions
├── get_attention_weights.py # Generate attention weights for test set
├── get_explanation_for_LLM.py # Generate hybrid-based explanations for LLMs
├── get_explanation_with_LLM.py # Generate trait explanations + LLM refinement
├── refine_with_llm.py # Code for refined predictions and explanations using an LLM
├── train_single_models.py # Train base models (XLM, LIWC)
├── train_fusion_model.py # Train ensemble/fusion model
├── transcribe_with_asr.py # Generate ASR transcripts from audio
└── README.md # This file

Usage Instructions

1. Set Up Environment

# Clone repository
git clone https://github.com/SMIL-SPCRAS/ExPAM.git
cd ExPAM

# Install dependencies
pip install -r requirements.txt

2. Generate ASR Transcripts (Optional)

python transcribe_with_asr.py \
  --data_path "path/to/audio/" \
  --df_path "path/to/csvs/" \
  --whisper_model "openai/whisper-large-v3-turbo" \
  --device "cuda:0"

This adds a text_ASR column to your CSV files. Prepared FIv2 transcripts are already available in src/prepered_dataframes, so this step can be skipped for FIv2.

3. Train Single Models

Train single models on XLM-RoBERTa, JINA, BERT, BGE, and LIWC features:

python train_single_models.py \
  --models BiLSTMAtt ReBiLSTMAtt MambaAtt ReMambaAtt \
  --encoders xlm jina-v3 bert bge liwc \
  --lrs 1e-5 1e-4 \
  --dropouts 0.1 0.0 \
  --hds 64 128 \
  --epochs 60 \
  --seed 42 \
  --patience 10 \
  --bs 32 \
  --save_dir "saved_single_models"
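
The command above lists several values for each flag. Assuming train_single_models.py trains one model per combination of these values (an assumption; the script may iterate differently), the following sketch enumerates the resulting grid so you can estimate the compute cost. It does not call the training script.

```python
from itertools import product

# Values copied from the example command above.
models = ["BiLSTMAtt", "ReBiLSTMAtt", "MambaAtt", "ReMambaAtt"]
encoders = ["xlm", "jina-v3", "bert", "bge", "liwc"]
lrs = [1e-5, 1e-4]
dropouts = [0.1, 0.0]
hds = [64, 128]

# Assumption: one training run per combination of the listed values.
grid = list(product(models, encoders, lrs, dropouts, hds))
print(f"{len(grid)} runs, up to 60 epochs each")  # 4 * 5 * 2 * 2 * 2 = 160 runs
for model, encoder, lr, dropout, hd in grid[:3]:
    print(model, encoder, lr, dropout, hd)
```

Early stopping (--patience 10) can end individual runs before the 60-epoch limit.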

4. Train Fusion Model

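Combine the predictions of the best single models. Replace the uppercase placeholders with the paths, architectures, and deep encoder of your best single models: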

python train_fusion_model.py \
  --nn_model_path "saved_single_models/BEST MODEL BASED ON DEEP FEATURES" \
  --hc_model_path "saved_single_models/BEST MODEL BASED ON HAND-CRAFTED FEATURES" \
  --save_dir "saved_fusion_models" \
  --deep_model_architecture "BEST DEEP MODEL ARCHITECTURE" \
  --hc_model_architecture "BEST HAND-CRAFTED MODEL ARCHITECTURE" \
  --deep_encoder "BEST DEEP ENCODER" \
  --lr 1e-2 \
  --epochs 500 \
  --seed 42 \
  --patience 100 \
  --bs 128
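
The fusion step, as described under Methodology, works as follows. The predictions of the deep and hand-crafted single models are concatenated, passed through a single dense layer, and then through a sigmoid. The plain-Python sketch below illustrates this. The layer sizes (10 inputs, 5 outputs), the random initialization, and the dummy prediction values are assumptions for illustration only; fusion_model in src/models.py may be implemented differently.

```python
import math
import random

TRAITS = ["O", "C", "E", "A", "N"]  # Big Five; N = non-Neuroticism, as annotated in the corpora


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fusion_forward(deep_preds, hc_preds, weights, bias):
    """Concatenate both prediction vectors, apply one dense layer, then a sigmoid."""
    x = list(deep_preds) + list(hc_preds)  # 5 + 5 = 10 fused inputs
    out = []
    for w_row, b in zip(weights, bias):  # one weight row per trait
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(sigmoid(z))
    return out


random.seed(42)
n_in, n_out = 2 * len(TRAITS), len(TRAITS)
# Hypothetical initialization; in practice these weights are learned during training.
weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
bias = [0.0] * n_out

# Dummy single-model outputs, for illustration only.
deep = [0.62, 0.55, 0.48, 0.60, 0.53]
hc = [0.58, 0.51, 0.50, 0.57, 0.49]
scores = fusion_forward(deep, hc, weights, bias)
print({t: round(s, 3) for t, s in zip(TRAITS, scores)})
```

Because the sigmoid bounds every output to (0, 1), the fused scores stay on the same scale as the trait annotations.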

5. Generate Attention Weights for All Train / Test Examples

This step is necessary for interpreting the hybrid model results and generating explanations:

python get_attention_weights.py \
  --nn_model_path "saved_single_models/BEST MODEL BASED ON DEEP FEATURES" \
  --hc_model_path "saved_single_models/BEST MODEL BASED ON HAND-CRAFTED FEATURES" \
  --save_dir "saved_fusion_models" \
  --deep_model_architecture "BEST DEEP MODEL ARCHITECTURE" \
  --hc_model_architecture "BEST HAND-CRAFTED MODEL ARCHITECTURE" \
  --deep_encoder "BEST DEEP ENCODER" \
  --dataset_path "src/prepered_dataframes/train_full_with_ASR.csv" \
  --subset "train"

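Run this step for both subsets. Run it once with --subset "train" and --dataset_path "src/prepered_dataframes/train_full_with_ASR.csv", and once with --subset "test" and --dataset_path "src/prepered_dataframes/test_full_with_ASR.csv". Both the train and the test attention weights are required by get_explanation_for_LLM.py and get_explanation_with_LLM.py.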
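
The Methodology section describes two levels of interpretation. Global attention weights are aggregated across the training set, and local token-level weights are normalized per example. The sketch below illustrates both under stated assumptions: each example is a hypothetical (tokens, weights) pair (the real pickle layout may differ), normalization is min-max, and global scores are per-token means. Ranking here uses attention magnitude only.

```python
from collections import defaultdict


def minmax(weights):
    """Min-max normalize one example's token attention weights to [0, 1]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0.0] * len(weights)
    return [(w - lo) / (hi - lo) for w in weights]


def global_token_scores(examples):
    """Average each token's normalized attention over all examples it occurs in."""
    totals, counts = defaultdict(float), defaultdict(int)
    for tokens, weights in examples:
        for tok, w in zip(tokens, minmax(weights)):
            totals[tok] += w
            counts[tok] += 1
    return {tok: totals[tok] / counts[tok] for tok in totals}


def top_local_tokens(tokens, weights, k=3):
    """Rank the tokens of a single example by normalized attention."""
    ranked = sorted(zip(tokens, minmax(weights)), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy (tokens, attention weights) pairs standing in for the saved training weights.
train = [
    (["i", "love", "meeting", "people"], [0.1, 0.5, 0.3, 0.1]),
    (["i", "love", "quiet", "evenings"], [0.2, 0.4, 0.3, 0.1]),
]
global_scores = global_token_scores(train)
print(sorted(global_scores.items(), key=lambda kv: kv[1], reverse=True)[:2])
print(top_local_tokens(["we", "love", "parties"], [0.2, 0.7, 0.1], k=2))
```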

6. Generate Explanations Without LLM for All Test Examples

To generate explanations for all test examples, which are then used to build the LLM prompts, run:

python get_explanation_for_LLM.py \
  --test_csv "src/prepered_dataframes/test_full_with_ASR.csv" \
  --train_weights "train_attention_weights.pickle" \
  --test_weights "test_attention_weights.pickle" \
  --liwc_path "LIWC2007.txt" \
  --save_pickle "test_explanations.pickle"
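
The --liwc_path flag points to a LIWC 2007 dictionary. This sketch assumes the file follows the usual LIWC .dic layout: a category block between two "%" lines, then word entries, where a trailing "*" marks a prefix wildcard. Under that assumption, per-token dictionary matching can be sketched as follows. The toy dictionary, its category IDs, and the function names are hypothetical. Unusual entries are skipped, and the longest-prefix rule is a simplification that may not match the official LIWC software.

```python
def parse_liwc_dic(text):
    """Parse a LIWC-style dictionary into exact-word and prefix (wildcard) lookups."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    marks = [i for i, ln in enumerate(lines) if ln == "%"]
    if len(marks) < 2:
        raise ValueError("expected a category block delimited by two '%' lines")
    # Category block: "<id> <name>" per line.
    categories = {}
    for ln in lines[marks[0] + 1 : marks[1]]:
        cat_id, name = ln.split()[:2]
        categories[cat_id] = name
    # Word block: "<word> <id> <id> ...", where a trailing '*' marks a prefix.
    exact, prefixes = {}, {}
    for ln in lines[marks[1] + 1 :]:
        word, *cat_ids = ln.split()
        names = [categories[c] for c in cat_ids if c in categories]  # skips unusual IDs
        if word.endswith("*"):
            prefixes[word[:-1].lower()] = names
        else:
            exact[word.lower()] = names
    return exact, prefixes


def liwc_categories(token, exact, prefixes):
    """Return categories for one token: exact match first, else longest matching prefix."""
    token = token.lower()
    if token in exact:
        return exact[token]
    best = max((p for p in prefixes if token.startswith(p)), key=len, default=None)
    return prefixes[best] if best is not None else []


# Toy dictionary (hypothetical IDs), not the real LIWC2007 file.
TOY_DIC = "%\n1\tpronoun\n2\tposemo\n3\tsocial\n%\ni\t1\nhapp*\t2\nfriend*\t2\t3\n"
exact, prefixes = parse_liwc_dic(TOY_DIC)
tokens = ["I", "feel", "happy", "with", "friends"]
features = {tok: liwc_categories(tok, exact, prefixes) for tok in tokens}
print(features)
```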

7. Refine Predictions and Explanations with LLM for All Test Examples

Several Large Language Models (LLMs) were evaluated in four experimental setups: zero-shot, one-shot, few-shot, and explanation-based.

On FIv2, Gemma4-31B outperformed the other LLMs in terms of the performance measures (mAC, CCC); see Figure 2.

Figure 2: Performance measures of LLMs on FIv2. ZS, OS, FS, and EX refer to the zero-shot, one-shot, few-shot, and explanation-based setups; T denotes thinking mode.

On PANDORA, Gemma4-31B and Gemma4-E4B outperformed the other LLMs in terms of the performance measures (mPCC, CCC); see Figure 3.

Figure 3: Performance measures of LLMs on PANDORA. ZS, OS, FS, and EX refer to the zero-shot, one-shot, few-shot, and explanation-based setups; T denotes thinking mode.
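
The measures used in these comparisons are computed per trait and then averaged over the traits. The sketch below makes two assumptions. First, it uses the ChaLearn convention that accuracy is 1 minus the mean absolute error on [0, 1] scores. Second, it uses population moments for CCC. The function names and dummy scores are hypothetical, and src/measures.py may be implemented differently.

```python
import math
from statistics import fmean


def accuracy(y_true, y_pred):
    """Per-trait accuracy as 1 - mean absolute error (assumes scores in [0, 1])."""
    return 1.0 - fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def pcc(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ccc(x, y):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


def mean_over_traits(metric, true_by_trait, pred_by_trait):
    """Average a per-trait metric over all traits (e.g. mAC or mPCC)."""
    return fmean(metric(true_by_trait[t], pred_by_trait[t]) for t in true_by_trait)


# Dummy scores for two traits, for illustration only.
y_true = {"E": [0.2, 0.4, 0.6, 0.8], "A": [0.5, 0.3, 0.7, 0.6]}
y_pred = {"E": [0.3, 0.4, 0.5, 0.7], "A": [0.5, 0.4, 0.6, 0.6]}
print("mAC ", round(mean_over_traits(accuracy, y_true, y_pred), 3))
print("mPCC", round(mean_over_traits(pcc, y_true, y_pred), 3))
print("mCCC", round(mean_over_traits(ccc, y_true, y_pred), 3))
```

Unlike PCC, CCC also penalizes a shift in mean or scale between predictions and labels. This is why a constant offset leaves PCC at 1 but lowers CCC.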

To obtain refined predictions and explanations using an LLM, run:

python refine_with_llm.py \
  --prompt_type explanation \
  --prompt_pickle "test_explanations.pickle" \
  --input_csv "src/prepered_dataframes/test_full_with_ASR.csv" \
  --output_csv out_expl.csv \
  --log_file log_expl.txt \
  --llm_model_id tiiuae/Falcon-H1-7B-Instruct
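
Under Methodology, the explanation-based prompt is described this way: the LLM reassesses the scores using the explanation and returns refined scores plus a roughly 200-word explanation. A prompt of that kind might be assembled as in the sketch below. The template wording, the function name build_prompt, and the dummy inputs are hypothetical, and the prompt actually used by refine_with_llm.py may differ.

```python
TRAIT_NAMES = {
    "O": "Openness",
    "C": "Conscientiousness",
    "E": "Extraversion",
    "A": "Agreeableness",
    "N": "non-Neuroticism",
}


def build_prompt(transcript: str, initial: dict, explanation: str) -> str:
    """Assemble a hypothetical explanation-based prompt for score refinement."""
    score_lines = "\n".join(f"- {TRAIT_NAMES[k]}: {v:.3f}" for k, v in initial.items())
    return (
        "You assess Big Five personality traits from a transcript.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Initial predictions (0-1 scale):\n{score_lines}\n\n"
        f"Linguistic evidence:\n{explanation}\n\n"
        "Reassess every trait using the linguistic evidence. Keep an initial "
        "prediction only if the evidence supports it. Return one refined score "
        "in [0, 1] per trait, then an explanation of about 200 words."
    )


prompt = build_prompt(
    transcript="i love meeting new people and i plan every trip in detail",
    initial={"O": 0.62, "C": 0.55, "E": 0.48, "A": 0.60, "N": 0.53},
    explanation="Extraversion: high attention on 'love' and 'people' (social category).",
)
print(prompt)
```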

8. Generate Explanations With / Without LLM for One Example

For a specific video and trait:

python get_explanation_with_LLM.py \
  --train_weights "train_attention_weights.pickle" \
  --test_weights "test_attention_weights.pickle" \
  --video_name "BSfClgoqf00.001" \
  --test_csv "src/prepered_dataframes/test_full_with_ASR.csv" \
  --run_llm \
  --llm_model_path "tiiuae/Falcon-H1-7B-Instruct" \
  --output_dir "results/BSfClgoqf00.001"

Omit --run_llm to generate the explanation without an LLM.

Methodology

  1. Data Preprocessing
  • Text normalization: lowercasing, contraction expansion, and punctuation removal.
  • Tokenization and embedding extraction via XLM-RoBERTa / JINA / BERT / BGE.
  • Per-token LIWC feature extraction using dictionary matching.
  2. Model Architecture
  • Single Models: BiLSTM + Attention (BiLSTMAtt) / Residual + BiLSTM + Attention (ReBiLSTMAtt) / Mamba + Attention (MambaAtt) / Residual + Mamba + Attention (ReMambaAtt) for each modality.
  • Fusion Model: concatenates the predictions of the single models based on deep and hand-crafted features → single dense layer → sigmoid output.
  3. Interpretability
  • Global attention weights are aggregated across the training set.
  • Local token-level attention weights are normalized and visualized.
  • Explanations are generated from the top positive/negative tokens and categories.
  4. LLM Integration
  • The prompt instructs the LLM to reassess the scores using our explanation-based prompt, disregarding the initial predictions unless the explanation supports them.
  • Output: refined scores + a natural-language explanation (200 words).
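
The text-normalization step listed under Data Preprocessing (lowercasing, contraction expansion, punctuation removal) can be sketched with the standard re module. The contraction table is a deliberately tiny, hypothetical sample, and the exact rules are assumptions. This is not the code in src/text_preprocessing.py.

```python
import re

# Hypothetical, deliberately tiny contraction table; a real pipeline would use a fuller list.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "i'm": "i am",
    "it's": "it is",
    "don't": "do not",
    "i've": "i have",
}


def normalize_text(text: str) -> str:
    """Lowercase, expand contractions, strip punctuation, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")  # map curly apostrophes to straight ones
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("I'm sure it's fine, but I can't go!"))
```

Contractions are expanded before punctuation removal; otherwise the apostrophes would already be gone and "can't" would become "can t".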

Citations

If you use this work, please cite the following paper (currently under review):

@article{ryumina2026expam,
  title   = {ExPAM: Explainable Personality Assessment Method using Heterogeneous Linguistic Features and Off-the-Shelf LLMs},
  author  = {Ryumina, Elena and Ryumin, Dmitry and Markitantov, Maxim and Karpov, Alexey},
  journal = {Big Data and Cognitive Computing},
  year    = {2026},
  note    = {Under review}
}

License

This project is released under the MIT License — see LICENSE for details.
