Fanfan Wang, Xiangqing Shen, Jianfei Yu*, and Rui Xia*
Emotional Support Conversation (ESC) systems aim to alleviate user distress. However, current Chain-of-Thought based ESC methods often employ rigid, text-only reasoning, which limits adaptability in dynamic, multimodal interactions and introduces reasoning noise that degrades support quality. To address this, we introduce "Flexible Thinking" for multimodal ESC, enabling models to adaptively select contextually relevant thinking aspects: Visual Scene, Emotion, Situation, and Response Strategy. We first construct training data by manually curating flexible thinking demonstrations on the MESC dataset and then using a Multimodal Large Language Model to synthesize these processes for the full training set. We then propose FIRES, a framework that integrates Supervised Fine-Tuning (SFT) for initial learning with Reinforcement Learning for refinement. This two-stage approach helps FIRES transcend SFT's generalization limits and, crucially, directly links thinking processes to response quality via tailored rewards, moving beyond imitating potentially imperfect synthetic data. Experiments on the MESC and EMOTyDA datasets demonstrate FIRES's effectiveness and generalizability in fostering higher-quality emotional support responses through adaptive reasoning.
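As a rough illustration of how a tailored reward can link the thinking process to response quality, the sketch below combines a simple format check over the selected thinking aspects with a BERTScore-based response reward. The `</think>` delimiter, helper name `compute_reward`, and 0.5/0.5 weighting are illustrative assumptions, not the exact reward design used in FIRES.

```python
# Illustrative sketch only: the tag format, reward terms, and weights below are
# assumptions for exposition, not the exact rewards used in FIRES.
from bert_score import score as bert_score

THINKING_ASPECTS = ("Visual Scene", "Emotion", "Situation", "Response Strategy")

def compute_reward(generated: str, reference: str) -> float:
    """Combine a thinking-format reward with a response-quality reward."""
    # Assume the model separates its flexible thinking from the reply with a
    # (hypothetical) </think> delimiter.
    thinking, _, response = generated.partition("</think>")
    # Format reward: at least one contextually selected aspect should appear.
    selected = [a for a in THINKING_ASPECTS if a in thinking]
    format_reward = 1.0 if selected else 0.0
    # Quality reward: semantic similarity to the reference response (BERTScore F1).
    _, _, f1 = bert_score([response.strip()], [reference], lang="en")
    return 0.5 * format_reward + 0.5 * f1.item()
```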
Based on the timestamps provided by the MESC dataset, we extract video clips from raw episodes of In Treatment and sample keyframes via FFmpeg: `ffmpeg -i {video_file} -vf select='eq(pict_type\,I)' -vsync vfr -f image2 frame_%d.png`. We concatenate consecutive utterances from the same speaker to consolidate their conversational turns, and designate each therapist turn as the target response of an instance, with the preceding utterances serving as the conversation history.
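For reference, the following is a minimal preprocessing sketch. The helper names (`clip_video`, `extract_keyframes`, `merge_turns`) and the timestamp fields are illustrative assumptions, not the released pipeline; only the keyframe filter mirrors the FFmpeg command above.

```python
# Illustrative preprocessing sketch; function names and timestamp fields
# (start/end in seconds) are assumptions, not the released code.
import subprocess
from pathlib import Path

def clip_video(video_file: str, start: float, end: float, out_clip: str) -> None:
    """Cut the [start, end] segment of an episode into a standalone clip
    (stream-copy for speed; cut points snap to keyframes)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_file, "-ss", str(start), "-to", str(end),
         "-c", "copy", out_clip],
        check=True,
    )

def extract_keyframes(clip: str, out_dir: str) -> None:
    """Dump the clip's I-frames (keyframes) as PNGs, mirroring the command above."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", clip, "-vf", r"select='eq(pict_type\,I)'",
         "-vsync", "vfr", "-f", "image2", f"{out_dir}/frame_%d.png"],
        check=True,
    )

def merge_turns(utterances: list[dict]) -> list[dict]:
    """Concatenate consecutive utterances from the same speaker into one turn."""
    turns: list[dict] = []
    for utt in utterances:
        if turns and turns[-1]["speaker"] == utt["speaker"]:
            turns[-1]["text"] += " " + utt["text"]
        else:
            turns.append({"speaker": utt["speaker"], "text": utt["text"]})
    return turns
```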
The processed data files for training and inference are located in the `data/` folder.
- Backbone: Qwen2.5-VL-7B-Instruct
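For convenience, below is a minimal sketch of loading the backbone with Hugging Face Transformers (requires a recent transformers release with Qwen2.5-VL support); it is illustrative and independent of the training scripts in this repo.

```python
# Minimal, illustrative loading of the Qwen2.5-VL-7B-Instruct backbone.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
```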
If you find this repo useful in your research, please consider citing:
@inproceedings{wang2025flexible,
  title={Flexible Thinking for Multimodal Emotional Support Conversation via Reinforcement Learning},
  author={Wang, Fanfan and Shen, Xiangqing and Yu, Jianfei and Xia, Rui},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2025},
  pages={1341--1356},
  year={2025}
}
Our implementation benefits from ms-swift, LlamaFactory, ESC, bert_score and EmpGPT-3. We appreciate their valuable contributions.