FSENet: Face-Guided Sentiment Boundary Enhancement for Weakly-Supervised Temporal Sentiment Localization

Overview

🎉 FSENet has been accepted to CVPR 2026!

Abstract

Point-level weakly-supervised temporal sentiment localization (P-WTSL) aims to detect sentiment-relevant segments in untrimmed multimodal videos using timestamp sentiment annotations, which greatly reduces the cost of frame-level labeling. To tackle the challenge of imprecise sentiment boundaries in P-WTSL, we propose the Face-guided Sentiment Boundary Enhancement Network (FSENet), a unified framework that leverages fine-grained facial features to guide sentiment localization. Specifically, our approach first introduces the Face-guided Sentiment Discovery (FSD) module, which integrates facial features into multimodal interaction via dual-branch modeling to capture effective sentiment stimuli clues. We then propose the Point-aware Sentiment Semantics Contrast (PSSC) strategy, which discriminates the sentiment semantics of frame-level candidate points near annotation points via contrastive learning, thereby enhancing the model's ability to recognize sentiment boundaries. Finally, we design the Boundary-aware Sentiment Pseudo-label Generation (BSPG) approach to convert sparse point annotations into temporally smooth supervisory pseudo-labels. Extensive experiments and visualizations on the benchmark demonstrate the effectiveness of our framework, which achieves state-of-the-art performance under full supervision, video-level weak supervision, and point-level weak supervision, showcasing the strong generalization ability of FSENet across different annotation settings.

Getting Started

📋 Environment Requirements

Python 3.9+
PyTorch 2.0+

🚀 Installation

# Clone the repository
git clone https://github.com/CeilingHan/FSENet.git
cd FSENet

# Install dependencies
pip install -r core/requirements.txt

📁 Dataset Preparation

The project uses two datasets, CMU-MOSEI and TSL300, both available on Hugging Face:

https://huggingface.co/datasets/Cerilong/Sentiment_Localization

💾 Download Dataset

# Using huggingface-cli
pip install huggingface-hub
huggingface-cli download Cerilong/Sentiment_Localization --local-dir ./dataset

# Or using Python
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="Cerilong/Sentiment_Localization",
    local_dir="./dataset",
    repo_type="dataset"
)

📁 Dataset Structure

dataset/
├── CMU_SOMI/               # CMU_SOMI dataset
│   ├── features/       # Features directory
│   │   ├── train/      # Training set features
│   │   │   ├── rgb/    # RGB features
│   │   │   ├── logmfcc/ # MFCC features
│   │   │   └── img/    # Face features
│   │   └── test/       # Test set features
│   │       ├── rgb/    # RGB features
│   │       ├── logmfcc/ # MFCC features
│   │       └── img/    # Face features
│   ├── gt.json         # Ground truth annotations
│   ├── fps_dict.json   # Video frame rate information
│   ├── split_train.txt # Training set split
│   ├── split_test.txt  # Test set split
│   └── point_gaussian/ # Point annotations
└── VideoSenti/            # VideoSenti dataset
    └── ...             # Similar structure to CMU_SOMI
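
Before training, it can help to confirm that a downloaded dataset folder matches the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the expected sub-paths are copied from the directory listing.

```python
from pathlib import Path

# Names taken from the directory tree above; the helper itself is hypothetical.
SPLITS = ("train", "test")
MODALITIES = ("rgb", "logmfcc", "img")
TOP_LEVEL = (
    "gt.json",
    "fps_dict.json",
    "split_train.txt",
    "split_test.txt",
    "point_gaussian",
)


def missing_entries(dataset_root):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    expected = [root / name for name in TOP_LEVEL]
    expected += [
        root / "features" / split / modality
        for split in SPLITS
        for modality in MODALITIES
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

Calling `missing_entries("./dataset/VideoSenti")` returns an empty list when every expected file and folder is present, and otherwise lists what is missing.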

🔧 Model Training

1. Configuration

Modify the dataset path in core/options.py:

parser.add_argument('--data_path', type=str, default='./dataset/VideoSenti')
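
If core/options.py builds a standard argparse parser, as the line above suggests, the path can likely also be passed on the command line instead of editing the file. The sketch below illustrates that behaviour with a stand-in parser; `build_parser` is a hypothetical name, and only the `--data_path` argument and its default come from this README.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the parser defined in core/options.py.
    # Only --data_path and its default value are taken from the README.
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_path', type=str, default='./dataset/VideoSenti')
    return parser


# No flag given: the default declared in the parser is used.
print(build_parser().parse_args([]).data_path)

# Flag given: the command-line value overrides the default.
print(build_parser().parse_args(['--data_path', './dataset/CMU_SOMI']).data_path)
```

Passing the flag keeps the checked-in default untouched, which is convenient when switching between the two datasets.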

2. Training Script

Use run_train.sh to start training:

cd core
bash run_train.sh

Or run directly:

cd core
python main.py

📊 Evaluation

1. Evaluation Script

Use run_eval.sh to evaluate the model:

cd core
bash run_eval.sh

Or run directly:

cd core
python main_eval.py --data_path ./dataset/VideoSenti --model_file ./models/train/model_seed_123.pkl

📝 Cite Us

If you find this work useful, please cite our paper:

@article{FSENet2026,
  title={Face-Guided Sentiment Boundary Enhancement for Weakly-Supervised Temporal Sentiment Localization},
  author={Cailing Han and Zhangbin Li and Jinxing Zhou and Wei Qian and Jingjing Hu and Yanghao Zhou and Zhangling Duan and Dan Guo},
  journal={arXiv preprint arXiv:2511.13719},
  year={2026}
}

🙏 Acknowledgements

Our code builds on the following repositories and resources:

TSL300 Dataset Contributors: For creating and sharing the TSL300 dataset for sentiment localization research.
SF-Net Contributors: For their pioneering work on single-frame supervision for temporal localization.
Learning Action Completeness from Points Contributors: For their insights on learning completeness from point annotations.
HR-Pro Contributors: For their work on hierarchical reliability propagation for point-supervised localization.
Hugging Face: For providing the platform that hosts our datasets and makes them easily accessible to the research community.
