PRISM-AILAB/MCHPM


MCHPM

Official implementation of:

Lim, H., Park, S., Li, Q., Li, X., & Kim, J. (2026). What makes a review helpful? A multimodal prediction model in e-commerce. Electronic Commerce Research and Applications, 76, 101586. https://doi.org/10.1016/j.elerap.2026.101586

Overview

This repository is the official implementation of MCHPM (Multimodal Cue-based Helpfulness Prediction Model), published in Electronic Commerce Research and Applications (2026).

Most multimodal review helpfulness prediction (MRHP) models rely on deep semantic representations of text and images and overlook surface-level cues such as readability, sentiment intensity, and image quality. MCHPM addresses this gap by drawing on the Elaboration Likelihood Model (ELM) from consumer psychology, which describes how readers process information through two parallel routes — a central route based on careful cognitive engagement, and a peripheral route based on superficial heuristics.

For each modality (text and image), MCHPM extracts both central cues (deep semantic representations from BERT and VGG-16) and peripheral cues (surface-level features like readability and image clarity). Within each modality, central and peripheral cues are integrated through co-attention; the resulting text and image representations are then fused via a Gated Multimodal Unit (GMU) that adaptively weights the two modalities.

MCHPM is trained as a regressor: its target is a continuous review-helpfulness score defined as log(1 + helpful_vote). Quantitative comparisons against unimodal and multimodal baselines on large-scale Amazon datasets are reported in Experimental Results.
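As a small illustration of the target definition, the sketch below maps raw vote counts to labels with Python's `math.log1p`. The function name `helpfulness_label` is hypothetical and is not taken from the repository.

```python
import math


def helpfulness_label(helpful_vote: int) -> float:
    """Return log(1 + helpful_vote), the regression target described above."""
    if helpful_vote < 0:
        raise ValueError("helpful_vote must be non-negative")
    return math.log1p(helpful_vote)


# The log transform compresses heavy-tailed vote counts into a narrow range.
for votes in (1, 9, 99):
    print(votes, round(helpfulness_label(votes), 3))
```

With this transform, 1 vote maps to about 0.693 and 99 votes to about 4.605, so a handful of highly voted reviews does not dominate the loss.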

Repository Structure

├── data/
│   ├── raw/                        # Source datasets — place {fname}.jsonl.gz here
│   ├── processed/                  # Pipeline parquet caches (labeled / cued)
│   └── review_images/              # Downloaded review images, grouped by dataset name
│
├── model/
│   ├── mchpm.py                    # MCHPM architecture, trainer, tester
│   ├── mchpm_architecture.png      # Architecture diagram
│   └── save/                       # Best checkpoint per dataset (best.pth)
│
├── src/
│   ├── config.yaml                 # Single source of truth for all hyperparameters
│   ├── data_processing.py          # DataProcessor pipeline + DataLoader factory
│   ├── text_cue_extractor.py       # BERT central + peripheral text cues
│   ├── image_cue_extractor.py      # VGG-16 central + peripheral image cues
│   ├── review_image_downloader.py  # Parallel review image downloader (cache-aware)
│   ├── text_processing.py          # Review-text cleaning and row filters
│   ├── path.py                     # Project path constants (auto-creates runtime folders)
│   └── utils.py                    # Generic helpers — I/O, metrics, seeding
│
├── main.py                         # Entry point: data preparation → train → test
├── requirements.txt
└── README.md

Model Description

MCHPM consists of three sequential modules. Cue extraction runs in src/text_cue_extractor.py and src/image_cue_extractor.py; the integration and fusion network is in model/mchpm.py. The full architecture is illustrated below.

[Figure: MCHPM architecture diagram (model/mchpm_architecture.png)]

1. Multi-Cue Extraction Module

Extracts central and peripheral cues from review text and images in parallel.

Central cues (deep semantic representations):

  • Text: BERT [CLS] embedding
  • Image: VGG-16 fc2 activation

Peripheral cues (surface-level features):

  • Text — polarity, subjectivity, readability, extremity
  • Image — brightness, contrast, saturation, edge intensity
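To make the peripheral image cues concrete, here is a minimal NumPy sketch of one plausible way to compute the four listed quantities. The exact formulas in src/image_cue_extractor.py are not described here, so every definition below (luma-weighted brightness, standard-deviation contrast, HSV-style saturation, gradient-magnitude edge intensity) and the function name `image_peripheral_cues` are assumptions for illustration only.

```python
import numpy as np


def image_peripheral_cues(rgb: np.ndarray) -> dict[str, float]:
    """Compute four surface-level cues from an H x W x 3 uint8 RGB array."""
    img = rgb.astype(np.float64) / 255.0
    # Luma-weighted grayscale (ITU-R BT.601 weights), shape H x W.
    gray = img @ np.array([0.299, 0.587, 0.114])
    # HSV-style saturation per pixel: (max - min) / max, defined as 0 for black.
    cmax = img.max(axis=2)
    cmin = img.min(axis=2)
    sat = np.divide(cmax - cmin, cmax, out=np.zeros_like(cmax), where=cmax > 0)
    # Finite-difference gradients as a simple stand-in for an edge detector.
    gy, gx = np.gradient(gray)
    return {
        "brightness": float(gray.mean()),
        "contrast": float(gray.std()),
        "saturation": float(sat.mean()),
        "edge_intensity": float(np.hypot(gx, gy).mean()),
    }
```

A uniformly gray image yields zero contrast, saturation, and edge intensity under these definitions, which is a quick sanity check when swapping in other formulas.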

2. Cue-Integration Module

Within each modality, central and peripheral representations attend to each other through co-attention (CoAttentionBlock): central queries peripheral, peripheral queries central, and the two attended outputs are combined via element-wise multiplication. The same pattern is applied independently to the text and image sides, yielding modality-specific integrated vectors.
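The two-way attention pattern can be sketched in NumPy as follows. This is not the repository's CoAttentionBlock: it assumes each cue type is already represented as a small set of d-dimensional vectors, uses plain scaled dot-product attention without learned projections, and mean-pools each attended output before the element-wise product. All function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: each query row attends over context rows."""
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d))  # (n_query, n_context)
    return weights @ context                           # (n_query, d)


def co_attention(central: np.ndarray, peripheral: np.ndarray) -> np.ndarray:
    """Attend in both directions, pool, and combine into one d-dim vector."""
    c2p = cross_attend(central, peripheral).mean(axis=0)  # central queries peripheral
    p2c = cross_attend(peripheral, central).mean(axis=0)  # peripheral queries central
    return c2p * p2c                                      # element-wise multiplication


rng = np.random.default_rng(0)
central = rng.normal(size=(3, 8))     # 3 hypothetical central vectors, d = 8
peripheral = rng.normal(size=(4, 8))  # 4 hypothetical peripheral vectors, d = 8
print(co_attention(central, peripheral).shape)  # (8,)
```

The multiplicative combination means a dimension contributes strongly only when both attention directions agree on it; a trained block would add learned query, key, and value projections on top of this skeleton.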

3. Multimodal Fusion Module

The integrated text and image vectors are passed through tanh projections, then fused by a Gated Multimodal Unit (MCHPM.gate_layer). A sigmoid gate, computed from the concatenated representations, adaptively weights the contribution of each modality. The fused vector is forwarded to an MLP regressor (MCHPM.regressor) that outputs the predicted helpfulness score.
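The gating step can be illustrated with a small NumPy sketch that follows the standard Gated Multimodal Unit formulation: h = z * h_text + (1 - z) * h_image, with z a sigmoid gate. Whether MCHPM.gate_layer computes the gate from the raw inputs or from the tanh projections is not stated above; this sketch assumes the raw concatenated inputs, and the weight names and dimensions are made up for the example.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gmu_fuse(x_text, x_image, W_t, W_i, W_z):
    """Fuse two modality vectors with a per-dimension sigmoid gate."""
    h_t = np.tanh(W_t @ x_text)    # tanh projection of the text vector
    h_i = np.tanh(W_i @ x_image)   # tanh projection of the image vector
    # Gate from the concatenated inputs (assumption, see lead-in).
    z = sigmoid(W_z @ np.concatenate([x_text, x_image]))
    return z * h_t + (1.0 - z) * h_i  # convex combination per dimension


rng = np.random.default_rng(0)
d_in, d_out = 6, 4  # hypothetical sizes
x_text, x_image = rng.normal(size=d_in), rng.normal(size=d_in)
W_t = rng.normal(size=(d_out, d_in))
W_i = rng.normal(size=(d_out, d_in))
W_z = rng.normal(size=(d_out, 2 * d_in))
fused = gmu_fuse(x_text, x_image, W_t, W_i, W_z)
print(fused.shape)  # (4,)
```

Because the gate lies in (0, 1), each fused dimension stays between the corresponding text and image projections; a gate near 1 leans on text and a gate near 0 leans on the image.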

How to Run

Configuration

All hyperparameters live in src/config.yaml — it is the single source of truth. Defaults reproduce the paper experiments.

A CUDA-capable GPU is recommended; main.py falls back to CPU with a warning if CUDA is unavailable. See requirements.txt for the GPU wheel and CPU-only setup.

End-to-end run:

conda create -n mchpm python=3.11
conda activate mchpm
pip install -r requirements.txt
python main.py

Data Preparation

Place the dataset as data/raw/{fname}.jsonl.gz where {fname} matches data.fname in config.yaml. The file is read as gzipped JSON-lines (one review object per line) — each line must carry the columns below, or the run aborts at load with a KeyError.

| Column | Role |
| --- | --- |
| user_id | Reviewer id (non-null; also disambiguates downloaded image filenames). |
| parent_asin | Product id (non-null). |
| timestamp | Epoch-millisecond review time → review_date. |
| text | Review body → raw_review; cleaned and fed to BERT. |
| images | List of review-image URLs → review_images; rows with no image are dropped. |
| helpful_vote | Helpful-vote count; the label is log(1 + helpful_vote), and zero / missing-vote rows are dropped. |
| verified_purchase | Boolean flag; only verified-purchase reviews are kept. |
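Since a missing column aborts the run at load time, it can be worth checking a raw file before starting the pipeline. The sketch below uses only the standard library and the column names listed above; the helper `missing_columns` and its `max_lines` parameter are hypothetical and not part of the repository.

```python
import gzip
import json
from pathlib import Path

REQUIRED_COLUMNS = {
    "user_id", "parent_asin", "timestamp", "text",
    "images", "helpful_vote", "verified_purchase",
}


def missing_columns(path: Path, max_lines: int = 1000) -> dict[int, set[str]]:
    """Map 1-based line numbers to the required columns that line lacks.

    Only the first `max_lines` lines are inspected to keep the check fast.
    """
    problems: dict[int, set[str]] = {}
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if lineno > max_lines:
                break
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = REQUIRED_COLUMNS.difference(record)
            if missing:
                problems[lineno] = missing
    return problems
```

An empty result means every inspected line carries all seven columns; for example, `missing_columns(Path("data/raw") / "my_dataset.jsonl.gz")`, where `my_dataset` stands in for the configured data.fname.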

Any other columns are ignored. The pipeline writes two cache layers under data/processed/:

  • {fname}_labeled.parquet — written after the row filters, text cleaning, and label construction.
    • Columns: user_id, parent_asin, timestamp, review_date, raw_review, clean_review, review_images, helpful_vote, label (the regression target log(1 + helpful_vote)).
  • {fname}_cued.parquet — adds the downloaded image paths and the extracted cues.
    • Columns: the labeled columns + review_image_paths, review_text_central / review_text_peripheral (BERT semantic + readability/sentiment text cues), review_image_central / review_image_peripheral (VGG-16 semantic + clarity image cues).

To reuse externally-extracted BERT/VGG features, save the data as {fname}_labeled.parquet with review_text_central and/or review_image_central columns pre-populated. The pipeline will skip BERT/VGG and only compute peripheral cues.

Re-runs and caching

On every python main.py, the pipeline resumes from the most-complete cache on disk, checking newest-first (cued → labeled → image folder) and falling through to the next-earliest stage. The train/test split is rebuilt fresh in memory each run, so changes to test_size, seed, or val_ratio take effect on the next run. To re-trigger an upstream stage, delete its parquet (or the data/review_images/{fname}/ folder for image re-downloads).
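The newest-first lookup can be pictured with a short `pathlib` sketch. It mirrors the order described above (cued, then labeled, then the image folder) but is only an illustration: the function name `resume_stage`, the returned stage labels, and the "non-empty folder" test for images are assumptions rather than the repository's actual logic.

```python
from pathlib import Path


def resume_stage(processed_dir: Path, image_dir: Path, fname: str) -> str:
    """Return the most-complete stage found on disk, checked newest-first."""
    if (processed_dir / f"{fname}_cued.parquet").exists():
        return "cued"      # cues already extracted: go straight to training
    if (processed_dir / f"{fname}_labeled.parquet").exists():
        return "labeled"   # filters and labels done: download images, extract cues
    dataset_images = image_dir / fname
    if dataset_images.is_dir() and any(dataset_images.iterdir()):
        return "images"    # some images cached: rebuild parquet caches, reuse downloads
    return "raw"           # nothing cached: start from data/raw/{fname}.jsonl.gz
```

Deleting `{fname}_cued.parquet` in this scheme makes the next call return "labeled", which matches the advice above to delete a stage's parquet in order to re-trigger it.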

Experimental Results

MCHPM was evaluated on two large-scale Amazon review datasets: Cell Phones & Accessories and Electronics. The results demonstrate that MCHPM consistently outperforms strong unimodal and multimodal baselines across all evaluation metrics, achieving average improvements of 3.864% in MAE, 4.061% in MSE, 2.172% in RMSE, and 6.349% in MAPE compared with the strongest benchmark model.

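| Model | Cell Phones & Accessories MAE | Cell Phones & Accessories MSE | Cell Phones & Accessories RMSE | Cell Phones & Accessories MAPE | Electronics MAE | Electronics MSE | Electronics RMSE | Electronics MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LSTM | 0.647 | 0.821 | 0.849 | 56.702 | 0.711 | 0.896 | 0.946 | 57.678 |
| TNN | 0.643 | 0.714 | 0.845 | 56.650 | 0.722 | 0.904 | 0.851 | 59.556 |
| DMAF | 0.625 | 0.691 | 0.836 | 53.139 | 0.697 | 0.880 | 0.939 | 55.198 |
| CS-IMD | 0.615 | 0.681 | 0.825 | 52.392 | 0.687 | 0.831 | 0.912 | 56.032 |
| MFRHP | 0.625 | 0.695 | 0.837 | 53.116 | 0.695 | 0.840 | 0.916 | 57.488 |
| MCHPM (Proposed) | 0.607 | 0.679 | 0.824 | 50.706 | 0.674 | 0.825 | 0.908 | 53.712 |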
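For readers reproducing the table, the four metrics can be computed with the textbook definitions below. This is a standard-library sketch, not the code in src/utils.py; it assumes MAPE is reported in percent, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Return MAE, MSE, RMSE, and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the true value, so every target must be non-zero.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


print(regression_metrics([1.0, 2.0], [1.5, 1.0]))
```

Because zero-vote rows are dropped during data preparation, every label is at least log(2), so the MAPE denominator never vanishes on this task.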

Citation

If you use this repository in your research, please cite:

@article{LIM2026101586,
  title = {What makes a review helpful? A multimodal prediction model in e-commerce},
  author = {Heena Lim and Seonu Park and Qinglong Li and Xinzhe Li and Jaekyeong Kim},
  journal = {Electronic Commerce Research and Applications},
  volume = {76},
  pages = {101586},
  year = {2026},
  doi = {10.1016/j.elerap.2026.101586}  
}

Contact

For research inquiries or collaborations, please contact:

Seonu Park
Ph.D. Student, Department of Big Data Analytics
Kyung Hee University
Email: sunu0087@khu.ac.kr

Qinglong Li
Assistant Professor, Division of Computer Engineering
Hansung University
Email: leecy@hansung.ac.kr

Last updated: June 2026
