Official implementation of:
Lim, H., Park, S., Li, Q., Li, X., & Kim, J. (2026). What makes a review helpful? A multimodal prediction model in e-commerce. Electronic Commerce Research and Applications, 76, 101586.
This repository is the official implementation of MCHPM (Multimodal Cue-based Helpfulness Prediction Model), published in Electronic Commerce Research and Applications (2026).
Most multimodal review helpfulness prediction (MRHP) models rely on deep semantic representations of text and images and overlook surface-level cues such as readability, sentiment intensity, and image quality. MCHPM addresses this gap by drawing on the Elaboration Likelihood Model (ELM) from consumer psychology, which describes how readers process information through two parallel routes — a central route based on careful cognitive engagement, and a peripheral route based on superficial heuristics.
For each modality (text and image), MCHPM extracts both central cues (deep semantic representations from BERT and VGG-16) and peripheral cues (surface-level features like readability and image clarity). Within each modality, central and peripheral cues are integrated through co-attention; the resulting text and image representations are then fused via a Gated Multimodal Unit (GMU) that adaptively weights the two modalities.
The model predicts a continuous review-helpfulness score, defined as log(1 + helpful_vote), as a regression target. Quantitative comparisons against unimodal and multimodal baselines on large-scale Amazon datasets are reported in Experimental Results.
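As a small illustration of the regression target, the sketch below maps a vote count to the label and back. It assumes the natural logarithm, which the description above does not state explicitly, and `helpfulness_label` and `votes_from_label` are hypothetical names rather than functions from this repository.

```python
import math


def helpfulness_label(helpful_vote: int) -> float:
    """Regression target log(1 + helpful_vote); the natural log is an assumption."""
    return math.log1p(helpful_vote)


def votes_from_label(label: float) -> float:
    """Invert the transform to read a prediction as an approximate vote count."""
    return math.expm1(label)


# Show how the transform compresses large vote counts.
for votes in (1, 9, 99):
    print(votes, round(helpfulness_label(votes), 3))
```

Because the target is log-scaled, errors such as MAE and RMSE are measured in log units, not in raw votes.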
├── data/
│ ├── raw/ # Source datasets — place {fname}.jsonl.gz here
│ ├── processed/ # Pipeline parquet caches (labeled / cued)
│ └── review_images/ # Downloaded review images, grouped by dataset name
│
├── model/
│ ├── mchpm.py # MCHPM architecture, trainer, tester
│ ├── mchpm_architecture.png # Architecture diagram
│ └── save/ # Best checkpoint per dataset (best.pth)
│
├── src/
│ ├── config.yaml # Single source of truth for all hyperparameters
│ ├── data_processing.py # DataProcessor pipeline + DataLoader factory
│ ├── text_cue_extractor.py # BERT central + peripheral text cues
│ ├── image_cue_extractor.py # VGG-16 central + peripheral image cues
│ ├── review_image_downloader.py # Parallel review image downloader (cache-aware)
│ ├── text_processing.py # Review-text cleaning and row filters
│ ├── path.py # Project path constants (auto-creates runtime folders)
│ └── utils.py # Generic helpers — I/O, metrics, seeding
│
├── main.py # Entry point: data preparation → train → test
├── requirements.txt
└── README.md

MCHPM consists of three sequential modules. Cue extraction runs in src/text_cue_extractor.py and src/image_cue_extractor.py; the integration and fusion network is in model/mchpm.py. The full architecture is shown in model/mchpm_architecture.png.
Extracts central and peripheral cues from review text and images in parallel.
Central cues (deep semantic representations):
- Text: BERT [CLS] embedding
- Image: VGG-16 fc2 activation
Peripheral cues (surface-level features):
- Text — polarity, subjectivity, readability, extremity
- Image — brightness, contrast, saturation, edge intensity
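To make the image-side peripheral cues concrete, here is a minimal standard-library sketch that computes one plausible version of each cue on a tiny in-memory RGB grid. The formulas (Rec. 601 luma for brightness, its standard deviation for contrast, mean HSV saturation, mean forward-difference gradient magnitude for edge intensity) are assumptions chosen for illustration; src/image_cue_extractor.py may define the cues differently, and `image_peripheral_cues` is a hypothetical name.

```python
import colorsys
import math


def image_peripheral_cues(pixels):
    """pixels: list of rows, each a list of (r, g, b) tuples in 0..255."""
    # Grayscale via Rec. 601 luma weights (an assumed definition).
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in pixels]
    flat = [v for row in gray for v in row]

    brightness = sum(flat) / len(flat)
    contrast = math.sqrt(sum((v - brightness) ** 2 for v in flat) / len(flat))

    sats = [
        colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]
        for row in pixels
        for r, g, b in row
    ]
    saturation = sum(sats) / len(sats)

    # Edge intensity: mean magnitude of forward-difference gradients.
    mags = []
    for y in range(len(gray) - 1):
        for x in range(len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            mags.append(math.hypot(gx, gy))
    edge_intensity = sum(mags) / len(mags) if mags else 0.0

    return {
        "brightness": brightness,
        "contrast": contrast,
        "saturation": saturation,
        "edge_intensity": edge_intensity,
    }


# A flat gray patch and a patch with one vertical black-to-white edge.
flat_gray = [[(128, 128, 128)] * 2 for _ in range(2)]
vertical_edge = [[(0, 0, 0), (255, 255, 255)] for _ in range(2)]
print(image_peripheral_cues(flat_gray))
print(image_peripheral_cues(vertical_edge))
```

The flat patch yields near-zero contrast, saturation, and edge intensity, while the edge patch scores high on contrast and edge intensity.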
Within each modality, central and peripheral representations attend to each other through co-attention (CoAttentionBlock): central queries peripheral, peripheral queries central, and the two attended outputs are combined via element-wise multiplication. The same pattern is applied independently to the text and image sides, yielding modality-specific integrated vectors.
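The co-attention pattern described above can be sketched in plain Python. This is an illustrative simplification, not the internals of CoAttentionBlock: it assumes each cue is available as a short sequence of equal-width vectors, uses unparameterised scaled dot-product attention (no learned projections), and mean-pools each attended sequence before the element-wise product. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query row attends over keys_values."""
    d = len(keys_values[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)]
        )
    return attended


def mean_pool(rows):
    """Average a sequence of vectors into one vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def co_attend(central, peripheral):
    """Central queries peripheral, peripheral queries central, then multiply."""
    central_to_peripheral = mean_pool(cross_attend(central, peripheral))
    peripheral_to_central = mean_pool(cross_attend(peripheral, central))
    return [a * b for a, b in zip(central_to_peripheral, peripheral_to_central)]


central = [[1.0, 0.0], [0.0, 1.0]]
peripheral = [[1.0, 1.0]]
integrated = co_attend(central, peripheral)
print(integrated)
```

The multiplicative combination means a dimension stays large only when both attention directions agree on it, which is one reason to prefer it over a plain sum.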
The integrated text and image vectors are passed through tanh projections, then fused by a Gated Multimodal Unit (MCHPM.gate_layer). A sigmoid gate, computed from the concatenated representations, adaptively weights the contribution of each modality. The fused vector is forwarded to an MLP regressor (MCHPM.regressor) that outputs the predicted helpfulness score.
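The gating step can be illustrated with a minimal, dependency-free sketch of a Gated Multimodal Unit. It follows the common GMU formulation, in which the gate reads the concatenated inputs; whether MCHPM.gate_layer reads the inputs or the tanh projections is not stated here, so treat that detail, the omitted bias terms, and the name `gmu_fuse` as assumptions.

```python
import math
import random


def matvec(weights, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def gmu_fuse(x_text, x_image, w_text, w_image, w_gate):
    """Return (fused, gate) with fused = z * h_text + (1 - z) * h_image."""
    h_text = [math.tanh(v) for v in matvec(w_text, x_text)]
    h_image = [math.tanh(v) for v in matvec(w_image, x_image)]
    # Sigmoid gate computed from the concatenation of both modalities.
    gate = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w_gate, x_text + x_image)]
    fused = [z * ht + (1.0 - z) * hv for z, ht, hv in zip(gate, h_text, h_image)]
    return fused, gate


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
x_text = [0.5, -1.0, 0.25]   # toy integrated text vector
x_image = [1.5, 0.1]         # toy integrated image vector
hidden = 2
fused, gate = gmu_fuse(
    x_text,
    x_image,
    rand_matrix(hidden, len(x_text)),
    rand_matrix(hidden, len(x_image)),
    rand_matrix(hidden, len(x_text) + len(x_image)),
)
print(fused, gate)
```

A gate value near 1 lets the text representation dominate that dimension, and a value near 0 favours the image representation.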
All hyperparameters live in src/config.yaml — it is the single source of truth. Defaults reproduce the paper experiments.
A CUDA-capable GPU is recommended; main.py falls back to CPU with a warning if CUDA is unavailable. See requirements.txt for the GPU wheel and CPU-only setup.
End-to-end run:
conda create -n mchpm python=3.11
conda activate mchpm
pip install -r requirements.txt
python main.py

Place the dataset as data/raw/{fname}.jsonl.gz, where {fname} matches data.fname in src/config.yaml. The file is read as gzipped JSON Lines (one review object per line). Each line must carry the columns below, or the run aborts at load with a KeyError.
| Column | Role |
|---|---|
| user_id | Reviewer id (non-null; also disambiguates downloaded image filenames). |
| parent_asin | Product id (non-null). |
| timestamp | Epoch-millisecond review time → review_date. |
| text | Review body → raw_review; cleaned and fed to BERT. |
| images | List of review-image URLs → review_images; rows with no image are dropped. |
| helpful_vote | Helpful-vote count; the label is log(1 + helpful_vote), and zero / missing-vote rows are dropped. |
| verified_purchase | Boolean flag; only verified-purchase reviews are kept. |
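Before launching a full run, it can help to check a raw file against the column table above. The following is a hedged sketch using only the standard library: `missing_columns`, `passes_row_filters`, and `scan_reviews` are hypothetical helpers, the filter logic is a reading of the table rather than the code in src/data_processing.py, and the sample rows are made up.

```python
import gzip
import json
import os
import tempfile

REQUIRED_COLUMNS = (
    "user_id", "parent_asin", "timestamp", "text",
    "images", "helpful_vote", "verified_purchase",
)


def missing_columns(record):
    """List required columns absent from one review object."""
    return [c for c in REQUIRED_COLUMNS if c not in record]


def passes_row_filters(record):
    """Approximate the row filters implied by the column table."""
    return (
        record["user_id"] is not None
        and record["parent_asin"] is not None
        and bool(record["verified_purchase"])
        and bool(record["images"])
        and (record["helpful_vote"] or 0) > 0
    )


def scan_reviews(path, limit=1000):
    """Return (rows_read, rows_kept) for the first `limit` lines."""
    total = kept = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no > limit:
                break
            record = json.loads(line)
            missing = missing_columns(record)
            if missing:
                raise KeyError(f"line {line_no}: missing columns {missing}")
            total += 1
            kept += int(passes_row_filters(record))
    return total, kept


# Made-up sample: the second row has zero helpful votes and is filtered out.
rows = [
    {"user_id": "u1", "parent_asin": "B0001", "timestamp": 1700000000000,
     "text": "Great case.", "images": ["https://example.com/a.jpg"],
     "helpful_vote": 3, "verified_purchase": True},
    {"user_id": "u2", "parent_asin": "B0001", "timestamp": 1700000001000,
     "text": "Fine.", "images": ["https://example.com/b.jpg"],
     "helpful_vote": 0, "verified_purchase": True},
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.jsonl.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    total, kept = scan_reviews(sample_path)
print(total, kept)
```

Running the scan on the first few thousand lines gives a quick estimate of how many rows survive filtering without touching the pipeline caches.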
Any other columns are ignored. The pipeline writes two cache layers under data/processed/:
- {fname}_labeled.parquet: written after the row filters, text cleaning, and label construction.
  - Columns: user_id, parent_asin, timestamp, review_date, raw_review, clean_review, review_images, helpful_vote, label (the regression target log(1 + helpful_vote)).
- {fname}_cued.parquet: adds the downloaded image paths and the extracted cues.
  - Columns: the labeled columns + review_image_paths, review_text_central / review_text_peripheral (BERT semantic + readability/sentiment text cues), review_image_central / review_image_peripheral (VGG-16 semantic + clarity image cues).
To reuse externally extracted BERT/VGG features, save the data as {fname}_labeled.parquet with the review_text_central and/or review_image_central columns pre-populated. The pipeline then skips BERT/VGG extraction for the pre-populated columns and computes only the peripheral cues.
On every python main.py run, the pipeline resumes from the most complete cache on disk, checking newest-first (cued → labeled → image folder) and falling through to the next-earlier stage when a cache is missing. The train/test split is rebuilt fresh in memory each run, so changes to test_size, seed, or val_ratio take effect on the next run. To re-trigger an upstream stage, delete its parquet (or the data/review_images/{fname}/ folder for image re-downloads).
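The newest-first cache lookup can be pictured with a short sketch. It mirrors the order given above (cued, then labeled, then image folder), but the stage names, the `resume_stage` function, and the rule that an empty image folder does not count are assumptions for illustration, not the logic in src/data_processing.py.

```python
import tempfile
from pathlib import Path


def resume_stage(fname, processed_dir, images_dir):
    """Return the most complete stage found on disk, checked newest-first."""
    processed_dir = Path(processed_dir)
    images_dir = Path(images_dir)
    if (processed_dir / f"{fname}_cued.parquet").exists():
        return "cued"
    if (processed_dir / f"{fname}_labeled.parquet").exists():
        return "labeled"
    image_folder = images_dir / fname
    if image_folder.is_dir() and any(image_folder.iterdir()):
        return "images"
    return "raw"  # nothing cached: start from data/raw/{fname}.jsonl.gz


# Build up caches step by step and record which stage would be resumed.
with tempfile.TemporaryDirectory() as tmp:
    processed = Path(tmp) / "processed"
    images = Path(tmp) / "review_images"
    processed.mkdir()
    images.mkdir()

    observed = [resume_stage("demo", processed, images)]
    (images / "demo").mkdir()
    (images / "demo" / "0.jpg").touch()
    observed.append(resume_stage("demo", processed, images))
    (processed / "demo_labeled.parquet").touch()
    observed.append(resume_stage("demo", processed, images))
    (processed / "demo_cued.parquet").touch()
    observed.append(resume_stage("demo", processed, images))
print(observed)
```

Deleting a parquet in this picture simply makes the lookup fall through to the next check, which matches the re-trigger advice above.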
MCHPM was evaluated on two large-scale Amazon review datasets: Cell Phones & Accessories and Electronics. The results demonstrate that MCHPM consistently outperforms strong unimodal and multimodal baselines across all evaluation metrics, achieving average improvements of 3.864% in MAE, 4.061% in MSE, 2.172% in RMSE, and 6.349% in MAPE compared with the strongest benchmark model.
Columns prefixed CPA report Cell Phones & Accessories; columns prefixed ELEC report Electronics.

| Model | CPA MAE | CPA MSE | CPA RMSE | CPA MAPE | ELEC MAE | ELEC MSE | ELEC RMSE | ELEC MAPE |
|---|---|---|---|---|---|---|---|---|
| LSTM | 0.647 | 0.821 | 0.849 | 56.702 | 0.711 | 0.896 | 0.946 | 57.678 |
| TNN | 0.643 | 0.714 | 0.845 | 56.650 | 0.722 | 0.904 | 0.851 | 59.556 |
| DMAF | 0.625 | 0.691 | 0.836 | 53.139 | 0.697 | 0.880 | 0.939 | 55.198 |
| CS-IMD | 0.615 | 0.681 | 0.825 | 52.392 | 0.687 | 0.831 | 0.912 | 56.032 |
| MFRHP | 0.625 | 0.695 | 0.837 | 53.116 | 0.695 | 0.840 | 0.916 | 57.488 |
| MCHPM (Proposed) | 0.607 | 0.679 | 0.824 | 50.706 | 0.674 | 0.825 | 0.908 | 53.712 |
If you use this repository in your research, please cite:
@article{LIM2026101586,
title = {What makes a review helpful? A multimodal prediction model in e-commerce},
author = {Heena Lim and Seonu Park and Qinglong Li and Xinzhe Li and Jaekyeong Kim},
journal = {Electronic Commerce Research and Applications},
volume = {76},
pages = {101586},
year = {2026},
doi = {10.1016/j.elerap.2026.101586}
}

For research inquiries or collaborations, please contact:
Seonu Park
Ph.D. Student, Department of Big Data Analytics
Kyung Hee University
Email: sunu0087@khu.ac.kr
Qinglong Li
Assistant Professor, Division of Computer Engineering
Hansung University
Email: leecy@hansung.ac.kr
Last updated: June 2026
