ReSID

This repository provides a PyTorch reference implementation of the main models and training procedures described in our paper:

Yu Liang*, Zhongjin Zhang*, Yuxuan Zhu, Kerui Zhang, Zhiluohan Guo, Wenhang Zhou, Zonqi Yang, Kangle Wu, Yabo Ni, Anxiang Zeng, Cong Fu, Jianxin Wang, and Jiazhi Xia. Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs.

Paper & Resources

Hugging Face Papers: https://huggingface.co/papers/2602.02338
arXiv: https://arxiv.org/abs/2602.02338
Dataset (Hugging Face): https://huggingface.co/datasets/PIIR/ReSID-dataset

Overview

We propose ReSID, a recommendation-native, principled SID framework that rethinks representation learning and quantization from the perspective of information preservation and sequential predictability, without relying on LLMs. ReSID consists of two components: (i) Field-Aware Masked Auto-Encoding (FAMAE), which learns predictive-sufficient item representations from structured features, and (ii) Globally Aligned Orthogonal Quantization (GAOQ), which produces compact and predictable SID sequences by jointly reducing semantic ambiguity and prefix-conditional uncertainty.
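To make the phrase "prefix-conditional uncertainty" more concrete, the sketch below computes an empirical conditional entropy of the code at one SID level given the codes before it. This is only one plausible reading of the term and is not taken from the paper or from this repository; the function name `prefix_conditional_entropy` and the toy SIDs are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def prefix_conditional_entropy(sids, level):
    """Empirical H(c_level | c_0 .. c_{level-1}) in bits over a list of SID tuples.

    Illustrative only: not the objective optimized by GAOQ.
    """
    # Group the code observed at `level` by the prefix that precedes it.
    groups = defaultdict(Counter)
    for sid in sids:
        groups[sid[:level]][sid[level]] += 1

    total = len(sids)
    entropy = 0.0
    for counter in groups.values():
        n = sum(counter.values())
        for count in counter.values():
            p = count / n
            # Weight each prefix's entropy by how often the prefix occurs.
            entropy -= (n / total) * p * math.log2(p)
    return entropy


# Hypothetical two-level SIDs for four items.
toy_sids = [(0, 0), (0, 1), (1, 0), (1, 0)]
first_level = prefix_conditional_entropy(toy_sids, 0)
second_level = prefix_conditional_entropy(toy_sids, 1)
```

Under this reading, a lower value at a given level means the next code is easier to predict once the prefix is known.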

Project Structure

The structure of this repository is as follows:

.
├── config/                   # All *.yaml configuration files for the pipeline
├── dataset/                  # Amazon-2023 review dataset processing code / downloaded dataset folder
├── model/                    # Model implementations
├── logger.py                 # Logging utilities for printing runtime outputs
├── main.py                   # Main entry point for training and evaluation
├── metrics.py                # Evaluation-related code
├── requirements.txt          # List of required Python packages and dependencies
├── run_pipelines.py          # One-click script to run the full ReSID pipeline
├── trainer.py                # Training script
├── utils.py                  # Training utilities, mainly for data loading
└── README.md                 # This file

Experiments

Setup

We recommend installing dependencies using requirements.txt. This setup has been tested on Ubuntu 18.04, CUDA 12.4, and Python 3.12.

pip3 install -r requirements.txt
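
After installing, a quick sanity check of the environment can be useful. The following is a minimal sketch, not part of the repository: the helper name `describe_environment` is hypothetical, and it assumes PyTorch is the package that exposes CUDA availability (the script still runs if `torch` is not installed).

```python
import importlib.util
import platform


def describe_environment():
    """Report the Python version and, if PyTorch is installed, its version and CUDA status."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_available": None,
    }
    # Import torch only if it is installed, so the check itself never fails.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as False, passing a CUDA device such as cuda:0 to the pipeline is unlikely to work.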

Data

Option 1: Download the processed dataset (recommended)

Download the processed dataset from Hugging Face:

https://huggingface.co/datasets/PIIR/ReSID-dataset

After downloading, place the extracted dataset folder directly under dataset/, e.g.:

ReSID/
└── dataset/
    └── Musical_Instruments/   # the extracted dataset directory
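
The layout above can be checked before launching a run. This is a small sketch under the assumption that a dataset is usable once dataset/<name>/ exists and is non-empty; the helper `find_dataset_dir` is hypothetical and does not validate the files inside the folder.

```python
from pathlib import Path


def find_dataset_dir(repo_root, dataset_name):
    """Return <repo_root>/dataset/<dataset_name>, raising if it is missing or empty."""
    path = Path(repo_root) / "dataset" / dataset_name
    if not path.is_dir():
        raise FileNotFoundError(f"expected a dataset directory at {path}")
    if not any(path.iterdir()):
        raise FileNotFoundError(f"dataset directory {path} is empty")
    return path
```

For example, calling `find_dataset_dir(".", "Musical_Instruments")` from the repository root returns the path if the extracted folder is in place and raises `FileNotFoundError` otherwise.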

Option 2: Reproduce the dataset from raw Amazon-2023 reviews (from scratch)

Download the raw Amazon-2023 review subsets and statistics:

bash dataset/download_amazon_2023.sh
bash dataset/download_amazon_2023_statistics.sh

Preprocess the downloaded data:
```
python dataset/data_process.py
```

After processing, the generated dataset will be saved under dataset/ (as configured in the scripts).
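
The three steps of Option 2 can also be chained from Python so that a failure in one step stops the rest. This is a sketch, assuming it is run from the repository root and that the scripts take no arguments, as in the commands above; `PREPROCESS_STEPS` and `run_steps` are illustrative names, not part of the repository.

```python
import subprocess
import sys

# Commands copied from the README, in the order they should run.
PREPROCESS_STEPS = [
    ["bash", "dataset/download_amazon_2023.sh"],
    ["bash", "dataset/download_amazon_2023_statistics.sh"],
    [sys.executable, "dataset/data_process.py"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order; check=True raises on the first non-zero exit code."""
    for cmd in steps:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling `run_steps(PREPROCESS_STEPS)` executes the downloads and then the preprocessing. The `runner` parameter exists only so the sequence can be dry-run or tested without touching the network.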

Training

To run ReSID, use the following command:

python run_pipelines.py --dataset Musical_Instruments --device cuda:0

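Set --dataset to the name of the dataset you want to run (Musical_Instruments in the command above) and --device to the PyTorch device to use, such as cuda:0.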
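
To run the pipeline on several datasets in a row, the command above can be wrapped in a short script. This is a sketch that uses only the two flags shown in this README (--dataset and --device); `build_command` and `run_all` are hypothetical helpers, and any dataset name other than Musical_Instruments would be your own.

```python
import subprocess
import sys


def build_command(dataset, device="cuda:0"):
    """Build the run_pipelines.py invocation for one dataset."""
    return [sys.executable, "run_pipelines.py", "--dataset", dataset, "--device", device]


def run_all(datasets, device="cuda:0", runner=subprocess.run):
    """Run the pipeline once per dataset, stopping at the first failure."""
    for dataset in datasets:
        runner(build_command(dataset, device), check=True)
```

For instance, `run_all(["Musical_Instruments"])` is equivalent to the single command shown above, assuming it is launched from the repository root.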

Results

Citation

If you find this repository helpful, please consider citing our paper:

@misc{ReSID,
      title={Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs}, 
      author={Yu Liang and Zhongjin Zhang and Yuxuan Zhu and Kerui Zhang and Zhiluohan Guo and Wenhang Zhou and Zonqi Yang and Kangle Wu and Yabo Ni and Anxiang Zeng and Cong Fu and Jianxin Wang and Jiazhi Xia},
      year={2026},
      eprint={2602.02338},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2602.02338}, 
}
