Jiaqi Han*
We propose SDPO, a general preference optimization method for trajectory alignment of discrete diffusion models. Importantly, we decompose the problem into a set of stepwise alignment objectives by matching the per-step factorized posterior. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and yields an equivalent optimal solution under additive factorization of the trajectory reward.
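Schematically, the stepwise decomposition can be sketched as a DPO-style loss applied at each denoising step. The notation below (reward terms \(r_t\), temperature \(\beta\), winning/losing trajectories \(x^w, x^l\)) is illustrative shorthand of ours, not necessarily the paper's exact formulation:

```latex
% Additive factorization of the trajectory reward (assumption for this sketch):
R(x_{0:T}) = \sum_{t=1}^{T} r_t(x_{t-1}, x_t)

% Per-step preference alignment of the factorized posterior,
% comparing policy \pi_\theta against the reference model \pi_{\mathrm{ref}}:
\mathcal{L}_t(\theta) = -\,\mathbb{E}\!\left[
  \log \sigma\!\Big(
    \beta \log \frac{\pi_\theta(x^{w}_{t-1} \mid x^{w}_t)}{\pi_{\mathrm{ref}}(x^{w}_{t-1} \mid x^{w}_t)}
    \;-\;
    \beta \log \frac{\pi_\theta(x^{l}_{t-1} \mid x^{l}_t)}{\pi_{\mathrm{ref}}(x^{l}_{t-1} \mid x^{l}_t)}
  \Big)
\right]
```

Summing \(\mathcal{L}_t\) over steps recovers a trajectory-level preference objective while each term only requires a single-step posterior, which is what makes the optimization efficient.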
Experiments across multiple domains including DNA sequence design, protein inverse folding, and language modeling consistently demonstrate the superiority of our approach.
Please give us a star ⭐ if you find our work interesting!
Our goal here is to optimize the activity of regulatory DNA sequences such that they drive gene expression in specific cell types, a critical task for cell and gene therapy.
We provide the source code for the DNA experiments in the SDPO_dna/ folder. Please refer to SDPO_dna/README.md for detailed instructions.
Given a pretrained inverse folding model that generates sequences conditioned on the backbone’s conformation (3D structure), our goal is to optimize the stability of these generated sequences.
The code and instructions are in the SDPO_protein/ folder. Please refer to SDPO_protein/README.md for detailed instructions.
We also apply our approach to a large-scale discrete diffusion model for natural language modeling, demonstrating its efficacy for preference optimization of large language diffusion models. We employ LLaDA-8B-Instruct as the reference model.
The code is provided in the SDPO_llada/ folder.
To launch the experiments:

```shell
cd SDPO_llada/
pip install -r requirements.txt
ngpu=4
devices=0,1,2,3
CUDA_VISIBLE_DEVICES=${devices} \
accelerate launch \
    --config_file accelerate_configs/deepspeed_zero3.yaml \
    --num_processes=4 \
    --main_process_port=29521 \
    run.py \
    config_llada.yaml
```

We use the data and checkpoints from the DRAKES repository for the DNA and protein experiments. The code for the language modeling experiments was heavily built upon SimPO. We sincerely thank the authors for open-sourcing their codebases.
Please consider citing our work if you find it useful:
@inproceedings{
han2026discrete,
title={Discrete Diffusion Trajectory Alignment via Stepwise Decomposition},
author={Jiaqi Han and Austin Wang and Minkai Xu and Wenda Chu and Meihua Dang and Haotian Ye and Huayu Chen and Yisong Yue and Stefano Ermon},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=h9b5h69v3p}
}
If you have any questions, feel free to contact me at:
Jiaqi Han: jiaqihan@stanford.edu
