This repository is an official PyTorch implementation of the paper Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees.
Autoregressive (AR) image models achieve diffusion-level quality but suffer from sequential inference, requiring approximately 2,000 steps for a 576x576 image. Speculative decoding with draft trees accelerates LLMs yet underperforms on visual AR models; we identify the key obstacle as inconsistent acceptance rates across draft trees, caused by the varying prediction difficulty of different image regions. We propose Adjacency-Adaptive Dynamical Draft Trees (ADT-Tree), which dynamically adjusts draft-tree depth and width by leveraging adjacent token states and prior acceptance rates. ADT-Tree initializes via horizontal adjacency, then refines depth and width via bisectional adaptation, yielding deeper trees in simple regions and wider trees in complex ones.
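The adaptive sizing described above can be sketched as follows. This is a hypothetical illustration, not the repository's actual implementation: the function names, the linear depth/width mapping, and the 0.5 bisection threshold are all assumptions made for the sketch.

```python
# Hypothetical sketch of adjacency-adaptive draft-tree sizing (assumed
# names and rules; not the repository's actual code).

def adapt_tree_shape(neighbor_accept_rates, max_depth=8, max_width=8):
    """Pick (depth, width) from the mean acceptance rate of adjacent tokens.

    High acceptance (simple region)  -> deeper, narrower tree.
    Low acceptance (complex region)  -> shallower, wider tree.
    """
    if not neighbor_accept_rates:
        # No horizontal neighbors decoded yet: fall back to a balanced tree.
        return max_depth // 2, max_width // 2
    rate = sum(neighbor_accept_rates) / len(neighbor_accept_rates)
    depth = max(1, round(rate * max_depth))          # deeper when tokens are easy
    width = max(1, round((1.0 - rate) * max_width))  # wider when tokens are hard
    return depth, width


def bisect_refine(depth, last_round_rate, lo=1, hi=8):
    """One bisection-style refinement step: shrink the depth toward lo if
    the previous round's acceptance rate was poor, grow it toward hi if
    it was good (0.5 threshold is an assumption)."""
    if last_round_rate < 0.5:
        return max(lo, (lo + depth) // 2)
    return min(hi, (depth + hi + 1) // 2)
```

A simple region with neighbor acceptance rates near 1.0 thus gets a deep, narrow tree, while a complex region with rates near 0.0 gets a shallow, wide one.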
The main code is based on the project LANTERN.
We thank the LANTERN team for their contributions to the open-source community.
- [2025-11-28] TODO: Change the eagle tree
- [2025-11-20] 🎉🎉🎉 Our ADT-Tree is released! 🎉🎉🎉
- Paper Portal for Top Conferences in the Field of Artificial Intelligence: CV_Paper_Portal
Below is a comparison of different methods.
Install Required Packages

Requirements:
- Python >= 3.10
- PyTorch >= 2.4.0
Install the dependencies listed in `requirements.txt`:

```shell
git clone https://github.com/Haodong-Lei-Ray/ADT-Tree.git
cd ADT-Tree
conda create -n ADT-Tree python=3.10 -y
conda activate ADT-Tree
pip install -r requirements.txt
```
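After installing, a minimal sanity check that the environment meets the stated requirements (Python >= 3.10, PyTorch >= 2.4.0). This helper is a sketch for convenience, not part of the repository:

```python
# Check the environment against the README's stated version requirements.
import sys


def check_env(py_version=sys.version_info, torch_version=None):
    """Return True if Python >= 3.10 and (if given) torch >= 2.4."""
    ok = tuple(py_version[:2]) >= (3, 10)
    if torch_version is not None:
        # Version strings like "2.4.0" or "2.5.1+cu121" both parse here.
        major, minor = (int(x) for x in torch_version.split(".")[:2])
        ok = ok and (major, minor) >= (2, 4)
    return ok
```

Call it as `check_env(torch_version=torch.__version__)` after importing torch.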
Additional Setup
- Lumina-mGPT
For Lumina-mGPT, we need to install the `flash_attention` and `xllmx` packages:

```shell
pip install flash-attn --no-build-isolation
cd models/base_models/lumina_mgpt
pip install -e .
```
Checkpoints

All model weights and other required data should be stored in `ckpts/`.
Lumina-mGPT

For Lumina-mGPT, since the Chameleon implementation in transformers currently does not contain the VQ-VAE decoder, please manually download the original VQ-VAE weights provided by Meta and put them in the following directory:

```
ckpts
└── lumina_mgpt
    └── chameleon
        └── tokenizer
            ├── text_tokenizer.json
            ├── vqgan.yaml
            └── vqgan.ckpt
```

Also download the original model `Lumina-mGPT-7B-768` from Huggingface 🤗 and put it in the following directory:

```
ckpts
└── lumina_mgpt
    └── Lumina-mGPT-7B-768
        ├── config.json
        ├── generation_config.json
        ├── model-00001-of-00002.safetensors
        └── other files...
```
Anole

For Anole, download `Anole-7b-v0.1-hf`, which is a Huggingface-style converted model from `Anole`. In addition, you should download the original VQ-VAE weights provided by Meta and put them in the following directory:

```
ckpts
└── anole
    ├── Anole-7b-v0.1-hf
    │   ├── config.json
    │   ├── generation_config.json
    │   ├── model-00001-of-00003.safetensors
    │   └── other files...
    └── chameleon
        └── tokenizer
            ├── text_tokenizer.json
            ├── vqgan.yaml
            └── vqgan.ckpt
```

(Optional) Trained drafter

To use a trained drafter, you need to download `anole_drafter` and save it under the `trained_drafters` directory:

```
ckpts
└── anole
    └── trained_drafters
        └── anole_drafter
            ├── config.json
            ├── generation_config.json
            ├── pytorch_model.bin
            └── other files...
```
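Before launching generation, it can be useful to confirm the checkpoint layout is in place. The helper below is an assumption added for illustration (not part of the repo); the file lists mirror the directory trees shown above, omitting the sharded `.safetensors` weights:

```python
# Hypothetical helper to verify the expected checkpoint files exist
# under ckpts/ before running (not part of the repository).
from pathlib import Path

EXPECTED = {
    "anole": [
        "Anole-7b-v0.1-hf/config.json",
        "Anole-7b-v0.1-hf/generation_config.json",
        "chameleon/tokenizer/text_tokenizer.json",
        "chameleon/tokenizer/vqgan.yaml",
        "chameleon/tokenizer/vqgan.ckpt",
    ],
    "lumina_mgpt": [
        "Lumina-mGPT-7B-768/config.json",
        "Lumina-mGPT-7B-768/generation_config.json",
        "chameleon/tokenizer/text_tokenizer.json",
        "chameleon/tokenizer/vqgan.yaml",
        "chameleon/tokenizer/vqgan.ckpt",
    ],
}


def missing_files(ckpt_root, model):
    """Return the expected files that are missing under <ckpt_root>/<model>."""
    root = Path(ckpt_root) / model
    return [rel for rel in EXPECTED[model] if not (root / rel).is_file()]
```

An empty return value means the layout matches; otherwise the list names exactly which files still need to be downloaded.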
ADT-Tree+LANTERN in MSCOCO2017Val
```shell
cd ./ADT-Tree
prompt=MSCOCO2017Val
model=anole
temperature=1
model_type=eagle
lantern_delta=0.5
lantern_k=100
#output_path=/home/leihaodong/TIP26/exp/Anole/MSCOCO2017Val/lantern_ADT-Tree
output_path=<your out path>
mkdir -p ${output_path}
nohup python main.py generate_images \
    --prompt $prompt \
    --model $model \
    --temperature $temperature \
    --model_type $model_type \
    --model_path leloy/Anole-7b-v0.1-hf \
    --drafter_path jadohu/anole_drafter \
    --output_dir $output_path \
    --lantern \
    --peanut \
    --lantern_k $lantern_k \
    --lantern_delta ${lantern_delta} \
    --num_images -1 > ${output_path}.log 2>&1 &
```
ADT-Tree+LANTERN
This project is distributed under the Chameleon License by Meta Platforms, Inc. For more information, please see the LICENSE file in the repository.
This repository is built with extensive reference to FoundationVision/LlamaGen, Alpha-VLLM/Lumina-mGPT and SafeAILab/EAGLE, leveraging many of their core components and approaches.
```bibtex
@misc{lei2025fastinferencevisualautoregressive,
      title={Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees},
      author={Haodong Lei and Hongsong Wang and Xin Geng and Liang Wang and Pan Zhou},
      year={2025},
      eprint={2512.21857},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.21857},
}
```

