An efficient transfer paradigm enabling high-quality, end-to-end pixel-space diffusion with minimal computational overhead and data requirements.
- [2026/05] Technical report released.
| Status | Item |
|---|---|
| 🛠️ | 1K inference code & weights |
| 🛠️ | Training code |
| 🛠️ | 4K/8K/10K UHR generation |
| 🛠️ | Compatibility with more LDM T2I models |
```python
import torch

# L2PPipeLine and ModelConfig are provided by this repository; import them
# from the module where your checkout defines them.

# Load the L2P main model plus the Z-Image-Turbo text encoder shards and tokenizer.
pipe = L2PPipeLine.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(path=["path/to/L2P/main_model.safetensors"]),
        ModelConfig(path=[
            "path/to/Z-Image-Turbo/text_encoder/model-00001-of-00003.safetensors",
            "path/to/Z-Image-Turbo/text_encoder/model-00002-of-00003.safetensors",
            "path/to/Z-Image-Turbo/text_encoder/model-00003-of-00003.safetensors",
        ]),
    ],
    tokenizer_config=ModelConfig(path="path/to/Z-Image-Turbo/tokenizer"),
)

# Generate a 1024x1024 image with a fixed seed for reproducibility.
image = pipe(
    prompt="an origami pig on fire in the middle of a dark room with a pentagram on the floor",
    seed=42,
    rand_device="cuda",
    num_inference_steps=30,
    cfg_scale=2.0,
    height=1024,
    width=1024,
)
image.save("example.png")
```

Launch a multi-GPU web UI:
```bash
python app.py
```

The demo auto-detects free GPUs, dispatches each request to an idle device, and exposes a Gradio interface at http://0.0.0.0:23231.
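The dispatch behaviour described above can be illustrated with a small, self-contained sketch. This is not the code in `app.py`; it only shows one common way to route requests to idle devices, using a queue of free device ids. The `DevicePool` class, the `fake_generate` stand-in, and the hard-coded device list are all hypothetical, and free-GPU detection is not shown.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class DevicePool:
    """Hands out idle device ids; callers block until one is free."""

    def __init__(self, device_ids):
        self._idle = queue.Queue()
        for device in device_ids:
            self._idle.put(device)

    def run(self, fn, *args, **kwargs):
        device = self._idle.get()  # blocks while every device is busy
        try:
            return fn(device, *args, **kwargs)
        finally:
            self._idle.put(device)  # always return the device to the pool


def fake_generate(device, prompt):
    # Hypothetical stand-in for a real pipeline call on `device`.
    return f"{prompt} @ {device}"


# Assumed device list; a real demo would detect free GPUs first.
pool = DevicePool(["cuda:0", "cuda:1"])
prompts = ["a", "b", "c", "d"]

# More requests than devices: extra requests wait for a device to free up.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(lambda p: pool.run(fake_generate, p), prompts))
```

Because each device id is returned to the queue in a `finally` block, a failed request does not permanently remove a GPU from the pool.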
```bash
# Default launch script
bash train_run.sh

# Low-VRAM variant
bash train_run_low_VRAM.sh
```

Provide a directory of images plus a CSV metadata file:

```
data/
├── images/          # raw image folder
└── metadata.csv     # columns: file_name, text, ...
```

Update `--dataset_base_path` and `--dataset_metadata_path` in the launch script accordingly.
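As a rough illustration of the expected metadata layout, the following sketch writes a `metadata.csv` with `file_name` and `text` columns from a folder of images and a caption mapping. The `write_metadata` helper, the accepted suffix list, and the assumption that `file_name` is relative to the image folder are illustrative choices, not part of the repository; check the launch script for how paths are actually resolved.

```python
import csv
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def write_metadata(images_dir, csv_path, captions):
    """Write file_name,text rows for every image that has a caption."""
    rows = []
    for path in sorted(Path(images_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore non-image files
        text = captions.get(path.name)
        if text is None:
            continue  # skip images without a caption
        rows.append({"file_name": path.name, "text": text})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    (images / "cat.png").touch()
    (images / "notes.txt").touch()  # not an image, so it is ignored
    written = write_metadata(images, Path(tmp) / "metadata.csv", {"cat.png": "a cat"})
    with open(Path(tmp) / "metadata.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
```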
If you find this work useful, please consider citing:
```bibtex
@article{chen2026l2p,
  title   = {L2P: Unlocking Latent Potential for Pixel Generation},
  author  = {Chen, Zhennan and Zhu, Junwei and Chen, Xu and Zhang, Jiangning and
             Chen, Jiawei and Zeng, Zhuoqi and Zhang, Wei and Wang, Chengjie and
             Yang, Jian and Tai, Ying},
  journal = {arXiv preprint arXiv:2605.12013},
  year    = {2026}
}

@article{chen2025dip,
  title   = {DiP: Taming Diffusion Models in Pixel Space},
  author  = {Chen, Zhennan and Zhu, Junwei and Chen, Xu and Zhang, Jiangning and
             Hu, Xiaobin and Zhao, Hanzhen and Wang, Chengjie and Yang, Jian and
             Tai, Ying},
  journal = {arXiv preprint arXiv:2511.18822},
  year    = {2025}
}
```

L2P is built upon the excellent open-source work of DiffSynth-Studio and Z-Image.