NJU-PCALab/L2P

L2P: Unlocking Latent Potential for Pixel Generation

An efficient transfer paradigm enabling high-quality, end-to-end pixel-space diffusion with minimal computational overhead and data requirements.

⭐ If L2P helps your research or product, please consider giving the repo a star ⭐

📰 News

  • [2026/05] Technical report released.

🗺️ Roadmap

| Status | Item |
|--------|------|
| 🛠️ | 1K inference code & weights |
| 🛠️ | Training code |
| 🛠️ | 4K/8K/10K UHR generation |
| 🛠️ | Compatibility with more LDM T2I models |

📦 Installation


🎨 Inference

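import torch
# L2PPipeLine and ModelConfig are provided by this repository; import them before running.

pipe = L2PPipeLine.from_pretrained(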
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(path=["path/to/L2P/main_model.safetensors"]),
        ModelConfig(path=[
            "path/to/Z-Image-Turbo/text_encoder/model-00001-of-00003.safetensors",
            "path/to/Z-Image-Turbo/text_encoder/model-00002-of-00003.safetensors",
            "path/to/Z-Image-Turbo/text_encoder/model-00003-of-00003.safetensors",
        ]),
    ],
    tokenizer_config=ModelConfig(path="path/to/Z-Image-Turbo/tokenizer"),
)

image = pipe(
    prompt="an origami pig on fire in the middle of a dark room with a pentagram on the floor",
    seed=42,
    rand_device="cuda",
    num_inference_steps=30,
    cfg_scale=2.0,
    height=1024,
    width=1024,
)
image.save("example.png")
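The text encoder above is listed shard by shard. As a convenience, the shard list could be built from the directory instead. This is a minimal sketch, not part of L2P: `collect_shards` is a hypothetical helper, and it assumes the shards follow the `model-XXXXX-of-YYYYY.safetensors` naming shown above.

```python
from pathlib import Path


def collect_shards(model_dir):
    """Return the sorted shard paths found in model_dir (hypothetical helper)."""
    shards = sorted(Path(model_dir).glob("model-*-of-*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"no .safetensors shards found in {model_dir}")
    # Shard names encode the total count, e.g. model-00001-of-00003.
    expected = int(shards[0].stem.rsplit("-", 1)[-1])
    if len(shards) != expected:
        raise ValueError(f"expected {expected} shards, found {len(shards)}")
    return [str(p) for p in shards]
```

The result can then be passed as `ModelConfig(path=collect_shards("path/to/Z-Image-Turbo/text_encoder"))`, which fails early if a shard is missing.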

Gradio Demo

Launch a multi-GPU web UI:

python app.py

The demo auto-detects free GPUs, dispatches each request to an idle device, and exposes a Gradio interface at http://0.0.0.0:23231.
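The dispatch behaviour described above (one request per idle device) can be illustrated with a small device pool. This is a sketch of the general pattern only, not the code in `app.py`: `DevicePool` and `handle_request` are hypothetical names, and detecting which GPUs are actually free is left out.

```python
import queue
from contextlib import contextmanager


class DevicePool:
    """Hands out device names so each request runs on an idle device."""

    def __init__(self, devices):
        self._idle = queue.Queue()
        for name in devices:
            self._idle.put(name)

    @contextmanager
    def acquire(self, timeout=None):
        # Block until a device is idle (raises queue.Empty on timeout).
        device = self._idle.get(timeout=timeout)
        try:
            yield device
        finally:
            # Return the device to the pool even if the request raised.
            self._idle.put(device)


def handle_request(pool, prompt, generate):
    """Run generate(prompt, device) on whichever device becomes idle first."""
    with pool.acquire() as device:
        return generate(prompt, device)
```

A pool over all visible GPUs could be created with `DevicePool([f"cuda:{i}" for i in range(torch.cuda.device_count())])`. Because `queue.Queue` is thread-safe, concurrent requests wait for a device instead of sharing one.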


🏋️ Training

Standard training

bash train_run.sh

Low-VRAM training (single GPU < 24 GB VRAM)

bash train_run_low_VRAM.sh

Dataset format

Provide a directory of images plus a CSV metadata file:

data/
├── images/                # raw image folder
└── metadata.csv           # columns: file_name, text, ...

Update --dataset_base_path and --dataset_metadata_path in the launch script accordingly.
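One way to produce `metadata.csv` is sketched below. It rests on assumptions that the layout above does not confirm: captions are stored as sidecar `.txt` files next to each image, and `file_name` is the image name relative to `images/`. `write_metadata` is a hypothetical helper, so check the path convention the training script expects before relying on it.

```python
import csv
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_metadata(data_dir):
    """Write data_dir/metadata.csv with one (file_name, text) row per image."""
    data_dir = Path(data_dir)
    rows = []
    for image in sorted((data_dir / "images").iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_file = image.with_suffix(".txt")  # assumed sidecar caption file
        if not caption_file.exists():
            continue  # skip images that have no caption
        text = caption_file.read_text(encoding="utf-8").strip()
        rows.append({"file_name": image.name, "text": text})
    with open(data_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `write_metadata("data")` returns the number of rows written; images without a caption file are skipped rather than written with empty text.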


📜 Citation

If you find this work useful, please consider citing:

@article{chen2026l2p,
  title   = {L2P: Unlocking Latent Potential for Pixel Generation},
  author  = {Chen, Zhennan and Zhu, Junwei and Chen, Xu and Zhang, Jiangning and
             Chen, Jiawei and Zeng, Zhuoqi and Zhang, Wei and Wang, Chengjie and
             Yang, Jian and Tai, Ying},
  journal = {arXiv preprint arXiv:2605.12013},
  year    = {2026}
}

@article{chen2025dip,
  title   = {DiP: Taming Diffusion Models in Pixel Space},
  author  = {Chen, Zhennan and Zhu, Junwei and Chen, Xu and Zhang, Jiangning and
             Hu, Xiaobin and Zhao, Hanzhen and Wang, Chengjie and Yang, Jian and
             Tai, Ying},
  journal = {arXiv preprint arXiv:2511.18822},
  year    = {2025}
}

🙏 Acknowledgements

L2P is built upon the excellent open-source work of DiffSynth-Studio and Z-Image.
