Description
Hi authors, thanks for your inspiring work!
After reading through your YUME1.5 paper, I have a question about the foundation model. You mention "we first initialize the generator Gθ, fake model Gs, and real model Gt with weights from a foundation model [15]." Do you mean that the foundation model is already an autoregressive diffusion model, and that DMD distillation is used only to reduce the number of denoising steps (as Fig. 4 also seems to indicate, to my understanding)?
Then in the Experiments section, you mention "We utilized the Wan2.2-5B as the pre-trained model." This confuses me a little. Do you mean that in the foundation model, 1) an autoregressive generation architecture like the one in the Self-Forcing paper is used, but the base model is changed from their Wan 1.3B T2V to Wan 2.2 5B, and then 2) the number of denoising steps is reduced by DMD distillation without changing the autoregressive architecture? Or is the bidirectional Wan 2.2 5B trained as the foundation model and then distilled into an autoregressive model that takes history information into account?
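To make sure I'm reading the initialization step correctly, here is a minimal sketch of my understanding. The function and variable names are placeholders of my own, not your actual code; it only illustrates that the generator Gθ, fake model Gs, and real model Gt would all start from the same foundation-model weights before DMD distillation.

```python
import copy

def init_dmd_models(foundation_weights):
    """Hypothetical DMD-style initialization: all three models
    start as copies of the same foundation-model weights."""
    generator = copy.deepcopy(foundation_weights)   # G_theta, to be distilled
    fake_model = copy.deepcopy(foundation_weights)  # G_s, updated during training
    real_model = copy.deepcopy(foundation_weights)  # G_t, typically frozen
    return generator, fake_model, real_model

# Toy stand-in for a real state dict, just to show the shared starting point.
foundation = {"layer.0.weight": [0.1, 0.2]}
g, gs, gt = init_dmd_models(foundation)
assert g == gs == gt == foundation
```

Is this shared-initialization picture correct, with the difference between the two interpretations above being only whether the foundation weights already come from an autoregressive model?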
Thanks so much in advance for your help!