Description
Hi authors, thanks for your inspiring work!
After reading through your YUME1.5 paper, I have a question about the foundation model. You mention "we first initialize the generator Gθ, fake model Gs, and real model Gt with weights from a foundation model [15]." Do you mean that the foundation model is already an autoregressive diffusion model, and that DMD distillation is used only to reduce the number of denoising steps (as Fig. 4 also seems to indicate, to my understanding)?
Then in the Experiments section, you mention "We utilized the Wan2.2-5B as the pre-trained model." This confuses me a little. Do you mean that in the foundation model, 1) an autoregressive generation architecture like the one in the Self-Forcing paper is used, but the base model is changed from their Wan 1.3B T2V to Wan 2.2 5B, and then 2) the number of denoising steps is reduced by DMD distillation without changing the autoregressive architecture? Or is the bidirectional Wan 2.2 5B trained as the foundation model and then distilled into an autoregressive model that takes history information into account?
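To make sure I'm reading the initialization step correctly, here is a minimal sketch of my understanding. The function and variable names are placeholders of my own, not your actual code; it only illustrates that the generator Gθ, fake model Gs, and real model Gt would all start from the same foundation-model weights before DMD distillation.

```python
import copy

def init_dmd_models(foundation_weights):
    """Hypothetical DMD-style initialization: all three models
    start as copies of the same foundation-model weights."""
    generator = copy.deepcopy(foundation_weights)   # G_theta, to be distilled
    fake_model = copy.deepcopy(foundation_weights)  # G_s, updated during training
    real_model = copy.deepcopy(foundation_weights)  # G_t, typically frozen
    return generator, fake_model, real_model

# Toy stand-in for a real state dict, just to show the shared starting point.
foundation = {"layer.0.weight": [0.1, 0.2]}
g, gs, gt = init_dmd_models(foundation)
assert g == gs == gt == foundation
```

Is this shared-initialization picture correct, with the difference between the two interpretations above being only whether the foundation weights already come from an autoregressive model?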
Thanks so much in advance for your help!