Question regarding the Yume 1.5 paper (about the foundation model and distillation method) #20

@LeeKeyu

Description
Hi authors, thanks for your inspiring work!

After reading through your Yume 1.5 paper, I have a question about the foundation model. You mention: "we first initialize the generator Gθ, fake model Gs, and real model Gt with weights from a foundation model [15]." Do you mean that the foundation model is already an autoregressive diffusion model, and that DMD distillation is only used to reduce the number of denoising steps? (This is also how I read Fig. 4.)
Then, in the Experiments section, you mention: "We utilized the Wan2.2-5B as the pre-trained model." This confuses me a little. Do you mean that for the foundation model, 1) an autoregressive generation architecture like the one in the Self-Forcing paper is used, but the base model is changed from their Wan 1.3B T2V to Wan 2.2 5B, and then 2) the number of denoising steps is reduced by DMD distillation, without changing the autoregressive architecture? Or is the bidirectional Wan 2.2 5B trained as the foundation model and then distilled into an autoregressive model so that it takes history information into account?
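To make my question concrete, here is a rough sketch of how I understand the initialization step you describe (the names, the dict-of-lists weight stand-in, and the trainable/frozen split are my own illustration, not your code):

```python
# Sketch of the DMD-style initialization described in the paper:
# the generator G_theta, the "fake" model G_s, and the "real" model G_t
# all start from the SAME foundation-model weights.
import copy

# Stand-in for the pre-trained foundation model's weights (e.g. Wan2.2-5B).
foundation_weights = {"block.0.attn": [0.1, 0.2], "block.0.mlp": [0.3]}

# All three networks begin as copies of the foundation model.
generator = copy.deepcopy(foundation_weights)   # G_theta: few-step generator
fake_model = copy.deepcopy(foundation_weights)  # G_s: tracks generator samples
real_model = copy.deepcopy(foundation_weights)  # G_t: frozen teacher

# My understanding: during distillation only the generator and fake model
# update, while the real model stays frozen at the foundation weights.
trainable = {"generator": generator, "fake_model": fake_model}
frozen = {"real_model": real_model}
```

My question is essentially about what `foundation_weights` already is at this point: the output of autoregressive post-training on Wan 2.2 5B, or the bidirectional Wan 2.2 5B itself.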

Thanks so much for your help in advance!
