Questions about method, dataset, and training cost #6

@LeeKeyu

Hi authors,

Thanks for sharing your inspiring work! After carefully reading through your paper, I have some questions regarding the method and dataset:

  1. In the two-stage training, you mention that Stage 1 trains only the VGGT part while the video generation part from Wan stays frozen, and that camera control is added in Stage 2. Why not train the camera control modules separately before Stage 1?
  2. Could you elaborate on the dataset processing method used in this work, i.e., the reconstruction method used to "generate multiview consistent depth maps using a reconstruction-based pipeline"?
  3. What are the minimum GPU hardware requirements to train and test this method?
  4. Is the FantasyWorld-1.0 entry on the leaderboard associated with the Wan 2.2 version in this repository?

Thanks a lot for your help in advance!
