Hi authors,
Thanks for sharing your inspiring work! After carefully reading through your paper, I have some questions regarding the method and dataset:
- In the two-stage training, you mentioned that in Stage 1 only the VGGT part is trained while the video generation part from Wan is frozen, and camera control is added in Stage 2. Why not train the camera control modules separately before Stage 1?
- Could you elaborate on the dataset processing used in this work, specifically the reconstruction method used to "generate multiview consistent depth maps using a reconstruction-based pipeline"?
- What are the minimum GPU hardware requirements to train and test this method?
- Is the FantasyWorld-1.0 on the leaderboard associated with the Wan 2.2 version in this repository?
Thanks a lot for your help in advance!