First of all, thank you for your amazing work on this project! I noticed a potential discrepancy between the paper and the released code. The paper says the experiments use a Wan 2.2-5B model. However, the provided training script appears to be configured for a 14B-parameter model.