Discrepancy between paper's model (Wan 2.2 5B) and code's model (14B)

First of all, thank you for your amazing work on this project! I noticed a potential discrepancy: the paper states that the experiments use the Wan 2.2 5B model, but the provided training script appears to be configured for a 14B-parameter model.