The feature and motivation
In the mindone trainer module, there are several AdamW optimizer implementations, as follows:
# https://github.com/mindspore-lab/mindone/blob/master/mindone/trainers/optim.py
from mindcv.optim.adamw import AdamW as AdamW_Refined
from mindspore.common.parameter import Parameter
from mindspore.nn.optim import Adam, AdamWeightDecay, Momentum, Optimizer
from .adamw_bf16 import BF16AdamW
from .adamw_mf import AdamW as AdamW_MF
from .adamw_mint import AdamW as AdamW_Mint
from .adamw_zero1 import AdamWeightDecayZeRO1
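# name-based dispatch below (this lives inside the optimizer factory, assumed to be create_optimizer):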
if name.lower() == "adam":
optim_cls = Adam
elif name.lower() == "adamw":
optim_cls = AdamWeightDecay
elif name.lower() == "adamw_re":
optim_cls = AdamW_Refined
elif name.lower() == "adamw_bf16":
optim_cls = BF16AdamW
elif name.lower() == "adamw_mf":
optim_cls = AdamW_MF
elif name.lower() == "adamw_zero1":
optim_cls = AdamWeightDecayZeRO1
elif name.lower() == "adamw_mint":
optim_cls = AdamW_Mint
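In practice, users pick an implementation purely by its name string when building the optimizer through the trainer's factory. A minimal usage sketch is shown below; the keyword names are assumptions for illustration and may not exactly match the current create_optimizer signature.

# Hypothetical usage sketch: keyword names are assumed, not verified against
# the current mindone.trainers.optim.create_optimizer signature.
from mindspore import nn
from mindone.trainers.optim import create_optimizer

network = nn.Dense(8, 8)  # placeholder model
optimizer = create_optimizer(
    network.trainable_params(),
    name="adamw",        # dispatches to AdamWeightDecay in the chain above
    lr=1e-4,
    weight_decay=1e-2,
)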
They were gradually added either for better precision alignment (e.g. the mint-based variant) or for more efficient memory usage (e.g. the ZeRO-1 variant). However, they have become confusing for users new to MindONE.
We plan to consolidate these AdamW implementations into a single one based on MindSpore 2.7. This issue tracks the related PRs toward that goal.
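As a rough sketch of the intended direction (an assumption about the design, not a finalized API), the unified implementation could re-export or thinly wrap MindSpore's mint AdamW, which follows a PyTorch-style interface and is available in MindSpore 2.7 (mint ops are mainly supported on Ascend):

# Sketch only: assumes the unified optimizer is backed by mindspore.mint.optim.AdamW.
from mindspore import mint, nn

net = nn.Dense(8, 1)  # placeholder model
optimizer = mint.optim.AdamW(
    net.trainable_params(),
    lr=1e-4,
    betas=(0.9, 0.999),
    weight_decay=1e-2,
)
# Step pattern for mint-style optimizers: compute grads, then call optimizer(grads).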
Plan:
For each training pipeline under mindone/examples, verify that it is still runnable and that it meets the precision and memory requirements after switching to the unified AdamW implementation; a minimal parity-check sketch follows below. (Need help!!)
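A minimal parity-check sketch (an illustration, not an official test, and assuming the unified optimizer is backed by mindspore.mint.optim.AdamW on a backend where mint ops are supported, e.g. Ascend): train the same tiny model once with the legacy AdamWeightDecay and once with the unified AdamW, then compare the final losses. Memory usage can be checked separately with each example's usual profiling tools.

import numpy as np
import mindspore as ms
from mindspore import Tensor, mint, nn

def run(make_optimizer, steps=50):
    # Re-seed so both runs start from identical weights and data.
    ms.set_seed(0)
    np.random.seed(0)
    net = nn.Dense(8, 1)
    loss_fn = nn.MSELoss()
    optimizer = make_optimizer(net.trainable_params())

    def forward(x, y):
        return loss_fn(net(x), y)

    grad_fn = ms.value_and_grad(forward, None, net.trainable_params())

    x = Tensor(np.random.randn(32, 8).astype(np.float32))
    y = Tensor(np.random.randn(32, 1).astype(np.float32))
    loss = None
    for _ in range(steps):
        loss, grads = grad_fn(x, y)
        optimizer(grads)  # both optimizer families are Cells applied to grads
    return float(loss.asnumpy())

legacy = run(lambda p: nn.AdamWeightDecay(p, learning_rate=1e-3, weight_decay=1e-2))
unified = run(lambda p: mint.optim.AdamW(p, lr=1e-3, weight_decay=1e-2))
print(f"legacy={legacy:.6f}  unified={unified:.6f}  diff={abs(legacy - unified):.2e}")

Note that a small difference is expected because the two implementations use different default hyperparameters (e.g. eps), so the check is about closeness, not bit-exact equality.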