[Refactor] Deprecate redundant adamw variants #1300

@SamitHuang

The feature and motivation

The MindONE trainer currently provides several AdamW optimizer implementations, selected by name as follows:

# https://github.com/mindspore-lab/mindone/blob/master/mindone/trainers/optim.py
from mindcv.optim.adamw import AdamW as AdamW_Refined

from mindspore.common.parameter import Parameter
from mindspore.nn.optim import Adam, AdamWeightDecay, Momentum, Optimizer

from .adamw_bf16 import BF16AdamW
from .adamw_mf import AdamW as AdamW_MF
from .adamw_mint import AdamW as AdamW_Mint
from .adamw_zero1 import AdamWeightDecayZeRO1

if name.lower() == "adam":
    optim_cls = Adam
elif name.lower() == "adamw":
    optim_cls = AdamWeightDecay
elif name.lower() == "adamw_re":
    optim_cls = AdamW_Refined
elif name.lower() == "adamw_bf16":
    optim_cls = BF16AdamW
elif name.lower() == "adamw_mf":
    optim_cls = AdamW_MF
elif name.lower() == "adamw_zero1":
    optim_cls = AdamWeightDecayZeRO1
elif name.lower() == "adamw_mint":
    optim_cls = AdamW_Mint

These variants were added over time, either for better precision alignment (such as adamw_mint) or for more efficient memory usage. However, the number of choices has become confusing for users who are new to MindONE.

We plan to simplify these adamw implementations based on MindSpore 2.7. This issue tracks the related PRs to fulfill this goal.
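
One possible way to phase out the redundant names without breaking existing training recipes is to keep them as aliases that emit a warning. The following is a minimal sketch only: the helper `resolve_optimizer_name` and the alias table are hypothetical and not part of mindone, and the replacement targets are placeholders, since the default optimizer has not been decided yet.

```python
import warnings

# Hypothetical alias table: deprecated optimizer name -> replacement name.
# The targets are placeholders; the recommended default is still undecided.
DEPRECATED_OPTIMIZER_ALIASES = {
    "adamw_mf": "adamw",
    "adamw_zero1": "adamw",
}


def resolve_optimizer_name(name: str) -> str:
    """Return the canonical optimizer name, warning if `name` is deprecated."""
    key = name.lower()
    target = DEPRECATED_OPTIMIZER_ALIASES.get(key)
    if target is None:
        # Not a deprecated alias: pass the normalized name through unchanged.
        return key
    warnings.warn(
        f"Optimizer '{key}' is deprecated; using '{target}' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return target
```

The name-based dispatch shown earlier could call such a helper first, so that old recipe configs keep working for a transition period while users are pointed to the recommended name.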

Plan:

  • Confirm the default choice for graph mode and pynative mode. For now we prioritize precision over performance, so ideally we would use adamw_mint. However, that API relies on LRScheduler for dynamic learning rate adjustment, and LRScheduler is still experimental: it does not handle cosine LR decay with warmup well, and it is not well supported in graph mode.
  • Remove redundant optimizers, such as adamw_mf and adamw_zero1, and update the related model training recipes to use the default recommended optimizer.
  • Update the docs to explain the implementation variants that remain necessary.
  • For models under mindone/examples, verify that the training pipeline still runs and still meets precision and memory requirements after switching to the unified AdamW implementation. (Need help!!)
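
For reference, the cosine decay with warmup schedule mentioned in the first item can be written as a framework-independent function that precomputes one learning rate per step. This is a sketch under assumptions: the function name, its parameters, and the linear warmup shape are illustrative choices, not the mindone or MindSpore implementation, and how the resulting values are fed to a given optimizer is left open.

```python
import math


def cosine_decay_with_warmup(base_lr, total_steps, warmup_steps, min_lr=0.0):
    """Return a list with one learning rate per training step.

    Assumed shape: linear warmup up to `base_lr` over `warmup_steps`,
    then cosine decay from `base_lr` toward `min_lr` over the remaining steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= warmup_steps <= total_steps:
        raise ValueError("warmup_steps must be in [0, total_steps]")

    # Guard against division by zero when the whole run is warmup.
    decay_steps = max(total_steps - warmup_steps, 1)
    lrs = []
    for step in range(total_steps):
        if step < warmup_steps:
            # Linear warmup: reaches base_lr on the last warmup step.
            lr = base_lr * (step + 1) / warmup_steps
        else:
            # Cosine decay: progress runs from 0 toward 1 after warmup.
            progress = (step - warmup_steps) / decay_steps
            lr = min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
        lrs.append(lr)
    return lrs
```

A per-step list like this can also serve as a reference curve when checking that a replacement optimizer follows the same learning rate trajectory as the one it replaces.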
