[Refactor] Deprecate redundant adamw variants

## The feature and motivation

The MindONE trainer currently ships several AdamW optimizer implementations, selected by name as follows:

```python
# https://github.com/mindspore-lab/mindone/blob/master/mindone/trainers/optim.py
from mindcv.optim.adamw import AdamW as AdamW_Refined

from mindspore.common.parameter import Parameter
from mindspore.nn.optim import Adam, AdamWeightDecay, Momentum, Optimizer

from .adamw_bf16 import BF16AdamW
from .adamw_mf import AdamW as AdamW_MF
from .adamw_mint import AdamW as AdamW_Mint
from .adamw_zero1 import AdamWeightDecayZeRO1

  # ... (excerpt) the optimizer class is selected by name
  if name.lower() == "adam":
      optim_cls = Adam
  elif name.lower() == "adamw":
      optim_cls = AdamWeightDecay
  elif name.lower() == "adamw_re":
      optim_cls = AdamW_Refined
  elif name.lower() == "adamw_bf16":
      optim_cls = BF16AdamW
  elif name.lower() == "adamw_mf":
      optim_cls = AdamW_MF
  elif name.lower() == "adamw_zero1":
      optim_cls = AdamWeightDecayZeRO1
  elif name.lower() == "adamw_mint":
      optim_cls = AdamW_Mint
```

They were added gradually, either for better precision alignment (such as mint) or for more efficient memory usage. However, the number of variants has become confusing for users who are new to MindONE.

We plan to simplify these adamw implementations based on MindSpore 2.7. This issue tracks the related PRs to fulfill this goal.
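One possible way to phase out optimizer names without breaking existing recipes is to redirect them with a deprecation warning before the name lookup. The sketch below is only an illustration: `DEPRECATED_OPTIMIZERS` and `resolve_optimizer_name` are hypothetical names, and the choice of which variants are deprecated and what replaces them is an assumption, since the default has not been confirmed yet.

```python
import warnings

# Hypothetical mapping from deprecated optimizer names to a replacement.
# Both the deprecated names and the replacement are assumptions made for
# illustration; the issue has not settled the final default.
DEPRECATED_OPTIMIZERS = {
    "adamw_mf": "adamw",
    "adamw_zero1": "adamw",
}


def resolve_optimizer_name(name: str) -> str:
    """Lower-case an optimizer name and redirect deprecated aliases."""
    key = name.lower()
    if key in DEPRECATED_OPTIMIZERS:
        replacement = DEPRECATED_OPTIMIZERS[key]
        warnings.warn(
            f"Optimizer '{key}' is deprecated; using '{replacement}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    return key
```

With a helper like this, the existing `if`/`elif` chain could compare against the resolved name, so old configs keep running while users are told which name to migrate to.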

Plan:
* [ ] Confirm the default choice for graph mode and pynative mode. For now, we prioritize precision over performance, so ideally we should use [adamw mint](https://www.mindspore.cn/docs/zh-CN/r2.7.0/api_python/mint/mindspore.mint.optim.AdamW.html). However, this API relies on [LRScheduler](https://www.mindspore.cn/docs/zh-CN/r2.7.0/api_python/mindspore.experimental.html#lrscheduler%E7%B1%BB) for dynamic learning rate adjustment. LRScheduler is still experimental: it does not handle cosine LR decay with warmup well, and its graph mode support is limited.
* [ ] Remove redundant optimizers, such as `adamw_mf` and `adamw_zero1`, and update the related model training recipes to use the default recommended optimizer.
* [ ] Update the docs to explain the implementation variants that remain necessary.
* [ ] For models under `mindone/examples`, verify that the training pipeline still runs and meets precision and memory requirements after switching to the unified AdamW implementation. (Help wanted!)
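For reference, the "cosine decay with warmup" schedule mentioned in the first item can be written as a plain function of the step index. This is a minimal, framework-independent sketch, not MindONE's or MindSpore's implementation: the function name `warmup_cosine_lr`, its parameters, and the linear warmup shape are all assumptions made for illustration.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at `step` (0-based): linear warmup, then cosine decay.

    Hypothetical helper for illustration; the linear warmup is an assumption.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(total_steps - warmup_steps, 1)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Precompute one learning rate per training step.
schedule = [warmup_cosine_lr(s, 1e-4, 100, 1000) for s in range(1000)]
```

A precomputed per-step list like `schedule` makes it easy to compare what different optimizer and scheduler combinations actually apply when checking precision alignment.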
