
Commit 4fe6a0b

docs: add ns_method parameter to Muon optimizer documentation
Add a params table for the Muon optimizer section in config-json.md, documenting all supported parameters including the new ns_method option for switching between Gram and standard Newton-Schulz iteration.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
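The commit message describes `ns_method` as a switch between Gram and standard Newton-Schulz iteration. As a rough illustration, here is a NumPy sketch of both. It assumes the quintic coefficients (3.4445, -4.7750, 2.0315) popularized by the reference Muon implementation, and `ns_standard`, `ns_gram`, and `orthogonalize` are hypothetical names; this is not DeepSpeed's code. The Gram variant rewrites the standard recurrence in terms of the small m x m Gram matrix, which is one plausible reading of "Gram NS", not a confirmed description of DeepSpeed's kernel.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the widely used reference Muon
# implementation; DeepSpeed's own constants may differ (assumption).
A_COEF, B_COEF, C_COEF = 3.4445, -4.7750, 2.0315


def _prepare(g, eps=1e-7):
    """Put the short side first and scale so the spectral norm is at most 1."""
    transposed = g.shape[0] > g.shape[1]
    x = g.T if transposed else g
    x = x / (np.linalg.norm(x) + eps)  # Frobenius norm bounds the spectral norm
    return x, transposed


def ns_standard(g, steps=5):
    """Standard iteration: every step multiplies by the full m x n matrix."""
    x, transposed = _prepare(g)
    for _ in range(steps):
        a = x @ x.T                       # m x m, costs O(m^2 n)
        b = B_COEF * a + C_COEF * (a @ a)  # O(m^3)
        x = A_COEF * x + b @ x            # O(m^2 n) again
    return x.T if transposed else x


def ns_gram(g, steps=5):
    """Gram-style iteration: track X_k = Q_k @ X_0 with a small m x m Q_k."""
    x0, transposed = _prepare(g)
    m = x0.shape[0]
    gram = x0 @ x0.T                      # one O(m^2 n) product up front
    q = np.eye(m)
    for _ in range(steps):
        a = q @ gram @ q.T                # equals X_k X_k^T using only m x m work
        b = B_COEF * a + C_COEF * (a @ a)
        q = A_COEF * q + b @ q            # same update as standard, applied to Q_k
    x = q @ x0                            # one O(m^2 n) product at the end
    return x.T if transposed else x


def orthogonalize(g, ns_method="gram", steps=5):
    """Hypothetical dispatcher keyed on the same strings as the ns_method option."""
    if ns_method == "gram":
        return ns_gram(g, steps)
    if ns_method == "standard":
        return ns_standard(g, steps)
    raise ValueError(f"unknown ns_method: {ns_method!r}")


rng = np.random.default_rng(0)
grad = rng.standard_normal((32, 8))       # tall matrix: transposed internally
out_std = orthogonalize(grad, "standard")
out_gram = orthogonalize(grad, "gram")
print(np.abs(out_std - out_gram).max())   # tiny: the two are algebraically equal
print(np.linalg.svd(out_gram, compute_uv=False))  # singular values pushed toward 1
```

In exact arithmetic the two functions return the same matrix; the Gram form only changes where the work happens, moving every per-step product onto m x m matrices, so its advantage grows as the matrix becomes more rectangular. This float64 sketch does not capture how either variant behaves in low precision such as bf16.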
1 parent 748d253 commit 4fe6a0b

1 file changed

Lines changed: 13 additions & 1 deletion
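The params table in this diff spells out two rules: `muon_lr` and `adam_lr` fall back to `lr` when unset, and `ns_method` accepts `"gram"` or `"standard"`. The sketch below builds the same `optimizer` block as the JSON example in Python and applies those rules. `muon_optimizer_block` and `effective_lrs` are hypothetical helpers, the `"type"`/`"params"` layout assumes DeepSpeed's usual optimizer config shape, and the validation only mirrors the documented values rather than DeepSpeed's own config parsing.

```python
import json

# Default taken from the params table added in this commit.
DEFAULT_LR = 0.001
NS_METHODS = ("gram", "standard")


def muon_optimizer_block(lr=DEFAULT_LR, momentum=0.95, weight_decay=0.0,
                         muon_lr=None, adam_lr=None, ns_method="gram"):
    """Build a Muon "optimizer" config section (hypothetical helper)."""
    if ns_method not in NS_METHODS:
        raise ValueError(f"ns_method must be one of {NS_METHODS}, got {ns_method!r}")
    params = {"lr": lr, "momentum": momentum, "weight_decay": weight_decay,
              "ns_method": ns_method}
    # Only emit the overrides when set; unset overrides fall back to lr.
    if muon_lr is not None:
        params["muon_lr"] = muon_lr
    if adam_lr is not None:
        params["adam_lr"] = adam_lr
    return {"type": "Muon", "params": params}


def effective_lrs(params):
    """Return (muon_lr, adam_lr) following the fallback rules in the table."""
    lr = params.get("lr", DEFAULT_LR)
    return params.get("muon_lr", lr), params.get("adam_lr", lr)


config = {"optimizer": muon_optimizer_block(momentum=0.9, muon_lr=0.02)}
print(json.dumps(config, indent=2))
print(effective_lrs(config["optimizer"]["params"]))  # (0.02, 0.001)
```

Rejecting an unknown `ns_method` early keeps a typo from silently selecting a default; whether DeepSpeed itself validates the value this way is not stated in the diff.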


docs/_pages/config-json.md

````diff
@@ -41,6 +41,17 @@ toc_label: "Contents"
 
 Muon optimizer is supported with ZeRO Stage 1, 2, and 3. To use Muon, set the optimizer name to `Muon`. The parameters applied for Muon are automatically determined by the matrix shape and name. For ZeRO Stage 3 with NVMe offloading, set `save_muon_momentum_buffer_in_memory` to `true` under `zero_optimization` to keep the Muon momentum buffer in GPU/CPU memory instead of swapping to NVMe.
 
+Muon supports the following params:
+
+| "params" key | Description | Default |
+| -------------- | -------------------------------------------------------------------------------------------------------------------- | --------- |
+| lr | Learning rate for all parameters. Overridden by `muon_lr` / `adam_lr` if set. | 0.001 |
+| momentum | Momentum coefficient for the Muon update. | 0.95 |
+| weight\_decay | Weight decay (AdamW-style). | 0.0 |
+| muon\_lr | Learning rate override for Muon parameters. Defaults to `lr` if not set. | - |
+| adam\_lr | Learning rate override for non-Muon (Adam) parameters. Defaults to `lr` if not set. | - |
+| ns\_method | Newton-Schulz orthogonalization method: `"gram"` for Gram NS (~2x faster on rectangular matrices), `"standard"` for the original iteration. Use `"standard"` to fall back if you encounter convergence issues. | `"gram"` |
+
 Example of <i>**optimizer**</i> with Adam
 
 ```json
@@ -73,7 +84,8 @@ If not set, muon_lr will default to lr.
 "lr": 0.001,
 "momentum": 0.9,
 "weight_decay": 0.0,
-"muon_lr": 0.001
+"muon_lr": 0.001,
+"ns_method": "gram"
 }
 },
 "zero_optimization": {
````
