1 parent e9beb2d, commit d17212e
1 file changed
deepspeed/runtime/zero/muon/original_muon.py
@@ -80,7 +80,7 @@ def zeropower_via_gram_newtonschulz(G, steps: int):
 Falls back to standard Newton-Schulz for square matrices (n == m)
 where there is no FLOP advantage.

- Reference: https://arxiv.org/abs/2503.02022
+ Reference: https://tridao.me/blog/2026/gram-newton-schulz/
 """
 assert G.ndim >= 2
 a, b, c = (3.4445, -4.7750, 2.0315)
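
For context, the docstring mentions a fallback to standard Newton-Schulz, and the last context line shows the quintic coefficients. The following is a minimal, dependency-free sketch of how such coefficients are commonly used in a quintic Newton-Schulz iteration. It is not the code from `original_muon.py`: the function name `newton_schulz_sketch`, the helpers, the `eps` value, and the list-of-lists representation are all assumptions made for illustration, and only the `a, b, c` values come from the diff.

```python
# Hypothetical sketch of a quintic Newton-Schulz iteration on plain Python lists.
# Only the coefficients (a, b, c) are taken from the diff; everything else is assumed.

def matmul(A, B):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def newton_schulz_sketch(G, steps=5, eps=1e-7):
    a, b, c = (3.4445, -4.7750, 2.0315)
    # Work in the wide orientation so that X @ X^T is the smaller Gram matrix.
    tall = len(G) > len(G[0])
    X = transpose(G) if tall else [list(row) for row in G]
    # Scale by the Frobenius norm so every singular value is at most 1.
    norm = sum(v * v for row in X for v in row) ** 0.5
    X = [[v / (norm + eps) for v in row] for row in X]
    for _ in range(steps):
        A = matmul(X, transpose(X))  # Gram matrix X X^T
        AA = matmul(A, A)
        # B = b*A + c*A^2, so that X <- a*X + B @ X applies
        # s -> a*s + b*s^3 + c*s^5 to each singular value s of X.
        B = [[b * p + c * q for p, q in zip(ra, rq)] for ra, rq in zip(A, AA)]
        BX = matmul(B, X)
        X = [[a * x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, BX)]
    return transpose(X) if tall else X

G = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0]]
X = newton_schulz_sketch(G)
gram = matmul(X, transpose(X))  # roughly, but not exactly, the identity
```

With these coefficients the iteration does not converge to an exactly orthogonal matrix. The singular values are pushed into a band around 1, which is why `gram` above is only approximately the identity.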