MOSO: Combining SOAP and Muon by mkhona-nvidia · Pull Request #222 · NVIDIA-NeMo/Emerging-Optimizers

mkhona-nvidia · 2026-06-03T19:28:08Z

MOSO

MOSO, short for Momentum One-Sided SOAP, combines Muon-style momentum with SOAP's eigenbasis Adam update, but keeps the preconditioner one-sided on the smaller matrix dimension. For a momentum matrix $M_t$, MOSO accumulates a SOAP-style covariance over momentum instead of raw gradients, using $C_t = \beta_s C_{t-1} + (1 - \beta_s) M_t M_t^T$ for the left-preconditioned case, or $C_t = \beta_s C_{t-1} + (1 - \beta_s) M_t^T M_t$ for the right-preconditioned case. With $C_t = Q_M \Lambda_M Q_M^T$, the left-side update is

$$U_t = Q_M \text{Adam}(Q_M^T M_t),$$

with the analogous right-side projection $U_t = \text{Adam}(M_t Q_M) Q_M^T$ when the column dimension is smaller. This can be read as one-sided SOAP on Muon momentum: rotate $M_t$ into the momentum-covariance eigenbasis, run the inner Adam update there, and rotate back.
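The left-preconditioned update above can be sketched in NumPy as follows. This is a minimal illustration, not the code in `moso.py`: the function names, the state dictionary layout, and all default hyperparameters (`beta_m`, `beta_s`, `beta1`, `beta2`, `eps`) are assumptions. It recomputes a full `eigh` on every call, omits bias correction, weight decay, and RMS clipping, and does not rotate the Adam state when the basis changes.

```python
import numpy as np


def init_state(rows, cols):
    """Zero-initialised state for a (rows, cols) parameter with rows <= cols."""
    return {
        "momentum": np.zeros((rows, cols)),
        "C": np.zeros((rows, rows)),          # one-sided (left) covariance
        "exp_avg": np.zeros((rows, cols)),    # Adam first moment, in eigenbasis
        "exp_avg_sq": np.zeros((rows, cols)), # Adam second moment, in eigenbasis
    }


def moso_left_step(grad, state, beta_m=0.95, beta_s=0.95,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """Return U_t = Q_M Adam(Q_M^T M_t) for one step (hypothetical defaults)."""
    # Muon-style EMA momentum M_t.
    state["momentum"] = beta_m * state["momentum"] + (1 - beta_m) * grad
    M = state["momentum"]

    # C_t = beta_s * C_{t-1} + (1 - beta_s) * M_t M_t^T
    state["C"] = beta_s * state["C"] + (1 - beta_s) * (M @ M.T)

    # C_t = Q_M Lambda_M Q_M^T; eigh returns orthonormal eigenvectors as columns.
    _, Q = np.linalg.eigh(state["C"])

    # Rotate the momentum into the eigenbasis.
    projected = Q.T @ M

    # Inner Adam update in the eigenbasis (bias correction omitted).
    state["exp_avg"] = beta1 * state["exp_avg"] + (1 - beta1) * projected
    state["exp_avg_sq"] = beta2 * state["exp_avg_sq"] + (1 - beta2) * projected**2
    adam_update = state["exp_avg"] / (np.sqrt(state["exp_avg_sq"]) + eps)

    # Rotate back to parameter space.
    return Q @ adam_update
```

A caller would then apply something like `p -= lr * moso_left_step(grad, state)`. For a parameter with more rows than columns, the same idea applies with `M.T @ M` as the covariance and `(M @ Q)` as the projection.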

Signed-off-by: mikail <mkhona@nvidia.com>

copy-pr-bot · 2026-06-03T19:28:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-03T19:36:25Z

Greptile Summary

This PR introduces MOSO (Momentum One-Sided SOAP), a new optimizer that combines Muon-style EMA momentum with one-sided SOAP preconditioning. It maintains an Adam update in the covariance eigenbasis of the Muon momentum matrix, restricted to the smaller of the two matrix dimensions.

moso.py implements the full MOSO optimizer: Muon momentum EMA, bias-corrected one-sided Shampoo covariance, eigenbasis update (full eigh at step 0, orthogonal iteration thereafter), Adam-state basis-change rotation for exp_avg, and permutation of exp_avg_sq on eigenvalue sort.
tests/test_moso.py covers smoke steps for multiple shapes, registry lookup, covariance accumulation on the smaller side, and a closed-form equivalence check for the no-EMA case.
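As a rough illustration of the eigenbasis refresh described above (orthogonal iteration after step 0, rotating `exp_avg` into the new basis, and permuting `exp_avg_sq` when eigenvalues are re-sorted), here is a NumPy sketch for the left-sided case. The function name and argument layout are hypothetical and follow the general SOAP recipe; the actual `_update_eigenbasis_and_adam_exp_avgs` may differ in detail.

```python
import numpy as np


def refresh_eigenbasis(C, Q_old, exp_avg, exp_avg_sq):
    """One orthogonal-iteration refresh of the eigenbasis (illustrative only).

    C:          (k, k) symmetric covariance
    Q_old:      (k, k) current orthonormal basis, eigenvectors as columns
    exp_avg:    (k, n) Adam first moment expressed in the Q_old basis
    exp_avg_sq: (k, n) Adam second moment expressed in the Q_old basis
    """
    # Estimate eigenvalues in the current basis: diag(Q_old^T C Q_old).
    est_eigvals = np.einsum("ij,ij->j", Q_old, C @ Q_old)

    # Sort eigen-directions by descending estimated eigenvalue.
    order = np.argsort(-est_eigvals)
    Q_sorted = Q_old[:, order]

    # exp_avg_sq is elementwise, so it is permuted rather than rotated.
    exp_avg_sq = exp_avg_sq[order, :]

    # Orthogonal iteration: one power step followed by QR re-orthonormalisation.
    Q_new, _ = np.linalg.qr(C @ Q_sorted)

    # Basis change for exp_avg: back to parameter space, then into Q_new.
    exp_avg = Q_new.T @ (Q_old @ exp_avg)
    return Q_new, exp_avg, exp_avg_sq
```

Because `exp_avg` is a linear quantity, the basis change preserves its value in parameter space; `exp_avg_sq` has no exact rotation, which is why a permutation is the usual approximation.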

Confidence Score: 4/5

Safe to merge after addressing the shampoo_beta=1.0 NaN issue; all normal-range hyperparameter values work correctly.

The optimizer correctly implements all described algorithmic pieces and has matching tests. The one concrete defect is in the shampoo_beta bias-correction formula: passing shampoo_beta=1.0 produces a 0/0 NaN that silently corrupts state["M"] and all downstream state for the rest of training.
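To make the 0/0 failure mode concrete, the sketch below uses a generic bias-corrected EMA. The exact expression at line 156 of `moso.py` is not reproduced in this thread, so the formula, the function names, and the suggested guard are assumptions for illustration only.

```python
import numpy as np


def bias_corrected_ema(values, beta):
    """Generic EMA with Adam-style bias correction (assumed form)."""
    ema = np.zeros_like(values[0])
    t = 0
    for t, x in enumerate(values, start=1):
        ema = beta * ema + (1.0 - beta) * x
    # With beta == 1.0 the EMA never leaves zero and 1 - beta**t is also zero,
    # so the correction below evaluates 0/0 and yields NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return ema / (1.0 - beta**t)


def check_shampoo_beta(beta):
    """Hypothetical guard: reject values that make the correction undefined."""
    if not 0.0 <= beta < 1.0:
        raise ValueError(f"shampoo_beta must be in [0, 1), got {beta}")
    return beta
```

Validating the hyperparameter at construction time is one option; another would be to skip the bias correction when the denominator is zero. Which is appropriate depends on whether `shampoo_beta=1.0` is meant to be a supported setting.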

emerging_optimizers/soap/moso.py (the shampoo_beta bias-correction formula at line 156)

Important Files Changed

| Filename | Overview |
| --- | --- |
| emerging_optimizers/soap/moso.py | Core MOSO optimizer; correctly structured but the shampoo_beta bias-correction formula silently produces NaN when shampoo_beta=1.0. |
| tests/test_moso.py | Tests cover smoke, registry, covariance shape, and closed-form equivalence. |
| emerging_optimizers/soap/__init__.py | Trivial registration of MOSO in the soap subpackage. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant MOSO
    participant MomentumFactor as _update_one_sided_momentum_factor
    participant EigUpdate as _update_eigenbasis_and_adam_exp_avgs
    participant Adam as calculate_adam_update

    Caller->>MOSO: step()
    MOSO->>MOSO: apply weight decay (decoupled)
    MOSO->>MOSO: "lerp momentum_buffer <- grad (Muon EMA)"
    MOSO->>MomentumFactor: update M (one-sided covariance of momentum)
    MomentumFactor-->>MOSO: M updated in-place
    MOSO->>EigUpdate: rotate exp_avg to new basis, permute exp_avg_sq, update Q_M
    Note over EigUpdate: step==0 uses eigh(M), step>0 uses orthogonal iteration
    EigUpdate-->>MOSO: (Q_M, exp_avg_in_new_basis, permuted_exp_avg_sq)
    MOSO->>MOSO: project momentum into Q_M basis
    MOSO->>Adam: calculate_adam_update(projected_momentum, exp_avg, exp_avg_sq)
    Adam-->>MOSO: adam_update (in Q_M basis)
    MOSO->>MOSO: project adam_update back to parameter space
    MOSO->>MOSO: clip RMS (optional)
    MOSO->>Caller: "p <- p - lr * update"

Reviews (7): Last reviewed commit: "Merge branch 'main' into mkhona/shmuon"

skyw

Other than some DRY violations, which are not this PR's problem, this is mostly OK. Will approve after.

mkhona-nvidia · 2026-06-04T21:36:32Z

/ok to test 1646e65

mkhona-nvidia added 2 commits June 3, 2026 12:16

Initial Implementation of ShMuon: SOAPified Muon

fa67645

Signed-off-by: mikail <mkhona@nvidia.com>

changed to adam in the shmuon eigenbasis

803a750

Signed-off-by: mikail <mkhona@nvidia.com>

mkhona-nvidia requested a review from skyw June 3, 2026 19:28

greptile-apps Bot reviewed Jun 3, 2026

Comment thread emerging_optimizers/soap/sh_muon.py Outdated

fix state init dtype

5ccdcdb

Signed-off-by: mikail <mkhona@nvidia.com>

mkhona-nvidia changed the title ~~ShMuon: Combining SOAP and Muon~~ MOSO: Combining SOAP and Muon Jun 3, 2026

renamed shmuon to moso

7fe9f1e

Signed-off-by: mikail <mkhona@nvidia.com>

skyw requested changes Jun 3, 2026

Comment thread emerging_optimizers/soap/moso.py

Comment thread emerging_optimizers/soap/moso.py Outdated

Comment thread emerging_optimizers/soap/moso.py Outdated

Comment thread emerging_optimizers/soap/sh_muon.py Outdated

Comment thread tests/test_sh_muon.py Outdated

mkhona-nvidia added 4 commits June 3, 2026 14:40

Addressed MR comments

5bceeca

Signed-off-by: mikail <mkhona@nvidia.com>

removed extra args from moso

fe9a22a

Signed-off-by: mikail <mkhona@nvidia.com>

remove muon scale factor since moso already has the adamW update

cc21885

Signed-off-by: mikail <mkhona@nvidia.com>

removed code for removed flags

9335c43

Signed-off-by: mikail <mkhona@nvidia.com>

skyw approved these changes Jun 4, 2026

Merge branch 'main' into mkhona/shmuon

1646e65

mkhona-nvidia enabled auto-merge (squash) June 4, 2026 21:36

copy-pr-bot Bot temporarily deployed to test June 4, 2026 21:36 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 21:37 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 21:38 Inactive

greptile-apps Bot reviewed Jun 4, 2026

Comment thread emerging_optimizers/soap/moso.py

mkhona-nvidia merged commit a0e376b into NVIDIA-NeMo:main Jun 4, 2026
24 checks passed

mkhona-nvidia merged 9 commits into NVIDIA-NeMo:main from mkhona-nvidia:mkhona/shmuon

2 participants
