Add AdamW optimizer to extension/training #18848
BryanBradfo wants to merge 2 commits into pytorch:main
Conversation
Ports AdamW alongside the existing SGD implementation, following the pattern in extension/training/optimizer/sgd.{h,cpp}. Weight decay is decoupled (applied to the parameter directly, not folded into the gradient) per Loshchilov & Hutter (2019); this is the property that distinguishes AdamW from Adam-with-L2.
Fixes pytorch#18766
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18848

Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 Awaiting Approval, 3 New Failures, 3 Pending, 3 Unrelated Failures. As of commit bda405f with merge base c11ba1b:
- NEW FAILURES — the following jobs have failed.
- FLAKY — the following job failed but was likely due to flakiness present on trunk.
- BROKEN TRUNK — the following jobs failed but were also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: training"
Pull request overview
Adds an AdamW optimizer implementation to ExecuTorch’s on-device training extension, aligning behavior with torch.optim.AdamW (decoupled weight decay) and integrating it into the existing C++ optimizer build/test setup.
Changes:
- Introduces the AdamW optimizer implementation (adamw.{h,cpp}) and exposes it as a training optimizer target.
- Adds new gtests for AdamW and wires them into the optimizer test targets.
- Updates training extension build source lists and documentation to reflect AdamW availability.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| shim_et/xplat/executorch/build/build_variables.bzl | Adds AdamW source to extension training sources list. |
| extension/training/optimizer/targets.bzl | Defines a new adamw C++ library target. |
| extension/training/optimizer/adamw.h | Declares AdamW API, options, param group, and state types. |
| extension/training/optimizer/adamw.cpp | Implements AdamW step logic and state allocation/freeing. |
| extension/training/optimizer/test/targets.bzl | Adds adamw_test target. |
| extension/training/optimizer/test/adamw_test.cpp | New unit tests for AdamW behavior and defaults. |
| extension/training/README.md | Updates optimizer list to include AdamW. |
```cpp
TensorImpl* m_impl = new TensorImpl(
    g.scalar_type(),
    g.sizes().size(),
    const_cast<TensorImpl::SizesType*>(g.sizes().data()),
    m_buf_ptr,
    const_cast<TensorImpl::DimOrderType*>(g.dim_order().data()));
TensorImpl* v_impl = new TensorImpl(
    g.scalar_type(),
    g.sizes().size(),
    const_cast<TensorImpl::SizesType*>(g.sizes().data()),
    v_buf_ptr,
    const_cast<TensorImpl::DimOrderType*>(g.dim_order().data()));
```
In portable (non-USE_ATEN_LIB) mode, the state tensors' TensorImpls are constructed with sizes/dim_order pointers borrowed from the gradient tensor (g.sizes().data()/g.dim_order().data()). TensorImpl does not own these arrays, and the optimizer state outlives the gradient tensor, so this can become a use-after-free if the gradient's metadata storage doesn't live for the lifetime of the optimizer. Allocate and store your own sizes/dim_order (and strides if needed) with the state, or use a construction path that ensures the metadata buffers are owned/stable for the optimizer lifetime.
Good catch in principle. This matches the existing pattern in SGD (sgd.cpp:119-125) which constructs TensorImpl with the same d_p.sizes().data() / d_p.dim_order().data() pointers from the gradient. In practice, parameter shapes are stable across training steps in ExecuTorch's captured graph model. Happy to address both optimizers together in a follow-up if the maintainer prefers.
```cpp
// Bias-corrected update.
const double bias_correction1 = 1.0 - std::pow(beta1, step);
const double bias_correction2 = 1.0 - std::pow(beta2, step);
adamw_update_hack(
```
AdamWOptions parameters are used in bias-correction divisors (1 - beta^step) and sqrt(bias_correction2). With beta1==1 or beta2==1 (or beta2<0), step() will divide by zero / take sqrt of a negative, producing NaNs/inf. Consider validating lr/beta1/beta2/eps ranges (e.g., lr>=0, 0<=beta<1, eps>0) either in AdamWOptions construction or at the start of step() and return Error::InvalidArgument on invalid values.
SGD also has no parameter validation (sgd.h), and PyTorch's C++ AdamW documents 0 <= beta < 1 as a precondition rather than a runtime check. Adding validation only to AdamW would be inconsistent. Happy to add range checks across both optimizers in a follow-up if the maintainer wants.
```cpp
}
auto g = named_gradient->second;
auto p = param_iter->second;
```
AdamW::step immediately treats both parameters and gradients as float tensors and iterates assuming matching shapes. If a caller passes a non-float tensor or a gradient whose shape/numel differs from the parameter, this can lead to undefined behavior or out-of-bounds reads/writes (e.g., in addcmul_sq_out_hack/adamw_update_hack). Add explicit scalar_type/shape checks (at least dtype==Float and p.numel()==g.numel()) and return Error::InvalidArgument when unsupported.
```cpp
if (p.scalar_type() != executorch::aten::ScalarType::Float ||
    g.scalar_type() != executorch::aten::ScalarType::Float) {
  return Error::InvalidArgument;
}
if (p.numel() != g.numel()) {
  return Error::InvalidArgument;
}
```
```cpp
AdamWParamState* state_ptr = nullptr;
if (param_state_it == state_.end()) {
  void* m_buf_ptr = malloc(g.nbytes());
  void* v_buf_ptr = malloc(g.nbytes());
```
The per-parameter state allocation uses malloc() but never checks for allocation failure. On memory-constrained targets this can turn into a null dereference in memset()/from_blob/TensorImpl construction. Check m_buf_ptr/v_buf_ptr for nullptr, free any partially-allocated buffers, and return Error::MemoryAllocationFailed.
```cpp
void* v_buf_ptr = malloc(g.nbytes());
if (m_buf_ptr == nullptr || v_buf_ptr == nullptr) {
  free(m_buf_ptr);
  free(v_buf_ptr);
  return Error::MemoryAllocationFailed;
}
```
You can ignore Copilot; its noise ratio is pretty bad. I've been trying to figure out how to turn it off for the repo.
```cpp
namespace {
// out[i] = a[i] + alpha * b[i]
void add_out_hack(
```
Ahh, my legacy... @manuelcandales, do you remember how to call ExecuTorch ops outside of the interpreter? I can't recall.
Replace manual TensorImpl construction with make_tensor_ptr from extension/tensor, removing the #ifdef USE_ATEN_LIB block and simplifying the destructor. Store defaults_ by value since it is always initialized.
Adds AdamW to the training optimizer extension. It's a port of the existing SGD implementation at extension/training/optimizer/sgd.{h,cpp}, with the main algorithmic difference being decoupled weight decay (the parameter gets decayed directly instead of mixing the decay into the gradient). Matches torch.optim.AdamW with default settings.

Fixes #18766
Scope
C++ only for this PR. Python bindings are left out on purpose: the pybindings file has a TODO to build a generic optimizer interface first, so copying PySGD to PyAdamW now would just add duplication. Happy to follow up with that. amsgrad and maximize are also left out; both are rarely used and easy to add later if needed.

Test plan
Six new gtests pass, and the SGD regression stays green:
Output was also cross-checked against torch.optim.AdamW on four small cases (simple convergence, decoupled weight decay, multi-parameter). All four match to six decimal places.

cc @JacobSzwejbka