Commit bf8e55c

Update publications.yml
1 parent 8ec4b47 commit bf8e55c

1 file changed

_data/publications.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
   authors: Tom Jacobs, Chao Zhou, and Rebekka Burkholz
   url: https://openreview.net/forum?id=YgudIlQ9nC
   pdf: https://openreview.net/forum?id=YgudIlQ9nC
-  image: formulamw.png
+  image: marginadam_page1.png
   abstract: "What distinguishes modern adaptive methods from gradient descent to favor better generalizing solutions? To study this question for steepest-descent methods, including sign descent (an optimizer closely related to Adam), we introduce steepest mirror flows as a unifying theoretical framework. This enables us to analyze how optimization geometry governs learning dynamics, implicit bias, and sparsity. It also suggests a mechanism that may help explain why Adam and AdamW often outperform SGD in fine-tuning. Focusing on diagonal linear networks and deep diagonal linear reparameterizations, we show that steeper descent promotes saddle-point escape. By contrast, gradient descent typically requires much larger learning rates to escape saddles—regimes that are less common in fine-tuning practice. Furthermore, we find that decoupled weight decay, as in AdamW, stabilizes sparse training by enforcing novel balance equations. Empirical experiments establish that our theoretical insights and hypothesized mechanisms transfer to realistic settings. Together, these results identify two mechanisms through which steepest descent can benefit modern optimization: saddle escape and sparsity."
   bib: |
     @inproceedings{
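The abstract in this entry contrasts sign descent (steepest descent in the ℓ∞ geometry, closely related to Adam) with decoupled weight decay as used in AdamW. As a minimal sketch of that distinction, and not the paper's exact formulation, the snippet below compares a coupled (L2-style) and a decoupled (AdamW-style) decay term around a plain sign-descent update; the function name and interface are illustrative assumptions, not code from the repository.

```python
import numpy as np

def sign_descent_step(p, grad, lr, wd, decoupled=True):
    """One sign-descent step with coupled or decoupled weight decay.

    Illustrative sketch only: sign descent is steepest descent w.r.t. the
    l-infinity norm, and AdamW-style decay is applied outside the update's
    sign(.) normalization rather than added to the gradient.
    """
    if decoupled:
        # Decoupled (AdamW-style): shrink the weights directly, separate
        # from the normalized gradient step.
        return p - lr * wd * p - lr * np.sign(grad)
    else:
        # Coupled (L2-style): fold the decay into the gradient, so it is
        # flattened by the sign(.) normalization.
        return p - lr * np.sign(grad + wd * p)
```

The design point the sketch tries to surface is only that the decoupled form lets the decay act on the weights at its own scale instead of being absorbed into the normalized step; the balance-equation analysis the abstract refers to is in the paper itself.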
