We push the limits of LLM sparsity through advanced optimization frameworks such as ADMM, aiming for robust generalization even at extreme compression levels.
We develop unified frameworks that integrate pruning, quantization, and distillation to reduce error accumulation and preserve core model capabilities, including reasoning, in highly compressed states.
+ </div>
+ </div>
</div>
</div>
@@ -334,7 +467,24 @@ <h2>Research Focus</h2>
The shift toward massive-scale models has made distributed training essential for workloads that no single machine can support.
From an optimization perspective, the key challenge is maintaining efficiency while managing the overhead of communication between devices.
Our research develops robust algorithms for decentralized and high-latency training environments.
<span class="fold-item-title">Communication-Efficient Training <span class="paper-tag">(<a href="https://arxiv.org/abs/2602.18181" target="_blank">arXiv 2026</a>)</span></span>
+ We design methods that reduce the heavy data-sharing requirements of distributed training through information compression, reduced synchronization, and zeroth-order optimization.
+ </div>
+ <div class="fold-item">
+ <span class="fold-item-title">Asynchronous Training <span class="paper-tag">(<a href="https://arxiv.org/abs/2602.03515" target="_blank">arXiv 2026</a>)</span></span>
+ We create algorithms that allow devices to work independently and reduce waiting time, with a focus on correcting errors caused by slightly outdated information.
We develop adaptive strategies for imbalanced computational resources across devices, aiming for consistent global convergence despite variations in hardware performance.
- Optimization is not only a subject of study in itself, but also a versatile lens through which we tackle diverse challenges arising in real-world deep learning systems.
+ Optimization is not only a subject of study in itself, but also a versatile lens through which we tackle challenges in interpretability, uncertainty quantification, continual learning, and other aspects of real-world deep learning systems.
Our research explores how optimization principles can be extended and applied to address such challenges.
We bridge complex model architectures and human understanding, from uncovering model logic in realistic scenarios to evaluating the reliability of model outputs.
We study constrained optimization problems where models adapt to new knowledge without disrupting what has already been learned, opening principled approaches to continual learning.
+ </div>
+ </div>
</div>
</div>
@@ -356,7 +519,20 @@ <h2>Research Focus</h2>
<span class="orange">Advanced Optimization</span>
Optimization has long been a source of crucial ideas that improve nearly every aspect of deep neural network training, and many of its most impactful questions remain open.
Our research investigates these questions and develops optimization principles and algorithms for modern deep learning systems.
<span class="fold-item-title">Optimization for Generalization</span>
+ We study the role of loss landscape curvature in generalization and develop flatness-oriented optimization strategies with both theoretical advantages and scalability for modern deep neural networks.
+ </div>
+ <div class="fold-item">
+ <span class="fold-item-title">Zeroth-Order Optimization for Black-Box Model Training</span>
+ We develop methods that operate without direct gradient access, motivated by settings such as proprietary model APIs and decentralized learning systems where gradients are unavailable.