
Commit 2a22542

march joins
1 parent 398bb9d commit 2a22542

21 files changed

Lines changed: 474 additions & 236 deletions

_bibliography/preprints.bib

Lines changed: 25 additions & 1 deletion
@@ -1,5 +1,16 @@
 ---
 ---
+@misc{sun2026robustnessmixturesexpertsfeature,
+title={Robustness of Mixtures of Experts to Feature Noise},
+author={Dong Sun and Rahul Nittala and Rebekka Burkholz},
+year={2026},
+eprint={2601.14792},
+archivePrefix={arXiv},
+primaryClass={cs.LG},
+url={https://arxiv.org/abs/2601.14792},
+img={robustness-of-moes.png},
+abstract={Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime where inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence speed. Empirical results on synthetic data and real-world language tasks corroborate the theoretical insights, demonstrating consistent robustness and efficiency gains from sparse modular computation.},
+}
 @misc{gadhikar2024cyclicsparsetrainingenough,
 title={Cyclic Sparse Training: Is it Enough?},
 author={Advait Gadhikar and Sree Harsha Nelaturu and Rebekka Burkholz},
@@ -10,6 +21,18 @@ @misc{gadhikar2024cyclicsparsetrainingenough
 url={https://arxiv.org/abs/2406.02773},
 code={https://github.com/RelationalML/TurboPrune},
 img={cyclic-train.png},
+abstract={The success of iterative pruning methods in achieving state-of-the-art sparse networks has largely been attributed to improved mask identification and an implicit regularization induced by pruning. We challenge this hypothesis and instead posit that their repeated cyclic training schedules enable improved optimization. To verify this, we show that pruning at initialization is significantly boosted by repeated cyclic training, even outperforming standard iterative pruning methods. The dominant mechanism how this is achieved, as we conjecture, can be attributed to a better exploration of the loss landscape leading to a lower training loss. However, at high sparsity, repeated cyclic training alone is not enough for competitive performance. A strong coupling between learnt parameter initialization and mask seems to be required. Standard methods obtain this coupling via expensive pruning-training iterations, starting from a dense network. To achieve this with sparse training instead, we propose SCULPT-ing, i.e., repeated cyclic training of any sparse mask followed by a single pruning step to couple the parameters and the mask, which is able to match the performance of state-of-the-art iterative pruning methods in the high sparsity regime at reduced computational cost.}
+}
+@misc{gadhikar2022dynamicalisometryresidualnetworks,
+title={Dynamical Isometry for Residual Networks},
+author={Advait Gadhikar and Rebekka Burkholz},
+year={2022},
+eprint={2210.02411},
+archivePrefix={arXiv},
+primaryClass={cs.LG},
+url={https://arxiv.org/abs/2210.02411},
+abstract={The training success, training speed and generalization ability of neural networks rely crucially on the choice of random parameter initialization. It has been shown for multiple architectures that initial dynamical isometry is particularly advantageous. Known initialization schemes for residual blocks, however, miss this property and suffer from degrading separability of different inputs for increasing depth and instability without Batch Normalization or lack feature diversity. We propose a random initialization scheme, RISOTTO, that achieves perfect dynamical isometry for residual networks with ReLU activation functions even for finite depth and width. It balances the contributions of the residual and skip branches unlike other schemes, which initially bias towards the skip connections. In experiments, we demonstrate that in most cases our approach outperforms initialization schemes proposed to make Batch Normalization obsolete, including Fixup and SkipInit, and facilitates stable training. Also in combination with Batch Normalization, we find that RISOTTO often achieves the overall best result.},
+img={risotto.png},
 }
 @misc{fischer2022lotteryticketsnonzerobiases,
 title={Lottery Tickets with Nonzero Biases},
@@ -20,5 +43,6 @@ @misc{fischer2022lotteryticketsnonzerobiases
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2110.11150},
 code={https://github.com/RelationalML/NonZeroBiases},
-img={nonzerobiases.png}
+img={nonzerobiases.png},
+abstract={The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to nonzero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These do not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data, we further highlight the practical benefits of nonzero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.}
 }
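
The SCULPT-ing abstract added above describes a concrete procedure: repeatedly cycle the learning rate while training a fixed sparse mask, then apply a single pruning step at the end to couple the learned parameters with the mask. Below is a minimal sketch of that loop, assuming a generic PyTorch setup; the helper names (`apply_mask`, `magnitude_prune`), the cosine cycle schedule, and all hyperparameters are illustrative assumptions, not the TurboPrune implementation.

```python
# Minimal, illustrative sketch (not the authors' code) of repeated cyclic training
# of a fixed sparse mask followed by a single pruning step, as described in the
# SCULPT-ing abstract. Helper names and hyperparameters are assumptions.
import math
import torch

def cyclic_lr(base_lr, step, cycle_len):
    """Cosine schedule that restarts every `cycle_len` steps (assumed schedule)."""
    t = (step % cycle_len) / cycle_len
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))

def apply_mask(model, masks):
    """Zero out parameters outside the (fixed) sparse mask after each update."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def magnitude_prune(model, sparsity):
    """Single global magnitude-pruning step that couples weights and mask."""
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(sparsity * scores.numel()))
    threshold = torch.kthvalue(scores, k).values
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters()}

def cyclic_train_then_prune(model, masks, loader, loss_fn, cycles=3,
                            steps_per_cycle=1000, base_lr=0.1, final_sparsity=0.95):
    opt = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    step = 0
    for _ in range(cycles):                  # repeated cyclic training of the same mask
        for x, y in loader:
            for group in opt.param_groups:
                group["lr"] = cyclic_lr(base_lr, step, steps_per_cycle)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            apply_mask(model, masks)         # keep the network sparse during training
            step += 1
    return magnitude_prune(model, final_sparsity)  # one pruning step at the very end
```

The point of the sketch is the schedule structure: each cycle restarts the learning rate on the same mask, and pruning happens only once, after the last cycle.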

_bibliography/references.bib

Lines changed: 16 additions & 2 deletions
@@ -1,12 +1,14 @@
 ---
 ---
-
 @inproceedings{merge,
 title={Bridging Domains through Subspace-Aware Model Merging},
 author={Levy Chaves and Chao Zhou and Rebekka Burkholz and Eduardo Valle and Andra Avila},
 year={2026},
 booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
 img={model-merging.png},
+url={https://arxiv.org/abs/2603.05768},
+pdf={https://arxiv.org/pdf/2603.05768},
+abstract={Model merging integrates multiple task-specific models into a single consolidated one. Recent research has made progress in improving merging performance for in-distribution or multi-task scenarios, but domain generalization in model merging remains underexplored. We investigate how merging models fine-tuned on distinct domains affects generalization to unseen domains. Through an analysis of parameter competition in the task matrix using singular value decomposition, we show that merging models trained under different distribution shifts induces stronger conflicts between their subspaces compared to traditional multi-task settings. To mitigate this issue, we propose SCORE (Subspace COnflict-Resolving mErging), a method designed to alleviate such singular subspace conflicts. SCORE finds a shared orthogonal basis by computing the principal components of the concatenated leading singular vectors of all models. It then projects each task matrix into the shared basis, pruning off-diagonal components to remove conflicting singular directions. SCORE consistently outperforms, on average, existing model merging approaches in domain generalization settings across a variety of architectures and model scales, demonstrating its effectiveness and scalability.},
 }

 @inproceedings{sanyal2026games,
@@ -65,7 +67,19 @@ @inproceedings{
 url={https://openreview.net/forum?id=XKB5Hu0ACY},
 pdf={https://openreview.net/pdf?id=XKB5Hu0ACY},
 abstract={Understanding the implicit bias of optimization algorithms is key to explaining and improving the generalization of deep models. The hyperbolic implicit bias induced by pointwise overparameterization promotes sparsity, but also yields a small inverse Riemannian metric near zero, slowing down parameter movement and impeding meaningful parameter sign flips. To overcome this obstacle, we propose Hyperbolic Aware Minimization (HAM), which alternates a standard optimizer step with a lightweight hyperbolic mirror step. The mirror step incurs less compute and memory than pointwise overparameterization, reproduces its beneficial hyperbolic geometry for feature learning, and mitigates the small–inverse-metric bottleneck. Our characterization of the implicit bias in the context of underdetermined linear regression provides insights into the mechanism how HAM consistently increases performance --even in the case of dense training, as we demonstrate in experiments with standard vision benchmarks. HAM is especially effective in combination with different sparsification methods, advancing the state of the art.},
-img={ham-hyperbolic-step.png},
+img={hyperbolic-aware-minimization.png},
+}
+
+@inproceedings{
+adnan2026sparseopt,
+title={SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training},
+author={Mohammed Adnan and Rohan Jain and Tom Jacobs and Ekansh Sharma and Rahul G Krishnan and Rebekka Burkholz and Yani Ioannou},
+booktitle={The Third Conference on Parsimony and Learning (Recent Spotlight Track)},
+year={2026},
+url={https://openreview.net/forum?id=qerVUczDMf},
+pdf={https://openreview.net/pdf?id=qerVUczDMf},
+abstract={Dynamic Sparse Training (DST) methods train neural networks by maintaining sparsity while dynamically adapting the network topology. Despite the promise of reduced computation, DST methods converge significantly slower than dense training, often requiring comparable training time to achieve similar accuracy. We demonstrate both analytically and empirically that Batch Normalization (BN) adversely affects sparse training and propose SparseOpt — a sparsity-aware optimizer— to address this. Experiments on ResNet models across CIFAR-100 and ImageNet demonstrate consistently faster convergence and improved generalization with our proposed method. Our work highlights the limitations of current normalization layers in sparse training and provides the first systematic study of the interaction between Batch Normalization, sparse layers, and DST, taking a significant step toward making DST practically competitive with dense training.},
+img={sparseopt.png}
 }

 @inproceedings{
_data/news.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 headline: "Congratulations to Advait for submitting his PhD!"

 - date: 23. March 2026
-headline: "💬 Most of the group is presenting at [CPAL](https://cpal.cc/spotlight_track/) with six papers as recent spotlights."
+headline: "💬 Most of the group is presenting at [CPAL](https://cpal.cc/spotlight_track/) with nine papers as recent spotlights."

 - date: 1. March 2026
 headline: "Welcome to Rohan, Franka, and Jonas!"

_data/team_members.yml

Lines changed: 14 additions & 6 deletions
@@ -18,7 +18,7 @@
 email: chao.zhou@cispa.de
 url: https://chaoedisonzhouucl.github.io/
 scholar: https://scholar.google.com/citations?user=ttO5HpQAAAAJ
-description: "I focus on understanding the intricate dynamics of training and fine-tuning in machine learning models, with the goal of developing more efficient and effective learning algorithms. My research explores how optimization processes evolve and how we can refine these methods to improve performance. Currently, I am particularly interested in gradient compression techniques."
+description: "I focus on understanding the intricate dynamics of training and fine-tuning in machine learning models, with the goal of developing more efficient and effective learning algorithms. My research explores how optimization processes evolve and how we can refine these methods to improve performance. Currently, I am particularly interested in gradient compression techniques. I obtained my PhD under the supervision of Prof. Miguel Rodrigues at University College London, UK."

 - name: Dr. Gowtham Abbavaram Reddy
 last_name: Reddy
@@ -27,7 +27,7 @@
 email: gowtham.abbavaram@cispa.de
 url: https://gautam0707.github.io/
 scholar: https://scholar.google.com/citations?user=Iewg-GAAAAAJ
-description: "I work on research problems at the intersection of machine learning and causality, focusing on modeling, inference, and interpreting machine learning models from a causal perspective to enhance their robustness and trustworthiness."
+description: "I work on research problems at the intersection of machine learning and causality, focusing on modeling, inference, and interpreting machine learning models from a causal perspective to enhance their robustness and trustworthiness. I received my Ph.D. at the Indian Institute of Technology Hyderabad, where I was advised by Prof. Vineeth N Balasubramanian. During my Ph.D., I was awarded the prestigious Prime Minister's Research Fellowship (PMRF)."

 - name: Dr. Franka Bause
 last_name: Bause
@@ -36,7 +36,8 @@
 email: franka.bause@cispa.de
 url: https://frareba.github.io/
 scholar: https://scholar.google.com/citations?user=UTQlpH8AAAAJ
-
+description: "I focus on graph learning and similarity measures for graphs, with the aim of improving efficiency, expressivity, and accuracy. I completed my doctorate with distinction in the Kriege group at University of Vienna, and received the Award of Excellence from the Austrian Ministry of Women, Science and Research for my thesis."
+
 - role: PhD students
 members:
 - name: Advait Gadhikar
@@ -88,8 +89,7 @@
 start_date: Jul 2024
 email: dong.sun@cispa.de
 url: https://cispa.de/en/people/c01dosu
-description: "My current research focuses on theoretically elucidating the superior performance of Mixture of Experts models, with an emphasis on their generalization performance, sample complexity, training dynamics, and robustness to adversarial noises.
-I did my master's degree at ETH Zurich."
+description: "My current research focuses on theoretically elucidating the superior performance of Mixture of Experts models, with an emphasis on their generalization performance, sample complexity, training dynamics, and robustness to adversarial noises. I completed my master's degree at ETH Zurich."

 - name: Baraah Sidahmed
 last_name: Sidahmed
@@ -106,4 +106,12 @@
 start_date: Feb 2026
 email: rohan.jain@cispa.de
 url: https://cispa.de/en/people/c02roja
-scholar: https://scholar.google.com/citations?user=cUkv6VcAAAAJ
+scholar: https://scholar.google.com/citations?user=cUkv6VcAAAAJ
+
+- name: Jonas Niederle
+last_name: Niederle
+photo: c01joni.jpg
+url: https://cispa.de/en/people/c01joni
+scholar: https://scholar.google.com/citations?user=-K5k5z0AAAAJ
+start_date: Mar 2026
+email: jonas.niederle@cispa.de

_pages/openings.md

Lines changed: 3 additions & 5 deletions
@@ -46,14 +46,12 @@ We are a small team with a flat management structure and a collaborative work cu

 The starting dates of the positions are flexible. We are committed to providing a healthy work environment and fostering diversity and respectful interaction. We welcome applications by candidates from all backgrounds and also support non-standard careers.

-### Current open positions
-
-* We have PhD and postdoc positions available for 2026.
-  * [PhD and Postdocs in Efficient Deep Learning](https://career.cispa.de/jobs/group-relationalml-53) at CISPA Helmholtz Center for Information Security.
-
 ### Past open positions
 This is a non-exhaustive list of past open positions in our group.

+* We had PhD and postdoc positions available for 2026.
+  * [PhD and Postdocs in Efficient Deep Learning](https://career.cispa.de/jobs/group-relationalml-53) at CISPA Helmholtz Center for Information Security.
+
 * We received an ERC Starting Grant in 2023 ([SPARSE-ML](https://cispa.de/en/research/grants/sparse-ml)) and had several open positions for PhD students and Postdocs:
   * [PhD position in sparse machine learning](https://euraxess.ec.europa.eu/jobs/144401).
   * [Postdoctoral position in sparse machine learning](https://euraxess.ec.europa.eu/jobs/144392).
