Bias-Optimal Bounds for SGD

Introduction
Getting Started
Functionality
Example
Customization
References

Introduction

This code aims to implement a performance estimation problem for Stochastic Gradient Descent (SGD) without bounded variance assumption. The main results, as well as the methodology employed to implement them, are summarized in [1].

Getting Started

The code is written in Python 3.13. To install the required packages with pip, run the following command:

python3 -m pip install -r requirements.txt

Note that you will need to obtain and install a license for Mosek in the first place. This can be freely done for academic purposes, see here for more details.

The recommended practice is to create a virtual environment dedicated to the project, which can be done with

python3 -m venv .venv
. .venv\bin\activate
pip install -r requirements.txt

Functionality

The main functionality of the code is to estimate the worst instance of SGD for a given problem.

sgd = get_worst_instance(gamma=0.5, T=2, mu=0, L=1)

Parameters

gamma (float): The step-size of the SGD algorithm.
T (int): The number of iterations.
mu (float): The strong convexity parameter. If set to 0, the problem is not strongly convex.
L (float): The smoothness parameter.
objective (str): The objective function. Possible values are:
- "bias": minimize the rho term (in front of the sum) in the Lyapunov function (for non-strongly convex problems)
- "variance": minimizes the sum of (e_k), by fixing d to be its maximal value (up to a factor 1-epsilon) Defaults to "bias".
additional_constraints (function): (Optional) Additional conditions to add to the SGD class. Should be a function that takes in the SGD class object and modifies it in place.
solver (str): The solver to use. Possible values are:
- "MOSEK": Use the MOSEK solver [2] (default).
- "CLARABEL": Use the Clarabel solver [3]. This solver has purposefully been degraded with a lower tolerance to reach different solutions. It is not recommended to use this solver for real problems.

Returns

sgd (SGD): The instance of SGD after solving. This instance allows to access all the dual variables through:
- sgd.value: The objective value of the PEP.
- sgd.rho: The value of the rho term in the Lyapunov function.
- sgd.vars[k]: An array containing the values of the Lyapunov variables at iteration k. Come as a tuple (a_k, e_k).
- sgd.lambs[k]: A matrix containing the values of the dual variables lambda at iteration k.
- sgd.lamb1s[k]: An array containing the values of the dual variables lambda_1 at iteration k.
- sgd.lamb2s[k]: An array containing the values of the dual variables lambda_2 at iteration k.
- sgd.tau[k]: The values of the dual variable tau at iteration k.

Example

# Imports
import sys

sys.path.append("../")
from src import get_worst_instance
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

# Constants
GAMMAS = np.linspace(0.1, 3, 100)
Ts = [2, 5, 10]

# Compute all rates
rates = [[get_worst_instance(gamma, N, mu=0, L=1).value for gamma in tqdm(GAMMAS)] for N in Ns]

# Plot obtained rates
fig, axs = plt.subplots(1, 3, figsize=(10, 3), sharey=True)

for i, (T, ax) in enumerate(zip(Ts, axs.flatten())):
    ax.plot(GAMMAS, rates[i])
    ax.set_title(f"T={T} Iteration{'s' if T > 1 else ''}")
    ax.set_xlabel(r"Normalized Step-Size ($\gamma L$)")
    if i == 0: ax.set_ylabel(r"Optimal Bias ($\rho_\text{opt}$)")

plt.tight_layout()
plt.show()

The code above computes the worst rate of convergence of SGD for different step-sizes and number of iterations. The result is shown below:

Customization

The code is designed to be easily customizable. Additional variance assumptions or objectives to the PEP may be integrated by modifying the src/SGD/SGD.py file. Some experiments are provided in tests/ to illustrate the functionality, and may be used as a template for further experiments.

References

[1] Cortild, D., Ketels, L., Peypouquet, J., & Garrigos, G. (2025). New Tight Bounds for SGD without Variance Assumption: A Computer-Aided Lyapunov Analysis. arXiv preprint arXiv: 2505.17965. https://doi.org/10.48550/arXiv.2505.17965.
[2] MOSEK, A. (2025). MOSEK Optimizer API for Python. Release 11.0.20. http://www.mosek.com.
[3] Goulart, P. J., & Chen, Y. (2024). Clarabel: An interior-point solver for conic programs with quadratic objectives. arXiv preprint arXiv:2405.12762. https://doi.org/10.48550/arXiv.2405.12762.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
example.png		example.png
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bias-Optimal Bounds for SGD

Introduction

Getting Started

Functionality

Parameters

Returns

Example

Customization

References

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Bias-Optimal Bounds for SGD

Introduction

Getting Started

Functionality

Parameters

Returns

Example

Customization

References

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages