Skip to content

DanielCortild/Bias-Optimal-SGD

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bias-Optimal Bounds for SGD

  1. Introduction
  2. Getting Started
  3. Functionality
  4. Example
  5. Customization
  6. References

Introduction

This code aims to implement a performance estimation problem for Stochastic Gradient Descent (SGD) without bounded variance assumption. The main results, as well as the methodology employed to implement them, are summarized in [1].

Getting Started

The code is written in Python 3.13. To install the required packages with pip, run the following command:

python3 -m pip install -r requirements.txt

Note that you will need to obtain and install a license for Mosek in the first place. This can be freely done for academic purposes, see here for more details.

The recommended practice is to create a virtual environment dedicated to the project, which can be done with

python3 -m venv .venv
. .venv\bin\activate
pip install -r requirements.txt

Functionality

The main functionality of the code is to estimate the worst instance of SGD for a given problem.

sgd = get_worst_instance(gamma=0.5, T=2, mu=0, L=1)

Parameters

  • gamma (float): The step-size of the SGD algorithm.
  • T (int): The number of iterations.
  • mu (float): The strong convexity parameter. If set to 0, the problem is not strongly convex.
  • L (float): The smoothness parameter.
  • objective (str): The objective function. Possible values are:
    • "bias": minimize the rho term (in front of the sum) in the Lyapunov function (for non-strongly convex problems)
    • "variance": minimizes the sum of (e_k), by fixing d to be its maximal value (up to a factor 1-epsilon) Defaults to "bias".
  • additional_constraints (function): (Optional) Additional conditions to add to the SGD class. Should be a function that takes in the SGD class object and modifies it in place.
  • solver (str): The solver to use. Possible values are:
    • "MOSEK": Use the MOSEK solver [2] (default).
    • "CLARABEL": Use the Clarabel solver [3]. This solver has purposefully been degraded with a lower tolerance to reach different solutions. It is not recommended to use this solver for real problems.

Returns

  • sgd (SGD): The instance of SGD after solving. This instance allows to access all the dual variables through:
    • sgd.value: The objective value of the PEP.
    • sgd.rho: The value of the rho term in the Lyapunov function.
    • sgd.vars[k]: An array containing the values of the Lyapunov variables at iteration k. Come as a tuple (a_k, e_k).
    • sgd.lambs[k]: A matrix containing the values of the dual variables lambda at iteration k.
    • sgd.lamb1s[k]: An array containing the values of the dual variables lambda_1 at iteration k.
    • sgd.lamb2s[k]: An array containing the values of the dual variables lambda_2 at iteration k.
    • sgd.tau[k]: The values of the dual variable tau at iteration k.

Example

# Imports
import sys

sys.path.append("../")
from src import get_worst_instance
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

# Constants
GAMMAS = np.linspace(0.1, 3, 100)
Ts = [2, 5, 10]

# Compute all rates
rates = [[get_worst_instance(gamma, N, mu=0, L=1).value for gamma in tqdm(GAMMAS)] for N in Ns]

# Plot obtained rates
fig, axs = plt.subplots(1, 3, figsize=(10, 3), sharey=True)

for i, (T, ax) in enumerate(zip(Ts, axs.flatten())):
    ax.plot(GAMMAS, rates[i])
    ax.set_title(f"T={T} Iteration{'s' if T > 1 else ''}")
    ax.set_xlabel(r"Normalized Step-Size ($\gamma L$)")
    if i == 0: ax.set_ylabel(r"Optimal Bias ($\rho_\text{opt}$)")

plt.tight_layout()
plt.show()

The code above computes the worst rate of convergence of SGD for different step-sizes and number of iterations. The result is shown below:

Customization

The code is designed to be easily customizable. Additional variance assumptions or objectives to the PEP may be integrated by modifying the src/SGD/SGD.py file. Some experiments are provided in tests/ to illustrate the functionality, and may be used as a template for further experiments.

References

[1] Cortild, D., Ketels, L., Peypouquet, J., & Garrigos, G. (2025). New Tight Bounds for SGD without Variance Assumption: A Computer-Aided Lyapunov Analysis. arXiv preprint arXiv: 2505.17965. https://doi.org/10.48550/arXiv.2505.17965.
[2] MOSEK, A. (2025). MOSEK Optimizer API for Python. Release 11.0.20. http://www.mosek.com.
[3] Goulart, P. J., & Chen, Y. (2024). Clarabel: An interior-point solver for conic programs with quadratic objectives. arXiv preprint arXiv:2405.12762. https://doi.org/10.48550/arXiv.2405.12762.

About

Code implementing the Performance Estimation Problem (PEP) methodology for SGD without variance assumption.

Topics

Resources

License

Stars

Watchers

Forks

Contributors