IteraLabs/convective-rs


Design Document: Distributed Gradient Methods for Convex ML in Networks

Reference: Nedić, "Distributed Gradient Methods for Convex Machine Learning Problems in Networks" (IEEE SPM, 2020)


0. Paper Synopsis and Scope

The paper presents a unified framework for distributed first-order optimization over networks. The core problem is:

min_{x ∈ ℝⁿ}  (1/m) Σᵢ₌₁ᵐ fᵢ(x)

where m agents are embedded in a communication graph G = ([m], E), each agent i owns a private local objective fᵢ: ℝⁿ → ℝ, and no agent has access to the global objective or to any other agent's data. The algorithms rely exclusively on local gradient computation and one-hop neighbor communication.

The paper covers six distinct algorithms across two graph classes (undirected and directed), organized in three families:

Family                     Undirected Graph                  Directed Graph
Consensus-based gradient   Eq. (10): Consensus + gradient    Eq. (14): Push–sum consensus
Gradient–push              —                                 Eq. (15): Ratio consensus + gradient
Gradient-tracking          Eq. (16): EXTRA / DIGing style    Eq. (17): Directed gradient-tracking
APull–BPush                —                                 Eq. (18): Dual-matrix gradient-tracking

In addition, the paper covers the variant in Eq. (19) from [33] and the connection to Eq. (12), the adapt-then-combine variant of Eq. (10).

Each algorithm has specific requirements on the weight matrix (row stochastic, column stochastic, doubly stochastic), step-size schedule (diminishing vs. constant), and convergence rate guarantees.


1. Architectural Overview

The implementation spans two crates with clear responsibilities:

┌─────────────────────────────────────────────────────────────────────────┐
│                          convective-ml                                  │
│                                                                         │
│  ┌──────────────┐  ┌───────────────────┐  ┌──────────────────────────┐ │
│  │  objective   │  │   agent           │  │  solver                  │ │
│  │              │  │                   │  │                          │ │
│  │  ObjectiveFn │  │  Agent<S,O,Alg>   │  │  DistributedSolver       │ │
│  │  LossFunction│  │  AgentState<S>    │  │  SolverConfig            │ │
│  │  Regularizer │  │  Neighborhood     │  │  StepSizeSchedule        │ │
│  │  LocalObj<S> │  │                   │  │  ConvergenceMonitor      │ │
│  └──────────────┘  └───────────────────┘  └──────────────────────────┘ │
│                                                                         │
│  ┌──────────────┐  ┌───────────────────┐  ┌──────────────────────────┐ │
│  │  algorithm   │  │   mixing          │  │  convergence             │ │
│  │              │  │                   │  │                          │ │
│  │  Algorithm   │  │  MixingStrategy   │  │  ConvergenceCriterion    │ │
│  │  Consensus   │  │  WeightMatrix     │  │  IterationLog            │ │
│  │  PushSum     │  │  StochasticType   │  │  DisagreementMetric      │ │
│  │  GradPush    │  │  MatrixBuilder    │  │                          │ │
│  │  GradTrack   │  │                   │  │                          │ │
│  │  APullBPush  │  │                   │  │                          │ │
│  └──────────────┘  └───────────────────┘  └──────────────────────────┘ │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │  types + errors                                                  │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                         convective-graph                                │
│                                                                         │
│  ┌─────────────────┐  ┌────────────────────┐  ┌────────────────────┐  │
│  │  traits::alge   │  │  traits::spectral  │  │  traits::neighbor  │  │
│  │  braic          │  │                    │  │                    │  │
│  │                 │  │  SpectralComputable│  │  NeighborQuery     │  │
│  │  DegreeComput.  │  │  spectral_gap()    │  │  neighbors(i)      │  │
│  │  LaplacianComp. │  │  fiedler_vector()  │  │  in_neighbors(i)   │  │
│  │  MatrixRepr.    │  │                    │  │  out_neighbors(i)  │  │
│  └─────────────────┘  └────────────────────┘  └────────────────────┘  │
│                                                                         │
│  ┌─────────────────┐  ┌────────────────────┐  ┌────────────────────┐  │
│  │  Graph<S,Dir>   │  │  DenseStorage<S>   │  │  GraphBuilder      │  │
│  │  GraphBase      │  │  (future: Sparse)  │  │  TOML I/O          │  │
│  │  NodeIterable   │  │                    │  │                    │  │
│  │  EdgeIterable   │  │                    │  │                    │  │
│  └─────────────────┘  └────────────────────┘  └────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘

Dependency direction: convective-ml depends on convective-graph, never the reverse.


2. What convective-graph Needs First (Phase 2 Unblocking)

Before any convective-ml code can be written, the following must exist in convective-graph. These are the Phase 2 items directly required by the Nedić algorithms.

2.1 Neighbor Query Trait

Every algorithm in the paper requires agents to know their one-hop neighbors. The current crate has no per-node neighbor query.

// convective-graph/src/traits/neighbor.rs

/// Neighbor query capabilities for graph types.
pub trait NeighborQuery: GraphBase {
    /// Returns the set of neighbors of node `i` in an undirected graph.
    /// For directed graphs, this returns out-neighbors.
    fn neighbors(&self, node: NodeId) -> Vec<NodeId>;

    /// Returns the in-neighbors: { j | (j, i) ∈ E }.
    /// Paper notation: N_i^{in} = { j | (j, i) ∈ E }
    fn in_neighbors(&self, node: NodeId) -> Vec<NodeId>;

    /// Returns the out-neighbors: { j | (i, j) ∈ E }.
    /// Paper notation: N_i^{out} = { j | (i, j) ∈ E }
    fn out_neighbors(&self, node: NodeId) -> Vec<NodeId>;

    /// Returns the degree (number of neighbors) for undirected graphs,
    /// or out-degree for directed graphs.
    fn degree(&self, node: NodeId) -> usize;

    /// Returns the in-degree of a node.
    fn in_degree(&self, node: NodeId) -> usize;

    /// Returns the out-degree of a node.
    fn out_degree(&self, node: NodeId) -> usize;
}
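
As a concrete sketch of what an implementation owes this trait, the queries can be answered directly from a boolean adjacency matrix. `AdjMatrix` below is a hypothetical standalone type (not the crate's `GraphBase` hierarchy), using plain `usize` node ids and `Vec<Vec<bool>>` storage:

```rust
/// Standalone sketch of the neighbor queries over a directed boolean
/// adjacency matrix: adj[i][j] == true means edge (i, j) ∈ E.
pub struct AdjMatrix {
    pub adj: Vec<Vec<bool>>,
}

impl AdjMatrix {
    /// Out-neighbors of `i`: { j | (i, j) ∈ E }.
    pub fn out_neighbors(&self, i: usize) -> Vec<usize> {
        (0..self.adj.len()).filter(|&j| self.adj[i][j]).collect()
    }

    /// In-neighbors of `i`: { j | (j, i) ∈ E }.
    pub fn in_neighbors(&self, i: usize) -> Vec<usize> {
        (0..self.adj.len()).filter(|&j| self.adj[j][i]).collect()
    }

    pub fn out_degree(&self, i: usize) -> usize {
        self.out_neighbors(i).len()
    }

    pub fn in_degree(&self, i: usize) -> usize {
        self.in_neighbors(i).len()
    }
}
```

An undirected graph is the special case of a symmetric `adj`, in which case `neighbors(i)`, `in_neighbors(i)`, and `out_neighbors(i)` all coincide.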

2.2 Algebraic Traits (Degree + Laplacian)

Required for spectral analysis of mixing matrices and convergence rate characterization.

// convective-graph/src/traits/algebraic.rs

use nalgebra::{DMatrix, DVector};

/// Degree computation from the adjacency matrix.
pub trait DegreeComputable: GraphBase {
    /// Returns the degree vector d where d_i = Σ_j A[i,j].
    fn degree_vector(&self) -> DVector<Self::S>;

    /// Returns the diagonal degree matrix D = diag(d).
    fn degree_matrix(&self) -> DMatrix<Self::S>;

    /// Returns the in-degree vector for directed graphs.
    fn in_degree_vector(&self) -> DVector<Self::S>;

    /// Returns the out-degree vector for directed graphs.
    fn out_degree_vector(&self) -> DVector<Self::S>;
}

/// Laplacian matrix computations.
pub trait LaplacianComputable: DegreeComputable {
    /// Combinatorial Laplacian: L = D - A
    fn laplacian(&self) -> DMatrix<Self::S>;

    /// Symmetric normalized Laplacian: L_sym = I - D^{-1/2} A D^{-1/2}
    /// Used in spectral graph theory and GCN-style methods.
    fn normalized_laplacian(&self) -> DMatrix<Self::S>;

    /// Random-walk Laplacian: L_rw = I - D^{-1} A
    fn random_walk_laplacian(&self) -> DMatrix<Self::S>;
}
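
The combinatorial Laplacian is a one-pass computation from the weighted adjacency matrix. The sketch below uses plain nested `Vec`s for illustration; the crate version would return `nalgebra::DMatrix` through `LaplacianComputable`:

```rust
/// Combinatorial Laplacian L = D - A from a weighted adjacency matrix
/// (simple graph assumed: a[i][i] = 0).
pub fn laplacian(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let m = a.len();
    let mut l = vec![vec![0.0; m]; m];
    for i in 0..m {
        // Degree d_i = Σ_j A[i][j] goes on the diagonal.
        let d_i: f64 = a[i].iter().sum();
        for j in 0..m {
            l[i][j] = if i == j { d_i - a[i][j] } else { -a[i][j] };
        }
    }
    l
}
```

By construction every row of L sums to zero, which is why the all-ones vector is always in the Laplacian's null space.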

2.3 Spectral Trait

Required for convergence analysis — the spectral gap 1 − |λ₂| of the mixing matrix determines the convergence rate of all consensus-based methods.

// convective-graph/src/traits/spectral.rs

use nalgebra::{DMatrix, DVector};

/// Spectral properties of graph matrices.
pub trait SpectralComputable: LaplacianComputable {
    /// Eigenvalues of the Laplacian, sorted ascending.
    fn laplacian_eigenvalues(&self) -> DVector<Self::S>;

    /// The algebraic connectivity (second-smallest Laplacian eigenvalue λ₂).
    /// Controls convergence rate of consensus: larger λ₂ → faster mixing.
    fn algebraic_connectivity(&self) -> Self::S;

    /// The Fiedler vector (eigenvector corresponding to λ₂).
    /// Used for graph partitioning.
    fn fiedler_vector(&self) -> DVector<Self::S>;

    /// Spectral gap of a given stochastic matrix: 1 - |λ₂(A)|.
    /// Directly controls convergence rate of distributed methods.
    fn spectral_gap(&self, matrix: &DMatrix<Self::S>) -> Self::S;
}
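
For a symmetric doubly stochastic mixing matrix the top eigenpair is known (λ₁ = 1 with the all-ones eigenvector), so |λ₂| can be estimated by power iteration after deflating that direction. This is an illustrative stand-in for `spectral_gap()`; a production implementation would use a proper eigensolver:

```rust
/// Estimate the spectral gap 1 - |λ₂(A)| of a symmetric doubly
/// stochastic matrix by power iteration, projecting out the all-ones
/// eigenvector of λ₁ = 1 each round. Sketch only: assumes |λ₂| > 0
/// and a starting vector with a component along the λ₂ eigenvector.
pub fn spectral_gap(a: &[Vec<f64>], iters: usize) -> f64 {
    let m = a.len();
    let mut v: Vec<f64> = (0..m).map(|i| (i + 1) as f64).collect();
    let mut lambda2 = 0.0;
    for _ in 0..iters {
        // Deflate: remove the component along the all-ones direction.
        let mean = v.iter().sum::<f64>() / m as f64;
        for x in v.iter_mut() {
            *x -= mean;
        }
        let norm_v = v.iter().map(|x| x * x).sum::<f64>().sqrt();
        // Apply the mixing matrix: w = A v.
        let w: Vec<f64> = (0..m)
            .map(|i| (0..m).map(|j| a[i][j] * v[j]).sum())
            .collect();
        let norm_w = w.iter().map(|x| x * x).sum::<f64>().sqrt();
        lambda2 = norm_w / norm_v; // Rayleigh-style estimate of |λ₂|
        v = w.iter().map(|x| x / norm_w).collect();
    }
    1.0 - lambda2.abs()
}
```

On the Metropolis weights for a 3-node path (eigenvalues 1, 2/3, 0) this returns a gap of 1/3.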

3. Module Structure for convective-ml

convective-ml/
├── Cargo.toml
├── src/
│   ├── lib.rs
│   ├── types/
│   │   ├── mod.rs
│   │   └── vector.rs          # DecisionVector<S>, gradient vectors
│   ├── objective/
│   │   ├── mod.rs
│   │   ├── traits.rs          # ObjectiveFn, LossFunction, Regularizer
│   │   ├── loss.rs            # LogisticLoss, HingeLoss, SquaredLoss
│   │   ├── regularizer.rs     # L2Regularizer, L1Regularizer
│   │   └── local.rs           # LocalObjective<S> combining loss + regularizer
│   ├── mixing/
│   │   ├── mod.rs
│   │   ├── traits.rs          # MixingStrategy, WeightMatrix
│   │   ├── stochastic.rs      # StochasticType, stochasticity validation
│   │   ├── builder.rs         # WeightMatrixBuilder (from graph topology)
│   │   └── properties.rs      # spectral gap, mixing time analysis
│   ├── agent/
│   │   ├── mod.rs
│   │   ├── state.rs           # AgentState<S>, dual variables (x, y, d, z)
│   │   └── neighborhood.rs    # Neighborhood view, message types
│   ├── step_size/
│   │   ├── mod.rs
│   │   └── schedule.rs        # StepSizeSchedule, Diminishing, Constant
│   ├── algorithm/
│   │   ├── mod.rs
│   │   ├── traits.rs          # Algorithm, AlgorithmRequirements
│   │   ├── consensus.rs       # ConsensusGradient (Eq. 10, 12)
│   │   ├── push_sum.rs        # PushSumConsensus (Eq. 14)
│   │   ├── gradient_push.rs   # GradientPush (Eq. 15)
│   │   ├── gradient_track.rs  # GradientTracking (Eq. 16, 17)
│   │   └── apull_bpush.rs     # APullBPush (Eq. 18, 19)
│   ├── solver/
│   │   ├── mod.rs
│   │   ├── config.rs          # SolverConfig
│   │   ├── distributed.rs     # DistributedSolver orchestration
│   │   └── monitor.rs         # ConvergenceMonitor, IterationLog
│   ├── convergence/
│   │   ├── mod.rs
│   │   ├── criteria.rs        # ConvergenceCriterion enum
│   │   └── metrics.rs         # Disagreement, OptimalityGap, etc.
│   └── errors/
│       └── mod.rs             # MlError enum

4. Type System

4.1 Core Vector Type

The paper operates on decision vectors xᵢ ∈ ℝⁿ. We wrap nalgebra::DVector<S> to provide domain-specific semantics.

// types/vector.rs

use convective_graph::types::Scalar;
use nalgebra::DVector;

/// A decision vector in ℝⁿ, the primary variable each agent maintains.
/// Paper notation: xᵢᵏ ∈ ℝⁿ — agent i's iterate at iteration k.
#[derive(Clone, Debug)]
pub struct DecisionVector<S: Scalar> {
    inner: DVector<S>,
}

/// A gradient vector ∇fᵢ(xᵢᵏ) ∈ ℝⁿ.
/// Semantically distinct from DecisionVector to prevent accidental mixing.
#[derive(Clone, Debug)]
pub struct GradientVector<S: Scalar> {
    inner: DVector<S>,
}

/// A direction vector dᵢᵏ ∈ ℝⁿ used in gradient-tracking algorithms.
/// Tracks an estimate of the global gradient direction.
/// Paper: dᵢ⁰ = ∇fᵢ(xᵢ⁰), then updated via mixing + gradient corrections.
#[derive(Clone, Debug)]
pub struct DirectionVector<S: Scalar> {
    inner: DVector<S>,
}

All three types impl Deref<Target = DVector<S>> for transparent nalgebra interop, plus arithmetic ops (Add, Sub, Mul<S>) and convex_combination(weights, vectors).

4.2 Agent Identity

// agent/mod.rs

use convective_graph::types::NodeId;

/// An agent's identity in the network, directly mapped to a graph node.
/// Paper notation: agent i ∈ [m] = {1, 2, ..., m}.
/// Implementation: zero-indexed via NodeId.
pub type AgentId = NodeId;

5. Objective Function Hierarchy

The paper defines the local objective as fᵢ(x) = (m/p) Σ_{s ∈ Sᵢ} (cρ(x) + ℓ(x; zₛ, yₛ)) (Eq. 5), which decomposes into a regularization term and a sum of per-sample losses.

5.1 Trait Hierarchy

// objective/traits.rs

use crate::types::{DecisionVector, GradientVector};
use convective_graph::types::Scalar;
use nalgebra::DVector;

/// A differentiable scalar-valued function f: ℝⁿ → ℝ.
/// Root trait for all objective functions in the optimization framework.
pub trait ObjectiveFn<S: Scalar> {
    /// Evaluate f(x).
    fn value(&self, x: &DecisionVector<S>) -> S;

    /// Compute ∇f(x).
    fn gradient(&self, x: &DecisionVector<S>) -> GradientVector<S>;

    /// Dimension of the domain ℝⁿ.
    fn dimension(&self) -> usize;
}

/// A loss function ℓ(x; z, y) parameterized by a data point (z, y).
/// Paper: ℓ(x; z, y) where z ∈ ℝⁿ is a feature, y ∈ {-1, +1} is a label.
pub trait LossFunction<S: Scalar> {
    /// Evaluate ℓ(x; z, y).
    fn loss(&self, x: &DecisionVector<S>, feature: &DVector<S>, label: S) -> S;

    /// Compute ∇_x ℓ(x; z, y).
    fn loss_gradient(
        &self,
        x: &DecisionVector<S>,
        feature: &DVector<S>,
        label: S,
    ) -> GradientVector<S>;
}

/// A convex regularization function ρ: ℝⁿ → ℝ.
/// Paper: ρ(x) is strongly convex (e.g., Euclidean norm squared).
pub trait Regularizer<S: Scalar> {
    /// Evaluate cρ(x) where c > 0 is the regularization parameter.
    fn penalty(&self, x: &DecisionVector<S>) -> S;

    /// Compute c∇ρ(x).
    fn penalty_gradient(&self, x: &DecisionVector<S>) -> GradientVector<S>;

    /// The strong convexity parameter of ρ.
    /// Used in convergence rate analysis.
    fn strong_convexity_param(&self) -> S;
}

/// Lipschitz continuity certificate.
/// Paper: f has Lipschitz continuous gradients with constant L if
/// ‖∇f(x) - ∇f(y)‖ ≤ L‖x - y‖ for all x, y ∈ ℝⁿ.
pub trait LipschitzContinuous<S: Scalar> {
    /// The Lipschitz constant L of the gradient.
    fn lipschitz_constant(&self) -> S;
}

5.2 Concrete Loss Functions

// objective/loss.rs

/// Logistic regression loss: ℓ(x; z, y) = log(1 + e^{-y⟨x,z⟩}).
/// Paper: "For linear classifiers, a common choice is the logistic regression
/// loss function."
pub struct LogisticLoss;

/// Hinge loss: ℓ(x; z, y) = max{0, 1 - y⟨x, z⟩}.
/// Paper: "Another common choice is the hinge-loss function,
/// giving rise to the maximum margin (linear) classifier problem."
/// Note: Not differentiable at 1 - y⟨x,z⟩ = 0. Requires subgradient.
pub struct HingeLoss;

/// Squared error loss: ℓ(x; z, y) = (1/2)(y - ⟨x, z⟩)².
/// Standard regression loss, included for completeness.
pub struct SquaredLoss;
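
The logistic case makes the loss/gradient pair concrete. The free functions below are a standalone sketch over plain slices (the crate version implements `LossFunction<S>` over nalgebra vectors):

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(u, v)| u * v).sum()
}

/// Logistic loss ℓ(x; z, y) = ln(1 + e^{-y⟨x,z⟩}).
pub fn logistic_loss(x: &[f64], z: &[f64], y: f64) -> f64 {
    (1.0 + (-y * dot(x, z)).exp()).ln()
}

/// Gradient ∇_x ℓ = -y·σ(-y⟨x,z⟩)·z, with σ the standard sigmoid.
pub fn logistic_grad(x: &[f64], z: &[f64], y: f64) -> Vec<f64> {
    let margin = y * dot(x, z);
    let sigma = 1.0 / (1.0 + margin.exp()); // σ(-margin)
    z.iter().map(|&zi| -y * sigma * zi).collect()
}
```

At x = 0 the loss is ln 2 and the gradient is -y·z/2, a handy initialization sanity check for any `LossFunction` implementation.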

5.3 Regularizers

// objective/regularizer.rs

/// L2 (Tikhonov) regularization: ρ(x) = (1/2)‖x‖².
/// ∇ρ(x) = x. Strongly convex with parameter 1.
/// Paper: "the regularizing function ρ(x) is a strongly convex function
/// (such as the Euclidean norm)."
pub struct L2Regularizer<S: Scalar> {
    /// Regularization coefficient c > 0.
    pub coefficient: S,
}

/// L1 regularization: ρ(x) = ‖x‖₁.
/// Promotes sparsity. Convex but not strongly convex, not differentiable at 0.
/// Requires proximal operator or subgradient methods.
pub struct L1Regularizer<S: Scalar> {
    pub coefficient: S,
}

5.4 Local Objective (Composite)

// objective/local.rs

/// The local objective function for agent i.
///
/// Paper Eq. (5): fᵢ(x) = (m/p) Σ_{s ∈ Sᵢ} (cρ(x) + ℓ(x; zₛ, yₛ))
///
/// Each agent owns a private dataset Sᵢ of (feature, label) pairs and
/// a loss function. The regularizer is shared across agents (same c, ρ).
///
/// This struct implements ObjectiveFn<S>, so it can compute fᵢ(x) and ∇fᵢ(x).
pub struct LocalObjective<S, L, R>
where
    S: Scalar,
    L: LossFunction<S>,
    R: Regularizer<S>,
{
    /// The loss function (shared type, e.g., LogisticLoss).
    loss: L,
    /// The regularizer (e.g., L2Regularizer).
    regularizer: R,
    /// Agent's private dataset: Vec of (feature_vector, label).
    data: Vec<DataPoint<S>>,
    /// Total number of data points across ALL agents (p in the paper).
    total_data_count: usize,
    /// Number of agents (m in the paper).
    agent_count: usize,
}

/// A single labeled data point (zₛ, yₛ).
pub struct DataPoint<S: Scalar> {
    /// Feature vector z ∈ ℝⁿ.
    pub feature: DVector<S>,
    /// Label y (typically ∈ {-1, +1} for classification).
    pub label: S,
}

LocalObjective implements ObjectiveFn<S> with:

  • value(x) = (m/p) * Σ_{s ∈ Sᵢ} [c*ρ(x) + ℓ(x; zₛ, yₛ)]
  • gradient(x) = (m/p) * Σ_{s ∈ Sᵢ} [c*∇ρ(x) + ∇ₓℓ(x; zₛ, yₛ)]
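
The gradient formula can be sketched end to end for one concrete instantiation. The function below assumes the squared loss and the L2 regularizer (∇ρ(x) = x) over plain slices, rather than the generic `ObjectiveFn<S>` machinery:

```rust
/// Local gradient ∇fᵢ(x) = (m/p) Σ_{s∈Sᵢ} [ c∇ρ(x) + ∇ℓ(x; zₛ, yₛ) ],
/// sketched with squared loss ∇ℓ = -(y - ⟨x,z⟩)·z and L2 regularizer.
pub fn local_gradient(
    x: &[f64],
    data: &[(Vec<f64>, f64)], // agent's private (zₛ, yₛ) pairs (Sᵢ)
    c: f64,                   // regularization coefficient c > 0
    m: usize,                 // number of agents (m in the paper)
    p: usize,                 // total data count across all agents
) -> Vec<f64> {
    let scale = m as f64 / p as f64;
    let mut g = vec![0.0; x.len()];
    for (z, y) in data {
        let residual = y - x.iter().zip(z).map(|(a, b)| a * b).sum::<f64>();
        for k in 0..x.len() {
            // Per-sample term: c∇ρ(x) = c·x  plus  ∇ℓ = -(y - ⟨x,z⟩)·z.
            g[k] += scale * (c * x[k] - residual * z[k]);
        }
    }
    g
}
```

Note the (m/p) scaling: summing fᵢ over all agents and dividing by m recovers the average regularized empirical risk over the full dataset.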

6. Mixing and Weight Matrices

This is the bridge between convective-graph and convective-ml. The paper's algorithms require weight matrices A (and sometimes B) that are compatible with the graph structure and satisfy specific stochasticity properties.

6.1 Stochasticity Classification

// mixing/stochastic.rs

/// Classification of a nonnegative matrix's stochastic properties.
/// Paper: different algorithms require different stochastic structures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StochasticType {
    /// Each row sums to 1: Σⱼ aᵢⱼ = 1 for all i.
    /// Required by: Eq. (18) A-matrix (APull–BPush).
    RowStochastic,

    /// Each column sums to 1: Σᵢ aᵢⱼ = 1 for all j.
    /// Required by: Eq. (13) for directed graphs, Eq. (17) A-matrix,
    /// Eq. (18) B-matrix (APull–BPush).
    ColumnStochastic,

    /// Both row and column stochastic: Σⱼ aᵢⱼ = 1 AND Σᵢ aᵢⱼ = 1.
    /// Required by: Eq. (8)/(10) for undirected graphs, Eq. (16).
    DoublyStochastic,
}
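
Classification reduces to checking row and column sums against a tolerance. The sketch below shows the idea with its own small enum (`Stochasticity`, a stand-in for `StochasticType` plus a "none" case) over nested `Vec`s:

```rust
/// Stochasticity classification of a nonnegative square matrix.
#[derive(Debug, PartialEq)]
pub enum Stochasticity { Row, Column, Doubly, None }

/// Check each row sum and column sum against 1 within `tol`.
pub fn classify(a: &[Vec<f64>], tol: f64) -> Stochasticity {
    let m = a.len();
    let rows_ok = a.iter().all(|r| (r.iter().sum::<f64>() - 1.0).abs() <= tol);
    let cols_ok = (0..m).all(|j| {
        ((0..m).map(|i| a[i][j]).sum::<f64>() - 1.0).abs() <= tol
    });
    match (rows_ok, cols_ok) {
        (true, true) => Stochasticity::Doubly,
        (true, false) => Stochasticity::Row,
        (false, true) => Stochasticity::Column,
        (false, false) => Stochasticity::None,
    }
}
```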

6.2 Weight Matrix

// mixing/traits.rs

use crate::errors::MlError;
use crate::mixing::stochastic::StochasticType;
use convective_graph::graph::DenseGraph;
use convective_graph::types::{Scalar, EdgeDirection, NodeId};
use nalgebra::DMatrix;

/// A weight matrix A compatible with graph G's structure.
///
/// Paper Eq. (8): aᵢⱼ > 0 iff {i,j} ∈ E or i = j, aᵢⱼ = 0 otherwise.
/// The matrix is nonneg, entries respect graph topology, and satisfies
/// a specific stochastic property depending on the algorithm.
pub struct WeightMatrix<S: Scalar> {
    /// The m×m weight matrix.
    matrix: DMatrix<S>,
    /// Which stochastic property this matrix satisfies.
    stochastic_type: StochasticType,
    /// The spectral gap: 1 - |λ₂(A)|. Larger → faster mixing.
    spectral_gap: Option<S>,
}

/// Trait for types that can produce a weight matrix from a graph.
///
/// Different algorithms need different stochastic types, so the builder
/// is parameterized by the target property.
pub trait MixingStrategy<S: Scalar> {
    /// Build a weight matrix compatible with graph G.
    fn build_weight_matrix<Dir: EdgeDirection>(
        &self,
        graph: &DenseGraph<S, Dir>,
        target: StochasticType,
    ) -> Result<WeightMatrix<S>, MlError>;
}

6.3 Weight Matrix Builders

The paper describes several construction strategies. Each becomes a struct implementing MixingStrategy.

// mixing/builder.rs

/// Equal-neighbor weight rule.
/// Paper: "A common choice is to let all of the values aᵢⱼ, i ∈ N_j^{out} ∪ {j},
/// be the same and equal to the inverse of the cardinality of the set N_j^{out} ∪ {j}."
///
/// For undirected: aᵢⱼ = 1/(|Nᵢ| + 1) for j ∈ Nᵢ ∪ {i}
/// Produces doubly stochastic matrix on regular graphs,
/// column stochastic on directed (equal out-weight).
pub struct EqualNeighborWeight;

/// Metropolis–Hastings weight rule.
/// Produces doubly stochastic matrices for undirected graphs.
/// aᵢⱼ = 1 / (1 + max(dᵢ, dⱼ))  for {i,j} ∈ E
/// aᵢᵢ = 1 - Σ_{j≠i} aᵢⱼ
pub struct MetropolisHastings;

/// Maximum-degree weight rule.
/// aᵢⱼ = 1/d_max  for j ∈ Nᵢ, aᵢᵢ = 1 - dᵢ/d_max.
/// Doubly stochastic for undirected graphs.
pub struct MaxDegreeWeight;

/// Lazy random walk weights (for column stochastic on directed graphs).
/// Used in push-sum methods.
/// aᵢⱼ = 1 / (|N_j^{out}| + 1) for i ∈ N_j^{out} ∪ {j}
pub struct LazyColumnStochastic;
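
The Metropolis–Hastings rule is the easiest builder to make concrete: weights come straight from node degrees, and symmetry makes the result doubly stochastic on any connected undirected graph. A standalone sketch over a 0/1 adjacency matrix (the crate version would implement `MixingStrategy` and return a `WeightMatrix`):

```rust
/// Metropolis–Hastings weights from an undirected 0/1 adjacency matrix:
///   aᵢⱼ = 1/(1 + max(dᵢ, dⱼ))  for {i,j} ∈ E,
///   aᵢᵢ = 1 - Σ_{j≠i} aᵢⱼ.
pub fn metropolis_weights(adj: &[Vec<bool>]) -> Vec<Vec<f64>> {
    let m = adj.len();
    let deg: Vec<usize> = adj
        .iter()
        .map(|row| row.iter().filter(|&&e| e).count())
        .collect();
    let mut a = vec![vec![0.0; m]; m];
    for i in 0..m {
        for j in 0..m {
            if i != j && adj[i][j] {
                a[i][j] = 1.0 / (1.0 + deg[i].max(deg[j]) as f64);
            }
        }
        // Self-weight absorbs the leftover mass so each row sums to 1;
        // symmetry of the off-diagonal entries then gives column sums of 1.
        let row_sum: f64 = a[i].iter().sum();
        a[i][i] = 1.0 - row_sum;
    }
    a
}
```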

6.4 Weight Matrix Methods

impl<S: Scalar> WeightMatrix<S> {
    /// Validate that A is compatible with graph G's structure.
    /// Checks: aᵢⱼ > 0 only if {i,j} ∈ E or i = j.
    pub fn validate_compatibility<Dir: EdgeDirection>(
        &self,
        graph: &DenseGraph<S, Dir>,
    ) -> Result<(), MlError>;

    /// Validate the stochastic property.
    pub fn validate_stochasticity(&self, tolerance: S) -> Result<StochasticType, MlError>;

    /// Compute spectral gap: 1 - |λ₂(A)|.
    /// Paper: controls convergence rate of all consensus-based methods.
    pub fn compute_spectral_gap(&mut self) -> S;

    /// Check that all entries are nonneg and positive where required.
    pub fn validate_positivity(&self) -> Result<(), MlError>;

    /// Access the raw matrix for use in algorithm update steps.
    pub fn as_matrix(&self) -> &DMatrix<S>;

    /// Get aᵢⱼ — the weight agent i assigns to agent j.
    pub fn weight(&self, i: AgentId, j: AgentId) -> S;
}

7. Agent State Machine

Each agent maintains iteration-specific state. The shape of this state varies by algorithm:

7.1 Agent State Variants

// agent/state.rs

/// State maintained by a single agent during distributed optimization.
///
/// Different algorithms require different auxiliary variables:
///
///   Consensus gradient (10):   xᵢᵏ only
///   Push-sum (14):             xᵢᵏ, yᵢᵏ (scalar normalizer)
///   Gradient-push (15):        xᵢᵏ, yᵢᵏ, zᵢᵏ = xᵢᵏ/yᵢᵏ
///   Gradient-tracking (16,17): xᵢᵏ, dᵢᵏ (direction tracker)
///   APull-BPush (18):          xᵢᵏ, dᵢᵏ (with dual matrices A, B)
///
/// Rather than one monolithic state, we use an enum that carries
/// exactly what each algorithm needs.
#[derive(Clone, Debug)]
pub enum AgentState<S: Scalar> {
    /// For consensus-based gradient method (Eq. 10, 12).
    Consensus {
        /// Decision variable xᵢᵏ ∈ ℝⁿ.
        x: DecisionVector<S>,
    },

    /// For push-sum consensus on directed graphs (Eq. 14).
    PushSum {
        /// Decision variable xᵢᵏ ∈ ℝⁿ.
        x: DecisionVector<S>,
        /// Scalar normalizer yᵢᵏ ∈ ℝ, initialized to 1.
        y: S,
    },

    /// For gradient-push method (Eq. 15).
    GradientPush {
        /// Decision variable xᵢᵏ ∈ ℝⁿ.
        x: DecisionVector<S>,
        /// Scalar normalizer yᵢᵏ ∈ ℝ, initialized to 1.
        y: S,
        /// Ratio variable zᵢᵏ = xᵢᵏ/yᵢᵏ — the actual estimate.
        z: DecisionVector<S>,
    },

    /// For gradient-tracking methods (Eq. 16, 17, 18, 19).
    GradientTracking {
        /// Decision variable xᵢᵏ ∈ ℝⁿ.
        x: DecisionVector<S>,
        /// Direction tracker dᵢᵏ ∈ ℝⁿ, initialized to ∇fᵢ(xᵢ⁰).
        d: DirectionVector<S>,
    },
}

7.2 Message Types

Agents exchange specific data with neighbors depending on the algorithm.

// agent/neighborhood.rs

/// The data an agent sends to its neighbors in a single communication round.
///
/// Paper: "every agent i sends its current iterate xᵢᵏ to its neighbors
/// j ∈ Nᵢ and receives xⱼᵏ from its neighbors j ∈ Nᵢ."
#[derive(Clone, Debug)]
pub enum AgentMessage<S: Scalar> {
    /// Consensus methods: send xᵢᵏ.
    Decision(DecisionVector<S>),

    /// Push-sum methods: agent i sends aⱼᵢ·xᵢᵏ and aⱼᵢ·yᵢᵏ to out-neighbor j.
    /// Paper Eq. (14): "each agent j sends aᵢⱼxⱼᵏ and aᵢⱼyⱼᵏ to all of
    /// its out-neighbors i ∈ N_j^{out}."
    PushSumPair {
        weighted_x: DecisionVector<S>,
        weighted_y: S,
    },

    /// Gradient-tracking: send xᵢᵏ and dᵢᵏ.
    /// Paper Eq. (16): agents exchange "x and d vectors with their neighbors."
    DecisionAndDirection {
        x: DecisionVector<S>,
        d: DirectionVector<S>,
    },

    /// APull-BPush: pull step sends (xⱼᵏ - αⱼdⱼᵏ), push step sends bₗᵢdᵢᵏ.
    /// Paper Eq. (18): "each agent i obtains xⱼᵏ - αⱼdⱼᵏ from its in-neighbors
    /// j ∈ N_i^{in}, and sends bₗᵢdᵢᵏ to each of its out-neighbors ℓ ∈ N_i^{out}."
    APullBPush {
        /// For A-pull: xⱼ - αⱼdⱼ (pre-combined by sender).
        pull_value: DecisionVector<S>,
        /// For B-push: bₗᵢdᵢ (weighted direction).
        push_direction: DirectionVector<S>,
    },
}

/// The collected messages an agent has received from its neighborhood.
pub struct IncomingMessages<S: Scalar> {
    /// Messages indexed by sender AgentId.
    pub messages: Vec<(AgentId, AgentMessage<S>)>,
}

8. Step-Size Schedules

The paper treats step-size selection as a first-class concern affecting convergence.

// step_size/schedule.rs

/// Step-size schedule producing αₖ > 0 at each iteration k.
///
/// Paper: "some positive step size αₖ" with convergence requiring
/// Σ αₖ = ∞ and Σ αₖ² < ∞ for diminishing schedules, or
/// constant α for gradient-tracking methods.
pub trait StepSizeSchedule<S: Scalar>: Clone {
    /// Returns αₖ for iteration k (0-indexed).
    fn step_size(&self, k: usize) -> S;

    /// Whether this schedule satisfies Σ αₖ = ∞ (necessary for convergence
    /// of consensus-based methods with diminishing step size).
    fn is_divergent_sum(&self) -> bool;

    /// Whether this schedule satisfies Σ αₖ² < ∞ (necessary for convergence
    /// of consensus-based methods with diminishing step size).
    fn is_square_summable(&self) -> bool;
}

/// Diminishing step size: αₖ = a / k^b where a > 0, b ∈ (1/2, 1].
///
/// Paper: "These conditions are satisfied, for example, with step size
/// αₖ = 1/k, or more generally with the step size of the form
/// αₖ = a/k^b, where a > 0 is arbitrary while b ∈ (1/2, 1]."
///
/// Convergence rate: O(log k / √k) for consensus-based methods.
#[derive(Clone, Debug)]
pub struct Diminishing<S: Scalar> {
    /// Numerator coefficient a > 0.
    pub a: S,
    /// Exponent b ∈ ((1/2), 1].
    pub b: S,
}

/// Constant step size: αₖ = α for all k.
///
/// Paper: "under some additional assumptions on the objective function,
/// the gradient method with a suitably chosen constant step size α > 0
/// produces a sequence {xᵏ} converging to the solution."
///
/// Used by gradient-tracking methods (Eq. 16, 17, 18, 19) to achieve
/// geometric/linear convergence rate O(qᵏ) where q ∈ (0, 1).
#[derive(Clone, Debug)]
pub struct Constant<S: Scalar> {
    /// The fixed step size α > 0.
    pub alpha: S,
}

/// Per-agent step sizes: each agent i uses its own αᵢ.
///
/// Paper Eq. (19): "every agent uses its own step size αᵢ, which need
/// not be coordinated with the step sizes of the other agents."
#[derive(Clone, Debug)]
pub struct PerAgent<S: Scalar> {
    /// Step size for each agent, indexed by agent ID.
    pub alphas: Vec<S>,
}
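
The diminishing schedule and its two summability flags can be sketched directly. One detail the trait leaves open: at k = 0 the formula a/k^b is undefined, so this sketch evaluates a/(k+1)^b, effectively treating k as 1-indexed:

```rust
/// Standalone sketch of a diminishing schedule αₖ = a/(k+1)^b.
#[derive(Clone)]
pub struct Diminishing {
    pub a: f64,
    pub b: f64,
}

impl Diminishing {
    pub fn step_size(&self, k: usize) -> f64 {
        self.a / ((k + 1) as f64).powf(self.b)
    }

    /// Σ αₖ = ∞ holds for the harmonic-like family whenever b ≤ 1.
    pub fn is_divergent_sum(&self) -> bool {
        self.b <= 1.0
    }

    /// Σ αₖ² < ∞ holds whenever 2b > 1, i.e. b > 1/2.
    pub fn is_square_summable(&self) -> bool {
        self.b > 0.5
    }
}
```

With a = 1, b = 1 this reproduces the paper's canonical αₖ = 1/k schedule, satisfying both conditions.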

9. Algorithm Trait and Implementations

9.1 Core Algorithm Trait

// algorithm/traits.rs

use crate::agent::state::AgentState;
use crate::mixing::{StochasticType, WeightMatrix};
use crate::objective::traits::ObjectiveFn;
use crate::step_size::schedule::StepSizeSchedule;
use crate::types::DecisionVector;
use convective_graph::graph::DenseGraph;
use convective_graph::types::{EdgeDirection, NodeId, Scalar};

/// Requirements that an algorithm imposes on the weight matrix and graph.
pub struct AlgorithmRequirements {
    /// Required stochastic property of the primary weight matrix A.
    pub matrix_a: StochasticType,
    /// Required stochastic property of the secondary matrix B (if needed).
    pub matrix_b: Option<StochasticType>,
    /// Whether the graph must be undirected.
    pub requires_undirected: bool,
    /// Whether the graph must be (strongly) connected.
    pub requires_connected: bool,
    /// Whether a diminishing step size is required (vs. constant allowed).
    pub requires_diminishing_step: bool,
    /// Communication cost per iteration: how many vectors of size n
    /// each agent sends per edge per iteration.
    pub vectors_per_edge: usize,
    /// Whether scalar auxiliary variables are also exchanged.
    pub scalar_per_edge: bool,
}

/// A distributed optimization algorithm.
///
/// Each algorithm is a stateless description of an update rule.
/// The actual state lives in `AgentState` values managed by the solver.
///
/// Type parameters:
///   S — scalar type (f64, f32)
pub trait Algorithm<S: Scalar>: Clone {
    /// Human-readable name (e.g., "Consensus-Based Gradient Method").
    fn name(&self) -> &'static str;

    /// The structural requirements this algorithm imposes.
    fn requirements(&self) -> AlgorithmRequirements;

    /// Initialize agent i's state at iteration 0.
    ///
    /// Paper: "Starting with arbitrary initial variables xᵢ⁰, i ∈ [m]"
    /// Some algorithms have additional initialization requirements
    /// (e.g., yᵢ⁰ = 1, dᵢ⁰ = ∇fᵢ(xᵢ⁰)).
    fn initialize<O: ObjectiveFn<S>>(
        &self,
        agent_id: NodeId,
        x_init: DecisionVector<S>,
        objective: &O,
    ) -> AgentState<S>;

    /// Execute one iteration of the algorithm for agent i.
    ///
    /// This computes xᵢᵏ⁺¹ (and any auxiliary variables) given:
    /// - The agent's current state (xᵢᵏ, etc.)
    /// - Received neighbor states (already weighted by the mixing matrix)
    /// - The agent's local objective function
    /// - The current step size αₖ
    ///
    /// Returns the new agent state at iteration k+1.
    fn step<O: ObjectiveFn<S>>(
        &self,
        agent_id: NodeId,
        state: &AgentState<S>,
        neighbors: &IncomingMessages<S>,
        weight_matrix: &WeightMatrix<S>,
        objective: &O,
        step_size: S,
        iteration: usize,
    ) -> AgentState<S>;

    /// Extract the decision variable from the agent state.
    /// For most algorithms this is just xᵢᵏ, but for gradient-push
    /// it is zᵢᵏ = xᵢᵏ/yᵢᵏ.
    fn decision_variable(&self, state: &AgentState<S>) -> DecisionVector<S>;
}

9.2 Consensus-Based Gradient Method (Eq. 10)

// algorithm/consensus.rs

/// Consensus-based gradient method for undirected graphs.
///
/// Paper Eq. (10):
///   vᵢᵏ = Σⱼ₌₁ᵐ aᵢⱼ xⱼᵏ                    (mixing/consensus step)
///   xᵢᵏ⁺¹ = vᵢᵏ - αₖ ∇fᵢ(vᵢᵏ)               (gradient step)
///
/// Requirements:
///   - Weight matrix A must be doubly stochastic
///   - Graph G must be connected and undirected
///   - Step size: diminishing (Σαₖ = ∞, Σαₖ² < ∞) for convergence
///
/// Convergence rate: O(log k / √k) with diminishing step size.
///
/// Also known as "combine-then-adapt" strategy.
///
/// Variant — "adapt-then-combine" (Eq. 12):
///   vᵢᵏ = xᵢᵏ - αₖ ∇fᵢ(xᵢᵏ)                  (gradient step first)
///   xᵢᵏ⁺¹ = Σⱼ₌₁ᵐ aᵢⱼ vⱼᵏ                    (mixing step after)
#[derive(Clone, Debug)]
pub struct ConsensusGradient {
    /// Which ordering to use.
    pub variant: ConsensusVariant,
}

#[derive(Clone, Debug, Copy, PartialEq, Eq)]
pub enum ConsensusVariant {
    /// Eq. (10): mix then gradient. "Combine-then-adapt".
    CombineThenAdapt,
    /// Eq. (12): gradient then mix. "Adapt-then-combine" / diffusion.
    AdaptThenCombine,
}
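
A full run of the combine-then-adapt rule fits in a few lines on a toy problem. The sketch below uses scalar quadratics fᵢ(x) = (x - cᵢ)²/2, whose global optimum is the mean of the cᵢ, with a doubly stochastic A and the diminishing step αₖ = 1/(k+1); it is a standalone illustration, not the crate's `Algorithm::step` API:

```rust
/// Combine-then-adapt consensus gradient (Eq. 10) on scalar quadratics
/// fᵢ(x) = (x - cᵢ)²/2, so ∇fᵢ(v) = v - cᵢ and x* = mean(c).
pub fn consensus_gradient(a: &[Vec<f64>], c: &[f64], iters: usize) -> Vec<f64> {
    let m = c.len();
    let mut x = vec![0.0; m]; // arbitrary initial iterates xᵢ⁰
    for k in 0..iters {
        let alpha = 1.0 / (k as f64 + 1.0); // diminishing step αₖ
        // Mixing step: vᵢᵏ = Σⱼ aᵢⱼ xⱼᵏ.
        let v: Vec<f64> = (0..m)
            .map(|i| (0..m).map(|j| a[i][j] * x[j]).sum())
            .collect();
        // Gradient step at the mixed point: xᵢᵏ⁺¹ = vᵢᵏ - αₖ ∇fᵢ(vᵢᵏ).
        x = (0..m).map(|i| v[i] - alpha * (v[i] - c[i])).collect();
    }
    x
}
```

Run on a 3-node path with Metropolis weights and c = (0, 3, 6), every agent's iterate approaches the global optimum 3, with the slow sublinear tail characteristic of diminishing steps.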

9.3 Push–Sum Consensus Method (Eq. 14)

// algorithm/push_sum.rs

/// Push–sum consensus method for directed graphs.
///
/// Paper Eq. (14):
///   xᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ xⱼᵏ
///   yᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ yⱼᵏ
///
/// where A = [aᵢⱼ] is column stochastic (Eq. 13).
/// The ratio zᵢᵏ = xᵢᵏ/yᵢᵏ converges to consensus (ratio consensus).
///
/// Requirements:
///   - Weight matrix A must be column stochastic
///   - Graph G must be strongly connected (directed)
///   - yᵢ⁰ = 1 for all i
///
/// This is NOT a gradient method by itself — it's the consensus mechanism
/// on which the gradient-push method (Eq. 15) is built.
#[derive(Clone, Debug)]
pub struct PushSumConsensus;
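
Ratio consensus is easy to demonstrate on a small strongly connected digraph whose lazy column-stochastic weights are not doubly stochastic: the xᵢ and yᵢ iterates individually drift toward the Perron direction, yet the ratios zᵢ = xᵢ/yᵢ converge to the exact average of the initial values. Standalone sketch, with a hand-built weight matrix:

```rust
/// Push–sum ratio consensus (Eq. 14): x and y are mixed by the same
/// column-stochastic A, and the ratios zᵢᵏ = xᵢᵏ/yᵢᵏ converge to the
/// average of x⁰ (since yᵢ⁰ = 1 for all i).
pub fn push_sum(a: &[Vec<f64>], x0: &[f64], iters: usize) -> Vec<f64> {
    let m = x0.len();
    let mut x = x0.to_vec();
    let mut y = vec![1.0; m]; // scalar normalizers, yᵢ⁰ = 1
    for _ in 0..iters {
        let mix = |v: &[f64]| -> Vec<f64> {
            (0..m).map(|i| (0..m).map(|j| a[i][j] * v[j]).sum()).collect()
        };
        x = mix(&x);
        y = mix(&y);
    }
    // Ratio estimates zᵢᵏ = xᵢᵏ / yᵢᵏ.
    x.iter().zip(&y).map(|(xi, yi)| xi / yi).collect()
}
```

The test graph has edges 0→1, 0→2, 1→2, 2→0 with lazy weights aᵢⱼ = 1/(|N_j^{out}| + 1); its columns sum to 1 while its rows do not, so plain consensus would fail but the ratio still recovers the average.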

9.4 Gradient–Push Method (Eq. 15)

// algorithm/gradient_push.rs

/// Gradient–push method for directed graphs.
///
/// Paper Eq. (15):
///   vᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ xⱼᵏ
///   yᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ yⱼᵏ
///   zᵢᵏ⁺¹ = vᵢᵏ⁺¹ / yᵢᵏ⁺¹
///   xᵢᵏ⁺¹ = vᵢᵏ⁺¹ - αₖ ∇fᵢ(zᵢᵏ⁺¹)
///
/// Combines push–sum ratio consensus with a gradient step.
/// The gradient is evaluated at the ratio estimate zᵢᵏ⁺¹, not at xᵢᵏ.
///
/// Requirements:
///   - Weight matrix A must be column stochastic
///   - Graph G must be strongly connected (directed)
///   - yᵢ⁰ = 1 for all i
///   - Step size: diminishing
///
/// Convergence rate: same as consensus gradient — O(log k / √k).
/// Advantage: works on directed graphs without doubly stochastic matrix.
#[derive(Clone, Debug)]
pub struct GradientPush;
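An end-to-end sketch of Eq. (15) helps here. The example below uses scalar quadratics fᵢ(x) = (x − cᵢ)²/2 (so the network-wide minimizer is the mean of the cᵢ) and the diminishing step αₖ = 1/(k+1); the objectives, matrix values, and function names are illustrative choices, not the crate's implementation.

```rust
/// Run gradient-push (Eq. 15) for `iters` rounds on f_i(x) = (x - c_i)^2 / 2
/// with alpha_k = 1 / (k + 1); returns the final ratio estimates z_i,
/// which should approach the mean of the c_i.
fn gradient_push(a: &[Vec<f64>], c: &[f64], iters: usize) -> Vec<f64> {
    let m = c.len();
    let mut x = vec![0.0; m];
    let mut y = vec![1.0; m]; // required initialization: y_i^0 = 1
    let mut z = vec![0.0; m];
    for k in 0..iters {
        let alpha = 1.0 / (k as f64 + 1.0);
        // v^{k+1} = A x^k and y^{k+1} = A y^k (column-stochastic mixing)
        let v: Vec<f64> = (0..m).map(|i| (0..m).map(|j| a[i][j] * x[j]).sum()).collect();
        y = (0..m).map(|i| (0..m).map(|j| a[i][j] * y[j]).sum()).collect();
        // ratio estimate, then the gradient step evaluated at z (not at x)
        z = (0..m).map(|i| v[i] / y[i]).collect();
        x = (0..m).map(|i| v[i] - alpha * (z[i] - c[i])).collect();
    }
    z
}
```

With cᵢ ∈ {0, 3, 6} every zᵢ drifts toward the global minimizer 3 despite the column-stochastic (non-doubly-stochastic) weights.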

9.5 Gradient-Tracking Methods (Eq. 16, 17)

// algorithm/gradient_track.rs

/// Gradient-tracking method for undirected graphs.
///
/// Paper Eq. (16):
///   xᵢᵏ⁺¹ = Σⱼ₌₁ᵐ aᵢⱼ xⱼᵏ - α dᵢᵏ
///   dᵢᵏ⁺¹ = Σⱼ₌₁ᵐ aᵢⱼ dⱼᵏ + ∇fᵢ(xᵢᵏ⁺¹) - ∇fᵢ(xᵢᵏ)
///
/// The direction dᵢ tracks an estimate of the global gradient (1/m)Σ∇fⱼ.
/// Key insight: the sum Σᵢdᵢᵏ⁺¹ = Σᵢ∇fᵢ(xᵢᵏ⁺¹) is preserved across
/// iterations (gradient tracking invariant).
///
/// Requirements:
///   - Weight matrix A must be doubly stochastic
///   - Graph G must be connected and undirected
///   - dᵢ⁰ = ∇fᵢ(xᵢ⁰) for all i (critical initialization)
///   - Step size: constant α (sufficiently small)
///
/// Convergence rate: GEOMETRIC — O(qᵏ) where q ∈ (0,1), matching
/// centralized gradient descent under strong convexity + Lipschitz gradients.
///
/// Paper: "the distributed methods with gradient tracking in (16) and (17)
/// can match the best known convergence rates of centralized gradient methods."
#[derive(Clone, Debug)]
pub struct GradientTracking {
    pub variant: GradientTrackingVariant,
}

#[derive(Clone, Debug, Copy, PartialEq, Eq)]
pub enum GradientTrackingVariant {
    /// Eq. (16): For undirected graphs with doubly stochastic A.
    Undirected,
    /// Eq. (17): For directed graphs with column stochastic A.
    /// Uses the same A for mixing both x and d variables.
    Directed,
}
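The tracking invariant is worth demonstrating concretely. Below is a hypothetical scalar sketch of one Eq. (16) step on fᵢ(x) = (x − cᵢ)²/2: because A is doubly stochastic, mixing preserves column sums, so Σᵢ dᵢᵏ = Σᵢ ∇fᵢ(xᵢᵏ) holds at every iteration given the critical initialization dᵢ⁰ = ∇fᵢ(xᵢ⁰).

```rust
/// Gradient of the illustrative quadratic f_i(x) = (x - c_i)^2 / 2.
fn grad(x: f64, c: f64) -> f64 { x - c }

/// One synchronous gradient-tracking step (Eq. 16) with constant alpha.
/// Mixing with doubly stochastic A preserves sum_i d_i exactly, so the
/// invariant sum_i d_i^k = sum_i grad f_i(x_i^k) is maintained.
fn tracking_step(x: &mut Vec<f64>, d: &mut Vec<f64>, a: &[Vec<f64>], c: &[f64], alpha: f64) {
    let m = x.len();
    let xn: Vec<f64> = (0..m)
        .map(|i| (0..m).map(|j| a[i][j] * x[j]).sum::<f64>() - alpha * d[i])
        .collect();
    let dn: Vec<f64> = (0..m)
        .map(|i| {
            let mixed: f64 = (0..m).map(|j| a[i][j] * d[j]).sum();
            mixed + grad(xn[i], c[i]) - grad(x[i], c[i])
        })
        .collect();
    *x = xn;
    *d = dn;
}
```

On a 3-agent path graph with cᵢ ∈ {0, 3, 6}, a constant step drives every xᵢ to the global minimizer 3 geometrically, while the invariant holds throughout — exactly the behavior a diminishing-step consensus-gradient method cannot deliver.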

9.6 APull–BPush Method (Eq. 18, 19)

// algorithm/apull_bpush.rs

/// APull–BPush gradient-tracking method for directed graphs.
///
/// Paper Eq. (18):
///   xᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ (xⱼᵏ - αⱼ dⱼᵏ)     (A-pull)
///   dᵢᵏ⁺¹ = Σ_{ℓ ∈ N_i^{out} ∪ {i}} bₗᵢ dₗᵏ + ∇fᵢ(xᵢᵏ⁺¹) - ∇fᵢ(xᵢᵏ)  (B-push)
///
/// Key innovation: uses TWO different weight matrices:
///   - A = [aᵢⱼ] is row stochastic (Σⱼ aᵢⱼ = 1)
///   - B = [bᵢⱼ] is column stochastic (Σᵢ bᵢⱼ = 1)
///
/// Paper: "The interesting aspect of the APull–BPush method is that the
/// matrix A = [aᵢⱼ] is row stochastic while the matrix B = [bᵢⱼ] is
/// column stochastic."
///
/// Requirements:
///   - Matrix A: row stochastic, compatible with G
///   - Matrix B: column stochastic, compatible with G
///   - Graph G: strongly connected directed graph
///   - Step size: constant (per-agent αᵢ allowed in variant Eq. 19)
///   - dᵢ⁰ = ∇fᵢ(xᵢ⁰)
///
/// Convergence rate: LINEAR (geometric) under strong convexity + Lipschitz.
///
/// Variant Eq. (19) from [33]:
///   xᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ xⱼᵏ - η dᵢᵏ
///   dᵢᵏ⁺¹ = Σ_{ℓ ∈ N_i^{out} ∪ {i}} bₗᵢ (dₗᵏ + ∇fₗ(xₗᵏ⁺¹) - ∇fₗ(xₗᵏ))
#[derive(Clone, Debug)]
pub struct APullBPush {
    pub variant: APullBPushVariant,
}

#[derive(Clone, Debug, Copy, PartialEq, Eq)]
pub enum APullBPushVariant {
    /// Eq. (18): Original APull–BPush.
    Original,
    /// Eq. (19): Variant from [33] with slightly different update order.
    Variant19,
}
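A scalar simulation makes the two-matrix mechanics of Eq. (18) concrete. In this hypothetical sketch the B-push is written simply as dᵏ⁺¹ = B dᵏ plus the gradient difference, with A row stochastic and B column stochastic; the matrix values and the quadratics fᵢ(x) = (x − cᵢ)²/2 are made up for illustration.

```rust
/// Run APull-BPush (Eq. 18) on f_i(x) = (x - c_i)^2 / 2 with a uniform
/// constant step. A is row stochastic (pull), B is column stochastic
/// (push); column stochasticity of B preserves sum_i d_i, which is the
/// tracking invariant that drives geometric convergence.
fn apull_bpush(a: &[Vec<f64>], b: &[Vec<f64>], c: &[f64], alpha: f64, iters: usize) -> Vec<f64> {
    let m = c.len();
    let mut x = vec![0.0; m];
    // critical initialization: d_i^0 = grad f_i(x_i^0)
    let mut d: Vec<f64> = (0..m).map(|i| x[i] - c[i]).collect();
    for _ in 0..iters {
        // A-pull: x_i^{k+1} = sum_j a_ij (x_j^k - alpha d_j^k)
        let xn: Vec<f64> = (0..m)
            .map(|i| (0..m).map(|j| a[i][j] * (x[j] - alpha * d[j])).sum())
            .collect();
        // B-push: mix d with B, then add the local gradient difference
        d = (0..m)
            .map(|i| {
                let mixed: f64 = (0..m).map(|j| b[i][j] * d[j]).sum();
                mixed + (xn[i] - c[i]) - (x[i] - c[i])
            })
            .collect();
        x = xn;
    }
    x
}
```

With neither matrix doubly stochastic, the iterates still reach the global minimizer, which is the point of the A-row / B-column split.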

10. Solver Orchestration

The solver is the top-level coordinator that ties together graph, weight matrices, agents, objectives, algorithm, and step-size schedule.

10.1 Solver Configuration

// solver/config.rs

/// Complete configuration for a distributed optimization run.
pub struct SolverConfig<S, Alg, Step>
where
    S: Scalar,
    Alg: Algorithm<S>,
    Step: StepSizeSchedule<S>,
{
    /// The algorithm to execute.
    pub algorithm: Alg,
    /// Step-size schedule.
    pub step_size: Step,
    /// Maximum number of iterations.
    pub max_iterations: usize,
    /// Convergence criterion (when to stop early).
    pub convergence: ConvergenceCriterion<S>,
    /// How often to log iteration data (every N iterations).
    pub log_interval: usize,
}

10.2 Distributed Solver

// solver/distributed.rs

/// The main distributed optimization solver.
///
/// Orchestrates the iterative process:
///   for k = 0, 1, 2, ... :
///     1. Each agent computes its local gradient
///     2. Agents exchange messages with neighbors (simulated locally)
///     3. Each agent executes the algorithm's update step
///     4. Check convergence criteria
///
/// This is a SYNCHRONOUS simulation — all agents update simultaneously.
/// For actual distributed deployment, this would be replaced by a
/// message-passing runtime, but the Algorithm trait and AgentState types
/// remain identical.
pub struct DistributedSolver<S, Dir, Alg, Step, O>
where
    S: Scalar,
    Dir: EdgeDirection,
    Alg: Algorithm<S>,
    Step: StepSizeSchedule<S>,
    O: ObjectiveFn<S>,
{
    /// Communication graph G = ([m], E).
    graph: DenseGraph<S, Dir>,
    /// Primary weight matrix A.
    weight_matrix_a: WeightMatrix<S>,
    /// Secondary weight matrix B (only for APull–BPush).
    weight_matrix_b: Option<WeightMatrix<S>>,
    /// Per-agent local objective functions.
    objectives: Vec<O>,
    /// Per-agent current state.
    states: Vec<AgentState<S>>,
    /// Algorithm definition.
    algorithm: Alg,
    /// Step-size schedule.
    step_size: Step,
    /// Convergence monitor.
    monitor: ConvergenceMonitor<S>,
    /// Current iteration counter.
    iteration: usize,
}

impl<S, Dir, Alg, Step, O> DistributedSolver<S, Dir, Alg, Step, O>
where
    S: Scalar,
    Dir: EdgeDirection,
    Alg: Algorithm<S>,
    Step: StepSizeSchedule<S>,
    O: ObjectiveFn<S>,
{
    /// Create and validate a new solver instance.
    ///
    /// Validates:
    ///   - Algorithm requirements vs. graph type (directed/undirected)
    ///   - Weight matrix stochastic type vs. algorithm requirements
    ///   - Weight matrix compatibility with graph structure
    ///   - Number of objectives == number of agents == number of graph nodes
    pub fn new(
        graph: DenseGraph<S, Dir>,
        weight_matrix_a: WeightMatrix<S>,
        weight_matrix_b: Option<WeightMatrix<S>>,
        objectives: Vec<O>,
        initial_points: Vec<DecisionVector<S>>,
        config: SolverConfig<S, Alg, Step>,
    ) -> Result<Self, MlError>;

    /// Run the full optimization loop until convergence or max iterations.
    pub fn solve(&mut self) -> SolverResult<S>;

    /// Execute a single iteration across all agents (synchronous).
    pub fn step(&mut self) -> IterationSnapshot<S>;

    /// Simulate the communication round: collect and distribute messages.
    fn exchange_messages(&self) -> Vec<IncomingMessages<S>>;

    /// Compute the average iterate x̄ᵏ = (1/m) Σᵢ xᵢᵏ.
    /// Paper: "the averaged iterates x̄ᵏ⁺¹ will follow an erroneous
    /// gradient method."
    pub fn average_iterate(&self) -> DecisionVector<S>;

    /// Compute the disagreement: max_i ‖xᵢᵏ - x̄ᵏ‖.
    /// Paper: "the disagreement sequence {‖xᵢᵏ - x̄ᵏ‖} converges to zero."
    pub fn disagreement(&self) -> S;
}
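The shape of the synchronous `solve()` loop can be illustrated without the trait machinery. In this simplified sketch the algorithm's per-round update is a closure over the stacked agent states, and the only stopping criterion is the disagreement metric; names and signatures are illustrative, not the design's actual API.

```rust
/// Minimal synchronous driver: apply one full network update per round,
/// stop when max_i |x_i - xbar| drops below `tol` or `max_iters` is hit.
/// Returns (final states, iterations used, converged flag).
fn solve_sync<F>(mut states: Vec<f64>, mut step: F, max_iters: usize, tol: f64) -> (Vec<f64>, usize, bool)
where
    F: FnMut(&[f64]) -> Vec<f64>,
{
    for k in 0..max_iters {
        states = step(&states); // one synchronous round: all agents update together
        let mean = states.iter().sum::<f64>() / states.len() as f64;
        let disagreement = states.iter().map(|x| (x - mean).abs()).fold(0.0, f64::max);
        if disagreement < tol {
            return (states, k + 1, true);
        }
    }
    (states, max_iters, false)
}
```

A real deployment would replace the closure with message exchange plus the `Algorithm` step, but the control flow is the same.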

10.3 Solver Result

// solver/distributed.rs

/// The outcome of a completed optimization run.
pub struct SolverResult<S: Scalar> {
    /// The solution estimate (average of final iterates).
    pub solution: DecisionVector<S>,
    /// Per-agent final iterates.
    pub agent_solutions: Vec<DecisionVector<S>>,
    /// Total iterations executed.
    pub iterations: usize,
    /// Whether convergence was achieved.
    pub converged: bool,
    /// Final disagreement metric.
    pub final_disagreement: S,
    /// Final objective value at the solution estimate.
    pub final_objective_value: S,
    /// Iteration history (logged at log_interval).
    pub history: Vec<IterationSnapshot<S>>,
}

/// A snapshot of the system state at a single iteration.
pub struct IterationSnapshot<S: Scalar> {
    pub iteration: usize,
    pub step_size: S,
    /// Average iterate x̄ᵏ.
    pub average_iterate: DecisionVector<S>,
    /// Global objective (1/m)Σfᵢ evaluated at x̄ᵏ.
    pub objective_value: S,
    /// max_i ‖xᵢᵏ - x̄ᵏ‖ (disagreement).
    pub disagreement: S,
    /// ‖∇F(x̄ᵏ)‖ (gradient norm at average).
    pub gradient_norm: S,
}

11. Convergence Monitoring

// convergence/criteria.rs

/// When to stop the optimization loop.
#[derive(Clone, Debug)]
pub enum ConvergenceCriterion<S: Scalar> {
    /// Stop when max_i ‖xᵢᵏ - x̄ᵏ‖ < ε.
    /// Paper: disagreement → 0.
    Disagreement { tolerance: S },

    /// Stop when ‖x̄ᵏ⁺¹ - x̄ᵏ‖ < ε.
    IterateDelta { tolerance: S },

    /// Stop when ‖∇F(x̄ᵏ)‖ < ε where F = (1/m)Σfᵢ.
    GradientNorm { tolerance: S },

    /// Stop when |F(x̄ᵏ) - F(x*)| < ε (requires known optimum).
    OptimalityGap { tolerance: S, optimal_value: S },

    /// Combine multiple criteria with AND semantics.
    All(Vec<ConvergenceCriterion<S>>),

    /// Combine multiple criteria with OR semantics.
    Any(Vec<ConvergenceCriterion<S>>),
}
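The `All`/`Any` combinators evaluate recursively. The following is a simplified `f64`-only sketch of the evaluation logic (the real enum is generic over `S: Scalar` and carries more variants); it shows the intended short-circuiting AND/OR semantics.

```rust
/// Simplified, f64-only mirror of ConvergenceCriterion for illustration.
#[derive(Clone)]
enum Criterion {
    Disagreement { tolerance: f64 },
    GradientNorm { tolerance: f64 },
    All(Vec<Criterion>),
    Any(Vec<Criterion>),
}

/// Recursively check a criterion against the current iteration metrics.
fn is_met(c: &Criterion, disagreement: f64, grad_norm: f64) -> bool {
    match c {
        Criterion::Disagreement { tolerance } => disagreement < *tolerance,
        Criterion::GradientNorm { tolerance } => grad_norm < *tolerance,
        Criterion::All(cs) => cs.iter().all(|ci| is_met(ci, disagreement, grad_norm)),
        Criterion::Any(cs) => cs.iter().any(|ci| is_met(ci, disagreement, grad_norm)),
    }
}
```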
// convergence/metrics.rs

/// Metrics for analyzing distributed optimization behavior.
pub struct ConvergenceMetrics<S: Scalar> {
    /// Per-agent disagreement ‖xᵢᵏ - x̄ᵏ‖ over time.
    pub disagreement_history: Vec<Vec<S>>,
    /// Objective value F(x̄ᵏ) over time.
    pub objective_history: Vec<S>,
    /// Estimated convergence rate (e.g., slope of log(F(x̄ᵏ) - F*) vs k).
    pub estimated_rate: Option<S>,
}
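One plausible way to compute `estimated_rate` is a least-squares line fit to ln(F(x̄ᵏ) − F*) versus k; for a geometrically converging method the slope approximates ln q. The sketch below is one possible implementation of that idea (returning the geometric factor q = e^slope is a design choice, and `f_star` is assumed known):

```rust
/// Fit a line to ln(F(xbar^k) - F*) versus k by ordinary least squares
/// and return the implied geometric factor q = exp(slope). Returns None
/// when fewer than two points have a positive residual.
fn estimate_rate(objective_history: &[f64], f_star: f64) -> Option<f64> {
    let pts: Vec<(f64, f64)> = objective_history
        .iter()
        .enumerate()
        .filter(|&(_, &f)| f - f_star > 0.0)
        .map(|(k, &f)| (k as f64, (f - f_star).ln()))
        .collect();
    if pts.len() < 2 {
        return None;
    }
    let n = pts.len() as f64;
    let (sx, sy) = pts.iter().fold((0.0, 0.0), |(a, b), &(x, y)| (a + x, b + y));
    let (sxx, sxy) = pts.iter().fold((0.0, 0.0), |(a, b), &(x, y)| (a + x * x, b + x * y));
    let slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    Some(slope.exp())
}
```

On an exactly geometric residual sequence F(x̄ᵏ) − F* = qᵏ the estimator recovers q.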

12. Error Types

// errors/mod.rs

use thiserror::Error;

#[derive(Error, Debug, Clone, PartialEq)]
pub enum MlError {
    // ── Configuration errors ───────────────────────────────────────
    #[error("Agent count mismatch: graph has {graph_nodes} nodes but {provided} objectives provided")]
    AgentCountMismatch { graph_nodes: usize, provided: usize },

    #[error("Dimension mismatch: expected vectors of dimension {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },

    #[error("Algorithm '{algorithm}' requires {required:?} matrix but got {actual:?}")]
    StochasticityMismatch {
        algorithm: &'static str,
        required: StochasticType,
        actual: StochasticType,
    },

    #[error("Algorithm '{algorithm}' requires undirected graph")]
    DirectedGraphNotSupported { algorithm: &'static str },

    #[error("Algorithm '{algorithm}' requires a connected graph but graph has {components} components")]
    DisconnectedGraph {
        algorithm: &'static str,
        components: usize,
    },

    // ── Weight matrix errors ───────────────────────────────────────
    #[error("Weight matrix entry ({i}, {j}) = {value} is incompatible with graph: no edge exists")]
    IncompatibleWeight { i: usize, j: usize, value: f64 },

    #[error("Row {row} sums to {sum}, expected 1.0 (tolerance {tol})")]
    RowSumViolation { row: usize, sum: f64, tol: f64 },

    #[error("Column {col} sums to {sum}, expected 1.0 (tolerance {tol})")]
    ColumnSumViolation { col: usize, sum: f64, tol: f64 },

    #[error("Negative weight at ({i}, {j}): {value}")]
    NegativeWeight { i: usize, j: usize, value: f64 },

    // ── Step size errors ───────────────────────────────────────────
    #[error("Step size must be positive, got {value}")]
    NonPositiveStepSize { value: f64 },

    #[error("Diminishing exponent b must be in (1/2, 1], got {value}")]
    InvalidDiminishingExponent { value: f64 },

    // ── Convergence errors ─────────────────────────────────────────
    #[error("Solver did not converge after {iterations} iterations (disagreement: {disagreement})")]
    ConvergenceFailure {
        iterations: usize,
        disagreement: f64,
    },

    #[error("Numerical instability detected at iteration {iteration}: {description}")]
    NumericalInstability {
        iteration: usize,
        description: String,
    },

    // ── Objective function errors ──────────────────────────────────
    #[error("Empty dataset for agent {agent}")]
    EmptyDataset { agent: usize },

    #[error("Regularization coefficient must be positive, got {value}")]
    NonPositiveRegularization { value: f64 },

    // ── Graph errors (propagated) ──────────────────────────────────
    #[error("Graph error: {0}")]
    GraphError(#[from] convective_graph::errors::GraphError),
}

13. Algorithm–Requirements Matrix

Summary of what each algorithm needs, for validation at solver construction:

| Algorithm | Eq. | Graph | Matrix A | Matrix B | Step Size | dᵢ⁰ | yᵢ⁰ | Rate |
|---|---|---|---|---|---|---|---|---|
| ConsensusGradient (CTA) | (10) | Undirected, connected | Doubly stochastic | — | Diminishing | — | — | O(log k / √k) |
| ConsensusGradient (ATC) | (12) | Undirected, connected | Doubly stochastic | — | Diminishing | — | — | O(log k / √k) |
| PushSumConsensus | (14) | Directed, strongly connected | Column stochastic | — | — | — | 1 | (consensus only) |
| GradientPush | (15) | Directed, strongly connected | Column stochastic | — | Diminishing | — | 1 | O(log k / √k) |
| GradientTracking (undir) | (16) | Undirected, connected | Doubly stochastic | — | Constant | ∇fᵢ(xᵢ⁰) | — | O(qᵏ) geometric |
| GradientTracking (dir) | (17) | Directed, strongly connected | Column stochastic | — | Constant | ∇fᵢ(xᵢ⁰) | — | O(qᵏ) geometric |
| APullBPush (original) | (18) | Directed, strongly connected | Row stochastic | Column stochastic | Constant | ∇fᵢ(xᵢ⁰) | — | Linear |
| APullBPush (variant) | (19) | Directed, strongly connected | Row stochastic | Column stochastic | Constant (per-agent) | ∇fᵢ(xᵢ⁰) | — | Linear |
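This matrix is what `DistributedSolver::new` checks at construction. A hypothetical sketch of the lookup it could be driven by (a string-keyed stand-in for the real `AlgorithmRequirements` trait; note a strict equality check is a simplification, since a doubly stochastic matrix is also row and column stochastic):

```rust
/// Simplified mirror of the design's StochasticType enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StochasticType { Row, Column, Doubly }

/// Required type of the primary matrix A per algorithm family,
/// transcribed from the requirements matrix above.
fn required_a(algorithm: &str) -> Option<StochasticType> {
    match algorithm {
        "ConsensusGradient" | "GradientTracking/Undirected" => Some(StochasticType::Doubly),
        "PushSumConsensus" | "GradientPush" | "GradientTracking/Directed" => Some(StochasticType::Column),
        "APullBPush" => Some(StochasticType::Row), // B must additionally be column stochastic
        _ => None,
    }
}

/// Construction-time check: reject a solver whose matrix A does not
/// match the algorithm's requirement.
fn validate(algorithm: &str, actual: StochasticType) -> Result<(), String> {
    match required_a(algorithm) {
        Some(req) if req == actual => Ok(()),
        Some(req) => Err(format!("{algorithm} requires {req:?} A, got {actual:?}")),
        None => Err(format!("unknown algorithm {algorithm}")),
    }
}
```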

14. Communication Cost Summary

Per iteration, per edge in the graph:

| Algorithm | Vectors (size n) | Scalars | Total per edge |
|---|---|---|---|
| Consensus Gradient (10) | 1 (x) | 0 | n |
| Adapt-then-Combine (12) | 1 (v) | 0 | n |
| Push-Sum (14) | 1 (x) | 1 (y) | n + 1 |
| Gradient-Push (15) | 1 (x) | 1 (y) | n + 1 |
| Gradient-Tracking (16) | 2 (x, d) | 0 | 2n |
| Gradient-Tracking directed (17) | 2 (x, d) | 1 (y) | 2n + 1 |
| APull-BPush (18) | 2 (x − αd, d) | 0 | 2n |
| Variant (19) | 2 (x, d) | 0 | 2n |

15. Dependency Graph (Implementation Order)

Phase 1: convective-graph additions (PREREQUISITE)
  ├── [G1] NeighborQuery trait + DenseStorage impl
  ├── [G2] DegreeComputable trait + impl
  ├── [G3] LaplacianComputable trait + impl
  └── [G4] SpectralComputable trait + impl (spectral gap)

Phase 2: convective-ml foundation
  ├── [M1] types/vector.rs (DecisionVector, GradientVector, DirectionVector)
  ├── [M2] errors/mod.rs (MlError)
  ├── [M3] objective/traits.rs (ObjectiveFn, LossFunction, Regularizer)
  ├── [M4] objective/loss.rs (LogisticLoss, HingeLoss, SquaredLoss)
  ├── [M5] objective/regularizer.rs (L2Regularizer)
  └── [M6] objective/local.rs (LocalObjective)

Phase 3: convective-ml mixing layer
  ├── [M7] mixing/stochastic.rs (StochasticType)
  ├── [M8] mixing/traits.rs (WeightMatrix, MixingStrategy)
  └── [M9] mixing/builder.rs (EqualNeighborWeight, MetropolisHastings, etc.)
       └── depends on [G1] for neighbor queries

Phase 4: convective-ml agent + step size
  ├── [M10] agent/state.rs (AgentState enum)
  ├── [M11] agent/neighborhood.rs (AgentMessage, IncomingMessages)
  └── [M12] step_size/schedule.rs (Diminishing, Constant, PerAgent)

Phase 5: convective-ml algorithms
  ├── [M13] algorithm/traits.rs (Algorithm trait, AlgorithmRequirements)
  ├── [M14] algorithm/consensus.rs (ConsensusGradient — simplest, implement first)
  ├── [M15] algorithm/push_sum.rs (PushSumConsensus)
  ├── [M16] algorithm/gradient_push.rs (GradientPush)
  ├── [M17] algorithm/gradient_track.rs (GradientTracking)
  └── [M18] algorithm/apull_bpush.rs (APullBPush)

Phase 6: convective-ml solver + convergence
  ├── [M19] convergence/criteria.rs (ConvergenceCriterion)
  ├── [M20] convergence/metrics.rs (ConvergenceMetrics)
  ├── [M21] solver/config.rs (SolverConfig)
  ├── [M22] solver/distributed.rs (DistributedSolver)
  └── [M23] solver/monitor.rs (ConvergenceMonitor)

16. Relation to convective-graph's Existing Design

The design intentionally preserves convective-graph's core principles:

Immutable-after-build graphs: The communication graph G is static throughout the optimization (the paper considers static graphs). The DenseGraph is constructed once and passed immutably to the solver. This aligns perfectly with the existing design.

Matrix-first philosophy: Weight matrices A and B are DMatrix<S> wrappers. The spectral gap of A (an eigenvalue computation) is a core convergence diagnostic. The adjacency matrix is directly used to construct weight matrices. All of this leverages the existing nalgebra integration.

Scalar generics: All types are generic over S: Scalar, inheriting the existing f64/f32 support. Convergence rate constants, step sizes, and tolerances are all S-typed.

NodeId as AgentId: The paper's agent indices i ∈ [m] map directly to NodeId. No new identity type needed — just a type alias.

Graph is not mutated by ML: convective-ml depends on convective-graph read-only. It never adds edges, removes nodes, or modifies the graph structure. This is a clean dependency direction.


17. Future Extensions (Beyond the Paper)

The paper explicitly identifies these open directions, which the architecture should accommodate:

  1. Time-varying graphs: Graph G changes between iterations. The solver could accept a GraphSequence or Fn(k) -> Graph instead of a fixed graph. The Algorithm trait's step method already takes the weight matrix as a parameter, making this straightforward.

  2. Stochastic gradients: Replace ∇fᵢ(x) with a stochastic estimate using a mini-batch. The ObjectiveFn trait could gain a stochastic_gradient(x, rng) method.

  3. Asynchronous updates: Agents don't wait for all others. This requires replacing the synchronous solve() loop with an event-driven runtime (tokio tasks). The AgentState and Algorithm types remain identical.

  4. Quantized communication: Compress messages to reduce bandwidth. This could be implemented as an AgentMessage → CompressedMessage transformation layer.

  5. Constrained optimization: Intersect local constraint sets Xᵢ. Add a projection step after the gradient update: xᵢᵏ⁺¹ = Π_{Xᵢ}(update).

  6. Nonconvex objectives: The paper focuses on convex fᵢ. Extensions to nonconvex problems (e.g., neural networks) would require different convergence criteria and possibly different algorithms.
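For extension 5, the projection step composes cleanly with the existing updates. A hypothetical sketch for box constraints Xᵢ = [lo, hi]ⁿ, whose Euclidean projection is a coordinatewise clamp (the function names are illustrative):

```rust
/// Euclidean projection onto the box [lo, hi]^n: clamp each coordinate.
fn project_box(v: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    v.iter().map(|x| x.clamp(lo, hi)).collect()
}

/// Projected consensus-gradient update for one agent:
/// x_i^{k+1} = Pi_{X_i}(mixed - alpha * grad), per extension 5.
fn projected_update(mixed: &[f64], grad: &[f64], alpha: f64, lo: f64, hi: f64) -> Vec<f64> {
    let stepped: Vec<f64> = mixed.iter().zip(grad).map(|(m, g)| m - alpha * g).collect();
    project_box(&stepped, lo, hi)
}
```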

About

Distributed Convex Methods for Financial Machine Learning
