Reference: Nedić, "Distributed Gradient Methods for Convex Machine Learning Problems in Networks" (IEEE SPM, 2020)
The paper presents a unified framework for distributed first-order optimization over networks. The core problem is:
min_{x ∈ ℝⁿ} (1/m) Σᵢ₌₁ᵐ fᵢ(x)
where m agents are embedded in a communication graph G = ([m], E), each agent i owns a private local objective fᵢ: ℝⁿ → ℝ, and no agent has access to the global objective or to any other agent's data. The algorithms rely exclusively on local gradient computation and one-hop neighbor communication.
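To make the setup concrete, here is a toy numeric instance (illustrative only, not part of either crate): with scalar quadratics fᵢ(x) = (x − cᵢ)², the global objective (1/m) Σᵢ fᵢ is minimized at the mean of the cᵢ — a value no single agent can compute from its private data alone.

```rust
// Toy instance of min_x (1/m) Σ f_i(x) with f_i(x) = (x - c_i)^2.
// Each agent i privately holds c_i; the global minimizer is the mean of all c_i.
fn global_objective(targets: &[f64], x: f64) -> f64 {
    let m = targets.len() as f64;
    targets.iter().map(|c| (x - c).powi(2)).sum::<f64>() / m
}

fn global_minimizer(targets: &[f64]) -> f64 {
    targets.iter().sum::<f64>() / targets.len() as f64
}

fn main() {
    let targets = [1.0, 3.0, 8.0]; // c_i held by agents 1..3
    let x_star = global_minimizer(&targets);
    // x* beats each agent's private minimizer c_i on the global objective.
    for &c in &targets {
        assert!(global_objective(&targets, x_star) <= global_objective(&targets, c));
    }
    println!("x* = {x_star}");
}
```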
The paper covers six distinct algorithms across two graph classes (undirected and directed), organized into the following families:
| Family | Undirected Graph | Directed Graph |
|---|---|---|
| Consensus-based gradient | Eq. (10): Consensus + gradient | Eq. (14): Push–sum consensus |
| Gradient–push | — | Eq. (15): Ratio consensus + gradient |
| Gradient-tracking | Eq. (16): EXTRA / DIGing style | Eq. (17): Directed gradient-tracking |
| APull–BPush | — | Eq. (18): Dual-matrix gradient-tracking |
Plus variant Eq. (19) from [33] and the connection to Eq. (12) (adapt-then-combine variant of (10)).
Each algorithm has specific requirements on the weight matrix (row stochastic, column stochastic, doubly stochastic), step-size schedule (diminishing vs. constant), and convergence rate guarantees.
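These per-algorithm matrix requirements can be captured as one total lookup — a compact preview of what the AlgorithmRequirements struct later encodes. The enum names below are illustrative placeholders, not the crate's final API.

```rust
// Sketch: the table's weight-matrix column as a total function.
// Names are placeholders, not the final convective-ml API.
#[derive(Debug, PartialEq)]
enum Stochasticity { Row, Column, Doubly }

enum Method {
    ConsensusGradient, // Eq. (10)/(12): undirected, doubly stochastic A
    PushSum,           // Eq. (14): directed, column stochastic A
    GradientPush,      // Eq. (15): directed, column stochastic A
    GradientTracking,  // Eq. (16): undirected, doubly stochastic A
    DirectedTracking,  // Eq. (17): directed, column stochastic A
    APullBPush,        // Eq. (18): row stochastic A plus column stochastic B
}

/// Requirement on the primary matrix A, and on a secondary matrix B if one is needed.
fn matrix_requirements(m: &Method) -> (Stochasticity, Option<Stochasticity>) {
    match m {
        Method::ConsensusGradient | Method::GradientTracking => (Stochasticity::Doubly, None),
        Method::PushSum | Method::GradientPush | Method::DirectedTracking => {
            (Stochasticity::Column, None)
        }
        Method::APullBPush => (Stochasticity::Row, Some(Stochasticity::Column)),
    }
}

fn main() {
    assert_eq!(
        matrix_requirements(&Method::APullBPush),
        (Stochasticity::Row, Some(Stochasticity::Column))
    );
}
```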
The implementation spans two crates with clear responsibilities:
┌─────────────────────────────────────────────────────────────────────────┐
│ convective-ml │
│ │
│ ┌──────────────┐ ┌───────────────────┐ ┌──────────────────────────┐ │
│ │ objective │ │ agent │ │ solver │ │
│ │ │ │ │ │ │ │
│ │ ObjectiveFn │ │ Agent<S,O,Alg> │ │ DistributedSolver │ │
│ │ LossFunction│ │ AgentState<S> │ │ SolverConfig │ │
│ │ Regularizer │ │ Neighborhood │ │ StepSizeSchedule │ │
│ │ LocalObj<S> │ │ │ │ ConvergenceMonitor │ │
│ └──────────────┘ └───────────────────┘ └──────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌───────────────────┐ ┌──────────────────────────┐ │
│ │ algorithm │ │ mixing │ │ convergence │ │
│ │ │ │ │ │ │ │
│ │ Algorithm │ │ MixingStrategy │ │ ConvergenceCriterion │ │
│ │ Consensus │ │ WeightMatrix │ │ IterationLog │ │
│ │ PushSum │ │ StochasticType │ │ DisagreementMetric │ │
│ │ GradPush │ │ MatrixBuilder │ │ │ │
│ │ GradTrack │ │ │ │ │ │
│ │ APullBPush │ │ │ │ │ │
│ └──────────────┘ └───────────────────┘ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ types + errors │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────────────┤
│ convective-graph │
│ │
│ ┌─────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ traits::alge │ │ traits::spectral │ │ traits::neighbor │ │
│ │ braic │ │ │ │ │ │
│ │ │ │ SpectralComputable│ │ NeighborQuery │ │
│ │ DegreeComput. │ │ spectral_gap() │ │ neighbors(i) │ │
│ │ LaplacianComp. │ │ fiedler_vector() │ │ in_neighbors(i) │ │
│ │ MatrixRepr. │ │ │ │ out_neighbors(i) │ │
│ └─────────────────┘ └────────────────────┘ └────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Graph<S,Dir> │ │ DenseStorage<S> │ │ GraphBuilder │ │
│ │ GraphBase │ │ (future: Sparse) │ │ TOML I/O │ │
│ │ NodeIterable │ │ │ │ │ │
│ │ EdgeIterable │ │ │ │ │ │
│ └─────────────────┘ └────────────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Dependency direction: convective-ml depends on convective-graph, never the reverse.
Before any convective-ml code can be written, the following must exist in convective-graph. These are the Phase 2 items directly required by the Nedić algorithms.
Every algorithm in the paper requires agents to know their one-hop neighbors. The current crate has no per-node neighbor query.
// convective-graph/src/traits/neighbor.rs
/// Neighbor query capabilities for graph types.
pub trait NeighborQuery: GraphBase {
/// Returns the set of neighbors of node `i` in an undirected graph.
/// For directed graphs, this returns out-neighbors.
fn neighbors(&self, node: NodeId) -> Vec<NodeId>;
/// Returns the in-neighbors: { j | (j, i) ∈ E }.
/// Paper notation: N_i^{in} = { j | (j, i) ∈ E }
fn in_neighbors(&self, node: NodeId) -> Vec<NodeId>;
/// Returns the out-neighbors: { j | (i, j) ∈ E }.
/// Paper notation: N_i^{out} = { j | (i, j) ∈ E }
fn out_neighbors(&self, node: NodeId) -> Vec<NodeId>;
/// Returns the degree (number of neighbors) for undirected graphs,
/// or out-degree for directed graphs.
fn degree(&self, node: NodeId) -> usize;
/// Returns the in-degree of a node.
fn in_degree(&self, node: NodeId) -> usize;
/// Returns the out-degree of a node.
fn out_degree(&self, node: NodeId) -> usize;
}

Required for spectral analysis of mixing matrices and convergence rate characterization.
// convective-graph/src/traits/algebraic.rs
use nalgebra::{DMatrix, DVector};
/// Degree computation from the adjacency matrix.
pub trait DegreeComputable: GraphBase {
/// Returns the degree vector d where d_i = Σ_j A[i,j].
fn degree_vector(&self) -> DVector<Self::S>;
/// Returns the diagonal degree matrix D = diag(d).
fn degree_matrix(&self) -> DMatrix<Self::S>;
/// Returns the in-degree vector for directed graphs.
fn in_degree_vector(&self) -> DVector<Self::S>;
/// Returns the out-degree vector for directed graphs.
fn out_degree_vector(&self) -> DVector<Self::S>;
}
/// Laplacian matrix computations.
pub trait LaplacianComputable: DegreeComputable {
/// Combinatorial Laplacian: L = D - A
fn laplacian(&self) -> DMatrix<Self::S>;
/// Symmetric normalized Laplacian: L_sym = I - D^{-1/2} A D^{-1/2}
/// Used in spectral graph theory and GCN-style methods.
fn normalized_laplacian(&self) -> DMatrix<Self::S>;
/// Random-walk Laplacian: L_rw = I - D^{-1} A
fn random_walk_laplacian(&self) -> DMatrix<Self::S>;
}

Required for convergence analysis — the spectral gap of the mixing matrix (determined by λ₂) controls the convergence rate of all consensus-based methods.
// convective-graph/src/traits/spectral.rs
use nalgebra::{DMatrix, DVector};
/// Spectral properties of graph matrices.
pub trait SpectralComputable: LaplacianComputable {
/// Eigenvalues of the Laplacian, sorted ascending.
fn laplacian_eigenvalues(&self) -> DVector<Self::S>;
/// The algebraic connectivity (second-smallest Laplacian eigenvalue λ₂).
/// Controls convergence rate of consensus: larger λ₂ → faster mixing.
fn algebraic_connectivity(&self) -> Self::S;
/// The Fiedler vector (eigenvector corresponding to λ₂).
/// Used for graph partitioning.
fn fiedler_vector(&self) -> DVector<Self::S>;
/// Spectral gap of a given stochastic matrix: 1 - |λ₂(A)|.
/// Directly controls convergence rate of distributed methods.
fn spectral_gap(&self, matrix: &DMatrix<Self::S>) -> Self::S;
}

convective-ml/
├── Cargo.toml
├── src/
│ ├── lib.rs
│ ├── types/
│ │ ├── mod.rs
│ │ └── vector.rs # DecisionVector<S>, gradient vectors
│ ├── objective/
│ │ ├── mod.rs
│ │ ├── traits.rs # ObjectiveFn, LossFunction, Regularizer
│ │ ├── loss.rs # LogisticLoss, HingeLoss, SquaredLoss
│ │ ├── regularizer.rs # L2Regularizer, L1Regularizer
│ │ └── local.rs # LocalObjective<S> combining loss + regularizer
│ ├── mixing/
│ │ ├── mod.rs
│ │ ├── traits.rs # MixingStrategy, WeightMatrix
│ │ ├── stochastic.rs # StochasticType, stochasticity validation
│ │ ├── builder.rs # WeightMatrixBuilder (from graph topology)
│ │ └── properties.rs # spectral gap, mixing time analysis
│ ├── agent/
│ │ ├── mod.rs
│ │ ├── state.rs # AgentState<S>, dual variables (x, y, d, z)
│ │ └── neighborhood.rs # Neighborhood view, message types
│ ├── step_size/
│ │ ├── mod.rs
│ │ └── schedule.rs # StepSizeSchedule, Diminishing, Constant
│ ├── algorithm/
│ │ ├── mod.rs
│ │ ├── traits.rs # Algorithm, AlgorithmRequirements
│ │ ├── consensus.rs # ConsensusGradient (Eq. 10, 12)
│ │ ├── push_sum.rs # PushSumConsensus (Eq. 14)
│ │ ├── gradient_push.rs # GradientPush (Eq. 15)
│ │ ├── gradient_track.rs # GradientTracking (Eq. 16, 17)
│ │ └── apull_bpush.rs # APullBPush (Eq. 18, 19)
│ ├── solver/
│ │ ├── mod.rs
│ │ ├── config.rs # SolverConfig
│ │ ├── distributed.rs # DistributedSolver orchestration
│ │ └── monitor.rs # ConvergenceMonitor, IterationLog
│ ├── convergence/
│ │ ├── mod.rs
│ │ ├── criteria.rs # ConvergenceCriterion enum
│ │ └── metrics.rs # Disagreement, OptimalityGap, etc.
│ └── errors/
│ └── mod.rs # MlError enum
The paper operates on decision vectors xᵢ ∈ ℝⁿ. We wrap nalgebra::DVector<S> to provide domain-specific semantics.
// types/vector.rs
use convective_graph::types::Scalar;
use nalgebra::DVector;
/// A decision vector in ℝⁿ, the primary variable each agent maintains.
/// Paper notation: xᵢᵏ ∈ ℝⁿ — agent i's iterate at iteration k.
#[derive(Clone, Debug)]
pub struct DecisionVector<S: Scalar> {
inner: DVector<S>,
}
/// A gradient vector ∇fᵢ(xᵢᵏ) ∈ ℝⁿ.
/// Semantically distinct from DecisionVector to prevent accidental mixing.
#[derive(Clone, Debug)]
pub struct GradientVector<S: Scalar> {
inner: DVector<S>,
}
/// A direction vector dᵢᵏ ∈ ℝⁿ used in gradient-tracking algorithms.
/// Tracks an estimate of the global gradient direction.
/// Paper: dᵢ⁰ = ∇fᵢ(xᵢ⁰), then updated via mixing + gradient corrections.
#[derive(Clone, Debug)]
pub struct DirectionVector<S: Scalar> {
inner: DVector<S>,
}

All three types impl Deref<Target = DVector<S>> for transparent nalgebra interop, plus arithmetic ops (Add, Sub, Mul<S>) and convex_combination(weights, vectors).
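As a sketch of the convex_combination helper mentioned above (plain Vec<f64> stands in for the nalgebra-backed wrappers to keep the example dependency-free):

```rust
/// Sketch of convex_combination: Σ_j w_j v_j with w_j ≥ 0 and Σ_j w_j = 1.
/// Plain Vec<f64> stands in for DecisionVector<S>; the real helper would
/// operate on the nalgebra-backed wrapper types.
fn convex_combination(weights: &[f64], vectors: &[Vec<f64>]) -> Vec<f64> {
    assert_eq!(weights.len(), vectors.len());
    assert!(weights.iter().all(|&w| w >= 0.0));
    assert!((weights.iter().sum::<f64>() - 1.0).abs() < 1e-12);
    let n = vectors[0].len();
    let mut out = vec![0.0; n];
    for (w, v) in weights.iter().zip(vectors) {
        assert_eq!(v.len(), n);
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
    out
}

fn main() {
    // The consensus step v_i = Σ_j a_ij x_j is exactly such a combination.
    let mixed = convex_combination(
        &[0.5, 0.25, 0.25],
        &[vec![4.0, 0.0], vec![0.0, 4.0], vec![0.0, 4.0]],
    );
    assert_eq!(mixed, vec![2.0, 2.0]);
}
```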
// agent/mod.rs
use convective_graph::types::NodeId;
/// An agent's identity in the network, directly mapped to a graph node.
/// Paper notation: agent i ∈ [m] = {1, 2, ..., m}.
/// Implementation: zero-indexed via NodeId.
pub type AgentId = NodeId;

The paper defines the local objective as fᵢ(x) = (m/p) Σ_{s ∈ Sᵢ} (cρ(x) + ℓ(x; zₛ, yₛ)) (Eq. 5), which decomposes into a regularization term and a sum of per-sample losses.
// objective/traits.rs
use crate::types::{DecisionVector, GradientVector};
use convective_graph::types::Scalar;
use nalgebra::DVector;
/// A differentiable scalar-valued function f: ℝⁿ → ℝ.
/// Root trait for all objective functions in the optimization framework.
pub trait ObjectiveFn<S: Scalar> {
/// Evaluate f(x).
fn value(&self, x: &DecisionVector<S>) -> S;
/// Compute ∇f(x).
fn gradient(&self, x: &DecisionVector<S>) -> GradientVector<S>;
/// Dimension of the domain ℝⁿ.
fn dimension(&self) -> usize;
}
/// A loss function ℓ(x; z, y) parameterized by a data point (z, y).
/// Paper: ℓ(x; z, y) where z ∈ ℝⁿ is a feature, y ∈ {-1, +1} is a label.
pub trait LossFunction<S: Scalar> {
/// Evaluate ℓ(x; z, y).
fn loss(&self, x: &DecisionVector<S>, feature: &DVector<S>, label: S) -> S;
/// Compute ∇_x ℓ(x; z, y).
fn loss_gradient(
&self,
x: &DecisionVector<S>,
feature: &DVector<S>,
label: S,
) -> GradientVector<S>;
}
/// A convex regularization function ρ: ℝⁿ → ℝ.
/// Paper: ρ(x) is strongly convex (e.g., Euclidean norm squared).
pub trait Regularizer<S: Scalar> {
/// Evaluate cρ(x) where c > 0 is the regularization parameter.
fn penalty(&self, x: &DecisionVector<S>) -> S;
/// Compute c∇ρ(x).
fn penalty_gradient(&self, x: &DecisionVector<S>) -> GradientVector<S>;
/// The strong convexity parameter of ρ.
/// Used in convergence rate analysis.
fn strong_convexity_param(&self) -> S;
}
/// Lipschitz continuity certificate.
/// Paper: f has Lipschitz continuous gradients with constant L if
/// ‖∇f(x) - ∇f(y)‖ ≤ L‖x - y‖ for all x, y ∈ ℝⁿ.
pub trait LipschitzContinuous<S: Scalar> {
/// The Lipschitz constant L of the gradient.
fn lipschitz_constant(&self) -> S;
}

// objective/loss.rs
/// Logistic regression loss: ℓ(x; z, y) = log(1 + e^{-y⟨x,z⟩}).
/// Paper: "For linear classifiers, a common choice is the logistic regression
/// loss function."
pub struct LogisticLoss;
/// Hinge loss: ℓ(x; z, y) = max{0, 1 - y⟨x, z⟩}.
/// Paper: "Another common choice is the hinge-loss function,
/// giving rise to the maximum margin (linear) classifier problem."
/// Note: Not differentiable at 1 - y⟨x,z⟩ = 0. Requires subgradient.
pub struct HingeLoss;
/// Squared error loss: ℓ(x; z, y) = (1/2)(y - ⟨x, z⟩)².
/// Standard regression loss, included for completeness.
pub struct SquaredLoss;

// objective/regularizer.rs
/// L2 (Tikhonov) regularization: ρ(x) = (1/2)‖x‖².
/// ∇ρ(x) = x. Strongly convex with parameter 1.
/// Paper: "the regularizing function ρ(x) is a strongly convex function
/// (such as the Euclidean norm)."
pub struct L2Regularizer<S: Scalar> {
/// Regularization coefficient c > 0.
pub coefficient: S,
}
/// L1 regularization: ρ(x) = ‖x‖₁.
/// Promotes sparsity. Convex but not strongly convex, not differentiable at 0.
/// Requires proximal operator or subgradient methods.
pub struct L1Regularizer<S: Scalar> {
pub coefficient: S,
}

// objective/local.rs
/// The local objective function for agent i.
///
/// Paper Eq. (5): fᵢ(x) = (m/p) Σ_{s ∈ Sᵢ} (cρ(x) + ℓ(x; zₛ, yₛ))
///
/// Each agent owns a private dataset Sᵢ of (feature, label) pairs and
/// a loss function. The regularizer is shared across agents (same c, ρ).
///
/// This struct implements ObjectiveFn<S>, so it can compute fᵢ(x) and ∇fᵢ(x).
pub struct LocalObjective<S, L, R>
where
S: Scalar,
L: LossFunction<S>,
R: Regularizer<S>,
{
/// The loss function (shared type, e.g., LogisticLoss).
loss: L,
/// The regularizer (e.g., L2Regularizer).
regularizer: R,
/// Agent's private dataset: Vec of (feature_vector, label).
data: Vec<DataPoint<S>>,
/// Total number of data points across ALL agents (p in the paper).
total_data_count: usize,
/// Number of agents (m in the paper).
agent_count: usize,
}
/// A single labeled data point (zₛ, yₛ).
pub struct DataPoint<S: Scalar> {
/// Feature vector z ∈ ℝⁿ.
pub feature: DVector<S>,
/// Label y (typically ∈ {-1, +1} for classification).
pub label: S,
}

LocalObjective implements ObjectiveFn<S> with:
value(x)    = (m/p) Σ_{s ∈ Sᵢ} [c·ρ(x) + ℓ(x; zₛ, yₛ)]
gradient(x) = (m/p) Σ_{s ∈ Sᵢ} [c·∇ρ(x) + ∇ₓℓ(x; zₛ, yₛ)]
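A dependency-free sketch of these two formulas, using the squared loss and L2 regularizer from above. The LocalObjSketch struct is hypothetical and uses plain Vec<f64> in place of the crate's vector wrappers:

```rust
// Sketch of Eq. (5) evaluation with squared loss ℓ(x; z, y) = ½(y − ⟨x,z⟩)²
// and L2 regularizer cρ(x) = (c/2)‖x‖². LocalObjSketch is illustrative only.
struct LocalObjSketch {
    data: Vec<(Vec<f64>, f64)>, // (z_s, y_s) pairs in S_i
    c: f64,                     // regularization coefficient c > 0
    m: usize,                   // number of agents (m in the paper)
    p: usize,                   // total data points across all agents (p)
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

impl LocalObjSketch {
    /// f_i(x) = (m/p) Σ_{s ∈ S_i} [ (c/2)‖x‖² + ½(y_s − ⟨x, z_s⟩)² ]
    fn value(&self, x: &[f64]) -> f64 {
        let scale = self.m as f64 / self.p as f64;
        let reg = 0.5 * self.c * dot(x, x); // cρ(x), added once per sample
        scale * self.data.iter()
            .map(|(z, y)| reg + 0.5 * (y - dot(x, z)).powi(2))
            .sum::<f64>()
    }

    /// ∇f_i(x) = (m/p) Σ_{s ∈ S_i} [ c·x − (y_s − ⟨x, z_s⟩)·z_s ]
    fn gradient(&self, x: &[f64]) -> Vec<f64> {
        let scale = self.m as f64 / self.p as f64;
        let mut g = vec![0.0; x.len()];
        for (z, y) in &self.data {
            let residual = y - dot(x, z);
            for k in 0..x.len() {
                g[k] += scale * (self.c * x[k] - residual * z[k]);
            }
        }
        g
    }
}

fn main() {
    let obj = LocalObjSketch { data: vec![(vec![1.0, 0.0], 2.0)], c: 0.0, m: 2, p: 4 };
    // x = (2, 0) fits the single sample exactly: zero value, zero gradient.
    assert_eq!(obj.value(&[2.0, 0.0]), 0.0);
    assert_eq!(obj.gradient(&[2.0, 0.0]), vec![0.0, 0.0]);
}
```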
This is the bridge between convective-graph and convective-ml. The paper's algorithms require weight matrices A (and sometimes B) that are compatible with the graph structure and satisfy specific stochasticity properties.
// mixing/stochastic.rs
/// Classification of a nonnegative matrix's stochastic properties.
/// Paper: different algorithms require different stochastic structures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StochasticType {
/// Each row sums to 1: Σⱼ aᵢⱼ = 1 for all i.
/// Required by: Eq. (18) A-matrix (APull–BPush).
RowStochastic,
/// Each column sums to 1: Σᵢ aᵢⱼ = 1 for all j.
/// Required by: Eq. (13) for directed graphs, Eq. (17) A-matrix,
/// Eq. (18) B-matrix (APull–BPush).
ColumnStochastic,
/// Both row and column stochastic: Σⱼ aᵢⱼ = 1 AND Σᵢ aᵢⱼ = 1.
/// Required by: Eq. (8)/(10) for undirected graphs, Eq. (16).
DoublyStochastic,
}

// mixing/traits.rs
use super::stochastic::StochasticType;
use crate::errors::MlError;
use convective_graph::graph::DenseGraph;
use convective_graph::types::{Scalar, EdgeDirection, NodeId};
use nalgebra::DMatrix;
/// A weight matrix A compatible with graph G's structure.
///
/// Paper Eq. (8): aᵢⱼ > 0 iff {i,j} ∈ E or i = j, aᵢⱼ = 0 otherwise.
/// The matrix is nonneg, entries respect graph topology, and satisfies
/// a specific stochastic property depending on the algorithm.
pub struct WeightMatrix<S: Scalar> {
/// The m×m weight matrix.
matrix: DMatrix<S>,
/// Which stochastic property this matrix satisfies.
stochastic_type: StochasticType,
/// The spectral gap: 1 - |λ₂(A)|. Larger → faster mixing.
spectral_gap: Option<S>,
}
/// Trait for types that can produce a weight matrix from a graph.
///
/// Different algorithms need different stochastic types, so the builder
/// is parameterized by the target property.
pub trait MixingStrategy<S: Scalar> {
/// Build a weight matrix compatible with graph G.
fn build_weight_matrix<Dir: EdgeDirection>(
&self,
graph: &DenseGraph<S, Dir>,
target: StochasticType,
) -> Result<WeightMatrix<S>, MlError>;
}

The paper describes several construction strategies. Each becomes a struct implementing MixingStrategy.
// mixing/builder.rs
/// Equal-weight (Metropolis-like) rule.
/// Paper: "A common choice is to let all of the values aᵢⱼ, i ∈ N_j^{out} ∪ {j},
/// be the same and equal to the inverse of the cardinality of the set N_j^{out} ∪ {j}."
///
/// For undirected: aᵢⱼ = 1/(|Nᵢ| + 1) for j ∈ Nᵢ ∪ {i}
/// Produces doubly stochastic matrix on regular graphs,
/// column stochastic on directed (equal out-weight).
pub struct EqualNeighborWeight;
/// Metropolis–Hastings weight rule.
/// Produces doubly stochastic matrices for undirected graphs.
/// aᵢⱼ = 1 / (1 + max(dᵢ, dⱼ)) for {i,j} ∈ E
/// aᵢᵢ = 1 - Σ_{j≠i} aᵢⱼ
pub struct MetropolisHastings;
/// Maximum-degree weight rule.
/// aᵢⱼ = 1/d_max for j ∈ Nᵢ, aᵢᵢ = 1 - dᵢ/d_max.
/// Doubly stochastic for undirected graphs.
pub struct MaxDegreeWeight;
/// Lazy random walk weights (for column stochastic on directed graphs).
/// Used in push-sum methods.
/// aᵢⱼ = 1 / (|N_j^{out}| + 1) for i ∈ N_j^{out} ∪ {j}
pub struct LazyColumnStochastic;

impl<S: Scalar> WeightMatrix<S> {
/// Validate that A is compatible with graph G's structure.
/// Checks: aᵢⱼ > 0 only if {i,j} ∈ E or i = j.
pub fn validate_compatibility<Dir: EdgeDirection>(
&self,
graph: &DenseGraph<S, Dir>,
) -> Result<(), MlError>;
/// Validate the stochastic property.
pub fn validate_stochasticity(&self, tolerance: S) -> Result<StochasticType, MlError>;
/// Compute spectral gap: 1 - |λ₂(A)|.
/// Paper: controls convergence rate of all consensus-based methods.
pub fn compute_spectral_gap(&mut self) -> S;
/// Check that all entries are nonneg and positive where required.
pub fn validate_positivity(&self) -> Result<(), MlError>;
/// Access the raw matrix for use in algorithm update steps.
pub fn as_matrix(&self) -> &DMatrix<S>;
/// Get aᵢⱼ — the weight agent i assigns to agent j.
pub fn weight(&self, i: AgentId, j: AgentId) -> S;
}

Each agent maintains iteration-specific state. The shape of this state varies by algorithm:
// agent/state.rs
/// State maintained by a single agent during distributed optimization.
///
/// Different algorithms require different auxiliary variables:
///
/// Consensus gradient (10): xᵢᵏ only
/// Push-sum (14): xᵢᵏ, yᵢᵏ (scalar normalizer)
/// Gradient-push (15): xᵢᵏ, yᵢᵏ, zᵢᵏ = xᵢᵏ/yᵢᵏ
/// Gradient-tracking (16,17): xᵢᵏ, dᵢᵏ (direction tracker)
/// APull-BPush (18): xᵢᵏ, dᵢᵏ (with dual matrices A, B)
///
/// Rather than one monolithic state, we use an enum that carries
/// exactly what each algorithm needs.
#[derive(Clone, Debug)]
pub enum AgentState<S: Scalar> {
/// For consensus-based gradient method (Eq. 10, 12).
Consensus {
/// Decision variable xᵢᵏ ∈ ℝⁿ.
x: DecisionVector<S>,
},
/// For push-sum consensus on directed graphs (Eq. 14).
PushSum {
/// Decision variable xᵢᵏ ∈ ℝⁿ.
x: DecisionVector<S>,
/// Scalar normalizer yᵢᵏ ∈ ℝ, initialized to 1.
y: S,
},
/// For gradient-push method (Eq. 15).
GradientPush {
/// Decision variable xᵢᵏ ∈ ℝⁿ.
x: DecisionVector<S>,
/// Scalar normalizer yᵢᵏ ∈ ℝ, initialized to 1.
y: S,
/// Ratio variable zᵢᵏ = xᵢᵏ/yᵢᵏ — the actual estimate.
z: DecisionVector<S>,
},
/// For gradient-tracking methods (Eq. 16, 17, 18, 19).
GradientTracking {
/// Decision variable xᵢᵏ ∈ ℝⁿ.
x: DecisionVector<S>,
/// Direction tracker dᵢᵏ ∈ ℝⁿ, initialized to ∇fᵢ(xᵢ⁰).
d: DirectionVector<S>,
},
}

Agents exchange specific data with neighbors depending on the algorithm.
// agent/neighborhood.rs
/// The data an agent sends to its neighbors in a single communication round.
///
/// Paper: "every agent i sends its current iterate xᵢᵏ to its neighbors
/// j ∈ Nᵢ and receives xⱼᵏ from its neighbors j ∈ Nᵢ."
#[derive(Clone, Debug)]
pub enum AgentMessage<S: Scalar> {
/// Consensus methods: send xᵢᵏ.
Decision(DecisionVector<S>),
    /// Push-sum methods: send aⱼᵢ·xᵢᵏ and aⱼᵢ·yᵢᵏ.
    /// Per Eq. (14), each agent j sends aᵢⱼxⱼᵏ and aᵢⱼyⱼᵏ to each of
    /// its out-neighbors i ∈ N_j^{out}.
PushSumPair {
weighted_x: DecisionVector<S>,
weighted_y: S,
},
/// Gradient-tracking: send xᵢᵏ and dᵢᵏ.
/// Paper Eq. (16): agents exchange "x and d vectors with their neighbors."
DecisionAndDirection {
x: DecisionVector<S>,
d: DirectionVector<S>,
},
/// APull-BPush: pull step sends (xⱼᵏ - αⱼdⱼᵏ), push step sends bₗᵢdᵢᵏ.
    /// Paper Eq. (18): "each agent i obtains xⱼᵏ - αⱼdⱼᵏ from its in-neighbors
    /// j ∈ N_i^{in}, and sends bₗᵢdᵢᵏ to each of its out-neighbors ℓ ∈ N_i^{out}."
APullBPush {
/// For A-pull: xⱼ - αⱼdⱼ (pre-combined by sender).
pull_value: DecisionVector<S>,
/// For B-push: bₗᵢdᵢ (weighted direction).
push_direction: DirectionVector<S>,
},
}
/// The collected messages an agent has received from its neighborhood.
pub struct IncomingMessages<S: Scalar> {
/// Messages indexed by sender AgentId.
pub messages: Vec<(AgentId, AgentMessage<S>)>,
}

The paper treats step-size selection as a first-class concern affecting convergence.
// step_size/schedule.rs
/// Step-size schedule producing αₖ > 0 at each iteration k.
///
/// Paper: "some positive step size αₖ" with convergence requiring
/// Σ αₖ = ∞ and Σ αₖ² < ∞ for diminishing schedules, or
/// constant α for gradient-tracking methods.
pub trait StepSizeSchedule<S: Scalar>: Clone {
/// Returns αₖ for iteration k (0-indexed).
fn step_size(&self, k: usize) -> S;
/// Whether this schedule satisfies Σ αₖ = ∞ (necessary for convergence
/// of consensus-based methods with diminishing step size).
fn is_divergent_sum(&self) -> bool;
/// Whether this schedule satisfies Σ αₖ² < ∞ (necessary for convergence
/// of consensus-based methods with diminishing step size).
fn is_square_summable(&self) -> bool;
}
/// Diminishing step size: αₖ = a / k^b where a > 0, b ∈ (1/2, 1].
///
/// Paper: "These conditions are satisfied, for example, with step size
/// αₖ = 1/k, or more generally with the step size of the form
/// αₖ = a/k^b, where a > 0 is arbitrary while b ∈ (1/2, 1]."
///
/// Note: the paper indexes iterations from k = 1; a 0-indexed implementation
/// should evaluate a/(k+1)^b to avoid division by zero at k = 0.
///
/// Convergence rate: O(log k / √k) for consensus-based methods.
#[derive(Clone, Debug)]
pub struct Diminishing<S: Scalar> {
/// Numerator coefficient a > 0.
pub a: S,
    /// Exponent b ∈ (1/2, 1].
pub b: S,
}
/// Constant step size: αₖ = α for all k.
///
/// Paper: "under some additional assumptions on the objective function,
/// the gradient method with a suitably chosen constant step size α > 0
/// produces a sequence {xᵏ} converging to the solution."
///
/// Used by gradient-tracking methods (Eq. 16, 17, 18, 19) to achieve
/// geometric/linear convergence rate O(qᵏ) where q ∈ (0, 1).
#[derive(Clone, Debug)]
pub struct Constant<S: Scalar> {
/// The fixed step size α > 0.
pub alpha: S,
}
/// Per-agent step sizes: each agent i uses its own αᵢ.
///
/// Paper Eq. (19): "every agent uses its own step size αᵢ, which need
/// not be coordinated with the step sizes of the other agents."
#[derive(Clone, Debug)]
pub struct PerAgent<S: Scalar> {
/// Step size for each agent, indexed by agent ID.
pub alphas: Vec<S>,
}

// algorithm/traits.rs
use crate::agent::neighborhood::IncomingMessages;
use crate::agent::state::AgentState;
use crate::mixing::{StochasticType, WeightMatrix};
use crate::objective::traits::ObjectiveFn;
use crate::step_size::schedule::StepSizeSchedule;
use crate::types::DecisionVector;
use convective_graph::graph::DenseGraph;
use convective_graph::types::{EdgeDirection, NodeId, Scalar};
/// Requirements that an algorithm imposes on the weight matrix and graph.
pub struct AlgorithmRequirements {
/// Required stochastic property of the primary weight matrix A.
pub matrix_a: StochasticType,
/// Required stochastic property of the secondary matrix B (if needed).
pub matrix_b: Option<StochasticType>,
/// Whether the graph must be undirected.
pub requires_undirected: bool,
/// Whether the graph must be (strongly) connected.
pub requires_connected: bool,
/// Whether a diminishing step size is required (vs. constant allowed).
pub requires_diminishing_step: bool,
/// Communication cost per iteration: how many vectors of size n
/// each agent sends per edge per iteration.
pub vectors_per_edge: usize,
/// Whether scalar auxiliary variables are also exchanged.
pub scalar_per_edge: bool,
}
/// A distributed optimization algorithm.
///
/// Each algorithm is a stateless description of an update rule.
/// The actual state lives in `AgentState` values managed by the solver.
///
/// Type parameters:
/// S — scalar type (f64, f32)
pub trait Algorithm<S: Scalar>: Clone {
/// Human-readable name (e.g., "Consensus-Based Gradient Method").
fn name(&self) -> &'static str;
/// The structural requirements this algorithm imposes.
fn requirements(&self) -> AlgorithmRequirements;
/// Initialize agent i's state at iteration 0.
///
/// Paper: "Starting with arbitrary initial variables xᵢ⁰, i ∈ [m]"
/// Some algorithms have additional initialization requirements
/// (e.g., yᵢ⁰ = 1, dᵢ⁰ = ∇fᵢ(xᵢ⁰)).
fn initialize<O: ObjectiveFn<S>>(
&self,
agent_id: NodeId,
x_init: DecisionVector<S>,
objective: &O,
) -> AgentState<S>;
/// Execute one iteration of the algorithm for agent i.
///
/// This computes xᵢᵏ⁺¹ (and any auxiliary variables) given:
/// - The agent's current state (xᵢᵏ, etc.)
/// - Received neighbor states (already weighted by the mixing matrix)
/// - The agent's local objective function
/// - The current step size αₖ
///
/// Returns the new agent state at iteration k+1.
fn step<O: ObjectiveFn<S>>(
&self,
agent_id: NodeId,
state: &AgentState<S>,
neighbors: &IncomingMessages<S>,
weight_matrix: &WeightMatrix<S>,
objective: &O,
step_size: S,
iteration: usize,
) -> AgentState<S>;
/// Extract the decision variable from the agent state.
/// For most algorithms this is just xᵢᵏ, but for gradient-push
/// it is zᵢᵏ = xᵢᵏ/yᵢᵏ.
fn decision_variable(&self, state: &AgentState<S>) -> DecisionVector<S>;
}

// algorithm/consensus.rs
/// Consensus-based gradient method for undirected graphs.
///
/// Paper Eq. (10):
/// vᵢᵏ = Σⱼ₌₁ᵐ aᵢⱼ xⱼᵏ (mixing/consensus step)
/// xᵢᵏ⁺¹ = vᵢᵏ - αₖ ∇fᵢ(vᵢᵏ) (gradient step)
///
/// Requirements:
/// - Weight matrix A must be doubly stochastic
/// - Graph G must be connected and undirected
/// - Step size: diminishing (Σαₖ = ∞, Σαₖ² < ∞) for convergence
///
/// Convergence rate: O(log k / √k) with diminishing step size.
///
/// Also known as "combine-then-adapt" strategy.
///
/// Variant — "adapt-then-combine" (Eq. 12):
/// vᵢᵏ = xᵢᵏ - αₖ ∇fᵢ(xᵢᵏ) (gradient step first)
/// xᵢᵏ⁺¹ = Σⱼ₌₁ᵐ aᵢⱼ vⱼᵏ (mixing step after)
#[derive(Clone, Debug)]
pub struct ConsensusGradient {
/// Which ordering to use.
pub variant: ConsensusVariant,
}
#[derive(Clone, Debug, Copy, PartialEq, Eq)]
pub enum ConsensusVariant {
/// Eq. (10): mix then gradient. "Combine-then-adapt".
CombineThenAdapt,
/// Eq. (12): gradient then mix. "Adapt-then-combine" / diffusion.
AdaptThenCombine,
}

// algorithm/push_sum.rs
/// Push–sum consensus method for directed graphs.
///
/// Paper Eq. (14):
/// xᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ xⱼᵏ
/// yᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ yⱼᵏ
///
/// where A = [aᵢⱼ] is column stochastic (Eq. 13).
/// The ratio zᵢᵏ = xᵢᵏ/yᵢᵏ converges to consensus (ratio consensus).
///
/// Requirements:
/// - Weight matrix A must be column stochastic
/// - Graph G must be strongly connected (directed)
/// - yᵢ⁰ = 1 for all i
///
/// This is NOT a gradient method by itself — it's the consensus mechanism
/// on which the gradient-push method (Eq. 15) is built.
#[derive(Clone, Debug)]
pub struct PushSumConsensus;

// algorithm/gradient_push.rs
/// Gradient–push method for directed graphs.
///
/// Paper Eq. (15):
/// vᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ xⱼᵏ
/// yᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ yⱼᵏ
/// zᵢᵏ⁺¹ = vᵢᵏ⁺¹ / yᵢᵏ⁺¹
///   xᵢᵏ⁺¹ = vᵢᵏ⁺¹ - αₖ ∇fᵢ(zᵢᵏ⁺¹)
///
/// Combines push–sum ratio consensus with a gradient step.
/// The gradient is evaluated at the ratio estimate zᵢᵏ⁺¹, not at xᵢᵏ.
///
/// Requirements:
/// - Weight matrix A must be column stochastic
/// - Graph G must be strongly connected (directed)
/// - yᵢ⁰ = 1 for all i
/// - Step size: diminishing
///
/// Convergence rate: same as consensus gradient — O(log k / √k).
/// Advantage: works on directed graphs without doubly stochastic matrix.
#[derive(Clone, Debug)]
pub struct GradientPush;

// algorithm/gradient_track.rs
/// Gradient-tracking methods (Eq. 16 for undirected graphs, Eq. 17 for directed).
///
/// Paper Eq. (16):
/// xᵢᵏ⁺¹ = Σⱼ₌₁ᵐ aᵢⱼ xⱼᵏ - α dᵢᵏ
/// dᵢᵏ⁺¹ = Σⱼ₌₁ᵐ aᵢⱼ dⱼᵏ + ∇fᵢ(xᵢᵏ⁺¹) - ∇fᵢ(xᵢᵏ)
///
/// The direction dᵢ tracks an estimate of the global gradient (1/m)Σ∇fⱼ.
/// Key insight: the sum Σᵢdᵢᵏ⁺¹ = Σᵢ∇fᵢ(xᵢᵏ⁺¹) is preserved across
/// iterations (gradient tracking invariant).
///
/// Requirements:
/// - Weight matrix A must be doubly stochastic
/// - Graph G must be connected and undirected
/// - dᵢ⁰ = ∇fᵢ(xᵢ⁰) for all i (critical initialization)
/// - Step size: constant α (sufficiently small)
///
/// Convergence rate: GEOMETRIC — O(qᵏ) where q ∈ (0,1), matching
/// centralized gradient descent under strong convexity + Lipschitz gradients.
///
/// Paper: "the distributed methods with gradient tracking in (16) and (17)
/// can match the best known convergence rates of centralized gradient methods."
#[derive(Clone, Debug)]
pub struct GradientTracking {
pub variant: GradientTrackingVariant,
}
#[derive(Clone, Debug, Copy, PartialEq, Eq)]
pub enum GradientTrackingVariant {
/// Eq. (16): For undirected graphs with doubly stochastic A.
Undirected,
/// Eq. (17): For directed graphs with column stochastic A.
/// Uses the same A for mixing both x and d variables.
Directed,
}

// algorithm/apull_bpush.rs
/// APull–BPush gradient-tracking method for directed graphs.
///
/// Paper Eq. (18):
/// xᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ (xⱼᵏ - αⱼ dⱼᵏ) (A-pull)
///   dᵢᵏ⁺¹ = Σ_{ℓ ∈ N_i^{in} ∪ {i}} bᵢₗ dₗᵏ + ∇fᵢ(xᵢᵏ⁺¹) - ∇fᵢ(xᵢᵏ)  (B-push)
///
/// Key innovation: uses TWO different weight matrices:
/// - A = [aᵢⱼ] is row stochastic (Σⱼ aᵢⱼ = 1)
/// - B = [bᵢⱼ] is column stochastic (Σᵢ bᵢⱼ = 1)
///
/// Paper: "The interesting aspect of the APull–BPush method is that the
/// matrix A = [aᵢⱼ] is row stochastic while the matrix B = [bᵢⱼ] is
/// column stochastic."
///
/// Requirements:
/// - Matrix A: row stochastic, compatible with G
/// - Matrix B: column stochastic, compatible with G
/// - Graph G: strongly connected directed graph
/// - Step size: constant (per-agent αᵢ allowed in variant Eq. 19)
/// - dᵢ⁰ = ∇fᵢ(xᵢ⁰)
///
/// Convergence rate: LINEAR (geometric) under strong convexity + Lipschitz.
///
/// Variant Eq. (19) from [33]:
/// xᵢᵏ⁺¹ = Σ_{j ∈ N_i^{in} ∪ {i}} aᵢⱼ xⱼᵏ - η dᵢᵏ
///   dᵢᵏ⁺¹ = Σ_{ℓ ∈ N_i^{in} ∪ {i}} bᵢₗ (dₗᵏ + ∇fₗ(xₗᵏ⁺¹) - ∇fₗ(xₗᵏ))
#[derive(Clone, Debug)]
pub struct APullBPush {
pub variant: APullBPushVariant,
}
#[derive(Clone, Debug, Copy, PartialEq, Eq)]
pub enum APullBPushVariant {
/// Eq. (18): Original APull–BPush.
Original,
/// Eq. (19): Variant from [33] with slightly different update order.
Variant19,
}

The solver is the top-level coordinator that ties together graph, weight matrices, agents, objectives, algorithm, and step-size schedule.
// solver/config.rs
/// Complete configuration for a distributed optimization run.
pub struct SolverConfig<S, Alg, Step>
where
S: Scalar,
Alg: Algorithm<S>,
Step: StepSizeSchedule<S>,
{
/// The algorithm to execute.
pub algorithm: Alg,
/// Step-size schedule.
pub step_size: Step,
/// Maximum number of iterations.
pub max_iterations: usize,
/// Convergence criterion (when to stop early).
pub convergence: ConvergenceCriterion<S>,
/// How often to log iteration data (every N iterations).
pub log_interval: usize,
}

// solver/distributed.rs
/// The main distributed optimization solver.
///
/// Orchestrates the iterative process:
/// for k = 0, 1, 2, ... :
///   1. Each agent computes its local gradient
///   2. Agents exchange messages with neighbors (simulated locally)
///   3. Each agent executes the algorithm's update step
///   4. Check convergence criteria
///
/// This is a SYNCHRONOUS simulation — all agents update simultaneously.
/// For actual distributed deployment, this would be replaced by a
/// message-passing runtime, but the Algorithm trait and AgentState types
/// remain identical.
pub struct DistributedSolver<S, Dir, Alg, Step, O>
where
    S: Scalar,
    Dir: EdgeDirection,
    Alg: Algorithm<S>,
    Step: StepSizeSchedule<S>,
    O: ObjectiveFn<S>,
{
    /// Communication graph G = ([m], E).
    graph: DenseGraph<S, Dir>,
    /// Primary weight matrix A.
    weight_matrix_a: WeightMatrix<S>,
    /// Secondary weight matrix B (only for APull–BPush).
    weight_matrix_b: Option<WeightMatrix<S>>,
    /// Per-agent local objective functions.
    objectives: Vec<O>,
    /// Per-agent current state.
    states: Vec<AgentState<S>>,
    /// Algorithm definition.
    algorithm: Alg,
    /// Step-size schedule.
    step_size: Step,
    /// Convergence monitor.
    monitor: ConvergenceMonitor<S>,
    /// Current iteration counter.
    iteration: usize,
}

impl<S, Dir, Alg, Step, O> DistributedSolver<S, Dir, Alg, Step, O>
where
    S: Scalar,
    Dir: EdgeDirection,
    Alg: Algorithm<S>,
    Step: StepSizeSchedule<S>,
    O: ObjectiveFn<S>,
{
    /// Create and validate a new solver instance.
    ///
    /// Validates:
    /// - Algorithm requirements vs. graph type (directed/undirected)
    /// - Weight matrix stochastic type vs. algorithm requirements
    /// - Weight matrix compatibility with graph structure
    /// - Number of objectives == number of agents == number of graph nodes
    pub fn new(
        graph: DenseGraph<S, Dir>,
        weight_matrix_a: WeightMatrix<S>,
        weight_matrix_b: Option<WeightMatrix<S>>,
        objectives: Vec<O>,
        initial_points: Vec<DecisionVector<S>>,
        config: SolverConfig<S, Alg, Step>,
    ) -> Result<Self, MlError>;

    /// Run the full optimization loop until convergence or max iterations.
    pub fn solve(&mut self) -> SolverResult<S>;

    /// Execute a single iteration across all agents (synchronous).
    pub fn step(&mut self) -> IterationSnapshot<S>;

    /// Simulate the communication round: collect and distribute messages.
    fn exchange_messages(&self) -> Vec<IncomingMessages<S>>;

    /// Compute the average iterate x̄ᵏ = (1/m) Σᵢ xᵢᵏ.
    /// Paper: "the averaged iterates x̄ᵏ⁺¹ will follow an erroneous
    /// gradient method."
    pub fn average_iterate(&self) -> DecisionVector<S>;

    /// Compute the disagreement: max_i ‖xᵢᵏ - x̄ᵏ‖.
    /// Paper: "the disagreement sequence {‖xᵢᵏ - x̄ᵏ‖} converges to zero."
    pub fn disagreement(&self) -> S;
}

// solver/distributed.rs
/// The outcome of a completed optimization run.
pub struct SolverResult<S: Scalar> {
    /// The solution estimate (average of final iterates).
    pub solution: DecisionVector<S>,
    /// Per-agent final iterates.
    pub agent_solutions: Vec<DecisionVector<S>>,
    /// Total iterations executed.
    pub iterations: usize,
    /// Whether convergence was achieved.
    pub converged: bool,
    /// Final disagreement metric.
    pub final_disagreement: S,
    /// Final objective value at the solution estimate.
    pub final_objective_value: S,
    /// Iteration history (logged at log_interval).
    pub history: Vec<IterationSnapshot<S>>,
}

/// A snapshot of the system state at a single iteration.
pub struct IterationSnapshot<S: Scalar> {
    pub iteration: usize,
    pub step_size: S,
    /// Average iterate x̄ᵏ.
    pub average_iterate: DecisionVector<S>,
    /// Global objective (1/m) Σ fᵢ evaluated at x̄ᵏ.
    pub objective_value: S,
    /// Disagreement max_i ‖xᵢᵏ - x̄ᵏ‖.
    pub disagreement: S,
    /// Gradient norm ‖∇F(x̄ᵏ)‖ at the average iterate.
    pub gradient_norm: S,
}

// convergence/criteria.rs
/// When to stop the optimization loop.
#[derive(Clone, Debug)]
pub enum ConvergenceCriterion<S: Scalar> {
    /// Stop when max_i ‖xᵢᵏ - x̄ᵏ‖ < ε.
    /// Paper: disagreement → 0.
    Disagreement { tolerance: S },
    /// Stop when ‖x̄ᵏ⁺¹ - x̄ᵏ‖ < ε.
    IterateDelta { tolerance: S },
    /// Stop when ‖∇F(x̄ᵏ)‖ < ε, where F = (1/m) Σ fᵢ.
    GradientNorm { tolerance: S },
    /// Stop when |F(x̄ᵏ) - F(x*)| < ε (requires a known optimum).
    OptimalityGap { tolerance: S, optimal_value: S },
    /// Combine multiple criteria with AND semantics.
    All(Vec<ConvergenceCriterion<S>>),
    /// Combine multiple criteria with OR semantics.
    Any(Vec<ConvergenceCriterion<S>>),
}

// convergence/metrics.rs
/// Metrics for analyzing distributed optimization behavior.
pub struct ConvergenceMetrics<S: Scalar> {
    /// Per-agent disagreement ‖xᵢᵏ - x̄ᵏ‖ over time.
    pub disagreement_history: Vec<Vec<S>>,
    /// Objective value F(x̄ᵏ) over time.
    pub objective_history: Vec<S>,
    /// Estimated convergence rate (e.g., slope of log(F(x̄ᵏ) - F*) vs. k).
    pub estimated_rate: Option<S>,
}

// errors/mod.rs
use thiserror::Error;

#[derive(Error, Debug, Clone, PartialEq)]
pub enum MlError {
    // ── Configuration errors ───────────────────────────────────────
    #[error("Agent count mismatch: graph has {graph_nodes} nodes but {provided} objectives provided")]
    AgentCountMismatch { graph_nodes: usize, provided: usize },

    #[error("Dimension mismatch: expected vectors of dimension {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },

    #[error("Algorithm '{algorithm}' requires {required:?} matrix but got {actual:?}")]
    StochasticityMismatch {
        algorithm: &'static str,
        required: StochasticType,
        actual: StochasticType,
    },

    #[error("Algorithm '{algorithm}' requires an undirected graph")]
    DirectedGraphNotSupported { algorithm: &'static str },

    #[error("Algorithm '{algorithm}' requires a connected graph but graph has {components} components")]
    DisconnectedGraph {
        algorithm: &'static str,
        components: usize,
    },

    // ── Weight matrix errors ───────────────────────────────────────
    #[error("Weight matrix entry ({i}, {j}) = {value} is incompatible with graph: no edge exists")]
    IncompatibleWeight { i: usize, j: usize, value: f64 },

    #[error("Row {row} sums to {sum}, expected 1.0 (tolerance {tol})")]
    RowSumViolation { row: usize, sum: f64, tol: f64 },

    #[error("Column {col} sums to {sum}, expected 1.0 (tolerance {tol})")]
    ColumnSumViolation { col: usize, sum: f64, tol: f64 },

    #[error("Negative weight at ({i}, {j}): {value}")]
    NegativeWeight { i: usize, j: usize, value: f64 },

    // ── Step size errors ───────────────────────────────────────────
    #[error("Step size must be positive, got {value}")]
    NonPositiveStepSize { value: f64 },

    #[error("Diminishing exponent b must be in (1/2, 1], got {value}")]
    InvalidDiminishingExponent { value: f64 },

    // ── Convergence errors ─────────────────────────────────────────
    #[error("Solver did not converge after {iterations} iterations (disagreement: {disagreement})")]
    ConvergenceFailure {
        iterations: usize,
        disagreement: f64,
    },

    #[error("Numerical instability detected at iteration {iteration}: {description}")]
    NumericalInstability {
        iteration: usize,
        description: String,
    },

    // ── Objective function errors ──────────────────────────────────
    #[error("Empty dataset for agent {agent}")]
    EmptyDataset { agent: usize },

    #[error("Regularization coefficient must be positive, got {value}")]
    NonPositiveRegularization { value: f64 },

    // ── Graph errors (propagated) ──────────────────────────────────
    #[error("Graph error: {0}")]
    GraphError(#[from] convective_graph::errors::GraphError),
}

Summary of what each algorithm needs, for validation at solver construction:
| Algorithm | Eq. | Graph | Matrix A | Matrix B | Step Size | dᵢ⁰ | yᵢ⁰ | Rate |
|---|---|---|---|---|---|---|---|---|
| ConsensusGradient (CTA) | (10) | Undirected, connected | Doubly stochastic | — | Diminishing | — | — | O(log k / √k) |
| ConsensusGradient (ATC) | (12) | Undirected, connected | Doubly stochastic | — | Diminishing | — | — | O(log k / √k) |
| PushSumConsensus | (14) | Directed, strongly connected | Column stochastic | — | — | — | 1 | (consensus only) |
| GradientPush | (15) | Directed, strongly connected | Column stochastic | — | Diminishing | — | 1 | O(log k / √k) |
| GradientTracking (undir) | (16) | Undirected, connected | Doubly stochastic | — | Constant | ∇fᵢ(xᵢ⁰) | — | O(qᵏ) geometric |
| GradientTracking (dir) | (17) | Directed, strongly connected | Column stochastic | — | Constant | ∇fᵢ(xᵢ⁰) | — | O(qᵏ) geometric |
| APullBPush (original) | (18) | Directed, strongly connected | Row stochastic | Column stochastic | Constant | ∇fᵢ(xᵢ⁰) | — | Linear |
| APullBPush (variant) | (19) | Directed, strongly connected | Row stochastic | Column stochastic | Constant (per-agent) | ∇fᵢ(xᵢ⁰) | — | Linear |
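The table above can be encoded directly as the solver's construction-time validation. Below is a minimal, self-contained sketch: the real crate uses `StochasticType` and `MlError`, but `AlgKind`, `requirements`, and `validate` are hypothetical names, and the consensus-only Eq. (14) is omitted for brevity.

```rust
// Stand-in types sketching the construction-time checks described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GraphClass {
    Undirected,
    Directed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StochasticType {
    Row,
    Column,
    Doubly,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AlgKind {
    ConsensusGradient,     // Eq. (10)/(12)
    GradientPush,          // Eq. (15)
    GradientTrackingUndir, // Eq. (16)
    GradientTrackingDir,   // Eq. (17)
    APullBPush,            // Eq. (18)/(19); B must additionally be column stochastic
}

/// Required (graph class, stochasticity of A), one arm per table row.
fn requirements(alg: AlgKind) -> (GraphClass, StochasticType) {
    use AlgKind::*;
    match alg {
        ConsensusGradient | GradientTrackingUndir => (GraphClass::Undirected, StochasticType::Doubly),
        GradientPush | GradientTrackingDir => (GraphClass::Directed, StochasticType::Column),
        APullBPush => (GraphClass::Directed, StochasticType::Row),
    }
}

/// Mirrors the StochasticityMismatch / DirectedGraphNotSupported checks.
fn validate(alg: AlgKind, graph: GraphClass, a: StochasticType) -> Result<(), String> {
    let (need_graph, need_a) = requirements(alg);
    if graph != need_graph {
        return Err(format!("{alg:?} requires a {need_graph:?} graph"));
    }
    if a != need_a {
        return Err(format!("{alg:?} requires a {need_a:?}-stochastic A, got {a:?}"));
    }
    Ok(())
}

fn main() {
    assert!(validate(AlgKind::GradientPush, GraphClass::Directed, StochasticType::Column).is_ok());
    assert!(validate(AlgKind::ConsensusGradient, GraphClass::Directed, StochasticType::Doubly).is_err());
    assert!(validate(AlgKind::APullBPush, GraphClass::Directed, StochasticType::Column).is_err());
    println!("table-driven validation behaves as expected");
}
```

Keeping the table in one `match` means a new algorithm variant forces the validation logic to be updated, since the compiler flags the non-exhaustive match.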
Communication cost per iteration, per edge in the graph:
| Algorithm | Vectors (size n) | Scalars | Total per edge |
|---|---|---|---|
| Consensus Gradient (10) | 1 (x) | 0 | n |
| Adapt-then-Combine (12) | 1 (v) | 0 | n |
| Push-Sum (14) | 1 (x) | 1 (y) | n + 1 |
| Gradient-Push (15) | 1 (x) | 1 (y) | n + 1 |
| Gradient-Tracking (16) | 2 (x, d) | 0 | 2n |
| Gradient-Tracking directed (17) | 2 (x, d) | 1 (y) | 2n + 1 |
| APull-BPush (18) | 2 (x-αd, d) | 0 | 2n |
| Variant (19) | 2 (x, d) | 0 | 2n |
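The cost table is easy to make executable; `scalars_per_edge` below is an illustrative helper (not crate API), with its arguments taken straight from the rows above.

```rust
/// Message size in scalars per edge per iteration:
/// (number of size-n vectors) * n + (extra scalars such as the push-sum y).
fn scalars_per_edge(vectors: usize, extra_scalars: usize, n: usize) -> usize {
    vectors * n + extra_scalars
}

fn main() {
    let n = 100; // decision-vector dimension
    assert_eq!(scalars_per_edge(1, 0, n), 100); // consensus gradient (10): x only
    assert_eq!(scalars_per_edge(1, 1, n), 101); // gradient-push (15): x plus push-sum scalar y
    assert_eq!(scalars_per_edge(2, 1, n), 201); // directed gradient-tracking (17): x, d, y
    // Per-iteration totals scale with |E|, so the gradient-tracking family
    // costs roughly twice what the single-vector methods do.
    let edges = 12;
    println!("Eq. (17) total: {} scalars per iteration", edges * scalars_per_edge(2, 1, n));
}
```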
Phase 1: convective-graph additions (PREREQUISITE)
├── [G1] NeighborQuery trait + DenseStorage impl
├── [G2] DegreeComputable trait + impl
├── [G3] LaplacianComputable trait + impl
└── [G4] SpectralComputable trait + impl (spectral gap)
Phase 2: convective-ml foundation
├── [M1] types/vector.rs (DecisionVector, GradientVector, DirectionVector)
├── [M2] errors/mod.rs (MlError)
├── [M3] objective/traits.rs (ObjectiveFn, LossFunction, Regularizer)
├── [M4] objective/loss.rs (LogisticLoss, HingeLoss, SquaredLoss)
├── [M5] objective/regularizer.rs (L2Regularizer)
└── [M6] objective/local.rs (LocalObjective)
Phase 3: convective-ml mixing layer
├── [M7] mixing/stochastic.rs (StochasticType)
├── [M8] mixing/traits.rs (WeightMatrix, MixingStrategy)
└── [M9] mixing/builder.rs (EqualNeighborWeight, MetropolisHastings, etc.)
└── depends on [G1] for neighbor queries
Phase 4: convective-ml agent + step size
├── [M10] agent/state.rs (AgentState enum)
├── [M11] agent/neighborhood.rs (AgentMessage, IncomingMessages)
└── [M12] step_size/schedule.rs (Diminishing, Constant, PerAgent)
Phase 5: convective-ml algorithms
├── [M13] algorithm/traits.rs (Algorithm trait, AlgorithmRequirements)
├── [M14] algorithm/consensus.rs (ConsensusGradient — simplest, implement first)
├── [M15] algorithm/push_sum.rs (PushSumConsensus)
├── [M16] algorithm/gradient_push.rs (GradientPush)
├── [M17] algorithm/gradient_track.rs (GradientTracking)
└── [M18] algorithm/apull_bpush.rs (APullBPush)
Phase 6: convective-ml solver + convergence
├── [M19] convergence/criteria.rs (ConvergenceCriterion)
├── [M20] convergence/metrics.rs (ConvergenceMetrics)
├── [M21] solver/config.rs (SolverConfig)
├── [M22] solver/distributed.rs (DistributedSolver)
└── [M23] solver/monitor.rs (ConvergenceMonitor)
The design intentionally preserves convective-graph's core principles:
- **Immutable-after-build graphs:** The communication graph G is static throughout the optimization (the paper considers static graphs). The DenseGraph is constructed once and passed immutably to the solver. This aligns perfectly with the existing design.
- **Matrix-first philosophy:** Weight matrices A and B are DMatrix<S> wrappers. The spectral gap of A (an eigenvalue computation) is a core convergence diagnostic, and the adjacency matrix is used directly to construct weight matrices. All of this leverages the existing nalgebra integration.
- **Scalar generics:** All types are generic over S: Scalar, inheriting the existing f64/f32 support. Convergence-rate constants, step sizes, and tolerances are all S-typed.
- **NodeId as AgentId:** The paper's agent indices i ∈ [m] map directly to NodeId. No new identity type is needed; just a type alias.
- **Graph is not mutated by ML:** convective-ml depends on convective-graph read-only. It never adds edges, removes nodes, or modifies the graph structure. This is a clean dependency direction.
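As an illustration of the scalar-generics point, a single diminishing-schedule implementation can serve both f32 and f64. The `Scalar` trait below is a stripped-down stand-in for the crate's real trait, not its actual definition.

```rust
use std::ops::Mul;

// Stand-in for the crate's Scalar trait (illustrative subset only).
trait Scalar: Copy + Mul<Output = Self> {
    fn from_f64(v: f64) -> Self;
    fn powf(self, e: Self) -> Self;
    fn recip(self) -> Self;
}

impl Scalar for f64 {
    fn from_f64(v: f64) -> Self { v }
    fn powf(self, e: Self) -> Self { f64::powf(self, e) }
    fn recip(self) -> Self { f64::recip(self) }
}

impl Scalar for f32 {
    fn from_f64(v: f64) -> Self { v as f32 }
    fn powf(self, e: Self) -> Self { f32::powf(self, e) }
    fn recip(self) -> Self { f32::recip(self) }
}

/// Diminishing schedule α_k = a / (k + 1)^b, valid for b ∈ (1/2, 1],
/// written once and usable at either precision.
fn diminishing<S: Scalar>(a: S, b: S, k: usize) -> S {
    let kp1 = S::from_f64((k + 1) as f64);
    a * kp1.powf(b).recip()
}

fn main() {
    let a64 = diminishing(1.0_f64, 0.75, 9); // 1 / 10^0.75 at f64 precision
    let a32 = diminishing(1.0_f32, 0.75, 9); // same schedule at f32 precision
    assert!((a64 - 10.0_f64.powf(-0.75)).abs() < 1e-12);
    assert!((f64::from(a32) - a64).abs() < 1e-5);
    println!("α_9 = {a64}");
}
```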
The paper explicitly identifies these open directions, which the architecture should accommodate:
- **Time-varying graphs:** Graph G changes between iterations. The solver could accept a `GraphSequence` or `Fn(k) -> Graph` instead of a fixed graph. The `Algorithm` trait's `step` method already takes the weight matrix as a parameter, making this straightforward.
- **Stochastic gradients:** Replace ∇fᵢ(x) with a stochastic estimate computed on a mini-batch. The `ObjectiveFn` trait could gain a `stochastic_gradient(x, rng)` method.
- **Asynchronous updates:** Agents don't wait for all others. This requires replacing the synchronous `solve()` loop with an event-driven runtime (tokio tasks). The `AgentState` and `Algorithm` types remain identical.
- **Quantized communication:** Compress messages to reduce bandwidth. Add an `AgentMessage` → `CompressedMessage` transformation layer.
- **Constrained optimization:** Intersect local constraint sets Xᵢ. Add a projection step after the gradient update: `xᵢᵏ⁺¹ = Π_{Xᵢ}(update)`.
- **Nonconvex objectives:** The paper focuses on convex fᵢ. Extensions to nonconvex problems (e.g., neural networks) would require different convergence criteria and possibly different algorithms.
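For the constrained extension, the projection step is simplest to picture for box constraints, where the Euclidean projection is coordinate-wise clamping. A sketch (`project_box` is an illustrative name, not a planned API):

```rust
/// Sketch of the projection step x_i^{k+1} = Π_{X_i}(update) for the
/// simplest case, a box constraint X_i = [lo, hi]^n.
fn project_box(x: &mut [f64], lo: f64, hi: f64) {
    for v in x.iter_mut() {
        *v = v.clamp(lo, hi); // Euclidean projection is coordinate-wise for boxes
    }
}

fn main() {
    let mut update = [1.5, -0.2, 0.7]; // result of the consensus + gradient step
    project_box(&mut update, 0.0, 1.0);
    assert_eq!(update, [1.0, 0.0, 0.7]);
    println!("{update:?}");
}
```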