A minimal scientific-computing library for solving large sparse linear systems Ax = b using iterative Krylov methods with:
- CPU (single-thread) baseline
- CPU (OpenMP) acceleration
- GPU (CUDA) vector ops + scaffolding
This repo is designed to teach you the math and the performance fundamentals as you build.
Build and test the CPU baseline, then run the solver:

```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
ctest --test-dir build
./build/bin/solve --nx 128 --ny 128 --tol 1e-8 --max-iters 5000
```

Build with OpenMP acceleration:

```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_OMP=ON
cmake --build build -j
```

Build with CUDA and run the GPU sanity check:

```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=ON
cmake --build build -j
./build/bin/cuda_sanity --n 1000000
```

What's included:
- CSR sparse matrix
- 2D Poisson/Laplacian matrix generator (SPD)
- SpMV (CPU; OpenMP-enabled if built with ENABLE_OMP)
- BLAS-1 ops: dot, axpy, scal, norm2 (CPU; OpenMP-enabled)
- CG solver (CPU)
- Jacobi preconditioner
- Minimal tests + benchmark-style CLI app
We solve Ax = b where A is large and sparse.
Conjugate Gradient (CG) applies when A is symmetric positive definite (SPD).
It builds a sequence of iterates x_k that, in exact arithmetic, converges in at most n steps; in practice it converges much sooner, at a rate governed by the condition number of A.
- Residual: r_k = b - A x_k
- Stopping criterion: ||r_k|| / ||b|| < tol
Given x_0, set r_0 = b - A x_0 and p_0 = r_0. Then for k = 0, 1, 2, ...:

    alpha_k = (r_k^T r_k) / (p_k^T A p_k)
    x_{k+1} = x_k + alpha_k p_k
    r_{k+1} = r_k - alpha_k A p_k
    beta_k  = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
    p_{k+1} = r_{k+1} + beta_k p_k

Each iteration needs exactly one matrix-vector product (A p_k); everything else is BLAS-1 work.
Preconditioning applies a cheap approximate inverse M^{-1} to the residual each iteration (z = M^{-1} r) to reduce the effective condition number and hence the iteration count.
The Jacobi preconditioner uses M = diag(A), so applying M^{-1} is just elementwise scaling by 1/a_ii.