Sparse matrix operations for AWS Trainium via NKI.
CSR/COO formats, SpMV, SpMM, and integral screening for sparse scientific computing on Trainium. Part of the trnsci scientific computing suite (github.com/trnsci).
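CSR (compressed sparse row) stores a row-pointer array of length `n_rows + 1` plus a column index and a value for each nonzero. COO (coordinate) stores an explicit (row, col, value) triple per nonzero. The plain-Python sketch below shows both layouts for a 3×3 matrix and the CSR SpMV loop that walks them. It makes no trnsparse or NKI calls, and the field names `row_ptr`, `col_idx` and `values` are the conventional ones, assumed here rather than taken from trnsparse's own CSR class.

```python
# Minimal sketch: COO and CSR arrays for a small dense matrix, plus CSR SpMV.
# Field names (row_ptr, col_idx, values) are conventional, not trnsparse's.

def dense_to_coo(dense):
    """Return (rows, cols, vals) for every nonzero, in row-major order."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def coo_to_csr(n_rows, rows, cols, vals):
    """Compress row-sorted COO row indices into a CSR row-pointer array."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1            # count nonzeros in each row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]   # prefix sum gives each row's start offset
    return row_ptr, list(cols), list(vals)


def csr_spmv(row_ptr, col_idx, values, x):
    """y = A @ x, touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


A = [
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
    [1.0, 3.0, 0.0],
]
rows, cols, vals = dense_to_coo(A)
row_ptr, col_idx, values = coo_to_csr(len(A), rows, cols, vals)
print(row_ptr, col_idx, values)  # [0, 1, 2, 4] [0, 2, 0, 1] [4.0, 2.0, 1.0, 3.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [4.0, 2.0, 4.0]
```

COO is simpler to build incrementally, while CSR's row pointers let SpMV find each row's nonzeros without scanning the whole index list.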
trnsparse follows the trnsci 5-phase roadmap. Active work is tracked in phase-labeled GitHub issues:
- Phase 1 — correctness ✅ v0.2.0: NKI SpMM validated on trn1 via densify-then-GEMM; first `torch.autograd.Function`-wrapped NKI kernel in the suite (see trnsci/trnsci#3). Benchmarks in `docs/benchmarks.md`.
- Phase 3 — perf: nnz-bucketing SpMM, streaming of large sparse matrices, NEFF cache reuse.
- Phase 4 — multi-chip: sharded sparse matrices across chips.
- Phase 5 — generation: trn2 DMA bandwidth exploitation.
(No Phase 2 for trnsparse — the precision story is inherited from trnblas.)
Suite-wide tracker: trnsci/trnsci#1.
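The Phase 1 "densify-then-GEMM" approach to SpMM scatters the sparse operand into a dense buffer and then runs an ordinary dense matrix multiply. The sketch below shows only that arithmetic in plain Python, assuming conventional CSR arrays. It does not model the NKI kernel, tiling, or on-chip memory placement, and the helper names are hypothetical.

```python
# Sketch of densify-then-GEMM SpMM: scatter CSR into a dense buffer, then
# do a plain dense matmul. Helper names are hypothetical; no NKI modeled.

def csr_to_dense(row_ptr, col_idx, values, n_cols):
    """Scatter CSR nonzeros into a dense row-major list of lists."""
    n_rows = len(row_ptr) - 1
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            dense[i][col_idx[k]] += values[k]  # += also sums duplicate entries
    return dense


def gemm(A, B):
    """Dense C = A @ B with a naive triple loop."""
    inner, n_cols = len(B), len(B[0])
    return [
        [sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(n_cols)]
        for i in range(len(A))
    ]


def spmm_densify_then_gemm(row_ptr, col_idx, values, B):
    """C = A @ B where A is CSR and B is dense (A has len(B) columns)."""
    A_dense = csr_to_dense(row_ptr, col_idx, values, n_cols=len(B))
    return gemm(A_dense, B)


# CSR form of [[4, 0, 0], [0, 0, 2], [1, 3, 0]]
row_ptr, col_idx, values = [0, 1, 2, 4], [0, 2, 0, 1], [4.0, 2.0, 1.0, 3.0]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = spmm_densify_then_gemm(row_ptr, col_idx, values, B)
print(C)  # [[4.0, 0.0], [2.0, 2.0], [1.0, 3.0]]
```

The cost of this scheme scales with the full dense size of A rather than its nonzero count. That appears to be the inefficiency the Phase 3 nnz-bucketing item targets.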
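```bash
pip install trnsparse
```

```python
import torch
import trnsparse

# Dense → sparse: zero out small entries, then convert to CSR
A = torch.randn(100, 100)
A[torch.abs(A) < 1.0] = 0.0
csr = trnsparse.from_dense(A)

# SpMV: y = alpha * (A @ x)
x = torch.randn(100)
y = trnsparse.spmv(csr, x, alpha=2.0)

# SpMM: C = A @ B
B = torch.randn(100, 8)
C = trnsparse.spmm(csr, B)

# Integral screening; diagonal_integrals holds the (ij|ij) values
# produced by your integral code (not defined here)
Q = trnsparse.schwarz_bounds(diagonal_integrals)
mask = trnsparse.screen_quartets(Q, threshold=1e-10)
stats = trnsparse.sparsity_stats(Q)
```

| Operation | Description |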
|---|---|
| `spmv` | Sparse × dense vector |
| `spmm` | Sparse × dense matrix |
| `spmv_symmetric` | Symmetric SpMV (half storage) |
| `sparse_add` | C = αA + βB |
| `sparse_scale` | B = αA |
| `sparse_transpose` | A^T |
| `schwarz_bounds` | Schwarz screening bounds |
| `screen_quartets` | Shell-quartet significance mask |
| `density_screen` | Density-weighted screening |

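The screening rows rest on the Cauchy–Schwarz inequality for electron-repulsion integrals, |(ij|kl)| ≤ √(ij|ij) · √(kl|kl). With Q_ij = √(ij|ij), a shell quartet can be skipped whenever Q_ij · Q_kl falls below the threshold. The plain-Python sketch below shows that logic. The `_ref` function names, the input layout (a full n×n grid of diagonal integrals) and the pairs×pairs mask shape are all assumptions for illustration, not trnsparse's actual signatures.

```python
import math

# Hypothetical reference versions of the screening rows; names, input layout
# (a full n x n grid of (ij|ij) values) and mask shape are assumptions.

def schwarz_bounds_ref(diag):
    """Q[i][j] = sqrt((ij|ij)); abs() guards against tiny negative round-off."""
    return [[math.sqrt(abs(v)) for v in row] for row in diag]


def screen_quartets_ref(Q, threshold):
    """mask[p][q] is True if the bound Q_p * Q_q on |(p|q)| reaches threshold.

    p and q index flattened shell pairs (i, j) and (k, l). Production codes
    usually keep only i <= j pairs; the full grid is used here for clarity.
    """
    pairs = [v for row in Q for v in row]
    return [[qp * qq >= threshold for qq in pairs] for qp in pairs]


diag = [[1.0, 1e-12], [1e-12, 4.0]]  # toy (ij|ij) values for 2 shells
Q = schwarz_bounds_ref(diag)
mask = screen_quartets_ref(Q, threshold=1e-10)
kept = sum(sum(row) for row in mask)
print(f"{kept} of {len(mask) ** 2} quartets kept")  # 12 of 16 quartets kept
```

Only quartets built from two weak pairs (bound ≈ 1e-12) are dropped here. Every quartet that touches a strong pair survives, which is the intended behavior of a rigorous upper-bound screen.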
Apache 2.0 — Copyright 2026 Scott Friedman