trnsolver

Linear solvers and eigendecomposition for AWS Trainium via NKI.

Eigenvalue problems, matrix factorizations, and iterative solvers for scientific computing on Trainium. The Jacobi eigensolver is the primary NKI acceleration target — each Givens rotation maps to a 2-row matmul on the Tensor Engine. The Newton-Schulz matrix-sqrt-inverse is the secondary target: an all-GEMM iteration whose shape aligns with the Tensor Engine pipeline.
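To make the "Givens rotation as a 2-row matmul" remark concrete, here is a minimal NumPy sketch of the classical cyclic Jacobi method. It is not the NKI kernel and not trnsolver's code; the names `jacobi_rotate` and `jacobi_eigh` and the fixed sweep count are assumptions chosen for illustration. The point is that each rotation touches only two rows (and two columns) through a 2x2 matrix product.

```python
import numpy as np

def jacobi_rotate(A, V, p, q):
    """Zero A[p, q] in place with one Givens rotation and accumulate it into V."""
    if A[p, q] == 0.0:
        return
    # Standard symmetric 2x2 Schur step: pick the smaller-magnitude root t = tan(theta).
    tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
    t = np.copysign(1.0, tau) / (abs(tau) + np.hypot(1.0, tau))
    c = 1.0 / np.hypot(1.0, t)
    s = t * c
    G = np.array([[c, s], [-s, c]])
    rows = [p, q]
    A[rows, :] = G.T @ A[rows, :]   # (2 x 2) @ (2 x n): the "2-row matmul"
    A[:, rows] = A[:, rows] @ G     # matching column update keeps A symmetric
    V[:, rows] = V[:, rows] @ G     # accumulate eigenvectors

def jacobi_eigh(A, sweeps=10):
    """Cyclic Jacobi eigensolver for a small symmetric matrix (illustrative only)."""
    A = np.array(A, dtype=np.float64)  # work on a copy
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                jacobi_rotate(A, V, p, q)
    w = np.diag(A).copy()
    order = np.argsort(w)
    return w[order], V[:, order]
```

A fixed number of sweeps is used here for brevity; a practical solver would stop once the off-diagonal norm falls below a tolerance.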

Part of the trnsci scientific computing suite (github.com/trnsci).

Current phase

trnsolver follows the trnsci 5-phase roadmap. Active work is tracked in phase-labeled GitHub issues:

Phase 1 — correctness (active): NKI Jacobi kernel validated on hardware, eigh_generalized on NKI path, SCF example end-to-end. Target release: v0.4.0.
Phase 2 — precision: iterative refinement for eigh / solve_spd, Kahan summation in CG / GMRES.
Phase 3 — perf: Newton-Schulz NKI backend, preconditioner suite, NEFF cache reuse.
Phase 4 — multi-chip: parallel Jacobi sweeps across NeuronCores.
Phase 5 — generation: trn2 rotation-block tuning.
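For context on the Phase 2 item, Kahan (compensated) summation carries a correction term alongside the running sum so that low-order bits lost in each addition are fed back in. The sketch below applies it to a dot product in plain Python; `kahan_dot` is a hypothetical helper name, and how trnsolver will integrate compensation into CG / GMRES is not specified here.

```python
import math

def kahan_dot(x, y):
    """Dot product accumulated with Kahan (compensated) summation."""
    total = 0.0
    comp = 0.0  # running estimate of the rounding error in `total`
    for xi, yi in zip(x, y):
        term = xi * yi - comp            # re-inject previously lost bits
        new_total = total + term
        comp = (new_total - total) - term  # what this addition just lost
        total = new_total
    return total
```

With one large term followed by many tiny ones, a plain running sum can drop every tiny term, while the compensated version stays close to the correctly rounded result (for example, as computed by `math.fsum`).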

Install

pip install trnsolver

# With Neuron hardware support
pip install "trnsolver[neuron]"

Quick example

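import torch
import trnsolver

# Illustrative inputs (assumed, not part of the library): float64 so the
# 1e-8 tolerances used below are reachable on the PyTorch path.
n = 64
X = torch.randn(n, n, dtype=torch.float64)
A = X @ X.T + n * torch.eye(n, dtype=torch.float64)   # symmetric positive definite
b = torch.randn(n, dtype=torch.float64)
F = 0.5 * (X + X.T)                                   # symmetric, stands in for a Fock matrix
S = A                                                 # SPD, stands in for an overlap matrix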

# Symmetric eigenvalue decomposition
w, V = trnsolver.eigh(A)

# Generalized eigenproblem: A x = λ B x  (the SCF problem)
w, V = trnsolver.eigh_generalized(F, S)

# Factorizations + direct solves
L = trnsolver.cholesky(A)
x = trnsolver.solve_spd(A, b)
M = trnsolver.inv_sqrt_spd(A)                            # eigendecomposition-based
M, iters, res = trnsolver.inv_sqrt_spd_ns(A, tol=1e-8)   # Newton-Schulz, all-GEMM

# Iterative solvers with preconditioners
precond = trnsolver.jacobi_preconditioner(A)
x, iters, res = trnsolver.cg(A, b, M=precond, tol=1e-8)
x, iters, res = trnsolver.gmres(A, b, tol=1e-6)
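For readers who want to see what a preconditioned CG call computes, here is a textbook preconditioned conjugate gradient loop in NumPy with a diagonal (Jacobi) preconditioner. It is a sketch, not trnsolver's implementation: the name `pcg_sketch`, the `max_iter` parameter, and the choice of relative residual `||r|| / ||b||` as the reported `res` are assumptions. It also assumes `b` is nonzero.

```python
import numpy as np

def pcg_sketch(A, b, M_inv_diag, tol=1e-8, max_iter=1000):
    """Preconditioned CG for SPD A; M_inv_diag holds 1 / diag(A)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r          # apply the diagonal preconditioner
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    res = np.linalg.norm(r) / b_norm
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        res = np.linalg.norm(r) / b_norm
        if res < tol:
            return x, k, res
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # new search direction
        rz = rz_new
    return x, max_iter, res
```

The return triple mirrors the `(x, iters, res)` shape used in the quick example above; whether the library reports a relative or absolute residual is not stated in this README.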

Why

Trainium has no native LAPACK. Every SCF iteration, every density-fitting metric inversion, and every Krylov solve on Trainium currently falls back to torch.linalg on the host CPU or to hand-rolled wrappers. trnsolver aims to close that gap with a single solver API: NKI-accelerated Jacobi on the Tensor Engine (pending hardware validation, see Status) and a PyTorch fallback everywhere else.

SCF example

python examples/scf_eigen.py --demo
python examples/scf_eigen.py --nbasis 50 --nocc 10

Demonstrates the self-consistent-field iteration: build Fock matrix → solve generalized eigenproblem FC = SCε → build density → check convergence. This is the headline use case for quantum-chemistry workflows, feeding into DF-MP2 via trnblas.
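The generalized eigenproblem step FC = SCε can be reduced to a standard symmetric problem through a Cholesky factor of S. The NumPy sketch below shows that reduction under the assumption that S is symmetric positive definite; `eigh_generalized_sketch` is a hypothetical name, and this is not the code in examples/scf_eigen.py.

```python
import numpy as np

def eigh_generalized_sketch(F, S):
    """Solve F C = S C diag(eps) for symmetric F and SPD S via Cholesky reduction."""
    L = np.linalg.cholesky(S)              # S = L L^T
    Y = np.linalg.solve(L, F)              # Y = L^{-1} F
    F_prime = np.linalg.solve(L, Y.T).T    # F' = L^{-1} F L^{-T}
    F_prime = 0.5 * (F_prime + F_prime.T)  # remove rounding asymmetry
    eps, C_prime = np.linalg.eigh(F_prime) # standard problem F' C' = C' diag(eps)
    C = np.linalg.solve(L.T, C_prime)      # back-transform: C = L^{-T} C'
    return eps, C
```

The returned eigenvectors are S-orthonormal, that is, C^T S C = I. `np.linalg.solve` is used for brevity; a dedicated triangular solve would exploit the structure of L.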

Status

v0.3.0 — PyTorch path is feature-complete. NKI Jacobi kernel is scaffolded but not yet validated on hardware; set_backend("auto") falls back to torch.linalg.eigh everywhere until v0.4.0 lands.

API coverage:

Category	Shipped (v0.3.0)	Deferred
Eigensolvers	`eigh`, `eigh_generalized`	`svd` (Jacobi-SVD target for v0.5.0)
Factorizations	`cholesky`, `lu`, `qr`	`schur`, `pinv` (see #22)
Direct solvers	`solve`, `solve_spd`, `inv_spd`, `inv_sqrt_spd`, `inv_sqrt_spd_ns`	—
Iterative	`cg` (w/ preconditioner), `gmres`	IC0/SSOR/block-Jacobi (#16)
Preconditioners	`jacobi_preconditioner`	See #16

Roadmap:

v0.4.0 — NKI Jacobi rotation kernel validated on trn1.2xlarge (#9, #12)
v0.5.0 — Newton-Schulz NKI backend via trnblas GEMM (#14, #25), preconditioner expansion (#16), scipy.linalg parity audit (#22)
v0.6.0+ — BF16/FP16 across the API (#19), multi-NeuronCore parallel Jacobi sweep (#20)

Operations

Category	Operation	Description
Eigen	`eigh`	Symmetric eigendecomposition (Jacobi / torch)
Eigen	`eigh_generalized`	Generalized: `Ax = λBx` via Cholesky reduction
Factor	`cholesky`	`A = LL^T`
Factor	`lu`	`PA = LU`
Factor	`qr`	`A = QR`
Solve	`solve`	`Ax = b` (LU-based)
Solve	`solve_spd`	`Ax = b` (Cholesky, A is SPD)
Solve	`inv_spd`	`A^{-1}` for SPD A
Solve	`inv_sqrt_spd`	`A^{-1/2}` via eigendecomposition
Solve	`inv_sqrt_spd_ns`	`A^{-1/2}` via Newton-Schulz (all-GEMM)
Iterative	`cg`	Conjugate Gradient (SPD systems)
Iterative	`gmres`	GMRES (general systems)
Iterative	`jacobi_preconditioner`	Diagonal preconditioner for CG
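As an illustration of why the Newton-Schulz route is "all-GEMM", here is a NumPy sketch of the coupled Newton-Schulz iteration for A^{-1/2}. Every step is a matrix product, with no factorization or eigendecomposition. The name `inv_sqrt_spd_ns_sketch`, the Frobenius-norm scaling, and the residual definition ||I - Z Y||_F are assumptions; the library's own scaling and stopping rule may differ.

```python
import numpy as np

def inv_sqrt_spd_ns_sketch(A, tol=1e-8, max_iter=100):
    """Approximate A^{-1/2} for SPD A with the coupled Newton-Schulz iteration."""
    n = A.shape[0]
    I = np.eye(n)
    c = np.linalg.norm(A)       # Frobenius norm >= largest eigenvalue
    Y = A / c                   # eigenvalues of Y now lie in (0, 1]
    Z = I.copy()
    res = np.linalg.norm(I - Z @ Y)
    for k in range(1, max_iter + 1):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T               # Y -> (A / c)^{1/2}
        Z = T @ Z               # Z -> (A / c)^{-1/2}
        res = np.linalg.norm(I - Z @ Y)
        if res < tol:
            return Z / np.sqrt(c), k, res
    return Z / np.sqrt(c), max_iter, res   # undo the scaling
```

Convergence is quadratic once the residual is small, but the early iterations are slow for ill-conditioned A because the smallest scaled eigenvalue starts far from 1.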

Benchmarks

CPU baselines (torch.linalg, scipy.linalg, trnsolver PyTorch path) run on every CI build. CUDA baselines (benchmarks/bench_cuda.py, cuSOLVER via torch.linalg) run on a vintage-matched g5.xlarge A10G instance. NKI numbers are pending v0.4.0 hardware validation. See the benchmarks page for the latest table and the vintage-matching rationale.

Related projects in the trnsci suite

All six libraries in the suite, trnsolver included, are on PyPI, along with the umbrella meta-package:

Project	What	Latest
trnsci	Umbrella meta-package pulling the whole suite	v0.1.0
trnfft	FFT and complex-valued tensors	v0.8.0
trnblas	BLAS Level 1–3	v0.4.0
trnrand	Philox / Sobol / Halton RNG	v0.1.0
trnsolver	Linear solvers and eigendecomposition	v0.3.0
trnsparse	Sparse matrix operations	v0.1.1
trntensor	Tensor contractions (einsum, TT/Tucker)	v0.1.1
