Add sparse Lanczos SVD solver #3034
Hi @Intron7, I haven't looked at this PR closely yet, but one thing to think about is that we should try to reuse parts of the existing Lanczos eigensolver as much as possible. They should have some things in common, right?
/ok to test eabeaa4
@aamijar the algorithm is different even if the name is similar. The two paths share the Lanczos name, but the kernels implement different algorithms: the eigensolver builds a symmetric tridiagonal matrix via a one-vector recurrence and solves the Ritz problem with syevd; the SVD builds a bidiagonal matrix via Golub-Kahan with two coupled bases (A @ v and Aᵀ @ u) and solves the Ritz problem with gesvdj. The restart is also different: the SVD path locks converged singular triplets and restarts on the unconverged subspace.
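To make the two-basis recurrence concrete, here is a minimal, dependency-free sketch of Golub-Kahan bidiagonalization with full reorthogonalization. It is illustrative only and not the code in this PR: the function names (`matvec`, `rmatvec`, `orthogonalize`, `golub_kahan`) are hypothetical, it works on a small dense matrix stored as nested lists rather than CSR data on the GPU, and it has no breakdown handling, locking, or restart.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Compute A @ x for a dense matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    """Compute A.T @ y without forming the transpose."""
    n_cols = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n_cols)]


def orthogonalize(w, basis):
    """Full reorthogonalization: remove components along every basis vector."""
    for q in basis:
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


def golub_kahan(A, v0, steps):
    """Build coupled bases U, V and an upper-bidiagonal B = U.T @ A @ V.

    alphas holds the diagonal of B and betas its superdiagonal, so that
    A @ V[j] = alphas[j] * U[j] + betas[j - 1] * U[j - 1].
    Assumes no breakdown (no alpha or beta becomes zero).
    """
    norm = math.sqrt(dot(v0, v0))
    V = [[x / norm for x in v0]]
    U, alphas, betas = [], [], []
    for j in range(steps):
        # Left basis: u_j from A @ v_j
        u = matvec(A, V[j])
        if j > 0:
            u = [ui - betas[j - 1] * pi for ui, pi in zip(u, U[j - 1])]
        u = orthogonalize(u, U)
        alpha = math.sqrt(dot(u, u))
        U.append([x / alpha for x in u])
        alphas.append(alpha)
        if j == steps - 1:
            break
        # Right basis: v_{j+1} from A.T @ u_j
        v = rmatvec(A, U[j])
        v = [vi - alpha * qi for vi, qi in zip(v, V[j])]
        v = orthogonalize(v, V)
        beta = math.sqrt(dot(v, v))
        V.append([x / beta for x in v])
        betas.append(beta)
    return U, V, alphas, betas
```

Calling `golub_kahan(A, v0, steps)` returns the two bases plus the diagonal and superdiagonal of the small bidiagonal matrix, whose SVD would then supply the Ritz values. The symmetric eigensolver, by contrast, needs only one basis and one matrix-vector product per step.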
/ok to test b80ca30
/**
 * @addtogroup sparse_lanczos_svd
 * @{
 */
To keep to our conventions, can you put this in a file called solver_types.hpp? Create it if it doesn't exist already.
void sparse_lanczos_svd(
  raft::resources const& handle,
  sparse_lanczos_svd_config<ValueTypeT> const& config,
  raft::device_csr_matrix_view<const ValueTypeT, int, int, NNZTypeT> A,
Using our C++ sparse APIs. Very nice!
As discussed previously with @cjnolet, I'm also adding my Lanczos SVD solver for sparse CSR matrices.
This is the more precise sparse SVD path next to the existing randomized solver. The solver repeatedly applies `A @ v` and `A.T @ u` to build Krylov bases, computes the SVD of the small bidiagonal problem, uses the resulting Ritz vectors to identify converged singular triplets, locks those vectors, and restarts on the remaining unconverged part. It also uses full reorthogonalization and a final `A @ V` refinement step to improve singular values and left singular vectors.

Compared with randomized SVD, this is aimed at quality: clustered spectra, slow singular-value decay, near-rank-deficient inputs, and PCA workloads where ARPACK-like accuracy matters.
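The final refinement step can be sketched as follows, under the assumption that it recomputes each singular value and left singular vector directly from `A @ v` for the converged right singular vectors. The PR's actual kernel may differ (for example, it may re-orthonormalize the left vectors afterwards), and the name `refine_from_right_vectors` is hypothetical.

```python
import math


def refine_from_right_vectors(A, V):
    """Recompute singular values and left vectors from right-vector estimates.

    A is a dense matrix stored as a list of rows (m x n); V is a list of
    k right singular vector estimates, each of length n and unit norm.
    Returns (sigmas, U) with sigmas[i] = ||A @ V[i]|| and
    U[i] = (A @ V[i]) / sigmas[i].
    """
    sigmas, U = [], []
    for v in V:
        # w = A @ v
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        s = math.sqrt(sum(x * x for x in w))
        sigmas.append(s)
        # Leave w unnormalized if v lies in the null space of A (s == 0)
        U.append([x / s for x in w] if s > 0.0 else w)
    return sigmas, U
```

For an exact right singular vector `v`, `A @ v` equals `sigma * u`, so the norm recovers the singular value and the normalized product recovers the left singular vector. In this sketch the accuracy of the result is therefore bounded by the accuracy of `V`.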
I ran the #2999-style row sweep on the same single-cell dataset with k=50, n_oversamples=10, n_power_iters=2, and best-of-3 GPU timings. GPU: RTX PRO 6000 Blackwell.
Using the #2999 CPU baselines for context: