Add sparse Lanczos SVD solver #3034

Open
Intron7 wants to merge 3 commits into rapidsai:main from Intron7:feat/add-lanczos-svds

Contributor

Intron7 commented May 21, 2026

As discussed previously with @cjnolet, I'm also adding my Lanczos SVD solver for sparse CSR matrices.

This is the more precise sparse SVD path next to the existing randomized solver. The solver repeatedly applies A @ v and A.T @ u to build Krylov bases, computes the SVD of the small bidiagonal problem, uses the resulting Ritz vectors to identify converged singular triplets, locks those vectors, and restarts on the remaining unconverged part. It also uses full reorthogonalization and a final A @ V refinement step to improve singular values and left singular vectors.
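
For readers who want to see the recurrence concretely, here is a minimal NumPy sketch of Golub-Kahan bidiagonalization with full reorthogonalization and Ritz extraction. It is an illustration only, not the code in this PR: it works on a dense matrix, omits locking, restarting, breakdown handling, and the final A @ V refinement, and the names `golub_kahan` and `ritz_triplets` are hypothetical.

```python
import numpy as np


def golub_kahan(A, k, seed=0):
    """Run k Golub-Kahan steps so that A @ V[:, :k] == U @ B (B upper bidiagonal).

    Hypothetical helper for illustration; assumes no breakdown (alpha, beta > 0).
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = np.zeros((m, k))
    V = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k)

    v0 = rng.standard_normal(n)
    V[:, 0] = v0 / np.linalg.norm(v0)

    for j in range(k):
        # Left vector: apply A, subtract the coupling to the previous u.
        u = A @ V[:, j]
        if j > 0:
            u -= beta[j - 1] * U[:, j - 1]
        # Full reorthogonalization against all previous u (two passes).
        for _ in range(2):
            u -= U[:, :j] @ (U[:, :j].T @ u)
        alpha[j] = np.linalg.norm(u)
        U[:, j] = u / alpha[j]

        # Right vector: apply A.T, subtract the coupling to the current v.
        w = A.T @ U[:, j] - alpha[j] * V[:, j]
        for _ in range(2):
            w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]

    B = np.diag(alpha) + np.diag(beta[:-1], 1)
    return U, B, V, beta[-1]


def ritz_triplets(U, B, V, beta_k):
    """SVD of the small bidiagonal problem, mapped back to Ritz vectors."""
    P, sigma, Qt = np.linalg.svd(B)
    left = U @ P
    right = V[:, :-1] @ Qt.T
    # Residual norm of A.T @ u_i - sigma_i * v_i, available without touching A.
    resid = np.abs(beta_k * P[-1, :])
    return left, sigma, right, resid


rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
U, B, V, beta_k = golub_kahan(A, k=10)
left, sigma, right, resid = ritz_triplets(U, B, V, beta_k)
```

The `resid` estimate is what a convergence test would typically compare against a tolerance: triplets whose estimate is small enough are the ones a restarted solver would lock.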

Compared with randomized SVD, this is aimed at quality: clustered spectra, slow singular-value decay, near-rank-deficient inputs, and PCA workloads where ARPACK-like accuracy matters.

I ran the #2999-style row sweep on the same single-cell dataset with k=50, n_oversamples=10, n_power_iters=2, and best-of-3 GPU timings. GPU: RTX PRO 6000 Blackwell.



  ┌──────┬───────┬─────────────────┬──────────────┬─────────────────────────┬─────────────────────┬──────────────────┐
  │ rows │   nnz │ raft randomized │ raft Lanczos │          Lanczos / rand │ randomized residual │ Lanczos residual │
  ├──────┼───────┼─────────────────┼──────────────┼─────────────────────────┼─────────────────────┼──────────────────┤
  │  50k │  101M │          0.180s │       0.252s │            1.40x slower │            3.11e-02 │         1.55e-07 │
  │ 200k │  400M │          0.698s │       1.328s │            1.90x slower │            3.12e-02 │         1.57e-07 │
  │ 500k │ 1.02B │          1.776s │       1.763s │          basically tied │            3.06e-02 │         1.57e-07 │
  │ 982k │ 2.01B │          3.536s │       3.423s │ Lanczos slightly faster │            3.09e-02 │         1.56e-07 │
  └──────┴───────┴─────────────────┴──────────────┴─────────────────────────┴─────────────────────┴──────────────────┘

Using the #2999 CPU baselines for context:

  ┌──────┬─────────────┬──────────────┬─────────────────────────────┬──────────────────────────┐
  │ rows │ sklearn CPU │ scipy ARPACK │ randomized speedup vs scipy │ Lanczos speedup vs scipy │
  ├──────┼─────────────┼──────────────┼─────────────────────────────┼──────────────────────────┤
  │  50k │       7.38s │       18.69s │                        104x │                      74x │
  │ 200k │      25.61s │       62.49s │                         90x │                      47x │
  │ 500k │      64.97s │      153.40s │                         86x │                      87x │
  │ 982k │     126.16s │      307.04s │                         87x │                      90x │
  └──────┴─────────────┴──────────────┴─────────────────────────────┴──────────────────────────┘
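
The derived columns can be recomputed from the raw timings. The short sketch below copies the seconds from the two tables and reproduces the speedups versus scipy ARPACK and the Lanczos / randomized ratio; the variable names are only for illustration.

```python
# Timings in seconds, copied from the two tables above.
scipy_arpack = {"50k": 18.69, "200k": 62.49, "500k": 153.40, "982k": 307.04}
raft_randomized = {"50k": 0.180, "200k": 0.698, "500k": 1.776, "982k": 3.536}
raft_lanczos = {"50k": 0.252, "200k": 1.328, "500k": 1.763, "982k": 3.423}

# Speedup versus scipy ARPACK, rounded to whole multiples as in the table.
rand_speedup = {r: round(scipy_arpack[r] / raft_randomized[r]) for r in scipy_arpack}
lanczos_speedup = {r: round(scipy_arpack[r] / raft_lanczos[r]) for r in scipy_arpack}

# Lanczos time relative to randomized time (values above 1 mean Lanczos is slower).
lanczos_over_rand = {
    r: round(raft_lanczos[r] / raft_randomized[r], 2) for r in scipy_arpack
}

for r in scipy_arpack:
    print(r, rand_speedup[r], lanczos_speedup[r], lanczos_over_rand[r])
```

The ratios for the 500k and 982k rows come out at about 0.99 and 0.97, which is what "basically tied" and "Lanczos slightly faster" summarize.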

copy-pr-bot Bot commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

aamijar added the "non-breaking" and "feature request" labels May 26, 2026
aamijar moved this to In Progress in Unstructured Data Processing May 26, 2026
Member

aamijar commented May 26, 2026

Hi @Intron7, I haven't looked at this PR closely yet, but one thing to think about is that we should try to reuse parts of the existing Lanczos eigensolver as much as possible. They should have some things in common, right?

Member

aamijar commented May 27, 2026

/ok to test eabeaa4

Contributor Author

Intron7 commented Jun 3, 2026

@aamijar the algorithm is different even if the name is similar. The two paths share the Lanczos name, but the kernels implement different algorithms: the eigensolver builds a symmetric tridiagonal matrix via a one-vector recurrence and solves the Ritz problem with syevd; the SVD builds a bidiagonal matrix via Golub-Kahan with two coupled bases (A @ v and Aᵀ @ u) and solves the Ritz problem with gesvdj. The restart is also different: the SVD path locks converged singular triplets and restarts on the unconverged subspace.
The realistic shared surface is the reorthogonalization helpers (CGS2/MGS2) and the cuBLAS wrapper calls. The existing lanczos.cuh does its reorthogonalization inline against its own single-vector layout, so factoring CGS2/MGS2 into a shared utility would require touching the existing eigensolver too. I'd rather land this PR as-is and do a separate refactor PR to extract a shared bidiag_reorth/lanczos_reorth utility if you want.
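
To make the "shared surface" concrete, here is a small NumPy sketch of what two-pass reorthogonalization helpers do, independent of whether the caller is a tridiagonal or a bidiagonal recurrence. It assumes the basis `Q` already has orthonormal columns; the function names `cgs2` and `mgs2` are hypothetical and do not correspond to anything in RAFT.

```python
import numpy as np


def cgs2(Q, w):
    """Classical Gram-Schmidt, applied twice: project out span(Q) in one shot per pass."""
    for _ in range(2):
        w = w - Q @ (Q.T @ w)
    return w


def mgs2(Q, w):
    """Modified Gram-Schmidt, applied twice: project out one basis vector at a time."""
    w = w.copy()
    for _ in range(2):
        for q in Q.T:
            w -= (q @ w) * q
    return w


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((200, 10)))  # orthonormal columns
w = rng.standard_normal(200)

w_cgs = cgs2(Q, w)
w_mgs = mgs2(Q, w)
overlap = np.abs(Q.T @ w_cgs).max()  # remaining component inside span(Q)
```

The classical variant maps onto two matrix-vector products per pass, which is generally the friendlier shape for a GPU; the modified variant is a sequence of dot products and vector updates.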

Member

cjnolet commented Jun 4, 2026

/ok to test b80ca30

/**
* @addtogroup sparse_lanczos_svd
* @{
*/
Member


To keep to our conventions, can you put this in a file called solver_types.hpp? Create it if it doesn't exist already.

void sparse_lanczos_svd(
raft::resources const& handle,
sparse_lanczos_svd_config<ValueTypeT> const& config,
raft::device_csr_matrix_view<const ValueTypeT, int, int, NNZTypeT> A,
Member


Using our C++ sparse APIs. Very nice!
