ENH: sparse: Add CSR @ CSC sparsetools function by dschult · Pull Request #3 · dschult/scipy

dschult · 2024-03-21T21:58:14Z

Sparsetools currently does matmul between two CSR matrices. This is usually quite good, but for "wide" times "tall" products it requires converting the tall matrix to CSR format. For a "tall" matrix with M rows, the CSR indptr array must store M+1 entries, so if M is large this can lose much of the advantage of sparse storage.

The new function allows A@B to keep B as a CSC matrix, so B's indptr array only needs to store N+1 entries, where N is the number of columns of B. In the case of "wide A" times "tall B", storage is best with A as CSR and B as CSC. This algorithm allows that.
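The storage argument above can be checked with simple arithmetic. The sketch below counts stored array entries for a compressed format; the helper name `compressed_storage` and the example shape are hypothetical and are not part of this PR.

```python
def compressed_storage(shape, nnz, fmt):
    """Count entries stored by a CSR or CSC matrix (hypothetical helper).

    Both formats store `nnz` values and `nnz` indices; they differ only
    in the length of indptr: rows + 1 for CSR, cols + 1 for CSC.
    """
    rows, cols = shape
    if fmt == "csr":
        indptr_len = rows + 1
    elif fmt == "csc":
        indptr_len = cols + 1
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    return {"indptr": indptr_len, "indices": nnz, "data": nnz}


# A "tall" matrix: many rows, few columns, very few stored values.
tall_shape = (1_000_000, 3)
nnz = 10

csr_total = sum(compressed_storage(tall_shape, nnz, "csr").values())
csc_total = sum(compressed_storage(tall_shape, nnz, "csc").values())

print(csr_total)  # 1000021
print(csc_total)  # 24
```

For this example shape, the CSR indptr array dominates the storage, while the CSC form stays proportional to the number of stored values.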

Computation speed is comparable to the current CSR@CSR function. It is faster when the number of sparse entries that align (and multiply during the matmul operation) is larger than the number of sparse entries that do not align. Thus A@A.T or A.T@A are ideal candidates for this new function. If A is CSR, then A.T is available in CSC format with zero copying, and every sparse element aligns at least with itself in the transpose.
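To make "aligned" entries concrete, here is a minimal pure-Python sketch of a CSR @ CSC product. It is not the C++ sparsetools code from this PR: the function name `csr_matmul_csc_sketch`, the dense list-of-lists output, and the exact way hits and misses are counted (one hit per matching index pair, one miss per index skipped during the merge) are all assumptions made for illustration. It assumes indices are sorted within each row of A and each column of B.

```python
def csr_matmul_csc_sketch(a_shape, a_indptr, a_indices, a_data,
                          b_shape, b_indptr, b_indices, b_data):
    """Return (C, hits, misses) for C = A @ B, A in CSR and B in CSC.

    Illustrative only: C is a dense list of lists, and the hit/miss
    counting is an assumed definition, not taken from the PR.
    """
    m, k = a_shape
    k2, n = b_shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    c = [[0] * n for _ in range(m)]
    hits = misses = 0
    for i in range(m):            # row i of A
        for j in range(n):        # column j of B
            p, p_end = a_indptr[i], a_indptr[i + 1]
            q, q_end = b_indptr[j], b_indptr[j + 1]
            total = 0
            # Merge the two sorted index lists; only matches contribute.
            while p < p_end and q < q_end:
                if a_indices[p] == b_indices[q]:
                    total += a_data[p] * b_data[q]
                    hits += 1
                    p += 1
                    q += 1
                elif a_indices[p] < b_indices[q]:
                    misses += 1
                    p += 1
                else:
                    misses += 1
                    q += 1
            c[i][j] = total
    return c, hits, misses


# A = [[1, 0, 2],
#      [0, 3, 0]] stored as CSR.
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1, 2, 3]

# The same three arrays, read as CSC with the shape reversed, describe A.T,
# so A @ A.T needs no conversion or copy of the second operand.
c, hits, misses = csr_matmul_csc_sketch((2, 3), indptr, indices, data,
                                        (3, 2), indptr, indices, data)
print(c)             # [[5, 0], [0, 9]]
print(hits, misses)  # 3 4
```

On the diagonal of A @ A.T every stored entry meets itself, which is why each element is guaranteed at least one hit in this case.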

Unfortunately, figuring out ahead of time how many sparse values will align requires computation, so it's not clear what rule should be used to determine which matmul function to call. Certainly a 1D times a 1D will be faster using this new function. And in general "wide" times "tall" is a good rule of thumb. But the placement of the values is key to predicting which is faster.

Deciding which function to use based on memory usage is another approach. Letting the user decide might be the best way, but it would involve format-specific method calls.

Below is a plot of timings of the functions for various M, N of the resulting matrix A@B and the number of "hits" (aligned values) and "misses" (not aligned). It's hard to plot results for 4 parameters, so the first plot has "hits" vs "misses" with each data point shifted slightly to the right for N and upward for M. Each value of hits and misses therefore has a pattern of dots showing the N, M dependence. The color of a dot shows which function is faster, with intensity showing how much faster.

Predictions for which is faster are made using a formula: hits > misses | (hits == misses & N == 1)
Incorrect predictions have a black circle around the dot on the scatterplot. 88% of the data is predicted correctly. That can be increased to 92% with a more complicated formula (buried in the timing code in this PR along with the data and the 3rd png file shown here).
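The simple prediction formula can be written as a small predicate. This sketch reads a true result as "the new CSR @ CSC function is predicted to be faster", which matches the description of when the new function wins; that reading and the name `predict_csr_csc_faster` are assumptions, and the more complicated formula from the timing code is not reproduced here.

```python
def predict_csr_csc_faster(hits, misses, n):
    """Simple rule from the PR text: hits > misses, or a tie when N == 1.

    `n` is the number of columns N of the result A @ B. Returning True
    is interpreted here (an assumption) as "use the CSR @ CSC function".
    """
    return hits > misses or (hits == misses and n == 1)


print(predict_csr_csc_faster(hits=10, misses=3, n=50))  # True
print(predict_csr_csc_faster(hits=5, misses=5, n=1))    # True
print(predict_csr_csc_faster(hits=5, misses=5, n=2))    # False
print(predict_csr_csc_faster(hits=2, misses=9, n=1))    # False
```

Note that the hit and miss counts are only known after doing work comparable to the product itself, so a rule like this is useful for analyzing timing data rather than as a cheap dispatch test.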

The second plot uses N and M on the axes, with each value shifted slightly for hits and misses. The patterns are harder for me to see than in the first plot. But the data is the same.

Here is the first plot with a more complicated decision criterion:

Proposed update to the extend-coo-nd branch

dschult added 6 commits March 21, 2024 09:23

include csr_matmul_csc

a22642a

tweaks to timing script

cc89f54

update timing and multiply scripts

c05f159

make timing data and some plots.

5125a58

improved predictions

3546935

remove old data files

1181c4b

dschult pushed a commit that referenced this pull request May 5, 2024

TST: stats: remove some unnecessary specification of dtype (#3)

80b4bed

dschult pushed a commit that referenced this pull request Sep 14, 2024

Merge pull request #3 from dschult/extend-coo-nd

5c1ae1f

dschult wants to merge 6 commits into main from sparsetools_csr_matcsc
