ENH: sparse: Add CSR @ CSC sparsetools function #3

Open: dschult wants to merge 6 commits into main from sparsetools_csr_matcsc

@dschult dschult commented Mar 21, 2024

Sparsetools does matmul between two CSR matrices. This usually works well, but a "wide" times "tall" product requires converting the tall matrix to CSR format. For a "tall" matrix with M rows, the CSR indptr array must store M+1 entries, so if M is large this can lose much of the advantage of sparse storage.
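To make the storage argument concrete, here is a minimal sketch that counts stored entries for a tall matrix in each format. The helper names and the example shape are hypothetical, and the count assumes one index and one data value per stored element plus the indptr array.

```python
def indptr_length(shape, fmt):
    """Number of entries in indptr for a compressed sparse matrix."""
    n_rows, n_cols = shape
    if fmt == "csr":
        return n_rows + 1  # one pointer per row, plus the end marker
    if fmt == "csc":
        return n_cols + 1  # one pointer per column, plus the end marker
    raise ValueError(f"unknown format: {fmt!r}")


def total_entries(shape, nnz, fmt):
    """indptr entries plus one index and one data value per stored element."""
    return indptr_length(shape, fmt) + 2 * nnz


# Hypothetical tall matrix: 1_000_000 rows, 3 columns, 50 stored values.
tall_shape = (1_000_000, 3)
nnz = 50
csr_entries = total_entries(tall_shape, nnz, "csr")
csc_entries = total_entries(tall_shape, nnz, "csc")
print(csr_entries, csc_entries)
```

With so few stored values, the CSR indptr array dominates the footprint, while the CSC layout stays proportional to the number of stored values.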

The new function computes A@B with B stored as a CSC matrix, so B's indptr array needs only N+1 entries, where N is the number of columns of B. In the case of "wide A" times "tall B", storage is best with A as CSR and B as CSC. This algorithm allows that.

Computation speed is comparable to the current CSR@CSR function. The new function is faster when the number of sparse entries that align (and so multiply during the matmul operation) is larger than the number of sparse entries that do not align. Thus A@A.T or A.T@A are ideal candidates for this new function. If A is CSR, then A.T is available in CSC format without copying, and every sparse element aligns at least with itself in the transpose.
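The idea of aligned entries can be illustrated with a pure-Python sketch of a CSR @ CSC product. This is not the sparsetools implementation from this PR; the function name and the dict output are hypothetical, and it assumes the indices within each row and column are sorted with no duplicates. Each output entry is a sparse dot product of a row of A with a column of B, and only matching indices ("hits") contribute.

```python
def csr_matmat_csc(n_rows, a_indptr, a_indices, a_data,
                   n_cols, b_indptr, b_indices, b_data):
    """Return A @ B as {(i, j): value}, with A in CSR and B in CSC.

    Assumes sorted, duplicate-free indices within each row and column.
    """
    result = {}
    for i in range(n_rows):
        for j in range(n_cols):
            p, p_end = a_indptr[i], a_indptr[i + 1]
            q, q_end = b_indptr[j], b_indptr[j + 1]
            total = 0.0
            hit = False
            # Merge the two sorted index lists; only equal indices multiply.
            while p < p_end and q < q_end:
                ka, kb = a_indices[p], b_indices[q]
                if ka == kb:
                    total += a_data[p] * b_data[q]
                    hit = True
                    p += 1
                    q += 1
                elif ka < kb:
                    p += 1  # entry of A with no partner in B: a "miss"
                else:
                    q += 1  # entry of B with no partner in A: a "miss"
            if hit:
                result[(i, j)] = total
    return result


# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form.
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]

# The same three arrays, read as CSC, describe A.T, so no copy is needed.
gram = csr_matmat_csc(2, indptr, indices, data, 2, indptr, indices, data)
print(gram)
```

In the A @ A.T call, the diagonal entries are all hits because each row is matched against itself, which is the situation the description calls ideal.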

Unfortunately figuring out ahead of time how many sparse values will align requires computation. So it's not clear what algorithm should be used to determine which matmul function to use. Certainly a 1D times a 1D will be faster using this new function. And in general "wide" times "tall" is a good rule of thumb. But the placement of the values is key to predicting which is faster.

Deciding which function to use based on memory usage is another approach. And letting the user decide might be the best way -- but that would involve format-specific method calls.

Below are plots of timings of the two functions for various M, N of the resulting matrix A@B and various numbers of "hits" (aligned values) and "misses" (values that are not aligned). It's hard to plot results for 4 parameters. The first plot has "hits" vs "misses", with each data position shifted slightly to the right for N and upward for M. So each value of hits and misses has a pattern of dots showing the N, M dependence. The color of each dot shows which function is faster, with intensity showing how much faster.

Predictions for which is faster are made using a formula: hits > miss | (hits == miss & N == 1)
Incorrect predictions have a black circle around the dot on the scatterplot. 88% of the data is predicted correctly. That can be increased to 92% with a more complicated formula (buried in the timing code in this PR along with the data and the 3rd png file shown here).
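The simple prediction rule can be written as a small Python function. This is a sketch: the function and parameter names are hypothetical, and it assumes a True result means the new CSR@CSC function is predicted to be faster. The parentheses are explicit because in Python and NumPy the `|` and `&` operators bind more tightly than comparisons, so the formula as typed above would not evaluate as intended if pasted verbatim.

```python
def predict_csc_faster(hits, misses, n):
    """Simple rule: more hits than misses, or a tie with N == 1."""
    return hits > misses or (hits == misses and n == 1)


print(predict_csc_faster(10, 3, 5))
print(predict_csc_faster(3, 3, 1))
print(predict_csc_faster(3, 3, 2))
```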

hit_vs_miss

The second plot uses N and M on the axes, with each value shifted slightly for hits and misses. The patterns are harder for me to see than in the first plot. But the data is the same.

M_vs_N

Here is the first plot with a more complicated decision criterion:

hit_vs_miss_complicated_predictions

dschult pushed a commit that referenced this pull request Sep 14, 2024
Proposed update to the extend-coo-nd branch