ENH: sparse: Add CSR @ CSC sparsetools function#3
Open
dschult wants to merge 6 commits into
dschult pushed a commit that referenced this pull request May 5, 2024
dschult pushed a commit that referenced this pull request Sep 14, 2024
Proposed update to the extend-coo-nd branch
Sparsetools does matmul between two CSR matrices. This usually works well, but a "wide" times "tall" product requires converting the tall matrix to CSR format. For a "tall" matrix with `M` rows, the `indptr` array must store `M+1` entries, so if `M` is large this can lose much of the advantage of sparse storage.
The new function allows `A@B` to store `B` as a CSC matrix, so the `indptr` array needs to store only `N+1` entries, where `N` is the number of columns of `B`. In the case of "wide A" times "tall B", storage is best with `A` as CSR and `B` as CSC. This algorithm allows that.
Computation speed is comparable to the current CSR@CSR function. It is faster when the number of sparse entries that align (and multiply during the matmul operation) is larger than the number of sparse entries that do not align. Thus `A@A.T` or `A.T@A` are ideal candidates for this new function. If `A` is CSR, then `A.T` is a zero-copy CSC format, and every sparse element aligns at least with itself in the transpose.
Unfortunately, figuring out ahead of time how many sparse values will align requires computation, so it's not clear what algorithm should be used to determine which matmul function to use. Certainly a 1D times a 1D will be faster using this new function, and in general "wide" times "tall" is a good rule of thumb. But the placement of the values is key to predicting which is faster.
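
To make "aligned" entries concrete, here is a minimal pure-Python sketch of a CSR @ CSC product. It is not the sparsetools code in this PR. It assumes sorted, duplicate-free indices and walks each row of `A` against each column of `B` with two pointers. The function names and the exact way misses are counted (every stored entry visited without a partner) are assumptions made for illustration.

```python
def row_dot_col(a_idx, a_val, b_idx, b_val):
    """Dot one CSR row with one CSC column (sorted, unique indices).

    Returns (value, hits, misses). A "hit" is a pair of stored entries
    with the same inner index; a "miss" is a stored entry with no partner.
    """
    i = j = 0
    total = 0.0
    hits = misses = 0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            hits += 1
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            misses += 1
            i += 1
        else:
            misses += 1
            j += 1
    # Entries left over in either list have no partner.
    misses += (len(a_idx) - i) + (len(b_idx) - j)
    return total, hits, misses


def csr_matmul_csc(a, b, n_rows, n_cols):
    """Multiply CSR `a` by CSC `b`, each given as (indptr, indices, data).

    Returns ({(row, col): value}, total_hits, total_misses).
    """
    a_ptr, a_ind, a_dat = a
    b_ptr, b_ind, b_dat = b
    result = {}
    total_hits = total_misses = 0
    for r in range(n_rows):
        ra, rb = a_ptr[r], a_ptr[r + 1]
        for c in range(n_cols):
            ca, cb = b_ptr[c], b_ptr[c + 1]
            value, hits, misses = row_dot_col(
                a_ind[ra:rb], a_dat[ra:rb], b_ind[ca:cb], b_dat[ca:cb]
            )
            if hits:  # store only entries produced by at least one hit
                result[(r, c)] = value
            total_hits += hits
            total_misses += misses
    return result, total_hits, total_misses


# A = [[1, 0, 2, 0],
#      [0, 3, 0, 4]] stored as CSR.
A = ([0, 2, 4], [0, 2, 1, 3], [1, 2, 3, 4])
# The same three arrays, read as CSC, describe A.T (no copy needed).
product, hits, misses = csr_matmul_csc(A, A, n_rows=2, n_cols=2)
```

In this small `A@A.T` case every stored entry aligns with itself on the diagonal, while the off-diagonal row/column pairs contribute only misses.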
Deciding which function to use based on memory usage is another approach. Letting the user decide might be the best way, but it would involve format-specific method calls.
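
As a rough illustration of the memory-based approach, the sketch below counts the index entries each compressed format must store for a matrix of a given shape. The helper name `index_entries` is hypothetical and is not part of SciPy or this PR. It counts only `indptr` plus `indices` and ignores the `data` array, which has the same length in both formats.

```python
def index_entries(n_rows, n_cols, nnz, fmt):
    """Number of integers stored in indptr + indices for a CSR or CSC matrix."""
    if fmt == "csr":
        return (n_rows + 1) + nnz  # one indptr entry per row, plus one
    if fmt == "csc":
        return (n_cols + 1) + nnz  # one indptr entry per column, plus one
    raise ValueError(f"unsupported format: {fmt!r}")


# A "tall" matrix: one million rows, three columns, ten stored values.
tall_as_csr = index_entries(1_000_000, 3, 10, "csr")
tall_as_csc = index_entries(1_000_000, 3, 10, "csc")
```

For this tall matrix CSR needs over a million index entries while CSC needs 14, which is the storage gap described above.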
Below is a plot of timings of the functions for various M, N of the resulting matrix A@B and the number of "hits" (aligned values) and "misses" (not aligned). It's hard to plot results for 4 parameters. But the first plot has "hits" vs "misses" with each data position increased slightly to the right for N and upward for M. So each value of hits and misses has a pattern of dots showing the N,M dependence. Color of the dot shows which function is faster with intensity showing how much faster.
Predictions for which is faster are made using a formula:
`hits > miss | (hits == miss & N == 1)`
Incorrect predictions have a black circle around the dot on the scatterplot. 88% of the data is predicted correctly. That can be increased to 92% with a more complicated formula (buried in the timing code in this PR along with the data and the 3rd png file shown here).
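
For scalar inputs, the simple formula can be written as a small Python predicate. This is a sketch only: the function name is made up for illustration, and the timing code in this PR may apply the rule elementwise to arrays instead.

```python
def predict_csr_csc_faster(hits, miss, n):
    """Return True when the simple rule predicts the CSR @ CSC function wins.

    hits: number of aligned stored entries
    miss: number of stored entries that do not align
    n:    N, the second dimension of the result A @ B
    """
    return hits > miss or (hits == miss and n == 1)


more_hits = predict_csr_csc_faster(10, 3, 5)
tie_single_column = predict_csr_csc_faster(3, 3, 1)
tie_wider_result = predict_csr_csc_faster(3, 3, 2)
```

The tie-breaking clause only matters when hits and misses are equal, in which case the new function is predicted to win only for `N == 1`.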
The second plot uses N and M on the axis, with each value shifted slightly for hits and misses. The patterns are harder for me to see than the first plot. But the data is the same.
Here is the first plot with a more complicated decision criterion: