I was experimenting with Lanczos interpolation (convolution with the kernel sinc(x)sinc(x/a), truncated to |x| < a) of 3D data. In my use case I typically have O(n^3) data points and O(n^2) interpolation points. So far, a very mediocre CUDA implementation seems to be faster than Interpolations.jl's quadratic B-spline:
https://nextjournal.com/stabbles/cuda-interpolation
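A minimal pure-Python sketch of the Lanczos idea, for readers who haven't seen it. This is not the CUDA code from the notebook. The names `lanczos` and `lanczos3d`, the window `a = 3`, the clamp-to-edge boundary, and the normalization by the weight sum are all assumptions made for illustration:

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    px = math.pi * x
    return math.sin(px) / px


def lanczos(x: float, a: int = 3) -> float:
    """Lanczos kernel sinc(x) * sinc(x/a) on the window |x| < a, zero outside."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)


def lanczos3d(data, x: float, y: float, z: float, a: int = 3) -> float:
    """Interpolate data[i][j][k] (unit-spaced grid) at the point (x, y, z).

    The 3D kernel is separable, L(x - i) * L(y - j) * L(z - k), summed over
    the 2a x 2a x 2a samples around the point. Out-of-range indices are
    clamped to the nearest edge (an assumption), and the result is divided
    by the sum of the weights so constant data is reproduced exactly.
    """
    nx, ny, nz = len(data), len(data[0]), len(data[0][0])
    ix, iy, iz = math.floor(x), math.floor(y), math.floor(z)
    acc = 0.0
    wsum = 0.0
    for i in range(ix - a + 1, ix + a + 1):
        wx = lanczos(x - i, a)
        ci = min(max(i, 0), nx - 1)  # clamp-to-edge boundary
        for j in range(iy - a + 1, iy + a + 1):
            wxy = wx * lanczos(y - j, a)
            cj = min(max(j, 0), ny - 1)
            for k in range(iz - a + 1, iz + a + 1):
                w = wxy * lanczos(z - k, a)
                ck = min(max(k, 0), nz - 1)
                acc += w * data[ci][cj][ck]
                wsum += w
    return acc / wsum
```

Each evaluation reads (2a)^3 samples (216 for a = 3), needs no prefiltering, and is independent of every other interpolation point, which is why one GPU thread per output point is a natural mapping.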
I'm sure the CUDA implementation can be improved quite a bit. Would it be possible to run Interpolations.jl's functions on the GPU? I figure the bottleneck is mostly the LU decomposition?
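If the LU decomposition refers to the B-spline prefiltering step, it may help to see how small each solve is. A quadratic B-spline on a unit grid has weights 1/8, 3/4, 1/8 at neighboring samples, so finding the coefficients means solving a tridiagonal system along every grid line in each dimension. The pure-Python sketch below works under stated assumptions: the mirrored boundary c[-1] = c[0], c[n] = c[n-1] is just one simple choice and not necessarily a boundary condition Interpolations.jl implements, it does not claim to reproduce Interpolations.jl's internals, and the names `thomas`, `prefilter_quadratic_1d` and `prefilter_3d` are hypothetical:

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system (Thomas algorithm, i.e. LU without pivoting).

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward sweep (the "L" part)
        m = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / m
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution (the "U" part)
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def prefilter_quadratic_1d(y):
    """Coefficients c with c[i-1]/8 + 3*c[i]/4 + c[i+1]/8 = y[i].

    Assumed boundary: mirrored coefficients, c[-1] = c[0] and c[n] = c[n-1].
    """
    n = len(y)
    lower = [1 / 8] * n
    diag = [3 / 4] * n
    upper = [1 / 8] * n
    diag[0] += 1 / 8   # folds the mirrored c[-1] into the first row
    diag[-1] += 1 / 8  # folds the mirrored c[n] into the last row
    return thomas(lower, diag, upper, list(y))


def prefilter_3d(data):
    """Apply the 1D prefilter along z, then y, then x (the system is separable)."""
    nx, ny, nz = len(data), len(data[0]), len(data[0][0])
    c = [[list(data[i][j]) for j in range(ny)] for i in range(nx)]
    for i in range(nx):  # lines along z
        for j in range(ny):
            c[i][j] = prefilter_quadratic_1d(c[i][j])
    for i in range(nx):  # lines along y
        for k in range(nz):
            line = prefilter_quadratic_1d([c[i][j][k] for j in range(ny)])
            for j in range(ny):
                c[i][j][k] = line[j]
    for j in range(ny):  # lines along x
        for k in range(nz):
            line = prefilter_quadratic_1d([c[i][j][k] for i in range(nx)])
            for i in range(nx):
                c[i][j][k] = line[i]
    return c
```

Each 1D solve is O(n) but sequential within a line (forward sweep, then back substitution), while the n^2 lines per axis are independent of one another. With O(n^3) data and only O(n^2) query points, this O(n^3) prefilter can plausibly dominate the total cost.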