Add p-norm functionality. #688
SomTambe wants to merge 7 commits into JuliaGPU:master from SomTambe:master
Conversation
        return LinearAlgebra.norm(x)
    end
    if p > 2
        return LinearAlgebra.tr(LinearAlgebra.Diagonal(abs.(x))^p)^(1/p)
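For reference, the Diagonal/trace expression above is an indirect way of computing sum(abs.(x).^p)^(1/p): raising Diagonal(abs.(x)) to the p-th power raises each diagonal entry to the p-th power, and the trace then sums them. A quick CPU sanity check (plain Julia, not part of the PR):

using LinearAlgebra

# The Diagonal/trace formulation agrees with the direct elementwise sum.
x, p = [3.0, -4.0, 12.0], 3
tr(Diagonal(abs.(x))^p)^(1/p) ≈ sum(abs.(x) .^ p)^(1/p)   # true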
The problem is that these operations are implemented using CUBLAS, so they are themselves restricted to the types CUBLAS supports (plain, basic C types). Ideally we'd have something more generic. What about #84 (comment)? Is that not valid, or is it slower?
Such a generic version could also go into GPUArrays.jl, while a specialized version for CUBLAS-supported types could live in CUDA.jl.
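For illustration, a generic version of this kind (a minimal sketch, not the code from the #84 comment referenced above) could be a single mapreduce over the elements, which only requires abs, ^ and + on the element type:

using CUDA

# Sketch only: generic p-norm as one mapreduce kernel; no CUBLAS involved,
# so it is not restricted to the element types CUBLAS supports.
generic_pnorm(x::CuArray, p::Real) = mapreduce(xi -> abs(xi)^p, +, x)^(1/p)

(As reported further down in this thread, the ^ inside such a kernel hit an LLVM fpow selection error at the time.)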
@maleadt Could you elaborate on the types CUBLAS does not support?
mapreduce is supported by CUDA.jl; I don't understand how that would trigger scalar indexing.
If you look at the CUBLAS docs, you'll see it's a C library that only supports a limited set of element types. That's why we have type unions like CUBLASFloat in CUDA.jl.
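To make the restriction concrete, here is a hedged sketch of the split discussed above. gpu_pnorm is an illustrative name, not CUDA.jl API, and the explicit type union stands in for the CUBLASFloat-style union mentioned:

using CUDA, LinearAlgebra

# Element types CUBLAS can handle (plain C float/double and their complex forms).
const BlasEltypes = Union{Float32,Float64,ComplexF32,ComplexF64}

function gpu_pnorm(x::CuArray{T}, p::Real) where {T}
    if p == 2 && T <: BlasEltypes
        return LinearAlgebra.norm(x)   # CUBLAS-backed 2-norm for supported eltypes
    elseif p == Inf
        return maximum(abs, x)         # keeps everything on the GPU
    else
        return mapreduce(xi -> abs(xi)^p, +, x)^(1/p)   # generic fallback
    end
end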
@maleadt I have added the tests. Do let me know if there is anything that needs to be changed.
Codecov Report

@@            Coverage Diff             @@
##           master     #688      +/-   ##
==========================================
+ Coverage   77.86%   77.89%   +0.03%
==========================================
  Files         118      118
  Lines        7119     7126       +7
==========================================
+ Hits         5543     5551       +8
+ Misses       1576     1575       -1

Continue to review full report at Codecov.
@maleadt I tried using the mapreduce approach, and this is the output I have received:

julia> norm(a,3)
ERROR: LLVM error: Cannot select: 0x66598b50: f32 = fpow 0x6658aba8, 0x6658a8d0, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
0x6658aba8: f32,ch = CopyFromReg 0x65e04818, Register:f32 %40, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
0x6dca6988: f32 = Register %40
0x6658a8d0: f32,ch = CopyFromReg 0x65e04818, Register:f32 %19, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
0x6658aee8: f32 = Register %19
In function: _Z33julia_partial_mapreduce_grid_44259_identity2__7Float3216CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE4_1_2IS5_ES3_IS7_IS1_Li1ELi1EEEE
Stacktrace:
[1] handle_error(::Cstring) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\core\context.jl:105
[2] macro expansion at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\util.jl:114 [inlined]
[3] LLVMTargetMachineEmitToMemoryBuffer(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType, ::Base.RefValue{Cstring}, ::Base.RefValue{Ptr{LLVM.API.LLVMOpaqueMemoryBuffer}}) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\lib\libLLVM_h.jl:3612
[4] emit(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\targetmachine.jl:44
[5] mcgen(::GPUCompiler.CompilerJob, ::LLVM.Module, ::LLVM.Function, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\mcgen.jl:74
[6] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
[7] macro expansion at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:252 [inlined]
[8] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
[9] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:248
[10] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:39
[11] compile at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:35 [inlined]
[12] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:302
[13] cufunction_compile(::GPUCompiler.FunctionSpec) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:297
[14] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{typeof(CUDA.partial_mapreduce_grid),Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:40
[15] partial_mapreduce_grid at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:87 [inlined]
[16] cached_compilation at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:65 [inlined]
[17] cufunction(::typeof(CUDA.partial_mapreduce_grid), ::Type{Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:289
[18] cufunction at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:286 [inlined]
[19] macro expansion at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:100 [inlined]
[20] mapreducedim!(::typeof(identity), ::typeof(+), ::CuArray{Float32,1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuArray{Float32,1}}}; init::Float32) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:192
[21] _mapreduce(::var"#1#2"{Int64}, ::typeof(+), ::CuArray{Float32,1}; dims::Colon, init::Float32) at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:62
[22] #mapreduce#15 at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:28 [inlined]
[23] norm(::CuArray{Float32,1}, ::Int64) at .\REPL[9]:1
[24] top-level scope at REPL[10]:1

Seems like a compiler-level error, since LLVM is mentioned in the error output. How do I tackle this? Will my existing implementation work?

PS: Sorry for such a late reply! I got occupied with some other work, and then I had my exams 😢
@maleadt Sounds right, I shall then wait for it to get merged. Will rebase the changes I have made then ;).
Maybe I should start work on this now that the PR has been merged. |
What does this PR do?
It adds norm(x::CuArray, p::Integer). I will add further modifications here.
Another aspect I should point out: it still does not work for norm(x::CuArray, Inf), which gives the same old scalar getindex error. Might have to think of something better for this, because LinearAlgebra.opnorm() also has not been implemented for CUDA.jl.
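For context, a hypothetical usage session for what the PR targets (exact behaviour depends on the final merged code):

using CUDA, LinearAlgebra

a = CUDA.rand(Float32, 1024)

norm(a)       # 2-norm, already CUBLAS-backed
norm(a, 1)    # 1-norm
norm(a, 3)    # general p-norm added by this PR
# norm(a, Inf) still hits the scalar getindex fallback, as noted above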