Support Julia 1.13 #3020

Merged: maleadt merged 1 commit into JuliaGPU:master from eschnett:eschnett/julia-1.13 on Mar 4, 2026

Conversation

@eschnett (Contributor)

Closes #3019.

@github-actions (bot) left a comment

CUDA.jl Benchmarks

Benchmark suite Current: f90472d Previous: 9c24e73 Ratio
latency/precompile 43977295720.5 ns 44501126522 ns 0.99
latency/ttfp 13145443593 ns 13149867592 ns 1.00
latency/import 3766140874 ns 3767927411.5 ns 1.00
integration/volumerhs 9437395 ns 9440140.5 ns 1.00
integration/byval/slices=1 146003 ns 145804 ns 1.00
integration/byval/slices=3 423230 ns 422996 ns 1.00
integration/byval/reference 144129 ns 143875 ns 1.00
integration/byval/slices=2 284679 ns 284373 ns 1.00
integration/cudadevrt 102728 ns 102603 ns 1.00
kernel/indexing 13691 ns 13604 ns 1.01
kernel/indexing_checked 14437 ns 14041.5 ns 1.03
kernel/occupancy 634.7218934911242 ns 654.3878787878788 ns 0.97
kernel/launch 2272.8888888888887 ns 2065.9 ns 1.10
kernel/rand 17457 ns 14529 ns 1.20
array/reverse/1d 19014 ns 18833 ns 1.01
array/reverse/2dL_inplace 66383 ns 66297 ns 1.00
array/reverse/1dL 69213 ns 69017 ns 1.00
array/reverse/2d 20940 ns 21208 ns 0.99
array/reverse/1d_inplace 9019 ns 8801 ns 1.02
array/reverse/2d_inplace 10640 ns 10457 ns 1.02
array/reverse/2dL 72945 ns 73233 ns 1.00
array/reverse/1dL_inplace 66337 ns 66238 ns 1.00
array/copy 18123 ns 18166 ns 1.00
array/iteration/findall/int 145268.5 ns 146211.5 ns 0.99
array/iteration/findall/bool 130269 ns 130874 ns 1.00
array/iteration/findfirst/int 83676 ns 84566 ns 0.99
array/iteration/findfirst/bool 81053 ns 81494 ns 0.99
array/iteration/scalar 67363 ns 66998 ns 1.01
array/iteration/logical 200756.5 ns 198961 ns 1.01
array/iteration/findmin/1d 83353 ns 84192 ns 0.99
array/iteration/findmin/2d 116414 ns 117391 ns 0.99
array/reductions/reduce/Int64/1d 39232 ns 38940 ns 1.01
array/reductions/reduce/Int64/dims=1 42353.5 ns 42402.5 ns 1.00
array/reductions/reduce/Int64/dims=2 58978 ns 59096.5 ns 1.00
array/reductions/reduce/Int64/dims=1L 87227 ns 87158 ns 1.00
array/reductions/reduce/Int64/dims=2L 84412 ns 84522.5 ns 1.00
array/reductions/reduce/Float32/1d 33877 ns 34365.5 ns 0.99
array/reductions/reduce/Float32/dims=1 39302 ns 49003 ns 0.80
array/reductions/reduce/Float32/dims=2 56362 ns 56392.5 ns 1.00
array/reductions/reduce/Float32/dims=1L 51579 ns 51750 ns 1.00
array/reductions/reduce/Float32/dims=2L 69583 ns 70137.5 ns 0.99
array/reductions/mapreduce/Int64/1d 38982 ns 38678 ns 1.01
array/reductions/mapreduce/Int64/dims=1 51400 ns 49417 ns 1.04
array/reductions/mapreduce/Int64/dims=2 58902 ns 59199 ns 0.99
array/reductions/mapreduce/Int64/dims=1L 87221 ns 87193 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 84498 ns 84515 ns 1.00
array/reductions/mapreduce/Float32/1d 33969 ns 34022 ns 1.00
array/reductions/mapreduce/Float32/dims=1 39127.5 ns 39691.5 ns 0.99
array/reductions/mapreduce/Float32/dims=2 56116 ns 55971 ns 1.00
array/reductions/mapreduce/Float32/dims=1L 51476 ns 51491 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 68985.5 ns 68932 ns 1.00
array/broadcast 20849 ns 20437 ns 1.02
array/copyto!/gpu_to_gpu 10632 ns 10680.333333333334 ns 1.00
array/copyto!/cpu_to_gpu 212916 ns 217910 ns 0.98
array/copyto!/gpu_to_cpu 282303 ns 285564 ns 0.99
array/accumulate/Int64/1d 118120.5 ns 118803 ns 0.99
array/accumulate/Int64/dims=1 79213 ns 79869.5 ns 0.99
array/accumulate/Int64/dims=2 155247 ns 155687 ns 1.00
array/accumulate/Int64/dims=1L 1705262 ns 1694846 ns 1.01
array/accumulate/Int64/dims=2L 960497 ns 961414 ns 1.00
array/accumulate/Float32/1d 100730 ns 101291.5 ns 0.99
array/accumulate/Float32/dims=1 76297 ns 76639 ns 1.00
array/accumulate/Float32/dims=2 143951 ns 144870 ns 0.99
array/accumulate/Float32/dims=1L 1584441.5 ns 1584515.5 ns 1.00
array/accumulate/Float32/dims=2L 656314 ns 657191.5 ns 1.00
array/construct 1272.3 ns 1305.2 ns 0.97
array/random/randn/Float32 42774 ns 37110.5 ns 1.15
array/random/randn!/Float32 29702 ns 30378 ns 0.98
array/random/rand!/Int64 34748 ns 31479 ns 1.10
array/random/rand!/Float32 8185.25 ns 8318 ns 0.98
array/random/rand/Int64 37159.5 ns 31553 ns 1.18
array/random/rand/Float32 12276 ns 12376 ns 0.99
array/permutedims/4d 54725.5 ns 51624 ns 1.06
array/permutedims/2d 52225 ns 52729 ns 0.99
array/permutedims/3d 52803 ns 52893 ns 1.00
array/sorting/1d 2734309.5 ns 2734663 ns 1.00
array/sorting/by 3303053 ns 3304625 ns 1.00
array/sorting/2d 1067014 ns 1068662 ns 1.00
cuda/synchronization/stream/auto 994.5 ns 1017.4166666666666 ns 0.98
cuda/synchronization/stream/nonblocking 7503.5 ns 7603.1 ns 0.99
cuda/synchronization/stream/blocking 810.6881720430108 ns 808.4591836734694 ns 1.00
cuda/synchronization/context/auto 1151.9 ns 1174.2 ns 0.98
cuda/synchronization/context/nonblocking 7846.4 ns 7803.2 ns 1.01
cuda/synchronization/context/blocking 900.9183673469388 ns 886.6666666666666 ns 1.02

This comment was automatically generated by a workflow using github-action-benchmark.

@eschnett (Contributor, Author)

The self-tests fail because the linear algebra functions (e.g. matrix exponential) as implemented in LinearAlgebra use scalar iteration. See e.g. exp! in https://github.com/JuliaLang/LinearAlgebra.jl/blob/f55e4736fb6dce08fee8a7ac7f0aba1f2b54838e/src/dense.jl#L784.

How should this be handled? Rewrite exp!? Find a corresponding CUDA library function to call and add a new method to exp? Fall back to the Julia 1.12 implementation? How does this work in Julia 1.12?
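A minimal sketch of the failure mode (assuming scalar indexing is disallowed, as it is in the test suite; the matrix size is arbitrary):

```julia
using CUDA, LinearAlgebra

# With scalar indexing disallowed, a generic LinearAlgebra routine that reads
# elements one at a time, like exp!, errors on a CuMatrix instead of running
# on the GPU.
CUDA.allowscalar(false)
A = CUDA.rand(Float32, 32, 32)
exp(A)   # reaches LinearAlgebra's generic exp!, which iterates scalar elements
```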

@eschnett (Contributor, Author)

I think it's JuliaGPU/GPUArrays.jl#679.

@eschnett (Contributor, Author)

eschnett commented Feb 3, 2026

The Buildkite error is:

  ptxas /tmp/jl_PALmvKnqta.ptx, line 226; error   : Modifier '.NaN' requires .target sm_80 or higher
  ptxas /tmp/jl_PALmvKnqta.ptx, line 226; error   : Feature 'min.f16 or min.f16x2' requires .target sm_80 or higher

This seems unrelated to my changes, except that I am now running CI tests on Julia 1.12 and Julia 1.13...
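For context, a quick way to check whether the target device even supports those instructions (a sketch using CUDA.jl's device query API):

```julia
using CUDA

# The f16 min/max instructions with the .NaN modifier that ptxas rejects above
# require compute capability 8.0 (sm_80) or newer.
cap = capability(device())   # e.g. v"7.0" on a V100, v"8.0" on an A100
cap >= v"8.0" || @warn "device below sm_80; NaN-propagating f16 min/max unavailable" cap
```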

@maleadt (Member)

maleadt commented Feb 4, 2026

I guess #3025 needs to be active for all LLVM versions.

@eschnett (Contributor, Author)

eschnett commented Feb 4, 2026

Good news: CUDA.jl now works for Julia 1.12.
Bad news: There's an LLVM segfault for Julia 1.13.

From worker 5:    [271397] signal 11 (1): Segmentation fault
From worker 5:    in expression starting at /var/lib/buildkite-agent/builds/gpuci-9/julialang/cuda-dot-jl/test/base/texture.jl:41
From worker 5:    _ZN12_GLOBAL__N_124NVPTXReplaceImageHandles18findIndexForHandleERN4llvm14MachineOperandERNS1_15MachineFunctionERj.isra.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN12_GLOBAL__N_124NVPTXReplaceImageHandles18findIndexForHandleERN4llvm14MachineOperandERNS1_15MachineFunctionERj.isra.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN12_GLOBAL__N_124NVPTXReplaceImageHandles20runOnMachineFunctionERN4llvm15MachineFunctionE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE.part.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)

@eschnett (Contributor, Author)

eschnett commented Feb 4, 2026

I think it's texture interpolation that is broken on 1.13. This line segfaults LLVM:

dst[i] = texture[u]

in test/base/texture.jl (function kernel_texture_warp_native).
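A simplified sketch of that pattern (illustrative names, not the exact test code):

```julia
using CUDA

function kernel_texture_warp(dst, texture)
    i = threadIdx().x
    u = Float32(i)                   # texel coordinate
    @inbounds dst[i] = texture[u]    # the texture fetch that segfaults LLVM on 1.13
    return nothing
end
```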

@eschnett (Contributor, Author)

eschnett commented Feb 4, 2026

We will need to update KernelAbstractions.jl as well; see JuliaGPU/KernelAbstractions.jl#679.

@codecov (bot)

codecov bot commented Feb 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.48%. Comparing base (9c24e73) to head (f90472d).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3020      +/-   ##
==========================================
+ Coverage   89.46%   89.48%   +0.01%     
==========================================
  Files         148      148              
  Lines       13047    13039       -8     
==========================================
- Hits        11673    11668       -5     
+ Misses       1374     1371       -3     


@eschnett (Contributor, Author)

All green!

@eschnett (Contributor, Author)

It seems this PR has stalled. Can I do something to get a review or to get it merged?

@eschnett (Contributor, Author)

Can we merge either this PR or #3031 instead?

@maleadt self-assigned this on Mar 2, 2026
@maleadt (Member) left a comment

I'll look into pushing this over the finish line.

matrix:
  setup:
    cuda:
      # - "13.1"
@maleadt (Member)

Why add but not test 13.1?

@@ -2082,58 +2082,58 @@ end
const nvmlGpuFabricInfoV_t = nvmlGpuFabricInfo_v3_t

@checked function nvmlInit_v2()
    @gcsafe_ccall (libnvml()).nvmlInit_v2()::nvmlReturn_t
@maleadt (Member)

All these require matching changes in the wrapper generator.

lib/nvml/NVML.jl (outdated), comment on lines +20 to +23:
# NVSMI dir isn't added to PATH by the installer; add it to Julia's DLL search path.
nvsmi = joinpath(get(ENV, "ProgramFiles", raw"C:\Program Files"), "NVIDIA Corporation", "NVSMI")
if isdir(nvsmi) && !(nvsmi in Libdl.DL_LOAD_PATH)
    pushfirst!(Libdl.DL_LOAD_PATH, nvsmi)
@maleadt (Member)

This is not equivalent. Why can't it be done globally, setting the constant string?
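A minimal sketch of what "setting the constant string" could look like (nvsmi_dir and libnvml_name are illustrative names, not CUDA.jl's actual API):

```julia
# Resolve the NVML library location once, globally, instead of mutating
# Libdl.DL_LOAD_PATH at run time.
const nvsmi_dir = joinpath(get(ENV, "ProgramFiles", raw"C:\Program Files"),
                           "NVIDIA Corporation", "NVSMI")
const libnvml_name = isdir(nvsmi_dir) ? joinpath(nvsmi_dir, "nvml.dll") : "nvml"
```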

ir = sprint(io->CUDA.code_llvm(io, CUDA.pointerref_ldg, Tuple{Core.LLVMPtr{Int,AS.Global},Int,Val{1}}))
@test occursin("@llvm.nvvm.ldg", ir)
if Base.libllvm_version >= v"20"
    @test occursin("load i64, ptr addrspace(1)", ir)
@maleadt (Member), Mar 2, 2026

This should test for the replacement pattern, which is an !invariant.load
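A sketch of the adjusted assertion, reusing `ir` from the excerpt above:

```julia
# On LLVM 20+ the ldg intrinsic is replaced, so check for the replacement
# pattern (a load tagged !invariant.load) rather than the intrinsic itself.
if Base.libllvm_version >= v"20"
    @test occursin("!invariant.load", ir)
else
    @test occursin("@llvm.nvvm.ldg", ir)
end
```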

using Interpolations

# Texture interpolation crashes LLVM in Julia 1.13
VERSION < v"1.13-" && @testset "texture" begin

Co-authored by: Erik Schnetter <schnetter@gmail.com>
Co-authored by: KARLO\karlo <karlo.sepetanc@live.com>
@maleadt force-pushed the eschnett/julia-1.13 branch from 08c9eb6 to f90472d on March 3, 2026
@maleadt merged commit a7d3f8b into JuliaGPU:master on Mar 4, 2026
2 of 3 checks passed
@KSepetanc (Contributor)

Could you please edit the commit message? This is my first contribution here. "Co-authored by" should be "Co-authored-by" for GitHub to recognize it.

maleadt added a commit that referenced this pull request Mar 4, 2026
Co-authored-by: Karlo Sepetanc <karlo.sepetanc@live.com>
Co-authored-by: Tim Besard <tim.besard@gmail.com>
@maleadt (Member)

maleadt commented Mar 4, 2026

bdb6c1d

Successfully merging this pull request may close these issues: Cannot load CUDA.jl with Julia 1.13.