Support Julia 1.13 #3020

Merged: maleadt merged 1 commit into JuliaGPU:master from eschnett:eschnett/julia-1.13 on Mar 4, 2026

Conversation

@eschnett (Contributor)

Closes #3019.

@github-actions (bot) left a comment

CUDA.jl Benchmarks

Benchmark suite Current: f90472d Previous: 9c24e73 Ratio
latency/precompile 43977295720.5 ns 44501126522 ns 0.99
latency/ttfp 13145443593 ns 13149867592 ns 1.00
latency/import 3766140874 ns 3767927411.5 ns 1.00
integration/volumerhs 9437395 ns 9440140.5 ns 1.00
integration/byval/slices=1 146003 ns 145804 ns 1.00
integration/byval/slices=3 423230 ns 422996 ns 1.00
integration/byval/reference 144129 ns 143875 ns 1.00
integration/byval/slices=2 284679 ns 284373 ns 1.00
integration/cudadevrt 102728 ns 102603 ns 1.00
kernel/indexing 13691 ns 13604 ns 1.01
kernel/indexing_checked 14437 ns 14041.5 ns 1.03
kernel/occupancy 634.7218934911242 ns 654.3878787878788 ns 0.97
kernel/launch 2272.8888888888887 ns 2065.9 ns 1.10
kernel/rand 17457 ns 14529 ns 1.20
array/reverse/1d 19014 ns 18833 ns 1.01
array/reverse/2dL_inplace 66383 ns 66297 ns 1.00
array/reverse/1dL 69213 ns 69017 ns 1.00
array/reverse/2d 20940 ns 21208 ns 0.99
array/reverse/1d_inplace 9019 ns 8801 ns 1.02
array/reverse/2d_inplace 10640 ns 10457 ns 1.02
array/reverse/2dL 72945 ns 73233 ns 1.00
array/reverse/1dL_inplace 66337 ns 66238 ns 1.00
array/copy 18123 ns 18166 ns 1.00
array/iteration/findall/int 145268.5 ns 146211.5 ns 0.99
array/iteration/findall/bool 130269 ns 130874 ns 1.00
array/iteration/findfirst/int 83676 ns 84566 ns 0.99
array/iteration/findfirst/bool 81053 ns 81494 ns 0.99
array/iteration/scalar 67363 ns 66998 ns 1.01
array/iteration/logical 200756.5 ns 198961 ns 1.01
array/iteration/findmin/1d 83353 ns 84192 ns 0.99
array/iteration/findmin/2d 116414 ns 117391 ns 0.99
array/reductions/reduce/Int64/1d 39232 ns 38940 ns 1.01
array/reductions/reduce/Int64/dims=1 42353.5 ns 42402.5 ns 1.00
array/reductions/reduce/Int64/dims=2 58978 ns 59096.5 ns 1.00
array/reductions/reduce/Int64/dims=1L 87227 ns 87158 ns 1.00
array/reductions/reduce/Int64/dims=2L 84412 ns 84522.5 ns 1.00
array/reductions/reduce/Float32/1d 33877 ns 34365.5 ns 0.99
array/reductions/reduce/Float32/dims=1 39302 ns 49003 ns 0.80
array/reductions/reduce/Float32/dims=2 56362 ns 56392.5 ns 1.00
array/reductions/reduce/Float32/dims=1L 51579 ns 51750 ns 1.00
array/reductions/reduce/Float32/dims=2L 69583 ns 70137.5 ns 0.99
array/reductions/mapreduce/Int64/1d 38982 ns 38678 ns 1.01
array/reductions/mapreduce/Int64/dims=1 51400 ns 49417 ns 1.04
array/reductions/mapreduce/Int64/dims=2 58902 ns 59199 ns 0.99
array/reductions/mapreduce/Int64/dims=1L 87221 ns 87193 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 84498 ns 84515 ns 1.00
array/reductions/mapreduce/Float32/1d 33969 ns 34022 ns 1.00
array/reductions/mapreduce/Float32/dims=1 39127.5 ns 39691.5 ns 0.99
array/reductions/mapreduce/Float32/dims=2 56116 ns 55971 ns 1.00
array/reductions/mapreduce/Float32/dims=1L 51476 ns 51491 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 68985.5 ns 68932 ns 1.00
array/broadcast 20849 ns 20437 ns 1.02
array/copyto!/gpu_to_gpu 10632 ns 10680.333333333334 ns 1.00
array/copyto!/cpu_to_gpu 212916 ns 217910 ns 0.98
array/copyto!/gpu_to_cpu 282303 ns 285564 ns 0.99
array/accumulate/Int64/1d 118120.5 ns 118803 ns 0.99
array/accumulate/Int64/dims=1 79213 ns 79869.5 ns 0.99
array/accumulate/Int64/dims=2 155247 ns 155687 ns 1.00
array/accumulate/Int64/dims=1L 1705262 ns 1694846 ns 1.01
array/accumulate/Int64/dims=2L 960497 ns 961414 ns 1.00
array/accumulate/Float32/1d 100730 ns 101291.5 ns 0.99
array/accumulate/Float32/dims=1 76297 ns 76639 ns 1.00
array/accumulate/Float32/dims=2 143951 ns 144870 ns 0.99
array/accumulate/Float32/dims=1L 1584441.5 ns 1584515.5 ns 1.00
array/accumulate/Float32/dims=2L 656314 ns 657191.5 ns 1.00
array/construct 1272.3 ns 1305.2 ns 0.97
array/random/randn/Float32 42774 ns 37110.5 ns 1.15
array/random/randn!/Float32 29702 ns 30378 ns 0.98
array/random/rand!/Int64 34748 ns 31479 ns 1.10
array/random/rand!/Float32 8185.25 ns 8318 ns 0.98
array/random/rand/Int64 37159.5 ns 31553 ns 1.18
array/random/rand/Float32 12276 ns 12376 ns 0.99
array/permutedims/4d 54725.5 ns 51624 ns 1.06
array/permutedims/2d 52225 ns 52729 ns 0.99
array/permutedims/3d 52803 ns 52893 ns 1.00
array/sorting/1d 2734309.5 ns 2734663 ns 1.00
array/sorting/by 3303053 ns 3304625 ns 1.00
array/sorting/2d 1067014 ns 1068662 ns 1.00
cuda/synchronization/stream/auto 994.5 ns 1017.4166666666666 ns 0.98
cuda/synchronization/stream/nonblocking 7503.5 ns 7603.1 ns 0.99
cuda/synchronization/stream/blocking 810.6881720430108 ns 808.4591836734694 ns 1.00
cuda/synchronization/context/auto 1151.9 ns 1174.2 ns 0.98
cuda/synchronization/context/nonblocking 7846.4 ns 7803.2 ns 1.01
cuda/synchronization/context/blocking 900.9183673469388 ns 886.6666666666666 ns 1.02

This comment was automatically generated by a workflow using github-action-benchmark.

@eschnett (Contributor, Author)

The self-tests fail because the linear algebra functions (e.g. matrix exponential) as implemented in LinearAlgebra use scalar iteration. See e.g. exp! in https://github.com/JuliaLang/LinearAlgebra.jl/blob/f55e4736fb6dce08fee8a7ac7f0aba1f2b54838e/src/dense.jl#L784.

How should this be handled? Rewrite exp!? Find a corresponding CUDA library function to call and add a new method to exp? Fall back to the Julia 1.12 implementation? How does this work in Julia 1.12?
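A minimal sketch of the failure mode (assuming scalar indexing is disallowed, as it is in the test suite; the matrix size is arbitrary):

```julia
using CUDA, LinearAlgebra

# With scalar indexing disallowed, a generic LinearAlgebra routine that reads
# elements one at a time, like exp!, errors on a CuMatrix instead of running
# on the GPU.
CUDA.allowscalar(false)
A = CUDA.rand(Float32, 32, 32)
exp(A)   # reaches LinearAlgebra's generic exp!, which iterates scalar elements
```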

@eschnett (Contributor, Author)

I think it's JuliaGPU/GPUArrays.jl#679.

@eschnett (Contributor, Author)

eschnett commented Feb 3, 2026

The Buildkite error is:

  ptxas /tmp/jl_PALmvKnqta.ptx, line 226; error   : Modifier '.NaN' requires .target sm_80 or higher
  ptxas /tmp/jl_PALmvKnqta.ptx, line 226; error   : Feature 'min.f16 or min.f16x2' requires .target sm_80 or higher

This seems unrelated to my changes, except that I am now running CI tests on Julia 1.12 and Julia 1.13...
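For context, a quick way to check whether the target device even supports those instructions (a sketch using CUDA.jl's device query API):

```julia
using CUDA

# The f16 min/max instructions with the .NaN modifier that ptxas rejects above
# require compute capability 8.0 (sm_80) or newer.
cap = capability(device())   # e.g. v"7.0" on a V100, v"8.0" on an A100
cap >= v"8.0" || @warn "device below sm_80; NaN-propagating f16 min/max unavailable" cap
```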

@maleadt (Member)

maleadt commented Feb 4, 2026

I guess #3025 needs to be active for all LLVM versions.

@eschnett (Contributor, Author)

eschnett commented Feb 4, 2026

Good news: CUDA.jl now works for Julia 1.12.
Bad news: There's an LLVM segfault for Julia 1.13.

From worker 5:    [271397] signal 11 (1): Segmentation fault
From worker 5:    in expression starting at /var/lib/buildkite-agent/builds/gpuci-9/julialang/cuda-dot-jl/test/base/texture.jl:41
From worker 5:    _ZN12_GLOBAL__N_124NVPTXReplaceImageHandles18findIndexForHandleERN4llvm14MachineOperandERNS1_15MachineFunctionERj.isra.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN12_GLOBAL__N_124NVPTXReplaceImageHandles18findIndexForHandleERN4llvm14MachineOperandERNS1_15MachineFunctionERj.isra.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN12_GLOBAL__N_124NVPTXReplaceImageHandles20runOnMachineFunctionERN4llvm15MachineFunctionE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE.part.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
From worker 5:    _ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)

@eschnett (Contributor, Author)

eschnett commented Feb 4, 2026

I think it's texture interpolation that is broken on 1.13. This line segfaults LLVM:

dst[i] = texture[u]

in test/base/texture.jl (function kernel_texture_warp_native).
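A simplified sketch of that pattern (illustrative names, not the exact test code):

```julia
using CUDA

function kernel_texture_warp(dst, texture)
    i = threadIdx().x
    u = Float32(i)                   # texel coordinate
    @inbounds dst[i] = texture[u]    # the texture fetch that segfaults LLVM on 1.13
    return nothing
end
```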

@eschnett (Contributor, Author)

eschnett commented Feb 4, 2026

We will need to update KernelAbstractions.jl as well; see JuliaGPU/KernelAbstractions.jl#679.

@codecov (bot)

codecov bot commented Feb 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.48%. Comparing base (9c24e73) to head (f90472d).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3020      +/-   ##
==========================================
+ Coverage   89.46%   89.48%   +0.01%     
==========================================
  Files         148      148              
  Lines       13047    13039       -8     
==========================================
- Hits        11673    11668       -5     
+ Misses       1374     1371       -3     


@eschnett (Contributor, Author)

All green!

@eschnett (Contributor, Author)

It seems this PR has stalled. Can I do something to get a review or to get it merged?

@eschnett (Contributor, Author)

Can we merge either this PR or #3031 instead?

@maleadt self-assigned this on Mar 2, 2026
@maleadt (Member) left a comment

I'll look into pushing this over the finish line.

matrix:
  setup:
    cuda:
      # - "13.1"
@maleadt (Member)

Why add but not test 13.1?

@@ -2082,58 +2082,58 @@ end
const nvmlGpuFabricInfoV_t = nvmlGpuFabricInfo_v3_t

@checked function nvmlInit_v2()
    @gcsafe_ccall (libnvml()).nvmlInit_v2()::nvmlReturn_t
@maleadt (Member)

All these require matching changes in the wrapper generator.

lib/nvml/NVML.jl (outdated), comment on lines +20 to +23:
# NVSMI dir isn't added to PATH by the installer; add it to Julia's DLL search path.
nvsmi = joinpath(get(ENV, "ProgramFiles", raw"C:\Program Files"), "NVIDIA Corporation", "NVSMI")
if isdir(nvsmi) && !(nvsmi in Libdl.DL_LOAD_PATH)
    pushfirst!(Libdl.DL_LOAD_PATH, nvsmi)
@maleadt (Member)

This is not equivalent. Why can't it be done globally, setting the constant string?
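A minimal sketch of what "setting the constant string" could look like (nvsmi_dir and libnvml_name are illustrative names, not CUDA.jl's actual API):

```julia
# Resolve the NVML library location once, globally, instead of mutating
# Libdl.DL_LOAD_PATH at run time.
const nvsmi_dir = joinpath(get(ENV, "ProgramFiles", raw"C:\Program Files"),
                           "NVIDIA Corporation", "NVSMI")
const libnvml_name = isdir(nvsmi_dir) ? joinpath(nvsmi_dir, "nvml.dll") : "nvml"
```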

ir = sprint(io->CUDA.code_llvm(io, CUDA.pointerref_ldg, Tuple{Core.LLVMPtr{Int,AS.Global},Int,Val{1}}))
@test occursin("@llvm.nvvm.ldg", ir)
if Base.libllvm_version >= v"20"
    @test occursin("load i64, ptr addrspace(1)", ir)
@maleadt (Member), Mar 2, 2026

This should test for the replacement pattern, which is an !invariant.load
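A sketch of the adjusted assertion, reusing `ir` from the excerpt above:

```julia
# On LLVM 20+ the ldg intrinsic is replaced, so check for the replacement
# pattern (a load tagged !invariant.load) rather than the intrinsic itself.
if Base.libllvm_version >= v"20"
    @test occursin("!invariant.load", ir)
else
    @test occursin("@llvm.nvvm.ldg", ir)
end
```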

using Interpolations

# Texture interpolation crashes LLVM in Julia 1.13
VERSION < v"1.13-" && @testset "texture" begin

Co-authored by: Erik Schnetter <schnetter@gmail.com>
Co-authored by: KARLO\karlo <karlo.sepetanc@live.com>
@maleadt force-pushed the eschnett/julia-1.13 branch from 08c9eb6 to f90472d on March 3, 2026
@maleadt merged commit a7d3f8b into JuliaGPU:master on Mar 4, 2026
2 of 3 checks passed
@KSepetanc (Contributor)

Could you please edit the commit message? This is my first contribution here. "Co-authored by" should be "Co-authored-by" for GitHub to recognize it.

maleadt added a commit that referenced this pull request Mar 4, 2026
Co-authored-by: Karlo Sepetanc <karlo.sepetanc@live.com>
Co-authored-by: Tim Besard <tim.besard@gmail.com>
@maleadt (Member)

maleadt commented Mar 4, 2026

bdb6c1d

Successfully merging this pull request may close these issues: Cannot load CUDA.jl with Julia 1.13.