Support Julia 1.13 with fix for @device_functions macro #3031

Closed
KSepetanc wants to merge 18 commits into JuliaGPU:master from KSepetanc:eschnett/julia-1.13
Conversation

@KSepetanc
Contributor

Closes #3019.

@eschnett asked me to create a new duplicate PR of his #3020, but with a fix for the @device_functions macro. He couldn't test whether the fix works, since I had opened the PR against his fork, which does not have CI infrastructure.

@eschnett
Contributor

Well, I didn't really ask for a duplicate PR. I suggested to either merge them into CUDA.jl as two separate, sequential PRs, or – if you want to merge them as a single PR into CUDA.jl – create such a PR. I don't care either way; using two separate PRs seems simpler, but I leave that choice up to you.

@KSepetanc
Contributor Author

KSepetanc commented Feb 18, 2026

The way I see it, this is the second option, i.e. single PR with both changes.

The @device_functions macro is broken due to changes in Julia 1.13, so it makes sense to include its fix in the PR for 1.13 support.

@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.35%. Comparing base (7a27d77) to head (aa08bb6).

Files with missing lines | Patch % | Lines
lib/nvml/NVML.jl         | 50.00%  | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3031      +/-   ##
==========================================
- Coverage   89.46%   89.35%   -0.12%     
==========================================
  Files         148      148              
  Lines       13047    13044       -3     
==========================================
- Hits        11673    11655      -18     
- Misses       1374     1389      +15     


@github-actions
Contributor

github-actions bot left a comment

CUDA.jl Benchmarks

Benchmark suite Current: b1e9b57 Previous: 1810b7a Ratio
latency/precompile 44328349517 ns 44300180944.5 ns 1.00
latency/ttfp 13141566320 ns 13138137112 ns 1.00
latency/import 3768894174.5 ns 3757487166.5 ns 1.00
integration/volumerhs 9442367.5 ns 9441754.5 ns 1.00
integration/byval/slices=1 145610 ns 145846 ns 1.00
integration/byval/slices=3 422716 ns 423265 ns 1.00
integration/byval/reference 143823.5 ns 143916 ns 1.00
integration/byval/slices=2 284142 ns 284641 ns 1.00
integration/cudadevrt 102447 ns 102633 ns 1.00
kernel/indexing 13360 ns 13466 ns 0.99
kernel/indexing_checked 14152 ns 13982 ns 1.01
kernel/occupancy 649.422619047619 ns 699.625850340136 ns 0.93
kernel/launch 2059.9 ns 2067.8 ns 1.00
kernel/rand 16211 ns 16244 ns 1.00
array/reverse/1d 18777 ns 18605 ns 1.01
array/reverse/2dL_inplace 66066 ns 66133 ns 1.00
array/reverse/1dL 69078 ns 68870 ns 1.00
array/reverse/2d 20795 ns 20781 ns 1.00
array/reverse/1d_inplace 10525.166666666668 ns 10493.666666666666 ns 1.00
array/reverse/2d_inplace 10614 ns 10765 ns 0.99
array/reverse/2dL 72846 ns 72777.5 ns 1.00
array/reverse/1dL_inplace 66097 ns 66166 ns 1.00
array/copy 18360.5 ns 18321 ns 1.00
array/iteration/findall/int 145080 ns 145251 ns 1.00
array/iteration/findall/bool 130258 ns 130303 ns 1.00
array/iteration/findfirst/int 82889 ns 83996 ns 0.99
array/iteration/findfirst/bool 80606 ns 81209 ns 0.99
array/iteration/scalar 66588 ns 64953 ns 1.03
array/iteration/logical 194180.5 ns 197334 ns 0.98
array/iteration/findmin/1d 83560.5 ns 85667.5 ns 0.98
array/iteration/findmin/2d 116518 ns 117130 ns 0.99
array/reductions/reduce/Int64/1d 39034 ns 38913 ns 1.00
array/reductions/reduce/Int64/dims=1 41876.5 ns 41855 ns 1.00
array/reductions/reduce/Int64/dims=2 58784 ns 59043 ns 1.00
array/reductions/reduce/Int64/dims=1L 86987 ns 87102 ns 1.00
array/reductions/reduce/Int64/dims=2L 84033 ns 84295 ns 1.00
array/reductions/reduce/Float32/1d 33717 ns 33785 ns 1.00
array/reductions/reduce/Float32/dims=1 39187 ns 48986 ns 0.80
array/reductions/reduce/Float32/dims=2 56332 ns 56655 ns 0.99
array/reductions/reduce/Float32/dims=1L 51325 ns 51438 ns 1.00
array/reductions/reduce/Float32/dims=2L 69258 ns 69460.5 ns 1.00
array/reductions/mapreduce/Int64/1d 39257.5 ns 38699 ns 1.01
array/reductions/mapreduce/Int64/dims=1 51465.5 ns 41686 ns 1.23
array/reductions/mapreduce/Int64/dims=2 58854 ns 58974 ns 1.00
array/reductions/mapreduce/Int64/dims=1L 87070 ns 87184 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 84235 ns 84571 ns 1.00
array/reductions/mapreduce/Float32/1d 33106 ns 33512 ns 0.99
array/reductions/mapreduce/Float32/dims=1 39493.5 ns 47745 ns 0.83
array/reductions/mapreduce/Float32/dims=2 55782 ns 56241 ns 0.99
array/reductions/mapreduce/Float32/dims=1L 51188 ns 51435 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 68300.5 ns 69604 ns 0.98
array/broadcast 20139 ns 20361 ns 0.99
array/copyto!/gpu_to_gpu 10570.666666666666 ns 10601.666666666666 ns 1.00
array/copyto!/cpu_to_gpu 213494 ns 214964 ns 0.99
array/copyto!/gpu_to_cpu 281531 ns 282717 ns 1.00
array/accumulate/Int64/1d 118212 ns 118054 ns 1.00
array/accumulate/Int64/dims=1 79130 ns 78929 ns 1.00
array/accumulate/Int64/dims=2 155192 ns 155861 ns 1.00
array/accumulate/Int64/dims=1L 1705618 ns 1705368 ns 1.00
array/accumulate/Int64/dims=2L 960326 ns 960330.5 ns 1.00
array/accumulate/Float32/1d 100301.5 ns 100426 ns 1.00
array/accumulate/Float32/dims=1 75752 ns 75943 ns 1.00
array/accumulate/Float32/dims=2 143887.5 ns 143974 ns 1.00
array/accumulate/Float32/dims=1L 1583879 ns 1584300 ns 1.00
array/accumulate/Float32/dims=2L 658657 ns 658063 ns 1.00
array/construct 1268.2 ns 1252.6 ns 1.01
array/random/randn/Float32 35400 ns 35435 ns 1.00
array/random/randn!/Float32 29917 ns 29972 ns 1.00
array/random/rand!/Int64 32291 ns 28260 ns 1.14
array/random/rand!/Float32 8209.333333333334 ns 8310 ns 0.99
array/random/rand/Int64 29169.5 ns 29927 ns 0.97
array/random/rand/Float32 12376.5 ns 12324 ns 1.00
array/permutedims/4d 51508 ns 51686 ns 1.00
array/permutedims/2d 52234.5 ns 52279 ns 1.00
array/permutedims/3d 52585 ns 52911 ns 0.99
array/sorting/1d 2733948 ns 2735042.5 ns 1.00
array/sorting/by 3303469 ns 3304486.5 ns 1.00
array/sorting/2d 1066285 ns 1066581 ns 1.00
cuda/synchronization/stream/auto 1039.7272727272727 ns 993.5882352941177 ns 1.05
cuda/synchronization/stream/nonblocking 7625.1 ns 7392.700000000001 ns 1.03
cuda/synchronization/stream/blocking 783.9074074074074 ns 811.8282828282828 ns 0.97
cuda/synchronization/context/auto 1145.4 ns 1160.9 ns 0.99
cuda/synchronization/context/nonblocking 7973.1 ns 7875.6 ns 1.01
cuda/synchronization/context/blocking 881.3770491803278 ns 899.7058823529412 ns 0.98

This comment was automatically generated by a workflow using github-action-benchmark.

@eschnett eschnett mentioned this pull request Feb 26, 2026
@maleadt
Member

maleadt commented Mar 2, 2026

I'll fold this into #3020.

@maleadt maleadt closed this Mar 2, 2026


Development

Successfully merging this pull request may close these issues.

Cannot load CUDA.jl with Julia 1.13

3 participants