
Add Base.min override for Float16 and extend LLVM version guard to v20. #3038

Merged
maleadt merged 1 commit into master from tb/llvm_minmax
Mar 3, 2026



@maleadt maleadt commented Mar 3, 2026

LLVM 20 lowers Base.min(::Float16, ::Float16) to min.NaN.f16, a PTX instruction requiring sm_80+, causing failures on Turing (sm_75) GPUs. Add a Julia-level override matching the existing Base.max workaround, and extend the version guard from LLVM 18 to 20 since the upstream fix (llvm/llvm-project@6f318d47) only landed in LLVM 21.

As observed in #3020
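
To make the shape of the workaround concrete, here is a minimal plain-Julia sketch. It is not the code from this PR: `ASSUMED_LLVM_VERSION`, `needs_f16_minmax_workaround`, `min_via_f32`, and `f16_min` are hypothetical names, and computing the result in Float32 is an assumed strategy chosen only for illustration. It shows a version guard that covers everything below LLVM 21, plus a Float16 `min` that keeps NaN propagation without going through a Float16 min instruction.

```julia
# Hypothetical sketch; the names below are illustrative and not CUDA.jl's API.

# Stand-in for the LLVM version the compiler was built against (assumed value).
const ASSUMED_LLVM_VERSION = v"20.1.0"

# The upstream fix is described as landing in LLVM 21, so any older version
# takes the workaround path.
needs_f16_minmax_workaround(v::VersionNumber) = v < v"21"

# Compute min in Float32 and narrow the result back to Float16. Every Float16
# value is exactly representable as Float32, so the round trip is lossless,
# and Base.min on Float32 already propagates NaN.
min_via_f32(x::Float16, y::Float16) = Float16(min(Float32(x), Float32(y)))

# Select the implementation once, based on the version guard.
const f16_min = needs_f16_minmax_workaround(ASSUMED_LLVM_VERSION) ? min_via_f32 : min

println(f16_min(Float16(1.5), Float16(-2.0)))
println(isnan(f16_min(Float16(NaN), Float16(1.0))))
```

In a real device override the guard would be evaluated against the LLVM version in use at compile time rather than a constant, so treat the selection line above as a placeholder for that mechanism.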


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.33%. Comparing base (1810b7a) to head (de9be6a).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3038      +/-   ##
==========================================
- Coverage   89.49%   89.33%   -0.17%     
==========================================
  Files         148      148              
  Lines       13047    13047              
==========================================
- Hits        11676    11655      -21     
- Misses       1371     1392      +21     



@github-actions github-actions bot left a comment


CUDA.jl Benchmarks

| Benchmark suite | Current: de9be6a | Previous: 1810b7a | Ratio |
|---|---|---|---|
| latency/precompile | 44675226940.5 ns | 44300180944.5 ns | 1.01 |
| latency/ttfp | 13291171512 ns | 13138137112 ns | 1.01 |
| latency/import | 3784128603 ns | 3757487166.5 ns | 1.01 |
| integration/volumerhs | 9440873.5 ns | 9441754.5 ns | 1.00 |
| integration/byval/slices=1 | 145616 ns | 145846 ns | 1.00 |
| integration/byval/slices=3 | 422814 ns | 423265 ns | 1.00 |
| integration/byval/reference | 143792 ns | 143916 ns | 1.00 |
| integration/byval/slices=2 | 284069 ns | 284641 ns | 1.00 |
| integration/cudadevrt | 102357 ns | 102633 ns | 1.00 |
| kernel/indexing | 13245 ns | 13466 ns | 0.98 |
| kernel/indexing_checked | 14083 ns | 13982 ns | 1.01 |
| kernel/occupancy | 635.202380952381 ns | 699.625850340136 ns | 0.91 |
| kernel/launch | 2025.5 ns | 2067.8 ns | 0.98 |
| kernel/rand | 14585 ns | 16244 ns | 0.90 |
| array/reverse/1d | 18615 ns | 18605 ns | 1.00 |
| array/reverse/2dL_inplace | 66177 ns | 66133 ns | 1.00 |
| array/reverse/1dL | 68804 ns | 68870 ns | 1.00 |
| array/reverse/2d | 21266 ns | 20781 ns | 1.02 |
| array/reverse/1d_inplace | 10491 ns | 10493.666666666666 ns | 1.00 |
| array/reverse/2d_inplace | 11367 ns | 10765 ns | 1.06 |
| array/reverse/2dL | 73210 ns | 72777.5 ns | 1.01 |
| array/reverse/1dL_inplace | 66188 ns | 66166 ns | 1.00 |
| array/copy | 18366 ns | 18321 ns | 1.00 |
| array/iteration/findall/int | 145622.5 ns | 145251 ns | 1.00 |
| array/iteration/findall/bool | 130340 ns | 130303 ns | 1.00 |
| array/iteration/findfirst/int | 85134 ns | 83996 ns | 1.01 |
| array/iteration/findfirst/bool | 82631 ns | 81209 ns | 1.02 |
| array/iteration/scalar | 67040 ns | 64953 ns | 1.03 |
| array/iteration/logical | 197058.5 ns | 197334 ns | 1.00 |
| array/iteration/findmin/1d | 83432 ns | 85667.5 ns | 0.97 |
| array/iteration/findmin/2d | 117087 ns | 117130 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 38905 ns | 38913 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 41600 ns | 41855 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 58808 ns | 59043 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87117 ns | 87102 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84669 ns | 84295 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34237 ns | 33785 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 43934 ns | 48986 ns | 0.90 |
| array/reductions/reduce/Float32/dims=2 | 56239 ns | 56655 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 51394 ns | 51438 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69575 ns | 69460.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 39210.5 ns | 38699 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 46057 ns | 41686 ns | 1.10 |
| array/reductions/mapreduce/Int64/dims=2 | 58993 ns | 58974 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 87229 ns | 87184 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 84397 ns | 84571 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34022 ns | 33512 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=1 | 39843 ns | 47745 ns | 0.83 |
| array/reductions/mapreduce/Float32/dims=2 | 55903 ns | 56241 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 51260 ns | 51435 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69261 ns | 69604 ns | 1.00 |
| array/broadcast | 20628 ns | 20361 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 10673.333333333334 ns | 10601.666666666666 ns | 1.01 |
| array/copyto!/cpu_to_gpu | 213909 ns | 214964 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283527 ns | 282717 ns | 1.00 |
| array/accumulate/Int64/1d | 118150.5 ns | 118054 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 79533 ns | 78929 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 155242 ns | 155861 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1697447 ns | 1705368 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 960552 ns | 960330.5 ns | 1.00 |
| array/accumulate/Float32/1d | 100637.5 ns | 100426 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 76099 ns | 75943 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 144215 ns | 143974 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1584181 ns | 1584300 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 656485 ns | 658063 ns | 1.00 |
| array/construct | 1291 ns | 1252.6 ns | 1.03 |
| array/random/randn/Float32 | 36310 ns | 35435 ns | 1.02 |
| array/random/randn!/Float32 | 30120 ns | 29972 ns | 1.00 |
| array/random/rand!/Int64 | 34550 ns | 28260 ns | 1.22 |
| array/random/rand!/Float32 | 8320.166666666668 ns | 8310 ns | 1.00 |
| array/random/rand/Int64 | 36976 ns | 29927 ns | 1.24 |
| array/random/rand/Float32 | 12342 ns | 12324 ns | 1.00 |
| array/permutedims/4d | 50805 ns | 51686 ns | 0.98 |
| array/permutedims/2d | 52400 ns | 52279 ns | 1.00 |
| array/permutedims/3d | 52639 ns | 52911 ns | 0.99 |
| array/sorting/1d | 2734832 ns | 2735042.5 ns | 1.00 |
| array/sorting/by | 3304279 ns | 3304486.5 ns | 1.00 |
| array/sorting/2d | 1067131 ns | 1066581 ns | 1.00 |
| cuda/synchronization/stream/auto | 1064.090909090909 ns | 993.5882352941177 ns | 1.07 |
| cuda/synchronization/stream/nonblocking | 7534.299999999999 ns | 7392.700000000001 ns | 1.02 |
| cuda/synchronization/stream/blocking | 821.4470588235295 ns | 811.8282828282828 ns | 1.01 |
| cuda/synchronization/context/auto | 1150.3 ns | 1160.9 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 7125.9 ns | 7875.6 ns | 0.90 |
| cuda/synchronization/context/blocking | 894.469387755102 ns | 899.7058823529412 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt maleadt merged commit 444d208 into master Mar 3, 2026
3 checks passed
@maleadt maleadt deleted the tb/llvm_minmax branch March 3, 2026 16:36