
Various improvements to the docs #3030

Open
giordano wants to merge 12 commits into JuliaGPU:master from giordano:mg/docs

@giordano
Contributor

I had some... uhm... fun over the last couple of days trying to port some C++ CUDA code to CUDA.jl and profile it. I've collected my experience in this PR, hoping to make life a little easier for the people who come after me 🙂

@github-actions
Contributor

github-actions bot commented Feb 13, 2026

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

diff --git a/src/device/intrinsics/indexing.jl b/src/device/intrinsics/indexing.jl
index 5e6209fe3..dd9655911 100644
--- a/src/device/intrinsics/indexing.jl
+++ b/src/device/intrinsics/indexing.jl
@@ -92,62 +92,62 @@ end
 @doc """
     threadIdx()::NamedTuple
 
-Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
+    Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
 """ threadIdx
 @inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
 
 @doc """
     blockDim()::NamedTuple
 
-Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
+    Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
 """ blockDim
 @inline blockDim() = (x=blockDim_x(), y=blockDim_y(), z=blockDim_z())
 
 @doc """
     blockIdx()::NamedTuple
 
-Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
+    Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
 """ blockIdx
 @inline blockIdx() = (x=blockIdx_x(), y=blockIdx_y(), z=blockIdx_z())
 
 @doc """
     gridDim()::NamedTuple
 
-Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
+    Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
 """ gridDim
 @inline gridDim() = (x=gridDim_x(), y=gridDim_y(), z=gridDim_z())
 
 @doc """
     blockIdxInCluster()::NamedTuple
 
-Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+    Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based.
 """ blockIdxInCluster
 @inline blockIdxInCluster() = (x=blockIdxInCluster_x(), y=blockIdxInCluster_y(), z=blockIdxInCluster_z())
 
 @doc """
     clusterDim()::NamedTuple
 
-Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
 """ clusterDim
 @inline clusterDim() = (x=clusterDim_x(), y=clusterDim_y(), z=clusterDim_z())
 
 @doc """
     clusterIdx()::NamedTuple
 
-Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+    Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based.
 """ clusterIdx
 @inline clusterIdx() = (x=clusterIdx_x(), y=clusterIdx_y(), z=clusterIdx_z())
 
 @doc """
     gridClusterDim()::NamedTuple
 
-Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
 """ gridClusterDim
 @inline gridClusterDim() = (x=gridClusterDim_x(), y=gridClusterDim_y(), z=gridClusterDim_z())
 
@@ -155,7 +155,7 @@ Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`
     linearBlockIdxInCluster()::Int32
 
 Returns the linear block index within the cluster.
-These indices are 1-based.
+    These indices are 1-based.
 """ linearBlockIdxInCluster
 @eval @inline $(:linearBlockIdxInCluster)() = _index($(Val(Symbol("cluster.ctarank"))), $(Val(0:max_cluster_length-1))) + 1i32
 
@@ -170,7 +170,7 @@ Returns the linear cluster size (in blocks).
     warpsize()::Int32
 
 Returns the warp size (in threads).
-This corresponds to the `warpSize` built-in variable in the C/C++ extension.
+    This corresponds to the `warpSize` built-in variable in the C/C++ extension.
 """ warpsize
 @inline warpsize() = ccall("llvm.nvvm.read.ptx.sreg.warpsize", llvmcall, Int32, ())
 
@@ -178,7 +178,7 @@ This corresponds to the `warpSize` built-in variable in the C/C++ extension.
     laneid()::Int32
 
 Returns the thread's lane within the warp.
-This ID is 1-based.
+    This ID is 1-based.
 """ laneid
 @inline laneid() = ccall("llvm.nvvm.read.ptx.sreg.laneid", llvmcall, Int32, ()) + 1i32
 

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

Benchmark suite Current: f3c0846 Previous: f7b7929 Ratio
latency/precompile 45002121512.5 ns 45018748344 ns 1.00
latency/ttfp 12927195194 ns 12770284486 ns 1.01
latency/import 3542151579 ns 3541917719 ns 1.00
integration/volumerhs 9440693 ns 9450947.5 ns 1.00
integration/byval/slices=1 146099 ns 146127 ns 1.00
integration/byval/slices=3 423506 ns 423159 ns 1.00
integration/byval/reference 144164 ns 143932 ns 1.00
integration/byval/slices=2 284988 ns 284759.5 ns 1.00
integration/cudadevrt 102829 ns 102551 ns 1.00
kernel/indexing 13480 ns 13204 ns 1.02
kernel/indexing_checked 14372 ns 13977 ns 1.03
kernel/occupancy 674.9496855345911 ns 664.05625 ns 1.02
kernel/launch 2231.4444444444443 ns 2163.9444444444443 ns 1.03
kernel/rand 14964 ns 18131 ns 0.83
array/reverse/1d 18806 ns 18471 ns 1.02
array/reverse/2dL_inplace 66144 ns 65988 ns 1.00
array/reverse/1dL 69375 ns 69022 ns 1.01
array/reverse/2d 21151 ns 20733 ns 1.02
array/reverse/1d_inplace 10469.666666666666 ns 8573 ns 1.22
array/reverse/2d_inplace 10540 ns 10232 ns 1.03
array/reverse/2dL 73156.5 ns 72825 ns 1.00
array/reverse/1dL_inplace 66136 ns 65937 ns 1.00
array/copy 19107 ns 18988 ns 1.01
array/iteration/findall/int 150518 ns 150059 ns 1.00
array/iteration/findall/bool 132933.5 ns 132365.5 ns 1.00
array/iteration/findfirst/int 83958 ns 83639 ns 1.00
array/iteration/findfirst/bool 81654 ns 81468 ns 1.00
array/iteration/scalar 67751 ns 66443.5 ns 1.02
array/iteration/logical 204171 ns 200236 ns 1.02
array/iteration/findmin/1d 87930.5 ns 86614.5 ns 1.02
array/iteration/findmin/2d 118171 ns 117241 ns 1.01
array/reductions/reduce/Int64/1d 44225 ns 42766 ns 1.03
array/reductions/reduce/Int64/dims=1 42675.5 ns 52907 ns 0.81
array/reductions/reduce/Int64/dims=2 60126 ns 60231 ns 1.00
array/reductions/reduce/Int64/dims=1L 88052 ns 87828 ns 1.00
array/reductions/reduce/Int64/dims=2L 84785 ns 84956.5 ns 1.00
array/reductions/reduce/Float32/1d 35344.5 ns 34964 ns 1.01
array/reductions/reduce/Float32/dims=1 49647.5 ns 40442.5 ns 1.23
array/reductions/reduce/Float32/dims=2 57300 ns 57125 ns 1.00
array/reductions/reduce/Float32/dims=1L 52478 ns 52000 ns 1.01
array/reductions/reduce/Float32/dims=2L 70397 ns 69982.5 ns 1.01
array/reductions/mapreduce/Int64/1d 43614 ns 42509 ns 1.03
array/reductions/mapreduce/Int64/dims=1 53101 ns 42334 ns 1.25
array/reductions/mapreduce/Int64/dims=2 60233 ns 59835 ns 1.01
array/reductions/mapreduce/Int64/dims=1L 88035 ns 87864 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 85256 ns 85164 ns 1.00
array/reductions/mapreduce/Float32/1d 35067 ns 34719 ns 1.01
array/reductions/mapreduce/Float32/dims=1 40061 ns 45273 ns 0.88
array/reductions/mapreduce/Float32/dims=2 57092 ns 56959 ns 1.00
array/reductions/mapreduce/Float32/dims=1L 52232.5 ns 52179 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 69921 ns 69729 ns 1.00
array/broadcast 20979 ns 20464 ns 1.03
array/copyto!/gpu_to_gpu 11512 ns 11261 ns 1.02
array/copyto!/cpu_to_gpu 217133 ns 216266 ns 1.00
array/copyto!/gpu_to_cpu 284447 ns 282685.5 ns 1.01
array/accumulate/Int64/1d 119559 ns 119363 ns 1.00
array/accumulate/Int64/dims=1 80810.5 ns 80474 ns 1.00
array/accumulate/Int64/dims=2 157639 ns 157437.5 ns 1.00
array/accumulate/Int64/dims=1L 1707311.5 ns 1706725 ns 1.00
array/accumulate/Int64/dims=2L 962530.5 ns 962008 ns 1.00
array/accumulate/Float32/1d 102000.5 ns 101483 ns 1.01
array/accumulate/Float32/dims=1 78118 ns 77247 ns 1.01
array/accumulate/Float32/dims=2 145130.5 ns 143932 ns 1.01
array/accumulate/Float32/dims=1L 1586913 ns 1593993 ns 1.00
array/accumulate/Float32/dims=2L 658901 ns 660832 ns 1.00
array/construct 1311.9 ns 1332.6 ns 0.98
array/random/randn/Float32 38856 ns 38567.5 ns 1.01
array/random/randn!/Float32 29569 ns 31716 ns 0.93
array/random/rand!/Int64 27138 ns 34263.5 ns 0.79
array/random/rand!/Float32 8569.333333333334 ns 8628 ns 0.99
array/random/rand/Int64 35000.5 ns 30788.5 ns 1.14
array/random/rand/Float32 13213 ns 13144 ns 1.01
array/permutedims/4d 52756.5 ns 52096 ns 1.01
array/permutedims/2d 53280.5 ns 52583 ns 1.01
array/permutedims/3d 53462 ns 53461 ns 1.00
array/sorting/1d 2737046.5 ns 2734388 ns 1.00
array/sorting/by 3305342 ns 3327876 ns 0.99
array/sorting/2d 1069795 ns 1072450 ns 1.00
cuda/synchronization/stream/auto 1044.5 ns 1031.7 ns 1.01
cuda/synchronization/stream/nonblocking 7392 ns 7628.4 ns 0.97
cuda/synchronization/stream/blocking 837.3461538461538 ns 827.9 ns 1.01
cuda/synchronization/context/auto 1181.2 ns 1165.1 ns 1.01
cuda/synchronization/context/nonblocking 7662.6 ns 7638.9 ns 1.00
cuda/synchronization/context/blocking 948.551724137931 ns 925.0566037735849 ns 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@codecov

codecov bot commented Feb 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.47%. Comparing base (5472295) to head (0aef286).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3030      +/-   ##
==========================================
- Coverage   89.48%   89.47%   -0.01%     
==========================================
  Files         148      148              
  Lines       13043    13043              
==========================================
- Hits        11671    11670       -1     
- Misses       1372     1373       +1     

""" threadIdx
@inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
Returns the dimensions of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
These dimensions have the same starting index as the `gridDim` built-in variable in the C/C++ extension.
Member

gridDim returns a dimension/size, not an index.

Contributor Author

Replaced "index" with "dimension" here.

Member

"Starting dimension" doesn't make much sense to me. What else could a `size()` query return? 0- vs 1-based indexing doesn't apply here.

That said, I'm okay with this if you think this clarifies things.

Member

@christiangnrd Mar 7, 2026

Maybe it could be phrased along the lines of:

Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.

I do think this should be mentioned in some form, though. The indexing intrinsics being offset while the dim intrinsics are not makes sense when you think about it, but I've also been confused by this, and not everyone will think (or know) to check the source code to confirm.

Either way, the same edits that `gridDim` receives should also be mirrored to `blockDim`.

Contributor Author

Done.
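As an illustration of the convention settled in this thread (indices shifted by one, dimensions not), consider the usual 1D global-index computation. In a CUDA.jl kernel it is commonly written as `(blockIdx().x - 1) * blockDim().x + threadIdx().x`, the 1-based counterpart of `blockIdx.x * blockDim.x + threadIdx.x` in CUDA C++. The sketch below is a CPU-only check of that equivalence under the assumption of a 1D launch; the helpers `global_index_c`, `global_index_jl` and `conventions_agree` are hypothetical and not part of CUDA.jl.

```julia
# Hypothetical helpers (not CUDA.jl API) that mimic the global thread index
# of a 1D launch, so the two conventions can be compared on the CPU.

# CUDA C++: threadIdx.x and blockIdx.x are 0-based; blockDim.x is a size.
global_index_c(tid0::Integer, bid0::Integer, bdim::Integer) = bid0 * bdim + tid0

# CUDA.jl: threadIdx().x and blockIdx().x are 1-based, while blockDim().x is
# the same size as in C++, so only the block index needs to be shifted.
global_index_jl(tid1::Integer, bid1::Integer, bdim::Integer) = (bid1 - 1) * bdim + tid1

# Every (thread, block) pair maps to the same element; the results differ by
# one only because the resulting array index is itself 0-based in C++ and
# 1-based in Julia.
function conventions_agree(bdim::Integer, gdim::Integer)
    for bid0 in 0:gdim-1, tid0 in 0:bdim-1
        global_index_jl(tid0 + 1, bid0 + 1, bdim) == global_index_c(tid0, bid0, bdim) + 1 || return false
    end
    return true
end

conventions_agree(256, 4)
```

Note that `blockDim().x` appears unshifted in both formulas: subtracting one from a size, as one would from an index, is the kind of porting mistake the clarified docstrings aim to prevent.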

@giordano
Contributor Author

Bump? I also expanded the docstrings introduced in #3017.


3 participants