Conversation
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

diff --git a/src/device/intrinsics/indexing.jl b/src/device/intrinsics/indexing.jl
index 5e6209fe3..dd9655911 100644
--- a/src/device/intrinsics/indexing.jl
+++ b/src/device/intrinsics/indexing.jl
@@ -92,62 +92,62 @@ end
@doc """
threadIdx()::NamedTuple
-Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
+ Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
""" threadIdx
@inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
@doc """
blockDim()::NamedTuple
-Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
+ Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
""" blockDim
@inline blockDim() = (x=blockDim_x(), y=blockDim_y(), z=blockDim_z())
@doc """
blockIdx()::NamedTuple
-Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
+ Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
""" blockIdx
@inline blockIdx() = (x=blockIdx_x(), y=blockIdx_y(), z=blockIdx_z())
@doc """
gridDim()::NamedTuple
-Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
+ Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
""" gridDim
@inline gridDim() = (x=gridDim_x(), y=gridDim_y(), z=gridDim_z())
@doc """
blockIdxInCluster()::NamedTuple
-Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+ Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based.
""" blockIdxInCluster
@inline blockIdxInCluster() = (x=blockIdxInCluster_x(), y=blockIdxInCluster_y(), z=blockIdxInCluster_z())
@doc """
clusterDim()::NamedTuple
-Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
""" clusterDim
@inline clusterDim() = (x=clusterDim_x(), y=clusterDim_y(), z=clusterDim_z())
@doc """
clusterIdx()::NamedTuple
-Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+ Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based.
""" clusterIdx
@inline clusterIdx() = (x=clusterIdx_x(), y=clusterIdx_y(), z=clusterIdx_z())
@doc """
gridClusterDim()::NamedTuple
-Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
""" gridClusterDim
@inline gridClusterDim() = (x=gridClusterDim_x(), y=gridClusterDim_y(), z=gridClusterDim_z())
@@ -155,7 +155,7 @@ Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`
linearBlockIdxInCluster()::Int32
Returns the linear block index within the cluster.
-These indices are 1-based.
+ These indices are 1-based.
""" linearBlockIdxInCluster
@eval @inline $(:linearBlockIdxInCluster)() = _index($(Val(Symbol("cluster.ctarank"))), $(Val(0:max_cluster_length-1))) + 1i32
@@ -170,7 +170,7 @@ Returns the linear cluster size (in blocks).
warpsize()::Int32
Returns the warp size (in threads).
-This corresponds to the `warpSize` built-in variable in the C/C++ extension.
+ This corresponds to the `warpSize` built-in variable in the C/C++ extension.
""" warpsize
@inline warpsize() = ccall("llvm.nvvm.read.ptx.sreg.warpsize", llvmcall, Int32, ())
@@ -178,7 +178,7 @@ This corresponds to the `warpSize` built-in variable in the C/C++ extension.
laneid()::Int32
Returns the thread's lane within the warp.
-This ID is 1-based.
+ This ID is 1-based.
""" laneid
@inline laneid() = ccall("llvm.nvvm.read.ptx.sreg.laneid", llvmcall, Int32, ()) + 1i32
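For context on the convention these docstrings describe, the usual global-index computation in CUDA.jl relies on the intrinsics being 1-based. This is a minimal sketch, not code from the PR; the kernel name and launch configuration are illustrative:

```julia
using CUDA

# Classic elementwise kernel: because threadIdx()/blockIdx() are 1-based,
# the global index (blockIdx().x - 1) * blockDim().x + threadIdx().x is
# itself 1-based and can index Julia arrays directly, with no off-by-one
# adjustment as in the equivalent C/C++ kernel.
function gpu_add!(y, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] += x[i]
    end
    return
end

x = CUDA.ones(1024)
y = CUDA.zeros(1024)
@cuda threads=256 blocks=4 gpu_add!(y, x)
```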
CUDA.jl Benchmarks
| Benchmark suite | Current: f3c0846 | Previous: f7b7929 | Ratio |
|---|---|---|---|
| latency/precompile | 45002121512.5 ns | 45018748344 ns | 1.00 |
| latency/ttfp | 12927195194 ns | 12770284486 ns | 1.01 |
| latency/import | 3542151579 ns | 3541917719 ns | 1.00 |
| integration/volumerhs | 9440693 ns | 9450947.5 ns | 1.00 |
| integration/byval/slices=1 | 146099 ns | 146127 ns | 1.00 |
| integration/byval/slices=3 | 423506 ns | 423159 ns | 1.00 |
| integration/byval/reference | 144164 ns | 143932 ns | 1.00 |
| integration/byval/slices=2 | 284988 ns | 284759.5 ns | 1.00 |
| integration/cudadevrt | 102829 ns | 102551 ns | 1.00 |
| kernel/indexing | 13480 ns | 13204 ns | 1.02 |
| kernel/indexing_checked | 14372 ns | 13977 ns | 1.03 |
| kernel/occupancy | 674.9496855345911 ns | 664.05625 ns | 1.02 |
| kernel/launch | 2231.4444444444443 ns | 2163.9444444444443 ns | 1.03 |
| kernel/rand | 14964 ns | 18131 ns | 0.83 |
| array/reverse/1d | 18806 ns | 18471 ns | 1.02 |
| array/reverse/2dL_inplace | 66144 ns | 65988 ns | 1.00 |
| array/reverse/1dL | 69375 ns | 69022 ns | 1.01 |
| array/reverse/2d | 21151 ns | 20733 ns | 1.02 |
| array/reverse/1d_inplace | 10469.666666666666 ns | 8573 ns | 1.22 |
| array/reverse/2d_inplace | 10540 ns | 10232 ns | 1.03 |
| array/reverse/2dL | 73156.5 ns | 72825 ns | 1.00 |
| array/reverse/1dL_inplace | 66136 ns | 65937 ns | 1.00 |
| array/copy | 19107 ns | 18988 ns | 1.01 |
| array/iteration/findall/int | 150518 ns | 150059 ns | 1.00 |
| array/iteration/findall/bool | 132933.5 ns | 132365.5 ns | 1.00 |
| array/iteration/findfirst/int | 83958 ns | 83639 ns | 1.00 |
| array/iteration/findfirst/bool | 81654 ns | 81468 ns | 1.00 |
| array/iteration/scalar | 67751 ns | 66443.5 ns | 1.02 |
| array/iteration/logical | 204171 ns | 200236 ns | 1.02 |
| array/iteration/findmin/1d | 87930.5 ns | 86614.5 ns | 1.02 |
| array/iteration/findmin/2d | 118171 ns | 117241 ns | 1.01 |
| array/reductions/reduce/Int64/1d | 44225 ns | 42766 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 42675.5 ns | 52907 ns | 0.81 |
| array/reductions/reduce/Int64/dims=2 | 60126 ns | 60231 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88052 ns | 87828 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84785 ns | 84956.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 35344.5 ns | 34964 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 49647.5 ns | 40442.5 ns | 1.23 |
| array/reductions/reduce/Float32/dims=2 | 57300 ns | 57125 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52478 ns | 52000 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 70397 ns | 69982.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43614 ns | 42509 ns | 1.03 |
| array/reductions/mapreduce/Int64/dims=1 | 53101 ns | 42334 ns | 1.25 |
| array/reductions/mapreduce/Int64/dims=2 | 60233 ns | 59835 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 88035 ns | 87864 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 85256 ns | 85164 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 35067 ns | 34719 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 40061 ns | 45273 ns | 0.88 |
| array/reductions/mapreduce/Float32/dims=2 | 57092 ns | 56959 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52232.5 ns | 52179 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69921 ns | 69729 ns | 1.00 |
| array/broadcast | 20979 ns | 20464 ns | 1.03 |
| array/copyto!/gpu_to_gpu | 11512 ns | 11261 ns | 1.02 |
| array/copyto!/cpu_to_gpu | 217133 ns | 216266 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 284447 ns | 282685.5 ns | 1.01 |
| array/accumulate/Int64/1d | 119559 ns | 119363 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80810.5 ns | 80474 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157639 ns | 157437.5 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1707311.5 ns | 1706725 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 962530.5 ns | 962008 ns | 1.00 |
| array/accumulate/Float32/1d | 102000.5 ns | 101483 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 78118 ns | 77247 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 145130.5 ns | 143932 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 1586913 ns | 1593993 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 658901 ns | 660832 ns | 1.00 |
| array/construct | 1311.9 ns | 1332.6 ns | 0.98 |
| array/random/randn/Float32 | 38856 ns | 38567.5 ns | 1.01 |
| array/random/randn!/Float32 | 29569 ns | 31716 ns | 0.93 |
| array/random/rand!/Int64 | 27138 ns | 34263.5 ns | 0.79 |
| array/random/rand!/Float32 | 8569.333333333334 ns | 8628 ns | 0.99 |
| array/random/rand/Int64 | 35000.5 ns | 30788.5 ns | 1.14 |
| array/random/rand/Float32 | 13213 ns | 13144 ns | 1.01 |
| array/permutedims/4d | 52756.5 ns | 52096 ns | 1.01 |
| array/permutedims/2d | 53280.5 ns | 52583 ns | 1.01 |
| array/permutedims/3d | 53462 ns | 53461 ns | 1.00 |
| array/sorting/1d | 2737046.5 ns | 2734388 ns | 1.00 |
| array/sorting/by | 3305342 ns | 3327876 ns | 0.99 |
| array/sorting/2d | 1069795 ns | 1072450 ns | 1.00 |
| cuda/synchronization/stream/auto | 1044.5 ns | 1031.7 ns | 1.01 |
| cuda/synchronization/stream/nonblocking | 7392 ns | 7628.4 ns | 0.97 |
| cuda/synchronization/stream/blocking | 837.3461538461538 ns | 827.9 ns | 1.01 |
| cuda/synchronization/context/auto | 1181.2 ns | 1165.1 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 7662.6 ns | 7638.9 ns | 1.00 |
| cuda/synchronization/context/blocking | 948.551724137931 ns | 925.0566037735849 ns | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #3030 +/- ##
==========================================
- Coverage 89.48% 89.47% -0.01%
==========================================
Files 148 148
Lines 13043 13043
==========================================
- Hits 11671 11670 -1
- Misses 1372 1373 +1

☔ View full report in Codecov by Sentry.
src/device/intrinsics/indexing.jl (Outdated)

""" threadIdx
@inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
Returns the dimensions of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
These dimensions have the same starting index as the `gridDim` built-in variable in the C/C++ extension.
gridDim returns a dimension/size, not an index.
Replaced "index" with "dimension" here.
starting dimension doesn't make much sense to me. What else could a size() query return? 0 vs 1-based indexing doesn't apply here.
That said, I'm okay with this if you think this clarifies things.
Maybe it could be phrased along the lines of:
Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
I do think this should be mentioned in some form, though. The indexing intrinsics being offset while the dim intrinsics are not makes sense when you think about it, but I've also been confused by this, and not everyone will think or know to check the source code to confirm.
Either way, the same edits `gridDim` receives should also be mirrored to `blockDim`.
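The point under discussion can be made concrete: in CUDA C/C++, `threadIdx.x` ranges over `0` to `blockDim.x - 1`, whereas in CUDA.jl `threadIdx().x` ranges over `1` to `blockDim().x` — the index intrinsics are shifted by one, but the dimension intrinsics report the same values in both languages. A minimal sketch (the kernel name and launch size are illustrative, not from the PR):

```julia
using CUDA

# With @cuda threads=4, this prints "thread 1 of 4" ... "thread 4 of 4":
# threadIdx().x is 1-based, while blockDim().x == 4 matches the value the
# C/C++ blockDim built-in would report for the same launch.
function show_indices()
    @cuprintln("thread $(threadIdx().x) of $(blockDim().x)")
    return
end

@cuda threads=4 show_indices()
```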
Co-authored-by: Christian Guinard <28689358+christiangnrd@users.noreply.github.com>
Bump? I also expanded the docstrings introduced in #3017.
I had some...uhm...fun in the last couple of days trying to port some C++ CUDA code to CUDA.jl and profile it. I dumped my experience into this PR, hoping to make the lives of people after me a little bit easier 🙂