[SYCL][Matrix] syntax changes as preparation before moving joint matrix from experimental namespace#11215
Conversation
experimental namespace As part of the effort to move joint matrix from experimental namespace to supported. A review of the API is being done as part of intel#7964. This results in the following changes in the syntax: 1- Add Td to joint_matrix_mad as Tc can be different from Td on the GPU, Now, we make D as an input argument to mad. 2- Change “packed” to ext_intel_packed: 3- Move EWOps (get_wi_data, wi_element, get_coord) to detail namespace) 4- add const to joint_matrix in store and mad 5 - add joint_matrix_copy/assignment function 6- add apply with coordination (change existing tests) 7- change get_coord vector type from int32_t to size_t 8- delete explicitly both = and copy ctor.
| @@ -138,9 +127,9 @@ template <typename Group, typename T, use Use, size_t Rows, size_t Cols, | |||
| __SYCL2020_DEPRECATED("get_wi_data() is deprecated for CUDA backend. Please " | |||
There was a problem hiding this comment.
We should remove this. This is not really deprecated as joint_matrix is experimental so we can just remove APIs. Deprecated means they still exist and implementations maintain them. In the case of get_wi_data. it is replaced by joint_matrix_apply
There was a problem hiding this comment.
This will be addressed by @JackAKirk among other CUDA changes in a separate PR.
There was a problem hiding this comment.
Yes I will make this change as soon as this PR is merged.
| throw runtime_error("get_wi_data is available using: " | ||
| "ext::intel::experimental::matrix::get_wi_data.", | ||
| "ext::oneapi::detail::get_wi_data.", | ||
| PI_ERROR_INVALID_DEVICE); |
There was a problem hiding this comment.
We should not advise users to use get_wi_data. When does this runtime error occur?
There was a problem hiding this comment.
ok, i see. wi_data class here is in sycl::ext::oneapi::experimental::matrix namespace and it is for NV. the errmsg is for intel users who uses NV's wi_data
There was a problem hiding this comment.
how about
get_wi_data is available using: ext::oneapi::detail::get_wi_data but intel users are expected to use joint_matrix_copy instead
There was a problem hiding this comment.
We should never advise users to use anything from detail namespace. Detail namespace are implementation details and can change at any time. It is not part of documented API.
| #else | ||
| throw runtime_error("get_wi_data is available using: " | ||
| "ext::intel::experimental::matrix::get_wi_data.", | ||
| "ext::oneapi::detail::get_wi_data.", |
| spv_scope_traits<Group>::value); | ||
| break; | ||
| case sycl::ext::intel::experimental::matrix::layout::packed: | ||
| case sycl::ext::oneapi::experimental::matrix::layout::ext_intel_packed: |
There was a problem hiding this comment.
Minor: you dont need to specify the whole namespace here
| std::size_t M, std::size_t K, std::size_t N, layout LayoutA, | ||
| layout LayoutB> | ||
| inline __SYCL_ALWAYS_INLINE void joint_matrix_mad( | ||
| Group sg, const joint_matrix<Group, Ta, use::a, M, K, LayoutA> &A, |
dkhaldi
left a comment
There was a problem hiding this comment.
Main comments:
- We should not use
get_wi_dataor things indetailin tests or errors, these should be replaced with joint_matrix_apply - remove namespace when specifying ext_intel_packed so things look shorter
| N * 4, matrix_layout::packed_b); | ||
| sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c); | ||
| } | ||
| joint_matrix_store( |
There was a problem hiding this comment.
You should be able to avoid changes in Legacy folder. Were they caused by clang-format?
| ext::intel::experimental::matrix::layout::packed> | ||
| joint_matrix< | ||
| sub_group, T2, use::b, TK, TN, | ||
| ext::oneapi::experimental::matrix::layout::ext_intel_packed> |
| K); | ||
| auto wi_slice_a = | ||
| sycl::ext::intel::experimental::matrix::get_wi_data(sg, sub_a); | ||
| auto wi_slice_a = sycl::ext::oneapi::detail::get_wi_data(sg, sub_a); |
There was a problem hiding this comment.
We should not use get_wi_data or detail in tests, these should be replaced with joint_matrix_apply
| N * vnniFactor); | ||
| auto wi_slice_b = | ||
| sycl::ext::intel::experimental::matrix::get_wi_data(sg, sub_b); | ||
| auto wi_slice_b = sycl::ext::oneapi::detail::get_wi_data(sg, sub_b); |
There was a problem hiding this comment.
We should not use get_wi_data or detail in tests, these should be replaced with joint_matrix_apply
|
sycl/test/matrix/matrix-bfloat16-test-coord-basicB.cpp previously failed |
|
I am WIP on rebasing |
|
will handle cuda testcase later |
| "ext::intel::experimental::matrix::get_wi_data.", | ||
| PI_ERROR_INVALID_DEVICE); | ||
| throw runtime_error( | ||
| "get_wi_data is available using: ext::oneapi::detail::get_wi_data, but " |
There was a problem hiding this comment.
Just say: "get_wi_data is unavailable, use joint_matrix_copy instead."
sycl/test/matrix/query-use.cpp
Outdated
| @@ -0,0 +1,158 @@ | |||
| // RUN: %clangxx -DSYCL_EXT_ONEAPI_MATRIX_VERSION=4 -fsycl -o query-use %s | |||
There was a problem hiding this comment.
This is an old version of the test
We should probably remove this test because it assumes some distribution. Also, it does the same thing as sycl/test-e2e/Matrix/get_coord_int8_matB.cpp |
|
I looked more carefully at https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Matrix/element_wise_irreg_sum_rows_impl.hpp |
|
@intel/llvm-gatekeepers ping? |
|
@YuriPlyakhin has requested changes approval is needed. |
|
@YuriPlyakhin could you approve the pr? Dounia has answered your comments above and if there should be small changes, we can create a new pr. |
|
@intel/llvm-gatekeepers, please help merge |
There is an input from Steffen above. |
Correct, I missed that. Yury is OOO today but this can wait. |
YuriPlyakhin
left a comment
There was a problem hiding this comment.
LGTM. Important comments were addressed. Test fine tuning can be done later.
|
@intel/llvm-gatekeepers , I approved, please, merge. |
As discussed in #11215 this patch: - removed mutable from `joint_matrix_cuda` (This change requires an upstream llvm patch (https://reviews.llvm.org/rGb781c7ab574f)) - removed `get_wi_data()` I also added back the cases that the change in the `joint_matrix_mad` interface allows: namely when the type of C/D matrices differ. I correspondingly updated the tests, to test the new cases that are supported. I also updated the support matrix for cuda in the spec doc for the newly supported combinations. --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
* Support one block AMD matrix core instructions for `__gfx90a__` architecture. * Supports `__builtin_amdgcn_mfma_i32_32x32x8i8`, `__builtin_amdgcn_mfma_i32_16x16x16i8`, `__builtin_amdgcn_mfma_f64_16x16x4f64`, `__builtin_amdgcn_mfma_f32_32x32x8bf16_1k`, `__builtin_amdgcn_mfma_f32_16x16x16bf16_1k`, `__builtin_amdgcn_mfma_f32_32x32x8f16` and `__builtin_amdgcn_mfma_f32_16x16x16f16` instructions. * Add HIP matrix core support into joint_matrix documentation. Should be merged after - #11215 --------- Co-authored-by: Bing1 Yu <bing1.yu@intel.com> Co-authored-by: mmoadeli <mahmoudmoadeli@codeplay.com>
* Support one block AMD matrix core instructions for `__gfx90a__` architecture. * Supports `__builtin_amdgcn_mfma_i32_32x32x8i8`, `__builtin_amdgcn_mfma_i32_16x16x16i8`, `__builtin_amdgcn_mfma_f64_16x16x4f64`, `__builtin_amdgcn_mfma_f32_32x32x8bf16_1k`, `__builtin_amdgcn_mfma_f32_16x16x16bf16_1k`, `__builtin_amdgcn_mfma_f32_32x32x8f16` and `__builtin_amdgcn_mfma_f32_16x16x16f16` instructions. * Add HIP matrix core support into joint_matrix documentation. Should be merged after - #11215 --------- Co-authored-by: Bing1 Yu <bing1.yu@intel.com> Co-authored-by: mmoadeli <mahmoudmoadeli@codeplay.com>
As part of the effort to move joint matrix from experimental namespace to supported. A review of the API is being done as part of #7964. This results in the following changes in the syntax:
1- Add Td to joint_matrix_mad as Tc can be different from Td on the GPU, Now, we make D as an input argument to mad.
2- Change “packed” to ext_intel_packed:
3- Move EWOps (get_wi_data, wi_element, get_coord) to detail namespace) 4- add const to joint_matrix in store and mad
5 - add joint_matrix_copy/assignment function
6- add apply with coordination (change existing tests)
7- change get_coord vector type from int32_t to size_t
8- delete explicitly both = and copy ctor.