[SYCL][CUDA] joint_matrix required changes following #11215 (#11563)
aelovikov-intel merged 11 commits into intel:sycl
Conversation
Commits (all signed-off-by: JackAKirk <jack.kirk@codeplay.com>):
- Added new supported mma builtins where C/D types differ.
- … and test new cases.
- … const variables.
- … from tf32 device code check test.
```c++
        sg, sub_c, accC.template get_multi_ptr<access::decorated::yes>(),
        N, layout::row_major);

    // Round a, b to tf32
```
I think this file/directory needs an entry in the CODEOWNERS file. Please merge a separate PR with that change. Once done, I expect no review from the SYCL RT team will be required here.
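For context, a minimal sketch of the load pattern in the hunk above, inside a kernel body (the accessor name `accC` and the constants `M`/`N` follow the hunk; everything else is an assumption, not the exact test source):

```c++
using namespace sycl::ext::oneapi::experimental::matrix;

// Inside a parallel_for over an nd_range; sg is the sub-group.
auto sg = item.get_sub_group();

// Accumulator fragment; the M x N shape is assumed to match the test.
joint_matrix<sycl::sub_group, float, use::accumulator, M, N> sub_c;

// Load C through a decorated multi_ptr, row-major with leading dimension N,
// as in the hunk above.
joint_matrix_load(
    sg, sub_c, accC.template get_multi_ptr<sycl::access::decorated::yes>(),
    N, layout::row_major);
```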
```diff
     }

-    joint_matrix_mad(sg, sub_c, sub_a, sub_b, sub_c);
+    joint_matrix_mad(sg, sub_d, sub_a, sub_b, sub_c);
```
How many times will this actually run?
You want the return matrix to be "sub_c" so it can be input again, right?
Just once. I changed it to run once so that sub_d can be a different joint_matrix (with a different type from sub_c), in order to test the new cases I added to the support matrix / joint_matrix_tensorcores_sm70.cpp.
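Concretely, a minimal sketch of that pattern (the 16x16x16 shape and the half/float pairing follow the fp16 rows of the support matrix below; kernel setup and loads are omitted and assumed):

```c++
using namespace sycl::ext::oneapi::experimental::matrix;

// A and B fragments in fp16.
joint_matrix<sycl::sub_group, sycl::half, use::a, 16, 16, layout::row_major> sub_a;
joint_matrix<sycl::sub_group, sycl::half, use::b, 16, 16, layout::row_major> sub_b;
// C and D are both accumulators, but with *different* element types.
joint_matrix<sycl::sub_group, sycl::half, use::accumulator, 16, 16> sub_c;
joint_matrix<sycl::sub_group, float, use::accumulator, 16, 16> sub_d;

// D = A * B + C with D in fp32 and C in fp16: the new case this PR enables.
joint_matrix_mad(sg, sub_d, sub_a, sub_b, sub_c);
```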
```asciidoc
|16 |16 |16
|8 |32 |16
|32 |8 |16
.3+| `matrix_type::fp16` .3+| `matrix_type::fp16` .3+| `matrix_type::fp32`
```
BTW, where are the bfloat16 combinations?
They are in the same table at the bottom, between tf32 and fp64.
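For completeness, a hedged sketch of one bfloat16 combination from that part of the table (the 16x16x16 shape is an assumption; bf16 inputs accumulate in fp32 on Tensor Cores):

```c++
using namespace sycl::ext::oneapi::experimental::matrix;
using sycl::ext::oneapi::bfloat16;

joint_matrix<sycl::sub_group, bfloat16, use::a, 16, 16, layout::row_major> sub_a;
joint_matrix<sycl::sub_group, bfloat16, use::b, 16, 16, layout::row_major> sub_b;
joint_matrix<sycl::sub_group, float, use::accumulator, 16, 16> sub_c;

// D and C are the same fragment here, so the result can be fed back in.
joint_matrix_mad(sg, sub_c, sub_a, sub_b, sub_c);
```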
```diff
-      // Round a, b to tf32
-      for (auto i = 0; i < 4; ++i)
-        get_wi_data(sg, sub_a)[i] =
-            round_to_tf32(get_wi_data(sg, sub_a)[i]);
```
I've just realized that, to be consistent, round_to_tf32 should also have a device code check. I removed it here since it didn't have one, but it would be better to add it back via

```c++
auto round_lambda = [](auto &x) { x = round_to_tf32(x); };
joint_matrix_apply(sg, sub_a, round_lambda);
```

and add a check that it calls the correct nvvm builtin.
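For reference, a sketch of what that check might look like in the matrix-nvptx-tf32-test.cpp style (the intrinsic name below is an assumption and would need to be confirmed against the generated IR):

```c++
// CHECK: call {{.*}} @llvm.nvvm.f2tf32.rna(float {{.*}})  // assumed intrinsic name
auto round_lambda = [](auto &x) { x = round_to_tf32(x); };
joint_matrix_apply(sg, sub_a, round_lambda);
```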
npmiller left a comment:
matrix-nvptx-tf32-test.cpp changes LGTM
I think it should be "the types of C/D matrices differ".

I don't see any changes that require runtime-reviewers approval. @JackAKirk, should I merge this in?

Yes please!
As discussed in #11215 this patch:
- … `joint_matrix_cuda` (this change requires an upstream llvm patch: https://reviews.llvm.org/rGb781c7ab574f)
- … `get_wi_data()`

I also added back the cases that the change in the `joint_matrix_mad` interface allows: namely when the types of the C/D matrices differ. I correspondingly updated the tests to cover the new cases that are supported. I also updated the support matrix for CUDA in the spec doc with the newly supported combinations.
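Putting the pieces together, a hedged end-to-end sketch of the updated interface (buffer/accessor names, the nd_range, and the 16x16x16 fp16/fp32 combination are assumptions, not the actual test code):

```c++
using namespace sycl::ext::oneapi::experimental::matrix;

q.submit([&](sycl::handler &cgh) {
  sycl::accessor accA{bufA, cgh, sycl::read_only};
  sycl::accessor accB{bufB, cgh, sycl::read_only};
  sycl::accessor accC{bufC, cgh, sycl::read_only};
  sycl::accessor accD{bufD, cgh, sycl::write_only};
  cgh.parallel_for(ndrange, [=](sycl::nd_item<2> item) {
    auto sg = item.get_sub_group();
    joint_matrix<sycl::sub_group, sycl::half, use::a, 16, 16,
                 layout::row_major> sub_a;
    joint_matrix<sycl::sub_group, sycl::half, use::b, 16, 16,
                 layout::row_major> sub_b;
    joint_matrix<sycl::sub_group, sycl::half, use::accumulator, 16, 16> sub_c;
    joint_matrix<sycl::sub_group, float, use::accumulator, 16, 16> sub_d;

    joint_matrix_load(
        sg, sub_a,
        accA.template get_multi_ptr<sycl::access::decorated::yes>(), 16);
    joint_matrix_load(
        sg, sub_b,
        accB.template get_multi_ptr<sycl::access::decorated::yes>(), 16);
    joint_matrix_load(
        sg, sub_c,
        accC.template get_multi_ptr<sycl::access::decorated::yes>(), 16,
        layout::row_major);

    // The case this PR adds back: D (fp32) differs in type from C (fp16).
    joint_matrix_mad(sg, sub_d, sub_a, sub_b, sub_c);

    joint_matrix_store(
        sg, sub_d,
        accD.template get_multi_ptr<sycl::access::decorated::yes>(), 16,
        layout::row_major);
  });
});
```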