[SYCL][CUDA] Add bf16 builtins operating on storage types by t4c1 · Pull Request #5748 · intel/llvm

t4c1 · 2022-03-07T11:27:59Z

Add bf16 builtins operating on storage types. Partially implements https://github.com/intel/llvm/pull/5645/files for CUDA (only operations on storage types).
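
To illustrate what "operating on storage types" means, here is a minimal host-side sketch in which a bf16 value is carried as raw `uint16_t` bits. It is not the code from this PR, which lowers to NVPTX intrinsics on the device. The names `bf16_storage`, `bf16_to_float`, `float_to_bf16`, `bf16_fabs`, `bf16_fmax`, and `bf16_fmin` are hypothetical, and narrowing by truncation is a simplifying assumption (real conversions typically round).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical alias: a bf16 value carried as its raw 16-bit pattern.
using bf16_storage = std::uint16_t;

// bf16 is the upper 16 bits of an IEEE-754 binary32 value, so widening is a shift.
inline float bf16_to_float(bf16_storage bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);
  return f;
}

// Assumption: narrow by truncation. This is exact for values that are already
// representable in bf16, which is all this sketch needs.
inline bf16_storage float_to_bf16(float f) {
  std::uint32_t wide;
  std::memcpy(&wide, &f, sizeof wide);
  return static_cast<bf16_storage>(wide >> 16);
}

// fabs only needs to clear the sign bit of the storage value.
inline bf16_storage bf16_fabs(bf16_storage x) {
  return static_cast<bf16_storage>(x & 0x7FFF);
}

// fmax and fmin go through float; the result is one of the inputs, so
// converting back to bf16 loses nothing.
inline bf16_storage bf16_fmax(bf16_storage x, bf16_storage y) {
  return float_to_bf16(std::fmax(bf16_to_float(x), bf16_to_float(y)));
}

inline bf16_storage bf16_fmin(bf16_storage x, bf16_storage y) {
  return float_to_bf16(std::fmin(bf16_to_float(x), bf16_to_float(y)));
}

// Usage: 0x3F80 encodes 1.0 and 0xC000 encodes -2.0 in bf16.
inline void bf16_storage_demo() {
  assert(bf16_fabs(0xC000) == 0x4000);         // |-2.0| == 2.0
  assert(bf16_fmax(0x3F80, 0xC000) == 0x3F80); // max(1.0, -2.0) == 1.0
}
```

The point of the storage-type approach is that callers never need a dedicated bf16 class: the bit pattern travels in an ordinary integer, and only the builtins interpret it.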

This PR includes a bugfix for some NVPTX intrinsics, which will also be pushed upstream.

Blocked by #5724.

Tests for this are in intel/llvm-test-suite#897.

mlychkov

Changes in llvm intrinsics LGTM.

s-kanaev

RT changes LGTM.

@t4c1 , should there be a test for this change?

t4c1 · 2022-03-14T07:47:28Z

Which change do you have in mind? There are some tests being added to the test suite (linked in the PR description). Or do you mean something else needs testing?

s-kanaev

RT changes LGTM

s-kanaev · 2022-03-14T08:23:37Z

> There are some tests being added to the test suite (linked in PR description).

@t4c1, sorry, I didn't notice that at first glance. Seems like that'll do.

bader · 2022-03-14T14:19:31Z

I had to merge with the sycl branch to resolve the conflict with 53a9d54.

t4c1 · 2022-03-14T14:21:18Z

Thanks.

bader · 2022-03-14T21:27:10Z

@t4c1, could you fix these warnings, please?
From https://github.com/intel/llvm/runs/5544122115?check_suite_focus=true

/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:40:68: error: unused parameter 'x' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fabs(T x) {
                                                                   ^
/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:49:68: error: unused parameter 'x' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fmin(T x, T y) {
                                                                   ^
/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:49:73: error: unused parameter 'y' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fmin(T x, T y) {
                                                                        ^
/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:58:68: error: unused parameter 'x' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fmax(T x, T y) {
                                                                   ^
/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:58:73: error: unused parameter 'y' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fmax(T x, T y) {
                                                                        ^
/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:67:67: error: unused parameter 'x' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fma(T x, T y, T z) {
                                                                  ^
/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:67:72: error: unused parameter 'y' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fma(T x, T y, T z) {
                                                                       ^
/home/runner/work/llvm/llvm/build/include/sycl/ext/oneapi/bf16_storage_builtins.hpp:67:77: error: unused parameter 'z' [-Werror,-Wunused-parameter]
std::enable_if_t<detail::is_bf16_storage_type<T>::value, T> fma(T x, T y, T z) {
                                                                            ^
8 errors generated.
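
The errors above are typical of a function whose parameters are used only on a conditionally compiled path. The following sketch shows one common way to silence `-Wunused-parameter` in that situation. It is an assumption about the shape of the problem, not the actual fix that landed. `is_bf16_storage_type` here is a local stand-in for the trait named in the log, and `HYPOTHETICAL_DEVICE_PATH` and `fabs_storage` are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Stand-in for detail::is_bf16_storage_type from the log (assumed to accept uint16_t).
template <typename T>
struct is_bf16_storage_type : std::is_same<T, std::uint16_t> {};

// Same signature shape as in the log; the body is a sketch, not the real header.
template <typename T>
std::enable_if_t<is_bf16_storage_type<T>::value, T> fabs_storage(T x) {
#if defined(HYPOTHETICAL_DEVICE_PATH)
  // Device-only path: the parameter is used here.
  return static_cast<T>(x & 0x7FFF);
#else
  // Host path: the parameter is otherwise unused, so mark it explicitly.
  (void)x;
  throw std::runtime_error("bf16 storage builtins are not supported on host in this sketch");
#endif
}

// Usage: without the device macro, the call compiles warning-free and throws.
inline bool host_path_throws() {
  try {
    fabs_storage<std::uint16_t>(0xC000);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

Casting to `void` works in any C++ standard; where C++17 is available, declaring the parameter `[[maybe_unused]]` is an equivalent alternative.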

This PR introduces full support for element-wise operations in the CUDA backend. `wi_data`, `get_matrix_fill`, and `joint_matrix.get_wi_data()` are introduced for portability with the Intel backend. In addition, in the CUDA backend users can call `joint_matrix.wi_marray` to access the marray that stores the WI-owned elements of the matrix and perform optimized element-wise operations using math functions that take marrays.

bfloat16 element-wise operations are also supported: this PR adds bfloat16 scalar/marray implementations that replace the existing uint16_t "storage type" implementations of the fma, fmax, fmin, and fabs math functions. The bfloat16 fma_relu function implementation has now been added directly in #5749. The existing temporary uint16_t implementations (introduced in #5748 with unmerged tests in intel/llvm-test-suite#897) have been removed, since these bfloat16 implementations replace them.

Signed-off-by: jack.kirk <jack.kirk@codeplay.com>

t4c1 added 3 commits March 7, 2022 03:01

[SYCL][CUDA] add bf16 builtins

373b27f

fix a bug in intrinsics

e684326

remove redundant declaration

0449fc9

t4c1 requested review from a team as code owners March 7, 2022 11:28

t4c1 requested a review from s-kanaev March 7, 2022 11:28

t4c1 mentioned this pull request Mar 7, 2022

[SYCL] Add tests for bf16 builtins operating on storage types intel/llvm-test-suite#897

Open

format

2f3afe4

JackAKirk mentioned this pull request Mar 7, 2022

[SYCL][Doc] math functions added to bfloat16 ext #5645

Merged

mlychkov previously approved these changes Mar 10, 2022

s-kanaev reviewed Mar 10, 2022

Merge branch 'sycl' into bf16_builtins

bc6e32a

t4c1 dismissed mlychkov’s stale review via bc6e32a March 14, 2022 08:03

bader requested a review from s-kanaev March 14, 2022 08:07

s-kanaev previously approved these changes Mar 14, 2022

hdelan mentioned this pull request Mar 14, 2022

[SYCL] Add fma_relu extension #5749

Closed

Merge branch 'sycl' into bf16_builtins

b78453c

bader dismissed s-kanaev’s stale review via b78453c March 14, 2022 14:14

bader merged commit 413a9ef into intel:sycl Mar 14, 2022

t4c1 mentioned this pull request Mar 15, 2022

[SYCL] Fix unused parameter warnings in bf16 storage builtins #5811

Merged

JackAKirk mentioned this pull request Apr 5, 2022

[SYCL][CUDA] Joint_matrix elem wise ops inc bfloat16 #5964

Merged

bader merged 6 commits into intel:sycl from t4c1:bf16_builtins
