[SYCL][CUDA][libclc] Add bf16 builtins and optimize half builtins for fma, fmin and fmax #5724
Merged: bader merged 21 commits into intel:sycl on Mar 14, 2022
Adds support for the following builtins:
abs, neg:
- .bf16,
- .bf16x2
min, max:
- {.ftz}{.NaN}{.xorsign.abs}.f16
- {.ftz}{.NaN}{.xorsign.abs}.f16x2
- {.NaN}{.xorsign.abs}.bf16
- {.NaN}{.xorsign.abs}.bf16x2
- {.ftz}{.NaN}{.xorsign.abs}.f32
Differential Revision: https://reviews.llvm.org/D117887
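To make the modifiers above more concrete, here is a minimal host-side C sketch of a few of them. It assumes bf16 values are handled as raw 16-bit patterns (the upper half of an f32). All helper names (`bf16_abs`, `bf16x2_neg`, `min_model`, etc.) are hypothetical and are not the builtin or intrinsic names added by the patch. `.ftz`, signed-zero ordering and NaN payloads are deliberately not modelled; the PTX ISA is the authoritative reference.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a bf16 value is a raw 16-bit pattern, i.e. the upper half of
   an IEEE-754 binary32 (1 sign, 8 exponent, 7 mantissa bits). */
typedef uint16_t bf16_bits;

/* Widening bf16 -> float is exact: append 16 zero mantissa bits. */
static float bf16_to_float(bf16_bits h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* abs.bf16 clears the sign bit; neg.bf16 flips it. */
static bf16_bits bf16_abs(bf16_bits h) { return (bf16_bits)(h & 0x7FFFu); }
static bf16_bits bf16_neg(bf16_bits h) { return (bf16_bits)(h ^ 0x8000u); }

/* The .bf16x2 forms work on two bf16 values packed into one 32-bit
   register, so the same mask is applied to both halves. */
static uint32_t bf16x2_abs(uint32_t v) { return v & 0x7FFF7FFFu; }
static uint32_t bf16x2_neg(uint32_t v) { return v ^ 0x80008000u; }

/* Minimum that ignores a single NaN operand (the default, non-.NaN case). */
static float min_num(float a, float b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a < b ? a : b;
}

static float magnitude(float x) { return signbit(x) ? -x : x; }

/* Simplified model of min{.NaN}{.xorsign.abs}.f32:
   - nan_mode:    any NaN operand produces NaN;
   - xorsign_abs: compare magnitudes and give the result the XOR of the
                  two input sign bits. */
static float min_model(float a, float b, int nan_mode, int xorsign_abs) {
    if (nan_mode && (isnan(a) || isnan(b))) return NAN;
    if (!xorsign_abs) return min_num(a, b);
    int negative = (signbit(a) != 0) ^ (signbit(b) != 0);
    float m = min_num(magnitude(a), magnitude(b));
    return negative ? -m : m;
}
```

For example, `min_model(-3.0f, 2.0f, 0, 1)` compares the magnitudes 3 and 2, keeps 2, and makes it negative because exactly one input is negative, giving -2.0f.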
NOTE: this is a follow-up commit with the missing clang-side changes. This patch adds builtins/intrinsics for the following variants of FMA:
- f16, f16x2
  - rn
  - rn_ftz
  - rn_sat
  - rn_ftz_sat
  - rn_relu
  - rn_ftz_relu
- bf16, bf16x2
  - rn
  - rn_relu
ptxas (CUDA compilation tools, release 11.0, V11.0.194) accepts the generated assembly.
Differential Revision: https://reviews.llvm.org/D118977
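A rough host-side C model of what the rn, ftz, sat and relu suffixes do to a single FMA may help when reading the variant list. It uses f32 as a stand-in for f16/bf16 so that it runs on a CPU, and every function name is hypothetical. The modifier rules follow the PTX ISA description as understood here (sat clamps to [0.0, 1.0] with NaN becoming +0.0; relu clamps negatives to 0 and keeps NaN as NaN), so treat it as an illustration rather than a bit-exact reference.

```c
#include <math.h>

/* .ftz: flush a subnormal value to a zero of the same sign. */
static float flush_subnormal(float x) {
    if (fpclassify(x) == FP_SUBNORMAL) return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* rn: one fused multiply-add rounded to nearest-even. Approximated here by
   evaluating in double (a float*float product is exact in double); the
   final narrowing can double-round in rare cases. */
static float fma_rn(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}

/* .sat: clamp the result to [0.0, 1.0]; a NaN result becomes +0.0. */
static float apply_sat(float r) {
    if (isnan(r)) return 0.0f;
    if (r < 0.0f) return 0.0f;
    if (r > 1.0f) return 1.0f;
    return r;
}

/* .relu: clamp negative results to 0; a NaN result stays NaN
   (hardware returns a canonical NaN). */
static float apply_relu(float r) {
    if (isnan(r)) return NAN;
    return r < 0.0f ? 0.0f : r;
}

/* rn_ftz_sat: flush subnormal inputs and result, then saturate. */
static float fma_rn_ftz_sat(float a, float b, float c) {
    float r = fma_rn(flush_subnormal(a), flush_subnormal(b), flush_subnormal(c));
    return apply_sat(flush_subnormal(r));
}

/* rn_relu: plain rounded FMA followed by the relu clamp. */
static float fma_rn_relu(float a, float b, float c) {
    return apply_relu(fma_rn(a, b, c));
}
```

The suffixes compose as post-processing steps on one rounded result, which is why the patch can enumerate them as independent combinations (rn, rn_ftz, rn_sat, rn_ftz_sat, ...).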
NOTE: this is a follow-up commit with the missing clang-side changes. This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2 instruction. These two variants were added in PTX 7.0 and are supported on sm_75 and above. Note that they are not wired to the exp2 LLVM intrinsic, because the ex2 instruction is only available in its approx variant. Running ptxas on the assembly generated by the test f16-ex2.ll works as expected.
Differential Revision: https://reviews.llvm.org/D119157
This was referenced Mar 7, 2022
bader reviewed on Mar 9, 2022
Apply review suggestions. Co-authored-by: Alexey Bader <alexey.bader@intel.com>
bader previously approved these changes on Mar 9, 2022 and left a comment:
libclc changes look good to me.
smanna12 reviewed on Mar 9, 2022
The PR author commented:
I just removed the changes to native_exp2, as that is being implemented in a slightly different way in #5747.
smanna12 approved these changes on Mar 9, 2022:
FE changes LGTM, as per comment: #5724 (comment)
These changes (everything in the clang folder) are already merged upstream. I just added them to this PR because they are required to build it. They will be part of the next pulldown.
Quoting the PR description: "This PR also contains some changes (everything in the clang folder) that have been merged in upstream LLVM since the last pulldown and are required for building it."
@t4c1, could you please add the upstream link?
The PR author replied: Done, updated the PR description.
For the functions fma, fmin and fmax, this PR adds bf16 builtins to libclc and optimizes the half builtins to use native half instructions when the device supports them.
This PR also contains some changes (everything in the clang folder) that have been merged into upstream LLVM since the last pulldown and are required to build it. They are parts of the following reviews (something went wrong when merging these, so only parts were merged at first; the changes in this PR are the remainder):
- https://reviews.llvm.org/D118977
- https://reviews.llvm.org/D117887
- https://reviews.llvm.org/D119157
Tests for the half changes are in intel/llvm-test-suite#880. Tests for the bf16 implementations will be added in future PRs, together with runtime support for them.