[SYCL][CUDA] tf32 matrix MAD impl using uint32_t #5709
JackAKirk wants to merge 6 commits into intel:sycl
Signed-off-by: jack.kirk <jack.kirk@codeplay.com>
buffer<uint32_t, 1> bufB(B, range<1>(K * N));
buffer<float, 1> bufC(C, range<1>(M * N));
buffer<float, 1> bufD(D, range<1>(M * N));
Can you add a complete example in test/matrix that shows the necessary "manual" conversion from float to fp19 (stored as uint32_t) during initialization, and from fp19 back to float during accumulation and verification?
Yeah, it's here: intel/llvm-test-suite#881
For float to fp19:
uint32_t make_tf32(float const &x);
For the fp19 to float:
float tf32_to_fp32(uint32_t x);
(I'll rename both to e.g. make_fp19)
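For readers without the test suite at hand, the two conversions can be sketched on the host as follows. This is only an illustration under stated assumptions, not the code from intel/llvm-test-suite#881: the function names and signatures are taken from the comment above, but the bodies are assumed. It assumes tf32 keeps the sign bit, the 8 exponent bits and the top 10 mantissa bits of an IEEE-754 float, rounds half away from zero when dropping the low 13 bits, and leaves inf/NaN bit patterns untouched.

```cpp
#include <cstdint>
#include <cstring>

// Assumed layout: tf32 reuses the float bit pattern but only the sign bit,
// the 8 exponent bits and the top 10 mantissa bits carry information.
constexpr uint32_t kLowMask = 0x1fffu;     // the 13 mantissa bits tf32 drops
constexpr uint32_t kHalfUlp = 0x1000u;     // half of the last kept mantissa bit
constexpr uint32_t kExpMask = 0x7f800000u; // float exponent field

// float -> tf32 stored in a uint32_t (body is an assumption, see lead-in).
uint32_t make_tf32(float const &x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits)); // well-defined bit cast
  if ((bits & kExpMask) == kExpMask) {
    return bits; // inf/NaN: keep the pattern unchanged
  }
  bits += kHalfUlp;        // round the magnitude, ties away from zero
  return bits & ~kLowMask; // clear the 13 bits tf32 does not keep
}

// tf32 stored in a uint32_t -> float: the storage is already a valid float
// bit pattern, so this is a plain bit cast.
float tf32_to_fp32(uint32_t x) {
  float f;
  std::memcpy(&f, &x, sizeof(f));
  return f;
}
```

With this sketch, initialization would store `make_tf32(value)` into the `uint32_t` arrays, and verification would read the elements back through `tf32_to_fp32`.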
// number of rows of a.
constexpr int K = 8; // number of cols of a/number of rows of b.

uint32_t A[M * K];
Add a comment that uint32_t is used here as storage for fp19.
Thanks for the comments, I've updated both tests now.
LGTM, but we need to start adopting the name tf32 instead of fp19.

This PR is no longer necessary: the complete tf32 implementation is now ready and can replace this PR: #5870
CUDA backend implementation of tf32 MAD using the underlying 32-bit type (uint32_t), fully consistent with the existing matrix extension.
Integration test added here: intel/llvm-test-suite#881
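As a rough picture of what host-side verification for such a test could look like, here is a minimal reference for D = A * B + C with A and B held in `uint32_t` storage and C and D in `float`. The names `bits_to_float` and `matrix_mad_ref` are hypothetical, and the row-major layout and the sequential accumulation order are assumptions, not details taken from this PR or from the integration test.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit tf32 storage word as a float.
static float bits_to_float(uint32_t x) {
  float f;
  std::memcpy(&f, &x, sizeof(f));
  return f;
}

// Hypothetical host reference for D = A * B + C.
// A is M x K and B is K x N, both stored as uint32_t holding tf32 values;
// C and D are M x N floats. All matrices are assumed to be row-major.
void matrix_mad_ref(const uint32_t *A, const uint32_t *B, const float *C,
                    float *D, std::size_t M, std::size_t N, std::size_t K) {
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = C[m * N + n]; // start from the accumulator input
      for (std::size_t k = 0; k < K; ++k) {
        acc += bits_to_float(A[m * K + k]) * bits_to_float(B[k * N + n]);
      }
      D[m * N + n] = acc;
    }
  }
}
```

Because a device may accumulate the products in a different order, a comparison against this reference would normally use a small tolerance rather than exact equality.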