This looks very promising! If we could also generate versions with The |
I can confirm the performance gain compared to explicit AVX on desktop Zen 4 as well, built with |
Shnatsel left a comment
Looks great to me!
This supersedes both wide and explicit AVX intrinsics for YCbCr. Unlike wide, this can be compiled with #[target_feature] and used with runtime CPU feature detection.
I can't wait for this to be merged and to express the AVX YCbCr in terms of this!
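For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of plain scalar code compiled with #[target_feature] and selected through runtime CPU feature detection. It is not the code from this PR: the function names (ycbcr_to_rgb_pixel, convert_scalar, convert_avx2, convert) and the planar input layout are hypothetical, and the fixed-point constants are the usual JFIF full-range coefficients scaled by 2^16, assumed here for illustration.

```rust
/// Scalar YCbCr -> RGB for one pixel. Hypothetical helper, not the PR's code.
/// Constants are the JFIF coefficients (1.402, 0.344136, 0.714136, 1.772)
/// scaled by 2^16 and rounded.
#[inline(always)]
fn ycbcr_to_rgb_pixel(y: u8, cb: u8, cr: u8) -> [u8; 3] {
    let y = y as i32;
    let cb = cb as i32 - 128;
    let cr = cr as i32 - 128;
    let r = y + ((91881 * cr + 32768) >> 16);
    let g = y - ((22554 * cb + 46802 * cr + 32768) >> 16);
    let b = y + ((116130 * cb + 32768) >> 16);
    [
        r.clamp(0, 255) as u8,
        g.clamp(0, 255) as u8,
        b.clamp(0, 255) as u8,
    ]
}

/// Portable loop with no intrinsics. Being #[inline(always)], it is inlined
/// into whichever caller uses it and compiled with that caller's features.
#[inline(always)]
fn convert_scalar(y_plane: &[u8], cb_plane: &[u8], cr_plane: &[u8], rgb: &mut [u8]) {
    let pixels = rgb
        .chunks_exact_mut(3)
        .zip(y_plane)
        .zip(cb_plane)
        .zip(cr_plane);
    for (((out, &y), &cb), &cr) in pixels {
        out.copy_from_slice(&ycbcr_to_rgb_pixel(y, cb, cr));
    }
}

/// Same loop, but the compiler is allowed to use AVX2 when vectorizing it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn convert_avx2(y_plane: &[u8], cb_plane: &[u8], cr_plane: &[u8], rgb: &mut [u8]) {
    convert_scalar(y_plane, cb_plane, cr_plane, rgb)
}

/// Safe entry point: checks for AVX2 at runtime, otherwise uses the baseline build.
pub fn convert(y_plane: &[u8], cb_plane: &[u8], cr_plane: &[u8], rgb: &mut [u8]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on this CPU just above.
            unsafe { convert_avx2(y_plane, cb_plane, cr_plane, rgb) };
            return;
        }
    }
    convert_scalar(y_plane, cb_plane, cr_plane, rgb)
}
```

Calling convert(&y, &cb, &cr, &mut rgb) with three equally sized planes and an output buffer three times as long picks the AVX2 build on CPUs that support it and falls back to the baseline build elsewhere. Whether the loop actually vectorizes well depends on the compiler version and the exact code shape, so any speedup should be checked with a benchmark.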
This gives the compiler more information to vectorize the code.
On Zen 3 with target-cpu=native, this is nearly 40% faster in the ycbcr criterion micro-benchmark than the current AVX2 code path.