Proof of Concept: VAES Support #144
All SMHasher tests passed with the latest fix.

On EPYC 7773X, one can get the following results with this patch: Without
    current[2] = current[2].aesenc(blocks[2]);
    current[3] = current[3].aesenc(blocks[3]);
    sum[0] = sum[0].shuffle_and_add(blocks[0]);
    sum[1] = sum[1].shuffle_and_add(blocks[1]);

    # Use VAES extension if possible. The hash value may be incompatible with NON-VAES targets
    vaes = []
We can use cfg to detect the feature and don't need a feature declared here.
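As a minimal sketch of what that could look like: the check below relies only on compile-time `cfg` and needs no Cargo feature. The function name `wide_aes_enabled` is hypothetical, and the feature string `vaes` is an assumption; the diff in this PR spells it `avx512vaes`, and which spelling a given toolchain recognizes may differ.

```rust
/// Hypothetical helper: true when this build targets a CPU with VAES.
/// The feature name "vaes" is an assumption; the PR diff uses "avx512vaes".
pub const fn wide_aes_enabled() -> bool {
    // `cfg!` expands to a boolean literal at compile time, so the unused
    // branch at any call site can be optimized away.
    cfg!(all(target_arch = "x86_64", target_feature = "vaes"))
}

/// Hypothetical helper: names the code path selected for this build.
pub fn selected_path() -> &'static str {
    if wide_aes_enabled() {
        "vaes"
    } else {
        "baseline"
    }
}
```

With this approach the wide path would be selected by compiler flags such as `-C target-cpu=native` or `-C target-feature=+vaes` rather than by an opt-in `vaes = []` entry in `Cargo.toml`.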
    // Rust is confused with targets supporting VAES without AVX512 extensions.
    // We need to manually specify the underlying intrinsic; otherwise the compiler
    // will have trouble inlining the code.
Is there a link to an issue on this?
    target_feature = "avx512vaes",
    not(miri)
    ))]
    if data.len() > 128 {
Rather than adding another `if`, I think a cleaner way to handle this would be to factor the work out into a function in operations. For example, there could be an `aesenc_x4` that uses `cfg` to provide one implementation or the other, depending on whether the CPU instruction is available.
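A rough sketch of that shape, under stated assumptions: `aesenc_stand_in` is a hypothetical placeholder mixer (it is not AES) so the example runs on any target, the `[u128; 4]` lane representation is illustrative, and the feature string `vaes` may need to be `avx512vaes` as in the diff above. Only the name `aesenc_x4` comes from the comment itself.

```rust
/// Hypothetical stand-in for one AES round. This is NOT real AES; it only
/// lets the sketch run without CPU-specific intrinsics.
fn aesenc_stand_in(state: u128, key: u128) -> u128 {
    (state ^ key)
        .rotate_left(29)
        .wrapping_mul(0x9E37_79B9_7F4A_7C15_F39C_C060_5CED_C835)
}

/// Baseline: four independent 128-bit rounds, one lane at a time.
#[cfg(not(all(target_arch = "x86_64", target_feature = "vaes")))]
pub fn aesenc_x4(current: &mut [u128; 4], blocks: &[u128; 4]) {
    for (c, b) in current.iter_mut().zip(blocks) {
        *c = aesenc_stand_in(*c, *b);
    }
}

/// Wide variant: a real implementation would pack the lanes into wider
/// registers and issue VAES instructions here. This sketch reuses the
/// stand-in so both variants return identical results.
#[cfg(all(target_arch = "x86_64", target_feature = "vaes"))]
pub fn aesenc_x4(current: &mut [u128; 4], blocks: &[u128; 4]) {
    for i in 0..4 {
        current[i] = aesenc_stand_in(current[i], blocks[i]);
    }
}

/// The call site stays free of feature checks: no extra `if` is needed.
pub fn absorb(current: &mut [u128; 4], data: &[[u128; 4]]) {
    for blocks in data {
        aesenc_x4(current, blocks);
    }
}
```

The design point is that the `cfg` split lives in one place, so the hashing loop reads the same whichever implementation is compiled in.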

Thanks for putting this together.

@tkaitchuck Sorry for the long delay. I was held up by other things. I would like to push this forward. Any suggestions for what to do next?

I will give this another try in a new PR.

The idea is to use VAES instructions to process a wider span of input in each loop iteration.
(I also tested processing the same length per iteration with fewer instructions, but that gave no speedup at all.)
We can gain a 100% speedup.
Without VAES:
With VAES:
Notice:
- `aeshash/wider-string` is the same as `aeshash/string` except that its set of lengths includes larger data points.
- `vaes` passed all quality tests, but its hash value may not be compatible with non-AES targets.