add alignment on JPEG [i16; 64] blocks #28
…uture optimizations and auto vectorization
|
Just some random thoughts, but wouldn't it be better to expose the 8x8 nature of the blocks to the compiler by introducing a Block structure that is backed by a [[i16; 8]; 8]? |
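To make the suggestion concrete, here is a minimal sketch of what such an 8x8 block type could look like. The name `Block8x8` and its methods are hypothetical and are not taken from this PR; the sketch only assumes that coefficients are stored row by row.

```rust
/// Hypothetical 8x8 coefficient block: eight rows of eight `i16` values.
/// The name and API are illustrative assumptions, not code from this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block8x8 {
    rows: [[i16; 8]; 8],
}

impl Block8x8 {
    /// Builds a block from 64 coefficients stored row by row.
    pub fn from_flat(flat: &[i16; 64]) -> Self {
        let mut rows = [[0i16; 8]; 8];
        for (row, chunk) in rows.iter_mut().zip(flat.chunks_exact(8)) {
            row.copy_from_slice(chunk);
        }
        Self { rows }
    }

    /// Returns the coefficient at the given row and column.
    pub fn get(&self, row: usize, col: usize) -> i16 {
        self.rows[row][col]
    }

    /// Adds `other` element-wise with wrapping arithmetic.
    /// The fixed-size row loops give the compiler a clear shape to vectorize.
    pub fn wrapping_add(&self, other: &Self) -> Self {
        let mut out = *self;
        for (dst_row, src_row) in out.rows.iter_mut().zip(other.rows.iter()) {
            for (dst, src) in dst_row.iter_mut().zip(src_row.iter()) {
                *dst = dst.wrapping_add(*src);
            }
        }
        out
    }
}
```

Whether the compiler actually vectorizes the row loops depends on the target and optimization level, so the generated code would still need to be inspected.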
|
We can even keep using unaligned loads for safety and the compiler will automatically transform them into aligned ones where possible. |
|
The best row size depends on how wide the SIMD registers are, so it is probably better not to hardcode it. The main advantage of this change is that even if you transmute to a SIMD type in place, the compiler knows that it's properly aligned. If you use the bytemuck crate, you can cast directly from [i16; 64] to the appropriate SIMD type safely (it statically asserts on the alignment). |
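As a rough illustration of the alignment idea, the sketch below wraps the flat `[i16; 64]` array in a struct with an explicit alignment attribute. The names `AlignedBlock`, `is_aligned`, and `scale` are hypothetical; the 32-byte figure is the one mentioned in the PR description, and the actual type in the PR may differ.

```rust
use std::mem::align_of;

/// Hypothetical wrapper: 64 coefficients with a 32-byte alignment guarantee.
/// `repr(align(32))` makes the alignment part of the type, so every value of
/// this type starts on a 32-byte boundary, wherever it is stored.
#[repr(C, align(32))]
#[derive(Clone, Copy, Debug)]
pub struct AlignedBlock {
    pub raw: [i16; 64],
}

impl AlignedBlock {
    /// Wraps a flat coefficient array in the aligned type.
    pub fn new(raw: [i16; 64]) -> Self {
        Self { raw }
    }

    /// Checks at runtime that this value's address honors the type's alignment.
    pub fn is_aligned(&self) -> bool {
        (self as *const Self as usize) % align_of::<Self>() == 0
    }
}

/// Multiplies every coefficient by `factor` with wrapping arithmetic.
/// Because the block's alignment is known statically, the compiler is free
/// to use aligned loads and stores if it vectorizes this loop.
pub fn scale(block: &mut AlignedBlock, factor: i16) {
    for v in block.raw.iter_mut() {
        *v = v.wrapping_mul(factor);
    }
}
```

The array is 128 bytes, already a multiple of 32, so the attribute adds no padding. It only constrains where a block may be placed.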
|
Also, there are some changes in here that came from cargo fmt. Maybe you could add a formatting check to the GitHub Action as part of the check-in tests, so that changes are always checked for formatting before check-in. |
|
I checked the generated code, and it's true that the compiler changes the loads to aligned ones with this change. |
I've added the check and run cargo fmt in #30. Otherwise, this PR looks good. There are a few merge conflicts that need to be resolved before it can be merged. |
|
I've fixed the merge issues and cleaned up any unnecessary changes. It needs workflow approval though to run the tests. Thanks! |
Ensure that coefficient blocks are accessed with 32-byte alignment so that we can autovectorize and optimize some code paths via SIMD