```rust
static FORWARD: [usize; 128] = [
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
    26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
    50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
    74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
    98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
    117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,
];

static REVERSED: [usize; 128] = [
    0, 64, 32, 96, 16, 80, 48, 112, 8, 72, 40, 104, 24, 88, 56, 120, 4, 68, 36, 100, 20, 84, 52,
    116, 12, 76, 44, 108, 28, 92, 60, 124, 2, 66, 34, 98, 18, 82, 50, 114, 10, 74, 42, 106, 26, 90,
    58, 122, 6, 70, 38, 102, 22, 86, 54, 118, 14, 78, 46, 110, 30, 94, 62, 126, 1, 65, 33, 97, 17,
    81, 49, 113, 9, 73, 41, 105, 25, 89, 57, 121, 5, 69, 37, 101, 21, 85, 53, 117, 13, 77, 45, 109,
    29, 93, 61, 125, 3, 67, 35, 99, 19, 83, 51, 115, 11, 75, 43, 107, 27, 91, 59, 123, 7, 71, 39,
    103, 23, 87, 55, 119, 15, 79, 47, 111, 31, 95, 63, 127,
];
```
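Each entry `REVERSED[a]` is the 7-bit bit reversal of `a`, i.e. exactly what `a.reverse_bits() >> ((block_width - 1).leading_zeros())` yields when `block_width` is 128. As a sanity check, here is a minimal sketch (not code from the library) that builds the same table at compile time with a `const fn` instead of a hand-written literal, and asserts that it matches the on-the-fly expression. `BLOCK_WIDTH = 128` is taken from the post; `build_reversed` is a hypothetical helper name, and the table is declared `const` to match the `const` vs `static` observation in the post.

```rust
/// Block width used by `cobra_apply`, fixed at 128 (= 2^7) according to the post.
const BLOCK_WIDTH: usize = 128;

/// Hypothetical helper: builds the table of 7-bit bit reversals at compile time.
/// `for` loops are not allowed in a `const fn`, so a `while` loop is used instead.
const fn build_reversed() -> [usize; BLOCK_WIDTH] {
    let mut table = [0usize; BLOCK_WIDTH];
    // On a 64-bit target this shift is 57, so only the low 7 bits survive, reversed.
    let shift = (BLOCK_WIDTH - 1).leading_zeros();
    let mut a = 0;
    while a < BLOCK_WIDTH {
        table[a] = a.reverse_bits() >> shift;
        a += 1;
    }
    table
}

/// Evaluated entirely by the compiler, so there is no runtime cost to build it.
const REVERSED: [usize; BLOCK_WIDTH] = build_reversed();

fn main() {
    // Spot checks against the hand-written table.
    assert_eq!(REVERSED[1], 64);
    assert_eq!(REVERSED[3], 96);

    // Every entry agrees with the expression that `cobra_apply` recomputes.
    let block_width = BLOCK_WIDTH;
    for a in 0..block_width {
        assert_eq!(REVERSED[a], a.reverse_bits() >> ((block_width - 1).leading_zeros()));
    }

    // Bit reversal is its own inverse, so the table is a permutation of 0..128.
    for a in 0..BLOCK_WIDTH {
        assert_eq!(REVERSED[REVERSED[a]], a);
    }
    println!("table ok");
}
```

Generating the table this way avoids typos in a 128-entry literal and keeps it tied to `BLOCK_WIDTH` if that constant ever changes.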
I was just playing around with this library and was having a look at the `cobra_apply` function. It recalculates `a.reverse_bits() >> ((block_width - 1).leading_zeros())` an awful lot due to the number of loops, and it showed up as a big chunk in the flame chart.

Block width is fixed at 128, and `a` (and `b` and `c` where relevant) are also always just `0..BLOCK_WIDTH`. I tried just extracting the calculation into lookup tables (the `FORWARD` and `REVERSED` arrays above), then replacing all the similar recalculations with lookups into them, and I got 25-30% gains across `log_n = 15..20`. The `cobra` test still seems to pass fine. Is this a valid optimisation?

EDIT: actually just doing `REVERSED.into_iter().enumerate()` is both more elegant and worth another couple of percent on my laptop.

I also note that declaring `REVERSED` as `static` rather than `const` harms performance by a few percent.

But it would be good if these benches could be double-checked by someone else to make sure this isn't all just an artefact of my particular setup or something...
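To make the proposed change concrete, here is a self-contained sketch comparing the two loop shapes on a simplified row permutation of one `BLOCK_WIDTH * BLOCK_WIDTH` block. It is not the library's `cobra_apply`: `permute_rows_computed` and `permute_rows_lookup` are hypothetical helpers, and the indexing `(a << LOG_BLOCK_WIDTH) | c` is an assumption chosen only to exercise `a` and `a_rev`. The first helper recomputes `a_rev` with the expression from `cobra_apply`; the second iterates `REVERSED.into_iter().enumerate()` as in the EDIT. The sketch only checks that both produce identical output; it says nothing about speed.

```rust
const LOG_BLOCK_WIDTH: usize = 7;
const BLOCK_WIDTH: usize = 1 << LOG_BLOCK_WIDTH; // 128

/// 7-bit bit-reversal table, built at compile time and declared `const` (not `static`).
const REVERSED: [usize; BLOCK_WIDTH] = {
    let mut table = [0usize; BLOCK_WIDTH];
    let shift = (BLOCK_WIDTH - 1).leading_zeros();
    let mut a = 0;
    while a < BLOCK_WIDTH {
        table[a] = a.reverse_bits() >> shift;
        a += 1;
    }
    table
};

/// Hypothetical helper: moves row `a` of the block to row `a_rev`,
/// recomputing the reversed index on every iteration (the original pattern).
fn permute_rows_computed<T: Copy>(src: &[T], dst: &mut [T]) {
    let block_width = BLOCK_WIDTH;
    for a in 0..block_width {
        let a_rev = a.reverse_bits() >> ((block_width - 1).leading_zeros());
        for c in 0..BLOCK_WIDTH {
            dst[(a_rev << LOG_BLOCK_WIDTH) | c] = src[(a << LOG_BLOCK_WIDTH) | c];
        }
    }
}

/// Same permutation, but `a_rev` is read from the lookup table via `enumerate`,
/// which also supplies `a` and so makes a separate `FORWARD` table unnecessary.
fn permute_rows_lookup<T: Copy>(src: &[T], dst: &mut [T]) {
    for (a, a_rev) in REVERSED.into_iter().enumerate() {
        for c in 0..BLOCK_WIDTH {
            dst[(a_rev << LOG_BLOCK_WIDTH) | c] = src[(a << LOG_BLOCK_WIDTH) | c];
        }
    }
}

fn main() {
    let n = BLOCK_WIDTH * BLOCK_WIDTH;
    let src: Vec<u32> = (0..n as u32).collect();
    let mut computed = vec![0u32; n];
    let mut looked_up = vec![0u32; n];
    permute_rows_computed(&src, &mut computed);
    permute_rows_lookup(&src, &mut looked_up);
    assert_eq!(computed, looked_up);
    println!("both variants agree");
}
```

Because both loops visit the same `(a, a_rev)` pairs, swapping one for the other should not change results, which is consistent with the `cobra` test still passing. Whether the lookup is actually faster, and by how much, depends on the compiler and target, so independent benchmark runs are still the way to confirm the 25-30% figure.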