release: v0.2.3 — ship RoPE fix to crates.io (yank 0.2.2)#98
Merged
crates.io v0.2.2 was published 2026-05-20, before the RoPE pairing fix landed (PR #96, merged today). Cannot republish 0.2.2 (immutable on crates.io), so bumping to 0.2.3 to ship the fix. v0.2.2 will be yanked on crates.io post-publish to prevent new installs from getting the broken interleaved RoPE.

- Workspace version 0.2.2 → 0.2.3
- Internal path-dep minimum versions bumped to 0.2.3
- Release notes renamed v0.2.2.md → v0.2.3.md with yank notice
- GitHub tag v0.2.2 left in place for history

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Perf regression report (ADR-058)
| Bench | Δ point | 95% CI | new ns | base ns | verdict |
|---|---|---|---|---|---|
| simd_query_batch_dot_product/pair_loop/768d_256c | +9.85% | [+9.73%, +9.97%] | 22011.7 | 20037.2 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/768d_256c | +9.18% | [+9.06%, +9.30%] | 18560.3 | 16999.2 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c | +7.92% | [+7.53%, +8.34%] | 38869.6 | 36016.0 | ❌ FAIL |
| simd_batch_cosine_normalized_query/simd_batch/1024d_256c | +7.50% | [+7.15%, +7.81%] | 38907.4 | 36194.0 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/768d_16c | +5.97% | [+5.95%, +5.98%] | 661.7 | 624.5 | ⚠ WARN |
| int8_vs_float32_cosine/int8/384 | +5.27% | [+4.94%, +5.65%] | 17.2 | 16.3 | ⚠ WARN |
| int8_batch_cosine/int8_loop/1000 | +5.08% | [+4.87%, +5.28%] | 19070.7 | 18148.9 | ⚠ WARN |
| simd_query_batch_dot_product/simd_batch/128d_64c | +3.92% | [+3.88%, +3.97%] | 538.9 | 518.6 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_256c | -3.24% | [-3.36%, -3.13%] | 28436.5 | 29389.1 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c | -3.29% | [-3.64%, -2.94%] | 35245.0 | 36444.8 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c | -3.74% | [-3.78%, -3.68%] | 946.3 | 983.0 | 🚀 WIN |
| simd_dot_product/simd/1024 | -4.12% | [-4.19%, -4.04%] | 70.6 | 73.6 | 🚀 WIN |
| simd_throughput_384/dot_product | -4.75% | [-4.91%, -4.60%] | 31.5 | 33.1 | 🚀 WIN |
| simd_batch_cosine/simd_batch/1000 | -6.18% | [-6.26%, -6.10%] | 81374.1 | 86738.6 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c | -8.01% | [-8.10%, -7.92%] | 1385.7 | 1506.4 | 🚀 WIN |
| simd_batch_dot_product/simd_batch/1000 | -14.56% | [-14.61%, -14.50%] | 73353.0 | 85852.2 | 🚀 WIN |
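
The Δ point column appears to be the relative change of the new timing against the baseline. A minimal Rust sketch of that arithmetic, assuming the percentage is (new - base) / base taken from the displayed ns values (the report may compute it from unrounded estimates, so the last digit can differ); `pct_change` is a hypothetical helper, not part of the benchmark tooling:

```rust
/// Relative change of `new_ns` against `base_ns`, in percent.
/// Hypothetical helper for illustration only.
fn pct_change(new_ns: f64, base_ns: f64) -> f64 {
    (new_ns - base_ns) / base_ns * 100.0
}

fn main() {
    // Values from the simd_query_batch_dot_product/pair_loop/768d_256c row.
    let delta = pct_change(22011.7, 20037.2);
    println!("{delta:+.2}%"); // prints +9.85%
}
```

A positive value means the new build is slower than the baseline; a negative value means it is faster.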
All 247 measurements
| Bench | Δ point | CI-lower | CI-upper |
|---|---|---|---|
| add_bias_gelu/4096 | -0.01% | -0.02% | +0.00% |
| add_bias_gelu/896 | +0.00% | -0.01% | +0.02% |
| binary_cosine_distance/binary/1024 | +0.14% | +0.12% | +0.15% |
| binary_cosine_distance/binary/1536 | +0.48% | +0.46% | +0.50% |
| binary_cosine_distance/binary/384 | +0.33% | +0.31% | +0.35% |
| binary_cosine_distance/binary/768 | +0.15% | +0.13% | +0.16% |
| binary_cosine_distance/float32_simd/1024 | -0.05% | -0.08% | -0.02% |
| binary_cosine_distance/float32_simd/1536 | +0.03% | +0.02% | +0.04% |
| binary_cosine_distance/float32_simd/384 | +0.22% | +0.19% | +0.25% |
| binary_cosine_distance/float32_simd/768 | +0.15% | +0.13% | +0.17% |
| elementwise_mul/4096 | -2.51% | -2.55% | -2.47% |
| gelu/4096 | +0.00% | -0.02% | +0.02% |
| gelu/896 | +0.00% | -0.01% | +0.01% |
| int4_cosine_distance/float32_simd/1024 | +0.30% | +0.26% | +0.34% |
| int4_cosine_distance/float32_simd/1536 | +0.05% | +0.04% | +0.06% |
| int4_cosine_distance/float32_simd/384 | +0.16% | +0.13% | +0.19% |
| int4_cosine_distance/float32_simd/768 | +0.07% | +0.05% | +0.09% |
| int4_cosine_distance/int4/1024 | -0.04% | -0.06% | -0.01% |
| int4_cosine_distance/int4/1536 | -0.10% | -0.13% | -0.07% |
| int4_cosine_distance/int4/384 | +0.12% | +0.09% | +0.16% |
| int4_cosine_distance/int4/768 | +0.46% | +0.43% | +0.49% |
| int8_batch_cosine/float32_simd/10 | -0.06% | -0.07% | -0.05% |
| int8_batch_cosine/float32_simd/100 | +0.45% | +0.42% | +0.47% |
| int8_batch_cosine/float32_simd/1000 | -1.04% | -1.11% | -0.97% |
| int8_batch_cosine/int8_loop/10 | -0.02% | -0.05% | +0.02% |
| int8_batch_cosine/int8_loop/100 | +0.51% | +0.49% | +0.54% |
| int8_batch_cosine/int8_loop/1000 | +5.08% | +4.87% | +5.28% |
| int8_prepared_dot_product/per_call/1024 | -0.00% | -0.02% | +0.01% |
| int8_prepared_dot_product/per_call/127 | -0.05% | -0.06% | -0.04% |
| int8_prepared_dot_product/per_call/128 | -0.01% | -0.01% | +0.01% |
| int8_prepared_dot_product/per_call/129 | -0.02% | -0.03% | -0.01% |
| int8_prepared_dot_product/per_call/384 | -0.01% | -0.04% | +0.01% |
| int8_prepared_dot_product/per_call/768 | +0.01% | -0.00% | +0.02% |
| int8_prepared_dot_product/prepared/1024 | -0.78% | -0.85% | -0.71% |
| int8_prepared_dot_product/prepared/127 | -0.40% | -0.43% | -0.38% |
| int8_prepared_dot_product/prepared/128 | +1.19% | +0.86% | +1.50% |
| int8_prepared_dot_product/prepared/129 | -0.32% | -0.35% | -0.29% |
| int8_prepared_dot_product/prepared/384 | -2.64% | -2.69% | -2.59% |
| int8_prepared_dot_product/prepared/768 | +0.04% | -0.02% | +0.08% |
| int8_quantization/quantize/1024 | -0.00% | -0.02% | +0.01% |
| int8_quantization/quantize/1536 | +0.11% | +0.09% | +0.12% |
| int8_quantization/quantize/384 | +0.00% | -0.00% | +0.01% |
| int8_quantization/quantize/768 | -0.01% | -0.03% | -0.00% |
| int8_raw_dot_product/dot_product_i8/1024 | +0.59% | +0.51% | +0.68% |
| int8_raw_dot_product/dot_product_i8/127 | -0.17% | -0.20% | -0.14% |
| int8_raw_dot_product/dot_product_i8/128 | +1.61% | +1.54% | +1.68% |
| int8_raw_dot_product/dot_product_i8/129 | +0.36% | +0.35% | +0.38% |
| int8_raw_dot_product/dot_product_i8/384 | -1.19% | -1.29% | -1.10% |
| int8_raw_dot_product/dot_product_i8/768 | -1.17% | -1.22% | -1.13% |
| int8_raw_dot_product/dot_product_i8_raw/1024 | +0.01% | -0.03% | +0.04% |
| int8_raw_dot_product/dot_product_i8_raw/127 | -0.82% | -0.84% | -0.80% |
| int8_raw_dot_product/dot_product_i8_raw/128 | +0.03% | -0.04% | +0.10% |
| int8_raw_dot_product/dot_product_i8_raw/129 | +0.77% | +0.71% | +0.83% |
| int8_raw_dot_product/dot_product_i8_raw/384 | -0.33% | -0.42% | -0.25% |
| int8_raw_dot_product/dot_product_i8_raw/768 | -0.40% | -0.42% | -0.38% |
| int8_vs_float32_cosine/float32_simd/1024 | +0.21% | +0.19% | +0.22% |
| int8_vs_float32_cosine/float32_simd/1536 | +0.02% | +0.00% | +0.04% |
| int8_vs_float32_cosine/float32_simd/384 | -0.19% | -0.31% | -0.08% |
| int8_vs_float32_cosine/float32_simd/768 | +0.09% | +0.04% | +0.14% |
| int8_vs_float32_cosine/int8/1024 | -0.89% | -0.93% | -0.85% |
| int8_vs_float32_cosine/int8/1536 | -0.23% | -0.34% | -0.14% |
| int8_vs_float32_cosine/int8/384 | +5.27% | +4.94% | +5.65% |
| int8_vs_float32_cosine/int8/768 | -0.42% | -0.51% | -0.34% |
| layer_norm/4096 | +0.11% | +0.08% | +0.13% |
| layer_norm/896 | -0.06% | -0.10% | -0.03% |
| memory_size/search_1000_float32 | +0.46% | +0.43% | +0.49% |
| memory_size/search_1000_int8 | +1.13% | +1.05% | +1.21% |
| rms_norm/4096 | +1.23% | +1.18% | +1.27% |
| rms_norm/896 | -0.11% | -0.14% | -0.08% |
| silu_inplace/4096 | +0.01% | -0.01% | +0.02% |
| silu_inplace/896 | +0.00% | -0.02% | +0.02% |
| simd_batch_cosine/scalar_loop/10 | -0.05% | -0.07% | -0.04% |
| simd_batch_cosine/scalar_loop/100 | +0.12% | +0.09% | +0.14% |
| simd_batch_cosine/scalar_loop/1000 | -0.27% | -0.30% | -0.24% |
| simd_batch_cosine/simd_batch/10 | +0.26% | +0.25% | +0.27% |
| simd_batch_cosine/simd_batch/100 | +0.73% | +0.71% | +0.75% |
| simd_batch_cosine/simd_batch/1000 | -6.18% | -6.26% | -6.10% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c | +0.20% | +0.17% | +0.24% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c | +1.04% | +1.03% | +1.05% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c | +0.82% | +0.43% | +1.17% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c | +0.12% | +0.11% | +0.13% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c | +0.58% | +0.57% | +0.59% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c | +0.46% | +0.42% | +0.49% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_16c | -0.07% | -0.09% | -0.06% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_256c | +0.13% | +0.11% | +0.14% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_4c | +0.29% | +0.25% | +0.35% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_64c | +0.36% | +0.35% | +0.37% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c | -0.68% | -0.90% | -0.35% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_16c | +0.30% | +0.29% | +0.31% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_256c | -2.92% | -3.02% | -2.81% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_4c | +0.19% | +0.18% | +0.19% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_64c | +0.66% | +0.65% | +0.67% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c | -0.17% | -0.23% | -0.12% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c | +1.04% | +1.02% | +1.05% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c | -3.29% | -3.64% | -2.94% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c | +0.12% | +0.11% | +0.14% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c | +0.41% | +0.40% | +0.43% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c | -0.14% | -0.20% | -0.07% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_16c | +0.03% | +0.02% | +0.04% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_256c | -0.02% | -0.04% | -0.01% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_4c | +0.10% | +0.09% | +0.12% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_64c | +0.38% | +0.37% | +0.39% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c | -1.04% | -1.09% | -0.99% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_16c | +0.49% | +0.48% | +0.50% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_256c | -3.24% | -3.36% | -3.13% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_4c | +0.05% | +0.04% | +0.05% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_64c | +0.49% | +0.48% | +0.51% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c | -1.17% | -1.31% | -0.96% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c | -0.77% | -0.79% | -0.76% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c | +7.92% | +7.53% | +8.34% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c | -0.02% | -0.03% | -0.01% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c | +0.27% | +0.26% | +0.29% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c | +0.75% | +0.71% | +0.80% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c | -0.03% | -0.04% | -0.01% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c | +0.11% | +0.09% | +0.13% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c | +0.01% | -0.01% | +0.02% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c | +0.80% | +0.79% | +0.81% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c | +2.15% | +2.12% | +2.19% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c | -0.74% | -0.75% | -0.72% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c | -0.61% | -0.70% | -0.52% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c | +0.03% | +0.02% | +0.05% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c | +0.54% | +0.53% | +0.55% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c | +1.53% | +1.47% | +1.58% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c | -8.01% | -8.10% | -7.92% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c | +1.86% | +1.38% | +2.33% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c | +2.66% | +2.56% | +2.74% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c | +0.27% | +0.25% | +0.29% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c | +1.62% | +1.58% | +1.66% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c | -0.47% | -0.52% | -0.41% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c | +0.71% | +0.68% | +0.75% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c | +0.98% | +0.90% | +1.06% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c | +1.10% | +1.09% | +1.12% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c | +3.45% | +2.99% | +4.06% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c | -3.74% | -3.78% | -3.68% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c | -2.43% | -2.51% | -2.35% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c | +0.83% | +0.79% | +0.88% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c | -0.73% | -0.85% | -0.62% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_1000c | -1.27% | -1.31% | -1.24% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_16c | -0.73% | -0.74% | -0.71% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_256c | +7.50% | +7.15% | +7.81% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_4c | +0.04% | +0.03% | +0.05% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_64c | +0.17% | +0.15% | +0.18% |
| simd_batch_cosine_normalized_query/simd_batch/384d_1000c | +0.57% | +0.54% | +0.60% |
| simd_batch_cosine_normalized_query/simd_batch/384d_16c | -0.34% | -0.36% | -0.33% |
| simd_batch_cosine_normalized_query/simd_batch/384d_256c | -0.06% | -0.08% | -0.04% |
| simd_batch_cosine_normalized_query/simd_batch/384d_4c | -0.12% | -0.13% | -0.11% |
| simd_batch_cosine_normalized_query/simd_batch/384d_64c | +0.63% | +0.62% | +0.64% |
| simd_batch_cosine_normalized_query/simd_batch/768d_1000c | +1.84% | +1.79% | +1.88% |
| simd_batch_cosine_normalized_query/simd_batch/768d_16c | -0.38% | -0.39% | -0.37% |
| simd_batch_cosine_normalized_query/simd_batch/768d_256c | +1.90% | +1.73% | +2.10% |
| simd_batch_cosine_normalized_query/simd_batch/768d_4c | -0.12% | -0.13% | -0.10% |
| simd_batch_cosine_normalized_query/simd_batch/768d_64c | +0.53% | +0.52% | +0.54% |
| simd_batch_dot_product/scalar_loop/10 | +0.05% | +0.05% | +0.06% |
| simd_batch_dot_product/scalar_loop/100 | -0.26% | -0.32% | -0.21% |
| simd_batch_dot_product/scalar_loop/1000 | -0.75% | -0.82% | -0.68% |
| simd_batch_dot_product/simd_batch/10 | -0.23% | -0.27% | -0.18% |
| simd_batch_dot_product/simd_batch/100 | -0.44% | -0.45% | -0.42% |
| simd_batch_dot_product/simd_batch/1000 | -14.56% | -14.61% | -14.50% |
| simd_cosine_similarity/scalar/1024 | -0.00% | -0.03% | +0.03% |
| simd_cosine_similarity/scalar/1536 | +0.04% | +0.03% | +0.05% |
| simd_cosine_similarity/scalar/384 | -0.26% | -0.32% | -0.21% |
| simd_cosine_similarity/scalar/768 | -0.08% | -0.10% | -0.06% |
| simd_cosine_similarity/simd/1024 | -0.31% | -0.33% | -0.29% |
| simd_cosine_similarity/simd/1536 | -0.11% | -0.13% | -0.09% |
| simd_cosine_similarity/simd/384 | +1.35% | +1.19% | +1.52% |
| simd_cosine_similarity/simd/768 | +0.02% | -0.01% | +0.06% |
| simd_dot_product/scalar/1024 | +0.01% | -0.03% | +0.06% |
| simd_dot_product/scalar/1536 | -0.00% | -0.01% | +0.01% |
| simd_dot_product/scalar/384 | -0.00% | -0.02% | +0.01% |
| simd_dot_product/scalar/768 | +0.02% | +0.00% | +0.03% |
| simd_dot_product/simd/1024 | -4.12% | -4.19% | -4.04% |
| simd_dot_product/simd/1536 | -0.11% | -0.17% | -0.04% |
| simd_dot_product/simd/384 | +0.00% | -0.05% | +0.05% |
| simd_dot_product/simd/768 | -0.32% | -0.39% | -0.25% |
| simd_euclidean_distance/scalar/1024 | +0.02% | -0.01% | +0.04% |
| simd_euclidean_distance/scalar/1536 | -0.01% | -0.03% | +0.00% |
| simd_euclidean_distance/scalar/384 | -0.25% | -0.29% | -0.21% |
| simd_euclidean_distance/scalar/768 | +0.01% | -0.04% | +0.06% |
| simd_euclidean_distance/simd/1024 | -0.17% | -0.20% | -0.14% |
| simd_euclidean_distance/simd/1536 | +0.26% | +0.25% | +0.27% |
| simd_euclidean_distance/simd/384 | +0.89% | +0.83% | +0.93% |
| simd_euclidean_distance/simd/768 | +0.72% | +0.64% | +0.77% |
| simd_normalize/scalar/1024 | -0.34% | -0.54% | -0.15% |
| simd_normalize/scalar/1536 | -0.36% | -0.54% | -0.16% |
| simd_normalize/scalar/384 | -0.17% | -0.56% | +0.21% |
| simd_normalize/scalar/768 | -0.48% | -0.69% | -0.27% |
| simd_normalize/simd/1024 | +0.21% | -0.70% | +1.15% |
| simd_normalize/simd/1536 | +1.44% | +0.60% | +2.25% |
| simd_normalize/simd/384 | -0.19% | -1.60% | +1.31% |
| simd_normalize/simd/768 | +0.94% | -0.26% | +2.13% |
| simd_normalized_cosine_fast_path/cosine_full/1024 | +0.52% | +0.48% | +0.55% |
| simd_normalized_cosine_fast_path/cosine_full/384 | +0.32% | +0.20% | +0.44% |
| simd_normalized_cosine_fast_path/cosine_full/768 | +0.13% | +0.07% | +0.19% |
| simd_normalized_cosine_fast_path/dot_product/1024 | -1.76% | -1.91% | -1.61% |
| simd_normalized_cosine_fast_path/dot_product/384 | -1.94% | -2.07% | -1.78% |
| simd_normalized_cosine_fast_path/dot_product/768 | -1.92% | -2.00% | -1.84% |
| simd_prepared_query_normalized_cosine/dot_product_loop/1024 | +2.08% | +1.97% | +2.17% |
| simd_prepared_query_normalized_cosine/dot_product_loop/384 | +1.13% | +1.05% | +1.21% |
| simd_prepared_query_normalized_cosine/dot_product_loop/768 | +2.53% | +2.44% | +2.61% |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 | +0.16% | +0.11% | +0.22% |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/384 | +0.74% | +0.67% | +0.80% |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/768 | +0.10% | +0.05% | +0.16% |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 | +1.72% | +1.63% | +1.83% |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/384 | +0.86% | +0.67% | +1.04% |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/768 | +1.43% | +1.34% | +1.51% |
| simd_query_batch_dot_product/pair_loop/128d_16c | -0.51% | -0.55% | -0.47% |
| simd_query_batch_dot_product/pair_loop/128d_256c | +0.23% | +0.20% | +0.25% |
| simd_query_batch_dot_product/pair_loop/128d_4c | -0.26% | -0.34% | -0.18% |
| simd_query_batch_dot_product/pair_loop/128d_64c | +1.60% | +1.53% | +1.67% |
| simd_query_batch_dot_product/pair_loop/384d_16c | -0.68% | -0.75% | -0.61% |
| simd_query_batch_dot_product/pair_loop/384d_256c | +0.34% | +0.32% | +0.37% |
| simd_query_batch_dot_product/pair_loop/384d_4c | +1.78% | +1.65% | +1.91% |
| simd_query_batch_dot_product/pair_loop/384d_64c | +1.99% | +1.98% | +2.01% |
| simd_query_batch_dot_product/pair_loop/768d_16c | -1.39% | -1.42% | -1.36% |
| simd_query_batch_dot_product/pair_loop/768d_256c | +9.85% | +9.73% | +9.97% |
| simd_query_batch_dot_product/pair_loop/768d_4c | +0.44% | +0.41% | +0.48% |
| simd_query_batch_dot_product/pair_loop/768d_64c | -0.66% | -0.79% | -0.54% |
| simd_query_batch_dot_product/simd_batch/128d_16c | -0.30% | -0.34% | -0.26% |
| simd_query_batch_dot_product/simd_batch/128d_256c | -0.77% | -0.86% | -0.68% |
| simd_query_batch_dot_product/simd_batch/128d_4c | +0.02% | -0.02% | +0.06% |
| simd_query_batch_dot_product/simd_batch/128d_64c | +3.92% | +3.88% | +3.97% |
| simd_query_batch_dot_product/simd_batch/384d_16c | +0.50% | +0.48% | +0.52% |
| simd_query_batch_dot_product/simd_batch/384d_256c | +0.69% | +0.61% | +0.77% |
| simd_query_batch_dot_product/simd_batch/384d_4c | +0.05% | +0.02% | +0.08% |
| simd_query_batch_dot_product/simd_batch/384d_64c | +2.25% | +2.16% | +2.35% |
| simd_query_batch_dot_product/simd_batch/768d_16c | +5.97% | +5.95% | +5.98% |
| simd_query_batch_dot_product/simd_batch/768d_256c | +9.18% | +9.06% | +9.30% |
| simd_query_batch_dot_product/simd_batch/768d_4c | +0.09% | +0.07% | +0.11% |
| simd_query_batch_dot_product/simd_batch/768d_64c | +1.58% | +1.51% | +1.65% |
| simd_squared_euclidean_fast_path/euclidean_full/1024 | +0.06% | +0.03% | +0.08% |
| simd_squared_euclidean_fast_path/euclidean_full/384 | +0.82% | +0.77% | +0.87% |
| simd_squared_euclidean_fast_path/euclidean_full/768 | +0.29% | +0.26% | +0.31% |
| simd_squared_euclidean_fast_path/squared_euclidean/1024 | +0.23% | +0.20% | +0.25% |
| simd_squared_euclidean_fast_path/squared_euclidean/384 | -0.17% | -0.22% | -0.13% |
| simd_squared_euclidean_fast_path/squared_euclidean/768 | +0.05% | +0.03% | +0.08% |
| simd_throughput_384/cosine_similarity | +0.27% | +0.19% | +0.35% |
| simd_throughput_384/dot_product | -4.75% | -4.91% | -4.60% |
| simd_throughput_384/euclidean_distance | +1.28% | +1.25% | +1.31% |
| simd_throughput_384/normalize | -1.31% | -1.32% | -1.29% |
| softmax_attention/128 | -0.06% | -0.08% | -0.05% |
| softmax_attention/512 | +0.85% | +0.73% | +0.99% |
| tier_prepared_query/binary_query_once_1000 | -0.09% | -0.11% | -0.06% |
| tier_prepared_query/binary_query_per_call_1000 | +0.00% | -0.01% | +0.01% |
| tier_prepared_query/int4_query_once_1000 | -0.11% | -0.14% | -0.08% |
| tier_prepared_query/int4_query_per_call_1000 | -0.13% | -0.14% | -0.12% |
| tier_prepared_query/int8_query_once_1000 | +0.19% | +0.16% | +0.21% |
| tier_prepared_query/int8_query_per_call_1000 | +0.04% | +0.03% | +0.06% |
Rule: CI-lower of change ≤3.0% passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override via PR label bench-allow-regression.
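
The rule above can be sketched as a small classifier. This is an illustration only, assuming the thresholds apply to the lower bound of the 95% CI expressed in percent; `Verdict` and `classify` are hypothetical names, and neither the `bench-allow-regression` override nor the 🚀 WIN verdict is modeled, because the rule does not spell out how they are decided:

```rust
/// Verdicts named by the rule (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Pass,
    Warn,
    Fail,
}

const WARN_PCT: f64 = 3.0; // CI-lower at or below this passes silently
const FAIL_PCT: f64 = 7.0; // CI-lower above this fails

/// Classify a benchmark by the lower bound of its 95% CI, in percent.
fn classify(ci_lower_pct: f64) -> Verdict {
    if ci_lower_pct <= WARN_PCT {
        Verdict::Pass
    } else if ci_lower_pct <= FAIL_PCT {
        Verdict::Warn
    } else {
        Verdict::Fail
    }
}

fn main() {
    // CI-lower values taken from rows of the report.
    println!("{:?}", classify(9.73)); // pair_loop/768d_256c
    println!("{:?}", classify(5.95)); // simd_batch/768d_16c
    println!("{:?}", classify(2.99)); // pair_loop_dot/768d_1000c
}
```

Keying on the CI lower bound rather than the point estimate means a benchmark such as `pair_loop_dot/768d_1000c` (point +3.45%, CI-lower +2.99%) passes silently, since the regression is not confirmed above 3.0%.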
x86_64-linux — perf regression report
❌ 64 FAIL (regression >7.0% confirmed by 95% CI)
⚠ 8 WARN (regression 3.0-7.0% confirmed)
🚀 141 confirmed improvements
| Bench | Δ point | 95% CI | new ns | base ns | verdict |
|---|---|---|---|---|---|
| int8_raw_dot_product/dot_product_i8/768 | +43.67% | [+43.41%, +43.94%] | 30.2 | 21.0 | ❌ FAIL |
| simd_throughput_384/cosine_similarity | +43.30% | [+42.68%, +43.84%] | 44.6 | 31.1 | ❌ FAIL |
| int8_vs_float32_cosine/int8/768 | +39.18% | [+38.44%, +39.72%] | 33.5 | 24.0 | ❌ FAIL |
| simd_throughput_384/euclidean_distance | +38.77% | [+38.27%, +39.30%] | 35.3 | 25.4 | ❌ FAIL |
| int8_prepared_dot_product/prepared/768 | +37.41% | [+36.57%, +38.05%] | 29.7 | 21.6 | ❌ FAIL |
| simd_query_batch_dot_product/pair_loop/384d_16c | +35.09% | [+34.75%, +35.41%] | 480.1 | 355.4 | ❌ FAIL |
| simd_query_batch_dot_product/pair_loop/384d_4c | +34.70% | [+34.50%, +35.02%] | 128.8 | 95.6 | ❌ FAIL |
| simd_throughput_384/dot_product | +32.84% | [+32.53%, +33.17%] | 28.7 | 21.6 | ❌ FAIL |
| int8_raw_dot_product/dot_product_i8_raw/768 | +29.99% | [+29.42%, +30.38%] | 24.4 | 18.8 | ❌ FAIL |
| int4_cosine_distance/float32_simd/1024 | +27.24% | [+26.70%, +27.82%] | 87.3 | 68.6 | ❌ FAIL |
| layer_norm/4096 | +26.77% | [+26.09%, +27.25%] | 874.0 | 689.5 | ❌ FAIL |
| simd_normalized_cosine_fast_path/cosine_full/384 | +26.57% | [+25.75%, +27.29%] | 44.2 | 34.9 | ❌ FAIL |
| simd_batch_cosine/simd_batch/10 | +25.92% | [+25.30%, +26.51%] | 429.1 | 340.7 | ❌ FAIL |
| binary_cosine_distance/float32_simd/384 | +25.83% | [+25.01%, +26.56%] | 46.4 | 36.9 | ❌ FAIL |
| int4_cosine_distance/float32_simd/1536 | +24.81% | [+24.29%, +25.34%] | 119.8 | 96.0 | ❌ FAIL |
| simd_euclidean_distance/simd/1024 | +24.73% | [+24.24%, +25.22%] | 79.6 | 63.8 | ❌ FAIL |
| simd_squared_euclidean_fast_path/euclidean_full/384 | +24.63% | [+24.03%, +25.22%] | 35.2 | 28.3 | ❌ FAIL |
| int8_raw_dot_product/dot_product_i8_raw/1024 | +23.90% | [+23.50%, +24.31%] | 30.8 | 24.9 | ❌ FAIL |
| simd_squared_euclidean_fast_path/euclidean_full/1024 | +23.53% | [+23.05%, +24.04%] | 79.7 | 64.6 | ❌ FAIL |
| int8_batch_cosine/float32_simd/10 | +23.85% | [+22.99%, +24.59%] | 438.4 | 353.9 | ❌ FAIL |
| int8_vs_float32_cosine/int8/384 | +23.28% | [+22.87%, +23.69%] | 18.2 | 14.8 | ❌ FAIL |
| int8_batch_cosine/int8_loop/1000 | +23.33% | [+22.46%, +24.20%] | 20409.0 | 16548.2 | ❌ FAIL |
| int8_vs_float32_cosine/float32_simd/1536 | +22.95% | [+22.21%, +23.64%] | 118.6 | 96.5 | ❌ FAIL |
| simd_batch_cosine_normalized_query/simd_batch/384d_16c | +22.62% | [+21.87%, +23.29%] | 676.8 | 552.0 | ❌ FAIL |
| int8_prepared_dot_product/prepared/1024 | +22.08% | [+21.84%, +22.34%] | 34.7 | 28.5 | ❌ FAIL |
| int4_cosine_distance/float32_simd/384 | +21.29% | [+20.76%, +21.83%] | 46.6 | 38.4 | ❌ FAIL |
| simd_normalized_cosine_fast_path/cosine_full/1024 | +21.26% | [+20.76%, +21.76%] | 82.0 | 67.6 | ❌ FAIL |
| int8_vs_float32_cosine/int8/1024 | +21.12% | [+20.45%, +21.59%] | 38.1 | 31.4 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/384d_16c | +20.40% | [+19.77%, +20.94%] | 261.3 | 217.0 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/384d_4c | +19.54% | [+19.39%, +19.68%] | 75.6 | 63.3 | ❌ FAIL |
| int8_vs_float32_cosine/int8/1536 | +19.46% | [+18.56%, +20.14%] | 49.5 | 41.5 | ❌ FAIL |
| binary_cosine_distance/binary/384 | +18.31% | [+18.00%, +18.58%] | 49.6 | 41.9 | ❌ FAIL |
| simd_cosine_similarity/simd/1536 | +18.55% | [+17.79%, +19.24%] | 112.4 | 94.8 | ❌ FAIL |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_16c | +17.96% | [+17.44%, +18.49%] | 659.1 | 558.7 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c | +16.79% | [+16.26%, +17.35%] | 695.5 | 595.5 | ❌ FAIL |
| simd_euclidean_distance/simd/768 | +16.38% | [+15.83%, +16.92%] | 54.6 | 46.9 | ❌ FAIL |
| simd_dot_product/simd/1536 | +16.08% | [+15.73%, +16.35%] | 94.4 | 81.3 | ❌ FAIL |
| int8_batch_cosine/int8_loop/10 | +16.00% | [+15.64%, +16.40%] | 177.7 | 153.2 | ❌ FAIL |
| memory_size/search_1000_int8 | +15.95% | [+15.26%, +16.48%] | 16717.3 | 14417.7 | ❌ FAIL |
| simd_squared_euclidean_fast_path/squared_euclidean/1024 | +15.44% | [+14.96%, +15.93%] | 75.0 | 65.0 | ❌ FAIL |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_16c | +15.13% | [+14.69%, +15.59%] | 680.9 | 591.4 | ❌ FAIL |
| simd_dot_product/simd/1024 | +14.40% | [+14.08%, +14.74%] | 64.1 | 56.0 | ❌ FAIL |
| int8_raw_dot_product/dot_product_i8_raw/384 | +14.40% | [+13.84%, +14.97%] | 13.0 | 11.3 | ❌ FAIL |
| int8_vs_float32_cosine/float32_simd/384 | +14.61% | [+13.77%, +15.39%] | 44.9 | 39.1 | ❌ FAIL |
| int8_raw_dot_product/dot_product_i8/384 | +13.58% | [+13.14%, +14.01%] | 15.4 | 13.6 | ❌ FAIL |
| simd_batch_cosine_normalized_query/simd_batch/384d_4c | +13.42% | [+12.98%, +13.86%] | 176.8 | 155.9 | ❌ FAIL |
| simd_normalized_cosine_fast_path/dot_product/384 | +12.97% | [+12.78%, +13.11%] | 28.6 | 25.3 | ❌ FAIL |
| int8_batch_cosine/int8_loop/100 | +13.09% | [+12.48%, +13.56%] | 1799.9 | 1591.6 | ❌ FAIL |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_4c | +12.20% | [+11.83%, +12.58%] | 169.7 | 151.3 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c | +12.33% | [+11.70%, +12.98%] | 183.6 | 163.4 | ❌ FAIL |
| int8_prepared_dot_product/prepared/384 | +11.93% | [+11.38%, +12.66%] | 15.8 | 14.1 | ❌ FAIL |
| tier_prepared_query/int8_query_once_1000 | +11.46% | [+10.67%, +11.98%] | 18584.2 | 16674.0 | ❌ FAIL |
| tier_prepared_query/binary_query_once_1000 | +10.82% | [+10.53%, +11.19%] | 48538.3 | 43801.2 | ❌ FAIL |
| binary_cosine_distance/binary/768 | +10.66% | [+10.40%, +10.98%] | 87.1 | 78.7 | ❌ FAIL |
| int8_raw_dot_product/dot_product_i8/1024 | +10.69% | [+10.15%, +11.05%] | 34.7 | 31.3 | ❌ FAIL |
| simd_squared_euclidean_fast_path/euclidean_full/768 | +10.21% | [+9.80%, +10.63%] | 54.5 | 49.5 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/768d_4c | +10.12% | [+9.40%, +10.69%] | 131.7 | 119.6 | ❌ FAIL |
| simd_normalized_cosine_fast_path/dot_product/1024 | +9.60% | [+9.32%, +9.91%] | 62.7 | 57.2 | ❌ FAIL |
| int8_raw_dot_product/dot_product_i8_raw/128 | +9.26% | [+8.56%, +9.79%] | 7.2 | 6.6 | ❌ FAIL |
| binary_cosine_distance/binary/1024 | +8.52% | [+8.23%, +8.90%] | 111.9 | 103.1 | ❌ FAIL |
| simd_euclidean_distance/simd/384 | +8.42% | [+7.83%, +8.93%] | 35.3 | 32.6 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/128d_64c | +8.11% | [+7.51%, +8.88%] | 501.6 | 464.0 | ❌ FAIL |
| simd_cosine_similarity/simd/384 | +7.71% | [+7.14%, +8.31%] | 45.1 | 41.8 | ❌ FAIL |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_4c | +7.55% | [+7.12%, +7.96%] | 179.2 | 166.6 | ❌ FAIL |
| binary_cosine_distance/binary/1536 | +6.53% | [+6.27%, +6.74%] | 161.8 | 151.9 | ⚠ WARN |
| int8_prepared_dot_product/prepared/129 | +6.64% | [+6.10%, +7.03%] | 11.0 | 10.3 | ⚠ WARN |
| simd_squared_euclidean_fast_path/squared_euclidean/384 | +6.44% | [+5.95%, +6.90%] | 30.3 | 28.5 | ⚠ WARN |
| simd_query_batch_dot_product/simd_batch/128d_4c | +6.38% | [+5.95%, +6.78%] | 42.2 | 39.6 | ⚠ WARN |
| int8_raw_dot_product/dot_product_i8/128 | +5.06% | [+4.83%, +5.31%] | 9.8 | 9.3 | ⚠ WARN |
| int8_raw_dot_product/dot_product_i8/129 | +5.52% | [+4.80%, +6.19%] | 10.4 | 9.8 | ⚠ WARN |
| int8_raw_dot_product/dot_product_i8_raw/129 | +4.45% | [+4.14%, +4.74%] | 7.6 | 7.2 | ⚠ WARN |
| int8_prepared_dot_product/prepared/128 | +4.90% | [+4.07%, +5.53%] | 10.0 | 9.5 | ⚠ WARN |
| int8_vs_float32_cosine/float32_simd/768 | -3.27% | [-3.71%, -2.80%] | 67.2 | 69.4 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_64c | -3.51% | [-3.89%, -3.10%] | 2630.8 | 2726.3 | 🚀 WIN |
| simd_query_batch_dot_product/simd_batch/384d_256c | -3.84% | [-4.06%, -3.55%] | 5376.3 | 5590.7 | 🚀 WIN |
| simd_cosine_similarity/simd/1024 | -4.44% | [-4.77%, -4.11%] | 82.1 | 85.9 | 🚀 WIN |
| silu_inplace/896 | -4.19% | [-4.77%, -3.53%] | 2711.4 | 2829.8 | 🚀 WIN |
| tier_prepared_query/binary_query_per_call_1000 | -4.58% | [-4.80%, -4.37%] | 883183.9 | 925593.6 | 🚀 WIN |
| int8_vs_float32_cosine/float32_simd/1024 | -5.49% | [-5.86%, -5.10%] | 82.6 | 87.4 | 🚀 WIN |
| add_bias_gelu/896 | -6.20% | [-6.59%, -5.82%] | 378.0 | 403.0 | 🚀 WIN |
| binary_cosine_distance/float32_simd/1536 | -6.25% | [-6.68%, -5.87%] | 119.9 | 127.9 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_4c | -6.70% | [-7.08%, -6.32%] | 262.3 | 281.1 | 🚀 WIN |
| silu_inplace/4096 | -5.64% | [-7.25%, -4.31%] | 12444.1 | 13188.4 | 🚀 WIN |
| simd_query_batch_dot_product/simd_batch/768d_16c | -7.97% | [-8.29%, -7.62%] | 554.0 | 602.0 | 🚀 WIN |
| simd_dot_product/simd/384 | -8.36% | [-8.50%, -8.21%] | 28.7 | 31.3 | 🚀 WIN |
| simd_throughput_384/normalize | -8.57% | [-9.00%, -8.20%] | 106.0 | 115.9 | 🚀 WIN |
| int8_batch_cosine/float32_simd/100 | -8.89% | [-9.23%, -8.55%] | 4387.7 | 4815.8 | 🚀 WIN |
| simd_normalize/simd/384 | -6.12% | [-9.55%, -2.61%] | 70.9 | 75.6 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_4c | -9.36% | [-9.62%, -9.09%] | 270.2 | 298.0 | 🚀 WIN |
| simd_query_batch_dot_product/pair_loop/768d_4c | -9.88% | [-9.95%, -9.81%] | 195.4 | 216.8 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c | -10.35% | [-10.86%, -9.84%] | 333.2 | 371.7 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c | -10.63% | [-10.96%, -10.27%] | 324.7 | 363.3 | 🚀 WIN |
| int8_prepared_dot_product/prepared/127 | -10.71% | [-11.00%, -10.52%] | 17.5 | 19.6 | 🚀 WIN |
| simd_dot_product/simd/768 | -11.02% | [-11.18%, -10.82%] | 49.5 | 55.6 | 🚀 WIN |
| int8_raw_dot_product/dot_product_i8/127 | -11.45% | [-11.92%, -11.01%] | 17.2 | 19.4 | 🚀 WIN |
| simd_dot_product/scalar/384 | -12.08% | [-12.20%, -11.96%] | 337.9 | 384.3 | 🚀 WIN |
| simd_batch_cosine/simd_batch/100 | -12.07% | [-12.67%, -11.55%] | 4371.2 | 4971.3 | 🚀 WIN |
| simd_normalized_cosine_fast_path/dot_product/768 | -12.97% | [-13.17%, -12.75%] | 49.4 | 56.8 | 🚀 WIN |
| simd_query_batch_dot_product/simd_batch/128d_256c | -12.89% | [-13.47%, -12.10%] | 2038.6 | 2340.3 | 🚀 WIN |
| simd_batch_cosine_normalized_query/simd_batch/1024d_4c | -13.28% | [-13.53%, -13.00%] | 323.7 | 373.3 | 🚀 WIN |
| simd_batch_cosine_normalized_query/simd_batch/768d_16c | -13.22% | [-13.53%, -12.90%] | 1076.9 | 1240.9 | 🚀 WIN |
| simd_batch_cosine_normalized_query/simd_batch/768d_256c | -13.40% | [-13.69%, -13.13%] | 18127.9 | 20932.2 | 🚀 WIN |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/384 | -12.75% | [-13.79%, -11.91%] | 45318.5 | 51938.8 | 🚀 WIN |
| simd_cosine_similarity/scalar/384 | -13.83% | [-14.10%, -13.54%] | 996.7 | 1156.8 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c | -13.92% | [-14.31%, -13.54%] | 1089.8 | 1266.1 | 🚀 WIN |
| simd_batch_dot_product/scalar_loop/100 | -14.25% | [-14.41%, -14.07%] | 33063.9 | 38560.1 | 🚀 WIN |
| memory_size/search_1000_float32 | -14.21% | [-14.51%, -13.90%] | 43508.7 | 50716.0 | 🚀 WIN |
| simd_batch_cosine/scalar_loop/10 | -14.47% | [-14.57%, -14.39%] | 9882.7 | 11554.3 | 🚀 WIN |
| simd_batch_dot_product/scalar_loop/10 | -14.54% | [-14.59%, -14.48%] | 3293.3 | 3853.6 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c | -14.32% | [-14.62%, -14.05%] | 42002.8 | 49025.1 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c | -14.52% | [-14.84%, -14.22%] | 332.3 | 388.8 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c | -14.92% | [-15.22%, -14.57%] | 18178.7 | 21365.4 | 🚀 WIN |
| simd_euclidean_distance/scalar/384 | -15.10% | [-15.32%, -14.86%] | 344.4 | 405.6 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c | -15.28% | [-15.55%, -15.02%] | 43635.3 | 51507.5 | 🚀 WIN |
| simd_batch_cosine/scalar_loop/1000 | -15.53% | [-15.67%, -15.44%] | 990843.3 | 1173058.3 | 🚀 WIN |
| simd_dot_product/scalar/768 | -15.59% | [-15.77%, -15.32%] | 697.6 | 826.4 | 🚀 WIN |
| simd_euclidean_distance/simd/1536 | -15.29% | [-15.80%, -14.82%] | 95.6 | 112.9 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c | -15.49% | [-15.83%, -15.17%] | 44257.8 | 52370.4 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_256c | -15.56% | [-15.97%, -15.09%] | 17404.6 | 20612.3 | 🚀 WIN |
| simd_batch_cosine/scalar_loop/100 | -15.25% | [-16.11%, -14.56%] | 98761.0 | 116538.8 | 🚀 WIN |
| simd_batch_cosine_normalized_query/simd_batch/768d_64c | -16.22% | [-16.51%, -15.92%] | 4276.5 | 5104.5 | 🚀 WIN |
| simd_cosine_similarity/scalar/768 | -16.34% | [-16.51%, -16.24%] | 2078.7 | 2484.8 | 🚀 WIN |
| simd_query_batch_dot_product/pair_loop/128d_256c | -16.41% | [-16.59%, -16.18%] | 3586.3 | 4290.4 | 🚀 WIN |
| tier_prepared_query/int4_query_per_call_1000 | -16.48% | [-16.70%, -16.26%] | 3618804.6 | 4332724.7 | 🚀 WIN |
| simd_batch_dot_product/scalar_loop/1000 | -16.80% | [-16.99%, -16.66%] | 337678.9 | 405845.5 | 🚀 WIN |
| simd_euclidean_distance/scalar/768 | -17.08% | [-17.13%, -17.00%] | 703.0 | 847.8 | 🚀 WIN |
| simd_dot_product/scalar/1024 | -17.15% | [-17.28%, -17.07%] | 934.8 | 1128.4 | 🚀 WIN |
| simd_cosine_similarity/scalar/1024 | -17.31% | [-17.46%, -17.21%] | 2796.4 | 3381.7 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_256c | -17.14% | [-17.57%, -16.78%] | 17590.6 | 21230.2 | 🚀 WIN |
| simd_normalize/scalar/1536 | -17.29% | [-17.59%, -17.08%] | 1574.0 | 1903.0 | 🚀 WIN |
| simd_normalize/scalar/1024 | -17.15% | [-17.72%, -16.70%] | 1057.1 | 1276.0 | 🚀 WIN |
simd_normalize/scalar/768 |
-17.60% | [-17.92%, -17.32%] | 788.2 | 956.5 | 🚀 WIN |
simd_cosine_similarity/scalar/1536 |
-17.85% | [-17.94%, -17.79%] | 4230.4 | 5149.7 | 🚀 WIN |
simd_normalize/scalar/384 |
-17.76% | [-18.04%, -17.50%] | 396.5 | 482.2 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c |
-17.97% | [-18.08%, -17.88%] | 7906.1 | 9637.9 | 🚀 WIN |
simd_euclidean_distance/scalar/1024 |
-17.90% | [-18.13%, -17.77%] | 941.3 | 1146.6 | 🚀 WIN |
simd_euclidean_distance/scalar/1536 |
-18.20% | [-18.30%, -18.13%] | 1420.3 | 1736.3 | 🚀 WIN |
simd_dot_product/scalar/1536 |
-18.00% | [-18.49%, -17.67%] | 1413.9 | 1724.4 | 🚀 WIN |
int8_prepared_dot_product/per_call/129 |
-17.43% | [-18.52%, -16.05%] | 755.2 | 914.5 | 🚀 WIN |
int8_prepared_dot_product/per_call/128 |
-18.39% | [-18.65%, -18.14%] | 741.5 | 908.6 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/384d_1000c |
-18.46% | [-18.72%, -18.24%] | 43349.7 | 53164.1 | 🚀 WIN |
int8_prepared_dot_product/per_call/127 |
-18.35% | [-18.79%, -17.74%] | 742.4 | 909.2 | 🚀 WIN |
int8_quantization/quantize/1024 |
-18.48% | [-18.83%, -18.23%] | 5874.1 | 7205.9 | 🚀 WIN |
tier_prepared_query/int8_query_per_call_1000 |
-18.75% | [-18.83%, -18.69%] | 2202059.3 | 2710291.2 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c |
-18.70% | [-18.83%, -18.51%] | 15062.9 | 18528.5 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c |
-18.63% | [-18.86%, -18.40%] | 1048.4 | 1288.5 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c |
-18.61% | [-18.89%, -18.34%] | 4333.7 | 5324.6 | 🚀 WIN |
int8_quantization/quantize/768 |
-18.69% | [-19.00%, -18.41%] | 4390.7 | 5399.9 | 🚀 WIN |
simd_query_batch_dot_product/simd_batch/384d_64c |
-19.04% | [-19.33%, -18.81%] | 1127.7 | 1392.9 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c |
-19.26% | [-19.62%, -18.93%] | 4126.0 | 5109.9 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c |
-19.86% | [-19.93%, -19.78%] | 868.6 | 1083.8 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c |
-19.65% | [-19.98%, -19.31%] | 22202.7 | 27632.4 | 🚀 WIN |
int8_quantization/quantize/384 |
-19.80% | [-20.44%, -19.26%] | 2158.2 | 2691.1 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c |
-20.54% | [-20.90%, -20.18%] | 21598.5 | 27182.4 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c |
-20.70% | [-20.96%, -20.41%] | 1911.6 | 2410.6 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c |
-21.21% | [-21.53%, -20.87%] | 1065.5 | 1352.3 | 🚀 WIN |
int8_prepared_dot_product/per_call/768 |
-21.33% | [-22.08%, -20.41%] | 4287.3 | 5449.9 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c |
-22.23% | [-22.46%, -22.00%] | 4224.7 | 5432.2 | 🚀 WIN |
simd_query_batch_dot_product/simd_batch/768d_64c |
-22.41% | [-22.52%, -22.23%] | 2152.1 | 2773.7 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/1024d_256c |
-22.68% | [-22.93%, -22.41%] | 21710.3 | 28077.7 | 🚀 WIN |
int8_prepared_dot_product/per_call/384 |
-22.25% | [-22.94%, -21.59%] | 2105.5 | 2708.1 | 🚀 WIN |
int8_prepared_dot_product/per_call/1024 |
-22.36% | [-23.02%, -21.65%] | 5607.8 | 7223.2 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c |
-22.87% | [-23.13%, -22.59%] | 5181.9 | 6718.4 | 🚀 WIN |
gelu/4096 |
-15.76% | [-23.70%, -9.00%] | 1641.5 | 1948.6 | 🚀 WIN |
int8_raw_dot_product/dot_product_i8_raw/127 |
-23.33% | [-23.73%, -22.99%] | 13.6 | 17.7 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c |
-23.53% | [-23.84%, -23.19%] | 1300.4 | 1700.6 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c |
-24.04% | [-24.27%, -23.79%] | 21947.2 | 28894.7 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/1024d_16c |
-24.31% | [-24.56%, -24.07%] | 1308.1 | 1728.3 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c |
-24.44% | [-24.71%, -24.12%] | 18862.0 | 24963.9 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c |
-24.46% | [-24.72%, -24.07%] | 3480.1 | 4607.2 | 🚀 WIN |
elementwise_mul/4096 |
-22.65% | [-24.72%, -21.08%] | 245.2 | 317.0 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c |
-25.03% | [-25.29%, -24.79%] | 5253.9 | 7008.0 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c |
-25.49% | [-25.81%, -25.16%] | 1317.0 | 1767.6 | 🚀 WIN |
simd_batch_dot_product/simd_batch/100 |
-25.82% | [-25.95%, -25.66%] | 3337.8 | 4499.8 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c |
-25.76% | [-25.97%, -25.51%] | 1322.4 | 1781.3 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/1024d_64c |
-25.62% | [-26.03%, -25.25%] | 5137.3 | 6906.8 | 🚀 WIN |
int8_quantization/quantize/1536 |
-25.82% | [-26.06%, -25.58%] | 8051.8 | 10855.1 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c |
-25.84% | [-26.10%, -25.52%] | 235.8 | 317.9 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c |
-26.24% | [-26.54%, -25.93%] | 5221.4 | 7079.1 | 🚀 WIN |
simd_query_batch_dot_product/pair_loop/768d_256c |
-29.56% | [-29.69%, -29.42%] | 12981.7 | 18429.3 | 🚀 WIN |
simd_query_batch_dot_product/pair_loop/768d_16c |
-29.61% | [-29.72%, -29.47%] | 762.0 | 1082.5 | 🚀 WIN |
simd_normalize/simd/768 |
-28.56% | [-30.91%, -25.96%] | 123.6 | 172.9 | 🚀 WIN |
softmax_attention/128 |
-30.60% | [-30.92%, -30.18%] | 4144.6 | 5971.8 | 🚀 WIN |
simd_normalize/simd/1024 |
-28.21% | [-30.97%, -25.26%] | 160.1 | 223.1 | 🚀 WIN |
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 |
-33.93% | [-34.12%, -33.75%] | 72282.8 | 109404.3 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c |
-33.99% | [-34.21%, -33.78%] | 72706.6 | 110146.7 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c |
-34.08% | [-34.70%, -33.16%] | 32237.3 | 48905.1 | 🚀 WIN |
simd_query_batch_dot_product/pair_loop/768d_64c |
-34.86% | [-34.98%, -34.70%] | 2973.8 | 4565.4 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/768d_1000c |
-35.11% | [-35.27%, -34.94%] | 71072.6 | 109521.4 | 🚀 WIN |
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 |
-35.53% | [-35.78%, -35.23%] | 90228.0 | 139960.5 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c |
-35.82% | [-36.02%, -35.62%] | 980.2 | 1527.4 | 🚀 WIN |
simd_prepared_query_normalized_cosine/dot_product_loop/384 |
-36.00% | [-36.13%, -35.84%] | 30082.5 | 47004.4 | 🚀 WIN |
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 |
-37.34% | [-37.47%, -37.23%] | 28590.9 | 45626.2 | 🚀 WIN |
simd_normalize/simd/1536 |
-35.68% | [-37.67%, -33.63%] | 213.4 | 331.8 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c |
-38.05% | [-38.16%, -37.89%] | 3800.9 | 6135.2 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c |
-38.54% | [-38.70%, -38.37%] | 69900.0 | 113729.7 | 🚀 WIN |
softmax_attention/512 |
-38.50% | [-38.95%, -38.21%] | 60835.8 | 98919.8 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c |
-39.09% | [-39.27%, -38.93%] | 68453.2 | 112390.4 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c |
-39.75% | [-39.92%, -39.58%] | 86544.2 | 143631.9 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c |
-40.51% | [-40.64%, -40.38%] | 85159.0 | 143136.8 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c |
-40.70% | [-40.81%, -40.59%] | 60556.6 | 102119.2 | 🚀 WIN |
simd_batch_cosine/simd_batch/1000 |
-45.09% | [-45.30%, -44.85%] | 56613.5 | 103101.3 | 🚀 WIN |
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 |
-46.73% | [-46.83%, -46.61%] | 54533.0 | 102362.5 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c |
-46.80% | [-47.05%, -46.44%] | 86601.8 | 162789.4 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c |
-47.60% | [-47.71%, -47.50%] | 85201.9 | 162612.4 | 🚀 WIN |
simd_prepared_query_normalized_cosine/dot_product_loop/768 |
-48.83% | [-48.96%, -48.66%] | 51297.2 | 100239.1 | 🚀 WIN |
int8_batch_cosine/float32_simd/1000 |
-50.11% | [-50.28%, -49.92%] | 52928.5 | 106097.9 | 🚀 WIN |
simd_prepared_query_normalized_cosine/dot_product_loop/1024 |
-52.05% | [-52.21%, -51.89%] | 65455.4 | 136498.7 | 🚀 WIN |
simd_batch_dot_product/simd_batch/1000 |
-54.08% | [-54.21%, -53.93%] | 45092.7 | 98205.1 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c |
-54.42% | [-54.84%, -54.02%] | 65814.1 | 144397.5 | 🚀 WIN |
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 |
-55.38% | [-55.48%, -55.21%] | 62917.5 | 140993.5 | 🚀 WIN |
rms_norm/896 |
-97.45% | [-97.48%, -97.42%] | 206.4 | 8103.3 | 🚀 WIN |
rms_norm/4096 |
-97.73% | [-97.81%, -97.63%] | 768.9 | 33940.0 | 🚀 WIN |
All 247 measurements
| Bench | Δ point | CI-lower | CI-upper |
|---|---|---|---|
add_bias_gelu/4096 |
-0.82% | -1.02% | -0.67% |
add_bias_gelu/896 |
-6.20% | -6.59% | -5.82% |
binary_cosine_distance/binary/1024 |
+8.52% | +8.23% | +8.90% |
binary_cosine_distance/binary/1536 |
+6.53% | +6.27% | +6.74% |
binary_cosine_distance/binary/384 |
+18.31% | +18.00% | +18.58% |
binary_cosine_distance/binary/768 |
+10.66% | +10.40% | +10.98% |
binary_cosine_distance/float32_simd/1024 |
-2.56% | -2.88% | -2.21% |
binary_cosine_distance/float32_simd/1536 |
-6.25% | -6.68% | -5.87% |
binary_cosine_distance/float32_simd/384 |
+25.83% | +25.01% | +26.56% |
binary_cosine_distance/float32_simd/768 |
+0.41% | +0.02% | +0.82% |
elementwise_mul/4096 |
-22.65% | -24.72% | -21.08% |
gelu/4096 |
-15.76% | -23.70% | -9.00% |
gelu/896 |
-1.27% | -3.70% | +0.57% |
int4_cosine_distance/float32_simd/1024 |
+27.24% | +26.70% | +27.82% |
int4_cosine_distance/float32_simd/1536 |
+24.81% | +24.29% | +25.34% |
int4_cosine_distance/float32_simd/384 |
+21.29% | +20.76% | +21.83% |
int4_cosine_distance/float32_simd/768 |
+0.37% | -0.11% | +0.85% |
int4_cosine_distance/int4/1024 |
+0.62% | -0.08% | +1.26% |
int4_cosine_distance/int4/1536 |
+0.12% | -0.00% | +0.26% |
int4_cosine_distance/int4/384 |
-2.15% | -2.89% | -1.43% |
int4_cosine_distance/int4/768 |
-1.82% | -2.64% | -1.05% |
int8_batch_cosine/float32_simd/10 |
+23.85% | +22.99% | +24.59% |
int8_batch_cosine/float32_simd/100 |
-8.89% | -9.23% | -8.55% |
int8_batch_cosine/float32_simd/1000 |
-50.11% | -50.28% | -49.92% |
int8_batch_cosine/int8_loop/10 |
+16.00% | +15.64% | +16.40% |
int8_batch_cosine/int8_loop/100 |
+13.09% | +12.48% | +13.56% |
int8_batch_cosine/int8_loop/1000 |
+23.33% | +22.46% | +24.20% |
int8_prepared_dot_product/per_call/1024 |
-22.36% | -23.02% | -21.65% |
int8_prepared_dot_product/per_call/127 |
-18.35% | -18.79% | -17.74% |
int8_prepared_dot_product/per_call/128 |
-18.39% | -18.65% | -18.14% |
int8_prepared_dot_product/per_call/129 |
-17.43% | -18.52% | -16.05% |
int8_prepared_dot_product/per_call/384 |
-22.25% | -22.94% | -21.59% |
int8_prepared_dot_product/per_call/768 |
-21.33% | -22.08% | -20.41% |
int8_prepared_dot_product/prepared/1024 |
+22.08% | +21.84% | +22.34% |
int8_prepared_dot_product/prepared/127 |
-10.71% | -11.00% | -10.52% |
int8_prepared_dot_product/prepared/128 |
+4.90% | +4.07% | +5.53% |
int8_prepared_dot_product/prepared/129 |
+6.64% | +6.10% | +7.03% |
int8_prepared_dot_product/prepared/384 |
+11.93% | +11.38% | +12.66% |
int8_prepared_dot_product/prepared/768 |
+37.41% | +36.57% | +38.05% |
int8_quantization/quantize/1024 |
-18.48% | -18.83% | -18.23% |
int8_quantization/quantize/1536 |
-25.82% | -26.06% | -25.58% |
int8_quantization/quantize/384 |
-19.80% | -20.44% | -19.26% |
int8_quantization/quantize/768 |
-18.69% | -19.00% | -18.41% |
int8_raw_dot_product/dot_product_i8/1024 |
+10.69% | +10.15% | +11.05% |
int8_raw_dot_product/dot_product_i8/127 |
-11.45% | -11.92% | -11.01% |
int8_raw_dot_product/dot_product_i8/128 |
+5.06% | +4.83% | +5.31% |
int8_raw_dot_product/dot_product_i8/129 |
+5.52% | +4.80% | +6.19% |
int8_raw_dot_product/dot_product_i8/384 |
+13.58% | +13.14% | +14.01% |
int8_raw_dot_product/dot_product_i8/768 |
+43.67% | +43.41% | +43.94% |
int8_raw_dot_product/dot_product_i8_raw/1024 |
+23.90% | +23.50% | +24.31% |
int8_raw_dot_product/dot_product_i8_raw/127 |
-23.33% | -23.73% | -22.99% |
int8_raw_dot_product/dot_product_i8_raw/128 |
+9.26% | +8.56% | +9.79% |
int8_raw_dot_product/dot_product_i8_raw/129 |
+4.45% | +4.14% | +4.74% |
int8_raw_dot_product/dot_product_i8_raw/384 |
+14.40% | +13.84% | +14.97% |
int8_raw_dot_product/dot_product_i8_raw/768 |
+29.99% | +29.42% | +30.38% |
int8_vs_float32_cosine/float32_simd/1024 |
-5.49% | -5.86% | -5.10% |
int8_vs_float32_cosine/float32_simd/1536 |
+22.95% | +22.21% | +23.64% |
int8_vs_float32_cosine/float32_simd/384 |
+14.61% | +13.77% | +15.39% |
int8_vs_float32_cosine/float32_simd/768 |
-3.27% | -3.71% | -2.80% |
int8_vs_float32_cosine/int8/1024 |
+21.12% | +20.45% | +21.59% |
int8_vs_float32_cosine/int8/1536 |
+19.46% | +18.56% | +20.14% |
int8_vs_float32_cosine/int8/384 |
+23.28% | +22.87% | +23.69% |
int8_vs_float32_cosine/int8/768 |
+39.18% | +38.44% | +39.72% |
layer_norm/4096 |
+26.77% | +26.09% | +27.25% |
layer_norm/896 |
+0.13% | -0.31% | +0.50% |
memory_size/search_1000_float32 |
-14.21% | -14.51% | -13.90% |
memory_size/search_1000_int8 |
+15.95% | +15.26% | +16.48% |
rms_norm/4096 |
-97.73% | -97.81% | -97.63% |
rms_norm/896 |
-97.45% | -97.48% | -97.42% |
silu_inplace/4096 |
-5.64% | -7.25% | -4.31% |
silu_inplace/896 |
-4.19% | -4.77% | -3.53% |
simd_batch_cosine/scalar_loop/10 |
-14.47% | -14.57% | -14.39% |
simd_batch_cosine/scalar_loop/100 |
-15.25% | -16.11% | -14.56% |
simd_batch_cosine/scalar_loop/1000 |
-15.53% | -15.67% | -15.44% |
simd_batch_cosine/simd_batch/10 |
+25.92% | +25.30% | +26.51% |
simd_batch_cosine/simd_batch/100 |
-12.07% | -12.67% | -11.55% |
simd_batch_cosine/simd_batch/1000 |
-45.09% | -45.30% | -44.85% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c |
-39.75% | -39.92% | -39.58% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c |
-25.49% | -25.81% | -25.16% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c |
-19.65% | -19.98% | -19.31% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c |
-10.35% | -10.86% | -9.84% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c |
-25.03% | -25.29% | -24.79% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c |
-15.28% | -15.55% | -15.02% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c |
+15.13% | +14.69% | +15.59% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c |
+1.59% | +1.16% | +2.07% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c |
+7.55% | +7.12% | +7.96% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c |
-1.96% | -2.32% | -1.59% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c |
-38.54% | -38.70% | -38.37% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c |
-21.21% | -21.53% | -20.87% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c |
-17.14% | -17.57% | -16.78% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c |
-9.36% | -9.62% | -9.09% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c |
-22.23% | -22.46% | -22.00% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c |
-40.51% | -40.64% | -40.38% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c |
-23.53% | -23.84% | -23.19% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c |
-20.54% | -20.90% | -20.18% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c |
-10.63% | -10.96% | -10.27% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c |
-22.87% | -23.13% | -22.59% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c |
-14.32% | -14.62% | -14.05% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c |
+17.96% | +17.44% | +18.49% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c |
-0.84% | -1.20% | -0.48% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c |
+12.20% | +11.83% | +12.58% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c |
-3.51% | -3.89% | -3.10% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c |
-39.09% | -39.27% | -38.93% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c |
-18.63% | -18.86% | -18.40% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c |
-15.56% | -15.97% | -15.09% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c |
-6.70% | -7.08% | -6.32% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c |
-19.26% | -19.62% | -18.93% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c |
-46.80% | -47.05% | -46.44% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c |
-25.76% | -25.97% | -25.51% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c |
-24.04% | -24.27% | -23.79% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c |
-14.52% | -14.84% | -14.22% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c |
-26.24% | -26.54% | -25.93% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c |
-15.49% | -15.83% | -15.17% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c |
+16.79% | +16.26% | +17.35% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c |
+3.11% | +2.63% | +3.57% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c |
+12.33% | +11.70% | +12.98% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c |
-0.02% | -1.08% | +1.31% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c |
-33.99% | -34.21% | -33.78% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c |
-13.92% | -14.31% | -13.54% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c |
-14.92% | -15.22% | -14.57% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c |
-2.76% | -3.16% | -2.40% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c |
-18.61% | -18.89% | -18.34% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c |
-54.42% | -54.84% | -54.02% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c |
-35.82% | -36.02% | -35.62% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c |
-24.44% | -24.71% | -24.12% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c |
-25.84% | -26.10% | -25.52% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c |
-38.05% | -38.16% | -37.89% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c |
-34.08% | -34.70% | -33.16% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c |
-2.89% | -3.19% | -2.54% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c |
-17.97% | -18.08% | -17.88% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c |
+2.64% | +2.41% | +2.78% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c |
-20.70% | -20.96% | -20.41% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c |
-40.70% | -40.81% | -40.59% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c |
-19.86% | -19.93% | -19.78% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c |
-18.70% | -18.83% | -18.51% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c |
+0.12% | -0.10% | +0.31% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c |
-24.46% | -24.72% | -24.07% |
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c |
-47.60% | -47.71% | -47.50% |
simd_batch_cosine_normalized_query/simd_batch/1024d_16c |
-24.31% | -24.56% | -24.07% |
simd_batch_cosine_normalized_query/simd_batch/1024d_256c |
-22.68% | -22.93% | -22.41% |
simd_batch_cosine_normalized_query/simd_batch/1024d_4c |
-13.28% | -13.53% | -13.00% |
simd_batch_cosine_normalized_query/simd_batch/1024d_64c |
-25.62% | -26.03% | -25.25% |
simd_batch_cosine_normalized_query/simd_batch/384d_1000c |
-18.46% | -18.72% | -18.24% |
simd_batch_cosine_normalized_query/simd_batch/384d_16c |
+22.62% | +21.87% | +23.29% |
simd_batch_cosine_normalized_query/simd_batch/384d_256c |
+0.16% | -0.23% | +0.59% |
simd_batch_cosine_normalized_query/simd_batch/384d_4c |
+13.42% | +12.98% | +13.86% |
simd_batch_cosine_normalized_query/simd_batch/384d_64c |
-2.74% | -3.10% | -2.38% |
simd_batch_cosine_normalized_query/simd_batch/768d_1000c |
-35.11% | -35.27% | -34.94% |
simd_batch_cosine_normalized_query/simd_batch/768d_16c |
-13.22% | -13.53% | -12.90% |
simd_batch_cosine_normalized_query/simd_batch/768d_256c |
-13.40% | -13.69% | -13.13% |
simd_batch_cosine_normalized_query/simd_batch/768d_4c |
+3.05% | +2.37% | +3.67% |
simd_batch_cosine_normalized_query/simd_batch/768d_64c |
-16.22% | -16.51% | -15.92% |
simd_batch_dot_product/scalar_loop/10 |
-14.54% | -14.59% | -14.48% |
simd_batch_dot_product/scalar_loop/100 |
-14.25% | -14.41% | -14.07% |
simd_batch_dot_product/scalar_loop/1000 |
-16.80% | -16.99% | -16.66% |
simd_batch_dot_product/simd_batch/10 |
-0.04% | -0.71% | +0.50% |
simd_batch_dot_product/simd_batch/100 |
-25.82% | -25.95% | -25.66% |
simd_batch_dot_product/simd_batch/1000 |
-54.08% | -54.21% | -53.93% |
simd_cosine_similarity/scalar/1024 |
-17.31% | -17.46% | -17.21% |
simd_cosine_similarity/scalar/1536 |
-17.85% | -17.94% | -17.79% |
simd_cosine_similarity/scalar/384 |
-13.83% | -14.10% | -13.54% |
simd_cosine_similarity/scalar/768 |
-16.34% | -16.51% | -16.24% |
simd_cosine_similarity/simd/1024 |
-4.44% | -4.77% | -4.11% |
simd_cosine_similarity/simd/1536 |
+18.55% | +17.79% | +19.24% |
simd_cosine_similarity/simd/384 |
+7.71% | +7.14% | +8.31% |
simd_cosine_similarity/simd/768 |
+3.02% | +2.62% | +3.42% |
simd_dot_product/scalar/1024 |
-17.15% | -17.28% | -17.07% |
simd_dot_product/scalar/1536 |
-18.00% | -18.49% | -17.67% |
simd_dot_product/scalar/384 |
-12.08% | -12.20% | -11.96% |
simd_dot_product/scalar/768 |
-15.59% | -15.77% | -15.32% |
simd_dot_product/simd/1024 |
+14.40% | +14.08% | +14.74% |
simd_dot_product/simd/1536 |
+16.08% | +15.73% | +16.35% |
simd_dot_product/simd/384 |
-8.36% | -8.50% | -8.21% |
simd_dot_product/simd/768 |
-11.02% | -11.18% | -10.82% |
simd_euclidean_distance/scalar/1024 |
-17.90% | -18.13% | -17.77% |
simd_euclidean_distance/scalar/1536 |
-18.20% | -18.30% | -18.13% |
simd_euclidean_distance/scalar/384 |
-15.10% | -15.32% | -14.86% |
simd_euclidean_distance/scalar/768 |
-17.08% | -17.13% | -17.00% |
simd_euclidean_distance/simd/1024 |
+24.73% | +24.24% | +25.22% |
simd_euclidean_distance/simd/1536 |
-15.29% | -15.80% | -14.82% |
simd_euclidean_distance/simd/384 |
+8.42% | +7.83% | +8.93% |
simd_euclidean_distance/simd/768 |
+16.38% | +15.83% | +16.92% |
simd_normalize/scalar/1024 |
-17.15% | -17.72% | -16.70% |
simd_normalize/scalar/1536 |
-17.29% | -17.59% | -17.08% |
simd_normalize/scalar/384 |
-17.76% | -18.04% | -17.50% |
simd_normalize/scalar/768 |
-17.60% | -17.92% | -17.32% |
simd_normalize/simd/1024 |
-28.21% | -30.97% | -25.26% |
simd_normalize/simd/1536 |
-35.68% | -37.67% | -33.63% |
simd_normalize/simd/384 |
-6.12% | -9.55% | -2.61% |
simd_normalize/simd/768 |
-28.56% | -30.91% | -25.96% |
simd_normalized_cosine_fast_path/cosine_full/1024 |
+21.26% | +20.76% | +21.76% |
simd_normalized_cosine_fast_path/cosine_full/384 |
+26.57% | +25.75% | +27.29% |
simd_normalized_cosine_fast_path/cosine_full/768 |
+0.17% | -0.27% | +0.62% |
simd_normalized_cosine_fast_path/dot_product/1024 |
+9.60% | +9.32% | +9.91% |
simd_normalized_cosine_fast_path/dot_product/384 |
+12.97% | +12.78% | +13.11% |
simd_normalized_cosine_fast_path/dot_product/768 |
-12.97% | -13.17% | -12.75% |
simd_prepared_query_normalized_cosine/dot_product_loop/1024 |
-52.05% | -52.21% | -51.89% |
simd_prepared_query_normalized_cosine/dot_product_loop/384 |
-36.00% | -36.13% | -35.84% |
simd_prepared_query_normalized_cosine/dot_product_loop/768 |
-48.83% | -48.96% | -48.66% |
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 |
-35.53% | -35.78% | -35.23% |
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 |
-12.75% | -13.79% | -11.91% |
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 |
-33.93% | -34.12% | -33.75% |
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 |
-55.38% | -55.48% | -55.21% |
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 |
-37.34% | -37.47% | -37.23% |
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 |
-46.73% | -46.83% | -46.61% |
simd_query_batch_dot_product/pair_loop/128d_16c |
+1.89% | +1.71% | +2.09% |
simd_query_batch_dot_product/pair_loop/128d_256c |
-16.41% | -16.59% | -16.18% |
simd_query_batch_dot_product/pair_loop/128d_4c |
+0.53% | +0.05% | +1.10% |
simd_query_batch_dot_product/pair_loop/128d_64c |
+1.20% | +0.99% | +1.37% |
simd_query_batch_dot_product/pair_loop/384d_16c |
+35.09% | +34.75% | +35.41% |
simd_query_batch_dot_product/pair_loop/384d_256c |
+3.35% | +2.96% | +3.80% |
simd_query_batch_dot_product/pair_loop/384d_4c |
+34.70% | +34.50% | +35.02% |
simd_query_batch_dot_product/pair_loop/384d_64c |
-0.71% | -0.87% | -0.53% |
simd_query_batch_dot_product/pair_loop/768d_16c |
-29.61% | -29.72% | -29.47% |
simd_query_batch_dot_product/pair_loop/768d_256c |
-29.56% | -29.69% | -29.42% |
simd_query_batch_dot_product/pair_loop/768d_4c |
-9.88% | -9.95% | -9.81% |
simd_query_batch_dot_product/pair_loop/768d_64c |
-34.86% | -34.98% | -34.70% |
simd_query_batch_dot_product/simd_batch/128d_16c |
+2.92% | +2.65% | +3.23% |
simd_query_batch_dot_product/simd_batch/128d_256c |
-12.89% | -13.47% | -12.10% |
simd_query_batch_dot_product/simd_batch/128d_4c |
+6.38% | +5.95% | +6.78% |
simd_query_batch_dot_product/simd_batch/128d_64c |
+8.11% | +7.51% | +8.88% |
simd_query_batch_dot_product/simd_batch/384d_16c |
+20.40% | +19.77% | +20.94% |
simd_query_batch_dot_product/simd_batch/384d_256c |
-3.84% | -4.06% | -3.55% |
simd_query_batch_dot_product/simd_batch/384d_4c |
+19.54% | +19.39% | +19.68% |
simd_query_batch_dot_product/simd_batch/384d_64c |
-19.04% | -19.33% | -18.81% |
simd_query_batch_dot_product/simd_batch/768d_16c |
-7.97% | -8.29% | -7.62% |
simd_query_batch_dot_product/simd_batch/768d_256c |
+2.41% | +2.04% | +2.70% |
simd_query_batch_dot_product/simd_batch/768d_4c |
+10.12% | +9.40% | +10.69% |
simd_query_batch_dot_product/simd_batch/768d_64c |
-22.41% | -22.52% | -22.23% |
simd_squared_euclidean_fast_path/euclidean_full/1024 |
+23.53% | +23.05% | +24.04% |
simd_squared_euclidean_fast_path/euclidean_full/384 |
+24.63% | +24.03% | +25.22% |
simd_squared_euclidean_fast_path/euclidean_full/768 |
+10.21% | +9.80% | +10.63% |
simd_squared_euclidean_fast_path/squared_euclidean/1024 |
+15.44% | +14.96% | +15.93% |
simd_squared_euclidean_fast_path/squared_euclidean/384 |
+6.44% | +5.95% | +6.90% |
simd_squared_euclidean_fast_path/squared_euclidean/768 |
+0.13% | -0.28% | +0.51% |
simd_throughput_384/cosine_similarity |
+43.30% | +42.68% | +43.84% |
simd_throughput_384/dot_product |
+32.84% | +32.53% | +33.17% |
simd_throughput_384/euclidean_distance |
+38.77% | +38.27% | +39.30% |
simd_throughput_384/normalize |
-8.57% | -9.00% | -8.20% |
softmax_attention/128 |
-30.60% | -30.92% | -30.18% |
softmax_attention/512 |
-38.50% | -38.95% | -38.21% |
tier_prepared_query/binary_query_once_1000 |
+10.82% | +10.53% | +11.19% |
tier_prepared_query/binary_query_per_call_1000 |
-4.58% | -4.80% | -4.37% |
tier_prepared_query/int4_query_once_1000 |
+1.06% | +0.20% | +1.92% |
tier_prepared_query/int4_query_per_call_1000 |
-16.48% | -16.70% | -16.26% |
tier_prepared_query/int8_query_once_1000 |
+11.46% | +10.67% | +11.98% |
tier_prepared_query/int8_query_per_call_1000 |
-18.75% | -18.83% | -18.69% |
Rule: CI-lower of change ≤3.0% passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override via PR label bench-allow-regression.
Gate is in advisory mode (Rollout step 3, ADR-058 §Rollout). Failures do not block merge for the first 7 days.
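The thresholds in the rule above can be sketched as a small classifier. This is an illustrative sketch only, not the gate's actual implementation: the names `Verdict`, `classify`, `WARN_PCT`, and `FAIL_PCT` are hypothetical. The report does not state the WIN rule, so the mirrored threshold used here (CI-upper below -3.0%) is an assumption that merely happens to agree with the rows shown.

```rust
/// Verdict labels as they appear in the report tables.
/// (Hypothetical type; the real gate's types are not shown in this PR.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
    Win,
}

/// Thresholds quoted in the rule line, in percent.
pub const WARN_PCT: f64 = 3.0;
pub const FAIL_PCT: f64 = 7.0;

/// Classify one benchmark from the bounds of its 95% CI (both in percent).
///
/// Fail and Warn follow the stated rule on CI-lower.
/// ASSUMPTION: Win is taken to be the mirror image of Warn
/// (CI-upper < -WARN_PCT); the report does not document this.
pub fn classify(ci_lower: f64, ci_upper: f64) -> Verdict {
    if ci_lower > FAIL_PCT {
        Verdict::Fail
    } else if ci_lower > WARN_PCT {
        Verdict::Warn
    } else if ci_upper < -WARN_PCT {
        Verdict::Win
    } else {
        // CI-lower <= 3.0% and no clear improvement: passes silently.
        Verdict::Pass
    }
}
```

Applied to rows from the report, `classify(9.73, 9.97)` gives `Fail`, `classify(5.95, 5.98)` gives `Warn`, and `classify(-97.48, -97.42)` (the `rms_norm/896` row) gives `Win`. Gating on CI-lower rather than the point estimate means a noisy benchmark whose interval straddles a threshold is judged by its most optimistic bound, which errs toward fewer false alarms.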
Summary
crates.io v0.2.2 was published 2026-05-20, before the RoPE pairing fix landed (PR #96, merged today). Cannot republish 0.2.2 (immutable on crates.io), so bumping to 0.2.3 to ship the fix. v0.2.2 will be yanked on crates.io post-publish.
Changes
- embed, tune)
- docs/releases/v0.2.2.md → v0.2.3.md with yank notice

Post-merge
🤖 Generated with Claude Code