release: v0.2.4 - fix AVX2 cfg build break on macOS x86_64 (#100)
Merged
v0.2.3 fails to compile on macOS x86_64 (Intel Mac runners, and cross-compiles from aarch64-macos) because tiled_avx2.rs imports TILE_I/J/K from tiled.rs, but those constants are gated `#[cfg(not(target_os = "macos"))]`. The AVX2 import gate was only `#[cfg(target_arch = "x86_64")]`, so on macOS x86_64 the import is compiled while the constants are not, and the import fails to resolve.

Fix: align the tiled_avx2.rs cfg with the tiled_neon.rs pattern: `#[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]`.

The AVX2 microkernel is dead code on macOS regardless: matmul_bt_tiled (the only caller) is already gated `#[cfg(not(target_os = "macos"))]` because Accelerate is selected for macOS. This is a pure cfg fix; no SIMD logic is touched.

Bumps the workspace and 3 inter-crate path-dep refs to 0.2.4. v0.2.3 is to be yanked after this lands, per the bump-and-yank recipe in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
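The gating mismatch can be illustrated with a minimal sketch. The module names mirror the files named in the description, but the constant values and the `tile_volume` / `avx2_tile_volume` helpers are hypothetical and exist only to make the example compile on any target; this is not the crate's actual code.

```rust
// Stand-in for tiled.rs: the tile constants exist only off macOS,
// mirroring the gate described in the PR text.
mod tiled {
    // Placeholder values, NOT the crate's real tile sizes.
    #[cfg(not(target_os = "macos"))]
    pub const TILE_I: usize = 4;
    #[cfg(not(target_os = "macos"))]
    pub const TILE_J: usize = 8;
    #[cfg(not(target_os = "macos"))]
    pub const TILE_K: usize = 16;
}

// Stand-in for tiled_avx2.rs with the corrected gate. Gated only on
// `target_arch = "x86_64"`, this module would also be compiled on
// macOS x86_64, where the constants above do not exist, and the
// `use` below would fail to resolve.
#[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
mod tiled_avx2 {
    use super::tiled::{TILE_I, TILE_J, TILE_K};

    // Hypothetical helper so the imported constants are actually used.
    pub fn tile_volume() -> usize {
        TILE_I * TILE_J * TILE_K
    }
}

// Returns Some(..) only on targets where the AVX2 module is compiled in.
#[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
pub fn avx2_tile_volume() -> Option<usize> {
    Some(tiled_avx2::tile_volume())
}

// Fallback for every other target, including macOS x86_64.
#[cfg(not(all(target_arch = "x86_64", not(target_os = "macos"))))]
pub fn avx2_tile_volume() -> Option<usize> {
    None
}
```

Because the consumer carries the same `not(target_os = "macos")` condition as the constants it imports, there is no target on which the import is compiled without its definitions.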
Perf regression report (ADR-058)
| Bench | Δ point | 95% CI | new ns | base ns | verdict |
|---|---|---|---|---|---|
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c | +8.35% | [+8.25%, +8.46%] | 1031.8 | 952.2 | ❌ FAIL |
| simd_dot_product/simd/1536 | +5.20% | [+5.13%, +5.29%] | 106.9 | 101.6 | ⚠ WARN |
| int8_batch_cosine/int8_loop/1000 | +4.65% | [+4.35%, +4.93%] | 20099.9 | 19206.5 | ⚠ WARN |
| simd_query_batch_dot_product/simd_batch/768d_16c | +3.90% | [+3.85%, +3.95%] | 737.2 | 709.5 | ⚠ WARN |
| tier_prepared_query/int8_query_once_1000 | +3.81% | [+3.77%, +3.85%] | 18576.6 | 17895.6 | ⚠ WARN |
| simd_normalize/simd/768 | +5.59% | [+3.52%, +7.85%] | 123.1 | 116.5 | ⚠ WARN |
| simd_batch_cosine_normalized_query/simd_batch/1024d_256c | +3.68% | [+3.48%, +3.88%] | 36741.6 | 35437.9 | ⚠ WARN |
| simd_query_batch_dot_product/pair_loop/384d_64c | +3.24% | [+3.23%, +3.25%] | 2507.0 | 2428.4 | ⚠ WARN |
| simd_normalized_cosine_fast_path/dot_product/768 | -3.96% | [-4.23%, -3.66%] | 56.0 | 58.3 | 🚀 WIN |
| simd_dot_product/simd/1024 | -4.34% | [-4.40%, -4.28%] | 70.3 | 73.4 | 🚀 WIN |
| simd_batch_cosine/simd_batch/1000 | -6.51% | [-6.57%, -6.45%] | 80338.7 | 85928.5 | 🚀 WIN |
| simd_query_batch_dot_product/pair_loop/768d_256c | -6.95% | [-7.04%, -6.86%] | 19847.6 | 21329.0 | 🚀 WIN |
| simd_batch_dot_product/simd_batch/1000 | -14.51% | [-14.58%, -14.44%] | 75708.6 | 88556.6 | 🚀 WIN |
All 247 measurements
| Bench | Δ point | CI-lower | CI-upper |
|---|---|---|---|
| add_bias_gelu/4096 | +0.01% | -0.00% | +0.02% |
| add_bias_gelu/896 | -0.01% | -0.03% | +0.01% |
| binary_cosine_distance/binary/1024 | +1.07% | +1.06% | +1.08% |
| binary_cosine_distance/binary/1536 | -0.04% | -0.05% | -0.02% |
| binary_cosine_distance/binary/384 | -0.19% | -0.21% | -0.16% |
| binary_cosine_distance/binary/768 | -0.21% | -0.22% | -0.19% |
| binary_cosine_distance/float32_simd/1024 | +0.03% | -0.00% | +0.08% |
| binary_cosine_distance/float32_simd/1536 | -0.04% | -0.05% | -0.02% |
| binary_cosine_distance/float32_simd/384 | +0.13% | +0.09% | +0.18% |
| binary_cosine_distance/float32_simd/768 | +0.19% | +0.18% | +0.21% |
| elementwise_mul/4096 | -2.25% | -2.30% | -2.19% |
| gelu/4096 | -0.00% | -0.03% | +0.02% |
| gelu/896 | +0.00% | -0.03% | +0.02% |
| int4_cosine_distance/float32_simd/1024 | -0.04% | -0.08% | -0.01% |
| int4_cosine_distance/float32_simd/1536 | +0.09% | +0.06% | +0.14% |
| int4_cosine_distance/float32_simd/384 | -0.53% | -0.55% | -0.51% |
| int4_cosine_distance/float32_simd/768 | +0.14% | +0.12% | +0.15% |
| int4_cosine_distance/int4/1024 | -0.39% | -0.47% | -0.32% |
| int4_cosine_distance/int4/1536 | -0.35% | -0.37% | -0.34% |
| int4_cosine_distance/int4/384 | +0.00% | -0.04% | +0.05% |
| int4_cosine_distance/int4/768 | +0.22% | +0.20% | +0.25% |
| int8_batch_cosine/float32_simd/10 | -0.02% | -0.04% | -0.01% |
| int8_batch_cosine/float32_simd/100 | +0.57% | +0.56% | +0.58% |
| int8_batch_cosine/float32_simd/1000 | +2.49% | +2.43% | +2.56% |
| int8_batch_cosine/int8_loop/10 | +0.83% | +0.79% | +0.88% |
| int8_batch_cosine/int8_loop/100 | +0.44% | +0.41% | +0.47% |
| int8_batch_cosine/int8_loop/1000 | +4.65% | +4.35% | +4.93% |
| int8_prepared_dot_product/per_call/1024 | -0.02% | -0.03% | -0.00% |
| int8_prepared_dot_product/per_call/127 | -0.08% | -0.10% | -0.07% |
| int8_prepared_dot_product/per_call/128 | +0.02% | +0.02% | +0.03% |
| int8_prepared_dot_product/per_call/129 | +0.06% | +0.05% | +0.07% |
| int8_prepared_dot_product/per_call/384 | +0.00% | -0.01% | +0.01% |
| int8_prepared_dot_product/per_call/768 | -0.00% | -0.01% | +0.01% |
| int8_prepared_dot_product/prepared/1024 | -0.60% | -0.63% | -0.58% |
| int8_prepared_dot_product/prepared/127 | +0.78% | +0.76% | +0.79% |
| int8_prepared_dot_product/prepared/128 | +0.49% | +0.45% | +0.54% |
| int8_prepared_dot_product/prepared/129 | +0.53% | +0.50% | +0.56% |
| int8_prepared_dot_product/prepared/384 | -0.44% | -0.48% | -0.40% |
| int8_prepared_dot_product/prepared/768 | +1.38% | +1.30% | +1.46% |
| int8_quantization/quantize/1024 | +0.00% | -0.01% | +0.01% |
| int8_quantization/quantize/1536 | -0.22% | -0.23% | -0.21% |
| int8_quantization/quantize/384 | +0.01% | -0.00% | +0.02% |
| int8_quantization/quantize/768 | +0.00% | -0.01% | +0.01% |
| int8_raw_dot_product/dot_product_i8/1024 | +0.98% | +0.95% | +1.01% |
| int8_raw_dot_product/dot_product_i8/127 | +0.57% | +0.56% | +0.59% |
| int8_raw_dot_product/dot_product_i8/128 | -0.79% | -0.89% | -0.67% |
| int8_raw_dot_product/dot_product_i8/129 | +0.72% | +0.69% | +0.76% |
| int8_raw_dot_product/dot_product_i8/384 | -0.44% | -0.45% | -0.42% |
| int8_raw_dot_product/dot_product_i8/768 | -0.89% | -0.94% | -0.84% |
| int8_raw_dot_product/dot_product_i8_raw/1024 | +0.03% | +0.01% | +0.05% |
| int8_raw_dot_product/dot_product_i8_raw/127 | -0.09% | -0.12% | -0.06% |
| int8_raw_dot_product/dot_product_i8_raw/128 | +0.18% | +0.15% | +0.20% |
| int8_raw_dot_product/dot_product_i8_raw/129 | +0.29% | +0.23% | +0.36% |
| int8_raw_dot_product/dot_product_i8_raw/384 | +0.04% | +0.02% | +0.07% |
| int8_raw_dot_product/dot_product_i8_raw/768 | +0.25% | +0.23% | +0.26% |
| int8_vs_float32_cosine/float32_simd/1024 | -0.10% | -0.11% | -0.08% |
| int8_vs_float32_cosine/float32_simd/1536 | -0.04% | -0.06% | -0.00% |
| int8_vs_float32_cosine/float32_simd/384 | -0.43% | -0.54% | -0.31% |
| int8_vs_float32_cosine/float32_simd/768 | +0.42% | +0.35% | +0.48% |
| int8_vs_float32_cosine/int8/1024 | +0.55% | +0.51% | +0.59% |
| int8_vs_float32_cosine/int8/1536 | -0.27% | -0.31% | -0.23% |
| int8_vs_float32_cosine/int8/384 | -1.34% | -1.50% | -1.19% |
| int8_vs_float32_cosine/int8/768 | -1.23% | -1.32% | -1.14% |
| layer_norm/4096 | -0.31% | -0.33% | -0.29% |
| layer_norm/896 | +0.01% | -0.02% | +0.03% |
| memory_size/search_1000_float32 | +0.33% | +0.31% | +0.36% |
| memory_size/search_1000_int8 | -0.98% | -1.05% | -0.91% |
| rms_norm/4096 | +0.79% | +0.76% | +0.82% |
| rms_norm/896 | -0.14% | -0.18% | -0.10% |
| silu_inplace/4096 | +0.02% | -0.00% | +0.05% |
| silu_inplace/896 | +0.01% | -0.02% | +0.05% |
| simd_batch_cosine/scalar_loop/10 | -0.00% | -0.01% | +0.01% |
| simd_batch_cosine/scalar_loop/100 | -0.08% | -0.10% | -0.06% |
| simd_batch_cosine/scalar_loop/1000 | -0.42% | -0.46% | -0.38% |
| simd_batch_cosine/simd_batch/10 | -0.05% | -0.10% | -0.02% |
| simd_batch_cosine/simd_batch/100 | +0.45% | +0.43% | +0.46% |
| simd_batch_cosine/simd_batch/1000 | -6.51% | -6.57% | -6.45% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c | -0.12% | -0.15% | -0.08% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c | -1.16% | -1.17% | -1.14% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c | +0.84% | +0.62% | +1.06% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c | +0.00% | -0.01% | +0.01% |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c | -0.12% | -0.16% | -0.10% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c | +0.44% | +0.38% | +0.49% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_16c | +0.65% | +0.63% | +0.68% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_256c | -2.21% | -2.25% | -2.18% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_4c | -0.08% | -0.09% | -0.07% |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_64c | -0.15% | -0.16% | -0.14% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c | +0.36% | +0.30% | +0.43% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_16c | -0.98% | -0.98% | -0.97% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_256c | -1.05% | -1.16% | -0.94% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_4c | -0.05% | -0.07% | -0.04% |
| simd_batch_cosine_non_normalized_query/pair_loop/768d_64c | +0.06% | +0.04% | +0.07% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c | -0.05% | -0.11% | +0.05% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c | -1.15% | -1.16% | -1.14% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c | -2.33% | -2.50% | -2.15% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c | +0.20% | +0.19% | +0.22% |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c | -0.11% | -0.13% | -0.10% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c | +0.13% | +0.10% | +0.16% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_16c | +0.37% | +0.34% | +0.40% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_256c | -2.40% | -2.43% | -2.36% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_4c | +0.03% | +0.02% | +0.04% |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_64c | +0.04% | +0.02% | +0.05% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c | +0.18% | +0.12% | +0.26% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_16c | +0.32% | +0.30% | +0.33% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_256c | -1.96% | -2.06% | -1.87% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_4c | +0.04% | +0.03% | +0.04% |
| simd_batch_cosine_non_normalized_query/simd_batch/768d_64c | -0.02% | -0.07% | +0.02% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c | +0.03% | -0.00% | +0.07% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c | -0.55% | -0.56% | -0.54% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c | +0.48% | +0.31% | +0.66% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c | -0.03% | -0.05% | -0.01% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c | -0.11% | -0.13% | -0.10% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c | +0.54% | +0.51% | +0.57% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c | +0.18% | +0.17% | +0.19% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c | +0.14% | +0.13% | +0.16% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c | +0.04% | +0.02% | +0.06% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c | +0.78% | +0.77% | +0.79% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c | -0.85% | -0.89% | -0.81% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c | +2.22% | +2.21% | +2.24% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c | -0.52% | -0.62% | -0.42% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c | +0.42% | +0.39% | +0.45% |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c | +0.49% | +0.48% | +0.50% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c | +0.44% | +0.35% | +0.54% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c | -1.03% | -1.08% | -0.99% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c | +1.57% | +1.19% | +1.93% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c | +0.19% | +0.16% | +0.23% |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c | +1.04% | +0.96% | +1.17% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c | +2.13% | +2.09% | +2.17% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c | +1.82% | +1.78% | +1.85% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c | +0.43% | +0.41% | +0.44% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c | +0.13% | +0.08% | +0.18% |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c | +2.25% | +2.24% | +2.26% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c | +1.70% | +1.65% | +1.75% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c | +8.35% | +8.25% | +8.46% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c | -1.70% | -1.83% | -1.58% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c | +1.49% | +1.42% | +1.55% |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c | +1.21% | +1.20% | +1.23% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_1000c | +0.10% | +0.07% | +0.14% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_16c | -0.43% | -0.44% | -0.41% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_256c | +3.68% | +3.48% | +3.88% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_4c | +0.06% | +0.05% | +0.07% |
| simd_batch_cosine_normalized_query/simd_batch/1024d_64c | -0.10% | -0.12% | -0.09% |
| simd_batch_cosine_normalized_query/simd_batch/384d_1000c | +0.51% | +0.49% | +0.54% |
| simd_batch_cosine_normalized_query/simd_batch/384d_16c | +0.22% | +0.21% | +0.23% |
| simd_batch_cosine_normalized_query/simd_batch/384d_256c | +0.22% | +0.19% | +0.24% |
| simd_batch_cosine_normalized_query/simd_batch/384d_4c | +0.14% | +0.13% | +0.15% |
| simd_batch_cosine_normalized_query/simd_batch/384d_64c | +0.96% | +0.95% | +0.97% |
| simd_batch_cosine_normalized_query/simd_batch/768d_1000c | -0.74% | -0.79% | -0.69% |
| simd_batch_cosine_normalized_query/simd_batch/768d_16c | +1.40% | +1.38% | +1.42% |
| simd_batch_cosine_normalized_query/simd_batch/768d_256c | -1.51% | -1.62% | -1.40% |
| simd_batch_cosine_normalized_query/simd_batch/768d_4c | +0.43% | +0.42% | +0.44% |
| simd_batch_cosine_normalized_query/simd_batch/768d_64c | +0.48% | +0.47% | +0.49% |
| simd_batch_dot_product/scalar_loop/10 | -0.26% | -0.28% | -0.24% |
| simd_batch_dot_product/scalar_loop/100 | +0.38% | +0.35% | +0.41% |
| simd_batch_dot_product/scalar_loop/1000 | -0.54% | -0.60% | -0.50% |
| simd_batch_dot_product/simd_batch/10 | -1.73% | -1.82% | -1.64% |
| simd_batch_dot_product/simd_batch/100 | +0.11% | +0.09% | +0.12% |
| simd_batch_dot_product/simd_batch/1000 | -14.51% | -14.58% | -14.44% |
| simd_cosine_similarity/scalar/1024 | -0.01% | -0.03% | +0.01% |
| simd_cosine_similarity/scalar/1536 | -0.04% | -0.05% | -0.02% |
| simd_cosine_similarity/scalar/384 | -0.22% | -0.26% | -0.20% |
| simd_cosine_similarity/scalar/768 | +0.05% | +0.04% | +0.07% |
| simd_cosine_similarity/simd/1024 | +0.00% | -0.01% | +0.01% |
| simd_cosine_similarity/simd/1536 | -0.11% | -0.13% | -0.09% |
| simd_cosine_similarity/simd/384 | +0.12% | +0.00% | +0.24% |
| simd_cosine_similarity/simd/768 | +0.37% | +0.34% | +0.39% |
| simd_dot_product/scalar/1024 | -0.00% | -0.03% | +0.02% |
| simd_dot_product/scalar/1536 | -0.03% | -0.04% | -0.01% |
| simd_dot_product/scalar/384 | +0.01% | -0.01% | +0.03% |
| simd_dot_product/scalar/768 | +0.02% | +0.01% | +0.03% |
| simd_dot_product/simd/1024 | -4.34% | -4.40% | -4.28% |
| simd_dot_product/simd/1536 | +5.20% | +5.13% | +5.29% |
| simd_dot_product/simd/384 | -0.22% | -0.28% | -0.16% |
| simd_dot_product/simd/768 | +0.01% | -0.11% | +0.14% |
| simd_euclidean_distance/scalar/1024 | +1.04% | +1.00% | +1.07% |
| simd_euclidean_distance/scalar/1536 | +0.14% | +0.11% | +0.18% |
| simd_euclidean_distance/scalar/384 | +0.47% | +0.43% | +0.52% |
| simd_euclidean_distance/scalar/768 | +0.34% | +0.31% | +0.38% |
| simd_euclidean_distance/simd/1024 | +0.13% | +0.11% | +0.16% |
| simd_euclidean_distance/simd/1536 | -0.11% | -0.12% | -0.10% |
| simd_euclidean_distance/simd/384 | -0.67% | -0.70% | -0.64% |
| simd_euclidean_distance/simd/768 | -0.42% | -0.45% | -0.40% |
| simd_normalize/scalar/1024 | +0.45% | +0.16% | +0.80% |
| simd_normalize/scalar/1536 | +0.13% | -0.05% | +0.33% |
| simd_normalize/scalar/384 | -0.05% | -0.43% | +0.33% |
| simd_normalize/scalar/768 | +0.30% | +0.10% | +0.50% |
| simd_normalize/simd/1024 | +1.55% | +0.59% | +2.58% |
| simd_normalize/simd/1536 | +1.46% | +0.66% | +2.27% |
| simd_normalize/simd/384 | +2.75% | +0.82% | +4.65% |
| simd_normalize/simd/768 | +5.59% | +3.52% | +7.85% |
| simd_normalized_cosine_fast_path/cosine_full/1024 | +0.38% | +0.34% | +0.44% |
| simd_normalized_cosine_fast_path/cosine_full/384 | +0.46% | +0.37% | +0.55% |
| simd_normalized_cosine_fast_path/cosine_full/768 | +0.42% | +0.37% | +0.46% |
| simd_normalized_cosine_fast_path/dot_product/1024 | -2.89% | -3.06% | -2.72% |
| simd_normalized_cosine_fast_path/dot_product/384 | -1.12% | -1.23% | -1.01% |
| simd_normalized_cosine_fast_path/dot_product/768 | -3.96% | -4.23% | -3.66% |
| simd_prepared_query_normalized_cosine/dot_product_loop/1024 | +0.07% | -0.02% | +0.15% |
| simd_prepared_query_normalized_cosine/dot_product_loop/384 | +0.71% | +0.66% | +0.75% |
| simd_prepared_query_normalized_cosine/dot_product_loop/768 | +2.17% | +1.68% | +2.81% |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 | +0.13% | +0.08% | +0.18% |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/384 | +0.63% | +0.61% | +0.65% |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/768 | -0.31% | -0.36% | -0.27% |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 | -1.38% | -1.52% | -1.25% |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/384 | +0.78% | +0.73% | +0.84% |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/768 | +0.01% | -0.04% | +0.06% |
| simd_query_batch_dot_product/pair_loop/128d_16c | -0.48% | -0.53% | -0.44% |
| simd_query_batch_dot_product/pair_loop/128d_256c | +1.72% | +1.57% | +1.84% |
| simd_query_batch_dot_product/pair_loop/128d_4c | +0.38% | +0.29% | +0.47% |
| simd_query_batch_dot_product/pair_loop/128d_64c | -0.41% | -0.48% | -0.35% |
| simd_query_batch_dot_product/pair_loop/384d_16c | +2.31% | +2.29% | +2.33% |
| simd_query_batch_dot_product/pair_loop/384d_256c | -0.36% | -0.41% | -0.31% |
| simd_query_batch_dot_product/pair_loop/384d_4c | +0.91% | +0.81% | +1.01% |
| simd_query_batch_dot_product/pair_loop/384d_64c | +3.24% | +3.23% | +3.25% |
| simd_query_batch_dot_product/pair_loop/768d_16c | +0.57% | +0.51% | +0.64% |
| simd_query_batch_dot_product/pair_loop/768d_256c | -6.95% | -7.04% | -6.86% |
| simd_query_batch_dot_product/pair_loop/768d_4c | -0.62% | -0.74% | -0.52% |
| simd_query_batch_dot_product/pair_loop/768d_64c | +0.88% | +0.86% | +0.90% |
| simd_query_batch_dot_product/simd_batch/128d_16c | -0.98% | -1.02% | -0.94% |
| simd_query_batch_dot_product/simd_batch/128d_256c | +0.83% | +0.79% | +0.88% |
| simd_query_batch_dot_product/simd_batch/128d_4c | +0.29% | +0.24% | +0.34% |
| simd_query_batch_dot_product/simd_batch/128d_64c | +1.23% | +1.16% | +1.29% |
| simd_query_batch_dot_product/simd_batch/384d_16c | +1.58% | +1.56% | +1.59% |
| simd_query_batch_dot_product/simd_batch/384d_256c | +0.48% | +0.46% | +0.50% |
| simd_query_batch_dot_product/simd_batch/384d_4c | +0.46% | +0.43% | +0.48% |
| simd_query_batch_dot_product/simd_batch/384d_64c | +1.28% | +1.27% | +1.29% |
| simd_query_batch_dot_product/simd_batch/768d_16c | +3.90% | +3.85% | +3.95% |
| simd_query_batch_dot_product/simd_batch/768d_256c | -0.47% | -0.57% | -0.36% |
| simd_query_batch_dot_product/simd_batch/768d_4c | -0.01% | -0.03% | +0.02% |
| simd_query_batch_dot_product/simd_batch/768d_64c | +1.49% | +1.45% | +1.52% |
| simd_squared_euclidean_fast_path/euclidean_full/1024 | -0.17% | -0.20% | -0.12% |
| simd_squared_euclidean_fast_path/euclidean_full/384 | -1.05% | -1.12% | -1.01% |
| simd_squared_euclidean_fast_path/euclidean_full/768 | -0.74% | -0.76% | -0.73% |
| simd_squared_euclidean_fast_path/squared_euclidean/1024 | -0.35% | -0.37% | -0.33% |
| simd_squared_euclidean_fast_path/squared_euclidean/384 | +0.07% | +0.04% | +0.11% |
| simd_squared_euclidean_fast_path/squared_euclidean/768 | -0.18% | -0.21% | -0.16% |
| simd_throughput_384/cosine_similarity | +0.88% | +0.73% | +1.04% |
| simd_throughput_384/dot_product | -0.21% | -0.38% | -0.05% |
| simd_throughput_384/euclidean_distance | -1.00% | -1.02% | -0.97% |
| simd_throughput_384/normalize | -1.73% | -1.74% | -1.71% |
| softmax_attention/128 | +0.08% | +0.06% | +0.10% |
| softmax_attention/512 | +0.84% | +0.64% | +1.03% |
| tier_prepared_query/binary_query_once_1000 | -0.01% | -0.06% | +0.04% |
| tier_prepared_query/binary_query_per_call_1000 | +0.00% | -0.01% | +0.01% |
| tier_prepared_query/int4_query_once_1000 | +0.85% | +0.79% | +0.93% |
| tier_prepared_query/int4_query_per_call_1000 | +0.39% | +0.38% | +0.40% |
| tier_prepared_query/int8_query_once_1000 | +3.81% | +3.77% | +3.85% |
| tier_prepared_query/int8_query_per_call_1000 | +0.01% | -0.01% | +0.02% |
Rule: CI-lower of change ≤3.0% passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override via PR label bench-allow-regression.
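A minimal sketch of the threshold rule above, assuming the verdict depends only on the lower bound of the 95% CI, expressed in percent. The `Verdict` enum, the constant names, and `classify` are illustrative names, not the bot's actual implementation. The 🚀 WIN verdict and the `bench-allow-regression` label override are left out because the rule text does not state how they are decided.

```rust
/// Outcome of the regression gate for one benchmark (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

// Thresholds taken from the rule text, in percent.
pub const WARN_THRESHOLD_PCT: f64 = 3.0;
pub const FAIL_THRESHOLD_PCT: f64 = 7.0;

/// Classifies a benchmark by the lower bound of its 95% CI (percent change).
/// <= 3.0 passes, (3.0, 7.0] warns, > 7.0 fails.
pub fn classify(ci_lower_pct: f64) -> Verdict {
    if ci_lower_pct > FAIL_THRESHOLD_PCT {
        Verdict::Fail
    } else if ci_lower_pct > WARN_THRESHOLD_PCT {
        Verdict::Warn
    } else {
        Verdict::Pass
    }
}
```

Under this reading, a point estimate above 7.0% does not fail by itself: a row whose point estimate is +7.02% but whose CI-lower is +6.69% is a WARN, which is consistent with the verdicts reported in the tables.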
x86_64-linux — perf regression report
❌ 61 FAIL (regression >7.0% confirmed by 95% CI)
⚠ 46 WARN (regression 3.0-7.0% confirmed)
🚀 78 confirmed improvement
| Bench | Δ point | 95% CI | new ns | base ns | verdict |
|---|---|---|---|---|---|
| int8_quantization/quantize/1536 | +41.21% | [+40.70%, +41.62%] | 11367.8 | 8050.2 | ❌ FAIL |
| int8_quantization/quantize/384 | +40.20% | [+39.89%, +40.56%] | 2830.2 | 2018.6 | ❌ FAIL |
| int8_quantization/quantize/768 | +29.50% | [+29.30%, +29.67%] | 5671.6 | 4379.6 | ❌ FAIL |
| int8_quantization/quantize/1024 | +29.53% | [+29.25%, +29.84%] | 7559.4 | 5836.1 | ❌ FAIL |
| layer_norm/4096 | +26.91% | [+26.34%, +27.63%] | 941.2 | 741.6 | ❌ FAIL |
| simd_dot_product/simd/1024 | +23.49% | [+23.15%, +23.75%] | 79.0 | 64.0 | ❌ FAIL |
| simd_normalized_cosine_fast_path/dot_product/1024 | +22.94% | [+22.78%, +23.13%] | 78.7 | 64.0 | ❌ FAIL |
| simd_normalized_cosine_fast_path/dot_product/768 | +22.70% | [+22.49%, +22.94%] | 60.5 | 49.3 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c | +18.35% | [+18.20%, +18.56%] | 486.3 | 410.9 | ❌ FAIL |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 | +17.39% | [+16.89%, +17.74%] | 77912.6 | 66369.0 | ❌ FAIL |
| simd_query_batch_dot_product/pair_loop/384d_16c | +13.77% | [+13.46%, +14.11%] | 475.6 | 418.0 | ❌ FAIL |
| simd_batch_dot_product/simd_batch/10 | +13.37% | [+13.19%, +13.63%] | 320.7 | 282.9 | ❌ FAIL |
| simd_dot_product/scalar/1536 | +12.76% | [+12.41%, +13.21%] | 1593.9 | 1413.6 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c | +12.58% | [+12.35%, +12.78%] | 1939.5 | 1722.8 | ❌ FAIL |
| simd_normalize/scalar/768 | +12.64% | [+12.35%, +12.99%] | 888.3 | 788.6 | ❌ FAIL |
| simd_euclidean_distance/scalar/1536 | +12.31% | [+12.26%, +12.36%] | 1594.5 | 1419.7 | ❌ FAIL |
| simd_euclidean_distance/scalar/1024 | +12.29% | [+12.09%, +12.58%] | 1057.4 | 941.6 | ❌ FAIL |
| simd_normalize/scalar/1536 | +12.15% | [+11.86%, +12.48%] | 1768.4 | 1576.8 | ❌ FAIL |
| simd_normalize/scalar/384 | +12.23% | [+11.86%, +12.61%] | 444.8 | 396.3 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c | +12.01% | [+11.81%, +12.20%] | 127.6 | 113.9 | ❌ FAIL |
| simd_dot_product/scalar/1024 | +11.93% | [+11.81%, +12.15%] | 1047.0 | 935.4 | ❌ FAIL |
| simd_euclidean_distance/scalar/768 | +11.74% | [+11.65%, +11.82%] | 785.6 | 703.0 | ❌ FAIL |
| gelu/4096 | +12.18% | [+11.57%, +12.88%] | 1836.2 | 1636.9 | ❌ FAIL |
| simd_cosine_similarity/scalar/1536 | +11.76% | [+11.51%, +11.91%] | 4731.5 | 4233.5 | ❌ FAIL |
| rms_norm/896 | +12.08% | [+11.47%, +12.49%] | 233.2 | 208.1 | ❌ FAIL |
| simd_normalize/scalar/1024 | +11.71% | [+11.46%, +11.95%] | 1179.0 | 1055.4 | ❌ FAIL |
| gelu/896 | +11.61% | [+11.40%, +11.83%] | 399.1 | 357.6 | ❌ FAIL |
| softmax_attention/512 | +11.44% | [+11.24%, +11.61%] | 75600.2 | 67838.2 | ❌ FAIL |
| simd_cosine_similarity/scalar/1024 | +11.30% | [+11.23%, +11.37%] | 3112.1 | 2796.0 | ❌ FAIL |
| simd_query_batch_dot_product/pair_loop/384d_64c | +11.14% | [+10.93%, +11.30%] | 1915.4 | 1723.4 | ❌ FAIL |
| simd_query_batch_dot_product/pair_loop/384d_4c | +11.11% | [+10.77%, +11.47%] | 128.8 | 115.9 | ❌ FAIL |
| simd_euclidean_distance/scalar/384 | +10.74% | [+10.68%, +10.81%] | 380.8 | 343.9 | ❌ FAIL |
| simd_cosine_similarity/scalar/768 | +10.92% | [+10.55%, +11.36%] | 2308.1 | 2080.9 | ❌ FAIL |
| silu_inplace/896 | +10.89% | [+10.53%, +11.23%] | 3007.7 | 2712.3 | ❌ FAIL |
| simd_dot_product/scalar/768 | +10.78% | [+9.98%, +11.32%] | 775.1 | 699.7 | ❌ FAIL |
| simd_normalized_cosine_fast_path/cosine_full/1024 | +10.09% | [+9.83%, +10.31%] | 90.8 | 82.5 | ❌ FAIL |
| simd_batch_dot_product/scalar_loop/1000 | +10.11% | [+9.78%, +10.48%] | 371624.9 | 337512.0 | ❌ FAIL |
| simd_dot_product/scalar/384 | +9.81% | [+9.64%, +10.04%] | 370.7 | 337.6 | ❌ FAIL |
| simd_batch_dot_product/scalar_loop/100 | +9.62% | [+9.54%, +9.70%] | 36190.7 | 33015.8 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/384d_64c | +9.74% | [+9.49%, +10.05%] | 1237.3 | 1127.4 | ❌ FAIL |
| rms_norm/4096 | +10.07% | [+9.46%, +10.57%] | 871.2 | 791.5 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c | +9.57% | [+9.44%, +9.68%] | 4143.1 | 3781.3 | ❌ FAIL |
| simd_batch_dot_product/scalar_loop/10 | +9.75% | [+9.39%, +9.99%] | 3617.4 | 3296.1 | ❌ FAIL |
| simd_cosine_similarity/scalar/384 | +9.32% | [+9.25%, +9.39%] | 1088.6 | 995.8 | ❌ FAIL |
| silu_inplace/4096 | +10.16% | [+9.23%, +10.84%] | 13742.4 | 12474.7 | ❌ FAIL |
| simd_euclidean_distance/simd/1536 | +9.31% | [+9.10%, +9.47%] | 102.8 | 94.1 | ❌ FAIL |
| simd_batch_cosine/scalar_loop/100 | +8.95% | [+8.87%, +9.05%] | 107378.3 | 98553.8 | ❌ FAIL |
| simd_batch_cosine/scalar_loop/10 | +8.93% | [+8.80%, +9.10%] | 10746.7 | 9866.0 | ❌ FAIL |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c | +9.18% | [+8.79%, +9.67%] | 74317.3 | 68067.9 | ❌ FAIL |
| simd_cosine_similarity/simd/768 | +8.95% | [+8.75%, +9.22%] | 72.4 | 66.5 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/768d_64c | +9.08% | [+8.73%, +9.35%] | 2380.6 | 2182.4 | ❌ FAIL |
| simd_batch_cosine/scalar_loop/1000 | +8.79% | [+8.73%, +8.84%] | 1076858.4 | 989878.0 | ❌ FAIL |
| softmax_attention/128 | +8.82% | [+8.69%, +8.93%] | 4794.5 | 4405.9 | ❌ FAIL |
| add_bias_gelu/896 | +8.69% | [+8.41%, +8.97%] | 417.1 | 383.8 | ❌ FAIL |
| int8_vs_float32_cosine/float32_simd/1536 | +8.75% | [+8.31%, +9.28%] | 119.9 | 110.2 | ❌ FAIL |
| simd_normalized_cosine_fast_path/cosine_full/768 | +8.49% | [+8.23%, +8.80%] | 72.5 | 66.8 | ❌ FAIL |
| simd_query_batch_dot_product/pair_loop/128d_4c | +8.55% | [+8.19%, +8.88%] | 69.7 | 64.2 | ❌ FAIL |
| add_bias_gelu/4096 | +8.23% | [+8.02%, +8.40%] | 1914.8 | 1769.3 | ❌ FAIL |
| simd_throughput_384/normalize | +8.17% | [+7.99%, +8.37%] | 114.4 | 105.8 | ❌ FAIL |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/384 | +7.59% | [+7.15%, +8.02%] | 30758.9 | 28588.6 | ❌ FAIL |
| simd_squared_euclidean_fast_path/squared_euclidean/1024 | +7.69% | [+7.06%, +8.55%] | 67.3 | 62.5 | ❌ FAIL |
| simd_query_batch_dot_product/simd_batch/128d_4c | +7.02% | [+6.69%, +7.25%] | 46.3 | 43.3 | ⚠ WARN |
| simd_squared_euclidean_fast_path/euclidean_full/1024 | +6.82% | [+6.59%, +6.99%] | 71.7 | 67.1 | ⚠ WARN |
| tier_prepared_query/int8_query_per_call_1000 | +6.42% | [+6.33%, +6.50%] | 2346822.0 | 2205337.5 | ⚠ WARN |
| int8_prepared_dot_product/per_call/1024 | +6.77% | [+6.24%, +7.28%] | 6243.9 | 5847.8 | ⚠ WARN |
| int8_prepared_dot_product/per_call/768 | +6.43% | [+6.23%, +6.75%] | 4674.9 | 4392.5 | ⚠ WARN |
| simd_prepared_query_normalized_cosine/prepared_meta_unit/768 | +6.60% | [+6.17%, +7.00%] | 59515.4 | 55829.2 | ⚠ WARN |
| int4_cosine_distance/float32_simd/1536 | +6.65% | [+6.14%, +7.31%] | 121.5 | 113.9 | ⚠ WARN |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c | +6.26% | [+6.06%, +6.50%] | 31584.2 | 29723.4 | ⚠ WARN |
| int8_prepared_dot_product/per_call/129 | +6.14% | [+5.92%, +6.45%] | 790.1 | 744.4 | ⚠ WARN |
| int8_prepared_dot_product/per_call/128 | +6.10% | [+5.74%, +6.40%] | 787.5 | 742.2 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c | +5.81% | [+5.71%, +5.91%] | 5415.5 | 5118.3 | ⚠ WARN |
| simd_query_batch_dot_product/simd_batch/768d_16c | +5.84% | [+5.67%, +6.07%] | 601.1 | 567.9 | ⚠ WARN |
| int8_prepared_dot_product/per_call/384 | +6.14% | [+5.66%, +6.56%] | 2341.9 | 2206.5 | ⚠ WARN |
| binary_cosine_distance/float32_simd/1536 | +5.83% | [+5.56%, +6.07%] | 120.6 | 114.0 | ⚠ WARN |
| simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c | +5.76% | [+5.52%, +5.91%] | 1026.2 | 970.3 | ⚠ WARN |
| simd_cosine_similarity/simd/1536 | +5.54% | [+5.32%, +5.71%] | 119.6 | 113.4 | ⚠ WARN |
| int8_raw_dot_product/dot_product_i8_raw/127 | +6.39% | [+5.31%, +7.85%] | 14.4 | 13.6 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c | +5.46% | [+5.12%, +5.69%] | 1354.4 | 1284.3 | ⚠ WARN |
| binary_cosine_distance/float32_simd/1024 | +5.33% | [+5.11%, +5.52%] | 86.8 | 82.4 | ⚠ WARN |
| int8_prepared_dot_product/per_call/127 | +5.27% | [+5.09%, +5.49%] | 778.0 | 739.0 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c | +5.29% | [+5.04%, +5.51%] | 5457.5 | 5183.4 | ⚠ WARN |
| simd_query_batch_dot_product/pair_loop/128d_64c | +5.28% | [+4.99%, +5.48%] | 848.7 | 806.1 | ⚠ WARN |
| int8_vs_float32_cosine/int8/1536 | +5.26% | [+4.96%, +5.52%] | 51.7 | 49.2 | ⚠ WARN |
| simd_squared_euclidean_fast_path/euclidean_full/768 | +5.12% | [+4.89%, +5.34%] | 56.8 | 54.0 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c | +4.78% | [+4.63%, +5.02%] | 1364.1 | 1301.9 | ⚠ WARN |
| simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 | +4.72% | [+4.51%, +4.99%] | 90966.0 | 86865.8 | ⚠ WARN |
| int8_raw_dot_product/dot_product_i8_raw/384 | +4.79% | [+4.42%, +5.17%] | 13.6 | 13.0 | ⚠ WARN |
| simd_euclidean_distance/simd/768 | +4.56% | [+4.40%, +4.71%] | 56.9 | 54.4 | ⚠ WARN |
| simd_query_batch_dot_product/pair_loop/128d_256c | +4.83% | [+4.40%, +5.31%] | 3297.6 | 3145.6 | ⚠ WARN |
| tier_prepared_query/int4_query_once_1000 | +4.81% | [+4.40%, +5.09%] | 1463454.3 | 1396312.4 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c | +4.69% | [+4.37%, +4.91%] | 338.3 | 323.2 | ⚠ WARN |
| simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c | +4.60% | [+4.33%, +4.97%] | 7626.8 | 7291.2 | ⚠ WARN |
| int4_cosine_distance/float32_simd/1024 | +4.64% | [+4.25%, +5.11%] | 87.2 | 83.3 | ⚠ WARN |
| simd_squared_euclidean_fast_path/squared_euclidean/768 | +4.99% | [+4.25%, +5.64%] | 52.0 | 49.5 | ⚠ WARN |
| int8_vs_float32_cosine/float32_simd/1024 | +4.29% | [+4.15%, +4.44%] | 85.6 | 82.0 | ⚠ WARN |
| simd_query_batch_dot_product/pair_loop/384d_256c | +4.51% | [+4.12%, +4.81%] | 7660.2 | 7329.8 | ⚠ WARN |
| simd_query_batch_dot_product/simd_batch/384d_16c | +4.21% | [+4.00%, +4.49%] | 258.3 | 247.9 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c | +4.09% | [+3.95%, +4.21%] | 344.2 | 330.6 | ⚠ WARN |
| tier_prepared_query/int4_query_per_call_1000 | +4.13% | [+3.85%, +4.43%] | 3783864.7 | 3633651.2 | ⚠ WARN |
| binary_cosine_distance/float32_simd/768 | +4.00% | [+3.83%, +4.15%] | 69.9 | 67.3 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c | +5.48% | [+3.83%, +7.62%] | 23253.8 | 22046.4 | ⚠ WARN |
| simd_query_batch_dot_product/pair_loop/128d_16c | +4.01% | [+3.71%, +4.24%] | 216.8 | 208.4 | ⚠ WARN |
| int4_cosine_distance/float32_simd/768 | +3.88% | [+3.69%, +4.04%] | 69.9 | 67.3 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c | +3.62% | [+3.35%, +3.85%] | 89406.8 | 86284.7 | ⚠ WARN |
| simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c | +3.36% | [+3.15%, +3.52%] | 90435.0 | 87496.9 | ⚠ WARN |
| int8_vs_float32_cosine/float32_simd/768 | +3.20% | [+3.02%, +3.41%] | 68.9 | 66.8 | ⚠ WARN |
| simd_batch_cosine_normalized_query/simd_batch/768d_64c | -3.11% | [-3.25%, -2.91%] | 4142.1 | 4274.9 | 🚀 WIN |
| simd_normalized_cosine_fast_path/cosine_full/384 | -3.09% | [-3.30%, -2.84%] | 44.0 | 45.4 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c | -3.20% | [-3.45%, -3.04%] | 4182.1 | 4320.4 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_4c | -3.37% | [-3.54%, -3.23%] | 165.1 | 170.8 | 🚀 WIN |
| int8_raw_dot_product/dot_product_i8/127 | -3.31% | [-3.61%, -2.92%] | 16.4 | 17.0 | 🚀 WIN |
| int8_raw_dot_product/dot_product_i8_raw/768 | -3.37% | [-3.76%, -2.94%] | 24.2 | 25.0 | 🚀 WIN |
| int8_batch_cosine/float32_simd/10 | -3.25% | [-3.80%, -2.62%] | 411.9 | 425.8 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c | -3.42% | [-3.82%, -3.09%] | 266.6 | 276.0 | 🚀 WIN |
| simd_throughput_384/dot_product | -3.87% | [-4.12%, -3.64%] | 27.5 | 28.6 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/simd_batch/384d_16c | -4.40% | [-4.54%, -4.25%] | 624.7 | 653.5 | 🚀 WIN |
| simd_batch_cosine_normalized_query/simd_batch/768d_16c | -4.39% | [-4.59%, -4.21%] | 1037.2 | 1084.9 | 🚀 WIN |
| simd_batch_cosine_normalized_query/simd_batch/384d_16c | -4.56% | [-4.69%, -4.45%] | 623.4 | 653.2 | 🚀 WIN |
| simd_batch_cosine_normalized_query/simd_batch/768d_1000c | -4.36% | [-4.72%, -4.02%] | 67962.1 | 71060.2 | 🚀 WIN |
| int8_prepared_dot_product/prepared/127 | -4.37% | [-4.81%, -3.92%] | 16.1 | 16.9 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c | -4.43% | [-4.88%, -4.08%] | 68848.4 | 72041.3 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c | -4.74% | [-4.95%, -4.49%] | 40980.7 | 43021.0 | 🚀 WIN |
| memory_size/search_1000_int8 | -4.81% | [-5.06%, -4.49%] | 15652.5 | 16442.6 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c | -4.98% | [-5.11%, -4.83%] | 1047.4 | 1102.3 | 🚀 WIN |
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c | -4.87% | [-5.14%, -4.56%] | 56571.8 | 59466.9 | 🚀 WIN |
| simd_dot_product/simd/384 | -4.97% | [-5.20%, -4.74%] | 27.2 | 28.7 | 🚀 WIN |
| int8_batch_cosine/int8_loop/10 | -5.03% | [-5.25%, -4.80%] | 169.1 | 178.0 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_4c | -4.79% | [-5.25%, -4.31%] | 169.3 | 177.8 | 🚀 WIN |
| simd_batch_cosine_non_normalized_query/pair_loop/384d_64c | -5.15% | [-5.34%, -4.94%] | 2534.0 | 2671.6 | 🚀 WIN |
| simd_batch_dot_product/simd_batch/100 | -5.15% | [-5.35%, -4.91%] | 3612.5 | 3808.6 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c |
-5.25% | [-5.43%, -5.04%] | 2469.4 | 2606.2 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c |
-5.26% | [-5.55%, -4.94%] | 640.0 | 675.5 | 🚀 WIN |
tier_prepared_query/int8_query_once_1000 |
-5.09% | [-5.63%, -4.33%] | 17353.5 | 18283.3 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/384d_64c |
-5.58% | [-5.71%, -5.38%] | 2471.6 | 2617.6 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c |
-5.30% | [-5.72%, -4.93%] | 640.4 | 676.2 | 🚀 WIN |
int8_raw_dot_product/dot_product_i8_raw/129 |
-5.39% | [-5.84%, -4.98%] | 7.2 | 7.7 | 🚀 WIN |
simd_batch_cosine/simd_batch/10 |
-5.62% | [-5.96%, -5.21%] | 403.4 | 427.4 | 🚀 WIN |
simd_query_batch_dot_product/pair_loop/768d_4c |
-5.74% | [-5.99%, -5.51%] | 212.7 | 225.7 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c |
-5.87% | [-6.02%, -5.74%] | 2529.2 | 2686.8 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c |
-5.58% | [-6.09%, -5.20%] | 208.2 | 220.5 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c |
-6.02% | [-6.13%, -5.91%] | 167.8 | 178.5 | 🚀 WIN |
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 |
-5.89% | [-6.29%, -5.57%] | 42131.1 | 44768.7 | 🚀 WIN |
simd_query_batch_dot_product/simd_batch/128d_16c |
-5.97% | [-6.33%, -5.68%] | 122.4 | 130.1 | 🚀 WIN |
simd_query_batch_dot_product/pair_loop/768d_64c |
-6.13% | [-6.38%, -5.93%] | 3271.3 | 3485.0 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c |
-6.33% | [-6.54%, -6.07%] | 16771.6 | 17904.7 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/384d_1000c |
-6.41% | [-6.66%, -6.13%] | 39334.0 | 42030.0 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c |
-6.36% | [-6.72%, -6.07%] | 823.7 | 879.7 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c |
-6.24% | [-6.85%, -5.78%] | 3262.9 | 3479.9 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/384d_256c |
-6.64% | [-6.87%, -6.46%] | 9793.8 | 10489.9 | 🚀 WIN |
memory_size/search_1000_float32 |
-6.69% | [-6.90%, -6.45%] | 40044.7 | 42915.4 | 🚀 WIN |
simd_query_batch_dot_product/pair_loop/768d_16c |
-6.94% | [-7.01%, -6.86%] | 835.1 | 897.3 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c |
-6.36% | [-7.03%, -5.64%] | 40913.2 | 43693.8 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c |
-7.03% | [-7.18%, -6.83%] | 9777.0 | 10516.6 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c |
-6.80% | [-7.30%, -6.48%] | 16619.9 | 17831.9 | 🚀 WIN |
int8_batch_cosine/int8_loop/100 |
-6.84% | [-7.35%, -6.39%] | 1696.4 | 1821.0 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c |
-7.84% | [-8.09%, -7.59%] | 10017.4 | 10869.1 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c |
-7.03% | [-8.22%, -6.14%] | 39468.7 | 42451.2 | 🚀 WIN |
int8_raw_dot_product/dot_product_i8_raw/128 |
-8.18% | [-8.80%, -7.61%] | 6.7 | 7.3 | 🚀 WIN |
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c |
-8.64% | [-8.82%, -8.43%] | 10030.0 | 10978.5 | 🚀 WIN |
int8_prepared_dot_product/prepared/768 |
-8.71% | [-8.94%, -8.53%] | 26.9 | 29.4 | 🚀 WIN |
simd_batch_dot_product/simd_batch/1000 |
-8.64% | [-8.95%, -8.35%] | 50191.6 | 54941.1 | 🚀 WIN |
simd_query_batch_dot_product/simd_batch/384d_256c |
-8.36% | [-9.19%, -7.23%] | 4941.7 | 5392.5 | 🚀 WIN |
simd_prepared_query_normalized_cosine/dot_product_loop/384 |
-9.01% | [-9.20%, -8.78%] | 30782.2 | 33831.0 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c |
-9.31% | [-9.47%, -9.09%] | 16765.9 | 18486.4 | 🚀 WIN |
int8_raw_dot_product/dot_product_i8/768 |
-9.38% | [-9.60%, -9.17%] | 26.6 | 29.4 | 🚀 WIN |
simd_batch_cosine_normalized_query/simd_batch/768d_256c |
-9.38% | [-9.60%, -9.17%] | 16641.4 | 18363.5 | 🚀 WIN |
simd_euclidean_distance/simd/1024 |
-10.06% | [-10.20%, -9.88%] | 71.8 | 79.8 | 🚀 WIN |
int8_raw_dot_product/dot_product_i8/128 |
-10.71% | [-11.11%, -10.35%] | 8.9 | 10.0 | 🚀 WIN |
simd_normalize/simd/384 |
-6.04% | [-11.13%, -0.00%] | 73.1 | 77.8 | 🚀 WIN |
int8_batch_cosine/int8_loop/1000 |
-10.92% | [-11.27%, -10.62%] | 16699.2 | 18746.0 | 🚀 WIN |
int8_raw_dot_product/dot_product_i8/129 |
-10.83% | [-11.38%, -10.35%] | 9.4 | 10.6 | 🚀 WIN |
int8_vs_float32_cosine/int8/768 |
-13.62% | [-13.78%, -13.36%] | 29.2 | 33.8 | 🚀 WIN |
int8_prepared_dot_product/prepared/129 |
-15.16% | [-15.50%, -14.78%] | 9.1 | 10.8 | 🚀 WIN |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c |
-15.69% | [-15.90%, -15.47%] | 13033.5 | 15459.2 | 🚀 WIN |
int8_prepared_dot_product/prepared/128 |
-15.72% | [-16.00%, -15.36%] | 8.6 | 10.2 | 🚀 WIN |
simd_euclidean_distance/simd/384 |
-16.54% | [-17.19%, -15.68%] | 29.1 | 34.8 | 🚀 WIN |
layer_norm/896 |
-17.47% | [-17.64%, -17.26%] | 199.9 | 242.2 | 🚀 WIN |
simd_query_batch_dot_product/pair_loop/768d_256c |
-17.91% | [-18.21%, -17.66%] | 12997.9 | 15834.6 | 🚀 WIN |
simd_normalize/simd/768 |
-15.33% | [-18.30%, -12.22%] | 111.5 | 131.7 | 🚀 WIN |
simd_normalized_cosine_fast_path/dot_product/384 |
-18.90% | [-19.08%, -18.72%] | 27.2 | 33.6 | 🚀 WIN |
simd_query_batch_dot_product/simd_batch/768d_256c |
-19.59% | [-19.82%, -19.36%] | 9493.6 | 11806.9 | 🚀 WIN |
simd_normalize/simd/1536 |
-17.37% | [-20.36%, -14.33%] | 194.6 | 235.5 | 🚀 WIN |
elementwise_mul/4096 |
-20.68% | [-21.02%, -20.10%] | 254.8 | 321.2 | 🚀 WIN |
simd_normalize/simd/1024 |
-19.78% | [-23.33%, -16.09%] | 141.3 | 176.2 | 🚀 WIN |
All 247 measurements
| Bench | Δ point | CI-lower | CI-upper |
|---|---|---|---|
add_bias_gelu/4096 |
+8.23% | +8.02% | +8.40% |
add_bias_gelu/896 |
+8.69% | +8.41% | +8.97% |
binary_cosine_distance/binary/1024 |
+0.85% | +0.72% | +1.00% |
binary_cosine_distance/binary/1536 |
+6.32% | +2.95% | +10.55% |
binary_cosine_distance/binary/384 |
-0.53% | -0.74% | -0.37% |
binary_cosine_distance/binary/768 |
+0.47% | +0.26% | +0.77% |
binary_cosine_distance/float32_simd/1024 |
+5.33% | +5.11% | +5.52% |
binary_cosine_distance/float32_simd/1536 |
+5.83% | +5.56% | +6.07% |
binary_cosine_distance/float32_simd/384 |
-0.36% | -0.59% | -0.14% |
binary_cosine_distance/float32_simd/768 |
+4.00% | +3.83% | +4.15% |
elementwise_mul/4096 |
-20.68% | -21.02% | -20.10% |
gelu/4096 |
+12.18% | +11.57% | +12.88% |
gelu/896 |
+11.61% | +11.40% | +11.83% |
int4_cosine_distance/float32_simd/1024 |
+4.64% | +4.25% | +5.11% |
int4_cosine_distance/float32_simd/1536 |
+6.65% | +6.14% | +7.31% |
int4_cosine_distance/float32_simd/384 |
+2.76% | +2.28% | +3.17% |
int4_cosine_distance/float32_simd/768 |
+3.88% | +3.69% | +4.04% |
int4_cosine_distance/int4/1024 |
+2.24% | +1.82% | +2.53% |
int4_cosine_distance/int4/1536 |
+2.20% | +1.77% | +2.53% |
int4_cosine_distance/int4/384 |
+1.71% | +1.16% | +2.35% |
int4_cosine_distance/int4/768 |
+2.30% | +1.96% | +2.56% |
int8_batch_cosine/float32_simd/10 |
-3.25% | -3.80% | -2.62% |
int8_batch_cosine/float32_simd/100 |
-2.61% | -2.85% | -2.30% |
int8_batch_cosine/float32_simd/1000 |
+3.32% | +2.69% | +3.85% |
int8_batch_cosine/int8_loop/10 |
-5.03% | -5.25% | -4.80% |
int8_batch_cosine/int8_loop/100 |
-6.84% | -7.35% | -6.39% |
int8_batch_cosine/int8_loop/1000 |
-10.92% | -11.27% | -10.62% |
int8_prepared_dot_product/per_call/1024 |
+6.77% | +6.24% | +7.28% |
int8_prepared_dot_product/per_call/127 |
+5.27% | +5.09% | +5.49% |
int8_prepared_dot_product/per_call/128 |
+6.10% | +5.74% | +6.40% |
int8_prepared_dot_product/per_call/129 |
+6.14% | +5.92% | +6.45% |
int8_prepared_dot_product/per_call/384 |
+6.14% | +5.66% | +6.56% |
int8_prepared_dot_product/per_call/768 |
+6.43% | +6.23% | +6.75% |
int8_prepared_dot_product/prepared/1024 |
+0.98% | +0.75% | +1.22% |
int8_prepared_dot_product/prepared/127 |
-4.37% | -4.81% | -3.92% |
int8_prepared_dot_product/prepared/128 |
-15.72% | -16.00% | -15.36% |
int8_prepared_dot_product/prepared/129 |
-15.16% | -15.50% | -14.78% |
int8_prepared_dot_product/prepared/384 |
+1.68% | +1.21% | +2.13% |
int8_prepared_dot_product/prepared/768 |
-8.71% | -8.94% | -8.53% |
int8_quantization/quantize/1024 |
+29.53% | +29.25% | +29.84% |
int8_quantization/quantize/1536 |
+41.21% | +40.70% | +41.62% |
int8_quantization/quantize/384 |
+40.20% | +39.89% | +40.56% |
int8_quantization/quantize/768 |
+29.50% | +29.30% | +29.67% |
int8_raw_dot_product/dot_product_i8/1024 |
+0.69% | -0.06% | +1.73% |
int8_raw_dot_product/dot_product_i8/127 |
-3.31% | -3.61% | -2.92% |
int8_raw_dot_product/dot_product_i8/128 |
-10.71% | -11.11% | -10.35% |
int8_raw_dot_product/dot_product_i8/129 |
-10.83% | -11.38% | -10.35% |
int8_raw_dot_product/dot_product_i8/384 |
+1.20% | +0.34% | +1.95% |
int8_raw_dot_product/dot_product_i8/768 |
-9.38% | -9.60% | -9.17% |
int8_raw_dot_product/dot_product_i8_raw/1024 |
-0.57% | -1.03% | +0.05% |
int8_raw_dot_product/dot_product_i8_raw/127 |
+6.39% | +5.31% | +7.85% |
int8_raw_dot_product/dot_product_i8_raw/128 |
-8.18% | -8.80% | -7.61% |
int8_raw_dot_product/dot_product_i8_raw/129 |
-5.39% | -5.84% | -4.98% |
int8_raw_dot_product/dot_product_i8_raw/384 |
+4.79% | +4.42% | +5.17% |
int8_raw_dot_product/dot_product_i8_raw/768 |
-3.37% | -3.76% | -2.94% |
int8_vs_float32_cosine/float32_simd/1024 |
+4.29% | +4.15% | +4.44% |
int8_vs_float32_cosine/float32_simd/1536 |
+8.75% | +8.31% | +9.28% |
int8_vs_float32_cosine/float32_simd/384 |
-1.00% | -1.24% | -0.76% |
int8_vs_float32_cosine/float32_simd/768 |
+3.20% | +3.02% | +3.41% |
int8_vs_float32_cosine/int8/1024 |
-0.13% | -1.25% | +0.81% |
int8_vs_float32_cosine/int8/1536 |
+5.26% | +4.96% | +5.52% |
int8_vs_float32_cosine/int8/384 |
-1.06% | -3.58% | +0.82% |
int8_vs_float32_cosine/int8/768 |
-13.62% | -13.78% | -13.36% |
layer_norm/4096 |
+26.91% | +26.34% | +27.63% |
layer_norm/896 |
-17.47% | -17.64% | -17.26% |
memory_size/search_1000_float32 |
-6.69% | -6.90% | -6.45% |
memory_size/search_1000_int8 |
-4.81% | -5.06% | -4.49% |
rms_norm/4096 |
+10.07% | +9.46% | +10.57% |
rms_norm/896 |
+12.08% | +11.47% | +12.49% |
silu_inplace/4096 |
+10.16% | +9.23% | +10.84% |
silu_inplace/896 |
+10.89% | +10.53% | +11.23% |
simd_batch_cosine/scalar_loop/10 |
+8.93% | +8.80% | +9.10% |
simd_batch_cosine/scalar_loop/100 |
+8.95% | +8.87% | +9.05% |
simd_batch_cosine/scalar_loop/1000 |
+8.79% | +8.73% | +8.84% |
simd_batch_cosine/simd_batch/10 |
-5.62% | -5.96% | -5.21% |
simd_batch_cosine/simd_batch/100 |
-2.04% | -2.22% | -1.80% |
simd_batch_cosine/simd_batch/1000 |
+2.71% | +2.23% | +3.13% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c |
+3.36% | +3.15% | +3.52% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c |
+4.78% | +4.63% | +5.02% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c |
+5.48% | +3.83% | +7.62% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c |
+4.09% | +3.95% | +4.21% |
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c |
+5.29% | +5.04% | +5.51% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c |
-4.74% | -4.95% | -4.49% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c |
-5.26% | -5.55% | -4.94% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c |
-8.64% | -8.82% | -8.43% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c |
-4.79% | -5.25% | -4.31% |
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c |
-5.15% | -5.34% | -4.94% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c |
-0.22% | -0.56% | +0.15% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c |
-1.40% | -1.56% | -1.19% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c |
-6.33% | -6.54% | -6.07% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c |
-2.46% | -2.71% | -2.23% |
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c |
-0.72% | -0.86% | -0.61% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c |
+3.62% | +3.35% | +3.85% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c |
+5.46% | +5.12% | +5.69% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c |
+2.61% | +2.38% | +2.77% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c |
+4.69% | +4.37% | +4.91% |
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c |
+5.81% | +5.71% | +5.91% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c |
-7.03% | -8.22% | -6.14% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c |
-4.40% | -4.54% | -4.25% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c |
-7.03% | -7.18% | -6.83% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c |
-3.37% | -3.54% | -3.23% |
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c |
-5.25% | -5.43% | -5.04% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c |
-1.47% | -1.79% | -1.18% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c |
-0.65% | -0.85% | -0.44% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c |
-6.80% | -7.30% | -6.48% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c |
+0.77% | +0.49% | +1.05% |
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c |
+0.12% | -0.09% | +0.26% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c |
+1.02% | +0.72% | +1.36% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c |
+0.16% | +0.07% | +0.25% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c |
+1.24% | +0.20% | +2.49% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c |
+0.29% | +0.01% | +0.59% |
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c |
+1.54% | +1.34% | +1.87% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c |
-6.36% | -7.03% | -5.64% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c |
-5.30% | -5.72% | -4.93% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c |
-7.84% | -8.09% | -7.59% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c |
-6.02% | -6.13% | -5.91% |
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c |
-5.87% | -6.02% | -5.74% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c |
-4.43% | -4.88% | -4.08% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c |
-4.98% | -5.11% | -4.83% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c |
-9.31% | -9.47% | -9.09% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c |
-3.42% | -3.82% | -3.09% |
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c |
-3.20% | -3.45% | -3.04% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c |
+9.18% | +8.79% | +9.67% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c |
+5.76% | +5.52% | +5.91% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c |
-2.09% | -2.36% | -1.79% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c |
+2.43% | +2.03% | +2.77% |
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c |
+9.57% | +9.44% | +9.68% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c |
+6.26% | +6.06% | +6.50% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c |
+18.35% | +18.20% | +18.56% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c |
+4.60% | +4.33% | +4.97% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c |
+12.01% | +11.81% | +12.20% |
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c |
+12.58% | +12.35% | +12.78% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c |
-4.87% | -5.14% | -4.56% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c |
-6.36% | -6.72% | -6.07% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c |
-15.69% | -15.90% | -15.47% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c |
-5.58% | -6.09% | -5.20% |
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c |
-6.24% | -6.85% | -5.78% |
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c |
+1.29% | +1.17% | +1.43% |
simd_batch_cosine_normalized_query/simd_batch/1024d_16c |
+0.76% | +0.41% | +0.99% |
simd_batch_cosine_normalized_query/simd_batch/1024d_256c |
+0.58% | +0.40% | +0.81% |
simd_batch_cosine_normalized_query/simd_batch/1024d_4c |
+2.55% | +2.40% | +2.74% |
simd_batch_cosine_normalized_query/simd_batch/1024d_64c |
+3.75% | +2.60% | +5.32% |
simd_batch_cosine_normalized_query/simd_batch/384d_1000c |
-6.41% | -6.66% | -6.13% |
simd_batch_cosine_normalized_query/simd_batch/384d_16c |
-4.56% | -4.69% | -4.45% |
simd_batch_cosine_normalized_query/simd_batch/384d_256c |
-6.64% | -6.87% | -6.46% |
simd_batch_cosine_normalized_query/simd_batch/384d_4c |
-2.54% | -2.69% | -2.30% |
simd_batch_cosine_normalized_query/simd_batch/384d_64c |
-5.58% | -5.71% | -5.38% |
simd_batch_cosine_normalized_query/simd_batch/768d_1000c |
-4.36% | -4.72% | -4.02% |
simd_batch_cosine_normalized_query/simd_batch/768d_16c |
-4.39% | -4.59% | -4.21% |
simd_batch_cosine_normalized_query/simd_batch/768d_256c |
-9.38% | -9.60% | -9.17% |
simd_batch_cosine_normalized_query/simd_batch/768d_4c |
-1.63% | -1.86% | -1.46% |
simd_batch_cosine_normalized_query/simd_batch/768d_64c |
-3.11% | -3.25% | -2.91% |
simd_batch_dot_product/scalar_loop/10 |
+9.75% | +9.39% | +9.99% |
simd_batch_dot_product/scalar_loop/100 |
+9.62% | +9.54% | +9.70% |
simd_batch_dot_product/scalar_loop/1000 |
+10.11% | +9.78% | +10.48% |
simd_batch_dot_product/simd_batch/10 |
+13.37% | +13.19% | +13.63% |
simd_batch_dot_product/simd_batch/100 |
-5.15% | -5.35% | -4.91% |
simd_batch_dot_product/simd_batch/1000 |
-8.64% | -8.95% | -8.35% |
simd_cosine_similarity/scalar/1024 |
+11.30% | +11.23% | +11.37% |
simd_cosine_similarity/scalar/1536 |
+11.76% | +11.51% | +11.91% |
simd_cosine_similarity/scalar/384 |
+9.32% | +9.25% | +9.39% |
simd_cosine_similarity/scalar/768 |
+10.92% | +10.55% | +11.36% |
simd_cosine_similarity/simd/1024 |
-0.35% | -0.47% | -0.24% |
simd_cosine_similarity/simd/1536 |
+5.54% | +5.32% | +5.71% |
simd_cosine_similarity/simd/384 |
-1.14% | -1.37% | -0.88% |
simd_cosine_similarity/simd/768 |
+8.95% | +8.75% | +9.22% |
simd_dot_product/scalar/1024 |
+11.93% | +11.81% | +12.15% |
simd_dot_product/scalar/1536 |
+12.76% | +12.41% | +13.21% |
simd_dot_product/scalar/384 |
+9.81% | +9.64% | +10.04% |
simd_dot_product/scalar/768 |
+10.78% | +9.98% | +11.32% |
simd_dot_product/simd/1024 |
+23.49% | +23.15% | +23.75% |
simd_dot_product/simd/1536 |
+2.36% | +2.18% | +2.60% |
simd_dot_product/simd/384 |
-4.97% | -5.20% | -4.74% |
simd_dot_product/simd/768 |
+3.11% | +2.93% | +3.33% |
simd_euclidean_distance/scalar/1024 |
+12.29% | +12.09% | +12.58% |
simd_euclidean_distance/scalar/1536 |
+12.31% | +12.26% | +12.36% |
simd_euclidean_distance/scalar/384 |
+10.74% | +10.68% | +10.81% |
simd_euclidean_distance/scalar/768 |
+11.74% | +11.65% | +11.82% |
simd_euclidean_distance/simd/1024 |
-10.06% | -10.20% | -9.88% |
simd_euclidean_distance/simd/1536 |
+9.31% | +9.10% | +9.47% |
simd_euclidean_distance/simd/384 |
-16.54% | -17.19% | -15.68% |
simd_euclidean_distance/simd/768 |
+4.56% | +4.40% | +4.71% |
simd_normalize/scalar/1024 |
+11.71% | +11.46% | +11.95% |
simd_normalize/scalar/1536 |
+12.15% | +11.86% | +12.48% |
simd_normalize/scalar/384 |
+12.23% | +11.86% | +12.61% |
simd_normalize/scalar/768 |
+12.64% | +12.35% | +12.99% |
simd_normalize/simd/1024 |
-19.78% | -23.33% | -16.09% |
simd_normalize/simd/1536 |
-17.37% | -20.36% | -14.33% |
simd_normalize/simd/384 |
-6.04% | -11.13% | -0.00% |
simd_normalize/simd/768 |
-15.33% | -18.30% | -12.22% |
simd_normalized_cosine_fast_path/cosine_full/1024 |
+10.09% | +9.83% | +10.31% |
simd_normalized_cosine_fast_path/cosine_full/384 |
-3.09% | -3.30% | -2.84% |
simd_normalized_cosine_fast_path/cosine_full/768 |
+8.49% | +8.23% | +8.80% |
simd_normalized_cosine_fast_path/dot_product/1024 |
+22.94% | +22.78% | +23.13% |
simd_normalized_cosine_fast_path/dot_product/384 |
-18.90% | -19.08% | -18.72% |
simd_normalized_cosine_fast_path/dot_product/768 |
+22.70% | +22.49% | +22.94% |
simd_prepared_query_normalized_cosine/dot_product_loop/1024 |
-2.14% | -2.48% | -1.79% |
simd_prepared_query_normalized_cosine/dot_product_loop/384 |
-9.01% | -9.20% | -8.78% |
simd_prepared_query_normalized_cosine/dot_product_loop/768 |
-1.53% | -1.87% | -1.19% |
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 |
+4.72% | +4.51% | +4.99% |
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 |
-5.89% | -6.29% | -5.57% |
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 |
-0.90% | -1.22% | -0.62% |
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 |
+17.39% | +16.89% | +17.74% |
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 |
+7.59% | +7.15% | +8.02% |
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 |
+6.60% | +6.17% | +7.00% |
simd_query_batch_dot_product/pair_loop/128d_16c |
+4.01% | +3.71% | +4.24% |
simd_query_batch_dot_product/pair_loop/128d_256c |
+4.83% | +4.40% | +5.31% |
simd_query_batch_dot_product/pair_loop/128d_4c |
+8.55% | +8.19% | +8.88% |
simd_query_batch_dot_product/pair_loop/128d_64c |
+5.28% | +4.99% | +5.48% |
simd_query_batch_dot_product/pair_loop/384d_16c |
+13.77% | +13.46% | +14.11% |
simd_query_batch_dot_product/pair_loop/384d_256c |
+4.51% | +4.12% | +4.81% |
simd_query_batch_dot_product/pair_loop/384d_4c |
+11.11% | +10.77% | +11.47% |
simd_query_batch_dot_product/pair_loop/384d_64c |
+11.14% | +10.93% | +11.30% |
simd_query_batch_dot_product/pair_loop/768d_16c |
-6.94% | -7.01% | -6.86% |
simd_query_batch_dot_product/pair_loop/768d_256c |
-17.91% | -18.21% | -17.66% |
simd_query_batch_dot_product/pair_loop/768d_4c |
-5.74% | -5.99% | -5.51% |
simd_query_batch_dot_product/pair_loop/768d_64c |
-6.13% | -6.38% | -5.93% |
simd_query_batch_dot_product/simd_batch/128d_16c |
-5.97% | -6.33% | -5.68% |
simd_query_batch_dot_product/simd_batch/128d_256c |
+1.23% | +0.94% | +1.42% |
simd_query_batch_dot_product/simd_batch/128d_4c |
+7.02% | +6.69% | +7.25% |
simd_query_batch_dot_product/simd_batch/128d_64c |
-0.66% | -1.01% | -0.20% |
simd_query_batch_dot_product/simd_batch/384d_16c |
+4.21% | +4.00% | +4.49% |
simd_query_batch_dot_product/simd_batch/384d_256c |
-8.36% | -9.19% | -7.23% |
simd_query_batch_dot_product/simd_batch/384d_4c |
+2.86% | +2.59% | +3.21% |
simd_query_batch_dot_product/simd_batch/384d_64c |
+9.74% | +9.49% | +10.05% |
simd_query_batch_dot_product/simd_batch/768d_16c |
+5.84% | +5.67% | +6.07% |
simd_query_batch_dot_product/simd_batch/768d_256c |
-19.59% | -19.82% | -19.36% |
simd_query_batch_dot_product/simd_batch/768d_4c |
+1.51% | +1.20% | +1.72% |
simd_query_batch_dot_product/simd_batch/768d_64c |
+9.08% | +8.73% | +9.35% |
simd_squared_euclidean_fast_path/euclidean_full/1024 |
+6.82% | +6.59% | +6.99% |
simd_squared_euclidean_fast_path/euclidean_full/384 |
-1.59% | -1.79% | -1.40% |
simd_squared_euclidean_fast_path/euclidean_full/768 |
+5.12% | +4.89% | +5.34% |
simd_squared_euclidean_fast_path/squared_euclidean/1024 |
+7.69% | +7.06% | +8.55% |
simd_squared_euclidean_fast_path/squared_euclidean/384 |
-2.57% | -2.82% | -2.29% |
simd_squared_euclidean_fast_path/squared_euclidean/768 |
+4.99% | +4.25% | +5.64% |
simd_throughput_384/cosine_similarity |
-0.06% | -0.31% | +0.16% |
simd_throughput_384/dot_product |
-3.87% | -4.12% | -3.64% |
simd_throughput_384/euclidean_distance |
-1.46% | -1.64% | -1.25% |
simd_throughput_384/normalize |
+8.17% | +7.99% | +8.37% |
softmax_attention/128 |
+8.82% | +8.69% | +8.93% |
softmax_attention/512 |
+11.44% | +11.24% | +11.61% |
tier_prepared_query/binary_query_once_1000 |
-0.30% | -0.68% | +0.11% |
tier_prepared_query/binary_query_per_call_1000 |
+2.69% | +2.02% | +3.63% |
tier_prepared_query/int4_query_once_1000 |
+4.81% | +4.40% | +5.09% |
tier_prepared_query/int4_query_per_call_1000 |
+4.13% | +3.85% | +4.43% |
tier_prepared_query/int8_query_once_1000 |
-5.09% | -5.63% | -4.33% |
tier_prepared_query/int8_query_per_call_1000 |
+6.42% | +6.33% | +6.50% |
Rule: CI-lower of change ≤3.0% passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override via PR label `bench-allow-regression`.
Gate is in advisory mode (Rollout step 3, ADR-058 §Rollout). Failures do not block merge for the first 7 days.
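For readers who want the threshold rule in executable form, here is a minimal sketch. The names `Verdict`, `classify`, `WARN_ABOVE_PCT`, and `FAIL_ABOVE_PCT` are hypothetical and are not taken from the ADR-058 tooling. The sketch covers only the three outcomes the rule states; the 🚀 WIN verdict shown in the table is not defined by the rule text and is left out.

```rust
/// Outcome of the regression gate for a single benchmark.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Verdict {
    Pass,
    Warn,
    Fail,
}

// Thresholds as stated in the rule, in percent.
const WARN_ABOVE_PCT: f64 = 3.0;
const FAIL_ABOVE_PCT: f64 = 7.0;

/// Classifies a benchmark by the lower bound of its 95% CI (in percent).
/// Assumption: only the CI-lower value drives the verdict, as the rule says.
fn classify(ci_lower_pct: f64) -> Verdict {
    if ci_lower_pct > FAIL_ABOVE_PCT {
        Verdict::Fail
    } else if ci_lower_pct > WARN_ABOVE_PCT {
        Verdict::Warn
    } else {
        Verdict::Pass
    }
}
```

Under this reading, `simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c` (CI-lower +3.83%) warns even though its point estimate is +5.48%, because the gate keys on the conservative end of the interval.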
Summary
The Fix
Single cfg attribute change in `crates/inference/src/forward/cpu/tiled_avx2.rs`:
```diff
-#[cfg(target_arch = "x86_64")]
+#[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
```
The same attribute changes at three sites: the two `use` imports and the function gate. This matches the existing pattern in `tiled_neon.rs` (lines 13 and 16). The AVX2 microkernel is dead code on macOS regardless: `matmul_bt_tiled` (its only caller) is itself gated `#[cfg(not(target_os = "macos"))]` because Accelerate is selected on macOS.
Pure cfg-fix; no SIMD logic, semantics, or numerics changed. No behavioral change on any platform that compiled v0.2.3 successfully.
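The failure mode can be reproduced in miniature. The sketch below uses hypothetical stand-ins: the module names mirror the real files, but the constant value, `tile_rows`, and `avx2_module_compiled` are invented for illustration. It shows why the consumer's gate must be at least as strict as the gate on the items it imports.

```rust
// Hypothetical stand-in for tiled.rs: the tile constant exists only off macOS.
// The value 4 is illustrative, not the crate's real tile size.
#[cfg(not(target_os = "macos"))]
#[allow(dead_code)]
mod tiled {
    pub const TILE_I: usize = 4;
}

// Hypothetical stand-in for tiled_avx2.rs. With only
// `#[cfg(target_arch = "x86_64")]` here, a macOS x86_64 build would keep this
// module but strip `tiled`, so the `use` below would fail to resolve.
#[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
#[allow(dead_code)]
mod tiled_avx2 {
    use super::tiled::TILE_I;

    pub fn tile_rows() -> usize {
        TILE_I
    }
}

/// Reports whether the AVX2 stand-in module was compiled for this target.
fn avx2_module_compiled() -> bool {
    cfg!(all(target_arch = "x86_64", not(target_os = "macos")))
}
```

On aarch64-macos both modules are stripped, which is why a build there succeeds whether or not the gate is correct.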
Why This Wasn't Caught
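CI runs on aarch64-macos (`macos-latest`). `target_arch = "x86_64"` is false there, so the broken imports are compiled out and never resolved. The bug only surfaces when building on an Intel Mac (macOS x86_64) runner or when cross-compiling to macOS x86_64 from aarch64-macos.
Both happen in downstream CI matrices (e.g. the khive monorepo).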
Test plan
Note on v0.2.3
v0.2.3 will be yanked after v0.2.4 publishes. Existing pinned users get a yank warning on next `cargo update`; new `cargo add` users go directly to 0.2.4. GitHub tag stays for history.
🤖 Generated with Claude Code