release: v0.2.4 — fix AVX2 cfg build break on macOS x86_64 #100

Merged
ohdearquant merged 1 commit into main from fix/avx2-cfg-macos-0.2.4
May 25, 2026

@ohdearquant
Owner

Summary

  • Build fix: v0.2.3 fails to compile on macOS x86_64 (Intel Mac, GitHub Actions `macos-13` runners, cross-compile from aarch64-macos) — `tiled_avx2.rs` imports `TILE_I/J/K` constants gated out by `#[cfg(not(target_os = "macos"))]` in `tiled.rs`.
  • Bump-and-yank: Bumps workspace + 3 inter-crate path-dep refs to 0.2.4 per the recipe in CLAUDE.md. After merge: tag, publish 5 crates, yank 0.2.3.
  • Surfaced by: `khive-storage` pulling `lattice-inference 0.2.3` from crates.io and hitting `E0432: unresolved imports super::tiled::TILE_I, TILE_J, TILE_K`.

The Fix

One cfg predicate change, repeated at each gated site in `crates/inference/src/forward/cpu/tiled_avx2.rs`:

```diff
-#[cfg(target_arch = "x86_64")]
+#[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
```

Three sites: the two `use` imports and the function gate. This matches the existing pattern in `tiled_neon.rs` (lines 13 and 16). The AVX2 microkernel is dead code on macOS regardless: `matmul_bt_tiled` (its only caller) is itself gated `#[cfg(not(target_os = "macos"))]` because Accelerate is selected on macOS.

Pure cfg-fix; no SIMD logic, semantics, or numerics changed. No behavioral change on any platform that compiled v0.2.3 successfully.
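
As a minimal sketch of the gating pattern described above (not the crate's actual source), the snippet below uses two stand-in modules named after `tiled.rs` and `tiled_avx2.rs`. The constant values, the `tile_volume` helper, and the `avx2_tile_volume` wrapper are hypothetical and exist only for illustration. The point is that the import and the function carry the same `not(target_os = "macos")` condition as the constants they reference, so no target compiles the import without also compiling the constants.

```rust
// Stand-in for tiled.rs: the tile constants exist only off macOS.
mod tiled {
    #[cfg(not(target_os = "macos"))]
    pub const TILE_I: usize = 4; // placeholder values, not the crate's
    #[cfg(not(target_os = "macos"))]
    pub const TILE_J: usize = 4;
    #[cfg(not(target_os = "macos"))]
    pub const TILE_K: usize = 4;
}

// Stand-in for tiled_avx2.rs: every item repeats the constants' OS
// condition in addition to the architecture check.
mod tiled_avx2 {
    #[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
    use super::tiled::{TILE_I, TILE_J, TILE_K};

    // Hypothetical helper standing in for the real microkernel.
    #[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
    pub fn tile_volume() -> usize {
        TILE_I * TILE_J * TILE_K
    }
}

/// Hypothetical wrapper: `Some` when the AVX2 path is compiled in.
#[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]
pub fn avx2_tile_volume() -> Option<usize> {
    Some(tiled_avx2::tile_volume())
}

/// Fallback for every other target, including macOS x86_64.
#[cfg(not(all(target_arch = "x86_64", not(target_os = "macos"))))]
pub fn avx2_tile_volume() -> Option<usize> {
    None
}
```

With the v0.2.3 gate (`#[cfg(target_arch = "x86_64")]` alone) on the `use` line, this sketch would fail on `x86_64-apple-darwin` with the same kind of unresolved-import error, because the `tiled` module would be empty there.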

Why This Wasn't Caught

CI runs on aarch64-macos (`macos-latest`). `target_arch = "x86_64"` is false there, so the gated imports are stripped before name resolution and the error never appears. The bug only surfaces when:

  1. Building on macOS x86_64 (Intel Mac, `macos-13` runner), or
  2. Cross-compiling `x86_64-apple-darwin` from aarch64-macos.

Both happen in downstream CI matrices (e.g. khive monorepo).
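
The coverage gap can be illustrated by modelling the cfg predicates as ordinary boolean functions and evaluating them over a small target matrix. This is only a sketch: the function names and the four-entry target list are assumptions made for illustration, and real cfg evaluation happens at compile time, not at run time.

```rust
/// Gate on the `TILE_I/J/K` constants in `tiled.rs`: present everywhere except macOS.
fn constants_present(os: &str) -> bool {
    os != "macos"
}

/// Import gate as shipped in v0.2.3: architecture only.
pub fn old_import_gate(arch: &str, _os: &str) -> bool {
    arch == "x86_64"
}

/// Import gate after the fix: architecture and OS.
pub fn new_import_gate(arch: &str, os: &str) -> bool {
    arch == "x86_64" && os != "macos"
}

/// Lists the targets where the import is compiled but the constants are not,
/// which is the condition that produces the unresolved-import error.
pub fn broken_targets(gate: fn(&str, &str) -> bool) -> Vec<(&'static str, &'static str)> {
    // Illustrative (arch, os) matrix; not an exhaustive list of supported targets.
    const TARGETS: [(&str, &str); 4] = [
        ("aarch64", "macos"),
        ("x86_64", "macos"),
        ("aarch64", "linux"),
        ("x86_64", "linux"),
    ];
    TARGETS
        .iter()
        .copied()
        .filter(|&(arch, os)| gate(arch, os) && !constants_present(os))
        .collect()
}
```

Under this model, `broken_targets(old_import_gate)` contains only `("x86_64", "macos")`, the one combination the project's own CI runner never builds, and `broken_targets(new_import_gate)` is empty.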

Test plan

  • `cargo fmt --all -- --check` clean
  • `cargo clippy --workspace -- -D warnings` clean
  • `cargo check --workspace` clean
  • `cargo test --workspace --lib --no-run` clean
  • CI green on PR (aarch64-macos default runner — proves no regression on the working path)
  • After merge: tag v0.2.4, gh release, `make publish` (5 crates)
  • After publish: `cargo yank --version 0.2.3` × 5 crates

Note on v0.2.3

v0.2.3 will be yanked after v0.2.4 publishes. Existing pinned users get a yank warning on next `cargo update`; new `cargo add` users go directly to 0.2.4. GitHub tag stays for history.

🤖 Generated with Claude Code

v0.2.3 fails to compile on macOS x86_64 (Intel Mac runners, cross-compile
from aarch64-macos) because tiled_avx2.rs imports TILE_I/J/K from tiled.rs,
but those constants are gated `#[cfg(not(target_os = "macos"))]`. The AVX2
import gate was only `#[cfg(target_arch = "x86_64")]`, so on macOS x86_64
the import is compiled while the constants are not, and it fails to
resolve.

Fix: align tiled_avx2.rs cfg with tiled_neon.rs pattern:
  #[cfg(all(target_arch = "x86_64", not(target_os = "macos")))]

The AVX2 microkernel is dead on macOS regardless — matmul_bt_tiled (the
only caller) is already gated `#[cfg(not(target_os = "macos"))]` because
Accelerate is selected for macOS. Pure cfg-fix, no SIMD logic touched.

Bumps workspace + 3 inter-crate path-dep refs to 0.2.4. v0.2.3 to be
yanked after this lands per the bump-and-yank recipe in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ohdearquant ohdearquant merged commit aa111bd into main May 25, 2026
3 of 5 checks passed
@ohdearquant ohdearquant deleted the fix/avx2-cfg-macos-0.2.4 branch May 25, 2026 05:02
@github-actions

Perf regression report (ADR-058)

aarch64-linux — perf regression report

❌ 1 FAIL (regression >7.0% confirmed by 95% CI)
⚠ 7 WARN (regression 3.0-7.0% confirmed)
🚀 5 confirmed improvement

| Bench | Δ point | 95% CI | new ns | base ns | verdict |
|---|---|---|---|---|---|
| simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c | +8.35% | [+8.25%, +8.46%] | 1031.8 | 952.2 | ❌ FAIL |
| simd_dot_product/simd/1536 | +5.20% | [+5.13%, +5.29%] | 106.9 | 101.6 | ⚠ WARN |
| int8_batch_cosine/int8_loop/1000 | +4.65% | [+4.35%, +4.93%] | 20099.9 | 19206.5 | ⚠ WARN |
| simd_query_batch_dot_product/simd_batch/768d_16c | +3.90% | [+3.85%, +3.95%] | 737.2 | 709.5 | ⚠ WARN |
| tier_prepared_query/int8_query_once_1000 | +3.81% | [+3.77%, +3.85%] | 18576.6 | 17895.6 | ⚠ WARN |
| simd_normalize/simd/768 | +5.59% | [+3.52%, +7.85%] | 123.1 | 116.5 | ⚠ WARN |
| simd_batch_cosine_normalized_query/simd_batch/1024d_256c | +3.68% | [+3.48%, +3.88%] | 36741.6 | 35437.9 | ⚠ WARN |
| simd_query_batch_dot_product/pair_loop/384d_64c | +3.24% | [+3.23%, +3.25%] | 2507.0 | 2428.4 | ⚠ WARN |
| simd_normalized_cosine_fast_path/dot_product/768 | -3.96% | [-4.23%, -3.66%] | 56.0 | 58.3 | 🚀 WIN |
| simd_dot_product/simd/1024 | -4.34% | [-4.40%, -4.28%] | 70.3 | 73.4 | 🚀 WIN |
| simd_batch_cosine/simd_batch/1000 | -6.51% | [-6.57%, -6.45%] | 80338.7 | 85928.5 | 🚀 WIN |
| simd_query_batch_dot_product/pair_loop/768d_256c | -6.95% | [-7.04%, -6.86%] | 19847.6 | 21329.0 | 🚀 WIN |
| simd_batch_dot_product/simd_batch/1000 | -14.51% | [-14.58%, -14.44%] | 75708.6 | 88556.6 | 🚀 WIN |
All 247 measurements
Bench Δ point CI-lower CI-upper
add_bias_gelu/4096 +0.01% -0.00% +0.02%
add_bias_gelu/896 -0.01% -0.03% +0.01%
binary_cosine_distance/binary/1024 +1.07% +1.06% +1.08%
binary_cosine_distance/binary/1536 -0.04% -0.05% -0.02%
binary_cosine_distance/binary/384 -0.19% -0.21% -0.16%
binary_cosine_distance/binary/768 -0.21% -0.22% -0.19%
binary_cosine_distance/float32_simd/1024 +0.03% -0.00% +0.08%
binary_cosine_distance/float32_simd/1536 -0.04% -0.05% -0.02%
binary_cosine_distance/float32_simd/384 +0.13% +0.09% +0.18%
binary_cosine_distance/float32_simd/768 +0.19% +0.18% +0.21%
elementwise_mul/4096 -2.25% -2.30% -2.19%
gelu/4096 -0.00% -0.03% +0.02%
gelu/896 +0.00% -0.03% +0.02%
int4_cosine_distance/float32_simd/1024 -0.04% -0.08% -0.01%
int4_cosine_distance/float32_simd/1536 +0.09% +0.06% +0.14%
int4_cosine_distance/float32_simd/384 -0.53% -0.55% -0.51%
int4_cosine_distance/float32_simd/768 +0.14% +0.12% +0.15%
int4_cosine_distance/int4/1024 -0.39% -0.47% -0.32%
int4_cosine_distance/int4/1536 -0.35% -0.37% -0.34%
int4_cosine_distance/int4/384 +0.00% -0.04% +0.05%
int4_cosine_distance/int4/768 +0.22% +0.20% +0.25%
int8_batch_cosine/float32_simd/10 -0.02% -0.04% -0.01%
int8_batch_cosine/float32_simd/100 +0.57% +0.56% +0.58%
int8_batch_cosine/float32_simd/1000 +2.49% +2.43% +2.56%
int8_batch_cosine/int8_loop/10 +0.83% +0.79% +0.88%
int8_batch_cosine/int8_loop/100 +0.44% +0.41% +0.47%
int8_batch_cosine/int8_loop/1000 +4.65% +4.35% +4.93%
int8_prepared_dot_product/per_call/1024 -0.02% -0.03% -0.00%
int8_prepared_dot_product/per_call/127 -0.08% -0.10% -0.07%
int8_prepared_dot_product/per_call/128 +0.02% +0.02% +0.03%
int8_prepared_dot_product/per_call/129 +0.06% +0.05% +0.07%
int8_prepared_dot_product/per_call/384 +0.00% -0.01% +0.01%
int8_prepared_dot_product/per_call/768 -0.00% -0.01% +0.01%
int8_prepared_dot_product/prepared/1024 -0.60% -0.63% -0.58%
int8_prepared_dot_product/prepared/127 +0.78% +0.76% +0.79%
int8_prepared_dot_product/prepared/128 +0.49% +0.45% +0.54%
int8_prepared_dot_product/prepared/129 +0.53% +0.50% +0.56%
int8_prepared_dot_product/prepared/384 -0.44% -0.48% -0.40%
int8_prepared_dot_product/prepared/768 +1.38% +1.30% +1.46%
int8_quantization/quantize/1024 +0.00% -0.01% +0.01%
int8_quantization/quantize/1536 -0.22% -0.23% -0.21%
int8_quantization/quantize/384 +0.01% -0.00% +0.02%
int8_quantization/quantize/768 +0.00% -0.01% +0.01%
int8_raw_dot_product/dot_product_i8/1024 +0.98% +0.95% +1.01%
int8_raw_dot_product/dot_product_i8/127 +0.57% +0.56% +0.59%
int8_raw_dot_product/dot_product_i8/128 -0.79% -0.89% -0.67%
int8_raw_dot_product/dot_product_i8/129 +0.72% +0.69% +0.76%
int8_raw_dot_product/dot_product_i8/384 -0.44% -0.45% -0.42%
int8_raw_dot_product/dot_product_i8/768 -0.89% -0.94% -0.84%
int8_raw_dot_product/dot_product_i8_raw/1024 +0.03% +0.01% +0.05%
int8_raw_dot_product/dot_product_i8_raw/127 -0.09% -0.12% -0.06%
int8_raw_dot_product/dot_product_i8_raw/128 +0.18% +0.15% +0.20%
int8_raw_dot_product/dot_product_i8_raw/129 +0.29% +0.23% +0.36%
int8_raw_dot_product/dot_product_i8_raw/384 +0.04% +0.02% +0.07%
int8_raw_dot_product/dot_product_i8_raw/768 +0.25% +0.23% +0.26%
int8_vs_float32_cosine/float32_simd/1024 -0.10% -0.11% -0.08%
int8_vs_float32_cosine/float32_simd/1536 -0.04% -0.06% -0.00%
int8_vs_float32_cosine/float32_simd/384 -0.43% -0.54% -0.31%
int8_vs_float32_cosine/float32_simd/768 +0.42% +0.35% +0.48%
int8_vs_float32_cosine/int8/1024 +0.55% +0.51% +0.59%
int8_vs_float32_cosine/int8/1536 -0.27% -0.31% -0.23%
int8_vs_float32_cosine/int8/384 -1.34% -1.50% -1.19%
int8_vs_float32_cosine/int8/768 -1.23% -1.32% -1.14%
layer_norm/4096 -0.31% -0.33% -0.29%
layer_norm/896 +0.01% -0.02% +0.03%
memory_size/search_1000_float32 +0.33% +0.31% +0.36%
memory_size/search_1000_int8 -0.98% -1.05% -0.91%
rms_norm/4096 +0.79% +0.76% +0.82%
rms_norm/896 -0.14% -0.18% -0.10%
silu_inplace/4096 +0.02% -0.00% +0.05%
silu_inplace/896 +0.01% -0.02% +0.05%
simd_batch_cosine/scalar_loop/10 -0.00% -0.01% +0.01%
simd_batch_cosine/scalar_loop/100 -0.08% -0.10% -0.06%
simd_batch_cosine/scalar_loop/1000 -0.42% -0.46% -0.38%
simd_batch_cosine/simd_batch/10 -0.05% -0.10% -0.02%
simd_batch_cosine/simd_batch/100 +0.45% +0.43% +0.46%
simd_batch_cosine/simd_batch/1000 -6.51% -6.57% -6.45%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c -0.12% -0.15% -0.08%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c -1.16% -1.17% -1.14%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c +0.84% +0.62% +1.06%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c +0.00% -0.01% +0.01%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c -0.12% -0.16% -0.10%
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c +0.44% +0.38% +0.49%
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c +0.65% +0.63% +0.68%
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c -2.21% -2.25% -2.18%
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c -0.08% -0.09% -0.07%
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c -0.15% -0.16% -0.14%
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c +0.36% +0.30% +0.43%
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c -0.98% -0.98% -0.97%
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -1.05% -1.16% -0.94%
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c -0.05% -0.07% -0.04%
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c +0.06% +0.04% +0.07%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c -0.05% -0.11% +0.05%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c -1.15% -1.16% -1.14%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -2.33% -2.50% -2.15%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c +0.20% +0.19% +0.22%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c -0.11% -0.13% -0.10%
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c +0.13% +0.10% +0.16%
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c +0.37% +0.34% +0.40%
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -2.40% -2.43% -2.36%
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c +0.03% +0.02% +0.04%
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c +0.04% +0.02% +0.05%
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c +0.18% +0.12% +0.26%
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c +0.32% +0.30% +0.33%
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -1.96% -2.06% -1.87%
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c +0.04% +0.03% +0.04%
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c -0.02% -0.07% +0.02%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c +0.03% -0.00% +0.07%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c -0.55% -0.56% -0.54%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c +0.48% +0.31% +0.66%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c -0.03% -0.05% -0.01%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c -0.11% -0.13% -0.10%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c +0.54% +0.51% +0.57%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c +0.18% +0.17% +0.19%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c +0.14% +0.13% +0.16%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c +0.04% +0.02% +0.06%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c +0.78% +0.77% +0.79%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -0.85% -0.89% -0.81%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c +2.22% +2.21% +2.24%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -0.52% -0.62% -0.42%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c +0.42% +0.39% +0.45%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c +0.49% +0.48% +0.50%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c +0.44% +0.35% +0.54%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c -1.03% -1.08% -0.99%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c +1.57% +1.19% +1.93%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c +0.19% +0.16% +0.23%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c +1.04% +0.96% +1.17%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c +2.13% +2.09% +2.17%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c +1.82% +1.78% +1.85%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c +0.43% +0.41% +0.44%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c +0.13% +0.08% +0.18%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c +2.25% +2.24% +2.26%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c +1.70% +1.65% +1.75%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c +8.35% +8.25% +8.46%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -1.70% -1.83% -1.58%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c +1.49% +1.42% +1.55%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c +1.21% +1.20% +1.23%
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c +0.10% +0.07% +0.14%
simd_batch_cosine_normalized_query/simd_batch/1024d_16c -0.43% -0.44% -0.41%
simd_batch_cosine_normalized_query/simd_batch/1024d_256c +3.68% +3.48% +3.88%
simd_batch_cosine_normalized_query/simd_batch/1024d_4c +0.06% +0.05% +0.07%
simd_batch_cosine_normalized_query/simd_batch/1024d_64c -0.10% -0.12% -0.09%
simd_batch_cosine_normalized_query/simd_batch/384d_1000c +0.51% +0.49% +0.54%
simd_batch_cosine_normalized_query/simd_batch/384d_16c +0.22% +0.21% +0.23%
simd_batch_cosine_normalized_query/simd_batch/384d_256c +0.22% +0.19% +0.24%
simd_batch_cosine_normalized_query/simd_batch/384d_4c +0.14% +0.13% +0.15%
simd_batch_cosine_normalized_query/simd_batch/384d_64c +0.96% +0.95% +0.97%
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -0.74% -0.79% -0.69%
simd_batch_cosine_normalized_query/simd_batch/768d_16c +1.40% +1.38% +1.42%
simd_batch_cosine_normalized_query/simd_batch/768d_256c -1.51% -1.62% -1.40%
simd_batch_cosine_normalized_query/simd_batch/768d_4c +0.43% +0.42% +0.44%
simd_batch_cosine_normalized_query/simd_batch/768d_64c +0.48% +0.47% +0.49%
simd_batch_dot_product/scalar_loop/10 -0.26% -0.28% -0.24%
simd_batch_dot_product/scalar_loop/100 +0.38% +0.35% +0.41%
simd_batch_dot_product/scalar_loop/1000 -0.54% -0.60% -0.50%
simd_batch_dot_product/simd_batch/10 -1.73% -1.82% -1.64%
simd_batch_dot_product/simd_batch/100 +0.11% +0.09% +0.12%
simd_batch_dot_product/simd_batch/1000 -14.51% -14.58% -14.44%
simd_cosine_similarity/scalar/1024 -0.01% -0.03% +0.01%
simd_cosine_similarity/scalar/1536 -0.04% -0.05% -0.02%
simd_cosine_similarity/scalar/384 -0.22% -0.26% -0.20%
simd_cosine_similarity/scalar/768 +0.05% +0.04% +0.07%
simd_cosine_similarity/simd/1024 +0.00% -0.01% +0.01%
simd_cosine_similarity/simd/1536 -0.11% -0.13% -0.09%
simd_cosine_similarity/simd/384 +0.12% +0.00% +0.24%
simd_cosine_similarity/simd/768 +0.37% +0.34% +0.39%
simd_dot_product/scalar/1024 -0.00% -0.03% +0.02%
simd_dot_product/scalar/1536 -0.03% -0.04% -0.01%
simd_dot_product/scalar/384 +0.01% -0.01% +0.03%
simd_dot_product/scalar/768 +0.02% +0.01% +0.03%
simd_dot_product/simd/1024 -4.34% -4.40% -4.28%
simd_dot_product/simd/1536 +5.20% +5.13% +5.29%
simd_dot_product/simd/384 -0.22% -0.28% -0.16%
simd_dot_product/simd/768 +0.01% -0.11% +0.14%
simd_euclidean_distance/scalar/1024 +1.04% +1.00% +1.07%
simd_euclidean_distance/scalar/1536 +0.14% +0.11% +0.18%
simd_euclidean_distance/scalar/384 +0.47% +0.43% +0.52%
simd_euclidean_distance/scalar/768 +0.34% +0.31% +0.38%
simd_euclidean_distance/simd/1024 +0.13% +0.11% +0.16%
simd_euclidean_distance/simd/1536 -0.11% -0.12% -0.10%
simd_euclidean_distance/simd/384 -0.67% -0.70% -0.64%
simd_euclidean_distance/simd/768 -0.42% -0.45% -0.40%
simd_normalize/scalar/1024 +0.45% +0.16% +0.80%
simd_normalize/scalar/1536 +0.13% -0.05% +0.33%
simd_normalize/scalar/384 -0.05% -0.43% +0.33%
simd_normalize/scalar/768 +0.30% +0.10% +0.50%
simd_normalize/simd/1024 +1.55% +0.59% +2.58%
simd_normalize/simd/1536 +1.46% +0.66% +2.27%
simd_normalize/simd/384 +2.75% +0.82% +4.65%
simd_normalize/simd/768 +5.59% +3.52% +7.85%
simd_normalized_cosine_fast_path/cosine_full/1024 +0.38% +0.34% +0.44%
simd_normalized_cosine_fast_path/cosine_full/384 +0.46% +0.37% +0.55%
simd_normalized_cosine_fast_path/cosine_full/768 +0.42% +0.37% +0.46%
simd_normalized_cosine_fast_path/dot_product/1024 -2.89% -3.06% -2.72%
simd_normalized_cosine_fast_path/dot_product/384 -1.12% -1.23% -1.01%
simd_normalized_cosine_fast_path/dot_product/768 -3.96% -4.23% -3.66%
simd_prepared_query_normalized_cosine/dot_product_loop/1024 +0.07% -0.02% +0.15%
simd_prepared_query_normalized_cosine/dot_product_loop/384 +0.71% +0.66% +0.75%
simd_prepared_query_normalized_cosine/dot_product_loop/768 +2.17% +1.68% +2.81%
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 +0.13% +0.08% +0.18%
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 +0.63% +0.61% +0.65%
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 -0.31% -0.36% -0.27%
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 -1.38% -1.52% -1.25%
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 +0.78% +0.73% +0.84%
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 +0.01% -0.04% +0.06%
simd_query_batch_dot_product/pair_loop/128d_16c -0.48% -0.53% -0.44%
simd_query_batch_dot_product/pair_loop/128d_256c +1.72% +1.57% +1.84%
simd_query_batch_dot_product/pair_loop/128d_4c +0.38% +0.29% +0.47%
simd_query_batch_dot_product/pair_loop/128d_64c -0.41% -0.48% -0.35%
simd_query_batch_dot_product/pair_loop/384d_16c +2.31% +2.29% +2.33%
simd_query_batch_dot_product/pair_loop/384d_256c -0.36% -0.41% -0.31%
simd_query_batch_dot_product/pair_loop/384d_4c +0.91% +0.81% +1.01%
simd_query_batch_dot_product/pair_loop/384d_64c +3.24% +3.23% +3.25%
simd_query_batch_dot_product/pair_loop/768d_16c +0.57% +0.51% +0.64%
simd_query_batch_dot_product/pair_loop/768d_256c -6.95% -7.04% -6.86%
simd_query_batch_dot_product/pair_loop/768d_4c -0.62% -0.74% -0.52%
simd_query_batch_dot_product/pair_loop/768d_64c +0.88% +0.86% +0.90%
simd_query_batch_dot_product/simd_batch/128d_16c -0.98% -1.02% -0.94%
simd_query_batch_dot_product/simd_batch/128d_256c +0.83% +0.79% +0.88%
simd_query_batch_dot_product/simd_batch/128d_4c +0.29% +0.24% +0.34%
simd_query_batch_dot_product/simd_batch/128d_64c +1.23% +1.16% +1.29%
simd_query_batch_dot_product/simd_batch/384d_16c +1.58% +1.56% +1.59%
simd_query_batch_dot_product/simd_batch/384d_256c +0.48% +0.46% +0.50%
simd_query_batch_dot_product/simd_batch/384d_4c +0.46% +0.43% +0.48%
simd_query_batch_dot_product/simd_batch/384d_64c +1.28% +1.27% +1.29%
simd_query_batch_dot_product/simd_batch/768d_16c +3.90% +3.85% +3.95%
simd_query_batch_dot_product/simd_batch/768d_256c -0.47% -0.57% -0.36%
simd_query_batch_dot_product/simd_batch/768d_4c -0.01% -0.03% +0.02%
simd_query_batch_dot_product/simd_batch/768d_64c +1.49% +1.45% +1.52%
simd_squared_euclidean_fast_path/euclidean_full/1024 -0.17% -0.20% -0.12%
simd_squared_euclidean_fast_path/euclidean_full/384 -1.05% -1.12% -1.01%
simd_squared_euclidean_fast_path/euclidean_full/768 -0.74% -0.76% -0.73%
simd_squared_euclidean_fast_path/squared_euclidean/1024 -0.35% -0.37% -0.33%
simd_squared_euclidean_fast_path/squared_euclidean/384 +0.07% +0.04% +0.11%
simd_squared_euclidean_fast_path/squared_euclidean/768 -0.18% -0.21% -0.16%
simd_throughput_384/cosine_similarity +0.88% +0.73% +1.04%
simd_throughput_384/dot_product -0.21% -0.38% -0.05%
simd_throughput_384/euclidean_distance -1.00% -1.02% -0.97%
simd_throughput_384/normalize -1.73% -1.74% -1.71%
softmax_attention/128 +0.08% +0.06% +0.10%
softmax_attention/512 +0.84% +0.64% +1.03%
tier_prepared_query/binary_query_once_1000 -0.01% -0.06% +0.04%
tier_prepared_query/binary_query_per_call_1000 +0.00% -0.01% +0.01%
tier_prepared_query/int4_query_once_1000 +0.85% +0.79% +0.93%
tier_prepared_query/int4_query_per_call_1000 +0.39% +0.38% +0.40%
tier_prepared_query/int8_query_once_1000 +3.81% +3.77% +3.85%
tier_prepared_query/int8_query_per_call_1000 +0.01% -0.01% +0.02%

Rule: CI-lower of change ≤3.0% passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override via PR label `bench-allow-regression`.
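
The regression rule above can be sketched as a small classifier on the lower bound of the 95% CI, expressed in percent. The `Verdict` enum and `classify` function are hypothetical names, not the bot's actual implementation, and the sketch covers only the pass/warn/fail thresholds stated in the rule; it does not model how the report decides on a 🚀 WIN.

```rust
/// Outcome for a single benchmark under the stated regression rule.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

/// Classifies a benchmark by the lower bound of its 95% CI, in percent.
/// <= 3.0 passes, (3.0, 7.0] warns, > 7.0 fails.
pub fn classify(ci_lower_pct: f64) -> Verdict {
    if ci_lower_pct > 7.0 {
        Verdict::Fail
    } else if ci_lower_pct > 3.0 {
        Verdict::Warn
    } else {
        Verdict::Pass
    }
}
```

Because the rule keys on the CI lower bound and not the point estimate, a row such as `simd_normalize/simd/768` (point +5.59%, CI lower +3.52%) lands in WARN, and a row with a point estimate of +7.02% but a CI lower bound of +6.69% also stays in WARN.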

x86_64-linux — perf regression report

❌ 61 FAIL (regression >7.0% confirmed by 95% CI)
⚠ 46 WARN (regression 3.0-7.0% confirmed)
🚀 78 confirmed improvement

Bench Δ point 95% CI new ns base ns verdict
int8_quantization/quantize/1536 +41.21% [+40.70%, +41.62%] 11367.8 8050.2 ❌ FAIL
int8_quantization/quantize/384 +40.20% [+39.89%, +40.56%] 2830.2 2018.6 ❌ FAIL
int8_quantization/quantize/768 +29.50% [+29.30%, +29.67%] 5671.6 4379.6 ❌ FAIL
int8_quantization/quantize/1024 +29.53% [+29.25%, +29.84%] 7559.4 5836.1 ❌ FAIL
layer_norm/4096 +26.91% [+26.34%, +27.63%] 941.2 741.6 ❌ FAIL
simd_dot_product/simd/1024 +23.49% [+23.15%, +23.75%] 79.0 64.0 ❌ FAIL
simd_normalized_cosine_fast_path/dot_product/1024 +22.94% [+22.78%, +23.13%] 78.7 64.0 ❌ FAIL
simd_normalized_cosine_fast_path/dot_product/768 +22.70% [+22.49%, +22.94%] 60.5 49.3 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c +18.35% [+18.20%, +18.56%] 486.3 410.9 ❌ FAIL
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 +17.39% [+16.89%, +17.74%] 77912.6 66369.0 ❌ FAIL
simd_query_batch_dot_product/pair_loop/384d_16c +13.77% [+13.46%, +14.11%] 475.6 418.0 ❌ FAIL
simd_batch_dot_product/simd_batch/10 +13.37% [+13.19%, +13.63%] 320.7 282.9 ❌ FAIL
simd_dot_product/scalar/1536 +12.76% [+12.41%, +13.21%] 1593.9 1413.6 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c +12.58% [+12.35%, +12.78%] 1939.5 1722.8 ❌ FAIL
simd_normalize/scalar/768 +12.64% [+12.35%, +12.99%] 888.3 788.6 ❌ FAIL
simd_euclidean_distance/scalar/1536 +12.31% [+12.26%, +12.36%] 1594.5 1419.7 ❌ FAIL
simd_euclidean_distance/scalar/1024 +12.29% [+12.09%, +12.58%] 1057.4 941.6 ❌ FAIL
simd_normalize/scalar/1536 +12.15% [+11.86%, +12.48%] 1768.4 1576.8 ❌ FAIL
simd_normalize/scalar/384 +12.23% [+11.86%, +12.61%] 444.8 396.3 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c +12.01% [+11.81%, +12.20%] 127.6 113.9 ❌ FAIL
simd_dot_product/scalar/1024 +11.93% [+11.81%, +12.15%] 1047.0 935.4 ❌ FAIL
simd_euclidean_distance/scalar/768 +11.74% [+11.65%, +11.82%] 785.6 703.0 ❌ FAIL
gelu/4096 +12.18% [+11.57%, +12.88%] 1836.2 1636.9 ❌ FAIL
simd_cosine_similarity/scalar/1536 +11.76% [+11.51%, +11.91%] 4731.5 4233.5 ❌ FAIL
rms_norm/896 +12.08% [+11.47%, +12.49%] 233.2 208.1 ❌ FAIL
simd_normalize/scalar/1024 +11.71% [+11.46%, +11.95%] 1179.0 1055.4 ❌ FAIL
gelu/896 +11.61% [+11.40%, +11.83%] 399.1 357.6 ❌ FAIL
softmax_attention/512 +11.44% [+11.24%, +11.61%] 75600.2 67838.2 ❌ FAIL
simd_cosine_similarity/scalar/1024 +11.30% [+11.23%, +11.37%] 3112.1 2796.0 ❌ FAIL
simd_query_batch_dot_product/pair_loop/384d_64c +11.14% [+10.93%, +11.30%] 1915.4 1723.4 ❌ FAIL
simd_query_batch_dot_product/pair_loop/384d_4c +11.11% [+10.77%, +11.47%] 128.8 115.9 ❌ FAIL
simd_euclidean_distance/scalar/384 +10.74% [+10.68%, +10.81%] 380.8 343.9 ❌ FAIL
simd_cosine_similarity/scalar/768 +10.92% [+10.55%, +11.36%] 2308.1 2080.9 ❌ FAIL
silu_inplace/896 +10.89% [+10.53%, +11.23%] 3007.7 2712.3 ❌ FAIL
simd_dot_product/scalar/768 +10.78% [+9.98%, +11.32%] 775.1 699.7 ❌ FAIL
simd_normalized_cosine_fast_path/cosine_full/1024 +10.09% [+9.83%, +10.31%] 90.8 82.5 ❌ FAIL
simd_batch_dot_product/scalar_loop/1000 +10.11% [+9.78%, +10.48%] 371624.9 337512.0 ❌ FAIL
simd_dot_product/scalar/384 +9.81% [+9.64%, +10.04%] 370.7 337.6 ❌ FAIL
simd_batch_dot_product/scalar_loop/100 +9.62% [+9.54%, +9.70%] 36190.7 33015.8 ❌ FAIL
simd_query_batch_dot_product/simd_batch/384d_64c +9.74% [+9.49%, +10.05%] 1237.3 1127.4 ❌ FAIL
rms_norm/4096 +10.07% [+9.46%, +10.57%] 871.2 791.5 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c +9.57% [+9.44%, +9.68%] 4143.1 3781.3 ❌ FAIL
simd_batch_dot_product/scalar_loop/10 +9.75% [+9.39%, +9.99%] 3617.4 3296.1 ❌ FAIL
simd_cosine_similarity/scalar/384 +9.32% [+9.25%, +9.39%] 1088.6 995.8 ❌ FAIL
silu_inplace/4096 +10.16% [+9.23%, +10.84%] 13742.4 12474.7 ❌ FAIL
simd_euclidean_distance/simd/1536 +9.31% [+9.10%, +9.47%] 102.8 94.1 ❌ FAIL
simd_batch_cosine/scalar_loop/100 +8.95% [+8.87%, +9.05%] 107378.3 98553.8 ❌ FAIL
simd_batch_cosine/scalar_loop/10 +8.93% [+8.80%, +9.10%] 10746.7 9866.0 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c +9.18% [+8.79%, +9.67%] 74317.3 68067.9 ❌ FAIL
simd_cosine_similarity/simd/768 +8.95% [+8.75%, +9.22%] 72.4 66.5 ❌ FAIL
simd_query_batch_dot_product/simd_batch/768d_64c +9.08% [+8.73%, +9.35%] 2380.6 2182.4 ❌ FAIL
simd_batch_cosine/scalar_loop/1000 +8.79% [+8.73%, +8.84%] 1076858.4 989878.0 ❌ FAIL
softmax_attention/128 +8.82% [+8.69%, +8.93%] 4794.5 4405.9 ❌ FAIL
add_bias_gelu/896 +8.69% [+8.41%, +8.97%] 417.1 383.8 ❌ FAIL
int8_vs_float32_cosine/float32_simd/1536 +8.75% [+8.31%, +9.28%] 119.9 110.2 ❌ FAIL
simd_normalized_cosine_fast_path/cosine_full/768 +8.49% [+8.23%, +8.80%] 72.5 66.8 ❌ FAIL
simd_query_batch_dot_product/pair_loop/128d_4c +8.55% [+8.19%, +8.88%] 69.7 64.2 ❌ FAIL
add_bias_gelu/4096 +8.23% [+8.02%, +8.40%] 1914.8 1769.3 ❌ FAIL
simd_throughput_384/normalize +8.17% [+7.99%, +8.37%] 114.4 105.8 ❌ FAIL
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 +7.59% [+7.15%, +8.02%] 30758.9 28588.6 ❌ FAIL
simd_squared_euclidean_fast_path/squared_euclidean/1024 +7.69% [+7.06%, +8.55%] 67.3 62.5 ❌ FAIL
simd_query_batch_dot_product/simd_batch/128d_4c +7.02% [+6.69%, +7.25%] 46.3 43.3 ⚠ WARN
simd_squared_euclidean_fast_path/euclidean_full/1024 +6.82% [+6.59%, +6.99%] 71.7 67.1 ⚠ WARN
tier_prepared_query/int8_query_per_call_1000 +6.42% [+6.33%, +6.50%] 2346822.0 2205337.5 ⚠ WARN
int8_prepared_dot_product/per_call/1024 +6.77% [+6.24%, +7.28%] 6243.9 5847.8 ⚠ WARN
int8_prepared_dot_product/per_call/768 +6.43% [+6.23%, +6.75%] 4674.9 4392.5 ⚠ WARN
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 +6.60% [+6.17%, +7.00%] 59515.4 55829.2 ⚠ WARN
int4_cosine_distance/float32_simd/1536 +6.65% [+6.14%, +7.31%] 121.5 113.9 ⚠ WARN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c +6.26% [+6.06%, +6.50%] 31584.2 29723.4 ⚠ WARN
int8_prepared_dot_product/per_call/129 +6.14% [+5.92%, +6.45%] 790.1 744.4 ⚠ WARN
int8_prepared_dot_product/per_call/128 +6.10% [+5.74%, +6.40%] 787.5 742.2 ⚠ WARN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c +5.81% [+5.71%, +5.91%] 5415.5 5118.3 ⚠ WARN
simd_query_batch_dot_product/simd_batch/768d_16c +5.84% [+5.67%, +6.07%] 601.1 567.9 ⚠ WARN
int8_prepared_dot_product/per_call/384 +6.14% [+5.66%, +6.56%] 2341.9 2206.5 ⚠ WARN
binary_cosine_distance/float32_simd/1536 +5.83% [+5.56%, +6.07%] 120.6 114.0 ⚠ WARN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c +5.76% [+5.52%, +5.91%] 1026.2 970.3 ⚠ WARN
simd_cosine_similarity/simd/1536 +5.54% [+5.32%, +5.71%] 119.6 113.4 ⚠ WARN
int8_raw_dot_product/dot_product_i8_raw/127 +6.39% [+5.31%, +7.85%] 14.4 13.6 ⚠ WARN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c +5.46% [+5.12%, +5.69%] 1354.4 1284.3 ⚠ WARN
binary_cosine_distance/float32_simd/1024 +5.33% [+5.11%, +5.52%] 86.8 82.4 ⚠ WARN
int8_prepared_dot_product/per_call/127 +5.27% [+5.09%, +5.49%] 778.0 739.0 ⚠ WARN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c +5.29% [+5.04%, +5.51%] 5457.5 5183.4 ⚠ WARN
simd_query_batch_dot_product/pair_loop/128d_64c +5.28% [+4.99%, +5.48%] 848.7 806.1 ⚠ WARN
int8_vs_float32_cosine/int8/1536 +5.26% [+4.96%, +5.52%] 51.7 49.2 ⚠ WARN
simd_squared_euclidean_fast_path/euclidean_full/768 +5.12% [+4.89%, +5.34%] 56.8 54.0 ⚠ WARN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c +4.78% [+4.63%, +5.02%] 1364.1 1301.9 ⚠ WARN
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 +4.72% [+4.51%, +4.99%] 90966.0 86865.8 ⚠ WARN
int8_raw_dot_product/dot_product_i8_raw/384 +4.79% [+4.42%, +5.17%] 13.6 13.0 ⚠ WARN
simd_euclidean_distance/simd/768 +4.56% [+4.40%, +4.71%] 56.9 54.4 ⚠ WARN
simd_query_batch_dot_product/pair_loop/128d_256c +4.83% [+4.40%, +5.31%] 3297.6 3145.6 ⚠ WARN
tier_prepared_query/int4_query_once_1000 +4.81% [+4.40%, +5.09%] 1463454.3 1396312.4 ⚠ WARN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c +4.69% [+4.37%, +4.91%] 338.3 323.2 ⚠ WARN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c +4.60% [+4.33%, +4.97%] 7626.8 7291.2 ⚠ WARN
int4_cosine_distance/float32_simd/1024 +4.64% [+4.25%, +5.11%] 87.2 83.3 ⚠ WARN
simd_squared_euclidean_fast_path/squared_euclidean/768 +4.99% [+4.25%, +5.64%] 52.0 49.5 ⚠ WARN
int8_vs_float32_cosine/float32_simd/1024 +4.29% [+4.15%, +4.44%] 85.6 82.0 ⚠ WARN
simd_query_batch_dot_product/pair_loop/384d_256c +4.51% [+4.12%, +4.81%] 7660.2 7329.8 ⚠ WARN
simd_query_batch_dot_product/simd_batch/384d_16c +4.21% [+4.00%, +4.49%] 258.3 247.9 ⚠ WARN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c +4.09% [+3.95%, +4.21%] 344.2 330.6 ⚠ WARN
tier_prepared_query/int4_query_per_call_1000 +4.13% [+3.85%, +4.43%] 3783864.7 3633651.2 ⚠ WARN
binary_cosine_distance/float32_simd/768 +4.00% [+3.83%, +4.15%] 69.9 67.3 ⚠ WARN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c +5.48% [+3.83%, +7.62%] 23253.8 22046.4 ⚠ WARN
simd_query_batch_dot_product/pair_loop/128d_16c +4.01% [+3.71%, +4.24%] 216.8 208.4 ⚠ WARN
int4_cosine_distance/float32_simd/768 +3.88% [+3.69%, +4.04%] 69.9 67.3 ⚠ WARN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c +3.62% [+3.35%, +3.85%] 89406.8 86284.7 ⚠ WARN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c +3.36% [+3.15%, +3.52%] 90435.0 87496.9 ⚠ WARN
int8_vs_float32_cosine/float32_simd/768 +3.20% [+3.02%, +3.41%] 68.9 66.8 ⚠ WARN
simd_batch_cosine_normalized_query/simd_batch/768d_64c -3.11% [-3.25%, -2.91%] 4142.1 4274.9 🚀 WIN
simd_normalized_cosine_fast_path/cosine_full/384 -3.09% [-3.30%, -2.84%] 44.0 45.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c -3.20% [-3.45%, -3.04%] 4182.1 4320.4 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c -3.37% [-3.54%, -3.23%] 165.1 170.8 🚀 WIN
int8_raw_dot_product/dot_product_i8/127 -3.31% [-3.61%, -2.92%] 16.4 17.0 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/768 -3.37% [-3.76%, -2.94%] 24.2 25.0 🚀 WIN
int8_batch_cosine/float32_simd/10 -3.25% [-3.80%, -2.62%] 411.9 425.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c -3.42% [-3.82%, -3.09%] 266.6 276.0 🚀 WIN
simd_throughput_384/dot_product -3.87% [-4.12%, -3.64%] 27.5 28.6 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c -4.40% [-4.54%, -4.25%] 624.7 653.5 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_16c -4.39% [-4.59%, -4.21%] 1037.2 1084.9 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_16c -4.56% [-4.69%, -4.45%] 623.4 653.2 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -4.36% [-4.72%, -4.02%] 67962.1 71060.2 🚀 WIN
int8_prepared_dot_product/prepared/127 -4.37% [-4.81%, -3.92%] 16.1 16.9 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -4.43% [-4.88%, -4.08%] 68848.4 72041.3 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c -4.74% [-4.95%, -4.49%] 40980.7 43021.0 🚀 WIN
memory_size/search_1000_int8 -4.81% [-5.06%, -4.49%] 15652.5 16442.6 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c -4.98% [-5.11%, -4.83%] 1047.4 1102.3 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c -4.87% [-5.14%, -4.56%] 56571.8 59466.9 🚀 WIN
simd_dot_product/simd/384 -4.97% [-5.20%, -4.74%] 27.2 28.7 🚀 WIN
int8_batch_cosine/int8_loop/10 -5.03% [-5.25%, -4.80%] 169.1 178.0 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c -4.79% [-5.25%, -4.31%] 169.3 177.8 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c -5.15% [-5.34%, -4.94%] 2534.0 2671.6 🚀 WIN
simd_batch_dot_product/simd_batch/100 -5.15% [-5.35%, -4.91%] 3612.5 3808.6 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c -5.25% [-5.43%, -5.04%] 2469.4 2606.2 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c -5.26% [-5.55%, -4.94%] 640.0 675.5 🚀 WIN
tier_prepared_query/int8_query_once_1000 -5.09% [-5.63%, -4.33%] 17353.5 18283.3 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_64c -5.58% [-5.71%, -5.38%] 2471.6 2617.6 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c -5.30% [-5.72%, -4.93%] 640.4 676.2 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/129 -5.39% [-5.84%, -4.98%] 7.2 7.7 🚀 WIN
simd_batch_cosine/simd_batch/10 -5.62% [-5.96%, -5.21%] 403.4 427.4 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_4c -5.74% [-5.99%, -5.51%] 212.7 225.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c -5.87% [-6.02%, -5.74%] 2529.2 2686.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c -5.58% [-6.09%, -5.20%] 208.2 220.5 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c -6.02% [-6.13%, -5.91%] 167.8 178.5 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 -5.89% [-6.29%, -5.57%] 42131.1 44768.7 🚀 WIN
simd_query_batch_dot_product/simd_batch/128d_16c -5.97% [-6.33%, -5.68%] 122.4 130.1 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_64c -6.13% [-6.38%, -5.93%] 3271.3 3485.0 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -6.33% [-6.54%, -6.07%] 16771.6 17904.7 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_1000c -6.41% [-6.66%, -6.13%] 39334.0 42030.0 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -6.36% [-6.72%, -6.07%] 823.7 879.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c -6.24% [-6.85%, -5.78%] 3262.9 3479.9 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_256c -6.64% [-6.87%, -6.46%] 9793.8 10489.9 🚀 WIN
memory_size/search_1000_float32 -6.69% [-6.90%, -6.45%] 40044.7 42915.4 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_16c -6.94% [-7.01%, -6.86%] 835.1 897.3 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c -6.36% [-7.03%, -5.64%] 40913.2 43693.8 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -7.03% [-7.18%, -6.83%] 9777.0 10516.6 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -6.80% [-7.30%, -6.48%] 16619.9 17831.9 🚀 WIN
int8_batch_cosine/int8_loop/100 -6.84% [-7.35%, -6.39%] 1696.4 1821.0 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c -7.84% [-8.09%, -7.59%] 10017.4 10869.1 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c -7.03% [-8.22%, -6.14%] 39468.7 42451.2 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/128 -8.18% [-8.80%, -7.61%] 6.7 7.3 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c -8.64% [-8.82%, -8.43%] 10030.0 10978.5 🚀 WIN
int8_prepared_dot_product/prepared/768 -8.71% [-8.94%, -8.53%] 26.9 29.4 🚀 WIN
simd_batch_dot_product/simd_batch/1000 -8.64% [-8.95%, -8.35%] 50191.6 54941.1 🚀 WIN
simd_query_batch_dot_product/simd_batch/384d_256c -8.36% [-9.19%, -7.23%] 4941.7 5392.5 🚀 WIN
simd_prepared_query_normalized_cosine/dot_product_loop/384 -9.01% [-9.20%, -8.78%] 30782.2 33831.0 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -9.31% [-9.47%, -9.09%] 16765.9 18486.4 🚀 WIN
int8_raw_dot_product/dot_product_i8/768 -9.38% [-9.60%, -9.17%] 26.6 29.4 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_256c -9.38% [-9.60%, -9.17%] 16641.4 18363.5 🚀 WIN
simd_euclidean_distance/simd/1024 -10.06% [-10.20%, -9.88%] 71.8 79.8 🚀 WIN
int8_raw_dot_product/dot_product_i8/128 -10.71% [-11.11%, -10.35%] 8.9 10.0 🚀 WIN
simd_normalize/simd/384 -6.04% [-11.13%, -0.00%] 73.1 77.8 🚀 WIN
int8_batch_cosine/int8_loop/1000 -10.92% [-11.27%, -10.62%] 16699.2 18746.0 🚀 WIN
int8_raw_dot_product/dot_product_i8/129 -10.83% [-11.38%, -10.35%] 9.4 10.6 🚀 WIN
int8_vs_float32_cosine/int8/768 -13.62% [-13.78%, -13.36%] 29.2 33.8 🚀 WIN
int8_prepared_dot_product/prepared/129 -15.16% [-15.50%, -14.78%] 9.1 10.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -15.69% [-15.90%, -15.47%] 13033.5 15459.2 🚀 WIN
int8_prepared_dot_product/prepared/128 -15.72% [-16.00%, -15.36%] 8.6 10.2 🚀 WIN
simd_euclidean_distance/simd/384 -16.54% [-17.19%, -15.68%] 29.1 34.8 🚀 WIN
layer_norm/896 -17.47% [-17.64%, -17.26%] 199.9 242.2 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_256c -17.91% [-18.21%, -17.66%] 12997.9 15834.6 🚀 WIN
simd_normalize/simd/768 -15.33% [-18.30%, -12.22%] 111.5 131.7 🚀 WIN
simd_normalized_cosine_fast_path/dot_product/384 -18.90% [-19.08%, -18.72%] 27.2 33.6 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_256c -19.59% [-19.82%, -19.36%] 9493.6 11806.9 🚀 WIN
simd_normalize/simd/1536 -17.37% [-20.36%, -14.33%] 194.6 235.5 🚀 WIN
elementwise_mul/4096 -20.68% [-21.02%, -20.10%] 254.8 321.2 🚀 WIN
simd_normalize/simd/1024 -19.78% [-23.33%, -16.09%] 141.3 176.2 🚀 WIN
All 247 measurements
Bench Δ point CI-lower CI-upper
add_bias_gelu/4096 +8.23% +8.02% +8.40%
add_bias_gelu/896 +8.69% +8.41% +8.97%
binary_cosine_distance/binary/1024 +0.85% +0.72% +1.00%
binary_cosine_distance/binary/1536 +6.32% +2.95% +10.55%
binary_cosine_distance/binary/384 -0.53% -0.74% -0.37%
binary_cosine_distance/binary/768 +0.47% +0.26% +0.77%
binary_cosine_distance/float32_simd/1024 +5.33% +5.11% +5.52%
binary_cosine_distance/float32_simd/1536 +5.83% +5.56% +6.07%
binary_cosine_distance/float32_simd/384 -0.36% -0.59% -0.14%
binary_cosine_distance/float32_simd/768 +4.00% +3.83% +4.15%
elementwise_mul/4096 -20.68% -21.02% -20.10%
gelu/4096 +12.18% +11.57% +12.88%
gelu/896 +11.61% +11.40% +11.83%
int4_cosine_distance/float32_simd/1024 +4.64% +4.25% +5.11%
int4_cosine_distance/float32_simd/1536 +6.65% +6.14% +7.31%
int4_cosine_distance/float32_simd/384 +2.76% +2.28% +3.17%
int4_cosine_distance/float32_simd/768 +3.88% +3.69% +4.04%
int4_cosine_distance/int4/1024 +2.24% +1.82% +2.53%
int4_cosine_distance/int4/1536 +2.20% +1.77% +2.53%
int4_cosine_distance/int4/384 +1.71% +1.16% +2.35%
int4_cosine_distance/int4/768 +2.30% +1.96% +2.56%
int8_batch_cosine/float32_simd/10 -3.25% -3.80% -2.62%
int8_batch_cosine/float32_simd/100 -2.61% -2.85% -2.30%
int8_batch_cosine/float32_simd/1000 +3.32% +2.69% +3.85%
int8_batch_cosine/int8_loop/10 -5.03% -5.25% -4.80%
int8_batch_cosine/int8_loop/100 -6.84% -7.35% -6.39%
int8_batch_cosine/int8_loop/1000 -10.92% -11.27% -10.62%
int8_prepared_dot_product/per_call/1024 +6.77% +6.24% +7.28%
int8_prepared_dot_product/per_call/127 +5.27% +5.09% +5.49%
int8_prepared_dot_product/per_call/128 +6.10% +5.74% +6.40%
int8_prepared_dot_product/per_call/129 +6.14% +5.92% +6.45%
int8_prepared_dot_product/per_call/384 +6.14% +5.66% +6.56%
int8_prepared_dot_product/per_call/768 +6.43% +6.23% +6.75%
int8_prepared_dot_product/prepared/1024 +0.98% +0.75% +1.22%
int8_prepared_dot_product/prepared/127 -4.37% -4.81% -3.92%
int8_prepared_dot_product/prepared/128 -15.72% -16.00% -15.36%
int8_prepared_dot_product/prepared/129 -15.16% -15.50% -14.78%
int8_prepared_dot_product/prepared/384 +1.68% +1.21% +2.13%
int8_prepared_dot_product/prepared/768 -8.71% -8.94% -8.53%
int8_quantization/quantize/1024 +29.53% +29.25% +29.84%
int8_quantization/quantize/1536 +41.21% +40.70% +41.62%
int8_quantization/quantize/384 +40.20% +39.89% +40.56%
int8_quantization/quantize/768 +29.50% +29.30% +29.67%
int8_raw_dot_product/dot_product_i8/1024 +0.69% -0.06% +1.73%
int8_raw_dot_product/dot_product_i8/127 -3.31% -3.61% -2.92%
int8_raw_dot_product/dot_product_i8/128 -10.71% -11.11% -10.35%
int8_raw_dot_product/dot_product_i8/129 -10.83% -11.38% -10.35%
int8_raw_dot_product/dot_product_i8/384 +1.20% +0.34% +1.95%
int8_raw_dot_product/dot_product_i8/768 -9.38% -9.60% -9.17%
int8_raw_dot_product/dot_product_i8_raw/1024 -0.57% -1.03% +0.05%
int8_raw_dot_product/dot_product_i8_raw/127 +6.39% +5.31% +7.85%
int8_raw_dot_product/dot_product_i8_raw/128 -8.18% -8.80% -7.61%
int8_raw_dot_product/dot_product_i8_raw/129 -5.39% -5.84% -4.98%
int8_raw_dot_product/dot_product_i8_raw/384 +4.79% +4.42% +5.17%
int8_raw_dot_product/dot_product_i8_raw/768 -3.37% -3.76% -2.94%
int8_vs_float32_cosine/float32_simd/1024 +4.29% +4.15% +4.44%
int8_vs_float32_cosine/float32_simd/1536 +8.75% +8.31% +9.28%
int8_vs_float32_cosine/float32_simd/384 -1.00% -1.24% -0.76%
int8_vs_float32_cosine/float32_simd/768 +3.20% +3.02% +3.41%
int8_vs_float32_cosine/int8/1024 -0.13% -1.25% +0.81%
int8_vs_float32_cosine/int8/1536 +5.26% +4.96% +5.52%
int8_vs_float32_cosine/int8/384 -1.06% -3.58% +0.82%
int8_vs_float32_cosine/int8/768 -13.62% -13.78% -13.36%
layer_norm/4096 +26.91% +26.34% +27.63%
layer_norm/896 -17.47% -17.64% -17.26%
memory_size/search_1000_float32 -6.69% -6.90% -6.45%
memory_size/search_1000_int8 -4.81% -5.06% -4.49%
rms_norm/4096 +10.07% +9.46% +10.57%
rms_norm/896 +12.08% +11.47% +12.49%
silu_inplace/4096 +10.16% +9.23% +10.84%
silu_inplace/896 +10.89% +10.53% +11.23%
simd_batch_cosine/scalar_loop/10 +8.93% +8.80% +9.10%
simd_batch_cosine/scalar_loop/100 +8.95% +8.87% +9.05%
simd_batch_cosine/scalar_loop/1000 +8.79% +8.73% +8.84%
simd_batch_cosine/simd_batch/10 -5.62% -5.96% -5.21%
simd_batch_cosine/simd_batch/100 -2.04% -2.22% -1.80%
simd_batch_cosine/simd_batch/1000 +2.71% +2.23% +3.13%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c +3.36% +3.15% +3.52%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c +4.78% +4.63% +5.02%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c +5.48% +3.83% +7.62%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c +4.09% +3.95% +4.21%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c +5.29% +5.04% +5.51%
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c -4.74% -4.95% -4.49%
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c -5.26% -5.55% -4.94%
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c -8.64% -8.82% -8.43%
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c -4.79% -5.25% -4.31%
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c -5.15% -5.34% -4.94%
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c -0.22% -0.56% +0.15%
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c -1.40% -1.56% -1.19%
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -6.33% -6.54% -6.07%
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c -2.46% -2.71% -2.23%
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c -0.72% -0.86% -0.61%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c +3.62% +3.35% +3.85%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c +5.46% +5.12% +5.69%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c +2.61% +2.38% +2.77%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c +4.69% +4.37% +4.91%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c +5.81% +5.71% +5.91%
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c -7.03% -8.22% -6.14%
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c -4.40% -4.54% -4.25%
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -7.03% -7.18% -6.83%
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c -3.37% -3.54% -3.23%
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c -5.25% -5.43% -5.04%
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c -1.47% -1.79% -1.18%
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c -0.65% -0.85% -0.44%
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -6.80% -7.30% -6.48%
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c +0.77% +0.49% +1.05%
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c +0.12% -0.09% +0.26%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c +1.02% +0.72% +1.36%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c +0.16% +0.07% +0.25%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c +1.24% +0.20% +2.49%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c +0.29% +0.01% +0.59%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c +1.54% +1.34% +1.87%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c -6.36% -7.03% -5.64%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c -5.30% -5.72% -4.93%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c -7.84% -8.09% -7.59%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c -6.02% -6.13% -5.91%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c -5.87% -6.02% -5.74%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -4.43% -4.88% -4.08%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c -4.98% -5.11% -4.83%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -9.31% -9.47% -9.09%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c -3.42% -3.82% -3.09%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c -3.20% -3.45% -3.04%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c +9.18% +8.79% +9.67%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c +5.76% +5.52% +5.91%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c -2.09% -2.36% -1.79%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c +2.43% +2.03% +2.77%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c +9.57% +9.44% +9.68%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c +6.26% +6.06% +6.50%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c +18.35% +18.20% +18.56%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c +4.60% +4.33% +4.97%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c +12.01% +11.81% +12.20%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c +12.58% +12.35% +12.78%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c -4.87% -5.14% -4.56%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -6.36% -6.72% -6.07%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -15.69% -15.90% -15.47%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c -5.58% -6.09% -5.20%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c -6.24% -6.85% -5.78%
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c +1.29% +1.17% +1.43%
simd_batch_cosine_normalized_query/simd_batch/1024d_16c +0.76% +0.41% +0.99%
simd_batch_cosine_normalized_query/simd_batch/1024d_256c +0.58% +0.40% +0.81%
simd_batch_cosine_normalized_query/simd_batch/1024d_4c +2.55% +2.40% +2.74%
simd_batch_cosine_normalized_query/simd_batch/1024d_64c +3.75% +2.60% +5.32%
simd_batch_cosine_normalized_query/simd_batch/384d_1000c -6.41% -6.66% -6.13%
simd_batch_cosine_normalized_query/simd_batch/384d_16c -4.56% -4.69% -4.45%
simd_batch_cosine_normalized_query/simd_batch/384d_256c -6.64% -6.87% -6.46%
simd_batch_cosine_normalized_query/simd_batch/384d_4c -2.54% -2.69% -2.30%
simd_batch_cosine_normalized_query/simd_batch/384d_64c -5.58% -5.71% -5.38%
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -4.36% -4.72% -4.02%
simd_batch_cosine_normalized_query/simd_batch/768d_16c -4.39% -4.59% -4.21%
simd_batch_cosine_normalized_query/simd_batch/768d_256c -9.38% -9.60% -9.17%
simd_batch_cosine_normalized_query/simd_batch/768d_4c -1.63% -1.86% -1.46%
simd_batch_cosine_normalized_query/simd_batch/768d_64c -3.11% -3.25% -2.91%
simd_batch_dot_product/scalar_loop/10 +9.75% +9.39% +9.99%
simd_batch_dot_product/scalar_loop/100 +9.62% +9.54% +9.70%
simd_batch_dot_product/scalar_loop/1000 +10.11% +9.78% +10.48%
simd_batch_dot_product/simd_batch/10 +13.37% +13.19% +13.63%
simd_batch_dot_product/simd_batch/100 -5.15% -5.35% -4.91%
simd_batch_dot_product/simd_batch/1000 -8.64% -8.95% -8.35%
simd_cosine_similarity/scalar/1024 +11.30% +11.23% +11.37%
simd_cosine_similarity/scalar/1536 +11.76% +11.51% +11.91%
simd_cosine_similarity/scalar/384 +9.32% +9.25% +9.39%
simd_cosine_similarity/scalar/768 +10.92% +10.55% +11.36%
simd_cosine_similarity/simd/1024 -0.35% -0.47% -0.24%
simd_cosine_similarity/simd/1536 +5.54% +5.32% +5.71%
simd_cosine_similarity/simd/384 -1.14% -1.37% -0.88%
simd_cosine_similarity/simd/768 +8.95% +8.75% +9.22%
simd_dot_product/scalar/1024 +11.93% +11.81% +12.15%
simd_dot_product/scalar/1536 +12.76% +12.41% +13.21%
simd_dot_product/scalar/384 +9.81% +9.64% +10.04%
simd_dot_product/scalar/768 +10.78% +9.98% +11.32%
simd_dot_product/simd/1024 +23.49% +23.15% +23.75%
simd_dot_product/simd/1536 +2.36% +2.18% +2.60%
simd_dot_product/simd/384 -4.97% -5.20% -4.74%
simd_dot_product/simd/768 +3.11% +2.93% +3.33%
simd_euclidean_distance/scalar/1024 +12.29% +12.09% +12.58%
simd_euclidean_distance/scalar/1536 +12.31% +12.26% +12.36%
simd_euclidean_distance/scalar/384 +10.74% +10.68% +10.81%
simd_euclidean_distance/scalar/768 +11.74% +11.65% +11.82%
simd_euclidean_distance/simd/1024 -10.06% -10.20% -9.88%
simd_euclidean_distance/simd/1536 +9.31% +9.10% +9.47%
simd_euclidean_distance/simd/384 -16.54% -17.19% -15.68%
simd_euclidean_distance/simd/768 +4.56% +4.40% +4.71%
simd_normalize/scalar/1024 +11.71% +11.46% +11.95%
simd_normalize/scalar/1536 +12.15% +11.86% +12.48%
simd_normalize/scalar/384 +12.23% +11.86% +12.61%
simd_normalize/scalar/768 +12.64% +12.35% +12.99%
simd_normalize/simd/1024 -19.78% -23.33% -16.09%
simd_normalize/simd/1536 -17.37% -20.36% -14.33%
simd_normalize/simd/384 -6.04% -11.13% -0.00%
simd_normalize/simd/768 -15.33% -18.30% -12.22%
simd_normalized_cosine_fast_path/cosine_full/1024 +10.09% +9.83% +10.31%
simd_normalized_cosine_fast_path/cosine_full/384 -3.09% -3.30% -2.84%
simd_normalized_cosine_fast_path/cosine_full/768 +8.49% +8.23% +8.80%
simd_normalized_cosine_fast_path/dot_product/1024 +22.94% +22.78% +23.13%
simd_normalized_cosine_fast_path/dot_product/384 -18.90% -19.08% -18.72%
simd_normalized_cosine_fast_path/dot_product/768 +22.70% +22.49% +22.94%
simd_prepared_query_normalized_cosine/dot_product_loop/1024 -2.14% -2.48% -1.79%
simd_prepared_query_normalized_cosine/dot_product_loop/384 -9.01% -9.20% -8.78%
simd_prepared_query_normalized_cosine/dot_product_loop/768 -1.53% -1.87% -1.19%
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 +4.72% +4.51% +4.99%
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 -5.89% -6.29% -5.57%
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 -0.90% -1.22% -0.62%
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 +17.39% +16.89% +17.74%
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 +7.59% +7.15% +8.02%
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 +6.60% +6.17% +7.00%
simd_query_batch_dot_product/pair_loop/128d_16c +4.01% +3.71% +4.24%
simd_query_batch_dot_product/pair_loop/128d_256c +4.83% +4.40% +5.31%
simd_query_batch_dot_product/pair_loop/128d_4c +8.55% +8.19% +8.88%
simd_query_batch_dot_product/pair_loop/128d_64c +5.28% +4.99% +5.48%
simd_query_batch_dot_product/pair_loop/384d_16c +13.77% +13.46% +14.11%
simd_query_batch_dot_product/pair_loop/384d_256c +4.51% +4.12% +4.81%
simd_query_batch_dot_product/pair_loop/384d_4c +11.11% +10.77% +11.47%
simd_query_batch_dot_product/pair_loop/384d_64c +11.14% +10.93% +11.30%
simd_query_batch_dot_product/pair_loop/768d_16c -6.94% -7.01% -6.86%
simd_query_batch_dot_product/pair_loop/768d_256c -17.91% -18.21% -17.66%
simd_query_batch_dot_product/pair_loop/768d_4c -5.74% -5.99% -5.51%
simd_query_batch_dot_product/pair_loop/768d_64c -6.13% -6.38% -5.93%
simd_query_batch_dot_product/simd_batch/128d_16c -5.97% -6.33% -5.68%
simd_query_batch_dot_product/simd_batch/128d_256c +1.23% +0.94% +1.42%
simd_query_batch_dot_product/simd_batch/128d_4c +7.02% +6.69% +7.25%
simd_query_batch_dot_product/simd_batch/128d_64c -0.66% -1.01% -0.20%
simd_query_batch_dot_product/simd_batch/384d_16c +4.21% +4.00% +4.49%
simd_query_batch_dot_product/simd_batch/384d_256c -8.36% -9.19% -7.23%
simd_query_batch_dot_product/simd_batch/384d_4c +2.86% +2.59% +3.21%
simd_query_batch_dot_product/simd_batch/384d_64c +9.74% +9.49% +10.05%
simd_query_batch_dot_product/simd_batch/768d_16c +5.84% +5.67% +6.07%
simd_query_batch_dot_product/simd_batch/768d_256c -19.59% -19.82% -19.36%
simd_query_batch_dot_product/simd_batch/768d_4c +1.51% +1.20% +1.72%
simd_query_batch_dot_product/simd_batch/768d_64c +9.08% +8.73% +9.35%
simd_squared_euclidean_fast_path/euclidean_full/1024 +6.82% +6.59% +6.99%
simd_squared_euclidean_fast_path/euclidean_full/384 -1.59% -1.79% -1.40%
simd_squared_euclidean_fast_path/euclidean_full/768 +5.12% +4.89% +5.34%
simd_squared_euclidean_fast_path/squared_euclidean/1024 +7.69% +7.06% +8.55%
simd_squared_euclidean_fast_path/squared_euclidean/384 -2.57% -2.82% -2.29%
simd_squared_euclidean_fast_path/squared_euclidean/768 +4.99% +4.25% +5.64%
simd_throughput_384/cosine_similarity -0.06% -0.31% +0.16%
simd_throughput_384/dot_product -3.87% -4.12% -3.64%
simd_throughput_384/euclidean_distance -1.46% -1.64% -1.25%
simd_throughput_384/normalize +8.17% +7.99% +8.37%
softmax_attention/128 +8.82% +8.69% +8.93%
softmax_attention/512 +11.44% +11.24% +11.61%
tier_prepared_query/binary_query_once_1000 -0.30% -0.68% +0.11%
tier_prepared_query/binary_query_per_call_1000 +2.69% +2.02% +3.63%
tier_prepared_query/int4_query_once_1000 +4.81% +4.40% +5.09%
tier_prepared_query/int4_query_per_call_1000 +4.13% +3.85% +4.43%
tier_prepared_query/int8_query_once_1000 -5.09% -5.63% -4.33%
tier_prepared_query/int8_query_per_call_1000 +6.42% +6.33% +6.50%

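Rule: a benchmark passes silently when the lower bound of the confidence interval for its change (CI-lower) is ≤3.0%; it warns when CI-lower is in (3.0%, 7.0%]; it fails when CI-lower is >7.0%. Override with the PR label `bench-allow-regression`.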

Gate is in advisory mode (Rollout step 3, ADR-058 §Rollout). Failures do not block merge for the first 7 days.
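
The rule and the advisory-mode note above can be sketched in a few lines of Rust. This is a minimal illustration, not the gate's actual implementation: the names `Verdict`, `classify`, and `blocks_merge` are hypothetical, as is the way the advisory flag and the override label are combined. The rule as stated covers only the regression side, so rows the report marks as WIN simply fall under `Pass` here.

```rust
/// Outcome of the regression gate for a single benchmark.
/// (Hypothetical type; the real gate's representation is not shown in this PR.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Pass,
    Warn,
    Fail,
}

// Thresholds taken from the rule text, in percent.
const WARN_ABOVE_PCT: f64 = 3.0;
const FAIL_ABOVE_PCT: f64 = 7.0;

/// Classify a benchmark from the lower bound of its change CI, in percent.
/// CI-lower <= 3.0 passes, (3.0, 7.0] warns, > 7.0 fails.
fn classify(ci_lower_pct: f64) -> Verdict {
    if ci_lower_pct > FAIL_ABOVE_PCT {
        Verdict::Fail
    } else if ci_lower_pct > WARN_ABOVE_PCT {
        Verdict::Warn
    } else {
        Verdict::Pass
    }
}

/// Whether a verdict should block the merge.
/// Assumption: a failure blocks only when the gate is out of advisory mode
/// and the PR does not carry the `bench-allow-regression` label.
fn blocks_merge(verdict: Verdict, advisory_mode: bool, has_override_label: bool) -> bool {
    verdict == Verdict::Fail && !advisory_mode && !has_override_label
}
```

Under this sketch, `classify(3.02)` (the CI-lower of `int8_vs_float32_cosine/float32_simd/768`) yields `Warn`, and `classify(29.25)` (the CI-lower of `int8_quantization/quantize/1024`) yields `Fail`, which `blocks_merge` would not treat as blocking while advisory mode is on.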
