release: v0.2.3 — ship RoPE fix to crates.io (yank 0.2.2) #98

Merged
ohdearquant merged 1 commit into main from release/v0.2.3 on May 25, 2026

@ohdearquant
Owner

Summary

crates.io v0.2.2 was published 2026-05-20, before the RoPE pairing fix landed (PR #96, merged today). Cannot republish 0.2.2 (immutable on crates.io), so bumping to 0.2.3 to ship the fix. v0.2.2 will be yanked on crates.io post-publish.

Changes

  • Workspace version 0.2.2 → 0.2.3
  • Internal path-dep minimum versions bumped to 0.2.3 (embed, tune)
  • docs/releases/v0.2.2.md → v0.2.3.md with yank notice

Post-merge

git checkout main && git pull
git tag -a v0.2.3 -m "v0.2.3"
git push origin v0.2.3
gh release create v0.2.3 --title "v0.2.3 — MLX-parity quality" --notes-file docs/releases/v0.2.3.md
make publish
# Then yank broken 0.2.2:
for c in lattice-inference lattice-fann lattice-transport lattice-embed lattice-tune; do
  cargo yank --version 0.2.2 "$c"
done

🤖 Generated with Claude Code

crates.io v0.2.2 was published 2026-05-20, before the RoPE pairing fix
landed (PR #96, merged today). Cannot republish 0.2.2 (immutable on
crates.io), so bumping to 0.2.3 to ship the fix. v0.2.2 will be yanked
on crates.io post-publish to prevent new installs from getting the
broken interleaved RoPE.

- Workspace version 0.2.2 → 0.2.3
- Internal path-dep minimum versions bumped to 0.2.3
- Release notes renamed v0.2.2.md → v0.2.3.md with yank notice
- GitHub tag v0.2.2 left in place for history

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ohdearquant merged commit 3fc2c67 into main on May 25, 2026
3 of 5 checks passed
ohdearquant deleted the release/v0.2.3 branch on May 25, 2026 03:19
@github-actions

Perf regression report (ADR-058)

aarch64-linux — perf regression report

❌ 4 FAIL (regression >7.0% confirmed by 95% CI)
⚠ 4 WARN (regression 3.0-7.0% confirmed)
🚀 8 confirmed improvement

Bench Δ point 95% CI new ns base ns verdict
simd_query_batch_dot_product/pair_loop/768d_256c +9.85% [+9.73%, +9.97%] 22011.7 20037.2 ❌ FAIL
simd_query_batch_dot_product/simd_batch/768d_256c +9.18% [+9.06%, +9.30%] 18560.3 16999.2 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c +7.92% [+7.53%, +8.34%] 38869.6 36016.0 ❌ FAIL
simd_batch_cosine_normalized_query/simd_batch/1024d_256c +7.50% [+7.15%, +7.81%] 38907.4 36194.0 ❌ FAIL
simd_query_batch_dot_product/simd_batch/768d_16c +5.97% [+5.95%, +5.98%] 661.7 624.5 ⚠ WARN
int8_vs_float32_cosine/int8/384 +5.27% [+4.94%, +5.65%] 17.2 16.3 ⚠ WARN
int8_batch_cosine/int8_loop/1000 +5.08% [+4.87%, +5.28%] 19070.7 18148.9 ⚠ WARN
simd_query_batch_dot_product/simd_batch/128d_64c +3.92% [+3.88%, +3.97%] 538.9 518.6 ⚠ WARN
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -3.24% [-3.36%, -3.13%] 28436.5 29389.1 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -3.29% [-3.64%, -2.94%] 35245.0 36444.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -3.74% [-3.78%, -3.68%] 946.3 983.0 🚀 WIN
simd_dot_product/simd/1024 -4.12% [-4.19%, -4.04%] 70.6 73.6 🚀 WIN
simd_throughput_384/dot_product -4.75% [-4.91%, -4.60%] 31.5 33.1 🚀 WIN
simd_batch_cosine/simd_batch/1000 -6.18% [-6.26%, -6.10%] 81374.1 86738.6 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c -8.01% [-8.10%, -7.92%] 1385.7 1506.4 🚀 WIN
simd_batch_dot_product/simd_batch/1000 -14.56% [-14.61%, -14.50%] 73353.0 85852.2 🚀 WIN
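
The Δ point column is consistent with the relative change of the new timing against the base timing, (new - base) / base. The Python sketch below reproduces the first row under that assumption. `relative_change_pct` is a hypothetical helper name, not part of the repository's tooling, and the real report may derive its point estimate from unrounded samples rather than from the rounded ns values shown here.

```python
def relative_change_pct(new_ns: float, base_ns: float) -> float:
    """Relative change of the new timing vs. the base timing, in percent.

    Positive values mean the new build is slower (a regression).
    Assumed definition: (new - base) / base * 100.
    """
    if base_ns <= 0:
        raise ValueError("base_ns must be positive")
    return (new_ns - base_ns) / base_ns * 100.0


# Values copied from the first aarch64 row
# (simd_query_batch_dot_product/pair_loop/768d_256c).
delta = relative_change_pct(22011.7, 20037.2)
print(f"{delta:+.2f}%")  # prints +9.85%
```

The same formula gives -14.56% for the last row (73353.0 ns new, 85852.2 ns base), matching the table.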
All 247 measurements
Bench Δ point CI-lower CI-upper
add_bias_gelu/4096 -0.01% -0.02% +0.00%
add_bias_gelu/896 +0.00% -0.01% +0.02%
binary_cosine_distance/binary/1024 +0.14% +0.12% +0.15%
binary_cosine_distance/binary/1536 +0.48% +0.46% +0.50%
binary_cosine_distance/binary/384 +0.33% +0.31% +0.35%
binary_cosine_distance/binary/768 +0.15% +0.13% +0.16%
binary_cosine_distance/float32_simd/1024 -0.05% -0.08% -0.02%
binary_cosine_distance/float32_simd/1536 +0.03% +0.02% +0.04%
binary_cosine_distance/float32_simd/384 +0.22% +0.19% +0.25%
binary_cosine_distance/float32_simd/768 +0.15% +0.13% +0.17%
elementwise_mul/4096 -2.51% -2.55% -2.47%
gelu/4096 +0.00% -0.02% +0.02%
gelu/896 +0.00% -0.01% +0.01%
int4_cosine_distance/float32_simd/1024 +0.30% +0.26% +0.34%
int4_cosine_distance/float32_simd/1536 +0.05% +0.04% +0.06%
int4_cosine_distance/float32_simd/384 +0.16% +0.13% +0.19%
int4_cosine_distance/float32_simd/768 +0.07% +0.05% +0.09%
int4_cosine_distance/int4/1024 -0.04% -0.06% -0.01%
int4_cosine_distance/int4/1536 -0.10% -0.13% -0.07%
int4_cosine_distance/int4/384 +0.12% +0.09% +0.16%
int4_cosine_distance/int4/768 +0.46% +0.43% +0.49%
int8_batch_cosine/float32_simd/10 -0.06% -0.07% -0.05%
int8_batch_cosine/float32_simd/100 +0.45% +0.42% +0.47%
int8_batch_cosine/float32_simd/1000 -1.04% -1.11% -0.97%
int8_batch_cosine/int8_loop/10 -0.02% -0.05% +0.02%
int8_batch_cosine/int8_loop/100 +0.51% +0.49% +0.54%
int8_batch_cosine/int8_loop/1000 +5.08% +4.87% +5.28%
int8_prepared_dot_product/per_call/1024 -0.00% -0.02% +0.01%
int8_prepared_dot_product/per_call/127 -0.05% -0.06% -0.04%
int8_prepared_dot_product/per_call/128 -0.01% -0.01% +0.01%
int8_prepared_dot_product/per_call/129 -0.02% -0.03% -0.01%
int8_prepared_dot_product/per_call/384 -0.01% -0.04% +0.01%
int8_prepared_dot_product/per_call/768 +0.01% -0.00% +0.02%
int8_prepared_dot_product/prepared/1024 -0.78% -0.85% -0.71%
int8_prepared_dot_product/prepared/127 -0.40% -0.43% -0.38%
int8_prepared_dot_product/prepared/128 +1.19% +0.86% +1.50%
int8_prepared_dot_product/prepared/129 -0.32% -0.35% -0.29%
int8_prepared_dot_product/prepared/384 -2.64% -2.69% -2.59%
int8_prepared_dot_product/prepared/768 +0.04% -0.02% +0.08%
int8_quantization/quantize/1024 -0.00% -0.02% +0.01%
int8_quantization/quantize/1536 +0.11% +0.09% +0.12%
int8_quantization/quantize/384 +0.00% -0.00% +0.01%
int8_quantization/quantize/768 -0.01% -0.03% -0.00%
int8_raw_dot_product/dot_product_i8/1024 +0.59% +0.51% +0.68%
int8_raw_dot_product/dot_product_i8/127 -0.17% -0.20% -0.14%
int8_raw_dot_product/dot_product_i8/128 +1.61% +1.54% +1.68%
int8_raw_dot_product/dot_product_i8/129 +0.36% +0.35% +0.38%
int8_raw_dot_product/dot_product_i8/384 -1.19% -1.29% -1.10%
int8_raw_dot_product/dot_product_i8/768 -1.17% -1.22% -1.13%
int8_raw_dot_product/dot_product_i8_raw/1024 +0.01% -0.03% +0.04%
int8_raw_dot_product/dot_product_i8_raw/127 -0.82% -0.84% -0.80%
int8_raw_dot_product/dot_product_i8_raw/128 +0.03% -0.04% +0.10%
int8_raw_dot_product/dot_product_i8_raw/129 +0.77% +0.71% +0.83%
int8_raw_dot_product/dot_product_i8_raw/384 -0.33% -0.42% -0.25%
int8_raw_dot_product/dot_product_i8_raw/768 -0.40% -0.42% -0.38%
int8_vs_float32_cosine/float32_simd/1024 +0.21% +0.19% +0.22%
int8_vs_float32_cosine/float32_simd/1536 +0.02% +0.00% +0.04%
int8_vs_float32_cosine/float32_simd/384 -0.19% -0.31% -0.08%
int8_vs_float32_cosine/float32_simd/768 +0.09% +0.04% +0.14%
int8_vs_float32_cosine/int8/1024 -0.89% -0.93% -0.85%
int8_vs_float32_cosine/int8/1536 -0.23% -0.34% -0.14%
int8_vs_float32_cosine/int8/384 +5.27% +4.94% +5.65%
int8_vs_float32_cosine/int8/768 -0.42% -0.51% -0.34%
layer_norm/4096 +0.11% +0.08% +0.13%
layer_norm/896 -0.06% -0.10% -0.03%
memory_size/search_1000_float32 +0.46% +0.43% +0.49%
memory_size/search_1000_int8 +1.13% +1.05% +1.21%
rms_norm/4096 +1.23% +1.18% +1.27%
rms_norm/896 -0.11% -0.14% -0.08%
silu_inplace/4096 +0.01% -0.01% +0.02%
silu_inplace/896 +0.00% -0.02% +0.02%
simd_batch_cosine/scalar_loop/10 -0.05% -0.07% -0.04%
simd_batch_cosine/scalar_loop/100 +0.12% +0.09% +0.14%
simd_batch_cosine/scalar_loop/1000 -0.27% -0.30% -0.24%
simd_batch_cosine/simd_batch/10 +0.26% +0.25% +0.27%
simd_batch_cosine/simd_batch/100 +0.73% +0.71% +0.75%
simd_batch_cosine/simd_batch/1000 -6.18% -6.26% -6.10%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c +0.20% +0.17% +0.24%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c +1.04% +1.03% +1.05%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c +0.82% +0.43% +1.17%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c +0.12% +0.11% +0.13%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c +0.58% +0.57% +0.59%
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c +0.46% +0.42% +0.49%
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c -0.07% -0.09% -0.06%
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c +0.13% +0.11% +0.14%
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c +0.29% +0.25% +0.35%
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c +0.36% +0.35% +0.37%
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c -0.68% -0.90% -0.35%
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c +0.30% +0.29% +0.31%
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -2.92% -3.02% -2.81%
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c +0.19% +0.18% +0.19%
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c +0.66% +0.65% +0.67%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c -0.17% -0.23% -0.12%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c +1.04% +1.02% +1.05%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -3.29% -3.64% -2.94%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c +0.12% +0.11% +0.14%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c +0.41% +0.40% +0.43%
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c -0.14% -0.20% -0.07%
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c +0.03% +0.02% +0.04%
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -0.02% -0.04% -0.01%
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c +0.10% +0.09% +0.12%
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c +0.38% +0.37% +0.39%
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c -1.04% -1.09% -0.99%
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c +0.49% +0.48% +0.50%
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -3.24% -3.36% -3.13%
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c +0.05% +0.04% +0.05%
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c +0.49% +0.48% +0.51%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c -1.17% -1.31% -0.96%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c -0.77% -0.79% -0.76%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c +7.92% +7.53% +8.34%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c -0.02% -0.03% -0.01%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c +0.27% +0.26% +0.29%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c +0.75% +0.71% +0.80%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c -0.03% -0.04% -0.01%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c +0.11% +0.09% +0.13%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c +0.01% -0.01% +0.02%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c +0.80% +0.79% +0.81%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c +2.15% +2.12% +2.19%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c -0.74% -0.75% -0.72%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -0.61% -0.70% -0.52%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c +0.03% +0.02% +0.05%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c +0.54% +0.53% +0.55%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c +1.53% +1.47% +1.58%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c -8.01% -8.10% -7.92%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c +1.86% +1.38% +2.33%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c +2.66% +2.56% +2.74%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c +0.27% +0.25% +0.29%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c +1.62% +1.58% +1.66%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c -0.47% -0.52% -0.41%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c +0.71% +0.68% +0.75%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c +0.98% +0.90% +1.06%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c +1.10% +1.09% +1.12%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c +3.45% +2.99% +4.06%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -3.74% -3.78% -3.68%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -2.43% -2.51% -2.35%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c +0.83% +0.79% +0.88%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c -0.73% -0.85% -0.62%
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c -1.27% -1.31% -1.24%
simd_batch_cosine_normalized_query/simd_batch/1024d_16c -0.73% -0.74% -0.71%
simd_batch_cosine_normalized_query/simd_batch/1024d_256c +7.50% +7.15% +7.81%
simd_batch_cosine_normalized_query/simd_batch/1024d_4c +0.04% +0.03% +0.05%
simd_batch_cosine_normalized_query/simd_batch/1024d_64c +0.17% +0.15% +0.18%
simd_batch_cosine_normalized_query/simd_batch/384d_1000c +0.57% +0.54% +0.60%
simd_batch_cosine_normalized_query/simd_batch/384d_16c -0.34% -0.36% -0.33%
simd_batch_cosine_normalized_query/simd_batch/384d_256c -0.06% -0.08% -0.04%
simd_batch_cosine_normalized_query/simd_batch/384d_4c -0.12% -0.13% -0.11%
simd_batch_cosine_normalized_query/simd_batch/384d_64c +0.63% +0.62% +0.64%
simd_batch_cosine_normalized_query/simd_batch/768d_1000c +1.84% +1.79% +1.88%
simd_batch_cosine_normalized_query/simd_batch/768d_16c -0.38% -0.39% -0.37%
simd_batch_cosine_normalized_query/simd_batch/768d_256c +1.90% +1.73% +2.10%
simd_batch_cosine_normalized_query/simd_batch/768d_4c -0.12% -0.13% -0.10%
simd_batch_cosine_normalized_query/simd_batch/768d_64c +0.53% +0.52% +0.54%
simd_batch_dot_product/scalar_loop/10 +0.05% +0.05% +0.06%
simd_batch_dot_product/scalar_loop/100 -0.26% -0.32% -0.21%
simd_batch_dot_product/scalar_loop/1000 -0.75% -0.82% -0.68%
simd_batch_dot_product/simd_batch/10 -0.23% -0.27% -0.18%
simd_batch_dot_product/simd_batch/100 -0.44% -0.45% -0.42%
simd_batch_dot_product/simd_batch/1000 -14.56% -14.61% -14.50%
simd_cosine_similarity/scalar/1024 -0.00% -0.03% +0.03%
simd_cosine_similarity/scalar/1536 +0.04% +0.03% +0.05%
simd_cosine_similarity/scalar/384 -0.26% -0.32% -0.21%
simd_cosine_similarity/scalar/768 -0.08% -0.10% -0.06%
simd_cosine_similarity/simd/1024 -0.31% -0.33% -0.29%
simd_cosine_similarity/simd/1536 -0.11% -0.13% -0.09%
simd_cosine_similarity/simd/384 +1.35% +1.19% +1.52%
simd_cosine_similarity/simd/768 +0.02% -0.01% +0.06%
simd_dot_product/scalar/1024 +0.01% -0.03% +0.06%
simd_dot_product/scalar/1536 -0.00% -0.01% +0.01%
simd_dot_product/scalar/384 -0.00% -0.02% +0.01%
simd_dot_product/scalar/768 +0.02% +0.00% +0.03%
simd_dot_product/simd/1024 -4.12% -4.19% -4.04%
simd_dot_product/simd/1536 -0.11% -0.17% -0.04%
simd_dot_product/simd/384 +0.00% -0.05% +0.05%
simd_dot_product/simd/768 -0.32% -0.39% -0.25%
simd_euclidean_distance/scalar/1024 +0.02% -0.01% +0.04%
simd_euclidean_distance/scalar/1536 -0.01% -0.03% +0.00%
simd_euclidean_distance/scalar/384 -0.25% -0.29% -0.21%
simd_euclidean_distance/scalar/768 +0.01% -0.04% +0.06%
simd_euclidean_distance/simd/1024 -0.17% -0.20% -0.14%
simd_euclidean_distance/simd/1536 +0.26% +0.25% +0.27%
simd_euclidean_distance/simd/384 +0.89% +0.83% +0.93%
simd_euclidean_distance/simd/768 +0.72% +0.64% +0.77%
simd_normalize/scalar/1024 -0.34% -0.54% -0.15%
simd_normalize/scalar/1536 -0.36% -0.54% -0.16%
simd_normalize/scalar/384 -0.17% -0.56% +0.21%
simd_normalize/scalar/768 -0.48% -0.69% -0.27%
simd_normalize/simd/1024 +0.21% -0.70% +1.15%
simd_normalize/simd/1536 +1.44% +0.60% +2.25%
simd_normalize/simd/384 -0.19% -1.60% +1.31%
simd_normalize/simd/768 +0.94% -0.26% +2.13%
simd_normalized_cosine_fast_path/cosine_full/1024 +0.52% +0.48% +0.55%
simd_normalized_cosine_fast_path/cosine_full/384 +0.32% +0.20% +0.44%
simd_normalized_cosine_fast_path/cosine_full/768 +0.13% +0.07% +0.19%
simd_normalized_cosine_fast_path/dot_product/1024 -1.76% -1.91% -1.61%
simd_normalized_cosine_fast_path/dot_product/384 -1.94% -2.07% -1.78%
simd_normalized_cosine_fast_path/dot_product/768 -1.92% -2.00% -1.84%
simd_prepared_query_normalized_cosine/dot_product_loop/1024 +2.08% +1.97% +2.17%
simd_prepared_query_normalized_cosine/dot_product_loop/384 +1.13% +1.05% +1.21%
simd_prepared_query_normalized_cosine/dot_product_loop/768 +2.53% +2.44% +2.61%
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 +0.16% +0.11% +0.22%
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 +0.74% +0.67% +0.80%
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 +0.10% +0.05% +0.16%
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 +1.72% +1.63% +1.83%
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 +0.86% +0.67% +1.04%
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 +1.43% +1.34% +1.51%
simd_query_batch_dot_product/pair_loop/128d_16c -0.51% -0.55% -0.47%
simd_query_batch_dot_product/pair_loop/128d_256c +0.23% +0.20% +0.25%
simd_query_batch_dot_product/pair_loop/128d_4c -0.26% -0.34% -0.18%
simd_query_batch_dot_product/pair_loop/128d_64c +1.60% +1.53% +1.67%
simd_query_batch_dot_product/pair_loop/384d_16c -0.68% -0.75% -0.61%
simd_query_batch_dot_product/pair_loop/384d_256c +0.34% +0.32% +0.37%
simd_query_batch_dot_product/pair_loop/384d_4c +1.78% +1.65% +1.91%
simd_query_batch_dot_product/pair_loop/384d_64c +1.99% +1.98% +2.01%
simd_query_batch_dot_product/pair_loop/768d_16c -1.39% -1.42% -1.36%
simd_query_batch_dot_product/pair_loop/768d_256c +9.85% +9.73% +9.97%
simd_query_batch_dot_product/pair_loop/768d_4c +0.44% +0.41% +0.48%
simd_query_batch_dot_product/pair_loop/768d_64c -0.66% -0.79% -0.54%
simd_query_batch_dot_product/simd_batch/128d_16c -0.30% -0.34% -0.26%
simd_query_batch_dot_product/simd_batch/128d_256c -0.77% -0.86% -0.68%
simd_query_batch_dot_product/simd_batch/128d_4c +0.02% -0.02% +0.06%
simd_query_batch_dot_product/simd_batch/128d_64c +3.92% +3.88% +3.97%
simd_query_batch_dot_product/simd_batch/384d_16c +0.50% +0.48% +0.52%
simd_query_batch_dot_product/simd_batch/384d_256c +0.69% +0.61% +0.77%
simd_query_batch_dot_product/simd_batch/384d_4c +0.05% +0.02% +0.08%
simd_query_batch_dot_product/simd_batch/384d_64c +2.25% +2.16% +2.35%
simd_query_batch_dot_product/simd_batch/768d_16c +5.97% +5.95% +5.98%
simd_query_batch_dot_product/simd_batch/768d_256c +9.18% +9.06% +9.30%
simd_query_batch_dot_product/simd_batch/768d_4c +0.09% +0.07% +0.11%
simd_query_batch_dot_product/simd_batch/768d_64c +1.58% +1.51% +1.65%
simd_squared_euclidean_fast_path/euclidean_full/1024 +0.06% +0.03% +0.08%
simd_squared_euclidean_fast_path/euclidean_full/384 +0.82% +0.77% +0.87%
simd_squared_euclidean_fast_path/euclidean_full/768 +0.29% +0.26% +0.31%
simd_squared_euclidean_fast_path/squared_euclidean/1024 +0.23% +0.20% +0.25%
simd_squared_euclidean_fast_path/squared_euclidean/384 -0.17% -0.22% -0.13%
simd_squared_euclidean_fast_path/squared_euclidean/768 +0.05% +0.03% +0.08%
simd_throughput_384/cosine_similarity +0.27% +0.19% +0.35%
simd_throughput_384/dot_product -4.75% -4.91% -4.60%
simd_throughput_384/euclidean_distance +1.28% +1.25% +1.31%
simd_throughput_384/normalize -1.31% -1.32% -1.29%
softmax_attention/128 -0.06% -0.08% -0.05%
softmax_attention/512 +0.85% +0.73% +0.99%
tier_prepared_query/binary_query_once_1000 -0.09% -0.11% -0.06%
tier_prepared_query/binary_query_per_call_1000 +0.00% -0.01% +0.01%
tier_prepared_query/int4_query_once_1000 -0.11% -0.14% -0.08%
tier_prepared_query/int4_query_per_call_1000 -0.13% -0.14% -0.12%
tier_prepared_query/int8_query_once_1000 +0.19% +0.16% +0.21%
tier_prepared_query/int8_query_per_call_1000 +0.04% +0.03% +0.06%

Rule: CI-lower of change ≤3.0% passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override via PR label bench-allow-regression.
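
The rule above can be read as a three-way classification on the lower bound of the 95% CI. A minimal Python sketch of that reading follows. The function name `verdict` and the `PASS` label are illustrative assumptions. Neither the `bench-allow-regression` override nor the 🚀 WIN classification is modeled, because the rule does not spell out how they are applied.

```python
WARN_THRESHOLD_PCT = 3.0  # from the rule: CI-lower <= 3.0% passes silently
FAIL_THRESHOLD_PCT = 7.0  # from the rule: CI-lower > 7.0% fails


def verdict(ci_lower_pct: float) -> str:
    """Classify a change by the lower bound of its 95% CI, in percent."""
    if ci_lower_pct <= WARN_THRESHOLD_PCT:
        return "PASS"  # hypothetical label; such rows pass silently
    if ci_lower_pct <= FAIL_THRESHOLD_PCT:
        return "WARN"
    return "FAIL"


# CI-lower values copied from three aarch64 rows in the report.
print(verdict(9.73))  # simd_query_batch_dot_product/pair_loop/768d_256c -> FAIL
print(verdict(5.95))  # simd_query_batch_dot_product/simd_batch/768d_16c -> WARN
print(verdict(2.99))  # ...normalized_query/pair_loop_dot/768d_1000c -> PASS
```

The third row has a point estimate of +3.45% yet is not flagged, because only the CI lower bound is compared against the thresholds.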

x86_64-linux — perf regression report

❌ 64 FAIL (regression >7.0% confirmed by 95% CI)
⚠ 8 WARN (regression 3.0-7.0% confirmed)
🚀 141 confirmed improvement

Bench Δ point 95% CI new ns base ns verdict
int8_raw_dot_product/dot_product_i8/768 +43.67% [+43.41%, +43.94%] 30.2 21.0 ❌ FAIL
simd_throughput_384/cosine_similarity +43.30% [+42.68%, +43.84%] 44.6 31.1 ❌ FAIL
int8_vs_float32_cosine/int8/768 +39.18% [+38.44%, +39.72%] 33.5 24.0 ❌ FAIL
simd_throughput_384/euclidean_distance +38.77% [+38.27%, +39.30%] 35.3 25.4 ❌ FAIL
int8_prepared_dot_product/prepared/768 +37.41% [+36.57%, +38.05%] 29.7 21.6 ❌ FAIL
simd_query_batch_dot_product/pair_loop/384d_16c +35.09% [+34.75%, +35.41%] 480.1 355.4 ❌ FAIL
simd_query_batch_dot_product/pair_loop/384d_4c +34.70% [+34.50%, +35.02%] 128.8 95.6 ❌ FAIL
simd_throughput_384/dot_product +32.84% [+32.53%, +33.17%] 28.7 21.6 ❌ FAIL
int8_raw_dot_product/dot_product_i8_raw/768 +29.99% [+29.42%, +30.38%] 24.4 18.8 ❌ FAIL
int4_cosine_distance/float32_simd/1024 +27.24% [+26.70%, +27.82%] 87.3 68.6 ❌ FAIL
layer_norm/4096 +26.77% [+26.09%, +27.25%] 874.0 689.5 ❌ FAIL
simd_normalized_cosine_fast_path/cosine_full/384 +26.57% [+25.75%, +27.29%] 44.2 34.9 ❌ FAIL
simd_batch_cosine/simd_batch/10 +25.92% [+25.30%, +26.51%] 429.1 340.7 ❌ FAIL
binary_cosine_distance/float32_simd/384 +25.83% [+25.01%, +26.56%] 46.4 36.9 ❌ FAIL
int4_cosine_distance/float32_simd/1536 +24.81% [+24.29%, +25.34%] 119.8 96.0 ❌ FAIL
simd_euclidean_distance/simd/1024 +24.73% [+24.24%, +25.22%] 79.6 63.8 ❌ FAIL
simd_squared_euclidean_fast_path/euclidean_full/384 +24.63% [+24.03%, +25.22%] 35.2 28.3 ❌ FAIL
int8_raw_dot_product/dot_product_i8_raw/1024 +23.90% [+23.50%, +24.31%] 30.8 24.9 ❌ FAIL
simd_squared_euclidean_fast_path/euclidean_full/1024 +23.53% [+23.05%, +24.04%] 79.7 64.6 ❌ FAIL
int8_batch_cosine/float32_simd/10 +23.85% [+22.99%, +24.59%] 438.4 353.9 ❌ FAIL
int8_vs_float32_cosine/int8/384 +23.28% [+22.87%, +23.69%] 18.2 14.8 ❌ FAIL
int8_batch_cosine/int8_loop/1000 +23.33% [+22.46%, +24.20%] 20409.0 16548.2 ❌ FAIL
int8_vs_float32_cosine/float32_simd/1536 +22.95% [+22.21%, +23.64%] 118.6 96.5 ❌ FAIL
simd_batch_cosine_normalized_query/simd_batch/384d_16c +22.62% [+21.87%, +23.29%] 676.8 552.0 ❌ FAIL
int8_prepared_dot_product/prepared/1024 +22.08% [+21.84%, +22.34%] 34.7 28.5 ❌ FAIL
int4_cosine_distance/float32_simd/384 +21.29% [+20.76%, +21.83%] 46.6 38.4 ❌ FAIL
simd_normalized_cosine_fast_path/cosine_full/1024 +21.26% [+20.76%, +21.76%] 82.0 67.6 ❌ FAIL
int8_vs_float32_cosine/int8/1024 +21.12% [+20.45%, +21.59%] 38.1 31.4 ❌ FAIL
simd_query_batch_dot_product/simd_batch/384d_16c +20.40% [+19.77%, +20.94%] 261.3 217.0 ❌ FAIL
simd_query_batch_dot_product/simd_batch/384d_4c +19.54% [+19.39%, +19.68%] 75.6 63.3 ❌ FAIL
int8_vs_float32_cosine/int8/1536 +19.46% [+18.56%, +20.14%] 49.5 41.5 ❌ FAIL
binary_cosine_distance/binary/384 +18.31% [+18.00%, +18.58%] 49.6 41.9 ❌ FAIL
simd_cosine_similarity/simd/1536 +18.55% [+17.79%, +19.24%] 112.4 94.8 ❌ FAIL
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c +17.96% [+17.44%, +18.49%] 659.1 558.7 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c +16.79% [+16.26%, +17.35%] 695.5 595.5 ❌ FAIL
simd_euclidean_distance/simd/768 +16.38% [+15.83%, +16.92%] 54.6 46.9 ❌ FAIL
simd_dot_product/simd/1536 +16.08% [+15.73%, +16.35%] 94.4 81.3 ❌ FAIL
int8_batch_cosine/int8_loop/10 +16.00% [+15.64%, +16.40%] 177.7 153.2 ❌ FAIL
memory_size/search_1000_int8 +15.95% [+15.26%, +16.48%] 16717.3 14417.7 ❌ FAIL
simd_squared_euclidean_fast_path/squared_euclidean/1024 +15.44% [+14.96%, +15.93%] 75.0 65.0 ❌ FAIL
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c +15.13% [+14.69%, +15.59%] 680.9 591.4 ❌ FAIL
simd_dot_product/simd/1024 +14.40% [+14.08%, +14.74%] 64.1 56.0 ❌ FAIL
int8_raw_dot_product/dot_product_i8_raw/384 +14.40% [+13.84%, +14.97%] 13.0 11.3 ❌ FAIL
int8_vs_float32_cosine/float32_simd/384 +14.61% [+13.77%, +15.39%] 44.9 39.1 ❌ FAIL
int8_raw_dot_product/dot_product_i8/384 +13.58% [+13.14%, +14.01%] 15.4 13.6 ❌ FAIL
simd_batch_cosine_normalized_query/simd_batch/384d_4c +13.42% [+12.98%, +13.86%] 176.8 155.9 ❌ FAIL
simd_normalized_cosine_fast_path/dot_product/384 +12.97% [+12.78%, +13.11%] 28.6 25.3 ❌ FAIL
int8_batch_cosine/int8_loop/100 +13.09% [+12.48%, +13.56%] 1799.9 1591.6 ❌ FAIL
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c +12.20% [+11.83%, +12.58%] 169.7 151.3 ❌ FAIL
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c +12.33% [+11.70%, +12.98%] 183.6 163.4 ❌ FAIL
int8_prepared_dot_product/prepared/384 +11.93% [+11.38%, +12.66%] 15.8 14.1 ❌ FAIL
tier_prepared_query/int8_query_once_1000 +11.46% [+10.67%, +11.98%] 18584.2 16674.0 ❌ FAIL
tier_prepared_query/binary_query_once_1000 +10.82% [+10.53%, +11.19%] 48538.3 43801.2 ❌ FAIL
binary_cosine_distance/binary/768 +10.66% [+10.40%, +10.98%] 87.1 78.7 ❌ FAIL
int8_raw_dot_product/dot_product_i8/1024 +10.69% [+10.15%, +11.05%] 34.7 31.3 ❌ FAIL
simd_squared_euclidean_fast_path/euclidean_full/768 +10.21% [+9.80%, +10.63%] 54.5 49.5 ❌ FAIL
simd_query_batch_dot_product/simd_batch/768d_4c +10.12% [+9.40%, +10.69%] 131.7 119.6 ❌ FAIL
simd_normalized_cosine_fast_path/dot_product/1024 +9.60% [+9.32%, +9.91%] 62.7 57.2 ❌ FAIL
int8_raw_dot_product/dot_product_i8_raw/128 +9.26% [+8.56%, +9.79%] 7.2 6.6 ❌ FAIL
binary_cosine_distance/binary/1024 +8.52% [+8.23%, +8.90%] 111.9 103.1 ❌ FAIL
simd_euclidean_distance/simd/384 +8.42% [+7.83%, +8.93%] 35.3 32.6 ❌ FAIL
simd_query_batch_dot_product/simd_batch/128d_64c +8.11% [+7.51%, +8.88%] 501.6 464.0 ❌ FAIL
simd_cosine_similarity/simd/384 +7.71% [+7.14%, +8.31%] 45.1 41.8 ❌ FAIL
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c +7.55% [+7.12%, +7.96%] 179.2 166.6 ❌ FAIL
binary_cosine_distance/binary/1536 +6.53% [+6.27%, +6.74%] 161.8 151.9 ⚠ WARN
int8_prepared_dot_product/prepared/129 +6.64% [+6.10%, +7.03%] 11.0 10.3 ⚠ WARN
simd_squared_euclidean_fast_path/squared_euclidean/384 +6.44% [+5.95%, +6.90%] 30.3 28.5 ⚠ WARN
simd_query_batch_dot_product/simd_batch/128d_4c +6.38% [+5.95%, +6.78%] 42.2 39.6 ⚠ WARN
int8_raw_dot_product/dot_product_i8/128 +5.06% [+4.83%, +5.31%] 9.8 9.3 ⚠ WARN
int8_raw_dot_product/dot_product_i8/129 +5.52% [+4.80%, +6.19%] 10.4 9.8 ⚠ WARN
int8_raw_dot_product/dot_product_i8_raw/129 +4.45% [+4.14%, +4.74%] 7.6 7.2 ⚠ WARN
int8_prepared_dot_product/prepared/128 +4.90% [+4.07%, +5.53%] 10.0 9.5 ⚠ WARN
int8_vs_float32_cosine/float32_simd/768 -3.27% [-3.71%, -2.80%] 67.2 69.4 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c -3.51% [-3.89%, -3.10%] 2630.8 2726.3 🚀 WIN
simd_query_batch_dot_product/simd_batch/384d_256c -3.84% [-4.06%, -3.55%] 5376.3 5590.7 🚀 WIN
simd_cosine_similarity/simd/1024 -4.44% [-4.77%, -4.11%] 82.1 85.9 🚀 WIN
silu_inplace/896 -4.19% [-4.77%, -3.53%] 2711.4 2829.8 🚀 WIN
tier_prepared_query/binary_query_per_call_1000 -4.58% [-4.80%, -4.37%] 883183.9 925593.6 🚀 WIN
int8_vs_float32_cosine/float32_simd/1024 -5.49% [-5.86%, -5.10%] 82.6 87.4 🚀 WIN
add_bias_gelu/896 -6.20% [-6.59%, -5.82%] 378.0 403.0 🚀 WIN
binary_cosine_distance/float32_simd/1536 -6.25% [-6.68%, -5.87%] 119.9 127.9 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c -6.70% [-7.08%, -6.32%] 262.3 281.1 🚀 WIN
silu_inplace/4096 -5.64% [-7.25%, -4.31%] 12444.1 13188.4 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_16c -7.97% [-8.29%, -7.62%] 554.0 602.0 🚀 WIN
simd_dot_product/simd/384 -8.36% [-8.50%, -8.21%] 28.7 31.3 🚀 WIN
simd_throughput_384/normalize -8.57% [-9.00%, -8.20%] 106.0 115.9 🚀 WIN
int8_batch_cosine/float32_simd/100 -8.89% [-9.23%, -8.55%] 4387.7 4815.8 🚀 WIN
simd_normalize/simd/384 -6.12% [-9.55%, -2.61%] 70.9 75.6 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c -9.36% [-9.62%, -9.09%] 270.2 298.0 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_4c -9.88% [-9.95%, -9.81%] 195.4 216.8 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c -10.35% [-10.86%, -9.84%] 333.2 371.7 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c -10.63% [-10.96%, -10.27%] 324.7 363.3 🚀 WIN
int8_prepared_dot_product/prepared/127 -10.71% [-11.00%, -10.52%] 17.5 19.6 🚀 WIN
simd_dot_product/simd/768 -11.02% [-11.18%, -10.82%] 49.5 55.6 🚀 WIN
int8_raw_dot_product/dot_product_i8/127 -11.45% [-11.92%, -11.01%] 17.2 19.4 🚀 WIN
simd_dot_product/scalar/384 -12.08% [-12.20%, -11.96%] 337.9 384.3 🚀 WIN
simd_batch_cosine/simd_batch/100 -12.07% [-12.67%, -11.55%] 4371.2 4971.3 🚀 WIN
simd_normalized_cosine_fast_path/dot_product/768 -12.97% [-13.17%, -12.75%] 49.4 56.8 🚀 WIN
simd_query_batch_dot_product/simd_batch/128d_256c -12.89% [-13.47%, -12.10%] 2038.6 2340.3 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_4c -13.28% [-13.53%, -13.00%] 323.7 373.3 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_16c -13.22% [-13.53%, -12.90%] 1076.9 1240.9 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_256c -13.40% [-13.69%, -13.13%] 18127.9 20932.2 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 -12.75% [-13.79%, -11.91%] 45318.5 51938.8 🚀 WIN
simd_cosine_similarity/scalar/384 -13.83% [-14.10%, -13.54%] 996.7 1156.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c -13.92% [-14.31%, -13.54%] 1089.8 1266.1 🚀 WIN
simd_batch_dot_product/scalar_loop/100 -14.25% [-14.41%, -14.07%] 33063.9 38560.1 🚀 WIN
memory_size/search_1000_float32 -14.21% [-14.51%, -13.90%] 43508.7 50716.0 🚀 WIN
simd_batch_cosine/scalar_loop/10 -14.47% [-14.57%, -14.39%] 9882.7 11554.3 🚀 WIN
simd_batch_dot_product/scalar_loop/10 -14.54% [-14.59%, -14.48%] 3293.3 3853.6 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c -14.32% [-14.62%, -14.05%] 42002.8 49025.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c -14.52% [-14.84%, -14.22%] 332.3 388.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -14.92% [-15.22%, -14.57%] 18178.7 21365.4 🚀 WIN
simd_euclidean_distance/scalar/384 -15.10% [-15.32%, -14.86%] 344.4 405.6 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c -15.28% [-15.55%, -15.02%] 43635.3 51507.5 🚀 WIN
simd_batch_cosine/scalar_loop/1000 -15.53% [-15.67%, -15.44%] 990843.3 1173058.3 🚀 WIN
simd_dot_product/scalar/768 -15.59% [-15.77%, -15.32%] 697.6 826.4 🚀 WIN
simd_euclidean_distance/simd/1536 -15.29% [-15.80%, -14.82%] 95.6 112.9 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c -15.49% [-15.83%, -15.17%] 44257.8 52370.4 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -15.56% [-15.97%, -15.09%] 17404.6 20612.3 🚀 WIN
simd_batch_cosine/scalar_loop/100 -15.25% [-16.11%, -14.56%] 98761.0 116538.8 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_64c -16.22% [-16.51%, -15.92%] 4276.5 5104.5 🚀 WIN
simd_cosine_similarity/scalar/768 -16.34% [-16.51%, -16.24%] 2078.7 2484.8 🚀 WIN
simd_query_batch_dot_product/pair_loop/128d_256c -16.41% [-16.59%, -16.18%] 3586.3 4290.4 🚀 WIN
tier_prepared_query/int4_query_per_call_1000 -16.48% [-16.70%, -16.26%] 3618804.6 4332724.7 🚀 WIN
simd_batch_dot_product/scalar_loop/1000 -16.80% [-16.99%, -16.66%] 337678.9 405845.5 🚀 WIN
simd_euclidean_distance/scalar/768 -17.08% [-17.13%, -17.00%] 703.0 847.8 🚀 WIN
simd_dot_product/scalar/1024 -17.15% [-17.28%, -17.07%] 934.8 1128.4 🚀 WIN
simd_cosine_similarity/scalar/1024 -17.31% [-17.46%, -17.21%] 2796.4 3381.7 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -17.14% [-17.57%, -16.78%] 17590.6 21230.2 🚀 WIN
simd_normalize/scalar/1536 -17.29% [-17.59%, -17.08%] 1574.0 1903.0 🚀 WIN
simd_normalize/scalar/1024 -17.15% [-17.72%, -16.70%] 1057.1 1276.0 🚀 WIN
simd_normalize/scalar/768 -17.60% [-17.92%, -17.32%] 788.2 956.5 🚀 WIN
simd_cosine_similarity/scalar/1536 -17.85% [-17.94%, -17.79%] 4230.4 5149.7 🚀 WIN
simd_normalize/scalar/384 -17.76% [-18.04%, -17.50%] 396.5 482.2 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c -17.97% [-18.08%, -17.88%] 7906.1 9637.9 🚀 WIN
simd_euclidean_distance/scalar/1024 -17.90% [-18.13%, -17.77%] 941.3 1146.6 🚀 WIN
simd_euclidean_distance/scalar/1536 -18.20% [-18.30%, -18.13%] 1420.3 1736.3 🚀 WIN
simd_dot_product/scalar/1536 -18.00% [-18.49%, -17.67%] 1413.9 1724.4 🚀 WIN
int8_prepared_dot_product/per_call/129 -17.43% [-18.52%, -16.05%] 755.2 914.5 🚀 WIN
int8_prepared_dot_product/per_call/128 -18.39% [-18.65%, -18.14%] 741.5 908.6 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_1000c -18.46% [-18.72%, -18.24%] 43349.7 53164.1 🚀 WIN
int8_prepared_dot_product/per_call/127 -18.35% [-18.79%, -17.74%] 742.4 909.2 🚀 WIN
int8_quantization/quantize/1024 -18.48% [-18.83%, -18.23%] 5874.1 7205.9 🚀 WIN
tier_prepared_query/int8_query_per_call_1000 -18.75% [-18.83%, -18.69%] 2202059.3 2710291.2 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -18.70% [-18.83%, -18.51%] 15062.9 18528.5 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c -18.63% [-18.86%, -18.40%] 1048.4 1288.5 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c -18.61% [-18.89%, -18.34%] 4333.7 5324.6 🚀 WIN
int8_quantization/quantize/768 -18.69% [-19.00%, -18.41%] 4390.7 5399.9 🚀 WIN
simd_query_batch_dot_product/simd_batch/384d_64c -19.04% [-19.33%, -18.81%] 1127.7 1392.9 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c -19.26% [-19.62%, -18.93%] 4126.0 5109.9 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -19.86% [-19.93%, -19.78%] 868.6 1083.8 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c -19.65% [-19.98%, -19.31%] 22202.7 27632.4 🚀 WIN
int8_quantization/quantize/384 -19.80% [-20.44%, -19.26%] 2158.2 2691.1 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -20.54% [-20.90%, -20.18%] 21598.5 27182.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c -20.70% [-20.96%, -20.41%] 1911.6 2410.6 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c -21.21% [-21.53%, -20.87%] 1065.5 1352.3 🚀 WIN
int8_prepared_dot_product/per_call/768 -21.33% [-22.08%, -20.41%] 4287.3 5449.9 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c -22.23% [-22.46%, -22.00%] 4224.7 5432.2 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_64c -22.41% [-22.52%, -22.23%] 2152.1 2773.7 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_256c -22.68% [-22.93%, -22.41%] 21710.3 28077.7 🚀 WIN
int8_prepared_dot_product/per_call/384 -22.25% [-22.94%, -21.59%] 2105.5 2708.1 🚀 WIN
int8_prepared_dot_product/per_call/1024 -22.36% [-23.02%, -21.65%] 5607.8 7223.2 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c -22.87% [-23.13%, -22.59%] 5181.9 6718.4 🚀 WIN
gelu/4096 -15.76% [-23.70%, -9.00%] 1641.5 1948.6 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/127 -23.33% [-23.73%, -22.99%] 13.6 17.7 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c -23.53% [-23.84%, -23.19%] 1300.4 1700.6 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c -24.04% [-24.27%, -23.79%] 21947.2 28894.7 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_16c -24.31% [-24.56%, -24.07%] 1308.1 1728.3 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c -24.44% [-24.71%, -24.12%] 18862.0 24963.9 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c -24.46% [-24.72%, -24.07%] 3480.1 4607.2 🚀 WIN
elementwise_mul/4096 -22.65% [-24.72%, -21.08%] 245.2 317.0 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c -25.03% [-25.29%, -24.79%] 5253.9 7008.0 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c -25.49% [-25.81%, -25.16%] 1317.0 1767.6 🚀 WIN
simd_batch_dot_product/simd_batch/100 -25.82% [-25.95%, -25.66%] 3337.8 4499.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c -25.76% [-25.97%, -25.51%] 1322.4 1781.3 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_64c -25.62% [-26.03%, -25.25%] 5137.3 6906.8 🚀 WIN
int8_quantization/quantize/1536 -25.82% [-26.06%, -25.58%] 8051.8 10855.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c -25.84% [-26.10%, -25.52%] 235.8 317.9 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c -26.24% [-26.54%, -25.93%] 5221.4 7079.1 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_256c -29.56% [-29.69%, -29.42%] 12981.7 18429.3 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_16c -29.61% [-29.72%, -29.47%] 762.0 1082.5 🚀 WIN
simd_normalize/simd/768 -28.56% [-30.91%, -25.96%] 123.6 172.9 🚀 WIN
softmax_attention/128 -30.60% [-30.92%, -30.18%] 4144.6 5971.8 🚀 WIN
simd_normalize/simd/1024 -28.21% [-30.97%, -25.26%] 160.1 223.1 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 -33.93% [-34.12%, -33.75%] 72282.8 109404.3 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -33.99% [-34.21%, -33.78%] 72706.6 110146.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c -34.08% [-34.70%, -33.16%] 32237.3 48905.1 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_64c -34.86% [-34.98%, -34.70%] 2973.8 4565.4 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -35.11% [-35.27%, -34.94%] 71072.6 109521.4 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 -35.53% [-35.78%, -35.23%] 90228.0 139960.5 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c -35.82% [-36.02%, -35.62%] 980.2 1527.4 🚀 WIN
simd_prepared_query_normalized_cosine/dot_product_loop/384 -36.00% [-36.13%, -35.84%] 30082.5 47004.4 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 -37.34% [-37.47%, -37.23%] 28590.9 45626.2 🚀 WIN
simd_normalize/simd/1536 -35.68% [-37.67%, -33.63%] 213.4 331.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c -38.05% [-38.16%, -37.89%] 3800.9 6135.2 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c -38.54% [-38.70%, -38.37%] 69900.0 113729.7 🚀 WIN
softmax_attention/512 -38.50% [-38.95%, -38.21%] 60835.8 98919.8 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c -39.09% [-39.27%, -38.93%] 68453.2 112390.4 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c -39.75% [-39.92%, -39.58%] 86544.2 143631.9 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c -40.51% [-40.64%, -40.38%] 85159.0 143136.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c -40.70% [-40.81%, -40.59%] 60556.6 102119.2 🚀 WIN
simd_batch_cosine/simd_batch/1000 -45.09% [-45.30%, -44.85%] 56613.5 103101.3 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 -46.73% [-46.83%, -46.61%] 54533.0 102362.5 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c -46.80% [-47.05%, -46.44%] 86601.8 162789.4 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c -47.60% [-47.71%, -47.50%] 85201.9 162612.4 🚀 WIN
simd_prepared_query_normalized_cosine/dot_product_loop/768 -48.83% [-48.96%, -48.66%] 51297.2 100239.1 🚀 WIN
int8_batch_cosine/float32_simd/1000 -50.11% [-50.28%, -49.92%] 52928.5 106097.9 🚀 WIN
simd_prepared_query_normalized_cosine/dot_product_loop/1024 -52.05% [-52.21%, -51.89%] 65455.4 136498.7 🚀 WIN
simd_batch_dot_product/simd_batch/1000 -54.08% [-54.21%, -53.93%] 45092.7 98205.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c -54.42% [-54.84%, -54.02%] 65814.1 144397.5 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 -55.38% [-55.48%, -55.21%] 62917.5 140993.5 🚀 WIN
rms_norm/896 -97.45% [-97.48%, -97.42%] 206.4 8103.3 🚀 WIN
rms_norm/4096 -97.73% [-97.81%, -97.63%] 768.9 33940.0 🚀 WIN
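
The Δ column in the rows above appears to be the relative change between the two trailing numeric columns. The sketch below checks that reading against three rows, assuming the first number is the PR measurement and the second is the baseline. That column order is inferred from the values rather than stated in this excerpt, and the helper name `relative_change_pct` is hypothetical.

```python
def relative_change_pct(new: float, base: float) -> float:
    """Relative change of `new` against `base`, in percent (negative = faster)."""
    return (new - base) / base * 100.0


# (new, base) pairs copied from the WIN rows above; units are not stated there.
rows = {
    "rms_norm/896": (206.4, 8103.3),
    "rms_norm/4096": (768.9, 33940.0),
    "simd_batch_dot_product/simd_batch/1000": (45092.7, 98205.1),
}

for name, (new, base) in rows.items():
    print(f"{name}: {relative_change_pct(new, base):+.2f}%")
```

For these three rows the computed values match the reported Δ (-97.45%, -97.73%, -54.08%). Other rows can differ in the last digit, since the displayed measurements are rounded.
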

All 247 measurements

Bench Δ point CI-lower CI-upper
add_bias_gelu/4096 -0.82% -1.02% -0.67%
add_bias_gelu/896 -6.20% -6.59% -5.82%
binary_cosine_distance/binary/1024 +8.52% +8.23% +8.90%
binary_cosine_distance/binary/1536 +6.53% +6.27% +6.74%
binary_cosine_distance/binary/384 +18.31% +18.00% +18.58%
binary_cosine_distance/binary/768 +10.66% +10.40% +10.98%
binary_cosine_distance/float32_simd/1024 -2.56% -2.88% -2.21%
binary_cosine_distance/float32_simd/1536 -6.25% -6.68% -5.87%
binary_cosine_distance/float32_simd/384 +25.83% +25.01% +26.56%
binary_cosine_distance/float32_simd/768 +0.41% +0.02% +0.82%
elementwise_mul/4096 -22.65% -24.72% -21.08%
gelu/4096 -15.76% -23.70% -9.00%
gelu/896 -1.27% -3.70% +0.57%
int4_cosine_distance/float32_simd/1024 +27.24% +26.70% +27.82%
int4_cosine_distance/float32_simd/1536 +24.81% +24.29% +25.34%
int4_cosine_distance/float32_simd/384 +21.29% +20.76% +21.83%
int4_cosine_distance/float32_simd/768 +0.37% -0.11% +0.85%
int4_cosine_distance/int4/1024 +0.62% -0.08% +1.26%
int4_cosine_distance/int4/1536 +0.12% -0.00% +0.26%
int4_cosine_distance/int4/384 -2.15% -2.89% -1.43%
int4_cosine_distance/int4/768 -1.82% -2.64% -1.05%
int8_batch_cosine/float32_simd/10 +23.85% +22.99% +24.59%
int8_batch_cosine/float32_simd/100 -8.89% -9.23% -8.55%
int8_batch_cosine/float32_simd/1000 -50.11% -50.28% -49.92%
int8_batch_cosine/int8_loop/10 +16.00% +15.64% +16.40%
int8_batch_cosine/int8_loop/100 +13.09% +12.48% +13.56%
int8_batch_cosine/int8_loop/1000 +23.33% +22.46% +24.20%
int8_prepared_dot_product/per_call/1024 -22.36% -23.02% -21.65%
int8_prepared_dot_product/per_call/127 -18.35% -18.79% -17.74%
int8_prepared_dot_product/per_call/128 -18.39% -18.65% -18.14%
int8_prepared_dot_product/per_call/129 -17.43% -18.52% -16.05%
int8_prepared_dot_product/per_call/384 -22.25% -22.94% -21.59%
int8_prepared_dot_product/per_call/768 -21.33% -22.08% -20.41%
int8_prepared_dot_product/prepared/1024 +22.08% +21.84% +22.34%
int8_prepared_dot_product/prepared/127 -10.71% -11.00% -10.52%
int8_prepared_dot_product/prepared/128 +4.90% +4.07% +5.53%
int8_prepared_dot_product/prepared/129 +6.64% +6.10% +7.03%
int8_prepared_dot_product/prepared/384 +11.93% +11.38% +12.66%
int8_prepared_dot_product/prepared/768 +37.41% +36.57% +38.05%
int8_quantization/quantize/1024 -18.48% -18.83% -18.23%
int8_quantization/quantize/1536 -25.82% -26.06% -25.58%
int8_quantization/quantize/384 -19.80% -20.44% -19.26%
int8_quantization/quantize/768 -18.69% -19.00% -18.41%
int8_raw_dot_product/dot_product_i8/1024 +10.69% +10.15% +11.05%
int8_raw_dot_product/dot_product_i8/127 -11.45% -11.92% -11.01%
int8_raw_dot_product/dot_product_i8/128 +5.06% +4.83% +5.31%
int8_raw_dot_product/dot_product_i8/129 +5.52% +4.80% +6.19%
int8_raw_dot_product/dot_product_i8/384 +13.58% +13.14% +14.01%
int8_raw_dot_product/dot_product_i8/768 +43.67% +43.41% +43.94%
int8_raw_dot_product/dot_product_i8_raw/1024 +23.90% +23.50% +24.31%
int8_raw_dot_product/dot_product_i8_raw/127 -23.33% -23.73% -22.99%
int8_raw_dot_product/dot_product_i8_raw/128 +9.26% +8.56% +9.79%
int8_raw_dot_product/dot_product_i8_raw/129 +4.45% +4.14% +4.74%
int8_raw_dot_product/dot_product_i8_raw/384 +14.40% +13.84% +14.97%
int8_raw_dot_product/dot_product_i8_raw/768 +29.99% +29.42% +30.38%
int8_vs_float32_cosine/float32_simd/1024 -5.49% -5.86% -5.10%
int8_vs_float32_cosine/float32_simd/1536 +22.95% +22.21% +23.64%
int8_vs_float32_cosine/float32_simd/384 +14.61% +13.77% +15.39%
int8_vs_float32_cosine/float32_simd/768 -3.27% -3.71% -2.80%
int8_vs_float32_cosine/int8/1024 +21.12% +20.45% +21.59%
int8_vs_float32_cosine/int8/1536 +19.46% +18.56% +20.14%
int8_vs_float32_cosine/int8/384 +23.28% +22.87% +23.69%
int8_vs_float32_cosine/int8/768 +39.18% +38.44% +39.72%
layer_norm/4096 +26.77% +26.09% +27.25%
layer_norm/896 +0.13% -0.31% +0.50%
memory_size/search_1000_float32 -14.21% -14.51% -13.90%
memory_size/search_1000_int8 +15.95% +15.26% +16.48%
rms_norm/4096 -97.73% -97.81% -97.63%
rms_norm/896 -97.45% -97.48% -97.42%
silu_inplace/4096 -5.64% -7.25% -4.31%
silu_inplace/896 -4.19% -4.77% -3.53%
simd_batch_cosine/scalar_loop/10 -14.47% -14.57% -14.39%
simd_batch_cosine/scalar_loop/100 -15.25% -16.11% -14.56%
simd_batch_cosine/scalar_loop/1000 -15.53% -15.67% -15.44%
simd_batch_cosine/simd_batch/10 +25.92% +25.30% +26.51%
simd_batch_cosine/simd_batch/100 -12.07% -12.67% -11.55%
simd_batch_cosine/simd_batch/1000 -45.09% -45.30% -44.85%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c -39.75% -39.92% -39.58%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c -25.49% -25.81% -25.16%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c -19.65% -19.98% -19.31%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c -10.35% -10.86% -9.84%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c -25.03% -25.29% -24.79%
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c -15.28% -15.55% -15.02%
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c +15.13% +14.69% +15.59%
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c +1.59% +1.16% +2.07%
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c +7.55% +7.12% +7.96%
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c -1.96% -2.32% -1.59%
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c -38.54% -38.70% -38.37%
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c -21.21% -21.53% -20.87%
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -17.14% -17.57% -16.78%
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c -9.36% -9.62% -9.09%
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c -22.23% -22.46% -22.00%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c -40.51% -40.64% -40.38%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c -23.53% -23.84% -23.19%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -20.54% -20.90% -20.18%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c -10.63% -10.96% -10.27%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c -22.87% -23.13% -22.59%
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c -14.32% -14.62% -14.05%
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c +17.96% +17.44% +18.49%
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -0.84% -1.20% -0.48%
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c +12.20% +11.83% +12.58%
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c -3.51% -3.89% -3.10%
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c -39.09% -39.27% -38.93%
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c -18.63% -18.86% -18.40%
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -15.56% -15.97% -15.09%
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c -6.70% -7.08% -6.32%
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c -19.26% -19.62% -18.93%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c -46.80% -47.05% -46.44%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c -25.76% -25.97% -25.51%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c -24.04% -24.27% -23.79%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c -14.52% -14.84% -14.22%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c -26.24% -26.54% -25.93%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c -15.49% -15.83% -15.17%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c +16.79% +16.26% +17.35%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c +3.11% +2.63% +3.57%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c +12.33% +11.70% +12.98%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c -0.02% -1.08% +1.31%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -33.99% -34.21% -33.78%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c -13.92% -14.31% -13.54%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -14.92% -15.22% -14.57%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c -2.76% -3.16% -2.40%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c -18.61% -18.89% -18.34%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c -54.42% -54.84% -54.02%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c -35.82% -36.02% -35.62%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c -24.44% -24.71% -24.12%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c -25.84% -26.10% -25.52%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c -38.05% -38.16% -37.89%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c -34.08% -34.70% -33.16%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c -2.89% -3.19% -2.54%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c -17.97% -18.08% -17.88%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c +2.64% +2.41% +2.78%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c -20.70% -20.96% -20.41%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c -40.70% -40.81% -40.59%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -19.86% -19.93% -19.78%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -18.70% -18.83% -18.51%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c +0.12% -0.10% +0.31%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c -24.46% -24.72% -24.07%
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c -47.60% -47.71% -47.50%
simd_batch_cosine_normalized_query/simd_batch/1024d_16c -24.31% -24.56% -24.07%
simd_batch_cosine_normalized_query/simd_batch/1024d_256c -22.68% -22.93% -22.41%
simd_batch_cosine_normalized_query/simd_batch/1024d_4c -13.28% -13.53% -13.00%
simd_batch_cosine_normalized_query/simd_batch/1024d_64c -25.62% -26.03% -25.25%
simd_batch_cosine_normalized_query/simd_batch/384d_1000c -18.46% -18.72% -18.24%
simd_batch_cosine_normalized_query/simd_batch/384d_16c +22.62% +21.87% +23.29%
simd_batch_cosine_normalized_query/simd_batch/384d_256c +0.16% -0.23% +0.59%
simd_batch_cosine_normalized_query/simd_batch/384d_4c +13.42% +12.98% +13.86%
simd_batch_cosine_normalized_query/simd_batch/384d_64c -2.74% -3.10% -2.38%
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -35.11% -35.27% -34.94%
simd_batch_cosine_normalized_query/simd_batch/768d_16c -13.22% -13.53% -12.90%
simd_batch_cosine_normalized_query/simd_batch/768d_256c -13.40% -13.69% -13.13%
simd_batch_cosine_normalized_query/simd_batch/768d_4c +3.05% +2.37% +3.67%
simd_batch_cosine_normalized_query/simd_batch/768d_64c -16.22% -16.51% -15.92%
simd_batch_dot_product/scalar_loop/10 -14.54% -14.59% -14.48%
simd_batch_dot_product/scalar_loop/100 -14.25% -14.41% -14.07%
simd_batch_dot_product/scalar_loop/1000 -16.80% -16.99% -16.66%
simd_batch_dot_product/simd_batch/10 -0.04% -0.71% +0.50%
simd_batch_dot_product/simd_batch/100 -25.82% -25.95% -25.66%
simd_batch_dot_product/simd_batch/1000 -54.08% -54.21% -53.93%
simd_cosine_similarity/scalar/1024 -17.31% -17.46% -17.21%
simd_cosine_similarity/scalar/1536 -17.85% -17.94% -17.79%
simd_cosine_similarity/scalar/384 -13.83% -14.10% -13.54%
simd_cosine_similarity/scalar/768 -16.34% -16.51% -16.24%
simd_cosine_similarity/simd/1024 -4.44% -4.77% -4.11%
simd_cosine_similarity/simd/1536 +18.55% +17.79% +19.24%
simd_cosine_similarity/simd/384 +7.71% +7.14% +8.31%
simd_cosine_similarity/simd/768 +3.02% +2.62% +3.42%
simd_dot_product/scalar/1024 -17.15% -17.28% -17.07%
simd_dot_product/scalar/1536 -18.00% -18.49% -17.67%
simd_dot_product/scalar/384 -12.08% -12.20% -11.96%
simd_dot_product/scalar/768 -15.59% -15.77% -15.32%
simd_dot_product/simd/1024 +14.40% +14.08% +14.74%
simd_dot_product/simd/1536 +16.08% +15.73% +16.35%
simd_dot_product/simd/384 -8.36% -8.50% -8.21%
simd_dot_product/simd/768 -11.02% -11.18% -10.82%
simd_euclidean_distance/scalar/1024 -17.90% -18.13% -17.77%
simd_euclidean_distance/scalar/1536 -18.20% -18.30% -18.13%
simd_euclidean_distance/scalar/384 -15.10% -15.32% -14.86%
simd_euclidean_distance/scalar/768 -17.08% -17.13% -17.00%
simd_euclidean_distance/simd/1024 +24.73% +24.24% +25.22%
simd_euclidean_distance/simd/1536 -15.29% -15.80% -14.82%
simd_euclidean_distance/simd/384 +8.42% +7.83% +8.93%
simd_euclidean_distance/simd/768 +16.38% +15.83% +16.92%
simd_normalize/scalar/1024 -17.15% -17.72% -16.70%
simd_normalize/scalar/1536 -17.29% -17.59% -17.08%
simd_normalize/scalar/384 -17.76% -18.04% -17.50%
simd_normalize/scalar/768 -17.60% -17.92% -17.32%
simd_normalize/simd/1024 -28.21% -30.97% -25.26%
simd_normalize/simd/1536 -35.68% -37.67% -33.63%
simd_normalize/simd/384 -6.12% -9.55% -2.61%
simd_normalize/simd/768 -28.56% -30.91% -25.96%
simd_normalized_cosine_fast_path/cosine_full/1024 +21.26% +20.76% +21.76%
simd_normalized_cosine_fast_path/cosine_full/384 +26.57% +25.75% +27.29%
simd_normalized_cosine_fast_path/cosine_full/768 +0.17% -0.27% +0.62%
simd_normalized_cosine_fast_path/dot_product/1024 +9.60% +9.32% +9.91%
simd_normalized_cosine_fast_path/dot_product/384 +12.97% +12.78% +13.11%
simd_normalized_cosine_fast_path/dot_product/768 -12.97% -13.17% -12.75%
simd_prepared_query_normalized_cosine/dot_product_loop/1024 -52.05% -52.21% -51.89%
simd_prepared_query_normalized_cosine/dot_product_loop/384 -36.00% -36.13% -35.84%
simd_prepared_query_normalized_cosine/dot_product_loop/768 -48.83% -48.96% -48.66%
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 -35.53% -35.78% -35.23%
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 -12.75% -13.79% -11.91%
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 -33.93% -34.12% -33.75%
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 -55.38% -55.48% -55.21%
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 -37.34% -37.47% -37.23%
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 -46.73% -46.83% -46.61%
simd_query_batch_dot_product/pair_loop/128d_16c +1.89% +1.71% +2.09%
simd_query_batch_dot_product/pair_loop/128d_256c -16.41% -16.59% -16.18%
simd_query_batch_dot_product/pair_loop/128d_4c +0.53% +0.05% +1.10%
simd_query_batch_dot_product/pair_loop/128d_64c +1.20% +0.99% +1.37%
simd_query_batch_dot_product/pair_loop/384d_16c +35.09% +34.75% +35.41%
simd_query_batch_dot_product/pair_loop/384d_256c +3.35% +2.96% +3.80%
simd_query_batch_dot_product/pair_loop/384d_4c +34.70% +34.50% +35.02%
simd_query_batch_dot_product/pair_loop/384d_64c -0.71% -0.87% -0.53%
simd_query_batch_dot_product/pair_loop/768d_16c -29.61% -29.72% -29.47%
simd_query_batch_dot_product/pair_loop/768d_256c -29.56% -29.69% -29.42%
simd_query_batch_dot_product/pair_loop/768d_4c -9.88% -9.95% -9.81%
simd_query_batch_dot_product/pair_loop/768d_64c -34.86% -34.98% -34.70%
simd_query_batch_dot_product/simd_batch/128d_16c +2.92% +2.65% +3.23%
simd_query_batch_dot_product/simd_batch/128d_256c -12.89% -13.47% -12.10%
simd_query_batch_dot_product/simd_batch/128d_4c +6.38% +5.95% +6.78%
simd_query_batch_dot_product/simd_batch/128d_64c +8.11% +7.51% +8.88%
simd_query_batch_dot_product/simd_batch/384d_16c +20.40% +19.77% +20.94%
simd_query_batch_dot_product/simd_batch/384d_256c -3.84% -4.06% -3.55%
simd_query_batch_dot_product/simd_batch/384d_4c +19.54% +19.39% +19.68%
simd_query_batch_dot_product/simd_batch/384d_64c -19.04% -19.33% -18.81%
simd_query_batch_dot_product/simd_batch/768d_16c -7.97% -8.29% -7.62%
simd_query_batch_dot_product/simd_batch/768d_256c +2.41% +2.04% +2.70%
simd_query_batch_dot_product/simd_batch/768d_4c +10.12% +9.40% +10.69%
simd_query_batch_dot_product/simd_batch/768d_64c -22.41% -22.52% -22.23%
simd_squared_euclidean_fast_path/euclidean_full/1024 +23.53% +23.05% +24.04%
simd_squared_euclidean_fast_path/euclidean_full/384 +24.63% +24.03% +25.22%
simd_squared_euclidean_fast_path/euclidean_full/768 +10.21% +9.80% +10.63%
simd_squared_euclidean_fast_path/squared_euclidean/1024 +15.44% +14.96% +15.93%
simd_squared_euclidean_fast_path/squared_euclidean/384 +6.44% +5.95% +6.90%
simd_squared_euclidean_fast_path/squared_euclidean/768 +0.13% -0.28% +0.51%
simd_throughput_384/cosine_similarity +43.30% +42.68% +43.84%
simd_throughput_384/dot_product +32.84% +32.53% +33.17%
simd_throughput_384/euclidean_distance +38.77% +38.27% +39.30%
simd_throughput_384/normalize -8.57% -9.00% -8.20%
softmax_attention/128 -30.60% -30.92% -30.18%
softmax_attention/512 -38.50% -38.95% -38.21%
tier_prepared_query/binary_query_once_1000 +10.82% +10.53% +11.19%
tier_prepared_query/binary_query_per_call_1000 -4.58% -4.80% -4.37%
tier_prepared_query/int4_query_once_1000 +1.06% +0.20% +1.92%
tier_prepared_query/int4_query_per_call_1000 -16.48% -16.70% -16.26%
tier_prepared_query/int8_query_once_1000 +11.46% +10.67% +11.98%
tier_prepared_query/int8_query_per_call_1000 -18.75% -18.83% -18.69%

Rule: a benchmark whose CI-lower of change is ≤3.0% passes silently; one in (3.0%, 7.0%] warns; one above 7.0% fails. Override via the PR label `bench-allow-regression`.
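
As a rough illustration of the rule above, the following sketch maps a CI-lower value (in percent) to a verdict. The names `Verdict` and `classify` are hypothetical and not taken from the actual gate. The real check may apply further conditions, such as the label override or per-platform filtering, which are not modelled here.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


WARN_THRESHOLD = 3.0  # percent, from the rule text
FAIL_THRESHOLD = 7.0  # percent, from the rule text


def classify(ci_lower_pct: float) -> Verdict:
    """Classify a benchmark by the lower CI bound of its change, in percent."""
    if ci_lower_pct <= WARN_THRESHOLD:
        return Verdict.PASS  # includes all improvements (negative values)
    if ci_lower_pct <= FAIL_THRESHOLD:
        return Verdict.WARN  # the half-open band (3.0%, 7.0%]
    return Verdict.FAIL


# CI-lower values copied from rows of the "All 247 measurements" table.
samples = {
    "layer_norm/896": -0.31,
    "int8_prepared_dot_product/prepared/128": 4.07,
    "layer_norm/4096": 26.09,
}

for name, ci_lower in samples.items():
    print(f"{name}: {classify(ci_lower).value}")
```

Keying the verdict on the lower bound rather than the point estimate means a noisy benchmark with a wide interval is less likely to be flagged than one whose whole interval sits above the threshold.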

Gate is in advisory mode (Rollout step 3, ADR-058 §Rollout). Failures do not block merge for the first 7 days.
