Add BLAS benchmark: OpenBLAS vs Apple Accelerate on macOS arm64 (#4995) #4995

Closed
alibeklfc wants to merge 2 commits into facebookresearch:main from alibeklfc:export-D98331330

Conversation

@alibeklfc
Contributor

@alibeklfc alibeklfc commented Mar 26, 2026

Summary:

Add a GitHub Actions workflow and Python benchmark script to compare
OpenBLAS vs Apple Accelerate BLAS performance on macOS arm64 (Apple Silicon).

This supports the pip wheel packaging decision in D95258115: the pip wheel
uses Accelerate on macOS while conda uses OpenBLAS. This benchmark quantifies
the performance difference.

Benchmark workloads (BLAS-heavy):

  • IndexFlatL2 / IndexFlatIP search (pure sgemm)
  • IVF training (k-means)
  • PQ training (codebook learning)
  • HNSW search (control, not BLAS-bound)
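The "pure sgemm" label on the flat searches can be made concrete with the standard expansion of the squared L2 distance. This is a sketch of the usual argument, not a statement about the exact code path taken for every batch size; the symbols $X$ (queries, $n_q \times d$), $Y$ (database vectors, $n_b \times d$), $r_X$ and $r_Y$ (vectors of squared row norms) are introduced here for illustration.

```latex
\begin{aligned}
\|x_i - y_j\|^2 &= \|x_i\|^2 - 2\, x_i^\top y_j + \|y_j\|^2, \\
D &= r_X \mathbf{1}^\top - 2\, X Y^\top + \mathbf{1}\, r_Y^\top, \\
S &= X Y^\top \quad \text{(inner-product search)}.
\end{aligned}
```

The norm terms cost $O((n_q + n_b)\,d)$, while the product $X Y^\top$ costs $O(n_q\, n_b\, d)$, so for batched queries the single-precision matrix multiply dominates and the choice of BLAS library should show up directly in the search time.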

Workflow (bench-blas-macos.yml):

  • Manual trigger only (workflow_dispatch)
  • Runs on macos-14 (M1 Apple Silicon)
  • Builds faiss-cpu twice via pip install . with different BLA_VENDOR
  • Uses FAISS_OPT_LEVEL=generic to speed up builds (SIMD irrelevant for BLAS)
  • Prints comparison table with median times and speedup ratios
  • Uploads JSON results as artifacts
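The reporting steps above (median times, speedup ratios, JSON output) can be sketched with the standard library alone. This is a minimal illustration, not the contents of `benchs/bench_blas_macos.py`: the function names, the `openblas`/`accelerate` labels, and the stand-in workloads are all assumptions made for the example.

```python
import json
import statistics
import time


def median_time(fn, repeats=5, warmup=1):
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation before timing
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(results_a, results_b, name_a="openblas", name_b="accelerate"):
    """Join two {workload: seconds} dicts; speedup > 1 means name_b is faster."""
    rows = []
    for workload in sorted(results_a.keys() & results_b.keys()):
        t_a, t_b = results_a[workload], results_b[workload]
        rows.append(
            {"workload": workload, name_a: t_a, name_b: t_b, "speedup": t_a / t_b}
        )
    return rows


def format_table(rows, name_a="openblas", name_b="accelerate"):
    """Render the comparison rows as a fixed-width text table."""
    header = (
        f"{'workload':<20}{name_a + ' (s)':>16}{name_b + ' (s)':>18}{'speedup':>10}"
    )
    lines = [header, "-" * len(header)]
    for r in rows:
        lines.append(
            f"{r['workload']:<20}{r[name_a]:>16.4f}{r[name_b]:>18.4f}"
            f"{r['speedup']:>9.2f}x"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in workloads (hypothetical); a real run would time Faiss calls
    # once per build and load the two result sets from separate JSON files.
    run_a = {"stand_in": median_time(lambda: sum(range(200_000)))}
    run_b = {"stand_in": median_time(lambda: sum(range(100_000)))}
    rows = compare(run_a, run_b)
    print(format_table(rows))
    print(json.dumps(rows, indent=2))
```

Using the median rather than the mean keeps a single slow iteration on a shared CI runner from skewing the ratio, which is presumably why the workflow reports medians.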

New files:

  • benchs/bench_blas_macos.py: benchmark script
  • .github/workflows/bench-blas-macos.yml: CI workflow

Differential Revision: D98331330

@meta-cla meta-cla bot added the CLA Signed label Mar 26, 2026
@meta-codesync
Contributor

meta-codesync bot commented Mar 26, 2026

@alibeklfc has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98331330.

@alibeklfc alibeklfc marked this pull request as draft March 26, 2026 17:59
@meta-codesync meta-codesync bot changed the title from "Add BLAS benchmark: OpenBLAS vs Apple Accelerate on macOS arm64" to "Add BLAS benchmark: OpenBLAS vs Apple Accelerate on macOS arm64 (#4995)" Mar 26, 2026
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)

@alibeklfc alibeklfc force-pushed the export-D98331330 branch 2 times, most recently from 9aa2ffa to 6ecdac5 Compare March 26, 2026 18:10
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)

@alibeklfc alibeklfc force-pushed the export-D98331330 branch 2 times, most recently from 8031be5 to 804e783 Compare March 26, 2026 19:03
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 27, 2026
…bookresearch#4995)

@alibeklfc alibeklfc force-pushed the export-D98331330 branch 2 times, most recently from 3d89756 to 650623c Compare March 27, 2026 23:56
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 27, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 28, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)

@alibeklfc alibeklfc force-pushed the export-D98331330 branch 2 times, most recently from f5e98cb to 2fa7155 Compare April 2, 2026 20:19
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)

…kresearch#4862)

Summary:

Add official pip wheel packaging for `faiss-cpu`, enabling
`pip install faiss-cpu` from PyPI. Builds binary wheels across Linux
x86_64/aarch64, macOS arm64/x86_64, and Windows x86_64 for Python 3.10-3.14.
Linux and macOS use Python Stable ABI (abi3) — a single cp310-abi3 wheel
per platform works on all Python 3.10+ versions without per-version rebuilds.

**BLAS strategy:**
- Linux x86_64: Intel MKL (`BLA_VENDOR=Intel10_64lp` to avoid the `mkl_rt`
  dlopen chain that auditwheel cannot detect). MKL kernel libraries
  (`libmkl_def`, `libmkl_avx2`, `libmkl_avx512`) are manually bundled via
  `misc/repair_wheel_mkl.sh` with DT_NEEDED rewriting to match
  auditwheel-hashed names.
- Linux aarch64: Threaded OpenBLAS (`libopenblaso`).
- macOS: Apple Accelerate (found automatically).
- Windows: Prebuilt OpenBLAS from GitHub releases.

**SIMD optimization strategy:**
- Linux and macOS: Dynamic Dispatch (`FAISS_OPT_LEVEL=dd`) compiles SIMD-hot
  kernel files at multiple ISA levels (generic + AVX2 + AVX-512 on x86_64,
  NEON + SVE on aarch64) into a single library. Runtime CPUID detection
  selects the optimal code path automatically.
- Windows: MSVC does not support DD, so the build uses `FAISS_OPT_LEVEL=generic`.
  Shared libraries (`BUILD_SHARED_LIBS=ON`) are used to avoid MSVC's
  single-pass static archive scanning issue.

**New files:**
- `pyproject.toml`: scikit-build-core build backend config with cibuildwheel
  settings. Version is dynamically extracted from CMakeLists.txt. Enables
  abi3 on Linux/macOS via SWIG 4.2+ and NumPy 2.0+ Limited API support.
  Enables LTO and dead-code stripping for smaller wheel sizes.
- `.github/workflows/build-pip.yml`: CI workflow using cibuildwheel to build
  binary wheels on 5 platform runners. Publishes wheels to PyPI via OIDC
  trusted publishers on tag push.
- `misc/install_mkl.sh`: Installs Intel oneAPI MKL in the manylinux
  container and registers library paths with ldconfig.
- `misc/repair_wheel_mkl.sh`: Custom wheel repair that bundles MKL kernel
  libraries (dlopen deps invisible to auditwheel) and rewrites their
  DT_NEEDED entries to reference auditwheel-hashed library names.
- `THIRD_PARTY_NOTICES`: License notices for bundled dependencies (OpenBLAS,
  LLVM OpenMP, Intel oneMKL).
- `tests/test_wheel_smoke.py`: 12-test smoke suite validating import,
  OpenMP, BLAS (FlatL2/FlatIP), index factory (IVF+PQ), HNSW, serialization
  roundtrip, GC safety, contrib imports, and SIMD level detection.

**Modified files:**
- `faiss/CMakeLists.txt`: Added threaded OpenBLAS fallback — prefers
  `libopenblaso` (OpenMP-threaded) or `libopenblasp` (pthreads-threaded)
  over the serial default on RHEL/Fedora-based manylinux containers.
- `python/CMakeLists.txt`: Added `install()` targets for scikit-build-core
  wheel packaging, including all Dynamic Dispatch SWIG variants
  (avx2/avx512/avx512_spr/sve). Added abi3 support: conditionally uses
  `Python::SABIModule`, defines `Py_LIMITED_API`, and sets `.abi3.so` suffix
  when `SKBUILD_SABI_VERSION` is set.
- `.github/workflows/build.yml`: Wired `build-pip.yml` into the root CI
  trigger so pip builds run alongside conda builds.
- `tests/BUCK`: Added `python_pytest` target for the smoke test.

Differential Revision: D95258115
…bookresearch#4995)
