Add BLAS benchmark: OpenBLAS vs Apple Accelerate on macOS arm64 (#4995) #4995
Closed
alibeklfc wants to merge 2 commits into facebookresearch:main from
Contributor
@alibeklfc has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98331330.
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026

…bookresearch#4995)

Summary:
Add a GitHub Actions workflow and Python benchmark script to compare OpenBLAS vs Apple Accelerate BLAS performance on macOS arm64 (Apple Silicon).

This supports the pip wheel packaging decision in D95258115: the pip wheel uses Accelerate on macOS while conda uses OpenBLAS. This benchmark quantifies the performance difference.

**Benchmark workloads (BLAS-heavy):**
- IndexFlatL2 / IndexFlatIP search (pure sgemm)
- IVF training (k-means)
- PQ training (codebook learning)
- HNSW search (control, not BLAS-bound)

**Workflow (`bench-blas-macos.yml`):**
- Manual trigger only (`workflow_dispatch`)
- Runs on `macos-14` (M1 Apple Silicon)
- Builds faiss-cpu twice via `pip install .` with different `BLA_VENDOR`
- Uses `FAISS_OPT_LEVEL=generic` to speed up builds (SIMD irrelevant for BLAS)
- Prints comparison table with median times and speedup ratios
- Uploads JSON results as artifacts

**New files:**
- `benchs/bench_blas_macos.py`: benchmark script
- `.github/workflows/bench-blas-macos.yml`: CI workflow

Differential Revision: D98331330
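The commit message says the workloads are summarized by their median times. A minimal sketch of how such a timing harness could look is given here; the helper names `time_median` and `bench_flat_l2`, the warm-up pass, and the dataset sizes are assumptions made for illustration and are not taken from `benchs/bench_blas_macos.py`.

```python
import statistics
import time


def time_median(fn, repeats=5, warmup=1):
    """Return the median wall-clock time, in seconds, of calling fn()."""
    for _ in range(warmup):
        fn()  # untimed calls to warm caches and lazy initialization
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def bench_flat_l2(d=128, nb=100_000, nq=1_000, k=10, seed=0):
    """Time a batched IndexFlatL2 search (needs numpy and faiss installed).

    The sizes are illustrative defaults, not the ones used in the PR.
    """
    import numpy as np
    import faiss

    rng = np.random.default_rng(seed)
    xb = rng.random((nb, d), dtype=np.float32)  # database vectors
    xq = rng.random((nq, d), dtype=np.float32)  # query vectors
    index = faiss.IndexFlatL2(d)
    index.add(xb)
    return time_median(lambda: index.search(xq, k))
```

Running `bench_flat_l2()` once in an environment built against each BLAS vendor would give two medians to compare. The median is used rather than the mean so that a single slow run on a shared CI runner does not skew the result.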
9aa2ffa to
6ecdac5
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)
8031be5 to
804e783
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)
804e783 to
6108d16
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 26, 2026
…bookresearch#4995)
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 27, 2026
…bookresearch#4995)
3d89756 to
650623c
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 27, 2026
…bookresearch#4995)
650623c to
804c819
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Mar 28, 2026
…bookresearch#4995)
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)
804c819 to
8f0e770
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)
f5e98cb to
2fa7155
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)
2fa7155 to
8809e4b
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)
8809e4b to
2f21e19
Compare
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Apr 2, 2026
…bookresearch#4995)
2f21e19 to
e67b629
Compare
…kresearch#4862)

Summary:
Add official pip wheel packaging for `faiss-cpu`, enabling `pip install faiss-cpu` from PyPI. Builds binary wheels across Linux x86_64/aarch64, macOS arm64/x86_64, and Windows x86_64 for Python 3.10-3.14. Linux and macOS use Python Stable ABI (abi3): a single cp310-abi3 wheel per platform works on all Python 3.10+ versions without per-version rebuilds.

**BLAS strategy:**
- Linux x86_64: Intel MKL (`BLA_VENDOR=Intel10_64lp` to avoid the `mkl_rt` dlopen chain that auditwheel cannot detect). MKL kernel libraries (`libmkl_def`, `libmkl_avx2`, `libmkl_avx512`) are manually bundled via `misc/repair_wheel_mkl.sh` with DT_NEEDED rewriting to match auditwheel-hashed names.
- Linux aarch64: Threaded OpenBLAS (`libopenblaso`).
- macOS: Apple Accelerate (found automatically).
- Windows: Prebuilt OpenBLAS from GitHub releases.

**SIMD optimization strategy:**
- Linux and macOS: Dynamic Dispatch (`FAISS_OPT_LEVEL=dd`) compiles SIMD-hot kernel files at multiple ISA levels (generic + AVX2 + AVX-512 on x86_64, NEON + SVE on aarch64) into a single library. Runtime CPUID detection selects the optimal code path automatically.
- Windows: MSVC does not support DD, so uses `FAISS_OPT_LEVEL=generic`. Shared libraries (`BUILD_SHARED_LIBS=ON`) are used to avoid MSVC's single-pass static archive scanning issue.

**New files:**
- `pyproject.toml`: scikit-build-core build backend config with cibuildwheel settings. Version is dynamically extracted from CMakeLists.txt. Enables abi3 on Linux/macOS via SWIG 4.2+ and NumPy 2.0+ Limited API support. Enables LTO and dead-code stripping for smaller wheel sizes.
- `.github/workflows/build-pip.yml`: CI workflow using cibuildwheel to build binary wheels on 5 platform runners. Publishes wheels to PyPI via OIDC trusted publishers on tag push.
- `misc/install_mkl.sh`: Installs Intel oneAPI MKL in the manylinux container and registers library paths with ldconfig.
- `misc/repair_wheel_mkl.sh`: Custom wheel repair that bundles MKL kernel libraries (dlopen deps invisible to auditwheel) and rewrites their DT_NEEDED entries to reference auditwheel-hashed library names.
- `THIRD_PARTY_NOTICES`: License notices for bundled dependencies (OpenBLAS, LLVM OpenMP, Intel oneMKL).
- `tests/test_wheel_smoke.py`: 12-test smoke suite validating import, OpenMP, BLAS (FlatL2/FlatIP), index factory (IVF+PQ), HNSW, serialization roundtrip, GC safety, contrib imports, and SIMD level detection.

**Modified files:**
- `faiss/CMakeLists.txt`: Added threaded OpenBLAS fallback: prefers `libopenblaso` (OpenMP-threaded) or `libopenblasp` (pthreads-threaded) over the serial default on RHEL/Fedora-based manylinux containers.
- `python/CMakeLists.txt`: Added `install()` targets for scikit-build-core wheel packaging, including all Dynamic Dispatch SWIG variants (avx2/avx512/avx512_spr/sve). Added abi3 support: conditionally uses `Python::SABIModule`, defines `Py_LIMITED_API`, and sets `.abi3.so` suffix when `SKBUILD_SABI_VERSION` is set.
- `.github/workflows/build.yml`: Wired `build-pip.yml` into the root CI trigger so pip builds run alongside conda builds.
- `tests/BUCK`: Added `python_pytest` target for the smoke test.

Differential Revision: D95258115
…bookresearch#4995)
e67b629 to
410b59e
Compare
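Summary:
Add a GitHub Actions workflow and Python benchmark script to compare OpenBLAS vs Apple Accelerate BLAS performance on macOS arm64 (Apple Silicon).

This supports the pip wheel packaging decision in D95258115: the pip wheel uses Accelerate on macOS while conda uses OpenBLAS. This benchmark quantifies the performance difference.

**Benchmark workloads (BLAS-heavy):**
- IndexFlatL2 / IndexFlatIP search (pure sgemm)
- IVF training (k-means)
- PQ training (codebook learning)
- HNSW search (control, not BLAS-bound)

**Workflow (`bench-blas-macos.yml`):**
- Manual trigger only (`workflow_dispatch`)
- Runs on `macos-14` (M1 Apple Silicon)
- Builds faiss-cpu twice via `pip install .` with different `BLA_VENDOR`
- Uses `FAISS_OPT_LEVEL=generic` to speed up builds (SIMD irrelevant for BLAS)
- Prints comparison table with median times and speedup ratios
- Uploads JSON results as artifacts

**New files:**
- `benchs/bench_blas_macos.py`: benchmark script
- `.github/workflows/bench-blas-macos.yml`: CI workflow

Differential Revision: D98331330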
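The comparison step, which prints median times and speedup ratios from the two JSON result files, could be sketched roughly as follows. The JSON layout (workload name mapped to a list of timings in seconds) and the function names `compare_results` and `format_table` are assumptions for illustration only; the actual format written by the benchmark script is not shown in this pull request.

```python
import json
import statistics


def compare_results(openblas_json, accelerate_json):
    """Return rows of (workload, OpenBLAS median, Accelerate median, speedup).

    Speedup is OpenBLAS time divided by Accelerate time, so a value above
    1.0 means Accelerate was faster on that workload. Only workloads
    present in both inputs are compared.
    """
    openblas = json.loads(openblas_json)
    accelerate = json.loads(accelerate_json)
    rows = []
    for name in sorted(openblas.keys() & accelerate.keys()):
        t_open = statistics.median(openblas[name])
        t_acc = statistics.median(accelerate[name])
        rows.append((name, t_open, t_acc, t_open / t_acc))
    return rows


def format_table(rows):
    """Render the comparison rows as a fixed-width text table."""
    header = (
        f"{'workload':<20}{'OpenBLAS (s)':>14}"
        f"{'Accelerate (s)':>16}{'speedup':>10}"
    )
    lines = [header]
    for name, t_open, t_acc, ratio in rows:
        lines.append(f"{name:<20}{t_open:>14.4f}{t_acc:>16.4f}{ratio:>9.2f}x")
    return "\n".join(lines)
```

Under these assumptions, the HNSW control workload should show a ratio near 1.0 if the measured differences really come from BLAS rather than from the build or the runner.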