Add xsimd::get<>() for optimized compile-time element extraction by DiamonDinoia · Pull Request #1294 · xtensor-stack/xsimd

DiamonDinoia · 2026-04-09T23:17:14Z

Add a free function API, xsimd::get<I>(batch), mirroring std::get<I>(tuple), for fast compile-time element extraction from SIMD batches.
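To illustrate the shape of such an API, here is a minimal sketch that does not use xsimd at all. `toy_batch` and the `sketch_api` namespace are hypothetical stand-ins invented for this example; only the idea of a free function templated on a compile-time index, in the style of `std::get<I>(tuple)`, comes from the description above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch_api
{
    // Hypothetical stand-in for a SIMD batch: N lanes of T in plain storage.
    template <class T, std::size_t N>
    struct toy_batch
    {
        std::array<T, N> lanes;
    };

    // Free function with a compile-time lane index, in the style of
    // std::get<I>(tuple). An out-of-range index fails to compile.
    template <std::size_t I, class T, std::size_t N>
    constexpr T get(toy_batch<T, N> const& b)
    {
        static_assert(I < N, "lane index out of range");
        return b.lanes[I];
    }

    // Small usage helper: reads lane 2 of a four-lane batch.
    inline float third_lane(toy_batch<float, 4> const& b)
    {
        return get<2>(b);
    }
}
```

Because the index is a template argument, a real implementation can select a dedicated instruction sequence per index and per architecture instead of going through a runtime lookup.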

Per-architecture optimized kernel::get overloads using the fastest available intrinsics:

- SSE2: shuffle/shift + scalar convert
- SSE4.1: pextrd/pextrq/pextrb/pextrw, bitcast + pextrd for float
- AVX: vextractf128/vextracti128 + SSE4.1 delegate
- AVX-512: vextracti64x4/vextractf32x4 + AVX delegate
- NEON: vgetq_lane_* (single instruction for all types)
- NEON64: vgetq_lane_f64

Also fixes a latent bug in the common fallback for complex batch compile-time get (wrong buffer type).
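As a rough illustration of the "shuffle + scalar convert" idea listed for SSE2, the sketch below extracts a 32-bit integer lane with `_mm_shuffle_epi32` followed by `_mm_cvtsi128_si32`. This is an assumption about what such a lowering could look like, not the code from this PR; the `sketch_sse2` namespace, `extract_i32`, and `lane_of` are hypothetical names, and a scalar fallback is included so the example also builds on non-x86 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

namespace sketch_sse2
{
#if SKETCH_HAS_SSE2
    // Lane I of four 32-bit integers: move lane I into position 0 (pshufd),
    // then copy the low 32 bits into a scalar register (movd).
    template <std::size_t I>
    inline std::int32_t extract_i32(__m128i v)
    {
        static_assert(I < 4, "lane index out of range");
        return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, static_cast<int>(I)));
    }
#endif

    // Portable entry point (hypothetical): loads four ints and returns lane I.
    template <std::size_t I>
    inline std::int32_t lane_of(const std::int32_t (&data)[4])
    {
        static_assert(I < 4, "lane index out of range");
#if SKETCH_HAS_SSE2
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data));
        return extract_i32<I>(v);
#else
        return data[I]; // scalar fallback, no SIMD involved
#endif
    }
}
```

The shuffle immediate must be a compile-time constant, which is one reason a template index fits this kind of kernel better than a runtime argument.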

DiamonDinoia · 2026-04-14T17:27:01Z

Nice, thanks for fixing CI!

This is ready for review. Once approved I will rewrite the history. I don't want to trigger a useless CI run.

serge-sans-paille · 2026-04-16T19:36:04Z

+    void check_get_all(batch_type const& res, std::index_sequence<Is...>) const
+    {
+        int dummy[] = { (check_get_element<Is>(res), 0)... };
+        (void)dummy;


you could check that loading the generated array ends up being equal to res, right?

serge-sans-paille

Please fix the testing so that we have a decent confidence in the getter when index != 0

DiamonDinoia · 2026-04-17T14:09:11Z

> Please fix the testing so that we have a decent confidence in the getter when index != 0

Yes, I will! I also noticed some small changes I should make. I just have not had time to get to this yet.

Introduces get<I>(batch) as a top-level API for extracting a single lane at a compile-time index. Falls back to the runtime get() when per-arch overloads aren't present.

Per-arch optimal lowerings:

- SSE2: pextrw / byte-shift+movd / swizzle+first by lane width.
- SSE4.1: pextrb/w/d/q; I==0 short-circuits to first().
- AVX: I==0 short-circuits to first(); else halve + SSE4.1 path.
- AVX-512F: I==0 short-circuits to first(); 32/64-bit lanes use valignd/valignq + first() (2 ops); 8/16-bit halve through AVX.
- NEON / NEON64 / RVV: native single-lane extract intrinsics.
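The "I==0 short-circuits to first(); else halve + narrower path" pattern described for AVX can be sketched in portable C++ without any intrinsics. Everything here is hypothetical scaffolding for illustration: `narrow` models a 4-lane register, `wide` models an 8-lane register as two halves, and `get_narrow`, `get_wide`, and `iota_wide` are names invented for this example rather than xsimd functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch_halve
{
    // Hypothetical 4-lane register.
    struct narrow { std::array<int, 4> lanes; };
    // Hypothetical 8-lane register, modelled as two 4-lane halves.
    struct wide { narrow lo; narrow hi; };

    // Cheapest extraction: lane 0 only.
    inline int first(narrow const& v) { return v.lanes[0]; }

    // Narrow path: I == 0 short-circuits to first(). The condition is a
    // compile-time constant, so the compiler can fold the branch away.
    template <std::size_t I>
    inline int get_narrow(narrow const& v)
    {
        static_assert(I < 4, "lane index out of range");
        return I == 0 ? first(v) : v.lanes[I];
    }

    // Wide path: pick the half that holds lane I, then delegate to the
    // narrow path with the index rebased into that half.
    template <std::size_t I>
    inline int get_wide(wide const& v)
    {
        static_assert(I < 8, "lane index out of range");
        return I < 4 ? get_narrow<I % 4>(v.lo) : get_narrow<I % 4>(v.hi);
    }

    // Helper that builds a wide value whose lane i holds the value i.
    inline wide iota_wide()
    {
        wide w;
        for (std::size_t i = 0; i < 4; ++i)
        {
            w.lo.lanes[i] = static_cast<int>(i);
            w.hi.lanes[i] = static_cast<int>(i + 4);
        }
        return w;
    }
}
```

Delegating to the narrower path keeps the amount of new per-architecture code small, which matches the stated goal of re-using existing APIs.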

DiamonDinoia · 2026-04-20T19:56:11Z

I like how it is now. I tried to minimize new code by re-using existing APIs. Tests check all values.

serge-sans-paille · 2026-04-24T07:55:09Z

+    template <size_t... Is>
+    void test_get_impl(batch_type const& res, std::index_sequence<Is...>) const
+    {
+        array_type extracted = { xsimd::get<Is>(res)... };


Exactly what I had in mind, thanks!
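The test idea discussed in this thread (expand `get<Is>` over an index sequence into an array, then compare against the original batch) can be sketched as follows. This is a self-contained approximation that assumes a hypothetical `toy_batch` type in place of a real xsimd batch; `roundtrip` and `roundtrip_impl` are names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

namespace sketch_test
{
    // Hypothetical stand-in for a SIMD batch.
    template <class T, std::size_t N>
    struct toy_batch
    {
        std::array<T, N> lanes;
    };

    // Compile-time indexed getter over the stand-in batch.
    template <std::size_t I, class T, std::size_t N>
    T get(toy_batch<T, N> const& b)
    {
        static_assert(I < N, "lane index out of range");
        return b.lanes[I];
    }

    // Extract every lane through get<Is>, rebuild a batch from the result,
    // and check that it equals the input. Every index is exercised, not
    // only index 0.
    template <class T, std::size_t N, std::size_t... Is>
    bool roundtrip_impl(toy_batch<T, N> const& res, std::index_sequence<Is...>)
    {
        std::array<T, N> extracted = { { get<Is>(res)... } };
        toy_batch<T, N> reloaded { extracted };
        return reloaded.lanes == res.lanes;
    }

    template <class T, std::size_t N>
    bool roundtrip(toy_batch<T, N> const& res)
    {
        return roundtrip_impl(res, std::make_index_sequence<N>{});
    }
}
```

Comparing the whole extracted array against the source catches an off-by-one or swapped lane at any index, which a check limited to a single element would miss.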

AntoinePrv · 2026-04-24T09:58:45Z

@serge-sans-paille @DiamonDinoia this PR was merged without being properly up to date with master, and it ended up failing on master.
I know the CI can be a bit slow, but let's try our best to keep PRs up to date and the CI 🟢

DiamonDinoia force-pushed the feat/optimize-elem-extraction branch 2 times, most recently from 0b6d85f to c6dd311 Compare April 14, 2026 14:38

DiamonDinoia marked this pull request as ready for review April 14, 2026 17:27

serge-sans-paille requested changes Apr 16, 2026

serge-sans-paille requested changes Apr 17, 2026

DiamonDinoia force-pushed the feat/optimize-elem-extraction branch 7 times, most recently from 5a371e7 to fd8c743 Compare April 20, 2026 18:42

DiamonDinoia force-pushed the feat/optimize-elem-extraction branch from fd8c743 to f30c5e0 Compare April 20, 2026 19:20

DiamonDinoia requested a review from serge-sans-paille April 20, 2026 19:55

serge-sans-paille reviewed Apr 24, 2026

serge-sans-paille merged commit dec12b8 into xtensor-stack:master Apr 24, 2026
74 checks passed

Add xsimd::get<>() for optimized compile-time element extraction #1294
serge-sans-paille merged 1 commit into xtensor-stack:master from DiamonDinoia:feat/optimize-elem-extraction

3 participants