Add xsimd::get<>() for optimized compile-time element extraction #1294

Merged
serge-sans-paille merged 1 commit into xtensor-stack:master from DiamonDinoia:feat/optimize-elem-extraction on Apr 24, 2026

@DiamonDinoia
Contributor

Add a free function API, xsimd::get<I>(batch), mirroring std::get<I>(tuple), for fast element extraction from SIMD batches at a compile-time index.
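
To make the std::get analogy concrete, here is a minimal sketch built on a toy batch type. Everything in namespace `toy` (`toy::batch`, `toy::get`, `toy::same_shape_as_std_get`) is a hypothetical stand-in written for this illustration. It is not xsimd's implementation, which operates on xsimd::batch and dispatches to per-architecture kernels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>

namespace toy
{
    // Toy stand-in for a SIMD batch (hypothetical, NOT xsimd::batch):
    // a fixed number of lanes stored in a plain array.
    template <class T, std::size_t N>
    struct batch
    {
        std::array<T, N> lanes;
    };

    // Free function taking the lane index as a template argument,
    // in the same style as std::get<I>(tuple).
    template <std::size_t I, class T, std::size_t N>
    constexpr T get(batch<T, N> const& b) noexcept
    {
        static_assert(I < N, "lane index out of range");
        return b.lanes[I];
    }

    // Hypothetical helper: the two spellings side by side.
    inline bool same_shape_as_std_get()
    {
        batch<float, 4> b { { 1.f, 2.f, 3.f, 4.f } };
        std::tuple<float, float, float, float> t { 1.f, 2.f, 3.f, 4.f };
        return get<2>(b) == std::get<2>(t);
    }
}
```

Because the index is a template argument, an out-of-range index is rejected at compile time by the `static_assert` rather than at run time.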

Add per-architecture optimized kernel::get overloads using the fastest available intrinsics:

  • SSE2: shuffle/shift + scalar convert
  • SSE4.1: pextrd/pextrq/pextrb/pextrw, bitcast + pextrd for float
  • AVX: vextractf128/vextracti128 + SSE4.1 delegate
  • AVX-512: vextracti64x4/vextractf32x4 + AVX delegate
  • NEON: vgetq_lane_* (single instruction for all types)
  • NEON64: vgetq_lane_f64
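
As an illustration of the SSE2 entry above (shift + scalar convert), a 32-bit integer lane could be extracted roughly as follows. This is a sketch under assumptions, not the PR's kernel: the namespace `sse2_sketch` and the helpers `get_i32`, `get_lane` and `demo` are hypothetical names. Only the intrinsics `_mm_loadu_si128`, `_mm_srli_si128` and `_mm_cvtsi128_si32` are real. A scalar fallback is included so the sketch still compiles where `__SSE2__` is not defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

namespace sse2_sketch
{
#if defined(__SSE2__)
    // Hypothetical helper: extract 32-bit lane I using SSE2 only.
    template <std::size_t I>
    inline std::int32_t get_i32(__m128i v) noexcept
    {
        static_assert(I < 4, "lane index out of range");
        // Byte-shift the wanted lane down to position 0 (the shift amount
        // is a compile-time constant), then move the low 32 bits to a scalar.
        return _mm_cvtsi128_si32(_mm_srli_si128(v, static_cast<int>(I * 4)));
    }

    // Load four ints (unaligned) and extract lane I with intrinsics.
    template <std::size_t I>
    inline std::int32_t get_lane(const std::int32_t (&data)[4]) noexcept
    {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&data[0]));
        return get_i32<I>(v);
    }
#else
    // Portable fallback: plain indexed read when SSE2 is unavailable.
    template <std::size_t I>
    inline std::int32_t get_lane(const std::int32_t (&data)[4]) noexcept
    {
        static_assert(I < 4, "lane index out of range");
        return data[I];
    }
#endif

    // Small usage example.
    inline void demo()
    {
        const std::int32_t data[4] = { 7, -3, 42, 100 };
        assert(get_lane<0>(data) == 7);
        assert(get_lane<2>(data) == 42);
    }
}
```

On SSE4.1 the same extraction would typically be a single pextrd, which is why the PR provides separate overloads per instruction set.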

Also fixes a latent bug in the common fallback for complex batch compile-time get (wrong buffer type).

@DiamonDinoia force-pushed the feat/optimize-elem-extraction branch 2 times, most recently from 0b6d85f to c6dd311 (April 14, 2026 14:38)
@DiamonDinoia
Contributor Author

Nice, thanks for fixing CI!

This is ready for review. Once approved, I will rewrite the history. I don't want to trigger a useless CI run.

@DiamonDinoia DiamonDinoia marked this pull request as ready for review April 14, 2026 17:27
Comment thread test/test_batch_complex.cpp Outdated
void check_get_all(batch_type const& res, std::index_sequence<Is...>) const
{
    int dummy[] = { (check_get_element<Is>(res), 0)... };
    (void)dummy;
Contributor

@serge-sans-paille serge-sans-paille Apr 16, 2026

You could check that loading the generated array ends up being equal to res, right?
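
The round-trip check suggested here can be sketched as follows: extract every lane with a compile-time index, rebuild a batch from the extracted array, and compare it with the original. The toy `batch` type and the helpers `extract_all` and `round_trips` are hypothetical names for this illustration. They are not the test code from the PR and do not use xsimd.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

namespace roundtrip_sketch
{
    // Toy stand-in for a SIMD batch (hypothetical, NOT xsimd::batch).
    template <class T, std::size_t N>
    struct batch
    {
        std::array<T, N> lanes;
    };

    // Compile-time indexed lane access on the toy type.
    template <std::size_t I, class T, std::size_t N>
    constexpr T get(batch<T, N> const& b) noexcept
    {
        static_assert(I < N, "lane index out of range");
        return b.lanes[I];
    }

    // Expand get<0>(b), get<1>(b), ..., get<N-1>(b) into an array.
    template <class T, std::size_t N, std::size_t... Is>
    std::array<T, N> extract_all(batch<T, N> const& b, std::index_sequence<Is...>)
    {
        return { { get<Is>(b)... } };
    }

    // "Load" the extracted array back and compare with the original:
    // every index, not only index 0, must have produced the right lane.
    template <class T, std::size_t N>
    bool round_trips(batch<T, N> const& b)
    {
        std::array<T, N> extracted = extract_all(b, std::make_index_sequence<N> {});
        batch<T, N> reloaded { extracted };
        return reloaded.lanes == b.lanes;
    }
}
```

Compared with checking each element in isolation, a single whole-batch comparison makes it harder for a getter that ignores its index to pass unnoticed, provided the input lanes are distinct.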

Contributor

@serge-sans-paille serge-sans-paille left a comment

Please fix the tests so that we have decent confidence in the getter when index != 0.

@DiamonDinoia
Contributor Author

DiamonDinoia commented Apr 17, 2026

> Please fix the tests so that we have decent confidence in the getter when index != 0.

Yes, I will! I also noticed some small changes I should make. I just have not had time to get to this yet.

@DiamonDinoia force-pushed the feat/optimize-elem-extraction branch 7 times, most recently from 5a371e7 to fd8c743 (April 20, 2026 18:42)
Introduces get<I>(batch) as a top-level API for extracting a single lane
at a compile-time index. Falls back to the runtime get() when per-arch
overloads aren't present.

Per-arch optimal lowerings:
- SSE2:     pextrw / byte-shift+movd / swizzle+first by lane width.
- SSE4.1:   pextrb/w/d/q; I==0 short-circuits to first().
- AVX:      I==0 short-circuits to first(); else halve + SSE4.1 path.
- AVX-512F: I==0 short-circuits to first(); 32/64-bit lanes use
            valignd/valignq + first() (2 ops); 8/16-bit halve through AVX.
- NEON / NEON64 / RVV: native single-lane extract intrinsics.
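
The "I==0 short-circuits to first(); else halve and delegate" structure described for AVX can be sketched with tag dispatch on a toy 8-lane batch. All names in namespace `halving_sketch` are hypothetical stand-ins: `first` and `half` only imitate the roles of a lane-0 read and a half-register extract, and none of this is the PR's kernel code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace halving_sketch
{
    // Toy stand-in for a SIMD batch (hypothetical, NOT xsimd::batch).
    template <class T, std::size_t N>
    struct batch
    {
        std::array<T, N> lanes;
    };

    // Stand-in for first(): read lane 0.
    template <class T, std::size_t N>
    constexpr T first(batch<T, N> const& b) noexcept { return b.lanes[0]; }

    // Stand-in for a half-register extract: copy the lower (0) or upper (1) half.
    template <std::size_t Half, class T, std::size_t N>
    batch<T, N / 2> half(batch<T, N> const& b) noexcept
    {
        static_assert(N % 2 == 0 && Half < 2, "invalid half selection");
        batch<T, N / 2> out {};
        for (std::size_t i = 0; i < N / 2; ++i)
            out.lanes[i] = b.lanes[Half * (N / 2) + i];
        return out;
    }

    // Stand-in for the narrower (4-lane) kernel.
    template <std::size_t I, class T>
    T get_narrow(batch<T, 4> const& b) noexcept
    {
        static_assert(I < 4, "lane index out of range");
        return b.lanes[I];
    }

    // I == 0: short-circuit to first(), no halving needed.
    template <class T>
    T get_wide_impl(batch<T, 8> const& b, std::integral_constant<std::size_t, 0>) noexcept
    {
        return first(b);
    }

    // I != 0: pick the half containing lane I, then delegate to the narrow kernel.
    template <std::size_t I, class T>
    T get_wide_impl(batch<T, 8> const& b, std::integral_constant<std::size_t, I>) noexcept
    {
        return get_narrow<I % 4>(half<I / 4>(b));
    }

    // Entry point for the 8-lane toy batch.
    template <std::size_t I, class T>
    T get_wide(batch<T, 8> const& b) noexcept
    {
        static_assert(I < 8, "lane index out of range");
        return get_wide_impl(b, std::integral_constant<std::size_t, I> {});
    }
}
```

Both `I / 4` and `I % 4` are evaluated at compile time, so each instantiation reduces to a fixed half selection followed by a fixed narrow extract.
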
@DiamonDinoia force-pushed the feat/optimize-elem-extraction branch from fd8c743 to f30c5e0 (April 20, 2026 19:20)
@DiamonDinoia
Contributor Author

I like how it is now. I tried to minimize new code by re-using existing APIs. Tests check all values.

Comment thread test/test_batch.cpp
template <size_t... Is>
void test_get_impl(batch_type const& res, std::index_sequence<Is...>) const
{
    array_type extracted = { xsimd::get<Is>(res)... };
Contributor

Exactly what I had in mind, thanks!

@serge-sans-paille serge-sans-paille merged commit dec12b8 into xtensor-stack:master Apr 24, 2026
74 checks passed
@AntoinePrv
Contributor

@serge-sans-paille @DiamonDinoia this PR was merged without being properly up to date with master, and it ended up failing on master.
I know the CI can be a bit slow, but let's try our best to keep PRs up to date and the CI 🟢

3 participants