aarch64: use `read_unaligned` for `vld1_*` #2004

folkertdev · 2026-01-27T18:14:59Z

The custom intrinsics for vld1_* optimize less well than a standard unaligned read.

https://rust.godbolt.org/z/8T6Kr63K4

That seems like something that should be fixed in LLVM, it should be able to eliminate this store to an alloca:

  %val1 = alloca [64 x i8], align 4
  call void @llvm.lifetime.start.p0(i64 64, ptr nonnull %val1)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(64) %val1, ptr noundef nonnull align 4 dereferenceable(64) %val, i64 64, i1 false)
  %0 = call { <4 x float>, <4 x float>, <4 x float>, <4 x float> } @llvm.aarch64.neon.ld1x4.v4f32.p0(ptr noundef nonnull %val1) #4

But for now we can fix it here. The only problem is how to test it. Maybe there is some clever way in the yml format, but the issue is that some vector sizes use ldp, others use ldr. I don't currently see a way to encode that nicely. I have it on good authority (by an arm engineer), that there is no readon to prefer ld1 over two ldps.

This was found in fearless_simd, @Shnatsel and I went on a bughunt, and finally found the cause for their weird codegen in a read that first dereferenced a reference to an array. There is some extra context here linebender/fearless_simd#185 (comment).

cc @adamgemmell @CrooseGit if I'm missing anything, and if we can find a proper way to test this (or just accept testing for ld and not specifying the exact instruction).

edit: also I can't find anywhere whether the intrinsic assumes an aligned pointer or not, so maybe we should be using core::ptr::read instead?

rustbot · 2026-01-27T18:15:06Z

r? @sayantn

rustbot has assigned @sayantn.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Shnatsel · 2026-01-27T18:24:40Z

I think it would be nice to add a comment to the .yaml documenting the rationale and linking to this PR

aarch64: use read_unaligned for vld1_*

fd709ac

rustbot assigned sayantn Jan 27, 2026

Shnatsel mentioned this pull request Jan 27, 2026

Don't use load/store intrinsics linebender/fearless_simd#185

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

aarch64: use `read_unaligned` for `vld1_*` #2004

aarch64: use `read_unaligned` for `vld1_*` #2004

folkertdev commented Jan 27, 2026 •

edited

Loading

Uh oh!

rustbot commented Jan 27, 2026

Uh oh!

Shnatsel commented Jan 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aarch64: use read_unaligned for vld1_* #2004

Are you sure you want to change the base?

aarch64: use read_unaligned for vld1_* #2004

Conversation

folkertdev commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rustbot commented Jan 27, 2026

Uh oh!

Shnatsel commented Jan 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aarch64: use `read_unaligned` for `vld1_*` #2004

aarch64: use `read_unaligned` for `vld1_*` #2004

folkertdev commented Jan 27, 2026 •

edited

Loading