Allow explicit data transfers to GPUs by ZuseZ4 · Pull Request #156620 · rust-lang/rust

ZuseZ4 · 2026-05-15T20:18:27Z

So far, our offload intrinsics have handled data movement to and from the GPU automatically.
That's convenient (and reasonably fast once our LLVM optimizations land). However, Rust generally also lets you be explicit. That might give performance benefits where our LLVM optimizations fail, and it could also be useful for modelling: passing data around while still preventing CPU users from accessing it.
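A minimal sketch of how a handle type can use the borrow checker to say "this data now belongs to the GPU". `PreloadMut`, `preload_mut`, `key` and `demo` are hypothetical names. The struct is loosely modelled on the `cpu_ptr` / `PhantomData` fields discussed in the review, but uses the mutable analogue (`*mut T`, `PhantomData<&'a mut T>`), and it performs no real transfer. Because the handle mutably borrows the original value, CPU code cannot touch that value until the handle is dropped.

```rust
use std::marker::PhantomData;

/// Hypothetical handle for data that was explicitly moved to the GPU.
/// It keeps only the CPU address (never dereferenced here) and, through
/// `PhantomData<&'a mut T>`, holds a mutable borrow of the original value
/// for `'a`, so the borrow checker rejects CPU access while it is alive.
pub struct PreloadMut<'a, T: ?Sized> {
    cpu_ptr: *mut T,
    _marker: PhantomData<&'a mut T>,
}

/// Hypothetical constructor; a real implementation would also emit the
/// host-to-device copy at this point.
pub fn preload_mut<'a, T: ?Sized>(x: &'a mut T) -> PreloadMut<'a, T> {
    PreloadMut { cpu_ptr: x as *mut T, _marker: PhantomData }
}

impl<T: ?Sized> PreloadMut<'_, T> {
    /// The CPU address, used purely as an identifier for the device copy.
    pub fn key(&self) -> usize {
        self.cpu_ptr.cast::<u8>() as usize
    }
}

/// Usage: returns the handle's key and the buffer's address for comparison.
pub fn demo(data: &mut Vec<f32>) -> (usize, usize) {
    let addr = data.as_ptr() as usize;
    let key = {
        let handle = preload_mut(&mut data[..]);
        // data.push(4.0); // would not compile: `data` is still mutably borrowed
        handle.key()
    }; // `handle` is dropped here and the borrow ends
    data.push(4.0); // CPU access is allowed again
    (key, addr)
}
```

Uncommenting the `push` inside the inner block turns the "CPU must not touch this" rule into a compile error rather than a runtime check.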

Tracking:
Tracking Issue for GPU-offload #131513

ZuseZ4 · 2026-05-15T23:48:55Z

Vendoring llvm/llvm-project#198033 for now.

rust-log-analyzer · 2026-06-01T02:05:03Z

The job tidy failed! Check out the build log.

Possible cause of the failure (guessed by this bot):

fmt check
Diff in /checkout/compiler/rustc_codegen_llvm/src/builder/gpu_helper.rs:1:
-use crate::SimpleCx;
-use crate::builder::Builder;
-use crate::llvm;
-use crate::llvm::{Type, Value};
 use rustc_abi::Align;
 use rustc_codegen_ssa::MemFlags;
 use rustc_codegen_ssa::common::TypeKind;
Diff in /checkout/compiler/rustc_codegen_llvm/src/builder/gpu_helper.rs:8:
 use rustc_codegen_ssa::traits::{BaseTypeCodegenMethods, BuilderMethods};
 use rustc_middle::bug;
 use rustc_middle::ty::offload_meta::{OffloadMetadata, OffloadSize};
+
+use crate::builder::Builder;
+use crate::llvm::{Type, Value};
+use crate::{SimpleCx, llvm};
 
 pub(crate) fn scalar_width<'ll>(cx: &'ll SimpleCx<'_>, ty: &'ll Type) -> u64 {
     match cx.type_kind(ty) {
Diff in /checkout/library/core/src/offload/mod.rs:1:
 // offload module
 #[unstable(feature = "gpu_offload", issue = "131513")]
 pub use crate::macros::builtin::offload_kernel;
+use crate::marker::PhantomData;

oli-obk · 2026-06-01T08:43:30Z

+    cpu_ptr: *const T,
+    _marker: PhantomData<&'a T>,


What's the intent behind this? How is it different from just having a &'a T field?

No references, since a reference must point to valid memory at all times. All writes happen to the GPU copy, which lives in a different address space, so we treat the CPU address of the pointer merely as a key. I'm also considering storing the GPU address of this pointer directly, which would make it even clearer that using it as a reference would be UB.
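To illustrate "the CPU address is merely a key", here is a CPU-only simulation. `DeviceTable`, `upload`, `scale` and `download` are hypothetical names. The "device memory" is just a `HashMap` of `Vec`s, and no real GPU runtime is involved. Writes land in the separate copy found through the key, never behind the CPU pointer itself.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for GPU memory: maps the CPU address of a buffer
/// (used purely as a key) to a separate "device" copy of its contents.
pub struct DeviceTable {
    buffers: HashMap<usize, Vec<f32>>,
}

impl DeviceTable {
    pub fn new() -> Self {
        DeviceTable { buffers: HashMap::new() }
    }

    /// Host-to-device copy; the CPU address becomes the lookup key.
    pub fn upload(&mut self, host: &[f32]) -> usize {
        let key = host.as_ptr() as usize;
        self.buffers.insert(key, host.to_vec());
        key
    }

    /// Stand-in for a kernel: it writes only to the device copy.
    pub fn scale(&mut self, key: usize, factor: f32) {
        if let Some(buf) = self.buffers.get_mut(&key) {
            for x in buf.iter_mut() {
                *x *= factor;
            }
        }
    }

    /// Device-to-host copy back into the original CPU buffer.
    pub fn download(&mut self, key: usize, host: &mut [f32]) {
        if let Some(buf) = self.buffers.remove(&key) {
            host.copy_from_slice(&buf);
        }
    }
}
```

Between `upload` and `download` the host buffer still holds its old values. That is why exposing it as a live `&T` during that window would be misleading.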

oli-obk · 2026-06-01T08:44:02Z

+        // This exists so MIR creates Drop terminators for PreloadMut.
+        // rustc codegen intercepts those terminators and emits the
+        // offload return mapper.


why is this not just an intrinsic call here?

Partly just experimenting, and partly because intrinsics changed recently: they were updated for more explicit Place handling, which I didn't want to think about for my MVP. I'll switch these to intrinsics after my deadline.
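The Drop-terminator approach described above can be mimicked in ordinary Rust with a guard whose `Drop` impl performs the copy back. This is a CPU-only sketch under stated assumptions. `CopyBackGuard` and `add_one` are hypothetical names, and the "device" is just a `Vec`. The mock runs a real `Drop` impl, whereas in the PR rustc codegen intercepts the Drop terminator and emits the offload return mapper. For simplicity, the guard also holds a plain `&mut` borrow, while the PR deliberately stores a raw CPU pointer instead.

```rust
/// Hypothetical guard: creating it copies the host data into a separate
/// "device" buffer; dropping it copies the data back, standing in for the
/// offload return mapper.
pub struct CopyBackGuard<'a> {
    host: &'a mut [f32],
    device: Vec<f32>,
}

impl<'a> CopyBackGuard<'a> {
    pub fn new(host: &'a mut [f32]) -> Self {
        let device = host.to_vec(); // host-to-device copy
        CopyBackGuard { host, device }
    }

    /// Stand-in for a kernel: it only touches the device copy.
    pub fn add_one(&mut self) {
        for x in self.device.iter_mut() {
            *x += 1.0;
        }
    }
}

impl Drop for CopyBackGuard<'_> {
    fn drop(&mut self) {
        // Device-to-host copy, run automatically when the guard goes out of scope.
        self.host.copy_from_slice(&self.device);
    }
}
```

Because MIR inserts the drop at scope end, the copy back happens at a point the compiler already knows about. That is the property the PR relies on when it hooks the Drop terminator.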

oli-obk · 2026-06-01T08:44:49Z

+        return false;
+    };
+
+    Some(adt_def.did()) == tcx.lang_items().preload_mut_type()


use tcx.is_lang_item

oli-obk · 2026-06-01T08:47:37Z

+
+#[lang = "preload"]
+#[unstable(feature = "offload", issue = "124509")]
+pub fn preload<'a, T: ?Sized>(x: &'a T) -> Preload<'a, T> {


Yeah, I think these should just be intrinsics instead of catching lang item calls during codegen of call terminators.

ZuseZ4 added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. F-gpu_offload `#![feature(gpu_offload)]` labels May 15, 2026

rustbot added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 15, 2026