Allow explicit data transfers to GPUs #156620

Draft: ZuseZ4 wants to merge 5 commits into rust-lang:main from ZuseZ4:offload-explicit-datatransfer

Conversation

ZuseZ4 (Member) commented May 15, 2026

So far, our offload intrinsics have handled data movement to and from the GPU automatically.
That's convenient (and reasonably fast once our LLVM opts land). However, Rust generally also allows being explicit. Explicit transfers might give perf benefits (where our LLVM opts fail), and they could also be nice for modelling: data can be passed around while CPU users are still prevented from accessing it.

ZuseZ4 added labels May 15, 2026: A-LLVM (Area: code generation parts specific to LLVM; both correctness bugs and optimization-related issues), T-compiler (relevant to the compiler team, which will review and decide on the PR/issue), F-gpu_offload (`#![feature(gpu_offload)]`)
rustbot added a label May 15, 2026: S-waiting-on-author (Status: this is awaiting some action, such as code changes or more information, from the author)
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from e475c46 to da102aa Compare May 15, 2026 22:37

ZuseZ4 (Member, Author) commented May 15, 2026

Vendoring llvm/llvm-project#198033 for now.

@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from abc274d to 1d8d1e7 Compare May 29, 2026 01:47
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 1d8d1e7 to a94ef31 Compare May 29, 2026 02:58
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch 2 times, most recently from 4b77bad to 319ef7d Compare May 31, 2026 00:47
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from fba7eb2 to 358171b Compare May 31, 2026 02:35
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 2f1d614 to bbe3882 Compare May 31, 2026 20:03
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from bbe3882 to d290591 Compare May 31, 2026 20:32
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from d290591 to e8ad696 Compare May 31, 2026 21:59
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from e8ad696 to ce8db44 Compare May 31, 2026 22:13
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from ce8db44 to fe82262 Compare May 31, 2026 22:18
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from fe82262 to 7a44fd7 Compare May 31, 2026 22:49
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 7a44fd7 to cf43198 Compare May 31, 2026 23:25
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from cf43198 to fae2f07 Compare May 31, 2026 23:52
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 4f5c325 to 6c8bec9 Compare June 1, 2026 01:36
rust-log-analyzer (Collaborator) commented

The job tidy failed! Check out the build log.

Possible cause of the failure (guessed by this bot):
fmt check
Diff in /checkout/compiler/rustc_codegen_llvm/src/builder/gpu_helper.rs:1:
-use crate::SimpleCx;
-use crate::builder::Builder;
-use crate::llvm;
-use crate::llvm::{Type, Value};
 use rustc_abi::Align;
 use rustc_codegen_ssa::MemFlags;
 use rustc_codegen_ssa::common::TypeKind;
Diff in /checkout/compiler/rustc_codegen_llvm/src/builder/gpu_helper.rs:8:
 use rustc_codegen_ssa::traits::{BaseTypeCodegenMethods, BuilderMethods};
 use rustc_middle::bug;
 use rustc_middle::ty::offload_meta::{OffloadMetadata, OffloadSize};
+
+use crate::builder::Builder;
+use crate::llvm::{Type, Value};
+use crate::{SimpleCx, llvm};
 
 pub(crate) fn scalar_width<'ll>(cx: &'ll SimpleCx<'_>, ty: &'ll Type) -> u64 {
     match cx.type_kind(ty) {
Diff in /checkout/library/core/src/offload/mod.rs:1:
 // offload module
 #[unstable(feature = "gpu_offload", issue = "131513")]
 pub use crate::macros::builtin::offload_kernel;
+use crate::marker::PhantomData;

Comment on lines +12 to +13
cpu_ptr: *const T,
_marker: PhantomData<&'a T>,

oli-obk (Contributor) commented Jun 1, 2026

What's the intent behind this? How is it different from just having a &'a T field?

ZuseZ4 (Member, Author) replied

No references, since they require a valid pointer at all times. However, all writes happen to the GPU version, which lives in a different address space, so we treat the CPU address of the pointer merely as a key. I'm also considering storing the GPU address directly, which would make it even clearer that holding this as a reference would be UB.
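
To illustrate the reply above, here is a minimal sketch in stable Rust of a handle that keeps the borrow lifetime but stores only a raw pointer. The names `DeviceKey`, `make_key`, and `key` are hypothetical and are not the PR's API; only the two field declarations are taken from the quoted diff. The sketch assumes the CPU address is used purely as a lookup key and is never dereferenced.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the handle type discussed in the review.
/// The raw pointer carries no validity requirement, while `PhantomData`
/// still ties the handle to the lifetime of the borrowed host data.
struct DeviceKey<'a, T: ?Sized> {
    cpu_ptr: *const T,
    _marker: PhantomData<&'a T>,
}

/// Hypothetical constructor: records the host address, performs no transfer.
fn make_key<'a, T: ?Sized>(x: &'a T) -> DeviceKey<'a, T> {
    DeviceKey { cpu_ptr: x as *const T, _marker: PhantomData }
}

impl<'a, T: ?Sized> DeviceKey<'a, T> {
    /// Returns the host address as an opaque key (assumption: a runtime
    /// would use it to look up the device-side copy). Never dereferenced.
    fn key(&self) -> usize {
        self.cpu_ptr.cast::<()>() as usize
    }
}
```

Because the handle still carries `'a`, the borrow checker keeps the host value borrowed for as long as the handle lives, without the handle ever claiming that the pointee is valid to read as a `&T`.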

Comment on lines +39 to +41
// This exists so MIR creates Drop terminators for PreloadMut.
// rustc codegen intercepts those terminators and emits the
// offload return mapper.

oli-obk (Contributor) commented Jun 1, 2026

why is this not just an intrinsic call here?

ZuseZ4 (Member, Author) replied

Partly just experimenting, partly because intrinsics recently changed a bit: they got updated for more explicit Place handling, which I didn't want to think about for my MVP. I'll switch these to intrinsics after my deadline.
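
As a rough host-only analogy for the scope-end behaviour being discussed, the sketch below uses an ordinary `Drop` impl to copy data back when a guard goes out of scope. It is a simulation under stated assumptions: a `Vec` stands in for device memory, `MutGuard`, `upload`, `device_mut`, and `double_on_device` are hypothetical names, and it assumes the "offload return mapper" amounts to copying the device data back to the host. The PR itself does not run library code on drop; per the quoted comment, codegen intercepts the Drop terminators instead.

```rust
/// Hypothetical guard; `device` simulates the GPU-side copy of `host`.
struct MutGuard<'a> {
    host: &'a mut [f32],
    device: Vec<f32>,
}

impl<'a> MutGuard<'a> {
    /// Simulated explicit transfer to the device.
    fn upload(host: &'a mut [f32]) -> Self {
        let device = host.to_vec();
        MutGuard { host, device }
    }

    /// Access to the simulated device copy; all writes go here.
    fn device_mut(&mut self) -> &mut [f32] {
        &mut self.device
    }
}

impl Drop for MutGuard<'_> {
    /// Simulated transfer back to the host at the end of the guard's scope.
    fn drop(&mut self) {
        self.host.copy_from_slice(&self.device);
    }
}

/// Doubles every element "on the device"; the host sees the result
/// only after the guard is dropped at the end of the function.
fn double_on_device(data: &mut [f32]) {
    let mut guard = MutGuard::upload(data);
    for x in guard.device_mut() {
        *x *= 2.0;
    }
}
```

Unlike the type in the PR, this guard holds a real `&mut` reference because the simulated copy lives in host memory; it only models when the copy-back happens.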

return false;
};

Some(adt_def.did()) == tcx.lang_items().preload_mut_type()

oli-obk (Contributor) commented Jun 1, 2026

use tcx.is_lang_item

#[lang = "preload"]
#[unstable(feature = "offload", issue = "124509")]
pub fn preload<'a, T: ?Sized>(x: &'a T) -> Preload<'a, T> {

oli-obk (Contributor) commented Jun 1, 2026

Yeah, I think these should just be intrinsics instead of catching lang item calls during codegen of call terminators.
