diff --git a/api/cl_ext_buffer_device_address.asciidoc b/api/cl_ext_buffer_device_address.asciidoc new file mode 100644 index 000000000..b76998925 --- /dev/null +++ b/api/cl_ext_buffer_device_address.asciidoc @@ -0,0 +1,80 @@ +// Copyright 2024 The Khronos Group Inc. +// SPDX-License-Identifier: CC-BY-4.0 + +include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[] + +=== Other Extension Metadata + +*Last Modified Date*:: + 2025-02-04 +*IP Status*:: + No known IP claims. +*Contributors*:: + - Pekka Jääskeläinen, Intel + + - Karol Herbst, Red Hat + + - Ben Ashbaugh, Intel + + - Kevin Petit, Arm + + - Henry Linjamäki, Intel + + +=== Description + +This extension provides access to raw device pointers for cl_mem buffers +without requiring a shared virtual address space between the host and +the device. + +==== Background + +Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature +that enables raw pointers in the OpenCL standard. Its coarse-grain +variant is relatively simple to implement on various platforms in terms of +coherency requirements, but it requires mapping the buffer's address range +to the host virtual address space. +However, various higher-level heterogeneous APIs present a memory allocation +routine which can allocate device-only memory and provide raw addresses to +it without guarentees of system-wide uniqueness. For example, minimal +implementations of OpenMP's omp_target_alloc() and CUDA/HIP's +cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device. + +Host-device unified addressing might not be a major implementation issue in +systems which can provide virtual memory across the platform, but might +bring challenges in cases where the device presents a global memory with +a disjoint address space (that can also be a physical memory address space) or, +for example, when a barebone embedded system lacks virtual memory support altogether. +This extension is targeted to complement the OpenCL SVM extension by providing +an additional lower-end step in the spectrum of type of pointers/buffers OpenCL +can allocate. + +=== New Command + + * {clSetKernelArgDevicePointerEXT} + +=== New Types + + * {cl_mem_device_address_EXT} + +=== New Enums + + * {cl_mem_properties_TYPE} + ** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} + * {cl_mem_info_TYPE} + ** {CL_MEM_DEVICE_ADDRESS_EXT} + * {cl_kernel_exec_info_TYPE} + ** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} + +=== Version History + + * Revision 1.0.0, 2025-01-15 + ** Initial version for detailed review. + * Revision 1.0.1, 2025-01-28 + ** Made it explicit that passing illegal pointers is legal as long as they are + not referenced. Removed CL_INVALID_ARG_VALUE as a possible error in + clSetKernelArgDevicePointerEXT() as there are no illegal pointer + cases when calling this function. Return CL_INVALID_OPERATION for + clGetMemObjectInfo() if the pointer is not a buffer device pointer. + clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now only + error out if no devices in the context associated with kernel support + device pointers. + * Revision 1.0.2, 2025-02-04 + ** Converted the clSetKernelArgDevicePointerEXT() address parameter to + a value instead of a pointer to the value. + diff --git a/api/opencl_runtime_layer.asciidoc b/api/opencl_runtime_layer.asciidoc index c1331ffc1..f9ccf894e 100644 --- a/api/opencl_runtime_layer.asciidoc +++ b/api/opencl_runtime_layer.asciidoc @@ -595,6 +595,35 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[] {CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the external memory handle. endif::cl_khr_external_memory[] + +ifdef::cl_ext_buffer_device_address[] + +| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[] + | {cl_bool_TYPE} + | When set to {CL_TRUE}, specifies that the buffer must have a single fixed + device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}. + + Each device in the context can have their own (fixed) device-side address and + a copy of the created buffer which are synchronized + implicitly by the runtime. + + The flag might imply that the buffer will be "pinned" permanently to + a device's memory, but might not be necessarily so, as long as the address + range of the buffer remains constant. + + The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} + allocated buffers can be computed by adding the sub-buffer origin to the + device-specific start address. + + If the device supports SVM and {clCreateBufferWithProperties} is called with a pointer + returned by {clSVMAlloc} as its _host_ptr_ argument, and {CL_MEM_USE_HOST_PTR} is + set in its _flags_ argument, the device-side address is guaranteed to match + the _host_ptr_. + +endif::cl_ext_buffer_device_address[] + |==== ifdef::cl_khr_external_memory[] @@ -662,6 +691,12 @@ ifdef::cl_khr_external_memory[] {CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_. ** if _properties_ includes more than one external memory handle. endif::cl_khr_external_memory[] +ifdef::cl_ext_buffer_device_address[] + * {CL_INVALID_OPERATION} + ** If _properties_ includes {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there + are no devices in the context that support the {cl_ext_buffer_device_address_EXT} + extension. +endif::cl_ext_buffer_device_address[] [[memory-flags-table]] .List of supported memory flag values @@ -6463,6 +6498,20 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[] returns the _resource_ argument specified when _memobj_ was created. endif::cl_khr_d3d11_sharing[] +ifdef::cl_ext_buffer_device_address[] +| {CL_MEM_DEVICE_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE}[] + | If _memobj_ was created using {clCreateBufferWithProperties} with + the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to {CL_TRUE}, + returns a list of device addresses for the buffer, one for each + device in the context in the same order as the list of devices + passed to {clCreateContext}. + +endif::cl_ext_buffer_device_address[] + + |==== // refError @@ -6472,6 +6521,11 @@ successfully. Otherwise, it returns one of the following errors: * {CL_INVALID_MEM_OBJECT} if _memobj_ is a not a valid memory object. +ifdef::cl_ext_buffer_device_address[] + * {CL_INVALID_OPERATION} is returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if + the {cl_ext_buffer_device_address_EXT} is not supported or if the + buffer was not allocated with {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}. +endif::cl_ext_buffer_device_address[] * {CL_INVALID_VALUE} if _param_name_ is not one of the supported values, or if the size in bytes specified by _param_value_size_ is less than size of the return type specified in the @@ -10778,6 +10832,48 @@ Otherwise, it returns one of the following errors: required by the OpenCL implementation on the host. -- +ifdef::cl_ext_buffer_device_address[] +[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos'] +-- +To set a device pointer as the argument value for a specific argument of a +kernel, call the function + +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] +include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[] + + * _kernel_ is a valid kernel object. + * _arg_index_ is the argument index. + Arguments to the kernel are referred by indices that go from 0 for the + leftmost argument to _n_ - 1, where _n_ is the total number of arguments + declared by a kernel. + * _arg_value_ is the device pointer that should be used as the argument value for + argument specified by _arg_index_. + The device pointer specified is the value used by all API calls that enqueue + _kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument + value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_. + The device pointer can only be used for arguments that are declared to be a + pointer to `global` memory allocated with {clCreateBufferWithProperties} with + the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property. The pointer value specified as + the argument value can be the pointer to the beginning of the buffer or any offset into + the buffer region. The device pointer value must be naturally aligned according to + the argument's type. It should be noted that it's legal to pass invalid + pointers as the value (similarly to C/C++ function calls with pointer arguments) as + long as the kernel doesn't dereference the pointer. + +{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set +successfully. Otherwise, it returns one of the following errors: + + * {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object. + * {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support + the {cl_ext_buffer_device_address_EXT} extension. + * {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index. + * {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required + by the OpenCL implementation on the device. + * {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources + required by the OpenCL implementation on the host. +-- +endif::cl_ext_buffer_device_address[] + [open,refpage='clSetKernelExecInfo',desc='Set additional execution information for a kernel.',type='protos'] -- To set additional execution information for a kernel, call the function @@ -10844,6 +10940,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM If {clSetKernelExecInfo} has not been called with a value for {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, the default value is {CL_TRUE}. + +ifdef::cl_ext_buffer_device_address[] +| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor} + +include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE}[] + | Device pointers must reference locations contained entirely within + buffers that are passed to kernel as arguments, or that are passed + through the execution information. Non-argument device pointers accessed + by the kernel must be specified by passing pointers to those buffers + via this {clSetKernelExecInfo} option. +endif::cl_ext_buffer_device_address[] + |==== // refError @@ -10853,7 +10962,16 @@ successfully. Otherwise, it returns one of the following errors: * {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object. - * {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM. + * {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} and + {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} if no devices in + the context associated with _kernel_ support SVM. +ifdef::cl_ext_buffer_device_address[] + * {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no + device in the context associated with _kernel_ support the {cl_ext_buffer_device_address_EXT} + extension. +endif::cl_ext_buffer_device_address[] + * {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is + `NULL` or if the size specified by _param_value_size_ is not valid. * {CL_INVALID_OPERATION} if _param_name_ is {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} and _param_value_ is {CL_TRUE} and no devices in the context associated with _kernel_ support fine-grain diff --git a/xml/cl.xml b/xml/cl.xml index 39c3a4c4c..a2b0556b6 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -255,6 +255,7 @@ server's OpenCL/api-docs repository. typedef cl_bitfield cl_platform_command_buffer_capabilities_khr; typedef cl_bitfield cl_mutable_dispatch_asserts_khr typedef cl_bitfield cl_device_kernel_clock_capabilities_khr; + typedef cl_ulong cl_mem_device_address_ext; Structure types @@ -2315,6 +2316,12 @@ server's OpenCL/api-docs repository. + + + + + +