From 9b186825908c40f83fdaaed3748722d70db41004 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pekka=20J=C3=A4=C3=A4skel=C3=A4inen?= Date: Tue, 7 May 2024 19:05:29 +0300 Subject: [PATCH 1/5] cl_ext_buffer_device_address The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. This API adds a minimal increment on top of cl_mem that provides such capabilities. --- api/cl_ext_buffer_device_address.asciidoc | 96 ++++++++++++++++ api/opencl_runtime_layer.asciidoc | 127 +++++++++++++++++++++- extensions/extensions.txt | 2 + xml/cl.xml | 35 +++++- 4 files changed, 256 insertions(+), 4 deletions(-) create mode 100644 api/cl_ext_buffer_device_address.asciidoc diff --git a/api/cl_ext_buffer_device_address.asciidoc b/api/cl_ext_buffer_device_address.asciidoc new file mode 100644 index 000000000..26f83e1f9 --- /dev/null +++ b/api/cl_ext_buffer_device_address.asciidoc @@ -0,0 +1,96 @@ +// Copyright 2024 The Khronos Group Inc. +// SPDX-License-Identifier: CC-BY-4.0 + +include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[] + +=== Other Extension Metadata + +*Last Modified Date*:: + 2024-12-06 +*IP Status*:: + No known IP claims. +*Contributors*:: + - Pekka Jääskeläinen, Intel + + - Karol Herbst, Red Hat + + - Henry Linjamäki, Intel + + - Kevin Petit, Arm + + +=== Description + +This extension provides access to raw device pointers for cl_mem buffers +without requiring a shared virtual address space between the host and +the device. + +==== Background + +Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature +that enables raw pointers in the OpenCL standard. Its coarse-grain +variant is relatively simple to implement on various platforms in terms of +coherency requirements, but it requires mapping the buffer's address range +to the host virtual address space. +However, various higher-level heterogeneous APIs present a memory allocation +routine which can allocate device-only memory and provide raw addresses to +it without guarentees of system-wide uniqueness. For example, minimal +implementations of OpenMP's omp_target_alloc() and CUDA/HIP's +cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device. + +Host-device unified addressing might not be a major implementation issue in +systems which can provide virtual memory across the platform, but might +bring challenges in cases where the device presents a global memory with +a disjoint address space (that can also be a physical memory address space) or, +for example, when a barebone embedded system lacks virtual memory support altogether. +This extension is targeted to complement the OpenCL SVM extension by providing +an additional lower-end step in the spectrum of type of pointers/buffers OpenCL +can allocate. + +=== New Command + + * {clSetKernelArgDevicePointerEXT} + +=== New Types + + * {cl_mem_device_address_EXT} + +=== New Enums + + * {cl_mem_properties_TYPE} + ** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} + ** {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} + * {cl_mem_info_TYPE} + ** {CL_MEM_DEVICE_ADDRESS_EXT} + * {cl_kernel_exec_info_TYPE} + ** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} + +=== Version History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|==== +| *Version* | *Date* | *Author* | *Changes* +| 0.9.0 | 2024-12-06 | Pekka Jääskeläinen, Kevin Petit | + Integrated to the main unified specification. + Moved the functionality to clCreateBufferWithProperties, + thus requiring 3.0+. Single memobj query for fetching the + address(es). Also other smaller improvements pointed by Kevin. + Candidate for final 1.0.0. +| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst | + Made the allocation flags independent from each other and + renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and + CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the + same address across all devices in the context, whereas the latter + allows per-device addresses. +| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst | + Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device + cases "all", not "any", covering a case where not all devices + can ensure the same address across the context. In that case + CL_INVALID_VALUE can be returned. Defined sub-buffer address + computation to be 'base_addr + origin'. Added error conditions + for clSetKernelExecInfo when the device doesn't support + device pointers. +| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback. + This version describes the first API version that was prototyped + in PoCL and RustiCL using temporary placeholder flag/enum values. + The PoCL implementation and initial discussion on the extension + can be found https://github.com/pocl/pocl/pull/1441[in this PR]. +|==== diff --git a/api/opencl_runtime_layer.asciidoc b/api/opencl_runtime_layer.asciidoc index c1331ffc1..bcb315172 100644 --- a/api/opencl_runtime_layer.asciidoc +++ b/api/opencl_runtime_layer.asciidoc @@ -595,6 +595,39 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[] {CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the external memory handle. endif::cl_khr_external_memory[] + +ifdef::cl_ext_buffer_device_address[] + +| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[] + | {cl_bool_TYPE} + | When set to CL_TRUE, specifies that the buffer must have a single fixed + device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}. + + Each device in the context can have their own (fixed) device-side address and + a copy of the created buffer which are synchronized + implicitly by the runtime. + + The flag might imply that the buffer will be "pinned" permanently to + a device's memory, but might not be necessarily so, as long as the address + range of the buffer remains constant. + + The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} + allocated buffers can be computed by adding the sub-buffer origin to the + device-specific start address. + +| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[] + | {cl_bool_TYPE} + | When set to CL_TRUE, the buffer has otherwise the same properties as + when allocated using the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} flag, + but with an additional property that the buffer's address is the same across + all the devices in the context. + +endif::cl_ext_buffer_device_address[] + |==== ifdef::cl_khr_external_memory[] @@ -662,6 +695,15 @@ ifdef::cl_khr_external_memory[] {CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_. ** if _properties_ includes more than one external memory handle. endif::cl_khr_external_memory[] +ifdef::cl_ext_buffer_device_address[] + * {CL_INVALID_DEVICE} + ** If _properties_ includes either {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} or + {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there is at least one device in + the context that doesn't support such allocation. + * {CL_INVALID_VALUE} + ** If _properties_ includes both {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} and + {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} at the same time. +endif::cl_ext_buffer_device_address[] [[memory-flags-table]] .List of supported memory flag values @@ -6463,6 +6505,21 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[] returns the _resource_ argument specified when _memobj_ was created. endif::cl_khr_d3d11_sharing[] +ifdef::cl_ext_buffer_device_address[] +| {CL_MEM_DEVICE_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE}[] + | If _memobj_ was created using {clCreateBufferWithProperties} with + the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to CL_TRUE, + returns a list of device addresses for the buffer, one for each + device in the context. If the buffer was allocated + with the {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} property, + only one device address is returned. + +endif::cl_ext_buffer_device_address[] + + |==== // refError @@ -6477,6 +6534,12 @@ Otherwise, it returns one of the following errors: the return type specified in the <> table and _param_value_ is not `NULL`. +ifdef::cl_ext_buffer_device_address[] + ** Returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if + the {cl_ext_buffer_device_address_EXT} is not supported or if the + buffer was not allocated with neither {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or + {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}. +endif::cl_ext_buffer_device_address[] * {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. * {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources @@ -10778,6 +10841,48 @@ Otherwise, it returns one of the following errors: required by the OpenCL implementation on the host. -- +ifdef::cl_ext_buffer_device_address[] +[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos'] +-- +To set a device pointer as the argument value for a specific argument of a +kernel, call the function + +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] +include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[] + + * _kernel_ is a valid kernel object. + * _arg_index_ is the argument index. + Arguments to the kernel are referred by indices that go from 0 for the + leftmost argument to _n_ - 1, where _n_ is the total number of arguments + declared by a kernel. + * _arg_value_ is the device pointer that should be used as the argument value for + argument specified by _arg_index_. + The device pointer specified is the value used by all API calls that enqueue + _kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument + value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_. + The device pointer can only be used for arguments that are declared to be a + pointer to `global` memory allocated with {clCreateBufferWithProperties} with + either the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} + property. The pointer value specified as the argument value + can be the pointer to the beginning of the buffer or any offset into + the buffer region. The device pointer value must be naturally aligned according to + the argument's type. + +{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set +successfully. Otherwise, it returns one of the following errors: + + * {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object. + * {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support + the device pointer. + * {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index. + * {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value. + * {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required + by the OpenCL implementation on the device. + * {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources + required by the OpenCL implementation on the host. +-- +endif::cl_ext_buffer_device_address[] + [open,refpage='clSetKernelExecInfo',desc='Set additional execution information for a kernel.',type='protos'] -- To set additional execution information for a kernel, call the function @@ -10844,6 +10949,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM If {clSetKernelExecInfo} has not been called with a value for {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, the default value is {CL_TRUE}. + +ifdef::cl_ext_buffer_device_address[] +| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor} + +include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE}[] + | Device pointers must reference locations contained entirely within + buffers that are passed to kernel as arguments, or that are passed + through the execution information. Non-argument device pointers accessed + by the kernel must be specified by passing pointers to those buffers + via this {clSetKernelExecInfo} option. +endif::cl_ext_buffer_device_address[] + |==== // refError @@ -10853,7 +10971,14 @@ successfully. Otherwise, it returns one of the following errors: * {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object. - * {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM. + * {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} if no devices in + the context associated with _kernel_ support SVM. +ifdef::cl_ext_buffer_device_address[] + * {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no + device in the context associated with _kernel_ support device pointers. +endif::cl_ext_buffer_device_address[] + * {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is + `NULL` or if the size specified by _param_value_size_ is not valid. * {CL_INVALID_OPERATION} if _param_name_ is {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} and _param_value_ is {CL_TRUE} and no devices in the context associated with _kernel_ support fine-grain diff --git a/extensions/extensions.txt b/extensions/extensions.txt index ab17caa3f..e30fa4dee 100644 --- a/extensions/extensions.txt +++ b/extensions/extensions.txt @@ -37,6 +37,8 @@ Khronos{R} OpenCL Working Group include::cl_ext_float_atomics.asciidoc[] <<< include::cl_ext_image_raw10_raw12.asciidoc[] +<<< +include::cl_ext_buffer_device_address.asciidoc[] // Vendor Extensions :leveloffset: 0 diff --git a/xml/cl.xml b/xml/cl.xml index 39c3a4c4c..c1097b319 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -255,6 +255,7 @@ server's OpenCL/api-docs repository. typedef cl_bitfield cl_platform_command_buffer_capabilities_khr; typedef cl_bitfield cl_mutable_dispatch_asserts_khr typedef cl_bitfield cl_device_kernel_clock_capabilities_khr; + typedef cl_ulong cl_mem_device_address_ext; Structure types @@ -719,6 +720,8 @@ server's OpenCL/api-docs repository. + + @@ -911,7 +914,7 @@ server's OpenCL/api-docs repository. - + @@ -1630,7 +1633,8 @@ server's OpenCL/api-docs repository. - + + @@ -1723,7 +1727,8 @@ server's OpenCL/api-docs repository. - + + @@ -3730,6 +3735,12 @@ server's OpenCL/api-docs repository. cl_uint arg_index const void* arg_value + + cl_int clSetKernelArgDevicePointerEXT + cl_kernel kernel + cl_uint arg_index + const void* arg_value + cl_int clSetKernelExecInfo cl_kernel kernel @@ -7191,6 +7202,24 @@ server's OpenCL/api-docs repository. + + + + + + + + + + + + + + + + + + From dbc5b7e6d50b2bfb67947ad9295ff0b949696fdd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pekka=20J=C3=A4=C3=A4skel=C3=A4inen?= Date: Thu, 12 Dec 2024 17:25:19 +0200 Subject: [PATCH 2/5] BDA: Removed CL_MEM_DEVICE_SHARED_ADDRESS_EXT as unneeded. Also made the enums globally unique. --- api/cl_ext_buffer_device_address.asciidoc | 5 ++-- api/opencl_runtime_layer.asciidoc | 29 +++++------------------ xml/cl.xml | 27 +++++++++++---------- 3 files changed, 23 insertions(+), 38 deletions(-) diff --git a/api/cl_ext_buffer_device_address.asciidoc b/api/cl_ext_buffer_device_address.asciidoc index 26f83e1f9..40f68bb80 100644 --- a/api/cl_ext_buffer_device_address.asciidoc +++ b/api/cl_ext_buffer_device_address.asciidoc @@ -6,7 +6,7 @@ include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[] === Other Extension Metadata *Last Modified Date*:: - 2024-12-06 + 2024-12-12 *IP Status*:: No known IP claims. *Contributors*:: @@ -55,7 +55,6 @@ can allocate. * {cl_mem_properties_TYPE} ** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} - ** {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} * {cl_mem_info_TYPE} ** {CL_MEM_DEVICE_ADDRESS_EXT} * {cl_kernel_exec_info_TYPE} @@ -68,6 +67,8 @@ can allocate. [options="header"] |==== | *Version* | *Date* | *Author* | *Changes* +| 0.9.1 | 2024-12-12 | Pekka Jääskeläinen | + Removed CL_MEM_DEVICE_SHARED_ADDRESS_EXT as unneeded. | 0.9.0 | 2024-12-06 | Pekka Jääskeläinen, Kevin Petit | Integrated to the main unified specification. Moved the functionality to clCreateBufferWithProperties, diff --git a/api/opencl_runtime_layer.asciidoc b/api/opencl_runtime_layer.asciidoc index bcb315172..478a885e3 100644 --- a/api/opencl_runtime_layer.asciidoc +++ b/api/opencl_runtime_layer.asciidoc @@ -617,15 +617,6 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciido allocated buffers can be computed by adding the sub-buffer origin to the device-specific start address. -| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor} - -include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[] - | {cl_bool_TYPE} - | When set to CL_TRUE, the buffer has otherwise the same properties as - when allocated using the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} flag, - but with an additional property that the buffer's address is the same across - all the devices in the context. - endif::cl_ext_buffer_device_address[] |==== @@ -697,12 +688,8 @@ ifdef::cl_khr_external_memory[] endif::cl_khr_external_memory[] ifdef::cl_ext_buffer_device_address[] * {CL_INVALID_DEVICE} - ** If _properties_ includes either {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} or - {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there is at least one device in - the context that doesn't support such allocation. - * {CL_INVALID_VALUE} - ** If _properties_ includes both {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} and - {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} at the same time. + ** If _properties_ includes {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there + is at least one device in the context that doesn't support such allocation. endif::cl_ext_buffer_device_address[] [[memory-flags-table]] @@ -6513,9 +6500,7 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[] | If _memobj_ was created using {clCreateBufferWithProperties} with the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to CL_TRUE, returns a list of device addresses for the buffer, one for each - device in the context. If the buffer was allocated - with the {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} property, - only one device address is returned. + device in the context. endif::cl_ext_buffer_device_address[] @@ -6537,8 +6522,7 @@ Otherwise, it returns one of the following errors: ifdef::cl_ext_buffer_device_address[] ** Returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if the {cl_ext_buffer_device_address_EXT} is not supported or if the - buffer was not allocated with neither {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or - {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}. + buffer was not allocated with {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}. endif::cl_ext_buffer_device_address[] * {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. @@ -10862,9 +10846,8 @@ include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[] value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_. The device pointer can only be used for arguments that are declared to be a pointer to `global` memory allocated with {clCreateBufferWithProperties} with - either the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} - property. The pointer value specified as the argument value - can be the pointer to the beginning of the buffer or any offset into + the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property. The pointer value specified as + the argument value can be the pointer to the beginning of the buffer or any offset into the buffer region. The device pointer value must be naturally aligned according to the argument's type. diff --git a/xml/cl.xml b/xml/cl.xml index c1097b319..65f5821f6 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -720,8 +720,6 @@ server's OpenCL/api-docs repository. - - @@ -914,7 +912,7 @@ server's OpenCL/api-docs repository. - + @@ -1633,8 +1631,7 @@ server's OpenCL/api-docs repository. - - + @@ -1727,8 +1724,7 @@ server's OpenCL/api-docs repository. - - + @@ -2320,6 +2316,12 @@ server's OpenCL/api-docs repository. + + + + + +