Merged
242 changes: 125 additions & 117 deletions api/opencl_runtime_layer.asciidoc
@@ -11942,45 +11942,57 @@ local work size.
executed successfully.
Otherwise, it returns one of the following errors:

* {CL_INVALID_COMMAND_QUEUE} if _command_queue_ is not a valid host
command-queue.
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
* {CL_INVALID_CONTEXT} if the context associated with _kernel_ is not the
same as the context associated with _command_queue_.
* {CL_INVALID_PROGRAM_EXECUTABLE} if there is no successfully built
program executable available for _kernel_ for the device associated with
* {CL_INVALID_COMMAND_QUEUE}
** if _command_queue_ is not a valid host command-queue
* {CL_INVALID_KERNEL}
** if _kernel_ is not a valid kernel
* {CL_INVALID_PROGRAM_EXECUTABLE}
** if there is no successfully built program executable available for the device associated with _command_queue_
* {CL_INVALID_CONTEXT}
** if the context associated with _command_queue_ and _kernel_ are not the same
* {CL_INVALID_KERNEL_ARGS}
** if any kernel arguments for _kernel_ have not been set
* {CL_INVALID_WORK_DIMENSION}
** if _work_dim_ is not valid for the device associated with _command_queue_ (is greater than the value returned for {CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS})
* {CL_INVALID_GLOBAL_OFFSET}
** if _global_work_offset_ is not `NULL`.
This error condition does not apply when the device associated with
_command_queue_ supports OpenCL 1.1 or newer.
** if the value specified in _global_work_size_ plus the corresponding value
in _global_work_offset_ for any dimensions is greater than the maximum value
representable by {size_t_TYPE} on the device associated with _command_queue_
* {CL_INVALID_GLOBAL_WORK_SIZE}
** if _global_work_size_ is `NULL`
** if any of the values specified in _global_work_size_[0], ...
_global_work_size_[_work_dim_ - 1] are zero
** if any of the values specified in _global_work_size_[0], ...
_global_work_size_[_work_dim_ - 1] exceed the maximum value representable by
{size_t_TYPE} on the device associated with _command_queue_
* {CL_MISALIGNED_SUB_BUFFER_OFFSET}
** if a kernel argument for _kernel_ is a sub-buffer object and the offset
specified when the sub-buffer object is created is not aligned to
{CL_DEVICE_MEM_BASE_ADDR_ALIGN} for the device associated with
_command_queue_.
* {CL_INVALID_KERNEL_ARGS} if all argument values for _kernel_ have not
been set.
* {CL_MISALIGNED_SUB_BUFFER_OFFSET} if a sub-buffer object is set as an
argument to _kernel_ and the offset specified when the sub-buffer object
was created is not aligned to {CL_DEVICE_MEM_BASE_ADDR_ALIGN} for the
device associated with _command_queue_.
* {CL_INVALID_IMAGE_SIZE} if an image object is set as an argument to
_kernel_ and the image dimensions are not supported by device associated
with _command_queue_.
* {CL_IMAGE_FORMAT_NOT_SUPPORTED} if an image object is set as an argument
to _kernel_ and the image format is not supported by the device
associated with _command_queue_.
* {CL_INVALID_OPERATION} if an SVM pointer is set as an argument to
_kernel_ and the device associated with _command_queue_ does not support
SVM or the required SVM capabilities for the SVM pointer.
* {CL_INVALID_WORK_DIMENSION} if _work_dim_ is not a valid value (i.e. a
value between 1 and {CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS}).
* {CL_INVALID_GLOBAL_WORK_SIZE} if _global_work_size_ is NULL or if any of
the values specified in _global_work_size_ are 0.
* {CL_INVALID_GLOBAL_WORK_SIZE} if any of the values specified in
_global_work_size_ exceed the maximum value representable by `size_t` on
the device associated with _command_queue_.
* {CL_INVALID_GLOBAL_OFFSET} if the value specified in _global_work_size_
plus the corresponding value in _global_work_offset_ for dimension
exceeds the maximum value representable by `size_t` on the device
associated with _command_queue_.
* {CL_INVALID_VALUE} if _suggested_local_work_size_ is NULL.
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources
required by the OpenCL implementation on the device.
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
required by the OpenCL implementation on the host.
This error code is <<unified-spec, missing before>> version 1.1.
* {CL_INVALID_IMAGE_SIZE}
** if a kernel argument for _kernel_ is an image and the dimensions of the
image, such as the image width or image height, are not supported by the
device associated with _command_queue_
* {CL_IMAGE_FORMAT_NOT_SUPPORTED}
** if a kernel argument for _kernel_ is an image and the format of the
image, such as the image channel order or image channel data type, is not
supported by the device associated with _command_queue_
* {CL_INVALID_OPERATION}
** if SVM pointers are set as arguments for _kernel_ and the device associated with _command_queue_ does not support SVM
** if system pointers are set as arguments for _kernel_ and the device associated with _command_queue_ does not support fine-grain system SVM
* {CL_INVALID_VALUE}
** if _suggested_local_work_size_ is `NULL`
* {CL_OUT_OF_RESOURCES}
** if there is a failure to allocate resources required by the OpenCL
implementation on the device
* {CL_OUT_OF_HOST_MEMORY}
** if there is a failure to allocate resources required by the OpenCL
implementation on the host

NOTE: These error conditions are consistent with error conditions for
{clEnqueueNDRangeKernel}.
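
As an illustration of the work-dimension, global-size, and global-offset conditions listed above, the following C sketch mirrors those checks without calling any OpenCL API. The enum values, the function name `sketch_check_global_range`, and the `device_size_max` parameter (standing in for the largest value representable by `size_t` on the device) are assumptions made for this example. The order of the checks is also a choice of the sketch, since the list does not prescribe one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OpenCL error codes; not the real constants. */
enum sketch_status {
    SKETCH_SUCCESS = 0,
    SKETCH_INVALID_WORK_DIMENSION,
    SKETCH_INVALID_GLOBAL_OFFSET,
    SKETCH_INVALID_GLOBAL_WORK_SIZE
};

/*
 * Validate work_dim, global_work_size and global_work_offset.
 * max_work_dim plays the role of CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS.
 * device_size_max is the largest value the device's size_t can hold
 * (an assumed input; the host and device size_t may differ in width).
 */
enum sketch_status sketch_check_global_range(
    unsigned int work_dim,
    unsigned int max_work_dim,
    const size_t *global_work_offset, /* may be NULL, meaning all zeros */
    const size_t *global_work_size,
    uint64_t device_size_max)
{
    unsigned int i;

    /* A dimension count of zero is treated as invalid as well. */
    if (work_dim < 1 || work_dim > max_work_dim)
        return SKETCH_INVALID_WORK_DIMENSION;

    if (global_work_size == NULL)
        return SKETCH_INVALID_GLOBAL_WORK_SIZE;

    for (i = 0; i < work_dim; ++i) {
        uint64_t size = global_work_size[i];
        uint64_t offset = global_work_offset ? global_work_offset[i] : 0;

        /* Zero-sized or unrepresentable global sizes are rejected. */
        if (size == 0 || size > device_size_max)
            return SKETCH_INVALID_GLOBAL_WORK_SIZE;

        /* size <= device_size_max here, so the subtraction cannot wrap. */
        if (offset > device_size_max - size)
            return SKETCH_INVALID_GLOBAL_OFFSET;
    }
    return SKETCH_SUCCESS;
}
```

Comparing the offset against `device_size_max - size` avoids computing `size + offset` directly, which could itself wrap around on the host.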
@@ -12115,90 +12127,86 @@ The starting local ID is always (0, 0, ..., 0).
successfully queued.
Otherwise, it returns one of the following errors:

* {CL_INVALID_PROGRAM_EXECUTABLE} if there is no successfully built program
executable available for the device associated with _command_queue_.
* {CL_INVALID_COMMAND_QUEUE} if _command_queue_ is not a valid host
command-queue.
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
* {CL_INVALID_CONTEXT} if context associated with _command_queue_ and
_kernel_ are not the same or if the context associated with
_command_queue_ and events in _event_wait_list_ are not the same.
* {CL_INVALID_KERNEL_ARGS} if the kernel argument values have not been
specified.
* {CL_INVALID_WORK_DIMENSION} if _work_dim_ is not a valid value (i.e. a
value between 1 and {CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS}).
* {CL_INVALID_GLOBAL_WORK_SIZE} if _global_work_size_ is NULL or if any of
the values specified in _global_work_size_[0], ...
* {CL_INVALID_COMMAND_QUEUE}
** if _command_queue_ is not a valid host command-queue
* {CL_INVALID_KERNEL}
** if _kernel_ is not a valid kernel
* {CL_INVALID_PROGRAM_EXECUTABLE}
** if there is no successfully built program executable available for the device associated with _command_queue_
* {CL_INVALID_CONTEXT}
** if the context associated with _command_queue_ and _kernel_ are not the same
** if the context associated with _command_queue_ and events in _event_wait_list_ are not the same
* {CL_INVALID_KERNEL_ARGS}
** if any kernel arguments for _kernel_ have not been set
* {CL_INVALID_WORK_DIMENSION}
** if _work_dim_ is not valid for the device associated with _command_queue_ (is greater than the value returned for {CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS})
* {CL_INVALID_GLOBAL_OFFSET}
** if _global_work_offset_ is not `NULL`.
This error condition does not apply when the device associated with
_command_queue_ supports OpenCL 1.1 or newer.
** if the value specified in _global_work_size_ plus the corresponding value
in _global_work_offset_ for any dimensions is greater than the maximum value
representable by {size_t_TYPE} on the device associated with _command_queue_
* {CL_INVALID_GLOBAL_WORK_SIZE}
** if _global_work_size_ is `NULL`.
This error condition does not apply when the device associated with
_command_queue_ supports OpenCL 2.1 or newer.
** if any of the values specified in _global_work_size_[0], ...
_global_work_size_[_work_dim_ - 1] are zero.
This error condition does not apply when the device associated with
_command_queue_ supports OpenCL 2.1 or newer.
* {CL_INVALID_GLOBAL_WORK_SIZE} if any of the values specified in
_global_work_size_[0], ... _global_work_size_[_work_dim_ - 1] exceed the
maximum value representable by {size_t_TYPE} on the device on which the
kernel-instance will be enqueued.
* {CL_INVALID_GLOBAL_OFFSET} if the value specified in _global_work_size_
{plus} the corresponding values in _global_work_offset_ for any
dimensions is greater than the maximum value representable by size t on
the device on which the kernel-instance will be enqueued, or if
_global_work_offset_ is non-`NULL` <<unified-spec, before>> version 1.1.
* {CL_INVALID_WORK_GROUP_SIZE} if _local_work_size_ is specified and does
not match the required work-group size for _kernel_ in the program
source.
* {CL_INVALID_WORK_GROUP_SIZE} if _local_work_size_ is specified and is not
consistent with the required number of sub-groups for _kernel_ in the
program source.
* {CL_INVALID_WORK_GROUP_SIZE} if _local_work_size_ is specified and the
total number of work-items in the work-group computed as
_local_work_size_[0] {times} ... _local_work_size_[_work_dim_ - 1] is
greater than the value specified by {CL_KERNEL_WORK_GROUP_SIZE} in the
<<kernel-workgroup-info-table,Kernel Object Device Queries>> table.
* {CL_INVALID_WORK_GROUP_SIZE} if the work-group size must be uniform and
the _local_work_size_ is not `NULL`, is not equal to the required
work-group size specified in the kernel source, or the
_global_work_size_ is not evenly divisible by the _local_work_size_.
* {CL_INVALID_WORK_ITEM_SIZE} if the number of work-items specified in any
of _local_work_size_[0], ... _local_work_size_[_work_dim_ - 1] is
greater than the corresponding values specified by
{CL_DEVICE_MAX_WORK_ITEM_SIZES}[0], ...,
{CL_DEVICE_MAX_WORK_ITEM_SIZES}[_work_dim_ - 1].
* {CL_MISALIGNED_SUB_BUFFER_OFFSET} if a sub-buffer object is specified as
the value for an argument that is a buffer object and the _offset_
** if any of the values specified in _global_work_size_[0], ...
_global_work_size_[_work_dim_ - 1] exceed the maximum value representable by
{size_t_TYPE} on the device associated with _command_queue_
* {CL_INVALID_WORK_GROUP_SIZE}
** if _local_work_size_ is not `NULL`, if the work-group size must be uniform, and if the _global_work_size_ is not evenly divisible by the _local_work_size_
** if _local_work_size_ is not `NULL` and if the total number of work-items in the work-group is greater than the maximum work-group size supported for _kernel_ on the device associated with _command_queue_ (is greater than the value returned for {CL_KERNEL_WORK_GROUP_SIZE})
** if _local_work_size_ is not `NULL` and if the _local_work_size_ does not match the required work-group size for _kernel_
** if _local_work_size_ is not `NULL` and if the _local_work_size_ is not consistent with the required number of sub-groups for _kernel_
* {CL_INVALID_WORK_ITEM_SIZE}
** if the number of work-items specified in any dimension of _local_work_size_ is not valid for the device associated with _command_queue_ (is greater than the corresponding value returned for {CL_DEVICE_MAX_WORK_ITEM_SIZES})
* {CL_MISALIGNED_SUB_BUFFER_OFFSET}
** if a kernel argument for _kernel_ is a sub-buffer object and the offset
specified when the sub-buffer object is created is not aligned to
{CL_DEVICE_MEM_BASE_ADDR_ALIGN} value for device associated with _queue_.
{CL_DEVICE_MEM_BASE_ADDR_ALIGN} for the device associated with
_command_queue_.
This error code is <<unified-spec, missing before>> version 1.1.
* {CL_INVALID_IMAGE_SIZE} if an image object is specified as an argument
value and the image dimensions (image width, height, specified or
compute row and/or slice pitch) are not supported by device associated
with _queue_.
* {CL_IMAGE_FORMAT_NOT_SUPPORTED} if an image object is specified as an
argument value and the image format (image channel order and data type)
is not supported by device associated with _queue_.
* {CL_OUT_OF_RESOURCES} if there is a failure to queue the execution
instance of _kernel_ on the command-queue because of insufficient
resources needed to execute the kernel.
For example, the explicitly specified _local_work_size_ causes a failure
to execute the kernel because of insufficient resources such as
registers or local memory.
Another example would be the number of read-only image args used in
_kernel_ exceed the {CL_DEVICE_MAX_READ_IMAGE_ARGS} value for device or
the number of write-only and read-write image args used in _kernel_
exceed the {CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS} value for device or the
number of samplers used in _kernel_ exceed {CL_DEVICE_MAX_SAMPLERS} for
device.
* {CL_MEM_OBJECT_ALLOCATION_FAILURE} if there is a failure to allocate
memory for data store associated with image or buffer objects specified
as arguments to _kernel_.
* {CL_INVALID_EVENT_WAIT_LIST} if _event_wait_list_ is `NULL` and
_num_events_in_wait_list_ > 0, or _event_wait_list_ is not `NULL` and
_num_events_in_wait_list_ is 0, or if event objects in _event_wait_list_
are not valid events.
* {CL_INVALID_OPERATION} if SVM pointers are passed as arguments to a kernel
and the device does not support SVM, or if system pointers are passed as
arguments to a kernel and the device does not support fine-grain system SVM.
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
by the OpenCL implementation on the device.
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
required by the OpenCL implementation on the host.
* {CL_INVALID_IMAGE_SIZE}
** if a kernel argument for _kernel_ is an image and the dimensions of the
image, such as the image width or image height, are not supported by the
device associated with _command_queue_
* {CL_IMAGE_FORMAT_NOT_SUPPORTED}
** if a kernel argument for _kernel_ is an image and the format of the
image, such as the image channel order or image channel data type, is not
supported by the device associated with _command_queue_
* {CL_MEM_OBJECT_ALLOCATION_FAILURE}
** if there is a failure to allocate memory for the data store associated with any buffer or image object kernel arguments for _kernel_
* {CL_INVALID_EVENT_WAIT_LIST}
** if _event_wait_list_ is `NULL` and _num_events_in_wait_list_ is greater than zero
** if _event_wait_list_ is not `NULL` and _num_events_in_wait_list_ is zero
** if event objects in _event_wait_list_ are not valid events
* {CL_INVALID_OPERATION}
** if SVM pointers are set as arguments for _kernel_ and the device associated with _command_queue_ does not support SVM
** if system pointers are set as arguments for _kernel_ and the device associated with _command_queue_ does not support fine-grain system SVM
// TODO: Do we still need these explicit examples?
Contributor

In my opinion, this can be removed as it is already specified in the footnote 7: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#_footnotedef_7

Contributor Author

Thanks for the reminder!

I agree footnote 7 is a good example where these out-of-resources cases are described elsewhere in the spec. So, I'm inclined to keep the same boilerplate text here for CL_OUT_OF_RESOURCES for clEnqueueNDRangeKernel vs. including the explicit examples.

If we really want to keep text like the following, which I don't believe is documented anywhere else in the spec...

For example, the explicitly specified local_work_size causes a failure to execute the kernel because of insufficient resources such as registers or local memory.

... perhaps it should move to a footnote, also? Or, we can just remove it completely, as is done in this PR.

// * {CL_OUT_OF_RESOURCES} if there is a failure to queue the execution
// instance of _kernel_ on the command-queue because of insufficient
// resources needed to execute the kernel.
// For example, the explicitly specified _local_work_size_ causes a failure
// to execute the kernel because of insufficient resources such as
// registers or local memory.
// Another example would be the number of read-only image args used in
// _kernel_ exceed the {CL_DEVICE_MAX_READ_IMAGE_ARGS} value for device or
// the number of write-only and read-write image args used in _kernel_
// exceed the {CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS} value for device or the
// number of samplers used in _kernel_ exceed {CL_DEVICE_MAX_SAMPLERS} for
// device.
* {CL_OUT_OF_RESOURCES}
** if there is a failure to allocate resources required by the OpenCL
implementation on the device
* {CL_OUT_OF_HOST_MEMORY}
** if there is a failure to allocate resources required by the OpenCL
implementation on the host
--
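
The `local_work_size` conditions for {CL_INVALID_WORK_ITEM_SIZE} and {CL_INVALID_WORK_GROUP_SIZE} listed above can be sketched in plain C as follows. This is a minimal illustration rather than an implementation: the enum, the function name `sketch_check_local_range`, and the `uniform_required` flag are hypothetical, `max_work_item_sizes` and `kernel_work_group_size` stand in for the values returned for {CL_DEVICE_MAX_WORK_ITEM_SIZES} and {CL_KERNEL_WORK_GROUP_SIZE}, and the conditions on a required work-group size or required number of sub-groups are left out. Rejecting a zero local size is a choice of the sketch to avoid dividing by zero, not something the list above states.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the OpenCL error codes; not the real constants. */
enum sketch_local_status {
    SKETCH_LOCAL_OK = 0,
    SKETCH_BAD_WORK_GROUP_SIZE,
    SKETCH_BAD_WORK_ITEM_SIZE
};

/*
 * Validate local_work_size against per-dimension and per-kernel limits.
 * uniform_required is non-zero when the work-group size must be uniform,
 * in which case every global size must be a multiple of the local size.
 */
enum sketch_local_status sketch_check_local_range(
    unsigned int work_dim,
    const size_t *global_work_size,
    const size_t *local_work_size,     /* NULL lets the implementation choose */
    const size_t *max_work_item_sizes,
    size_t kernel_work_group_size,
    int uniform_required)
{
    size_t total = 1; /* running product of the local sizes seen so far */
    unsigned int i;

    if (local_work_size == NULL)
        return SKETCH_LOCAL_OK;

    for (i = 0; i < work_dim; ++i) {
        size_t local = local_work_size[i];

        /* Per-dimension limit of the device. */
        if (local > max_work_item_sizes[i])
            return SKETCH_BAD_WORK_ITEM_SIZE;

        /* Sketch-only guard so the modulo below is well defined. */
        if (local == 0)
            return SKETCH_BAD_WORK_GROUP_SIZE;

        /* Same as total * local > kernel_work_group_size, without overflow. */
        if (local > kernel_work_group_size / total)
            return SKETCH_BAD_WORK_GROUP_SIZE;
        total *= local;

        /* Uniform work-groups require an even division in every dimension. */
        if (uniform_required && global_work_size[i] % local != 0)
            return SKETCH_BAD_WORK_GROUP_SIZE;
    }
    return SKETCH_LOCAL_OK;
}
```

For example, with a kernel limit of 256 work-items, a local size of 16 x 16 passes while 32 x 16 is rejected because its 512 work-items exceed the limit.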

[open,refpage='clEnqueueTask',desc='Enqueues a command to execute a kernel, using a single work-item, on a device.',type='protos']