[Perf] Streams 1: Add CUDA stream and event API #407
Base branch: main
New file (124 added lines):

```python
import weakref

from quadrants.lang import impl


def _get_prog_weakref():
    return weakref.ref(impl.get_runtime().prog)


class Stream:
    """Wraps a backend-specific GPU stream for concurrent kernel execution.

    On backends without native streams (e.g. CPU), this is a no-op object.
    Call destroy() explicitly or use as a context manager to ensure cleanup.
    """

    def __init__(self, handle: int, prog_ref: weakref.ref | None = None):
        self._handle = handle
        self._prog_ref = prog_ref

    @property
    def handle(self) -> int:
        return self._handle

    def synchronize(self):
        """Block until all operations on this stream complete."""
        prog = impl.get_runtime().prog
        prog.stream_synchronize(self._handle)

    def destroy(self):
        """Explicitly destroy the stream. Safe to call multiple times."""
        if self._handle != 0:
            prog = impl.get_runtime().prog
            prog.stream_destroy(self._handle)
            self._handle = 0

    def __del__(self):
        if self._handle != 0 and self._prog_ref is not None:
            prog = self._prog_ref()
            if prog is not None:
                try:
                    prog.stream_destroy(self._handle)
                    self._handle = 0
                except Exception:
                    pass

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.destroy()
```
> **Contributor:** Could you clarify in the documentation what an "event" is? I have no idea what it is.

```python
class Event:
    """Wraps a backend-specific GPU event for stream synchronization.

    On backends without native events (e.g. CPU), this is a no-op object.
    Call destroy() explicitly or use as a context manager to ensure cleanup.
    """
```
```python
    def __init__(self, handle: int, prog_ref: weakref.ref | None = None):
        self._handle = handle
        self._prog_ref = prog_ref

    @property
    def handle(self) -> int:
        return self._handle

    def record(self, qd_stream: Stream | None = None):
        """Record this event on a stream. None means the default stream."""
        prog = impl.get_runtime().prog
        stream_handle = qd_stream.handle if qd_stream is not None else 0
        prog.event_record(self._handle, stream_handle)

    def wait(self, qd_stream: Stream | None = None):
        """Make a stream wait for this event. None means the default stream."""
        prog = impl.get_runtime().prog
        stream_handle = qd_stream.handle if qd_stream is not None else 0
        prog.stream_wait_event(stream_handle, self._handle)

    def synchronize(self):
        """Block the host until this event has been reached."""
        prog = impl.get_runtime().prog
        prog.event_synchronize(self._handle)

    def destroy(self):
        """Explicitly destroy the event. Safe to call multiple times."""
        if self._handle != 0:
            prog = impl.get_runtime().prog
            prog.event_destroy(self._handle)
            self._handle = 0

    def __del__(self):
        if self._handle != 0 and self._prog_ref is not None:
            prog = self._prog_ref()
            if prog is not None:
                try:
                    prog.event_destroy(self._handle)
                    self._handle = 0
                except Exception:
                    pass
```

> **Contributor** (on lines +88 to +94): Personally I prefer …

```python
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.destroy()


def create_stream() -> Stream:
    """Create a new GPU stream for concurrent kernel execution."""
    prog = impl.get_runtime().prog
    handle = prog.stream_create()
    return Stream(handle, _get_prog_weakref())


def create_event() -> Event:
    """Create a new GPU event for stream synchronization."""
    prog = impl.get_runtime().prog
    handle = prog.event_create()
    return Event(handle, _get_prog_weakref())


__all__ = ["Stream", "Event", "create_stream", "create_event"]
```
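To illustrate the handle lifecycle the classes above implement (zeroed handle after `destroy()`, idempotent teardown, context-manager cleanup), here is a minimal self-contained sketch. `MockProgram` is a hypothetical stand-in for the runtime's `prog` object; it is not part of this PR.

```python
import weakref


class MockProgram:
    """Hypothetical stand-in for impl.get_runtime().prog, for illustration only."""

    def __init__(self):
        self._next = 1
        self.live = set()  # handles created but not yet destroyed

    def stream_create(self):
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle

    def stream_destroy(self, handle):
        self.live.discard(handle)


class Stream:
    """Same shape as the PR's Stream, bound to MockProgram via a weakref."""

    def __init__(self, prog):
        self._prog_ref = weakref.ref(prog)
        self._handle = prog.stream_create()

    def destroy(self):
        # Safe to call multiple times: the handle is zeroed after the first call.
        if self._handle != 0:
            prog = self._prog_ref()
            if prog is not None:
                prog.stream_destroy(self._handle)
            self._handle = 0

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.destroy()


prog = MockProgram()
with Stream(prog) as s:
    assert prog.live == {1}  # stream alive inside the with-block
assert prog.live == set()    # destroyed on exit
```

The weak reference mirrors the PR's `_prog_ref` design: `__del__` must not resurrect or outlive the program object during interpreter shutdown, so the finalizer only destroys the handle if the program is still alive.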
The new `Program` methods are exposed to Python in `export_lang`:

```diff
@@ -495,7 +495,16 @@ void export_lang(py::module &m) {
      .def("compile_kernel", &Program::compile_kernel,
           py::return_value_policy::reference)
      .def("launch_kernel", &Program::launch_kernel)
-     .def("get_device_caps", &Program::get_device_caps);
+     .def("get_device_caps", &Program::get_device_caps)
+     .def("stream_create", &Program::stream_create)
+     .def("stream_destroy", &Program::stream_destroy)
+     .def("stream_synchronize", &Program::stream_synchronize)
+     .def("set_current_cuda_stream", &Program::set_current_cuda_stream)
+     .def("event_create", &Program::event_create)
+     .def("event_destroy", &Program::event_destroy)
+     .def("event_record", &Program::event_record)
+     .def("event_synchronize", &Program::event_synchronize)
+     .def("stream_wait_event", &Program::stream_wait_event);

      py::class_<CompileResult>(m, "CompileResult")
          .def_property_readonly(
```

> **Contributor** (on lines +499 to +507): What is CUDA-specific here and what is not? Is only `set_current_cuda_stream` CUDA-specific? If so, are streams still usable on other backends, or is this function necessary to make them useful?
The CUDA driver entry points are registered alongside the existing ones:

```cpp
// @@ -20,6 +20,7 @@
PER_CUDA_FUNCTION(context_set_limit, cuCtxSetLimit, int, std::size_t);

// Stream management
PER_CUDA_FUNCTION(stream_create, cuStreamCreate, void **, uint32);
PER_CUDA_FUNCTION(stream_destroy, cuStreamDestroy_v2, void *);
```

> **Contributor:** What is `cuStreamDestroy_v2`? Very weird name. Why do we have functions with a `_v2` suffix in multiple places?

```cpp
// Memory management
PER_CUDA_FUNCTION(memcpy_host_to_device, cuMemcpyHtoD_v2, void *, void *, std::size_t);

// @@ -52,6 +53,7 @@
PER_CUDA_FUNCTION(kernel_set_attribute, cuFuncSetAttribute, void *, CUfunction_attribute, int);

// Stream management
PER_CUDA_FUNCTION(stream_synchronize, cuStreamSynchronize, void *);
PER_CUDA_FUNCTION(stream_wait_event, cuStreamWaitEvent, void *, void *, uint32);

// Event management
PER_CUDA_FUNCTION(event_create, cuEventCreate, void **, uint32)
```
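Background on the `_v2` question above: the CUDA driver keeps old entry points ABI-stable, so when a function's signature was revised (notably around CUDA 3.2/4.0, when device pointers widened to 64 bits), a new symbol with a `_v2` suffix was exported and `cuda.h` remaps the unsuffixed name to it via macros. Code that resolves symbols by string name, as `PER_CUDA_FUNCTION` appears to do, cannot rely on those header macros and must name the versioned symbol explicitly. A minimal sketch of the resulting lookup pattern, against a mocked export table rather than a real `libcuda`:

```python
# Mocked export table of a driver library: functions whose signatures were
# revised export a versioned symbol alongside the legacy one.
MOCK_DRIVER_EXPORTS = {
    "cuStreamCreate": "<fn cuStreamCreate>",
    "cuStreamDestroy": "<fn cuStreamDestroy (legacy ABI)>",
    "cuStreamDestroy_v2": "<fn cuStreamDestroy_v2>",
    "cuMemcpyHtoD_v2": "<fn cuMemcpyHtoD_v2>",
}


def resolve(name: str, exports=MOCK_DRIVER_EXPORTS) -> str:
    """Return the newest available entry point for `name`."""
    for candidate in (name + "_v3", name + "_v2", name):
        if candidate in exports:
            return candidate
    raise KeyError(name)


assert resolve("cuStreamCreate") == "cuStreamCreate"       # never revised
assert resolve("cuStreamDestroy") == "cuStreamDestroy_v2"  # revised ABI
assert resolve("cuMemcpyHtoD") == "cuMemcpyHtoD_v2"
```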
> **Reviewer:** I would rather pretend it can only be used as a context manager, aligning with the `torch.profiler` API. Managing streams manually without a context sounds like bad practice, and the context-manager path should be made easy.