
Add CUDA 13 runtime support and CI lane#134

Open
SamMausberg wants to merge 1 commit into accel-sim:dev from SamMausberg:cuda13_2_support

Conversation

@SamMausberg

No description provided.

}
CUctx_st *context = GPGPUSim_Context(ctx);
function_info *entry = context->get_kernel(hostFun);
#if CUDART_VERSION < 10000

This is probably here to ensure that pre-CUDA-10 versions still work.

Author

Before my change, the new CUDA launch path relied on __cudaPopCallConfiguration() being a no-op. The launch config stayed on g_cuda_launch_stack, so cudaLaunchKernelInternal() did not need to call cudaConfigureCallInternal() for CUDART_VERSION >= 10000; you can see this in libcuda/cuda_runtime_api.cc at line 2042 before my changes.

I changed __cudaPopCallConfiguration() to actually pop the saved launch config and return gridDim, blockDim, sharedMem, and stream, which is what newer CUDA expects. The modern pattern is __cudaPushCallConfiguration(...) -> __cudaPopCallConfiguration(...) -> cudaLaunchKernel(...).

To further clarify: since __cudaPopCallConfiguration() now consumes the saved config, cudaLaunchKernelInternal() must rebuild the launch state, which is why the call is unconditional. So we did not remove support for pre-10 CUDA; we just made the path consistent. I also reran the 11.7 lane after this change, and it passed short-tests-cmake.sh 10/10.
