Skip to content

Handle CUDA 13 CI compatibility in gpu-app-collection#85

Open
SamThe602 wants to merge 4 commits intoaccel-sim:devfrom
SamThe602:cuda13_1_ci_backcompat
Open

Handle CUDA 13 CI compatibility in gpu-app-collection#85
SamThe602 wants to merge 4 commits intoaccel-sim:devfrom
SamThe602:cuda13_1_ci_backcompat

Conversation

@SamThe602
Copy link

No description provided.

config.MEM_CLK_FREQUENCY = deviceProp.memoryClockRate * 1e-3f;
config.MEM_BITWIDTH = deviceProp.memoryBusWidth;
config.CLK_FREQUENCY = clockRateKHz * 1e-3f;
if (memoryClockRateKHz > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is with all the zeros?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify my intent: the > 0 means if CUDA returns positive, we overwrite the default. If it returns 0, it's because the query is unsupported; it leaves it alone. I made this choice because before we used deviceProp.memoryClockRate and deviceProp.memoryBusWidth, this needed to be switched to cudaDeviceGetAttribute()

I don't think its a correctness issue but yea its not the most readable I can change it if you'd like.


ARCH?=sm_80 sm_90a sm_100a sm_101 sm_120
ifneq ($(filter 13 14 15 16,$(CUDA_VERSION_MAJOR)),)
ARCH?=sm_80 sm_90a sm_100a sm_110 sm_120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both the if and the else set the same thing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point whoops 🤣

hid);

cudaThreadSynchronize();
cudaDeviceSynchronize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this exist in the old API?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's in CUDA 11.7.1 headers.

ifneq ($(filter 13 14 15 16,$(CUDA_VERSION_MAJOR)),)
ci: rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini
else
ci: rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini cuda_samples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samples should build it; it is an NVIDIA collection and should work at different CUDA versions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails with nvcc fatal: Unsupported gpu architecture 'compute_50' when in CUDA 13.1.1 environment.

How would you like me to handle this?

cuda_samples:
mkdir -p $(BINDIR)/$(BINSUBDIR)/
mkdir -p ./cuda/cuda-samples/build && cd ./cuda/cuda-samples/build && cmake .. && $(MAKE)
if [ ${CUDA_VERSION_MAJOR} -ge 13 ]; then \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have the stuff in setup_environment, do we still need this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that setup_environment is ok for normal Makefile builds.

cuda_samples is a separate case, I believe cause it hardcodes 'CMAKE_CUDA_ARCHITECTURES' not sure.

^ could be completely wrong about this.

unsigned L2_BANKS = 0; // L2 Cache Banks (LTCs)
};
inline GpuConfig config;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's 0, the uBench will be wrong. Why is it failing?

If the syntax has changed, just update it to use the new syntax. There must still be a way to query device attributes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I don't think I changed the FBP_COUNT = 0 and L2_BANKS = 0

I think in the simulator path, they are overwritten from gpgpusim.config. In the hardware path filled with queryGrInfo().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These apps also run on real GPUs, not just in gpgpu-sim. The ubench uses device query to determine kernel parameters. So these value must be correct (matching the HW) for these kernels to perform as expected. This is not the config in the simulation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems like a pre-existing issue in commit 4becbe3.

The only changes I made to stuff like this in the file were to the clock rate, memory clock rate, and memory bus width.

Copy link
Author

@SamThe602 SamThe602 Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I misunderstood your point, yes, that's true. When I go over this, I can make that fix even though it was pre-existing.

src/Makefile Outdated
ci: rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini
else
ci: rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini cuda_samples
endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need cuda samples. If you need another commit, checkout in the makefile, or just update the submodule.

Updating the submodule might not be backward compatible. So checking out a newer commit might be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is probably the reason device query is failing?

Copy link
Author

@SamThe602 SamThe602 Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two deviceQuery binaries in the repo. One in cuda-samples and GPU_Microbenchmark.

GPU_Microbenchmark builds under CUDA 13.1.1. The CUDA-samples one is failing under CUDA 13.1.1. For the reason I replied above, because the CMake files are hardcoding compute_50.

I asked Tim, but I would also like to hear your thoughts on how you guys want me to handle this issue.

*** Never mind, I didn't notice, you answered in your original comment yes, I'll update the submodule to avoid this. Wasn't sure if you guys wanted me to touch this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants