Skip to content

ROCm: indexer_scores_wmma_kernel launched with empty body when rocWMMA is unavailable → uninitialized indexer scores on gfx1151 #348

@jamesburton

Description

@jamesburton

Summary

On ROCm/HIP builds, the host unconditionally launches indexer_scores_wmma_kernel, but that kernel's body is gated by #if __CUDA_ARCH__ >= 700. hipcc never defines __CUDA_ARCH__, so on HIP the kernel compiles to an empty body and the output scores buffer is left uninitialized on the head_dim == 128, n_head == 64, !quality_mode indexer path. This silently produces incorrect results on any ROCm GPU that takes this path (observed targeting gfx1151 / Radeon 8060S, Strix Halo).

Where

  • Kernel definition: ds4_cuda.cu (~line 4483), guarded #if __CUDA_ARCH__ >= 700.
  • Host launch: ds4_cuda.cu (~line 5002) — launches the WMMA kernel instead of falling through to the scalar indexer_scores_kernel, which fully initializes scores.

Impact

Wrong indexer output (silently, no error) on ROCm for the affected attention shape. Because it's an uninitialized-buffer read, symptoms can be nondeterministic.

Relationship to #311

PR #311 (9565c07) masks this by re-enabling the WMMA body on HIP — it changes the guard to #if __CUDA_ARCH__ >= 700 || defined(__HIP_DEVICE_COMPILE__) and supplies the matrix API via #include <rocwmma/rocwmma.hpp> + namespace wmma = rocwmma; in ds4_rocm.h. That works only where full rocWMMA headers are installed.

Environments that ship just the rocWMMA version header — notably the AMD HIP SDK for Windows (no rocwmma/ directory under include/) — cannot compile that path (wmma::fragment undefined). There, the correct behavior is to use the scalar fallback.

Suggested fix

Either:

Option (a) is what we're using to build/run natively on the Windows HIP SDK; happy to send a small PR for it if useful.

Environment

  • GPU: AMD Radeon 8060S (gfx1151, Strix Halo).
  • ROCm 7.1 (AMD HIP SDK for Windows) and ROCm 7.13 (Linux pip SDK).
  • Found while adding a native Windows (hipcc) ROCm build of the bench.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions