
VX_RPP Audio support - HIP backend#1626

Merged
kiritigowda merged 49 commits into ROCm:develop from fiona-gladwin:fg/audio_support_gpu
Mar 30, 2026

Conversation

@fiona-gladwin
Contributor

Motivation

This PR adds HIP backend support for audio processing operations in the AMD OpenVX RPP extension, enabling GPU acceleration for audio augmentation functions that were previously available only on the CPU.

Technical Details

  • Implements custom HIP kernels for the TensorMulScalar and TensorAddTensor tensor operations
  • Adds HIP backend support to seven audio processing operations: ToDecibels, Spectrogram, Resample, PreemphasisFilter, NonSilentRegionDetection, MelFilterBank, and Downmix (nine GPU-enabled operations in total, counting the two tensor kernels above)
  • Updates audio tensor descriptor logic to correctly handle 2D and 3D audio tensors
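
As a hedged illustration, the elementwise math the two new tensor kernels accelerate can be written as a CPU reference (a sketch only; the actual HIP kernels, their launch configuration, and their signatures live in the PR's sources):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference of the elementwise operation behind TensorMulScalar:
// dst[i] = src[i] * scalar
std::vector<float> tensorMulScalarRef(const std::vector<float>& src, float scalar) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[i] * scalar;
    return dst;
}

// CPU reference of the elementwise operation behind TensorAddTensor:
// dst[i] = src1[i] + src2[i] (equal sizes assumed)
std::vector<float> tensorAddTensorRef(const std::vector<float>& src1,
                                      const std::vector<float>& src2) {
    std::vector<float> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i)
        dst[i] = src1[i] + src2[i];
    return dst;
}
```

A HIP kernel performing the same operation would map one GPU thread to each element index instead of looping.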

Test Plan

Test Result

Submission Checklist

swetha097 and others added 26 commits August 7, 2024 12:30
Remove print statements
The unified API changes need to accommodate the HIP backend for the audio augmentations. This commit adds the necessary changes to the Resample node and to the audio augmentations in RPP to support the HIP backend.
Contributor

@rrawther rrawther left a comment


Please move the low-level kernel implementations to RPP instead.

Comment thread amd_openvx_extensions/amd_rpp/amd_rpp_hip/amd_rpp_hip_host_decls.h Outdated
@simonCatBot

simonCatBot commented Mar 20, 2026

Review: VX_RPP Audio Support - HIP Backend

Thank you for this substantial contribution enabling GPU acceleration for audio operations in the AMD OpenVX RPP extension.

Summary of Changes

This PR adds HIP backend support to 9 audio processing operations that were previously CPU-only:

  • TensorMulScalar / TensorAddTensor - Core tensor operations with custom HIP kernels
  • ToDecibels - Audio amplitude-to-decibel conversion
  • Spectrogram - Time-frequency representation
  • Resample - Audio sample rate conversion with windowed sinc interpolation
  • PreemphasisFilter - High-frequency emphasis for speech processing
  • NonSilentRegionDetection - Voice activity detection
  • MelFilterBank - Mel-frequency spectrogram computation
  • Downmix - Multi-channel to stereo/mono conversion

Key Technical Highlights

1. Audio Tensor Dimension Handling
The fix correctly distinguishes between 2D (BS×dim1) and 3D (BS×dim1×dim2) audio tensors. This is important since audio data varies between 1D waveforms and 2D spectrograms.

2. Memory Management Strategy
The memory-allocation pattern (one allocation path for the GPU and one for the CPU) is applied consistently and correctly in Downmix, MelFilterBank, NonSilentRegion, and PreemphasisFilter.

3. Resample Window LUT
The refactor adding pinned-memory support for the resample window lookup table is a thoughtful optimization for GPU resampling performance.

4. Backend Selection Pattern
The unified backend selection pattern, driven by a device-type enum, cleanly consolidates the CPU/GPU paths and reduces code duplication.
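
A minimal sketch of what such an enum-driven backend switch typically looks like (the affinity names mirror the AGO_TARGET_AFFINITY_* values in the snippets quoted later in this thread; the helper itself is hypothetical):

```cpp
#include <cassert>

#ifndef ENABLE_HIP
#define ENABLE_HIP 0  // assume a build without HIP for this sketch
#endif

// Device-type enum standing in for the AGO_TARGET_AFFINITY_* values.
enum DeviceAffinity { AFFINITY_CPU, AFFINITY_GPU };

// Hypothetical node-refresh helper: one branch selects which buffer handles
// to query, and the shared tail (ROI updates, descriptor fixups) is written
// once for both backends.
int refreshNode(DeviceAffinity deviceType) {
    if (deviceType == AFFINITY_GPU) {
#if ENABLE_HIP
        // query VX_TENSOR_BUFFER_HIP handles here
#else
        return -1;  // GPU requested but HIP support not compiled in
#endif
    } else {
        // query VX_TENSOR_BUFFER_HOST handles here
    }
    // shared post-query logic runs for both backends
    return 0;
}
```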

5. Version Updates

  • CMake: 3.2.0 → 3.3.2
  • Header: PATCH 1 → 2
  • RPP version requirement updated to support audio features

API Change Notice

The TensorMulScalar signature change (adding the ROI parameter) is a breaking change. Projects using this function will need updates; consider documenting this in the release notes.

Suggestions

  1. Testing: Given the complexity of audio resampling algorithms, comprehensive unit tests comparing CPU vs GPU outputs would strengthen confidence
  2. Documentation: Update API documentation to reflect the new ROI parameter for TensorMulScalar
  3. Error handling: Consider adding validation to catch malformed tensors earlier

Assessment

Needs Review - This is a significant feature addition with architectural changes. While the implementation looks solid, given the scope (9 operations, memory management changes, API modifications), this warrants thorough maintainer review and testing validation before merge.

The HIP kernel implementations appear well-structured and the memory management patterns are consistent with existing GPU operations in the codebase.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 14 comments.

Comments suppressed due to low confidence (5)

amd_openvx_extensions/amd_rpp/source/tensor/Spectrogram.cpp:78

  • refreshSpectrogram uses roi_tensor_ptr_src/dst after the device-type switch, but those pointers are only set inside #if ENABLE_HIP in the GPU branch. With ENABLE_HIP=0 and GPU affinity, the pointers are uninitialized and updateDstRoi() will dereference invalid memory. Return VX_ERROR_NOT_IMPLEMENTED (or handle OpenCL) when GPU is selected but HIP support is not compiled in.
    void *roi_tensor_ptr_src, *roi_tensor_ptr_dst;
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &data->pDst, sizeof(data->pDst)));
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_dst, sizeof(roi_tensor_ptr_dst)));
#endif
    } else if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &data->pDst, sizeof(data->pDst)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_dst, sizeof(roi_tensor_ptr_dst)));
    }
    updateDstRoi(data, reinterpret_cast<RpptROI *>(roi_tensor_ptr_src), reinterpret_cast<RpptROI *>(roi_tensor_ptr_dst));
    return status;
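
The suggested fix can be sketched in isolation as follows (the stub types and the numeric value of VX_ERROR_NOT_IMPLEMENTED here are stand-ins, not the real OpenVX definitions):

```cpp
#include <cassert>

#ifndef ENABLE_HIP
#define ENABLE_HIP 0  // model an OpenCL-only build
#endif

// Stand-ins for the OpenVX symbols used in the excerpt above.
enum { AGO_TARGET_AFFINITY_CPU, AGO_TARGET_AFFINITY_GPU };
const int VX_SUCCESS = 0;
const int VX_ERROR_NOT_IMPLEMENTED = -1;  // placeholder value for the sketch

// Mirrors the structure of refreshSpectrogram: with the #else guard, the
// function fails fast instead of letting later code (updateDstRoi) dereference
// an uninitialized roi_tensor_ptr_src.
int refreshSketch(int deviceType, void** roi_tensor_ptr_src) {
    if (deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        // VX_TENSOR_BUFFER_HIP queries would set *roi_tensor_ptr_src here
#else
        return VX_ERROR_NOT_IMPLEMENTED;  // the suggested guard
#endif
    } else {
        static int host_roi = 0;
        *roi_tensor_ptr_src = &host_roi;  // host-buffer query stand-in
    }
    // code that dereferences *roi_tensor_ptr_src now runs only when it is set
    return VX_SUCCESS;
}
```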

amd_openvx_extensions/amd_rpp/source/tensor/PreemphasisFilter.cpp:57

  • The GPU branch only populates roi_tensor_ptr_src under #if ENABLE_HIP, but roi_tensor_ptr_src is always dereferenced afterwards to compute pSampleSize. In OpenCL-only builds (ENABLE_HIP=0) with GPU affinity this leaves the pointer uninitialized and can crash. Return VX_ERROR_NOT_IMPLEMENTED (or handle OpenCL) when GPU is selected but HIP is not compiled in.
    void *roi_tensor_ptr_src;
    STATUS_ERROR_CHECK(vxCopyArrayRange((vx_array)parameters[3], 0, data->pSrcDesc->n, sizeof(float), data->pPreemphCoeff, VX_READ_ONLY, VX_MEMORY_TYPE_HOST));
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &data->pDst, sizeof(data->pDst)));
#endif
    }
    if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &data->pDst, sizeof(data->pDst)));
    }
    RpptROI *src_roi = reinterpret_cast<RpptROI *>(roi_tensor_ptr_src);
    for (int n = 0; n < data->inputTensorDims[0]; n++)
        data->pSampleSize[n] = src_roi[n].xywhROI.roiWidth * src_roi[n].xywhROI.roiHeight;

amd_openvx_extensions/amd_rpp/source/tensor/NonSilentRegionDetection.cpp:60

  • roi_tensor_ptr is only set inside #if ENABLE_HIP for the GPU path, but it is unconditionally dereferenced afterwards to fill pSrcLength. If ENABLE_HIP=0 (e.g., OpenCL build) and the node affinity is GPU, this will use an uninitialized pointer. Add an #else return (like VX_ERROR_NOT_IMPLEMENTED) inside the GPU branch when HIP isn’t enabled.
    void *roi_tensor_ptr;
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr, sizeof(roi_tensor_ptr)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &data->pDst1, sizeof(data->pDst1)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HIP, &data->pDst2, sizeof(data->pDst2)));
#endif
    } else if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr, sizeof(roi_tensor_ptr)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &data->pDst1, sizeof(data->pDst1)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HOST, &data->pDst2, sizeof(data->pDst2)));
    }
    unsigned *src_roi = static_cast<unsigned *>(roi_tensor_ptr);
    for (unsigned i = 0, j = 0; i < data->inputTensorDims[0]; i++, j += 4)
        data->pSrcLength[i] = src_roi[j + 2];

amd_openvx_extensions/amd_rpp/source/tensor/Downmix.cpp:56

  • In the GPU branch, roi_tensor_ptr_src is only set under #if ENABLE_HIP, but it is always dereferenced afterwards to populate pSrcRoi. With ENABLE_HIP=0 and GPU affinity (e.g., OpenCL build), this can crash due to an uninitialized pointer. Add an #else return VX_ERROR_NOT_IMPLEMENTED (or implement the OpenCL buffer path) inside the GPU branch.
    void *roi_tensor_ptr_src;
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &data->pDst, sizeof(data->pDst)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
#endif
    } else if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &data->pDst, sizeof(data->pDst)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
    }
    RpptROI *src_roi = reinterpret_cast<RpptROI *>(roi_tensor_ptr_src);
    for (int n = 0; n < data->inputTensorDims[0]; n++) {
        data->pSrcRoi[n * 2] = src_roi[n].xywhROI.roiWidth;
        data->pSrcRoi[n * 2 + 1] = src_roi[n].xywhROI.roiHeight;
    }

amd_openvx_extensions/amd_rpp/CMakeLists.txt:196

  • The audio-support CMake condition mixes OR and AND without parentheses. In CMake, AND binds tighter than OR, so for rpp_VERSION_MAJOR > 3 the check can enable audio support even when RPP_AUDIO_AUGMENTATIONS_SUPPORT_FOUND is false. Wrap the version comparison in parentheses so the support flag is required in all cases (i.e., (version_ok) AND RPP_AUDIO_AUGMENTATIONS_SUPPORT_FOUND).
# Audio features for VX_RPP
if((${rpp_VERSION_MAJOR} VERSION_GREATER "3") OR (${rpp_VERSION_MAJOR} VERSION_EQUAL "3" AND ${rpp_VERSION_MINOR} VERSION_GREATER_EQUAL "1" AND ${rpp_VERSION_PATCH} VERSION_GREATER_EQUAL "1")
AND RPP_AUDIO_AUGMENTATIONS_SUPPORT_FOUND)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DAUDIO_SUPPORT=1")
    message("-- ${White}vx_rpp Audio Features Included${ColourReset}")
    target_compile_definitions(vx_rpp PUBLIC RPP_AUDIO=1)
else()
    target_compile_definitions(vx_rpp PUBLIC RPP_AUDIO=0)
    message("-- ${Yellow}vx_rpp Audio Features Excluded${ColourReset}")


Comment thread amd_openvx_extensions/amd_rpp/source/tensor/ToDecibels.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/Resample.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/MelFilterBank.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorMulScalar.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorMulScalar.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorAddTensor.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorAddTensor.cpp Outdated
Comment thread amd_openvx_extensions/amd_rpp/source/kernel_rpp.cpp
Comment thread amd_openvx_extensions/amd_rpp/include/vx_ext_rpp.h
Comment thread amd_openvx_extensions/amd_rpp/include/vx_ext_rpp.h
Contributor

@rrawther rrawther left a comment


Please resolve all review comments

Comment thread amd_openvx_extensions/amd_rpp/include/vx_ext_rpp.h
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorAddTensor.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/kernel_rpp.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorMulScalar.cpp
@fiona-gladwin
Contributor Author

@rrawther @LakshmiKumar23 Addressed all review comments; please check.

@kiritigowda kiritigowda requested a review from rrawther March 24, 2026 21:57
@rrawther
Contributor

@kiritigowda: Can you please check this PR? It is ready to be merged from my side.

@kiritigowda kiritigowda merged commit e4ddf9c into ROCm:develop Mar 30, 2026
4 checks passed

8 participants