
VX_RPP Audio support - HIP backend#1626

Merged
kiritigowda merged 49 commits into ROCm:develop from fiona-gladwin:fg/audio_support_gpu
Mar 30, 2026

Conversation

@fiona-gladwin
Contributor

Motivation

This PR adds HIP backend support for audio processing operations in the AMD OpenVX RPP extension, enabling GPU acceleration for audio augmentation functions that were previously available only on the CPU.

Technical Details

  • Implements custom HIP kernels for the TensorMulScalar and TensorAddTensor tensor operations
  • Adds HIP backend support to seven audio processing operations: ToDecibels, Spectrogram, Resample, PreemphasisFilter, NonSilentRegionDetection, MelFilterBank, and Downmix (nine GPU-enabled operations in total, counting the two tensor kernels above)
  • Updates audio tensor descriptor logic to correctly handle 2D and 3D audio tensors
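
As a hedged illustration, the elementwise math the two new tensor kernels accelerate can be written as a CPU reference (a sketch only; the actual HIP kernels, their launch configuration, and their signatures live in the PR's sources):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference of the elementwise operation behind TensorMulScalar:
// dst[i] = src[i] * scalar
std::vector<float> tensorMulScalarRef(const std::vector<float>& src, float scalar) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[i] * scalar;
    return dst;
}

// CPU reference of the elementwise operation behind TensorAddTensor:
// dst[i] = src1[i] + src2[i] (equal sizes assumed)
std::vector<float> tensorAddTensorRef(const std::vector<float>& src1,
                                      const std::vector<float>& src2) {
    std::vector<float> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i)
        dst[i] = src1[i] + src2[i];
    return dst;
}
```

A HIP kernel performing the same operation would map one GPU thread to each element index instead of looping.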

Test Plan

Test Result

Submission Checklist

swetha097 and others added 26 commits August 7, 2024 12:30
Remove print statements
The unified API changes need to accommodate the HIP backend for the audio augmentations. This commit adds the necessary changes to the Resample node and to the audio augmentations in RPP to support the HIP backend.
Contributor

@rrawther rrawther left a comment


Please move the low-level kernel implementations to RPP instead.

Comment thread amd_openvx_extensions/amd_rpp/amd_rpp_hip/amd_rpp_hip_host_decls.h Outdated
@simonCatBot

simonCatBot commented Mar 20, 2026

Review: VX_RPP Audio Support - HIP Backend

Thank you for this substantial contribution enabling GPU acceleration for audio operations in the AMD OpenVX RPP extension.

Summary of Changes

This PR adds HIP backend support to 9 audio processing operations that were previously CPU-only:

  • TensorMulScalar / TensorAddTensor - Core tensor operations with custom HIP kernels
  • ToDecibels - Audio amplitude-to-decibel conversion
  • Spectrogram - Time-frequency representation
  • Resample - Audio sample rate conversion with windowed sinc interpolation
  • PreemphasisFilter - High-frequency emphasis for speech processing
  • NonSilentRegionDetection - Voice activity detection
  • MelFilterBank - Mel-frequency spectrogram computation
  • Downmix - Multi-channel to stereo/mono conversion

Key Technical Highlights

1. Audio Tensor Dimension Handling
The fix correctly distinguishes between 2D (BS×dim1) and 3D (BS×dim1×dim2) audio tensors. This is important since audio data varies between 1D waveforms and 2D spectrograms.

2. Memory Management Strategy
The memory-allocation pattern (one allocation path for the GPU and one for the CPU) is applied consistently and correctly in Downmix, MelFilterBank, NonSilentRegion, and PreemphasisFilter.

3. Resample Window LUT
The refactor adding pinned-memory support for the resample window lookup table is a thoughtful optimization for GPU resampling performance.

4. Backend Selection Pattern
The unified backend selection pattern, driven by a device-type enum, cleanly consolidates the CPU/GPU paths and reduces code duplication.
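
A minimal sketch of what such an enum-driven backend switch typically looks like (the affinity names mirror the AGO_TARGET_AFFINITY_* values in the snippets quoted later in this thread; the helper itself is hypothetical):

```cpp
#include <cassert>

#ifndef ENABLE_HIP
#define ENABLE_HIP 0  // assume a build without HIP for this sketch
#endif

// Device-type enum standing in for the AGO_TARGET_AFFINITY_* values.
enum DeviceAffinity { AFFINITY_CPU, AFFINITY_GPU };

// Hypothetical node-refresh helper: one branch selects which buffer handles
// to query, and the shared tail (ROI updates, descriptor fixups) is written
// once for both backends.
int refreshNode(DeviceAffinity deviceType) {
    if (deviceType == AFFINITY_GPU) {
#if ENABLE_HIP
        // query VX_TENSOR_BUFFER_HIP handles here
#else
        return -1;  // GPU requested but HIP support not compiled in
#endif
    } else {
        // query VX_TENSOR_BUFFER_HOST handles here
    }
    // shared post-query logic runs for both backends
    return 0;
}
```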

5. Version Updates

  • CMake: 3.2.0 → 3.3.2
  • Header: PATCH 1 → 2
  • RPP version requirement updated to support audio features

API Change Notice

The TensorMulScalar signature change (adding the ROI parameter) is a breaking change. Projects using this function will need updates; consider documenting this in the release notes.

Suggestions

  1. Testing: Given the complexity of audio resampling algorithms, comprehensive unit tests comparing CPU vs GPU outputs would strengthen confidence
  2. Documentation: Update API documentation to reflect the new ROI parameter for TensorMulScalar
  3. Error handling: Consider adding validation to catch malformed tensors earlier

Assessment

Needs Review - This is a significant feature addition with architectural changes. While the implementation looks solid, given the scope (9 operations, memory management changes, API modifications), this warrants thorough maintainer review and testing validation before merge.

The HIP kernel implementations appear well-structured and the memory management patterns are consistent with existing GPU operations in the codebase.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 14 comments.

Comments suppressed due to low confidence (5)

amd_openvx_extensions/amd_rpp/source/tensor/Spectrogram.cpp:78

  • refreshSpectrogram uses roi_tensor_ptr_src/dst after the device-type switch, but those pointers are only set inside #if ENABLE_HIP in the GPU branch. With ENABLE_HIP=0 and GPU affinity, the pointers are uninitialized and updateDstRoi() will dereference invalid memory. Return VX_ERROR_NOT_IMPLEMENTED (or handle OpenCL) when GPU is selected but HIP support is not compiled in.
    void *roi_tensor_ptr_src, *roi_tensor_ptr_dst;
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &data->pDst, sizeof(data->pDst)));
    STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_dst, sizeof(roi_tensor_ptr_dst)));
#endif
    } else if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &data->pDst, sizeof(data->pDst)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_dst, sizeof(roi_tensor_ptr_dst)));
    }
    updateDstRoi(data, reinterpret_cast<RpptROI *>(roi_tensor_ptr_src), reinterpret_cast<RpptROI *>(roi_tensor_ptr_dst));
    return status;
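
The suggested fix can be sketched in isolation as follows (the stub types and the numeric value of VX_ERROR_NOT_IMPLEMENTED here are stand-ins, not the real OpenVX definitions):

```cpp
#include <cassert>

#ifndef ENABLE_HIP
#define ENABLE_HIP 0  // model an OpenCL-only build
#endif

// Stand-ins for the OpenVX symbols used in the excerpt above.
enum { AGO_TARGET_AFFINITY_CPU, AGO_TARGET_AFFINITY_GPU };
const int VX_SUCCESS = 0;
const int VX_ERROR_NOT_IMPLEMENTED = -1;  // placeholder value for the sketch

// Mirrors the structure of refreshSpectrogram: with the #else guard, the
// function fails fast instead of letting later code (updateDstRoi) dereference
// an uninitialized roi_tensor_ptr_src.
int refreshSketch(int deviceType, void** roi_tensor_ptr_src) {
    if (deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        // VX_TENSOR_BUFFER_HIP queries would set *roi_tensor_ptr_src here
#else
        return VX_ERROR_NOT_IMPLEMENTED;  // the suggested guard
#endif
    } else {
        static int host_roi = 0;
        *roi_tensor_ptr_src = &host_roi;  // host-buffer query stand-in
    }
    // code that dereferences *roi_tensor_ptr_src now runs only when it is set
    return VX_SUCCESS;
}
```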

amd_openvx_extensions/amd_rpp/source/tensor/PreemphasisFilter.cpp:57

  • The GPU branch only populates roi_tensor_ptr_src under #if ENABLE_HIP, but roi_tensor_ptr_src is always dereferenced afterwards to compute pSampleSize. In OpenCL-only builds (ENABLE_HIP=0) with GPU affinity this leaves the pointer uninitialized and can crash. Return VX_ERROR_NOT_IMPLEMENTED (or handle OpenCL) when GPU is selected but HIP is not compiled in.
    void *roi_tensor_ptr_src;
    STATUS_ERROR_CHECK(vxCopyArrayRange((vx_array)parameters[3], 0, data->pSrcDesc->n, sizeof(float), data->pPreemphCoeff, VX_READ_ONLY, VX_MEMORY_TYPE_HOST));
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &data->pDst, sizeof(data->pDst)));
#endif
    }
    if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &data->pDst, sizeof(data->pDst)));
    }
    RpptROI *src_roi = reinterpret_cast<RpptROI *>(roi_tensor_ptr_src);
    for (int n = 0; n < data->inputTensorDims[0]; n++)
        data->pSampleSize[n] = src_roi[n].xywhROI.roiWidth * src_roi[n].xywhROI.roiHeight;

amd_openvx_extensions/amd_rpp/source/tensor/NonSilentRegionDetection.cpp:60

  • roi_tensor_ptr is only set inside #if ENABLE_HIP for the GPU path, but it is unconditionally dereferenced afterwards to fill pSrcLength. If ENABLE_HIP=0 (e.g., OpenCL build) and the node affinity is GPU, this will use an uninitialized pointer. Add an #else return (like VX_ERROR_NOT_IMPLEMENTED) inside the GPU branch when HIP isn’t enabled.
    void *roi_tensor_ptr;
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr, sizeof(roi_tensor_ptr)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &data->pDst1, sizeof(data->pDst1)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HIP, &data->pDst2, sizeof(data->pDst2)));
#endif
    } else if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr, sizeof(roi_tensor_ptr)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &data->pDst1, sizeof(data->pDst1)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[3], VX_TENSOR_BUFFER_HOST, &data->pDst2, sizeof(data->pDst2)));
    }
    unsigned *src_roi = static_cast<unsigned *>(roi_tensor_ptr);
    for (unsigned i = 0, j = 0; i < data->inputTensorDims[0]; i++, j += 4)
        data->pSrcLength[i] = src_roi[j + 2];

amd_openvx_extensions/amd_rpp/source/tensor/Downmix.cpp:56

  • In the GPU branch, roi_tensor_ptr_src is only set under #if ENABLE_HIP, but it is always dereferenced afterwards to populate pSrcRoi. With ENABLE_HIP=0 and GPU affinity (e.g., OpenCL build), this can crash due to an uninitialized pointer. Add an #else return VX_ERROR_NOT_IMPLEMENTED (or implement the OpenCL buffer path) inside the GPU branch.
    void *roi_tensor_ptr_src;
    if (data->deviceType == AGO_TARGET_AFFINITY_GPU) {
#if ENABLE_HIP
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HIP, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HIP, &data->pDst, sizeof(data->pDst)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HIP, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
#endif
    } else if (data->deviceType == AGO_TARGET_AFFINITY_CPU) {
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[0], VX_TENSOR_BUFFER_HOST, &data->pSrc, sizeof(data->pSrc)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[1], VX_TENSOR_BUFFER_HOST, &data->pDst, sizeof(data->pDst)));
        STATUS_ERROR_CHECK(vxQueryTensor((vx_tensor)parameters[2], VX_TENSOR_BUFFER_HOST, &roi_tensor_ptr_src, sizeof(roi_tensor_ptr_src)));
    }
    RpptROI *src_roi = reinterpret_cast<RpptROI *>(roi_tensor_ptr_src);
    for (int n = 0; n < data->inputTensorDims[0]; n++) {
        data->pSrcRoi[n * 2] = src_roi[n].xywhROI.roiWidth;
        data->pSrcRoi[n * 2 + 1] = src_roi[n].xywhROI.roiHeight;
    }

amd_openvx_extensions/amd_rpp/CMakeLists.txt:196

  • The audio-support CMake condition mixes OR and AND without parentheses. In CMake, AND binds tighter than OR, so for rpp_VERSION_MAJOR > 3 the check can enable audio support even when RPP_AUDIO_AUGMENTATIONS_SUPPORT_FOUND is false. Wrap the version comparison in parentheses so the support flag is required in all cases (i.e., (version_ok) AND RPP_AUDIO_AUGMENTATIONS_SUPPORT_FOUND).
# Audio features for VX_RPP
if((${rpp_VERSION_MAJOR} VERSION_GREATER "3") OR (${rpp_VERSION_MAJOR} VERSION_EQUAL "3" AND ${rpp_VERSION_MINOR} VERSION_GREATER_EQUAL "1" AND ${rpp_VERSION_PATCH} VERSION_GREATER_EQUAL "1")
AND RPP_AUDIO_AUGMENTATIONS_SUPPORT_FOUND)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DAUDIO_SUPPORT=1")
    message("-- ${White}vx_rpp Audio Features Included${ColourReset}")
    target_compile_definitions(vx_rpp PUBLIC RPP_AUDIO=1)
else()
    target_compile_definitions(vx_rpp PUBLIC RPP_AUDIO=0)
    message("-- ${Yellow}vx_rpp Audio Features Excluded${ColourReset}")


Comment thread amd_openvx_extensions/amd_rpp/source/tensor/ToDecibels.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/Resample.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/MelFilterBank.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorMulScalar.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorMulScalar.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorAddTensor.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorAddTensor.cpp Outdated
Comment thread amd_openvx_extensions/amd_rpp/source/kernel_rpp.cpp
Comment thread amd_openvx_extensions/amd_rpp/include/vx_ext_rpp.h
Comment thread amd_openvx_extensions/amd_rpp/include/vx_ext_rpp.h
Contributor

@rrawther rrawther left a comment


Please resolve all review comments

Comment thread amd_openvx_extensions/amd_rpp/include/vx_ext_rpp.h
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorAddTensor.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/kernel_rpp.cpp
Comment thread amd_openvx_extensions/amd_rpp/source/tensor/TensorMulScalar.cpp
@fiona-gladwin
Contributor Author

@rrawther @LakshmiKumar23 Addressed all review comments; please check.

@kiritigowda kiritigowda requested a review from rrawther March 24, 2026 21:57
@rrawther
Contributor

@kiritigowda: Can you please check this PR? It is ready to be merged from my side.

@kiritigowda kiritigowda merged commit e4ddf9c into ROCm:develop Mar 30, 2026
4 checks passed

8 participants