Skip to content

[TEST] Rocm jpeg decoder#9529

Open
NicolasHug wants to merge 38 commits into
mainfrom
testrocmjpeg
Open

[TEST] Rocm jpeg decoder#9529
NicolasHug wants to merge 38 commits into
mainfrom
testrocmjpeg

Conversation

@NicolasHug

@NicolasHug NicolasHug commented Jun 25, 2026

Copy link
Copy Markdown
Member

This PR shouldn't be merged - this is a clone of #9342 that I'm pusing to the origin repo so that the ROCm jobs can be triggered.

cc @jeffdaily @jithunnair-amd

xytpai and others added 30 commits January 8, 2026 03:18
The rocJPEG path duplicated the entire decode_jpegs_cuda() entry point
(input validation, decoder lifecycle, error handling) from the nvJPEG
path, and carried a large amount of rocJPEG-sample boilerplate that was
dead or misleading in this context.

This reworks the file so the backend-agnostic orchestration lives once
and is shared by both backends, and each backend only implements what is
genuinely backend-specific.

Review in this order:

1. decode_jpegs_cuda.h: both decoders expose a uniform
   decode_images(images, mode), and a GpuJpegDecoder type alias selects
   the compiled-in backend.

2. decode_jpegs_cuda.cpp shared region (NVJPEG_FOUND || ROCJPEG_FOUND):
   the single decode_jpegs_cuda() entry point plus a
   validate_and_make_contiguous() helper. The input validation, the
   device guard, the decoder singleton lifecycle, and the error wrapper
   are no longer duplicated between backends.

3. nvJPEG block: the mode-to-format mapping, the version-property
   warning, and the event-based stream synchronization move into
   CUDAJpegDecoder::decode_images(mode); the existing nvJPEG internals
   are otherwise untouched.

4. rocJPEG block: rewritten to drop the dead ROI/crop handling (the
   decode params were always zero-initialized, so it never ran), the
   vestigial single-pass batch loop, the misleading memory-reuse
   comment, the unused iostream/fstream/sstream includes, and the
   bespoke typeid-based catch. STD_TORCH_CHECK is used throughout to
   match the surrounding code.

The CUDA JPEG decode test tolerance bump is now gated on ROCm so it does
not weaken the nvJPEG assertion.

The nvJPEG path cannot be built on a ROCm host; its changes are
mechanical relocations of existing lines and rely on CUDA CI for
confirmation.

Test Plan:

Built against ROCm 7.2.1 and rocJPEG 1.4.0 on a gfx90a GPU:

```
USE_ROCJPEG=1 PYTORCH_ROCM_ARCH=gfx90a pip install -e . --no-build-isolation
```

Ran the GPU JPEG decode tests:

```
python -m pytest test/test_image.py \
  -k "test_decode_jpegs_cuda or test_decode_jpeg_cuda_errors or test_decode_jpeg_cuda_device_param"
```

All 8 selected tests pass (UNCHANGED/GRAY/RGB, scripted and eager, plus
the error and device-parameter cases). The unrelated test_encode_*_cuda
failures are pre-existing: GPU JPEG encode is nvJPEG-only and is not part
of this change.

Authored with assistance from Claude (Anthropic).
* Update decode_jpegs_cuda.cpp

* Update decode_jpegs_cuda.h

* split code
@pytorch-bot

pytorch-bot Bot commented Jun 25, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9529

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants