Problem Description
Since ROCm 7.x, the error behavior has been changed to match its CUDA counterpart. This is a follow-up issue for PR #859.
It would be great if someone on the Triton team with more knowledge of the repository than I have could go through the code that handles HIP calls and check where this change could also lead to a broken state. A potential candidate is driver.c in third_party/amd. I didn't make changes there because I'm not sure when it is used at all. It would probably be better to route all HIP calls through a single wrapper and handle discarding the error there.
Jira that led to this: https://ontrack-internal.amd.com/browse/SWDEV-546704
More info on error changes: https://ontrack-internal.amd.com/browse/SWDEV-438790
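
To make the wrapper suggestion concrete, here is a minimal sketch in C. It deliberately does not call the real HIP API: `mock_status_t`, `mock_alloc`, `mock_get_last_error`, `checked_call` and `CHECKED` are all hypothetical stand-ins made up for this example, so it compiles without ROCm. It also assumes that, after the change, a failed call leaves an error recorded until something reads it; that is my reading of the new behavior and should be checked against the linked tickets. In real code the stand-ins would map onto `hipError_t` and `hipGetLastError`.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the runtime; NOT the real HIP API. */
typedef enum { MOCK_SUCCESS = 0, MOCK_ERROR_INVALID_VALUE = 1 } mock_status_t;

static mock_status_t mock_last_error = MOCK_SUCCESS;

/* Assumed semantics: a failure is recorded and stays recorded until read. */
static mock_status_t mock_record(mock_status_t status) {
    if (status != MOCK_SUCCESS) {
        mock_last_error = status;
    }
    return status;
}

/* Returns the recorded error and resets it to success. */
static mock_status_t mock_get_last_error(void) {
    mock_status_t status = mock_last_error;
    mock_last_error = MOCK_SUCCESS;
    return status;
}

/* Mock runtime call: fails for negative sizes, succeeds otherwise. */
static mock_status_t mock_alloc(int size) {
    return mock_record(size < 0 ? MOCK_ERROR_INVALID_VALUE : MOCK_SUCCESS);
}

/* Single place where a failure is reported and the recorded error is
 * discarded, so it cannot surface later at an unrelated call site. */
static mock_status_t checked_call(mock_status_t status, const char *expr) {
    if (status != MOCK_SUCCESS) {
        fprintf(stderr, "runtime call failed: %s (status %d)\n",
                expr, (int)status);
        (void)mock_get_last_error(); /* discard the recorded error */
    }
    return status;
}

/* Wrap a call so its source text is available for the error message. */
#define CHECKED(call) checked_call((call), #call)
```

With this in place, every call site would be written as `CHECKED(mock_alloc(n))`, and the decision about when to discard the error lives in one function instead of being repeated wherever a runtime call is made.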
Operating System
n/a
CPU
n/a
GPU
n/a
ROCm Version
7.x
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response