[BUG] amdgpu gfx ring timeout freeze with Steam DX12 on RX 9060 XT multi-monitor #103

@Dar1usTheGr3at

Description

Summary

I've found a reproducible GPU bug on my AMD build. I have a dual-monitor setup with two BenQ 1080p displays: the primary on DisplayPort, the secondary on HDMI. While playing the Resident Evil 2 and 3 remakes (graphically demanding games), I like to have YouTube or Discord open on the other monitor, and this triggers an in-game freeze. Resource monitoring shows that neither VRAM nor GPU utilization is at 100% when the freeze happens.

There is also a suspend/resume bug: after resuming, the primary monitor stays black and the secondary monitor turns green. I've had more trouble identifying the root cause of this one.

I'm not sure whether this is a driver issue or something else. If someone can point me to the right project, I'll file this report there instead. Please bear in mind that I'm fairly new to Linux, and many different components seem to be involved in this bug.

Hardware: Ryzen 9 5900X + RX 9060 XT 16 GB, MSI MEG X570 ACE, 4 × 16 GB DDR4-3200

Kernel: 6.18.16-200.fc43.x86_64
Mesa: OpenGL string: 4.6 (Compatibility Profile) Mesa 26.0.1
Desktop: X11 Plasma

Report:
dmesg | grep -iE 'amdgpu|drm|gpu|ring|timeout|mes|sdma|fault|reset|hang|freeze'
journalctl -b -p err | grep -iE 'amdgpu|Xorg|proton|vkd3d|dxvk|brave'

dmesg: read kernel buffer failed: Operation not permitted
Mar 08 22:48:33 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=11625418, emitted seq=11625420
Mar 08 22:48:33 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: Process re2.exe pid 32794 thread vkd3d_queue pid 32840
Mar 08 22:48:33 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Mar 08 22:48:36 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: MES(1) failed to respond to msg=RESET
Mar 08 22:48:36 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: failed to reset legacy queue
Mar 08 22:48:36 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: Ring gfx_0.0.0 reset failed
Mar 08 22:48:39 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: MES(1) failed to respond to msg=REMOVE_QUEUE
Mar 08 22:48:39 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: failed to unmap legacy queue
Mar 08 22:48:41 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: MES(1) failed to respond to msg=REMOVE_QUEUE
Mar 08 22:48:41 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: failed to unmap legacy queue
Mar 08 22:48:44 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: MES(1) failed to respond to msg=REMOVE_QUEUE
Mar 08 22:48:44 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: failed to unmap legacy queue
Mar 08 22:48:47 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: MES(1) failed to respond to msg=REMOVE_QUEUE
Mar 08 22:48:47 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: failed to unmap legacy queue
Mar 08 22:48:50 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: MES(1) failed to respond to msg=REMOVE_QUEUE
Mar 08 22:48:50 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: failed to unmap legacy queue
Mar 08 22:48:52 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: MES(1) failed to respond to msg=REMOVE_QUEUE
Mar 08 22:48:52 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: failed to unmap legacy queue
Mar 08 22:48:53 ultramarine kernel: [drm:gfx_v12_0_hw_fini [amdgpu]] ERROR failed to halt cp gfx
Mar 08 22:48:55 ultramarine systemd-coredump[37573]: Process 2180 (Xorg) of user 1000 dumped core.
Module /usr/libexec/Xorg from rpm xorg-x11-server-21.1.21-1.fc43.x86_64
Module libinput_drv.so from rpm xorg-x11-drv-libinput-1.5.0-3.fc43.x86_64
Module libdrm_amdgpu.so.1 from rpm libdrm-2.4.131-1.fc43.x86_64
Module libglamoregl.so from rpm xorg-x11-server-21.1.21-1.fc43.x86_64
Module modesetting_drv.so from rpm xorg-x11-server-21.1.21-1.fc43.x86_64
Module libglx.so from rpm xorg-x11-server-21.1.21-1.fc43.x86_64
#1 0x00007efcb6b49880 _ZL30amdgpu_ctx_set_sw_reset_statusP17radeon_winsys_ctx17pipe_reset_statusPKcz (libgallium-26.0.0.so + 0xb49880)
#2 0x00007efcb6b4db69 _Z19amdgpu_cs_submit_ibIL10queue_type0EEvPvS1_i (libgallium-26.0.0.so + 0xb4db69)
#4 0x000000000053415e ospoll_wait (/usr/libexec/Xorg + 0x13415e)
#5 0x0000000000535709 InputThreadDoWork.lto_priv.0 (/usr/libexec/Xorg + 0x135709)
#8 0x00000000004536b8 BlockHandler (/usr/libexec/Xorg + 0x536b8)
#9 0x000000000053136b WaitForSomething (/usr/libexec/Xorg + 0x13136b)
#10 0x000000000041213f main (/usr/libexec/Xorg + 0x1213f)
#13 0x00000000004130c5 _start (/usr/libexec/Xorg + 0x130c5)
Mar 08 22:48:55 ultramarine kernel: amdgpu 0000:2f:00.0: amdgpu: [drm] ERROR Failed to initialize parser -125!
Module libdrm_amdgpu.so.1 from rpm libdrm-2.4.131-1.fc43.x86_64
#1 0x00007ff1d9749880 _ZL30amdgpu_ctx_set_sw_reset_statusP17radeon_winsys_ctx17pipe_reset_statusPKcz (libgallium-26.0.0.so + 0xb49880)
#2 0x00007ff1d974db69 _Z19amdgpu_cs_submit_ibIL10queue_type0EEvPvS1_i (libgallium-26.0.0.so + 0xb4db69)
Mar 08 22:49:19 ultramarine abrt-notification[38563]: Process 2176 (Xorg) crashed in amdgpu_pad_ib(radeon_cmdbuf*)()
Mar 08 22:49:42 ultramarine abrt-notification[38612]: Process 2901 (ckb-next) crashed in amdgpu_pad_ib(radeon_cmdbuf*)()
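For triage, the two most useful lines above are the ring timeout and the process that owned the stuck job. Roughly speaking, the gap between "emitted seq" and "signaled seq" is the number of fences the gfx ring had submitted but the GPU never completed (here 11625420 - 11625418 = 2). The function below is a minimal sketch for pulling those lines out of a longer kernel log; summarize_ring_timeouts is a hypothetical helper, not an existing tool, and it assumes the exact message format shown in this report, which may change between kernel versions.

```sh
# Hypothetical helper: summarize amdgpu ring timeouts from kernel log text.
# Reads log lines on stdin; assumes the message format shown in this report.
summarize_ring_timeouts() {
    awk '
        # Matches: "ring <name> timeout, signaled seq=<A>, emitted seq=<B>"
        / ring .* timeout, signaled seq=[0-9]+, emitted seq=[0-9]+/ {
            ring = ""; sig = ""; emi = ""
            for (i = 1; i < NF; i++) {
                if ($i == "ring")     ring = $(i + 1)
                if ($i == "signaled") { sig = $(i + 1); gsub(/[^0-9]/, "", sig) }
                if ($i == "emitted")  { emi = $(i + 1); gsub(/[^0-9]/, "", emi) }
            }
            # Fences emitted to the ring but not yet signaled by the GPU.
            printf "%s: %d fence(s) pending (signaled %s, emitted %s)\n", ring, emi - sig, sig, emi
        }
        # Matches: "Process <name> pid <N> thread <name> pid <M>" (job owner)
        / amdgpu: Process .* pid [0-9]+ thread / {
            line = $0
            sub(/.* amdgpu: Process /, "", line)
            printf "  process: %s\n", line
        }
    '
}
```

Plain dmesg was refused here (Operation not permitted), so the input has to come from a privileged or journal source, for example sudo dmesg | summarize_ring_timeouts, or journalctl -k -b -1 | summarize_ring_timeouts for the boot that froze (-k limits output to kernel messages and -b -1 selects the previous boot).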

Steps to Reproduce

  1. Launch Steam (Flatpak).
  2. Run the game on the secondary monitor (HDMI output).
  3. Open YouTube in Brave on the other monitor.
  4. The freeze occurs at random points during gameplay (the Resident Evil 2 and 3 remakes are the only games I've been playing).

For the second bug:

  1. Suspend the system.
  2. Resume.
  3. The primary monitor stays black and the secondary monitor turns green.

Possible Solutions

I can't propose a solution with my current skill set. I'll leave that to those better equipped, and I hope this report helps their work.

I've made several band-aid edits to the GRUB kernel command line that fixed other issues, but none of them is a complete fix:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.runpm=0 amdgpu.dcdebugmask=0x12 amd.gfx_ring_timeout_ms=12000 amd_pstate=active idle=nomwait amdgpu.gpu_recovery=1 amdgpu.hangcheck=0 softlockup_panic=1 panic=10"

At best, these options stop some suspend/resume crashes and prevent GPU panics on wake-up. Adding amd.gfx_ring_timeout_ms=12000 was my latest attempt to fix the freeze myself, and I'm running out of ideas.
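One way to check whether each module.parameter option in that GRUB line is actually recognized is to look for it under /sys/module/<module>/parameters/, where loaded kernel modules expose their parameters. The function below is a minimal sketch of that check; check_module_params is a hypothetical helper, not an existing tool. It only inspects dotted options (idle=nomwait, panic=10 and similar core options are skipped), and because parameters registered without read permission are not listed in sysfs, a "missing" result is a hint to double-check the driver's documentation rather than proof that the option is ignored.

```sh
# Hypothetical helper: report whether each "module.param=value" option in a
# kernel command line has a matching entry under <sysfs_root>/<module>/parameters.
# The optional second argument exists only so the check can be tested offline.
check_module_params() (
    cmdline=$1
    sysfs_root=${2:-/sys/module}
    set -f                              # no glob expansion while word-splitting
    for opt in $cmdline; do
        key=${opt%%=*}                  # strip "=value"
        case $key in
            *.*) ;;                     # module.param form: check it
            *) continue ;;              # core option such as idle=nomwait: skip
        esac
        # The kernel treats "-" and "_" as equivalent in these names.
        module=$(printf '%s' "${key%%.*}" | sed 's/-/_/g')
        param=$(printf '%s' "${key#*.}" | sed 's/-/_/g')
        if [ -e "$sysfs_root/$module/parameters/$param" ]; then
            printf 'known   %s\n' "$key"
        else
            printf 'missing %s\n' "$key"
        fi
    done
)
```

To check the running system, call it on the live command line: check_module_params "$(cat /proc/cmdline)". Options consumed by the initramfs (for example rd.*) will also be reported as missing, which is expected.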

Release Version

43

Edition / Desktop Environment / Hardware

KDE Plasma

Checklist

  • I've checked for duplicates in this repository. (not required)
  • I'm using the immutable/atomic version. (i.e. I think this issue might be specific to immutable.)
  • This issue is not reproducible on Fedora Linux.
  • This issue is not reproducible on other Linux distros.