Bug 106921

Summary: System lockup with Vega10 amdgpu: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
Product: DRI
Reporter: sam <sam.psylo>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: devurandom, gwhite
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
- dmesg w/mesa 18.0.2-1.fc28 (flags: none)
- dmesg w/mesa 18.2.0-0.11.git41dabdc.fc28 (flags: none)
- another dmesg w/mesa 18.0.2-1.fc28 (flags: none)

Description sam 2018-06-14 16:47:11 UTC
Created attachment 140164 [details]
dmesg w/mesa 18.0.2-1.fc28

Using Vega10 hardware (in my case, an RX Vega 64), the whole system experiences regular full lockups, requiring me to force a reboot either with the power switch on the PC or using SysRq. The system itself is still running, since I am able to ssh in from a separate machine and retrieve logs, run commands, etc., but all keyboard and mouse input ceases.

I've had this occur when doing a multitude of things, some of which are as follows:
- Playing games through Steam (Half-Life 2, Portal 2, Terraria tested)
- Playing non-Steam games (SuperTuxKart, GNOME Mines)
- Idle GNOME 3 desktop (no applications running)
- Browsing the web with Firefox 60.0.1

I have had this occur with:
Kernel: 4.16.14-300.fc28.x86_64 (from Fedora repos), 4.17.0 & 4.18.0-git5.1 (from kernel-vanilla repositories linked on Fedora wiki)
Mesa: 18.0.2-1.fc28 (from Fedora repos), 18.2.0-0.11.git41dabdc.fc28 (from che/mesa copr repo)
linux-firmware: 20180525-85.git7518922b.fc28 (from Fedora repos), with amdgpu/vega10_vce.bin replaced with newest version from git master.
OS: Fedora 28 Workstation

I am attaching a few dmesgs, each of which covers the period from boot until the bug occurs.
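
For anyone wanting to check their own logs for the same error, here is a minimal sketch that scans dmesg output for the message quoted in the summary. The function name `find_ring_timeouts` and the regular expression are illustrative assumptions, not part of any amdgpu tooling; the pattern is built only from the error text in the summary, with the ring name left variable on the assumption that other rings report in the same form.

```python
import re

# Pattern derived from the error text in the bug summary:
#   [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
# Capturing the ring name is an assumption that other rings use the same wording.
TIMEOUT_RE = re.compile(r"amdgpu_job_timedout.*\*ERROR\* ring (\w+) timeout")


def find_ring_timeouts(lines):
    """Return a list of (line_number, ring_name, text) for each matching line.

    `lines` is any iterable of strings, e.g. an open dmesg log file.
    Line numbers are 1-based.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((number, match.group(1), line.rstrip("\n")))
    return hits
```

It could be used on a saved log with something like `find_ring_timeouts(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical file name for output captured over ssh after a lockup.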
Comment 1 sam 2018-06-14 16:47:44 UTC
Created attachment 140165 [details]
dmesg w/mesa 18.2.0-0.11.git41dabdc.fc28
Comment 2 sam 2018-06-14 16:48:32 UTC
Created attachment 140166 [details]
another dmesg w/mesa 18.0.2-1.fc28
Comment 3 Henri Verbeet 2018-09-01 14:40:12 UTC
In case it helps, I was occasionally seeing something very similar on kernel 4.17, but so far haven't seen this since switching to 4.18 about two weeks ago.
Comment 4 Greg White 2018-09-16 21:29:49 UTC
I still see this with a Vega 56 in 4.19-rc3, mesa 18.2 and 18.3.
Comment 5 Greg White 2018-09-18 02:10:45 UTC
Update: I swapped the card into a machine and tried it with Windows.  It still crashed.  I replaced the card and all is well.
Comment 6 Martin Peres 2019-11-19 08:40:53 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/416.
