```
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] [1002:15dd] (rev d0)
```

The system partially locks up with Plague Inc. (a native Linux game). With Path of Exile (running under Wine), it locks up completely and does not respond to SysRq. I haven't captured any output from the full lockup, and for the partial lockup nothing showed up in the logs. Let me know what steps I can take to help diagnose the issue. It happens on 4.18.12 and 4.19.0-rc7+ (v4.19-rc7-15-g64c5e530ac2c).
Please attach your dmesg output, and your Xorg log if you are using X.
Created attachment 142000: Xorg log
Created attachment 142001: dmesg

It took some time to respond because I waited until I had more RAM, to rule out running out of memory. The game that had a soft lockup now works fine, so that problem seems to have been RAM-related. The other game still locks up the system. I have attached the logs.
This can be closed. I was able to get things working by adding `idle=nomwait` to my Linux kernel command line.
Created attachment 142013: 4.18.12 dmesg log (was using Vulkan at the time)

It seems I spoke too soon. I am uploading some logs taken with 4.18.12 while using Vulkan. The screen was fully frozen; for example, trying to mute audio did not change the status indicator on my keyboard. Some amdgpu-related extracts are below; the full dmesg is attached.

```
Oct 13 01:08:12 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1055330, last emitted seq=1055332
...
Oct 13 01:09:24 kernel: kworker/0:2: page allocation failure: order:10, mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Oct 13 01:09:24 kernel: kworker/0:2 cpuset=/ mems_allowed=0
Oct 13 01:09:24 kernel: CPU: 0 PID: 192238 Comm: kworker/0:2 Tainted: G C 4.18.12 #1
Oct 13 01:09:24 kernel: Hardware name: LENOVO 20MUCTO1WW/20MUCTO1WW, BIOS R0WET34W (1.02 ) 07/05/2018
Oct 13 01:09:24 kernel: Workqueue: events do_poweroff
Oct 13 01:09:24 kernel: Call Trace:
Oct 13 01:09:24 kernel: dump_stack+0x5c/0x7b
Oct 13 01:09:24 kernel: warn_alloc+0xf7/0x180
Oct 13 01:09:24 kernel: ? _cond_resched+0x11/0x40
Oct 13 01:09:24 kernel: __alloc_pages_nodemask+0xee3/0x1030
Oct 13 01:09:24 kernel: ? __radix_tree_delete+0x7e/0xa0
Oct 13 01:09:24 kernel: cache_grow_begin+0x77/0x510
Oct 13 01:09:24 kernel: fallback_alloc+0x15c/0x1f0
Oct 13 01:09:24 kernel: ? amdgpu_vcn_suspend+0x47/0x80 [amdgpu]
Oct 13 01:09:24 kernel: __kmalloc+0x1bf/0x240
Oct 13 01:09:24 kernel: amdgpu_vcn_suspend+0x47/0x80 [amdgpu]
Oct 13 01:09:24 kernel: amdgpu_device_ip_suspend+0xbd/0x160 [amdgpu]
Oct 13 01:09:24 kernel: device_shutdown+0x13f/0x1e0
Oct 13 01:09:24 kernel: kernel_power_off+0x2c/0x60
Oct 13 01:09:24 kernel: process_one_work+0x1e0/0x3c0
Oct 13 01:09:24 kernel: worker_thread+0x44/0x3f0
Oct 13 01:09:24 kernel: kthread+0xf0/0x130
Oct 13 01:09:24 kernel: ? process_one_work+0x3c0/0x3c0
Oct 13 01:09:24 kernel: ? kthread_flush_work_fn+0x10/0x10
Oct 13 01:09:24 kernel: ret_from_fork+0x22/0x40
```
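Two numbers in this extract are worth unpacking. An order-n page allocation asks the kernel's buddy allocator for 2^n physically contiguous pages, so, assuming 4 KiB base pages (the x86-64 default), the `order:10` failure hit by `amdgpu_vcn_suspend` during power-off was a request for 4 MiB of contiguous memory. Separately, the gap between the last emitted and last signaled fence sequence numbers in the `ring gfx timeout` line roughly counts submissions on the gfx ring that never completed. A minimal Python sketch of both calculations (the helper names are illustrative, not kernel functions):

```python
# Sketch: reading two numbers from the 4.18.12 dmesg extract above.
# Assumption: 4 KiB base pages, the x86-64 default.
PAGE_SIZE = 4096


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size of an order-n page allocation: 2**n physically contiguous pages."""
    return (1 << order) * page_size


def unsignaled_fences(last_signaled: int, last_emitted: int) -> int:
    """Fences written to the ring but not yet signaled as completed."""
    return last_emitted - last_signaled


if __name__ == "__main__":
    # From "page allocation failure: order:10"
    size = order_to_bytes(10)
    print(f"order:10 -> {size} bytes = {size // (1024 * 1024)} MiB contiguous")
    # From "ring gfx timeout, last signaled seq=1055330, last emitted seq=1055332"
    print("unsignaled fences on gfx ring:", unsignaled_fences(1055330, 1055332))
```

Large contiguous allocations like this can fail on a fragmented system even when plenty of memory is free in total, which may explain why the failure appears only during shutdown after the hang.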
Created attachment 142014: 4.18.12 Xorg log (was using Vulkan at the time)
Created attachment 142015: dmesg from 4.19.0-rc7+ (commit bab5c80b2110, I believe)

Here is a log from a git build from a day ago (commit bab5c80b2110). This log and the previous one both suggest that part of the system is still responding, since my SysRq attempts were logged. However, trying to force a shutdown/restart with SysRq does not restart the system, and the screen keeps showing the same image from the moment of the freeze. DRM messages at the end of the log:

```
Oct 12 21:19:22 kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:41:crtc-0] flip_done timed out
Oct 12 21:19:32 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:41:crtc-0] flip_done timed out
Oct 12 21:19:42 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out
```
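The three `flip_done timed out` messages are spaced exactly 10 seconds apart, which appears consistent with successive waits in the DRM atomic helpers each running into a fixed timeout of about 10 s, rather than three unrelated failures. A small Python sketch for measuring such gaps from syslog-style timestamps; since these lines carry no year, a placeholder year (an assumption, chosen arbitrarily) is prepended only so the parser accepts them:

```python
from datetime import datetime

# Assumption: syslog lines omit the year, so a placeholder is prepended before
# parsing. Only differences are used; a Dec 31 -> Jan 1 rollover is not handled.
PLACEHOLDER_YEAR = 2000


def gaps_seconds(stamps: list[str]) -> list[float]:
    """Seconds between consecutive 'Mon DD HH:MM:SS' syslog timestamps."""
    parsed = [
        datetime.strptime(f"{PLACEHOLDER_YEAR} {s}", "%Y %b %d %H:%M:%S")
        for s in stamps
    ]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    # Timestamps of the three flip_done timeouts quoted above.
    print(gaps_seconds(["Oct 12 21:19:22", "Oct 12 21:19:32", "Oct 12 21:19:42"]))
```

Note that `%b` month names are locale-dependent; the sketch assumes an English (or C) locale.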
Also, to clarify: the first dmesg I uploaded had `rcu: INFO: rcu_sched detected stalls on CPUs/tasks:` in the log. Adding `idle=nomwait` to the kernel command line has so far fully resolved this, along with other messages that sometimes appeared; those did not always reference rcu_sched, but seemed to indicate that a CPU or thread had stalled. There seem to be several Ryzen errata related to MWAIT (https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf), so I think that issue is fixed, although the amdgpu/DRM issues seem to remain.
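To confirm after a reboot that the workaround is actually in effect, the running kernel's command line can be read from `/proc/cmdline`. A minimal Python sketch; it matches whole space-separated tokens rather than substrings, and the helper name is illustrative:

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` is one whole whitespace-separated token of `cmdline`.

    Simplification: quoted values containing spaces are not handled.
    """
    return param in cmdline.split()


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # Linux-only interface
    if proc_cmdline.exists():
        active = has_kernel_param(proc_cmdline.read_text(), "idle=nomwait")
        print("idle=nomwait active:", active)
    else:
        print("/proc/cmdline not found; not running on Linux?")
```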
I can also confirm this issue.

```
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev d0)
Lenovo ThinkPad A485
AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx
Linux Mint 19.1
Kernel 5.0.7
```

The GPU hard-locks, and the laptop has to be power-cycled. Interestingly, SSH still works in the background while it is locked up. It happens randomly when opening Firefox or doing very basic tasks, sometimes even when just sitting idle; however, it crashes 100% of the time when trying to play Cities: Skylines.

```
[37258.615599] gmc_v9_0_process_interrupt: 10 callbacks suppressed
[37258.615608] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615615] amdgpu 0000:06:00.0: in page starting at address 0x0000800107805000 from 27
[37258.615619] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[37258.615629] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615633] amdgpu 0000:06:00.0: in page starting at address 0x0000800107807000 from 27
[37258.615636] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615645] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615648] amdgpu 0000:06:00.0: in page starting at address 0x0000800107801000 from 27
[37258.615651] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615660] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615663] amdgpu 0000:06:00.0: in page starting at address 0x0000800107803000 from 27
[37258.615666] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615675] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615678] amdgpu 0000:06:00.0: in page starting at address 0x0000800107809000 from 27
[37258.615681] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615689] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615692] amdgpu 0000:06:00.0: in page starting at address 0x000080010780b000 from 27
[37258.615695] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615704] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615707] amdgpu 0000:06:00.0: in page starting at address 0x0000800107805000 from 27
[37258.615710] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615740] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615743] amdgpu 0000:06:00.0: in page starting at address 0x0000800107807000 from 27
[37258.615746] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615756] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615759] amdgpu 0000:06:00.0: in page starting at address 0x0000800107801000 from 27
[37258.615762] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615771] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615774] amdgpu 0000:06:00.0: in page starting at address 0x0000800107803000 from 27
[37258.615777] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37268.712339] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37268.712387] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37268.712389] [drm] GPU recovery disabled.
[37278.952537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37278.952624] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37278.952628] [drm] GPU recovery disabled.
[37289.192390] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37289.192478] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37289.192481] [drm] GPU recovery disabled.
[37299.432447] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37299.432534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37299.432538] [drm] GPU recovery disabled.
[37309.676431] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37309.676518] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37309.676522] [drm] GPU recovery disabled.
[37319.912444] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37319.912536] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37319.912541] [drm] GPU recovery disabled.
[37330.156619] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37330.156706] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37330.156710] [drm] GPU recovery disabled.
[37340.392424] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37340.392511] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37340.392515] [drm] GPU recovery disabled.
[37350.632424] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37350.632511] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37350.632514] [drm] GPU recovery disabled.
[37360.872417] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37360.872508] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37360.872511] [drm] GPU recovery disabled.
[37371.112436] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37371.112523] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37371.112527] [drm] GPU recovery disabled.
[37381.352427] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37381.352514] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37381.352517] [drm] GPU recovery disabled.
[37391.592410] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37391.592497] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37391.592500] [drm] GPU recovery disabled.
[37401.836426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37401.836513] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37401.836517] [drm] GPU recovery disabled.
[37412.072433] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37412.072520] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37412.072524] [drm] GPU recovery disabled.
[37422.312442] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37422.312528] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37422.312532] [drm] GPU recovery disabled.
[37432.552428] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37432.552515] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37432.552519] [drm] GPU recovery disabled.
[37442.792418] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37442.792506] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37442.792510] [drm] GPU recovery disabled.
[37453.032397] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37453.032483] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37453.032487] [drm] GPU recovery disabled.
[37463.272534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37463.272621] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37463.272624] [drm] GPU recovery disabled.
[37473.512589] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37473.512676] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37473.512680] [drm] GPU recovery disabled.
[37483.752954] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37483.753041] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37483.753044] [drm] GPU recovery disabled.
[37493.992566] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37493.992654] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37493.992657] [drm] GPU recovery disabled.
```
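A flood like this is easier to read once collapsed. The rough Python sketch below counts VMC faults per page address and measures the spacing between `ring ... timeout` reports; its regexes are written against the dmesg lines quoted in this comment, and other kernel versions may word them differently. On this log, the faults cycle through six pages between 0x0000800107801000 and 0x000080010780b000, and the timeout repeats roughly every 10.24 s with the same signaled/emitted pair. That pattern suggests one stuck job being re-reported while GPU recovery is disabled, rather than a series of new hangs.

```python
import re
from collections import Counter

# Patterns match the dmesg message formats quoted in this report (an assumption:
# wording can differ between kernel versions).
FAULT_ADDR = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
RING_TIMEOUT = re.compile(
    r"\[\s*(\d+\.\d+)\] \[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (\w+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)

SAMPLE = """\
[37258.615615] amdgpu 0000:06:00.0: in page starting at address 0x0000800107805000 from 27
[37258.615707] amdgpu 0000:06:00.0: in page starting at address 0x0000800107805000 from 27
[37258.615743] amdgpu 0000:06:00.0: in page starting at address 0x0000800107807000 from 27
[37268.712339] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37278.952537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37289.192390] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
"""


def summarize(log: str) -> dict:
    """Collapse repeated amdgpu fault/timeout messages into a short summary."""
    faults = Counter(m.group(1) for m in FAULT_ADDR.finditer(log))
    timeouts = list(RING_TIMEOUT.finditer(log))
    stamps = [float(m.group(1)) for m in timeouts]
    return {
        "fault_pages": faults,
        # Seconds between consecutive timeout reports.
        "timeout_gaps_s": [round(b - a, 2) for a, b in zip(stamps, stamps[1:])],
        # Distinct (ring, signaled, emitted) triples; a single entry means the
        # same unfinished submission is being reported again and again.
        "stuck_at": {(m.group(2), int(m.group(3)), int(m.group(4))) for m in timeouts},
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

To use it on a full capture, pass the text of a saved dmesg file to `summarize` instead of `SAMPLE`.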
I'm having the same issue, but it seems to be localized to Vulkan. I've confirmed this on a Ryzen 2300U laptop and a Ryzen 2400G desktop. I have a save file in Pac-Man Championship Edition DX+ that is guaranteed to crash the system within 5 seconds of loading a time trial screen on both machines. The same save file works fine on an NVIDIA card with the proprietary driver and with the Intel (Cherry Trail) drivers, so this looks like a Vulkan bug exhibiting itself through DXVK on Raven Ridge. Tested on Mesa 19.1.0_rc2 and on current git as of this date; both exhibit identical issues on Ubuntu and Gentoo.
Created attachment 144687: journalctl, from starting Steam to Magic SysRq shutdown

I can also confirm this issue, with the same error message in the log. I can reproduce it every single time by trying to start a new game in "Cities: Skylines".

System information:

```
AMD Ryzen 5 2400G
Linux ZenBox 5.1.15-arch1-1-ARCH #1 SMP PREEMPT Tue Jun 25 04:49:39 UTC 2019 x86_64 GNU/Linux
mesa-19.1.1-1
```
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/548.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.