Hi! This is a surprisingly long-standing problem with an RX 460, specifically from 4.15 all the way up to 4.18 AMD staging DRM next [1]. After resuming from sleep (echo -n mem > /sys/power/state), amdgpu is dead (always, reliably). Here's what dmesg has to say about it:

[Sun Jul 8 11:01:17 2018] PM: suspend exit
[Sun Jul 8 11:01:19 2018] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[Sun Jul 8 11:01:19 2018] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[Sun Jul 8 11:01:19 2018] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[Sun Jul 8 11:01:28 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=864, last emitted seq=868
[Sun Jul 8 11:01:28 2018] [drm] GPU recovery disabled.

From earlier versions:

[ 42.802559] PM: suspend exit
[ 42.824332] amdgpu 0000:41:00.0: GPU fault detected: 147 0x0bd84802
[ 42.824338] amdgpu 0000:41:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0034F97B
[ 42.824341] amdgpu 0000:41:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
[ 42.824345] amdgpu 0000:41:00.0: VM fault (0x02, vmid 6) at page 3471739, read from 'TC0' (0x54433000) (72)
[ 52.956306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1287, last emitted seq=1289
[ 52.956316] [drm] IP block:gfx_v8_0 is hung!
[ 52.956362] [drm] GPU recovery disabled.

I've also seen fault 146, but other than that it mostly looks the same. 4.14-lts (with dc=0) works fine.

Hardware: RX 460, Zenith Extreme, 1950X.

[1] Arch Linux AUR; this versioning is a bit confusing, and it may actually already be the 4.19 branch. The latest commit is 3838e387fd1eb17bfcf6ff7d443d931adb5cb41b.
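Both log excerpts above follow a fixed format, so they can be checked mechanically. The Python below is a minimal sketch: the helper names `parse_ring_timeout` and `vm_fault_pages` are hypothetical, and the regular expressions are written only against the lines quoted here, not against every amdgpu kernel version. It counts the fences that had been emitted but not yet signaled when the gfx ring timed out, and it shows that the decimal `page` in the `VM fault` line is the same value as the hex `VM_CONTEXT1_PROTECTION_FAULT_ADDR` register (0x0034F97B = 3471739).

```python
import re

# Matches the ring timeout message as quoted above, e.g.
# "ring gfx timeout, last signaled seq=864, last emitted seq=868"
RING_TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)

# Matches the fault address register line (hex) and the decoded VM fault line (decimal page).
FAULT_ADDR_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_ADDR\s+0x(?P<addr>[0-9A-Fa-f]+)")
FAULT_PAGE_RE = re.compile(r"VM fault \(0x[0-9a-f]+, vmid (?P<vmid>\d+)\) at page (?P<page>\d+)")


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, outstanding), or None if the line does not match."""
    m = RING_TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    # Fences emitted to the ring but not yet signaled when the timeout fired.
    return m.group("ring"), signaled, emitted, emitted - signaled


def vm_fault_pages(log):
    """Return (register value, decoded page, vmid) from a VM fault report, or None."""
    addr = FAULT_ADDR_RE.search(log)
    page = FAULT_PAGE_RE.search(log)
    if addr is None or page is None:
        return None
    return int(addr.group("addr"), 16), int(page.group("page")), int(page.group("vmid"))
```

With the quoted lines, `parse_ring_timeout` reports 4 outstanding fences for the 4.18 log (864 vs. 868) and 2 for the earlier one (1287 vs. 1289), and `vm_fault_pages` returns the same number twice for the register and the decoded page.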
Indeed, crashes upon S3 resume have been abundant with amdgpu.dc=1 for many months now, seemingly for more than one reason. I reported one bug in August 2017 in https://bugs.freedesktop.org/show_bug.cgi?id=102323; that one was fixed quickly. I reported the next S3 resume crash in October 2017 in https://bugs.freedesktop.org/show_bug.cgi?id=103277; that one stayed unresolved until April 2018, and the fix found in that report only works if no "drm.edid_firmware=..." kernel command line option is used. I reported another S3 resume crash for 4.17.2 kernels in https://bugs.freedesktop.org/show_bug.cgi?id=107065, then realized that 4.18 pre-releases exhibit the very same kind of crash immediately upon starting X11. For this crash upon X11 startup there is a patch in the bug report, but it does not prevent the S3 resume crash. I currently work around S3 resume crashes by switching to the console display before entering S3 sleep, but this is a really awkward workaround.
(In reply to dwagner from comment #1)
> I currently work around S3 resume crashes by switching to the console
> display before enterin S3 sleep - but this is really an awkward work-around.

Oh, that doesn't help either. It crashes the very moment I switch back to X. What's more, starting with 4.15, amdgpu.dc=0 doesn't appear to make any difference.
Please attach the full dmesg output. Can you bisect between 4.14 and 4.15?
Do you have a full dmesg?
Created attachment 140525: dmesg amdgpu.dc=1
Booted with amdgpu.dc=1.
Created attachment 140526: dmesg /etc/modprobe.d/
Booted with amdgpu.dc=1 in /etc/modprobe.d/.
Sure, attached. AMD staging kernel. I don't know how to tell whether DC=1 is really enabled, so I did two runs: one with amdgpu.dc=1 as a boot parameter and one with an /etc/modprobe.d/ entry on top of that. The procedure was the same both times:

- boot
- X login
- switch to console
- sleep, wakeup
- switch to X

The drm/amdgpu lines already appear in the console right after waking up, before switching to X. This time "only" X crashed (I could still move the pointer); at times the whole machine is dead, with no switching to the console and no SSH.

(As a side note: is it normal that waking up on Ryzen takes on the order of 10-30 s? I'm used to split-second wakeups on Intel.)

HTH
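On the question of how to tell whether DC is really enabled, one approach is to check two things: the value the `dc` module parameter was loaded with, and whether DC logged its initialization message. The sketch below assumes the parameter is exposed at `/sys/module/amdgpu/parameters/dc` (1 = on, 0 = off, -1 = auto) and that DC prints a line containing `Display Core initialized` when it comes up; both are assumptions to verify against the running kernel, and the function names are hypothetical. The parameter alone is not conclusive, since -1 leaves the decision to the driver.

```python
from pathlib import Path

# Assumed sysfs location of the amdgpu "dc" module parameter.
DC_PARAM = Path("/sys/module/amdgpu/parameters/dc")

# Assumed substring of the kernel message DC prints once it has initialized.
DC_INIT_MARKER = "Display Core initialized"

# Assumed meaning of the parameter values.
DC_MODES = {1: "enabled", 0: "disabled", -1: "auto"}


def requested_dc(param_path=DC_PARAM):
    """Return the dc value the module was loaded with, or None if it cannot be read."""
    try:
        return int(Path(param_path).read_text().strip())
    except (OSError, ValueError):
        return None


def dc_active(dmesg_text):
    """True if the kernel log text contains DC's initialization message."""
    return any(DC_INIT_MARKER in line for line in dmesg_text.splitlines())


def describe(param_value, dmesg_text):
    """Combine the requested setting and the log evidence into a short verdict."""
    requested = DC_MODES.get(param_value, "unknown")
    actual = "DC initialized" if dc_active(dmesg_text) else "no DC init message"
    return f"dc parameter: {requested}; log: {actual}"
```

For example, capture the output of `dmesg` (which may require root) as a string and call `describe(requested_dc(), dmesg_text)`.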
Created attachment 140528: dmesg 4.14 LTS
Sorry, I forgot about the requested 4.14 dmesg log. It's attached as well. This is: boot, login (to KDE this time), do stuff, remember, sleep, wakeup.
Yeah, that is a known problem in the PCI subsystem. It will be fixed in 4.19 and then backported to older kernels.
So, there's 4.19rc1-amd-next \o/

echo: write error: Device or resource busy

This started to happen with 4.18. dmesg:

[ 171.245467] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 171.245484] systemd-udevd D 0 700 615 0x80000124

So, is this something to report to fricking systemd? Gee, really...?!
> systemd-udevd

This is not systemd's fault; it indicates something hanging in kernel land, which udevd ends up being blocked on. I experienced this a few major kernel releases ago, and it was resolved by the next major version. I never did figure out what caused udevd to block... :/
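The two dmesg lines quoted in the previous comments come from the kernel's task freezer. When a task refuses to freeze within the timeout, suspend is aborted and the write to /sys/power/state fails with EBUSY, which `echo` reports as "Device or resource busy". The state letter `D` marks uninterruptible sleep, consistent with udevd being blocked inside the kernel rather than misbehaving on its own. A minimal parsing sketch follows; `freeze_failure` is a hypothetical helper, and the per-task line layout (name, state letter, then numeric fields) is assumed from the single example quoted here.

```python
import re

# Header printed when the freezer gives up, as quoted above.
FREEZE_FAIL_RE = re.compile(
    r"Freezing of tasks failed after (?P<secs>[\d.]+) seconds "
    r"\((?P<count>\d+) tasks refusing to freeze, wq_busy=(?P<wq>\d+)\)"
)

# Assumed layout of each per-task line: optional "[ timestamp]", then
# "<name> <state letter> <number> ...".
TASK_RE = re.compile(r"^(?:\[\s*[\d.]+\]\s*)?(?P<comm>\S+)\s+(?P<state>[A-Z])\s+\d+")

STATE_NAMES = {
    "D": "uninterruptible sleep",
    "S": "interruptible sleep",
    "R": "running",
}


def freeze_failure(dmesg_text):
    """Return (seconds, task count, [(name, state), ...]) for the first freeze failure, or None."""
    lines = dmesg_text.splitlines()
    for i, line in enumerate(lines):
        m = FREEZE_FAIL_RE.search(line)
        if m is None:
            continue
        count = int(m.group("count"))
        tasks = []
        # Task entries follow the header, possibly interleaved with stack traces,
        # so keep scanning until the announced number of tasks has been found.
        for follow in lines[i + 1:]:
            t = TASK_RE.match(follow)
            if t:
                state = t.group("state")
                tasks.append((t.group("comm"), STATE_NAMES.get(state, state)))
                if len(tasks) == count:
                    break
        return float(m.group("secs")), count, tasks
    return None
```

Applied to the two lines above, this yields a 20.006 s timeout with one task, systemd-udevd, in uninterruptible sleep. The heuristic task regex may also match unrelated lines in a longer log, so treat the result as a hint rather than a diagnosis.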