Summary: | [drm] GPU recovery disabled. | |
---|---|---|---
Product: | DRI | Reporter: | freedesktop.org
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED FIXED | QA Contact: |
Severity: | normal | |
Priority: | medium | |
Version: | unspecified | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |

Description
freedesktop.org
2018-07-08 09:24:30 UTC
Indeed, crashes upon S3 resume have been abundant with amdgpu.dc=1 for many months now, and seemingly for more than one reason. One bug I reported in August 2017 in https://bugs.freedesktop.org/show_bug.cgi?id=102323 - that one was fixed quickly. The next S3 resume crash I reported in October 2017 in https://bugs.freedesktop.org/show_bug.cgi?id=103277 - that one stayed without any resolution until April 2018, and the fix found in that report only works if no "drm.edid_firmware=..." kernel command line option is used. Another S3 resume crash I reported for 4.17.2 kernels in https://bugs.freedesktop.org/show_bug.cgi?id=107065 - then I realized that 4.18 pre-releases exhibit the very same kind of crash immediately upon starting X11. For this crash upon X11 startup, there is a patch in the bug report, but it does not prevent the S3 resume crash.

I currently work around S3 resume crashes by switching to the console display before entering S3 sleep - but this is really an awkward work-around.

(In reply to dwagner from comment #1)
> I currently work around S3 resume crashes by switching to the console
> display before entering S3 sleep - but this is really an awkward work-around.

Oh, that doesn't help either. It crashes the very moment I switch back to X. What's more, starting with 4.15, amdgpu.dc=0 doesn't appear to make any difference.

Please attach the full dmesg output. Can you bisect between 4.14 and 4.15?

Do you have a full dmesg?

Created attachment 140525 [details]
dmesg amdgpu.dc=1
Booted with amdgpu.dc=1.
Created attachment 140526 [details]
dmesg /etc/modprobe.d/
Booted with amdgpu.dc=1 in /etc/modprobe.d/
Sure, attached. AMD staging kernel. I don't know how to tell whether DC=1 is really enabled, so I did two runs: one with amdgpu.dc=1 as a boot parameter and one with /etc/modprobe.d/ on top of that. The procedure was the same both times:

- boot
- X login
- switch to console
- sleep, wakeup
- switch to X

The drm/amdgpu lines already appear in the console right after waking up, prior to switching to X. This time "only" X crashed (I could still move the pointer); at times the complete machine is dead, with no switching to console and no SSH.

(As a side note: is it normal that waking up on Ryzen takes something on the order of 10-30s? I'm used to split-second wakeups on Intel.)

HTH

Created attachment 140528 [details]
dmesg 4.14 LTS
Sorry, forgot about the requested 4.14 dmesg log. Attached as well.
This is: boot, login (to KDE this time), do stuff, remember, sleep, wakeup.
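
On the question of how to tell whether DC is really enabled: loaded kernel modules normally expose their readable parameters under /sys/module/<module>/parameters/. The following is a minimal sketch, assuming the amdgpu module exports its `dc` parameter there (which, as far as I know, it does on these kernel versions). The helper name `read_module_param` and the `sysfs_root` argument are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the current value of a loaded module's parameter as a string.

    Returns None if the file cannot be read, e.g. because the module is not
    loaded or does not export this parameter through sysfs.
    sysfs_root is a hypothetical knob so the helper can be tried on a fake tree.
    """
    path = Path(sysfs_root) / "module" / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    value = read_module_param("amdgpu", "dc")
    print("amdgpu.dc =", value if value is not None else "unavailable")
```

This reports the value the driver was actually loaded with, so it should not matter whether the setting came from the kernel command line or from /etc/modprobe.d/. Note that it only shows the requested setting; whether the display code is in use on a given ASIC is still best confirmed from the dmesg output.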
Yeah, that is a known problem in the PCI subsystem. Will be fixed with 4.19 and then backported to older kernels.

So, there's 4.19rc1-amd-next \o/

echo: write error: Device or resource busy

This started to happen with 4.18. dmesg:

[ 171.245467] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 171.245484] systemd-udevd D 0 700 615 0x80000124

So, is this something to report to fricking systemd? Gee, really...?!

> systemd-udevd
This is not systemd's fault, but indicative of something hanging in kernel land, which udevd ends up being blocked on.
I experienced this a few major kernel releases ago; it was resolved by the next major version. I never did figure out what caused udevd to block... :/
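
For anyone collecting these freeze failures across several dmesg logs, the tasks that refused to freeze can be pulled out mechanically. This is a rough sketch, assuming the task line format quoted above (name, state, free stack, PID, parent PID, flags); newer kernels print the task line differently, and task names containing spaces are not handled. The function name `tasks_refusing_to_freeze` is hypothetical.

```python
import re

# Header line printed when the freezer gives up.
FREEZE_FAIL = re.compile(
    r"Freezing of tasks failed after (?P<secs>[\d.]+) seconds "
    r"\((?P<count>\d+) tasks refusing to freeze"
)

# Task line in the format quoted in this report, e.g.
# "[  171.245484] systemd-udevd   D    0   700    615 0x80000124"
TASK_LINE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<name>\S+)\s+(?P<state>[A-Z])\s+\d+\s+"
    r"(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
)


def tasks_refusing_to_freeze(dmesg_lines):
    """Return (name, state, pid) for each task listed after a freeze failure."""
    results = []
    expected = 0  # task lines still expected after the last failure header
    for line in dmesg_lines:
        header = FREEZE_FAIL.search(line)
        if header:
            expected = int(header.group("count"))
            continue
        if expected:
            task = TASK_LINE.match(line)
            if task:
                results.append(
                    (task.group("name"), task.group("state"), int(task.group("pid")))
                )
                expected -= 1
    return results
```

A task in state D is in uninterruptible sleep, which fits the explanation above that udevd is blocked on something in the kernel rather than misbehaving itself.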