Similar to #104649, but happening on amdgpu.

The system locks up immediately when resuming from suspend. I get to see the mouse cursor and the blue background of KDE's screen lock (but no password entry or anything like that), but cannot do anything. I can also reproduce this on 4.16.7 and 4.17-rc4.

This does not happen with amdgpu blacklisted, or with Arch Linux' LTS kernel (4.14.39), though I get other random failures on the LTS kernel.

Unfortunately, the systemd journal does not contain anything after entering suspend, so I have no way to get a backtrace.

System: Arch Linux w/ Linux 4.16.7
DMI: HP ZBook 14u G5/83B2, BIOS Q78 Ver. 01.00.05 01/25/2018
Intel Kaby Lake Refresh 8550U
Intel UHD 620
AMD Radeon PRO WX 3100 (I believe this is Polaris, not sure about the exact generation)

lspci:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 08)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.3 PCI bridge: Intel Corporation Device 9d13 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Device 9d4e (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3100]
02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
3c:00.0 Non-Volatile memory controller: Toshiba America Info Systems Device 0116
Please attach the dmesg output captured before suspend.
Created attachment 139444
dmesg (actually journalctl -k) output before suspend
Does it also happen with amdgpu.dc=0?
Yes, on both 4.16.7 and 4.17-rc4
Created attachment 139445
dmesg of 4.17-rc4 before suspend
The last dmesg output was captured with amdgpu.dc=0.
I did a bisect and git reported this as the culprit:

kugel@thomas-nb:linux.git$ git bisect good
08810a4119aaebf6318f209ec5dd9828e969cba4 is the first bad commit
commit 08810a4119aaebf6318f209ec5dd9828e969cba4
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Wed Oct 25 14:12:29 2017 +0200

    PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags

    The motivation for this change is to provide a way to work around a problem with the direct-complete mechanism used for avoiding system suspend/resume handling for devices in runtime suspend.

    The problem is that some middle layer code (the PCI bus type and the ACPI PM domain in particular) returns positive values from its system suspend ->prepare callbacks regardless of whether the driver's ->prepare returns a positive value or 0, which effectively prevents drivers from being able to control the direct-complete feature. Some drivers need that control, however, and the PCI bus type has grown its own flag to deal with this issue, but since it is not limited to PCI, it is better to address it by adding driver flags at the core level.

    To that end, add a driver_flags field to struct dev_pm_info for flags that can be set by device drivers at the probe time to inform the PM core and/or bus types, PM domains and so on on the capabilities and/or preferences of device drivers. Also add two static inline helpers for setting that field and testing it against a given set of flags and make the driver core clear it automatically on driver remove and probe failures.

    Define and document two PM driver flags related to the direct-complete feature: NEVER_SKIP and SMART_PREPARE that can be used, respectively, to indicate to the PM core that the direct-complete mechanism should never be used for the device and to inform the middle layer code (bus types, PM domains etc) that it can only request the PM core to use the direct-complete mechanism for the device (by returning a positive value from its ->prepare callback) if it also has been requested by the driver.

    While at it, make the core check pm_runtime_suspended() when setting power.direct_complete so that it doesn't need to be checked by ->prepare callbacks.

    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

:040000 040000 6f18a781ca7ee0501888a66532f0667f2926aeb1 440821a72777285dccc37d3a8254688bf4a24486 M Documentation
:040000 040000 6aaceba7f5aae9368a1e6e287a1f56cb1326adbf 557c1672f5101aeae16ce6bda4969c42dd3321bb M drivers
:040000 040000 bdc707f2a476baf517361c46ed28977cb30b6e1b 7c33fb89c953ad06a7b1c8b686d6b6a403aa509b M include

(I haven't tried reverting just this on top of 4.16 yet.)

Interestingly, this commit also seems to affect my wifi: the good commits (from the suspend point of view) do not have working wifi, while the bad commits do. I'll attach a dmesg output from running on the last good commit.
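To make the SMART_PREPARE rule from the commit message concrete, here is a minimal sketch in plain C of how a middle layer (bus type or PM domain) might combine its own ->prepare result with the driver's. This is a userspace model, not kernel code: `FLAG_SMART_PREPARE`, `FLAG_NEVER_SKIP` and `middle_layer_prepare()` are hypothetical names, and the flag values are illustrative rather than copied from the kernel headers.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the two flags named in the commit message.
 * The bit values are illustrative only. */
#define FLAG_NEVER_SKIP    (1u << 0)
#define FLAG_SMART_PREPARE (1u << 1)

/*
 * Model of a middle-layer ->prepare callback.
 *
 * driver_ret: what the driver's own ->prepare returned
 *             (< 0 error, 0 "no direct-complete", > 0 "direct-complete ok")
 * layer_ret:  what the middle layer itself would like to return
 *
 * With SMART_PREPARE set, a positive value is passed up only if the
 * driver asked for direct-complete as well.
 */
static int middle_layer_prepare(unsigned int flags, int driver_ret, int layer_ret)
{
	if (driver_ret < 0)
		return driver_ret;	/* driver errors are propagated */

	if ((flags & FLAG_SMART_PREPARE) && driver_ret == 0)
		return 0;		/* driver did not request direct-complete */

	return layer_ret;		/* otherwise the middle layer decides */
}

/* Small helper that prints one case of the model. */
static void show_case(unsigned int flags, int driver_ret, int layer_ret)
{
	printf("flags=%u driver=%d layer=%d -> %d\n", flags, driver_ret,
	       layer_ret, middle_layer_prepare(flags, driver_ret, layer_ret));
}
```

Calling `show_case(FLAG_SMART_PREPARE, 0, 1)` from a `main()` illustrates the case the commit message describes: the middle layer wants to return a positive value, but the driver did not, so 0 is returned. Without the flag, the same inputs would return the middle layer's positive value.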
Created attachment 139453
dmesg of last good commit after suspend
Here's the bisect log:

git bisect start
# bad: [75bc37fefc4471e718ba8e651aa74673d4e0a9eb] Linux 4.17-rc4
git bisect bad 75bc37fefc4471e718ba8e651aa74673d4e0a9eb
# good: [bebc6082da0a9f5d47a1ea2edc099bf671058bd4] Linux 4.14
git bisect good bebc6082da0a9f5d47a1ea2edc099bf671058bd4
# bad: [e4ee8b85b7657d9c769b727038faabdc2e6a3412] Merge tag 'usb-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect bad e4ee8b85b7657d9c769b727038faabdc2e6a3412
# bad: [bec04432cb9036dedf89140c102b5ac03e4b3626] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
git bisect bad bec04432cb9036dedf89140c102b5ac03e4b3626
# bad: [5bbcc0f595fadb4cac0eddc4401035ec0bd95b09] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 5bbcc0f595fadb4cac0eddc4401035ec0bd95b09
# bad: [2cd83ba5bede2f72cc6c79a19a1bddf576b50e88] Merge tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio
git bisect bad 2cd83ba5bede2f72cc6c79a19a1bddf576b50e88
# bad: [449fcf3ab0baf3dde9952385e6789f2ca10c3980] Merge tag 'staging-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 449fcf3ab0baf3dde9952385e6789f2ca10c3980
# good: [43ff2f4db9d0f76452b77cfa645f02b471143b24] Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 43ff2f4db9d0f76452b77cfa645f02b471143b24
# good: [43ff2f4db9d0f76452b77cfa645f02b471143b24] Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 43ff2f4db9d0f76452b77cfa645f02b471143b24
# good: [313144c1bcd6dd22f2375a602a8cb6efa759c8cd] Staging: rtlwifi: pci: fixed a coding style issue
git bisect good 313144c1bcd6dd22f2375a602a8cb6efa759c8cd
# good: [b18d62891aaff49d0ee8367d4b6bb9452469f807] Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good b18d62891aaff49d0ee8367d4b6bb9452469f807
# bad: [990a848d537e4da966907c8ccec95bc568f2911c] Merge branches 'pm-devfreq' and 'pm-tools'
git bisect bad 990a848d537e4da966907c8ccec95bc568f2911c
# good: [60af981c78a72255355c8e374e173b550d6742d6] Merge branch 'pm-cpufreq'
git bisect good 60af981c78a72255355c8e374e173b550d6742d6
# good: [05d658b5b57214944067fb4f62bce59200bf496f] Merge branch 'pm-sleep'
git bisect good 05d658b5b57214944067fb4f62bce59200bf496f
# bad: [1efef68262dc567f0c09da9d11924e8287cd3a8b] Merge branch 'pm-core'
git bisect bad 1efef68262dc567f0c09da9d11924e8287cd3a8b
# bad: [08810a4119aaebf6318f209ec5dd9828e969cba4] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
git bisect bad 08810a4119aaebf6318f209ec5dd9828e969cba4
# good: [b082ddd8a6a3aa0399763bfb58fc7bdd84c95713] PM / core: Fix kerneldoc comments of four functions
git bisect good b082ddd8a6a3aa0399763bfb58fc7bdd84c95713
# good: [69a10ca747c2d2d7c0354a883335e097c067ed35] Merge branch 'acpi-pm' into pm-core
git bisect good 69a10ca747c2d2d7c0354a883335e097c067ed35
# first bad commit: [08810a4119aaebf6318f209ec5dd9828e969cba4] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
Looks like you should report this at https://bugzilla.kernel.org/enter_bug.cgi?product=Power%20Management&component=Hibernation/Suspend .
I can suspend+resume just fine with amdgpu blacklisted, so I'm under the impression that this is the right place.
That's debatable, given you bisected to a non-amdgpu commit, which affects WiFi as well.
I'll report the bug on the other site as well. In my view: Loading the amdgpu module breaks resuming from suspend. Maybe the module isn't correctly adapted to the changes made in generic subsystems earlier.
Same problem here (HP zbook 15u 5g).
https://bugzilla.kernel.org/show_bug.cgi?id=199609

Chen Yu recommended sending a request to amd-gfx@lists.freedesktop.org; no success so far.
https://lists.freedesktop.org/archives/amd-gfx/2018-May/022064.html
I investigated the commit found by git bisect a bit more and found that the following patch (which reverts part of said commit) repairs resuming. I can't tell what the consequences are; however, the commit message suggests this part is non-critical:

> While at it, make the core check pm_runtime_suspended() when
> setting power.direct_complete so that it doesn't need to be
> checked by ->prepare callbacks.

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 02a497e7c785..028c14386e5d 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1959,9 +1959,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
 	 * applies to suspend transitions, however.
 	 */
 	spin_lock_irq(&dev->power.lock);
-	dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
-		pm_runtime_suspended(dev) && ret > 0 &&
-		!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
+	dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
 	spin_unlock_irq(&dev->power.lock);
 	return 0;
 }

So, what to do with this information / potential fix?
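To see exactly where the patched condition differs from the one it replaces, the two expressions can be modelled as boolean predicates and compared over all inputs. This is only a sketch under stated assumptions: `struct prepare_inputs` and the function names are hypothetical, and each field stands in for one term of the kernel expression in the diff above rather than calling any kernel API.

```c
#include <stdbool.h>

/* Hypothetical model of the four terms that appear in the diff. */
struct prepare_inputs {
	bool is_suspend;	/* state.event == PM_EVENT_SUSPEND */
	bool runtime_suspended;	/* pm_runtime_suspended(dev) */
	bool prepare_positive;	/* ret > 0 */
	bool never_skip;	/* DPM_FLAG_NEVER_SKIP is set */
};

/* Condition on the lines removed by the patch. */
static bool direct_complete_before_patch(struct prepare_inputs in)
{
	return in.is_suspend && in.runtime_suspended &&
	       in.prepare_positive && !in.never_skip;
}

/* Condition on the line added by the patch. */
static bool direct_complete_after_patch(struct prepare_inputs in)
{
	return in.prepare_positive && in.is_suspend;
}

/* Count the input combinations on which the two predicates disagree. */
static int count_disagreements(void)
{
	int n = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct prepare_inputs in = {
			.is_suspend = bits & 1,
			.runtime_suspended = bits & 2,
			.prepare_positive = bits & 4,
			.never_skip = bits & 8,
		};

		if (direct_complete_before_patch(in) !=
		    direct_complete_after_patch(in))
			n++;
	}
	return n;
}
```

In this model the predicates disagree on 3 of the 16 combinations, all with a suspend transition and a positive ->prepare result: the patched condition also sets direct_complete when the device is not runtime-suspended or when NEVER_SKIP is set. In other words, the patch drops both the runtime-suspend check and the flag check, which may be worth mentioning when the fix is discussed with the PM maintainers.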
(In reply to Thomas Martitz from comment #15)
>
> So, what to do with this information / potential fix?

Please file a bug as per comment 10 and include that information.
Done, https://bugzilla.kernel.org/show_bug.cgi?id=199693
Hello, I don't know if this is the correct place to state this, but I have a desktop PC running Ubuntu 16.04, and I too noticed that the system won't resume from suspend after installing the amdgpu driver for the RX 560.

This happens with the AMD driver for 16.04 Xenial
https://www.amd.com/en/support/kb/release-notes/rn-prorad-lin-amdgpupro
but also with the AMDGPU-PRO driver for 18.04
https://www.amd.com/pl/support/1881

I should add that I use Kubuntu instead of Ubuntu.

4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The system is a Ryzen 5 2600 with an AMD RX 560 2 GB and 16 GB RAM. Without the AMDGPU driver, there seems to be no problem with suspend.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/380.