Bug 100979

Summary: Radeon r4 on a6-6310(BEEMA) APU hard lockup on hibernate and on second resume from suspend
Product: DRI Reporter: Przemek <soprwa>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: medium CC: bugzilla, daniel, FD
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description | Flags
system log after clean boot | none
system log after performing hibernate | none
kernel config file | none
system log after first suspend | none
system log after second suspend and hard lockup | none
dmesg after first suspend/resume process kernel 4.12 | none
kernel log during hibernate | none

Description Przemek 2017-05-09 21:11:06 UTC
I'm using kernel 4.11 on Gentoo with SI/CIK support enabled, on a Lenovo G50-45 notebook. The machine has an AMD A6-6310 APU with Radeon R4 graphics (Beema/Mullins). This CPU supports only AMD IOMMU v1. There is no discrete graphics card in it, only the APU.

When I try to hibernate this notebook, it does not turn off. I have to press the power button to reboot the machine. The situation is similar with the "radeon" driver, and I have submitted a bug report about that on the kernel's Bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=191571

In this case, however, I am unable to bisect because, if I remember correctly, the problem has always occurred with the amdgpu driver, so I think the two bugs may be related (dmesg from the hibernation process is attached).

As for suspend:
I can suspend and resume the machine successfully only once. The second time, the machine suspends correctly, but on resume I get a hard lockup: the fans spin at full RPM and I cannot do anything but press the power button to cold-boot the netbook.
Moreover, after the first suspend I get these error messages in dmesg:

[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
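
For anyone comparing the attached logs, a minimal Python sketch like the following could pull such lines out of a saved dmesg file. The function name `drm_error_lines` and the regular expression are assumptions made for illustration; they are not part of any kernel tool, and the pattern only covers the `[drm:<function> [<module>]] *ERROR* <message>` layout quoted in this report.

```python
import re

# Matches DRM error lines such as:
#   [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
# Assumption: the "[drm:<function> [<module>]] *ERROR* <message>" layout seen in
# this report; the " [<module>]" part is treated as optional.
DRM_ERROR = re.compile(
    r"\[drm:(?P<func>\w+)(?: \[(?P<module>\w+)\])?\] \*ERROR\* (?P<msg>.*)"
)


def drm_error_lines(log_text):
    """Return (function, message) pairs for every DRM *ERROR* line in log_text."""
    found = []
    for line in log_text.splitlines():
        match = DRM_ERROR.search(line)
        if match:
            found.append((match.group("func"), match.group("msg").strip()))
    return found


if __name__ == "__main__":
    sample = (
        "[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed\n"
        "[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed\n"
    )
    for func, msg in drm_error_lines(sample):
        print(f"{func}: {msg}")
```

Running the same function over the log taken after a clean boot and the log taken after the first resume would show which errors appear only after suspend.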

Attachments:
1. Log after clean start.
2. Kernel's config file.
3. Log after hibernation process.
4. Log after first suspend/resume.
5. Log after second suspend/resume.
Comment 1 Przemek 2017-05-09 21:12:13 UTC
Created attachment 131281 [details]
system log after clean boot
Comment 2 Przemek 2017-05-09 21:13:18 UTC
Created attachment 131282 [details]
system log after performing hibernate
Comment 3 Przemek 2017-05-09 21:14:32 UTC
Created attachment 131283 [details]
kernel config file
Comment 4 Przemek 2017-05-09 21:15:46 UTC
Created attachment 131284 [details]
system log after first suspend
Comment 5 Przemek 2017-05-09 21:17:57 UTC
Created attachment 131285 [details]
system log after second suspend and hard lockup
Comment 6 Daniel 2017-06-02 15:33:28 UTC
The situation here may be related:

Hibernating (suspend-to-disk, S4) on my new AMD Carrizo box fails.

Upon hibernate, the system just reboots. No hints in the logs.

Suspend-to-RAM/S3 works fine.

When I disable amdgpu, hibernate & resume work fine.

Linux-4.11.3-gentoo-x86_64-AMD_A12-9800_RADEON_R7,_12_COMPUTE_CORES_4C+8G-with-gentoo-2.3, sys-kernel/linux-firmware-20170519

Cheers, Daniel
Comment 7 Daniel 2017-06-28 08:58:33 UTC
Now on kernel 4.11.7 and linux-firmware-20170622, still no good.
Comment 8 Przemek 2017-07-03 16:07:46 UTC
Created attachment 132411 [details]
dmesg after first suspend/resume process kernel 4.12

The situation still persists on kernel 4.12. After the second suspend, the machine cannot resume.
I've attached dmesg after the first suspend/resume process.

If I can help or test patches, please let me know.
Thanks for your effort,
Przemek.
Comment 9 Przemek 2018-01-31 10:55:44 UTC
I have just upgraded the kernel to 4.15.
There is big progress. The laptop can now successfully suspend (S3) and resume many times in a row.
 
_Thank you very much for your hard work_.

Unfortunately, hibernate to disk (S4) still does not work as expected. It causes a hard lockup (system freeze) on the very first attempt.

The display goes black (the backlight stays on), the CPU gets hot (the fans run at 100% RPM), and I can do nothing more than press the power button to hard-reset the machine.

The "amdgpu_atombios_dp_link_train" messages no longer appear in dmesg. Instead, there are messages related to "swiotlb buffer is full" and "swiotlb: coherent allocation failed", as in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=104082.

Thanks,
Przemek
Comment 10 Przemek 2018-01-31 11:29:15 UTC
After some research, I think that the messages "swiotlb buffer is full" and "swiotlb: coherent allocation failed" are not related to this bug:

https://lkml.org/lkml/2018/1/16/106
Comment 11 Przemek 2018-01-31 12:12:18 UTC
Created attachment 137085 [details]
kernel log during hibernate

Kernel log taken during the hibernate process. The netbook was booted with the kernel command-line parameters "initcall_debug" and "no_console_suspend".
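
To confirm that both parameters are active before reproducing the problem, a small Python sketch along these lines could be used. The helper name `missing_debug_params` is hypothetical; the only system interface it relies on is `/proc/cmdline`, which holds the command line of the running kernel on Linux.

```python
from pathlib import Path

# The two debugging parameters named in this comment.
DEBUG_PARAMS = ("initcall_debug", "no_console_suspend")


def missing_debug_params(cmdline):
    """Return the debug parameters that are absent from a kernel command line string."""
    tokens = cmdline.split()
    return [param for param in DEBUG_PARAMS if param not in tokens]


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    # Only inspect the running kernel when /proc/cmdline is available (Linux).
    if proc_cmdline.exists():
        missing = missing_debug_params(proc_cmdline.read_text())
        if missing:
            print("not set:", ", ".join(missing))
        else:
            print("both debugging parameters are set")
```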
Comment 12 Przemek 2018-02-06 23:30:34 UTC
The correct mailing list post regarding the messages "swiotlb buffer is full" and "swiotlb: coherent allocation failed" is: https://lkml.org/lkml/2018/1/10/132. Thanks to Alex Deucher for correcting me in another bug report.
Comment 13 Przemek 2018-03-28 10:54:14 UTC
While working on another bug report concerning the AMD DC kernel driver, I cloned the agd5f drm-next-4.17-wip branch, so I took the opportunity to debug the hibernate process with this experimental kernel as well.

This time the hibernation image is created, and the laptop turns off afterwards.

However, during the hibernation process the screen goes completely white, and the process takes approximately 15-20 seconds on an SSD before the netbook powers off.

During resume, the hibernation image is read (the percentage is visible), then the screen goes white and the machine locks up. I can do nothing but press the power button to hard-reset the netbook.

Moreover, only once did I find these error messages in the kernel log:

"Mar 28 01:36:54 eclipse kernel: [   86.665473] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:39:crtc-0] flip_done timed out
Mar 28 01:36:54 eclipse kernel: [   86.828685] [drm:gfx_v7_0_ring_test_ring] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
Mar 28 01:36:54 eclipse kernel: [   86.828698] [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <gfx_v7_0> failed -22
Mar 28 01:36:54 eclipse kernel: [   86.828703] [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-22).
Mar 28 01:36:54 eclipse kernel: [   86.828718] dpm_run_callback(): pci_pm_restore+0x0/0xa0 returns -22
Mar 28 01:36:54 eclipse kernel: [   86.828742] PM: Device 0000:00:01.0 failed to restore async: error -22"

Of course, I am unable to reproduce it. I am not sure how closely it is related, but it could be useful.
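
The "-22" in the log quoted above is a negated errno value, as kernel functions conventionally return. A minimal Python sketch for decoding such a line might look like this; the function name `describe_pm_failure` and the regular expression are assumptions for illustration, modelled only on the "PM: Device ... failed to ..." line quoted in this comment.

```python
import errno
import re

# Matches the PM core failure line quoted above, e.g.
#   PM: Device 0000:00:01.0 failed to restore async: error -22
# Assumption: the layout is taken from this one log and may differ between kernels.
PM_FAILURE = re.compile(
    r"PM: Device (?P<device>\S+) failed to (?P<stage>\w+)(?: async)?: error (?P<code>-?\d+)"
)


def describe_pm_failure(line):
    """Return (device, stage, errno name) for a PM failure line, or None if no match."""
    match = PM_FAILURE.search(line)
    if match is None:
        return None
    # The kernel reports negative errno values, so drop the sign before the lookup.
    code = abs(int(match.group("code")))
    name = errno.errorcode.get(code, "unknown")
    return match.group("device"), match.group("stage"), name


if __name__ == "__main__":
    line = "[   86.828742] PM: Device 0000:00:01.0 failed to restore async: error -22"
    print(describe_pm_failure(line))
```

For the line in this report, error 22 maps to EINVAL, reported for device 0000:00:01.0 during the restore stage.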

The kernel was booted with amdgpu.dc=1.

Any help is appreciated.

Thanks,
Przemek.
Comment 14 Przemek 2019-02-15 15:10:32 UTC
I have tested hibernation on amd-staging-drm-next (git status - commit fa16d1eb6a78b265480bd4c2b8739c1ea261cdd8) and it works as it should (with a minor glitch).

Moreover, I can suspend and resume the machine many times in a row without a freeze or lockup.

I have no idea which commit made things work again, because I haven't checked this feature lately.

The only problem is that the second monitor, connected to the HDMI output, turns off after suspend/hibernate, and the eDP screen brightness is maxed out after resume from hibernate, but that is outside the scope of this report.

Given the above I'm closing this bug report as RESOLVED/FIXED.

Thank you very much,
Przemek.
Comment 15 Przemek 2019-02-16 09:20:37 UTC
Just a correction:

after resume from hibernate, both screens light up as they should (eDP and HDMI).
