Bug 44772

Summary: Radeon HD6950 (Cayman): Resuming from hibernation fails sometimes
Product: DRI
Reporter: Harald Judt <h.judt>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal    
Priority: medium    
Version: XOrg git   
Hardware: All   
OS: Linux (All)   
Whiteboard:

Description Harald Judt 2012-01-14 02:26:12 UTC
Resuming from hibernation sometimes fails. After the snapshot has been read (the percentage reaches 100%), the screen should switch back, but instead the monitor goes black and turns off, and the whole system hangs. Neither ssh nor the SysRq keys work; everything is dead.

Some things I noticed that might help:
* It does not occur with nomodeset=1.
* You don't need X to reproduce this, just console with modesetting enabled is enough.
* It is not reliably reproducible, but the likelihood that it occurs is unfortunately not small (I'd say 3-4 out of 10 attempts).
* The fastboot kernel parameter is not related to it.

All of the above is with a vanilla kernel. Now enter tuxonice, because it provides more information. Tuxonice reads the atomic copy first (20%..40%..60%..80%->100%). Normally it would then switch to tuxoniceui and show the progress of reading in the caches. In the problem case, however, the switch does not happen, because the system hangs as described for the vanilla kernel (no ssh, no SysRq). At this point, one has to push the reset button to reboot. However, tuxonice has a nice feature: it detects the image and notices that a resume from this image has been attempted before. It then asks the user whether to try resuming from that same image again or to remove the image and continue booting. If one presses 'C' to resume from the image, this time everything works and the system is fully functional. In fact, it's a 100% chance that the screen does not get black or the system freezes.

The failures do not follow a fixed pattern: sometimes three hibernation->resume cycles go well, then the fourth and fifth fail, then the next four are ok, and so on...

So, I believe there is some issue with hardware initialization after hibernation. It might be that something does not finish in time. What could be wrong here?

This is with vanilla-3.2-rc5 and 3.2-rc7-tuxonice (see http://git.tuxonice.net/?p=tuxonice-head.git;a=summary), but I guess all 3.2-rc kernels will show the same problem. I had trouble hibernating with vanilla-3.2-final, so I could not test this there.
Comment 1 Harald Judt 2012-01-21 07:42:30 UTC
Still reproducible in 3.3-rc1.
Comment 2 Harald Judt 2012-02-12 16:22:50 UTC
I cannot reproduce this anymore in 3.3-rc3, setting resolved fixed.
Comment 3 Harald Judt 2012-02-19 15:32:14 UTC
Ok, just when I thought this had been fixed, the bug strikes back! I still have the same problem, and it seems I was just lucky that it did not occur until a few days ago.

I've tried to investigate further using netconsole, but unfortunately netconsole is of no use here because the network interface is down while reading the atomic copy.

The symptoms are still the same, and I cannot reproduce it with nomodeset=1.
Comment 4 Harald Judt 2012-02-21 14:13:08 UTC
As a side note: This does not occur with suspend and resume, only with hibernate and resume. Powering down vs rebooting after hibernating does not make a difference either.
Comment 5 Harald Judt 2012-02-27 11:34:39 UTC
This problem is still present in 3.3-rc5 (vanilla and tuxonice).

> In fact, it's a 100% chance that the screen does not get black or
> the system freezes.

I have to retract my previous assumption. The hang can also happen on the second, third etc. try. But if you keep trying long enough, the resume will eventually succeed.

Anyone got an idea what could be wrong here or how to get more info?
Comment 6 Harald Judt 2012-03-20 11:15:35 UTC
The bug is still present with kernel-3.3 final.

Further tests confirm that this is an issue specific to the HD6950 card. I swapped the card with a Radeon HD3650 (RV635 chipset) for testing while leaving the rest of the system configuration unchanged. With that card the problem was no longer reproducible; hibernating and resuming worked fine.

If I change the power profile back from low to default, sometimes the system will freeze immediately when hibernating. Maybe some registers still don't get initialized/updated properly?
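
For reference, the radeon driver of this era exposes profile-based power management through two sysfs files, power_method and power_profile, in the card's device directory. Below is a minimal Python sketch of switching profiles. It assumes the card is card0 (the index may differ) and that it runs as root; the function name set_power_profile is made up for illustration, and it is not confirmed that the profile was changed this way in the tests above.

```python
from pathlib import Path

# Assumption: the HD6950 is card0; adjust the index on multi-GPU systems.
DEVICE_DIR = Path("/sys/class/drm/card0/device")

# Profile names accepted by the radeon driver's profile-based power method.
PROFILES = ("default", "auto", "low", "mid", "high")


def set_power_profile(profile: str, device_dir: Path = DEVICE_DIR) -> None:
    """Select the profile-based power method, then the requested profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    (device_dir / "power_method").write_text("profile")
    (device_dir / "power_profile").write_text(profile)


# Example (as root): set_power_profile("default")
```

Switching from "low" back to "default" with this helper would correspond to the step described above as sometimes triggering the freeze.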
Comment 7 Harald Judt 2012-05-15 00:47:26 UTC
Still reproducible on linux-3.4.0-rc7.
Comment 8 Harald Judt 2012-06-14 15:02:10 UTC
Still reproducible with 3.4.0 final and "drm/radeon: fix vm deadlocks on cayman" applied, but at least I do not experience freezes anymore.
Comment 9 Harald Judt 2013-03-12 12:35:28 UTC
Still reproducible with 3.8.0.
Comment 10 Harald Judt 2013-05-26 05:35:24 UTC
Solved by setting /sys/power/pm_async to 0.
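
For anyone who wants to try the same workaround, here is a minimal Python sketch. /sys/power/pm_async is the kernel's switch for asynchronous device suspend/resume; the helper name set_pm_async is made up for illustration. Writing the file requires root, and the setting does not persist across a reboot, so it has to be applied again before each hibernation test on a freshly booted system.

```python
from pathlib import Path

# Standard sysfs location of the switch; writing to it requires root.
PM_ASYNC = Path("/sys/power/pm_async")


def set_pm_async(enabled: bool, path: Path = PM_ASYNC) -> str:
    """Write 1 or 0 to the pm_async file and return the value read back."""
    path.write_text("1" if enabled else "0")
    return path.read_text().strip()


# Example (as root): set_pm_async(False) disables asynchronous suspend/resume.
```

The path is a parameter only so that the helper can be exercised against an ordinary file without touching the real sysfs entry.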
Comment 11 Harald Judt 2013-07-11 18:52:48 UTC
Reopened because it happens with 3.10 now even with pm_async set to 0. It happens with older kernel releases too when pm_async is set to 0, but is much harder to reproduce.
Comment 12 Harald Judt 2013-07-11 19:07:59 UTC
Just for reference: Since I've never got a response here, I've opened a bug report at kernel bugzilla in the hope of someone helping me collect more debug data: https://bugzilla.kernel.org/show_bug.cgi?id=57381

Since everything's documented there, I'll finally close this bug.
