Bug 60439

Summary: Suspend/resume broken for cayman with 90a51a329258e3c868f6
Product: DRI
Reporter: Harald Judt <h.judt>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
Severity: major
Priority: medium
CC: alexandre.f.demers, dcherkassov, florian
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=56139
Attachments:
  possible fix (flags: none)
  ogv video showing corruption (1.4MiB) (flags: none)

Description Harald Judt 2013-02-07 19:13:38 UTC
In all kernels newer than 3.6.x, suspend & resume is broken for cayman. It works fine in 3.6.11 and lower. Last tested release where issue is reproducible is 3.8.0-rc6.

Suspend seems to work ok as nothing hangs, but on resume the console appears with a blinking cursor at the top left corner, then after some seconds the video signal seems to be turned off and the monitor goes into powersave mode. SSH doesn't work, machine seems frozen. Hello, reset button!

I can bisect but would appreciate it if you could tell me which commits count as possible bad candidates.

(Also I noticed that in 3.8.0-rc6, the compiz-0.8 splash screen is garbled, which it is not in 3.7.x and 3.6.x, but that's a different issue.)
Comment 1 Alex Deucher 2013-02-08 00:24:31 UTC
Can you bisect?  I can't think of any commits in particular that would cause an issue with suspend and resume.
Comment 2 Harald Judt 2013-02-08 11:15:51 UTC
Seems to have been introduced by a change between 3.7-rc1 and 3.7-rc2 (ddffeb8c4d0331..6f0c0580b7). Don't have time to track that further now.
Maybe it doesn't have to do with suspend/resume directly, but is caused by it as some sort of side-effect?

The following commits fall within that range (only one specific to cayman), I guess I can skip some of them:

bd6126b - drm: radeon: fix printk format warning (4 months ago)
8ad33cd - drm/radeon: fix spelling typos in debugging output
0829184 - drm/radeon: Don't destroy I2C Bus Rec in radeon_ext_tmds_enc_destroy().
3691fee - drm/radeon: check if pcie gen 2 is already enabled (v2)
c1a7ca0 - drm/radeon/cayman: set VM max pfn at MC init
13e55c3 - drm/radeon: separate pt alloc from lru add
d72d43c - drm/radeon: don't add the IB pool to all VMs v2
90a51a3 - drm/radeon: allocate page tables on demand v4
23d4f1f - drm/radeon: update comments to clarify VM setup (v2)
29dbe3b - drm/radeon: allocate PPLLs from low to high
cd23492 - drm/radeon: fix compilation with backlight disabled <Alex Deucher>
a187193 - drm/radeon: use %zu for formatting size_t
Comment 3 Michel Dänzer 2013-02-08 14:20:02 UTC
(In reply to comment #2)
> The following commits fall within that range (only one specific to cayman),
> I guess I can skip some of them:

Please don't. In the best case, you save a couple of tests, in the worst case the bisection result is invalid and you have to start over again.
Comment 4 Harald Judt 2013-02-08 15:07:48 UTC
I will bisect the week after next; until then I will not have access to that machine. I'd rather avoid bisecting standby/resume problems too much because that could possibly cause data loss, and it is not possible to do in a VM.

BTW: I do not have such issues on a laptop with an ATI RV620 [Mobility Radeon HD 3400 Series], so this only seems to affect cayman.
Comment 5 Harald Judt 2013-02-19 15:42:15 UTC
I did bisect between ddffeb8c4d0331..6f0c0580b7, skipping only "23d4f1f - drm/radeon: update comments to clarify VM setup (v2)" which is a trivial step IMO. The offending commit is this:

commit 90a51a329258e3c868f6f4c1fb264ca01c590c57
Author: Christian König <deathsimple@vodafone.de>
Date:   Tue Oct 9 13:31:17 2012 +0200

    drm/radeon: allocate page tables on demand v4
    
    Based on Dmitries work, but splitting the code into page
    directory and page table handling makes it far more
    readable and (hopefully) more reliable.
    
    Allocations of page tables are made from the SA on demand,
    that should still work fine since all page tables are of
    the same size.
    
    Also using the fact that allocations from the SA are mostly
    continuously (except for end of buffer wraps and under very
    high memory pressure) to group updates send to the chipset
    specific code into larger chunks.
    
    v3: mostly a rewrite of Dmitries previous patch.
    v4: fix some typos and coding style
    
    Signed-off-by: Dmitry Cherkasov <Dmitrii.Cherkasov@amd.com>
    Signed-off-by: Christian König <deathsimple@vodafone.de>
    Tested-by: Michel Dänzer <michel.daenzer@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Trying to revert this commit for 3.7-rc2 using git revert failed.
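
As a rough illustration of why skipping candidates saves so little (see comment 3), here is a minimal Python sketch of the binary search that git bisect performs, run over the twelve abbreviated hashes listed in comment 2. It assumes a linear history, with the list reversed so that the oldest commit comes first, and `is_bad` is a hypothetical stand-in for the manual "build, suspend, resume" test. Real bisection walks the commit graph and may need a different number of steps.

```python
# Abbreviated hashes copied from comment 2. Treating them as a linear
# history, oldest first, is an assumption made purely for illustration.
commits = [
    "a187193", "cd23492", "29dbe3b", "23d4f1f", "90a51a3", "d72d43c",
    "13e55c3", "c1a7ca0", "3691fee", "0829184", "8ad33cd", "bd6126b",
]

# The commit identified as the culprit in comment 5.
CULPRIT = "90a51a3"


def is_bad(commit):
    """Hypothetical test: stands in for booting a kernel built at `commit`
    and checking whether resume fails. Every commit at or after the
    culprit is treated as bad."""
    return commits.index(commit) >= commits.index(CULPRIT)


def first_bad(candidates, test):
    """Binary search for the first bad commit.

    Assumes the last candidate is bad and that once a commit is bad,
    all later ones are bad too. Returns (commit, number_of_tests)."""
    lo, hi = 0, len(candidates) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if test(candidates[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return candidates[lo], tests


culprit, tests = first_bad(commits, is_bad)
print(culprit, tests)
```

Under these assumptions the search needs four tests for twelve candidates, and dropping one candidate from the list still leaves a worst case of four, so skipping a commit by hand gains little while risking an invalid result.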
Comment 6 Alexandre Demers 2013-02-20 05:38:01 UTC
I think this is the same as bug 56139. I'll be watching what's going on here and I'm ready to test if anything comes out.
Comment 7 Harald Judt 2013-03-01 21:12:40 UTC
What makes you think bug 56139 and this one are related? Can you reproduce the suspend/resume problems introduced by this commit?
Comment 8 Alex Deucher 2013-03-11 13:38:27 UTC
*** Bug 56139 has been marked as a duplicate of this bug. ***
Comment 9 Alex Deucher 2013-03-11 13:46:46 UTC
Does switching to another VT before suspending help?
Comment 10 Christian König 2013-03-11 14:11:52 UTC
Going to look into it as soon as I have time. We are probably just failing to reset something after resume.

Christian.
Comment 11 Dmitry Cherkassov 2013-03-11 15:00:12 UTC
Hi.

I can't reproduce this bug on cayman with 3.8-rc3 (using pm-suspend).
Resuming works.

What software do you use? 
I've used xmonad launched via legacy startx while testing that.

Can you fire up a simple twm session without compiz, etc stuff and see how it goes?
Is it possible to get dmesg output (via serial cable or smth)?
Comment 12 Alexandre Demers 2013-03-11 15:05:31 UTC
(In reply to comment #11)
> Hi.
> 
> I can't reproduce this bug on cayman with 3.8-rc3 (using pm-suspend).
> Resuming works.
> 
> What software do you use? 
> I've used xmonad launched via legacy startx while testing that.
> 
> Can you fire up a simple twm session without compiz, etc stuff and see how
> it goes?
> Is it possible to get dmesg output (via serial cable or smth)?

See bug 56139, comments 51 and 52. Suspending/resuming from the console or XFCE doesn't trigger the bug. However, doing the same under KDE or Gnome-Shell does exhibit it.
Comment 13 Alex Deucher 2013-03-11 19:35:12 UTC
Created attachment 76349
possible fix

This patch should fix the issue.
Comment 14 Alexandre Demers 2013-03-11 21:58:06 UTC
(In reply to comment #13)
> Created attachment 76349
> possible fix
> 
> This patch should fix the issue.

It does over here.
Comment 15 Harald Judt 2013-03-12 12:33:36 UTC
Created attachment 76397
ogv video showing corruption (1.4MiB)

I confirm it fixes the problem, thanks for the patch.

However, it still doesn't fix bug #44772; could there be anything else missing? The symptoms are almost the same, except that #44772 only occurs (sporadically) when hibernating/resuming, not on suspend/resume.

Further, corruption of the compiz splash screen is not fixed with this patch (as expected of course). It appears scrambled in some checkerboard fashion, see the attached ogv I recorded with recordmydesktop. It happens with 3.8.0 kernel, not sure about 3.7 (I do not think so) and definitely not with 3.6, all other components being unchanged and at current git.
Comment 16 Alex Deucher 2013-03-12 12:38:57 UTC
(In reply to comment #15)
> Further, corruption of the compiz splash screen is not fixed with this patch
> (as expected of course). It appears scrambled in some checkerboard fashion,
> see the attached ogv I recorded with recordmydesktop. It happens with 3.8.0
> kernel, not sure about 3.7 (I do not think so) and definitely not with 3.6,
> all other components being unchanged and at current git.

That issue is probably bug 60802.
Comment 17 Alexandre Demers 2013-03-12 15:17:33 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > Further, corruption of the compiz splash screen is not fixed with this patch
> > (as expected of course). It appears scrambled in some checkerboard fashion,
> > see the attached ogv I recorded with recordmydesktop. It happens with 3.8.0
> > kernel, not sure about 3.7 (I do not think so) and definitely not with 3.6,
> > all other components being unchanged and at current git.
> 
> That issue is probably bug 60802.

According to the attached ogv, it is exactly what I'm experiencing in bug 60802 and it should be tracked there, not in the current bug.
Comment 18 Florian Mickler 2013-03-24 10:21:20 UTC
A patch referencing this bug report has been merged in Linux v3.9-rc4:

commit fa3daf9aa74a3ac1c87d8188a43d283d06720032
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Mar 11 15:32:26 2013 -0400

    drm/radeon: fix S/R on VM systems (cayman/TN/SI)
Comment 19 Michel Dänzer 2013-03-25 11:34:01 UTC
Resolving per comment #18.
