Bug 26872 - Kernel 2.6.33 fails to suspend (bisected)
Status: NEW
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg 6.7.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: Default DRI bug account
Reported: 2010-03-03 13:37 UTC by Nix
Modified: 2010-04-02 18:11 UTC (History)

Attachments
Log of bisection of failure; I believe all but the last few lines, but repeating those last few bisections dumps me back in the same ridiculous place again (2.82 KB, text/plain)
2010-03-03 13:37 UTC, Nix
Kernel config on crashing machine ('make oldconfig' from working 2.6.32 configuration) (57.23 KB, text/plain)
2010-03-03 13:38 UTC, Nix

Description Nix 2010-03-03 13:37:45 UTC
Created attachment 33739 [details]
Log of bisection of failure; I believe all but the last few lines, but repeating those last few bisections dumps me back in the same ridiculous place again

I found to my unhappiness that suspension locks up solid in the atomic copy/restore phase, on my x86-64 KMS system. There's no need to start X or do anything 3D: I can reproduce this from a framebuffer console login prompt. The fault is plainly Radeon KMS's: compile it out and suspension works fine.

Nothing is logged on the netconsole, even with verbose PM debugging on.

The graphics card is an HD4870, and suspension mostly worked with it in 2.6.32 (there are circumstances in which TuxOnIce does two suspensions without an intervening resume, and those have always caused Radeon KMS to lock up).

My attempts to bisect it were somewhat hampered by *another* suspend-resume bug with similar symptoms (for me, a triple flash of the caps-lock light followed by a spontaneous reboot, at atomic copy/restore time), fixed by commit 9270eb1b496cb002d75f49ef82c9ef4cbd22a5a0. (The log for this commit helpfully didn't mention suspend/resume at all, only the bug number, so my grepping checks were fruitless and I wasted six hours bisecting to a fixed bug. Bah.)

Unfortunately, my later attempt to bisect to the start of the freeze that I see in 2.6.33 failed, dumping me on a PowerPC commit. It's all completely reproducible -- bisection log attached -- but I'm sufficiently unconfident of it that I'll reproduce it again tomorrow with a few skips in there to see if I get any better results. (It *is* clear that I see a working 2.6.32, then f.d.o bug 25733, then a period of working suspend, and then a period of hard lockup which persists until 2.6.33.)

If there's anything I can do to help debug a hard lockup like this, please say. I do have a second machine available to debug the first, but if the first is *dead* it's hard to do anything...
Comment 1 Nix 2010-03-03 13:38:43 UTC
Created attachment 33740 [details]
Kernel config on crashing machine ('make oldconfig' from working 2.6.32 configuration)
Comment 2 Nix 2010-03-04 15:40:02 UTC
I re-bisected with more care and found it. Unfortunately it's a regression from a major API rework :/

The faulty commit pair is

commit ca262a9998d46196750bb19a9dc4bd465b170ff7
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Tue Dec 8 15:33:32 2009 +0100

    drm/ttm: Rework validation & memory space allocation (V3)

commit 312ea8da049a1830aa50c6e00002e50e30df476e
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Mon Dec 7 15:52:58 2009 +0100

    drm/radeon/kms: Convert radeon to new TTM validation API (V2)

Before these commits, suspension works. Afterwards, instead of suspension I see a quintuple(?) flash of the caps lock light and a hard reboot. I'm not certain what this means: a triple fault?

The second change in behaviour, between abrupt reboot on suspend and hard hang, I haven't yet fully bisected (forty-three reboots in one evening was quite enough), but it lies in the range 2c761270d5520dd84ab0b4e47c24d99ff8503c38..004b35063296b6772fa72404a35b498f1e71e87e.
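
A minimal sketch of how that remaining range could be narrowed, assuming a linear list of commits and a manual test that classifies each one. It treats "works" and "abrupt reboot" alike as the old behaviour and searches only for the first hard hang, and it steps around commits that cannot be tested. The function name, the commit labels and the classify callback are hypothetical; real kernel history is a DAG, which git bisect handles and this sketch does not.

```python
from typing import Callable, Dict, Optional, Sequence


def first_new_behaviour(
    commits: Sequence[str],
    classify: Callable[[str], Optional[str]],
    new_state: str = "hang",
) -> str:
    """Return the first commit classified as new_state.

    Assumes commits[0] shows the old behaviour, commits[-1] shows the new
    one, and the behaviour changes exactly once in between. classify()
    returns None for a commit that cannot be tested (a "skip").
    """
    lo, hi = 0, len(commits) - 1  # lo: known old, hi: known new
    cache: Dict[int, Optional[str]] = {}

    def state_of(i: int) -> Optional[str]:
        # Each classification costs a reboot, so never test a commit twice.
        if i not in cache:
            cache[i] = classify(commits[i])
        return cache[i]

    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Untested candidates strictly between lo and hi, nearest to mid first.
        candidates = sorted(range(lo + 1, hi), key=lambda i: abs(i - mid))
        probe = next((i for i in candidates if state_of(i) is not None), None)
        if probe is None:
            # Only untestable commits remain, so the culprit is commits[hi]
            # or one of the skipped commits just before it.
            break
        if state_of(probe) == new_state:
            hi = probe
        else:
            lo = probe
    return commits[hi]
```

With a real tree, classify() would stand for "build this commit, boot it, attempt a suspend, and note what happened", so every call is one reboot.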
Comment 3 Nix 2010-03-15 15:21:21 UTC
Crash persists with 2.6.33.1+tip of airlied's drm-2.6.git/drm-radeon-testing git tree.
Comment 4 Nix 2010-03-31 04:19:11 UTC
Comparison with other Radeon and KMS users indicates that this is almost certainly r6xx-r7xx-specific. (rv3xx works, r3xx works, Intel works.)

(Well, either that or I'm the only person in the world who's seeing it: I haven't yet found anyone else with an r6xx or r7xx who's trying to suspend.)
Comment 5 Nix 2010-04-02 10:08:43 UTC
2.6.33.2+tip fails.
Comment 6 Nix 2010-04-02 18:11:51 UTC
This is now kernel bug 15685.