Bugzilla – Bug 35648
[SNB] Kernel may crash after resume from S4 for several times
Last modified: 2012-04-10 22:20:13 UTC
Created attachment 44811 [details]
Bug detailed description:
Kernel crashed when doing Suspend-To-Disk (S4) several times in X mode on sugarbay. This is a regression. The last known good commit is kernel (2.6.37) 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5.
2. echo disk > /sys/power/state
3. repeat step 2 about 2-20 times
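The repeat step can be scripted. A minimal sketch, assuming a POSIX shell run as root; the helper name `hibernate_cycles` and its arguments are hypothetical and not part of the original report:

```shell
#!/bin/sh
# Hypothetical helper: run COUNT hibernate (S4) cycles by writing "disk"
# to a power-state file.
#   $1 = number of cycles
#   $2 = state file (defaults to /sys/power/state; a scratch file gives a dry run)
#   $3 = seconds to pause after each cycle (defaults to 0)
hibernate_cycles() {
    count=$1
    state_file=${2:-/sys/power/state}
    pause=${3:-0}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i/$count"
        # On a real system this write should return only after resume.
        echo disk > "$state_file" || return 1
        sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, `hibernate_cycles 20 /sys/power/state 10` would attempt 20 cycles with a 10 second pause after each resume. If the machine cannot wake itself, each resume still needs a manual power-on.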
Can you do some sanity checking with the same kernel but S4 from VT and S4 without i915.ko?
Created attachment 44931 [details]
Tested about 50 times with the same kernel; it also crashed on S4 from VT. It didn't happen on S4 without i915.ko.
Does Zhenyu's SNB resume patches help here? [Would be good to get some testing on those at any rate.]
Could you tell me where I can find Zhenyu's SNB resume patches?
You can directly download patches from http://people.freedesktop.org/~zhen/snb_desk_suspend_0323/
And note that Nanhai is working on a workaround that needs to be applied for the SNB render engine after a power cycle. You should ask him to test his patch too.
Kernel also crashed with Zhenyu's patches.
Xun, would you please help bisect? It sounds like you can easily reproduce the hang in 'several rounds' of S4...
S4 on SugarBay SDV was pretty stable, Rui@ACPI team once tested 1000+ times..
It seems this is not a regression. I retested about 60 times with kernel (2.6.37) 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5 and found that it also crashes.
BTW, I retested S4 without i915.ko about 100 times and no crash happened.
Adjusting priority fields to reflect severity and impact. We still need to fix it, it will just take a little longer if it was not due to a recent regression.
Promoting to P1 for Q2 release consideration.
Xun, how many of our SNB machines have this problem?
(In reply to comment #11)
> Promoting to P1 for Q2 release consideration.
> Xun, how many of our SNB machines have this problem?
It happens on our two sugarbay machines.
x-sgb1: SugarBay Qual SDP (DH): i7-2600 D2 (id=0x0102, rev 09), H67 B1 (Intel DH67CL rev 03), and Host Bridge id=0x0100 (rev09)
x-sgb3: SugarBay desktop: i5-2500K product (id=0x0112, rev 09), H67 B1 SDP (rev 03), and Host Bridge id=0x0100 (rev09)
The panic happens in d_move, which makes me think we're clobbering filesystem state somehow. Is the backtrace you get consistent? Does it still happen with 3.0-rc2?
Highly unlikely to be fixed before release.
I tend to put it in the P1 list -- even if we can't fix it in this release, we need to maintain it as a known issue in the release notes.
Xun, can you answer Jesse's question (comment#13), by running the latest drm-intel-fixes?
It still happens with the latest drm-intel-fixes kernel (3.0.0rc7).
The backtrace seems to be different from the previous one. Below is the call trace.
kernel: [<ffffffff8110d9ce>] ? __sync_filesystem+0x75/0x75
kernel: [<ffffffff8110d99b>] __sync_filesystem+0x42/0x75
kernel: [<ffffffff8110d9df>] sync_one_sb+0x11/0x13
kernel: [<ffffffff810ed4c6>] iterate_supers+0x67/0xb7
kernel: [<ffffffff8110da21>] sys_sync+0x40/0x57
kernel: [<ffffffff81070f2c>] hibernate+0x88/0x1b8
kernel: [<ffffffff8106fa4c>] state_store+0x57/0xce
kernel: [<ffffffff811df993>] kobj_attr_store+0x17/0x19
kernel: [<ffffffff811420a2>] sysfs_write_file+0x10c/0x148
kernel: [<ffffffff810ebd0d>] vfs_write+0xae/0x153
kernel: [<ffffffff810ebe6b>] sys_write+0x45/0x6c
kernel: [<ffffffff813bf6fb>] system_call_fastpath+0x16/0x1b
kernel: Code: 48 c7 c7 40 23 69 81 45 31 ed e8 b3 f6 2a 00 49 8b 9c 24 c0 00 00 00 49 81 c4 c0 00 00 00 48 81 eb 90 00 00 00 eb 7b 4c 8d 73 20 <4c> 8b bb 48 01 00 00 4c 89 f7 e8 88 f6 2a 00 f6 43 28 38 75 07
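To check whether backtraces are consistent between crashes, each trace can be reduced to its list of function names and the lists compared. A rough sketch; the helper name `trace_symbols` is hypothetical, and it assumes trace lines in the `symbol+0xOFF/0xLEN` form shown above:

```shell
# Hypothetical helper: read a kernel log on stdin and print one function
# name per call-trace line, dropping addresses, "?" markers and offsets.
# Lines without a symbol+0xOFF/0xLEN entry (such as "Code:") are skipped.
trace_symbols() {
    sed -n 's|.* \([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*.*|\1|p'
}
```

Running `trace_symbols` over two saved logs and comparing the results with `diff` shows at a glance whether the crash path is the same.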
Jesse, after setting no_console_suspend, I get more info out, please check the attached setting no_console_suspend.log.
(In reply to comment #13)
> The panic happens in d_move, which makes me think we're clobbering filesystem
> state somehow. Is the backtrace you get consistent? Does it still happen with
Created attachment 49485 [details]
Those backtraces look unrelated to gfx; does the same panic occur even without i915 loaded (you'll need to use netconsole to see it still)?
Panic still occurs with kernel 3.1.0-rc1. It doesn't occur without i915 loaded.
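For reference, netconsole is configured with a single parameter string naming the local and remote endpoints. A minimal sketch that assembles it; the helper name `netconsole_param` is hypothetical, and every port, address and interface in the usage note is a placeholder, not a value from this report:

```shell
# Hypothetical helper: build a netconsole parameter string of the form
#   netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
# Arguments: src_port src_ip dev tgt_port tgt_ip tgt_mac
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

The resulting string can go on the kernel command line or be passed when loading the module, for example `modprobe netconsole "$(netconsole_param 6665 192.168.0.2 eth0 6666 192.168.0.1 00:11:22:33:44:55)"`, with a UDP listener such as netcat running on the target machine (the exact netcat flags depend on the variant installed).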
Could you please check if those issues happen if you disable modesetting (e.g., boot with 'nomodeset' kernel parameter)?
The issue goes away by using 'nomodeset' kernel parameter. It still happens when modeset is used.
It still fails on SandyBridge with kernel 3.1 (c3b92c8787367a8bb53d57d9789b558f1295cc96). I don't see this on IvyBridge.
*** Bug 35846 has been marked as a duplicate of this bug. ***
Created attachment 57170 [details] [review]
freeze workqueue on suspend
Could you please try with this patch and verify if it changes anything?
Sorry, the bug still exists on 3.2.4 with that patch.
(In reply to comment #25)
> Created attachment 57170 [details] [review]
> freeze workqueue on suspend
> Could you please try with this patch and verify if it changes anything?
Could you please try Dave's patch from https://lkml.org/lkml/2012/3/29/72 (the patch itself is http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=3fa016a0b5c5237e9c387fc3249592b2cb5391c6)? I am fairly sure it could solve this.
We believe we finally have the root cause of so many crashes following hibernation. Please update and test, thanks.
Author: Dave Airlie <firstname.lastname@example.org>
Date: Wed Mar 28 10:48:49 2012 +0100
drm/i915: suspend fbdev device around suspend/hibernate
Looking at hibernate overwriting I thought it looked like a cursor,
so I tracked down this missing piece to stop the cursor blink
timer. I've no idea if this is sufficient to fix the hibernate
problems people are seeing, but please test it.
Both radeon and nouveau have done this for a long time.
I've run this personally all night hib/resume cycles with no fails.
Reviewed-by: Keith Packard <email@example.com>
Reported-by: Petr Tesarik <firstname.lastname@example.org>
Reported-by: Stanislaw Gruszka <email@example.com>
Reported-by: Lots of misc segfaults after hibernate across the world.
Tested-by: Dave Airlie <firstname.lastname@example.org>
Tested-by: Bojan Smojver <email@example.com>
Tested-by: Andreas Hartmann <firstname.lastname@example.org>
Signed-off-by: Dave Airlie <email@example.com>
It works fine. No crash happens. Verified with drm-intel-fixes commit 14667a4bde4361b7ac420d68a2e9e9b9b2df5231.