Summary: | [SNB] Kernel may crash after resume from S4 for several times
---|---
Product: | DRI
Component: | DRM/Intel
Status: | CLOSED FIXED
Severity: | critical
Priority: | high
Version: | unspecified
Hardware: | All
OS: | Linux (All)
Reporter: | fangxun <xunx.fang>
Assignee: | Chris Wilson <chris>
QA Contact: |
CC: | bryce, eugeni, jbarnes, nanhai.zou, yuanhan.liu, zhenyu.z.wang
Whiteboard: |
i915 platform: |
i915 features: |
Bug Depends on: |
Bug Blocks: | 42991, 44622
Attachments: |
Can you do some sanity checking with the same kernel but S4 from VT and S4 without i915.ko?

Created attachment 44931 [details]
dmesg_s4_from_vt

Tested about 50 times with the same kernel; it also crashed on S4 from VT. It didn't happen on S4 without i915.ko.
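For the "without i915.ko" comparison it helps to confirm, before each run, whether the driver is actually loaded. The following is a minimal sketch, not something posted in this thread: `module_loaded` is a hypothetical helper name, and it assumes i915 is built as a module so that it shows up in /proc/modules.

```shell
#!/bin/sh
# Sketch: report whether a kernel module is currently loaded.
# module_loaded is an illustrative helper, not part of the original report.
module_loaded() {
    # $1: module name
    # $2: module list to read (defaults to /proc/modules)
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q "^$1 " "${2:-/proc/modules}"
}
```

Calling `module_loaded i915 && echo "i915 loaded"` before writing to /sys/power/state records which of the two cases a given run belongs to.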
Do Zhenyu's SNB resume patches help here? [Would be good to get some testing on those at any rate.]

Could you tell me where I can find Zhenyu's SNB resume patches?

You can download the patches directly from http://people.freedesktop.org/~zhen/snb_desk_suspend_0323/
Note also that Nanhai is working on a workaround that needs to be applied to the SNB render engine after a power cycle. You should ask him to test his patch too.

The kernel also crashed with Zhenyu's patches.

Xun, would you please help bisect? It sounds like you can easily reproduce the hang in 'several rounds' of S4... S4 on the SugarBay SDV was pretty stable; Rui from the ACPI team once tested it 1000+ times.

It seems that this is not a regression. I retested about 60 times with kernel (2.6.37) 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5 and found that it also crashes. BTW, I retested S4 without i915.ko about 100 times and no crash happened.

Adjusting priority fields to reflect severity and impact.

We still need to fix it; it will just take a little longer if it is not due to a recent regression.

Promoting to P1 for Q2 release consideration.
Xun, how many of our SNB machines have this problem?

(In reply to comment #11)
> Promoting to P1 for Q2 release consideration.
> Xun, how many of our SNB machines have this problem?
It happens on our two sugarbay machines.
x-sgb1: SugarBay Qual SDP (DH): i7-2600 D2 (id=0x0102, rev 09), H67 B1 (Intel DH67CL rev 03), and Host Bridge id=0x0100 (rev09)
x-sgb3: SugarBay desktop: i5-2500K product (id=0x0112, rev 09), H67 B1 SDP (rev 03), and Host Bridge id=0x0100 (rev09)

The panic happens in d_move, which makes me think we're clobbering filesystem state somehow. Is the backtrace you get consistent? Does it still happen with 3.0-rc2?

Highly unlikely to be fixed before release.

I tend to put it in the P1 list -- even if we can't fix it in this release, we need to maintain it as a known issue in the release notes. Xun, can you answer Jesse's question (comment #13) by running the latest drm-intel-fixes?
It still happens with the latest drm-intel-fixes kernel (3.0.0rc7). The backtrace seems to be different from the previous one. Below is the call trace.

Call Trace:
kernel: [<ffffffff8110d9ce>] ? __sync_filesystem+0x75/0x75
kernel: [<ffffffff8110d99b>] __sync_filesystem+0x42/0x75
kernel: [<ffffffff8110d9df>] sync_one_sb+0x11/0x13
kernel: [<ffffffff810ed4c6>] iterate_supers+0x67/0xb7
kernel: [<ffffffff8110da21>] sys_sync+0x40/0x57
kernel: [<ffffffff81070f2c>] hibernate+0x88/0x1b8
kernel: [<ffffffff8106fa4c>] state_store+0x57/0xce
kernel: [<ffffffff811df993>] kobj_attr_store+0x17/0x19
kernel: [<ffffffff811420a2>] sysfs_write_file+0x10c/0x148
kernel: [<ffffffff810ebd0d>] vfs_write+0xae/0x153
kernel: [<ffffffff810ebe6b>] sys_write+0x45/0x6c
kernel: [<ffffffff813bf6fb>] system_call_fastpath+0x16/0x1b
kernel: Code: 48 c7 c7 40 23 69 81 45 31 ed e8 b3 f6 2a 00 49 8b 9c 24 c0 00 00 00 49 81 c4 c0 00 00 00 48 81 eb 90 00 00 00 eb 7b 4c 8d 73 20 <4c> 8b bb 48 01 00 00 4c 89 f7 e8 88 f6 2a 00 f6 43 28 38 75 07

Jesse, after setting no_console_suspend I get more info out; please check the attached log (setting no_console_suspend).
(In reply to comment #13)
> The panic happens in d_move, which makes me think we're clobbering filesystem
> state somehow. Is the backtrace you get consistent? Does it still happen with
> 3.0-rc2?

Created attachment 49485 [details]
setting no_console_suspend
Those backtraces look unrelated to gfx; does the same panic occur even without i915 loaded? (You'll need to use netconsole to still see it.)

The panic still occurs with kernel 3.1.0-rc1. It doesn't occur without i915 loaded.

Hi, could you please check whether those issues happen if you disable modesetting (e.g., boot with the 'nomodeset' kernel parameter)?

The issue goes away with the 'nomodeset' kernel parameter. It still happens when modesetting is used.

It still fails on SandyBridge with kernel 3.1 (c3b92c8787367a8bb53d57d9789b558f1295cc96). I don't see this on IvyBridge.

*** Bug 35846 has been marked as a duplicate of this bug. ***

Created attachment 57170 [details] [review]
freeze workqueue on suspend

Could you please try with this patch and verify if it changes anything?

Sorry, the bug still exists on 3.2.4 with that patch.
(In reply to comment #25)
> Created attachment 57170 [details] [review]
> freeze workqueue on suspend
>
> Could you please try with this patch and verify if it changes anything?

Could you please try Dave's patch from https://lkml.org/lkml/2012/3/29/72 (the patch itself is http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=3fa016a0b5c5237e9c387fc3249592b2cb5391c6)? I am fairly sure it could solve this.

We believe we finally have the root cause of so many crashes following hibernation. Please update and test, thanks.

commit 3fa016a0b5c5237e9c387fc3249592b2cb5391c6
Author: Dave Airlie <airlied@redhat.com>
Date: Wed Mar 28 10:48:49 2012 +0100

    drm/i915: suspend fbdev device around suspend/hibernate

    Looking at hibernate overwriting I though it looked like a cursor, so I tracked down this missing piece to stop the cursor blink timer.

    I've no idea if this is sufficient to fix the hibernate problems people are seeing, but please test it.

    Both radeon and nouveau have done this for a long time.

    I've run this personally all night hib/resume cycles with no fails.

    Reviewed-by: Keith Packard <keithp@keithp.com>
    Reported-by: Petr Tesarik <kernel@tesarici.cz>
    Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
    Reported-by: Lots of misc segfaults after hibernate across the world.
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=37142
    Tested-by: Dave Airlie <airlied@redhat.com>
    Tested-by: Bojan Smojver <bojan@rexursive.com>
    Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
    Cc: stable@vger.kernel.org
    Signed-off-by: Dave Airlie <airlied@redhat.com>

It works fine. No crash happens. Verified with drm-intel-fixes commit 14667a4bde4361b7ac420d68a2e9e9b9b2df5231.

Closing old verified+fixed.
Created attachment 44811 [details]
dmesg file

System Environment:
--------------------------
Arch: x86_64
Platform: sugarbay
Libdrm: (master)2.4.24-7-gfd3ed34a2070fca3804baf54ece40d0bc2666226
Mesa: (7.10)b8a077cee0f3856d5c3d4468918513515bbd0dcb
Xserver: (master)xorg-server-1.10.0
Xf86_video_intel: (master)2.14.901-4-gee740778f5d5355c04f6fc4564f598993b106d62
Kernel: (drm-intel-fixes)f0c860246472248a534656d6cdbed5a36d1feb2e

Bug detailed description:
-------------------------
The kernel crashed after doing Suspend-To-Disk (S4) several times in X mode on sugarbay. This is a regression. The last known good commit is kernel (2.6.37) 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5.

Reproduce steps:
----------------
1. xinit&
2. echo disk > /sys/power/state
3. repeat step 2 about 2-20 times
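
Steps 2 and 3 can be wrapped in a small loop so the cycle count is recorded. This is only a sketch under stated assumptions: the function name `hibernate_cycles`, the `PAUSE` variable and its 30-second default are illustrative and were not part of the original report.

```shell
#!/bin/sh
# Sketch: repeat the S4 request from step 2 a fixed number of times.
# hibernate_cycles and PAUSE are illustrative names, not from the original report.
hibernate_cycles() {
    # $1: file to write "disk" to (normally /sys/power/state)
    # $2: number of hibernate/resume cycles to attempt
    i=1
    while [ "$i" -le "$2" ]; do
        echo "cycle $i of $2"
        # The write returns only after the machine has resumed.
        echo disk > "$1" || return 1
        # Assumed settle time between cycles; adjust as needed.
        sleep "${PAUSE:-30}"
        i=$((i + 1))
    done
}
```

Run as root from within the X session, for example `hibernate_cycles /sys/power/state 20`. Each cycle still needs the machine to be powered back on, and the last "cycle N" line printed before a crash shows how many rounds it took.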