Summary: [ILK] Suspend to disk: Random (frequent) reboots at resume
Product: DRI
Reporter: Nicolas FRANÇOIS <nicolas.mb.francois>
Component: DRM/Intel
Assignee: Jesse Barnes <jbarnes>
Status: CLOSED FIXED
Severity: major
Priority: medium
CC: ben, bojan, chris, eugeni, jbarnes, jrnieder
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Description
Nicolas FRANÇOIS
2011-08-19 10:44:29 UTC
Created attachment 50382
kernel log
Hi Gordon,

Yes, this looks similar. At resume, everything happens as expected until memory pages are loaded up to 100%; then the screen flickers, and finally the kernel reboots the machine. It never returns to userspace. I also tried netconsole, but the kernel complains that my network card doesn't support polling, so no luck...

Cheers

(In reply to comment #2)
> looks like bug#36071

Created attachment 50426
Stacktrace while hibernating (not at resume)
(Sorry for the ugly JPEG; I caught this by chance.)
This is weird; it looks like it is already thawing.
Did a hard reset after this, and it rebooted after loading pages.
Looks like the monitor thread runs after we remove the IPS driver and references something it shouldn't... Can you gdb your i915.ko and do a "list *i915_chipset_val+0xbc" and also gdb your intel_ips.ko and do a "list *ips_monitor+0x341"?

Hi,

Here it is:

(gdb) list *i915_chipset_val+0xbc
0x23c8 is in i915_chipset_val (/tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/include/linux/math64.h:18).
13 in /tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/include/linux/math64.h
(gdb) list *ips_monitor+0x341
0xd6c is in ips_monitor (/tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/drivers/platform/x86/intel_ips.c:943).
938 in /tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/drivers/platform/x86/intel_ips.c

Cheers,
NicolaF

(In reply to comment #5)
> Looks like the monitor thread runs after we remove the IPS driver and
> references something it shouldn't... Can you gdb your i915.ko and do a "list
> *i915_chipset_val+0xbc" and also gdb your intel_ips.ko and do a "list
> *ips_monitor+0x341"?

Created attachment 50648
Use suspend/resume routines instead of hibernate/thaw

Hi,

After further investigation, I found this bug, reported on the kernel side: https://bugzilla.kernel.org/show_bug.cgi?id=37142

The symptoms are quite similar (memory corruption, which I think may lead to the reboot problems I am experiencing), and the proposed patch (thanks to Rafael J. Wysocki), which I re-attach here, works perfectly for me.

This is a bit dirty (there must be good reasons to do different things when suspending and when hibernating), but it works for me: no reboots or memory corruption in tens of hibernate/thaw cycles.

However, the thread synchronization problem is still there; I got that null pointer dereference stacktrace once again.

Cheers,
NicolaF

Any new developments here?
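The diagnosis in this thread ("the monitor thread runs after we remove the IPS driver and references something it shouldn't"), and the thread synchronization problem NicolaF still sees, describe a teardown-ordering race. As a hedged, userspace-only illustration (pthreads stand in for kernel threads; struct monitor, monitor_start(), monitor_stop() and the other names are hypothetical and not taken from intel_ips.c), the safe ordering is: ask the thread to stop, wait until it has exited, and only then free the data it dereferences. In the kernel, the analogous primitives would be kthread_should_stop() and kthread_stop().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the chipset data a monitor thread samples. */
struct chipset_state {
    atomic_long samples; /* how many times the monitor has read this state */
};

/* Hypothetical driver instance that owns a background monitor thread. */
struct monitor {
    pthread_t thread;
    atomic_bool stop;            /* plays the role of kthread_should_stop() */
    struct chipset_state *state; /* data the thread dereferences */
};

static void *monitor_fn(void *arg)
{
    struct monitor *m = arg;
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    while (!atomic_load(&m->stop)) {
        /* Would be a use-after-free if state were freed before we stop. */
        atomic_fetch_add(&m->state->samples, 1);
        nanosleep(&tick, NULL);
    }
    return NULL;
}

static int monitor_start(struct monitor *m)
{
    m->state = malloc(sizeof(*m->state));
    if (m->state == NULL)
        return -1;
    atomic_init(&m->state->samples, 0);
    atomic_init(&m->stop, false);
    if (pthread_create(&m->thread, NULL, monitor_fn, m) != 0) {
        free(m->state);
        m->state = NULL;
        return -1;
    }
    return 0;
}

/* Teardown: stop and join the thread BEFORE freeing what it reads.
 * Freeing first, or never joining, is the kind of race described above. */
static long monitor_stop(struct monitor *m)
{
    assert(m->state != NULL);
    atomic_store(&m->stop, true);
    pthread_join(m->thread, NULL); /* monitor_fn has returned after this */
    long samples = atomic_load(&m->state->samples);
    free(m->state);                /* safe: no other thread can touch it now */
    m->state = NULL;
    return samples;
}
```

A driver following this pattern would call the start function when it binds and the stop function when it is removed; the point of the sketch is only the ordering inside monitor_stop().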
Although you cannot see this (because bugzilla.kernel.org is down), that patch from Rafael (which essentially replaces freeze/thaw with suspend/resume) did not work for everyone in kernel bug #37142. I tried as recent as 3.1.0-rc6 without any luck. Still memory corruption after several hibernate/thaw cycles.

(In reply to comment #8)
> I tried as recent as 3.1.0-rc6 without any luck. Still memory corruption after
> several hibernate/thaw cycles.

Also, rc7.

(Updating)
After investigation by Bojan Smojver on the intel-gfx mailing list, the problem seems to only happen when modeset is enabled. When booting with 'nomodeset', the issue does not happen [1].

Could someone affected by this issue confirm this, please?

[1] http://permalink.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/6173

(In reply to comment #10)
> (Updating)
> After investigation by Bojan Smojver on the intel-gfx mailing list, the problem
> seems to only happen when modeset is enabled. When booting with 'nomodeset',
> the issue does not happen [1].
>
> Could someone affected by this issue confirm this, please?
>
> [1] http://permalink.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/6173

Hi,

It seems to work: I just performed about 10 hibernate/thaw cycles (with some suspend to disk, for fun), and no problems so far.

Cheers,
NicolaF

But my X won't start when I use the "nomodeset" kernel boot option...

[ 22.637] (EE) open /dev/fb0: No such device
...
[ 22.839] (II) VESA(0): VBESetVBEMode failed

How can I have X-ability _and_ suspend-ability?

-arne

Created attachment 57169
freeze workqueue on suspend

For the ones affected by this issue, could you please try with this patch?

(In reply to comment #13)
> Created attachment 57169
> freeze workqueue on suspend
>
> For the ones affected by this issue, could you please try with this patch?

The patch did not help my ThinkPad T510. I got segfaults, just like before, after about 20-something hibernate/thaw cycles.
They looked like this:

[ 723.970911] pm-hibernate[8884]: segfault at 0 ip 0000000000477900 sp 00007fff674d1730 error 6 in bash[400000+da000]
[ 727.545054] pm-hibernate[8894]: segfault at 0 ip 0000000000477900 sp 00007fff8298a860 error 6 in bash[400000+da000]
[ 731.099119] pm-hibernate[8905]: segfault at 0 ip 0000000000477900 sp 00007fff76919de0 error 6 in bash[400000+da000]
[ 734.669372] pm-hibernate[8916]: segfault at 0 ip 0000000000477900 sp 00007fff4f5b3700 error 6 in bash[400000+da000]
[ 738.248239] pm-hibernate[8927]: segfault at 0 ip 0000000000477900 sp 00007fff3f7c4d70 error 6 in bash[400000+da000]
[ 741.816694] pm-hibernate[8950]: segfault at 0 ip 0000000000477900 sp 00007fff28b757d0 error 6 in bash[400000+da000]
[ 745.311532] pm-hibernate[8961]: segfault at 0 ip 0000000000477900 sp 00007fff319232a0 error 6 in bash[400000+da000]
[ 748.936928] pm-hibernate[8972]: segfault at 0 ip 0000000000477900 sp 00007fff8d0da390 error 6 in bash[400000+da000]
[ 752.562089] pm-hibernate[8982]: segfault at 0 ip 0000000000477900 sp 00007fff18368f50 error 6 in bash[400000+da000]

Could you please try Dave's patch from https://lkml.org/lkml/2012/3/29/72 (the patch itself is http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=3fa016a0b5c5237e9c387fc3249592b2cb5391c6)? I am fairly sure it could solve this.

We believe we finally have the root cause of so many crashes following hibernation. Please update and test, thanks.

commit 3fa016a0b5c5237e9c387fc3249592b2cb5391c6
Author: Dave Airlie <airlied@redhat.com>
Date: Wed Mar 28 10:48:49 2012 +0100

drm/i915: suspend fbdev device around suspend/hibernate

Looking at hibernate overwriting I though it looked like a cursor, so I tracked down this missing piece to stop the cursor blink timer. I've no idea if this is sufficient to fix the hibernate problems people are seeing, but please test it. Both radeon and nouveau have done this for a long time.

I've run this personally all night hib/resume cycles with no fails.

Reviewed-by: Keith Packard <keithp@keithp.com>
Reported-by: Petr Tesarik <kernel@tesarici.cz>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reported-by: Lots of misc segfaults after hibernate across the world.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=37142
Tested-by: Dave Airlie <airlied@redhat.com>
Tested-by: Bojan Smojver <bojan@rexursive.com>
Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
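The commit message boils down to one rule: while the device is frozen around suspend/hibernate, the console's cursor blink timer must not touch the framebuffer, and it may do so again only after thaw. Below is a hedged toy model of that rule in plain C. struct fake_fbdev, fake_freeze(), fake_thaw(), cursor_blink_tick() and hibernate_cycle() are hypothetical names, and the model deliberately does not try to reproduce how the stray cursor writes turned into memory corruption. In the kernel, the usual way for a driver to mark its fbdev as suspended is the fbdev core's fb_set_suspend() helper; the commit message notes that radeon and nouveau already did the equivalent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an fbdev console with a cursor-blink timer. */
struct fake_fbdev {
    bool suspended;       /* true between freeze and thaw */
    unsigned char cursor; /* the framebuffer byte the blink callback toggles */
    int writes;           /* framebuffer writes performed so far */
};

/* Timer callback: it keeps firing, but must not touch a suspended device. */
static void cursor_blink_tick(struct fake_fbdev *fb)
{
    if (fb->suspended)
        return;
    fb->cursor ^= 0xFF;
    fb->writes++;
}

/* Hypothetical freeze/thaw hooks bracketing the hibernation image. */
static void fake_freeze(struct fake_fbdev *fb) { fb->suspended = true; }
static void fake_thaw(struct fake_fbdev *fb) { fb->suspended = false; }

/* Simulate one hibernate cycle in which the timer fires `ticks` times
 * between freeze and thaw. With quiesce == false the fbdev is never
 * marked suspended (the situation before the fix). Returns the number
 * of framebuffer writes that happened inside that window. */
static int hibernate_cycle(struct fake_fbdev *fb, int ticks, bool quiesce)
{
    int before = fb->writes;

    if (quiesce)
        fake_freeze(fb);
    for (int i = 0; i < ticks; i++)
        cursor_blink_tick(fb);
    int during = fb->writes - before;
    if (quiesce)
        fake_thaw(fb);
    return during;
}
```

Without quiescing, every timer tick in the window writes to the framebuffer; with the freeze/thaw bracket, the same ticks become no-ops, and blinking resumes normally after thaw.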