Hi,

I'm currently on Debian wheezy (kernel 3.0.0, xserver-xorg-video-intel 2:2.15.0-3, xserver-xorg 1:7.6+8), and suspend to disk causes frequent reboots at resume. DRI seems to be implicated, because s2disk works perfectly from runlevel 1. When resume does succeed, I often get a lot of relocation errors and segfaults afterwards.

Please see attachments, and let me know if you need more details.

Cheers,
Nicolas FRANÇOIS

lspci:
00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02)
00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 05)
00:1c.2 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 05)
00:1f.6 Signal processing controller: Intel Corporation 5 Series/3400 Series Chipset Thermal Subsystem (rev 05)
04:00.0 System peripheral: JMicron Technology Corp. SD/MMC Host Controller (rev 80)
04:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller (rev 80)
04:00.3 System peripheral: JMicron Technology Corp. MS Host Controller (rev 80)
04:00.5 Ethernet controller: JMicron Technology Corp. JMC250 PCI Express Gigabit Ethernet Controller (rev 03)
05:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8191SEvB Wireless LAN Controller (rev 10)
ff:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers (rev 02)
ff:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 02)
ff:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 02)
ff:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0 (rev 02)
ff:02.2 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
ff:02.3 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
Created attachment 50382 [details] kernel log
looks like bug#36071
Hi Gordon,

Yes, this looks similar. At resume, everything happens as expected until memory pages are loaded up to 100%; then the screen flickers, and finally the kernel reboots the machine. It never returns to userspace.

I also tried netconsole, but the kernel complains that my network card doesn't support polling, so no luck...

Cheers

(In reply to comment #2)
> looks like bug#36071
Created attachment 50426 [details]
Stacktrace while hibernating (not at resume)

(Sorry for the ugly jpeg, I caught this by chance.)

This is weird: it looks like it is already thawing. I did a hard reset after this, and it rebooted after loading pages.
Looks like the monitor thread runs after we remove the IPS driver and references something it shouldn't... Can you gdb your i915.ko and do a "list *i915_chipset_val+0xbc" and also gdb your intel_ips.ko and do a "list *ips_monitor+0x341"?
Hi,

Here it is:

(gdb) list *i915_chipset_val+0xbc
0x23c8 is in i915_chipset_val (/tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/include/linux/math64.h:18).
13      in /tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/include/linux/math64.h

(gdb) list *ips_monitor+0x341
0xd6c is in ips_monitor (/tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/drivers/platform/x86/intel_ips.c:943).
938     in /tmp/buildd/linux-2.6-3.0.0/debian/build/source_amd64_none/drivers/platform/x86/intel_ips.c

Cheers,
NicolaF

(In reply to comment #5)
> Looks like the monitor thread runs after we remove the IPS driver and
> references something it shouldn't... Can you gdb your i915.ko and do a "list
> *i915_chipset_val+0xbc" and also gdb your intel_ips.ko and do a "list
> *ips_monitor+0x341"?
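For readers following the addresses: gdb resolves `symbol+offset` to an address inside the module's object file, so subtracting the offset gives the address at which the function starts in that file. A minimal Python sketch of that arithmetic, using only the numbers from the gdb output above (the helper name `symbol_start` is made up for illustration and is not part of any tool):

```python
def symbol_start(resolved_addr: int, offset: int) -> int:
    """Return the start address of a symbol, given the address gdb
    printed for `symbol+offset` and the offset itself."""
    return resolved_addr - offset


# Values copied from the gdb session above.
i915_chipset_val_start = symbol_start(0x23C8, 0xBC)
ips_monitor_start = symbol_start(0xD6C, 0x341)

print(f"i915_chipset_val starts at {i915_chipset_val_start:#x}")
print(f"ips_monitor starts at {ips_monitor_start:#x}")
```

These are addresses within the unloaded .ko files, not runtime kernel addresses, so they are only useful for matching against a disassembly of the same build.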
Created attachment 50648 [details] [review]
Use suspend/resume routines instead of hibernate/thaw

Hi,

After further investigation, I found this bug, reported on the kernel side: https://bugzilla.kernel.org/show_bug.cgi?id=37142

The symptoms are quite similar (memory corruption, which may, I think, lead to the reboot problems I experience), and the proposed patch (thanks to Rafael J. Wysocki), which I re-attach here, works perfectly for me. It is a bit dirty (there must be good reasons to do different things when suspending and hibernating), but it works for me: no reboots or memory corruption in tens of hibernate/thaw cycles.

However, the thread synchronization problem is still there; I got that null pointer dereference stacktrace once again.

Cheers,
NicolaF
Any new developments here? Although you cannot see this at the moment (because bugzilla.kernel.org is down), that patch from Rafael (essentially a replacement of freeze/thaw with suspend/resume) did not work for everyone in kernel bug #37142. I tried kernels as recent as 3.1.0-rc6 without any luck: still memory corruption after several hibernate/thaw cycles.
(In reply to comment #8) > I tried as recent as 3.1.0-rc6 without any luck. Still memory corruption after > several hibernate/thaw cycles. Also, rc7.
(Updating)

After investigation by Bojan Smojver on the intel-gfx mailing list, the problem seems to happen only when modeset is enabled. When booting with 'nomodeset', the issue does not happen [1].

Could someone affected by this issue confirm this please?

[1] http://permalink.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/6173
(In reply to comment #10)
> (Updating)
> After investigation by Bojan Smojver on intel-gfx mailing list, the problem
> seems to only happen when modeset is enabled. When booting with 'nomodeset',
> the issue does not happen [1].
>
> Could someone affected by this issue confirm this please?
>
> [1] http://permalink.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/6173

Hi,

It seems to work: I just performed about 10 hibernate/thaw cycles (with some suspend to disk, for fun), and no problems so far.

Cheers,
NicolaF
But my X won't start when I use the "nomodeset" kernel boot option...

[ 22.637] (EE) open /dev/fb0: No such device
...
[ 22.839] (II) VESA(0): VBESetVBEMode failed

How can I have X-ability _and_ suspend-ability?

-arne
Created attachment 57169 [details] [review] freeze workqueue on suspend For the ones affected by this issue, could you please try with this patch?
(In reply to comment #13)
> Created attachment 57169 [details] [review]
> freeze workqueue on suspend
>
> For the ones affected by this issue, could you please try with this patch?

The patch did not help my ThinkPad T510. I got segfaults, just like before, after about 20 something hibernate/thaw cycles. They looked like this:

[ 723.970911] pm-hibernate[8884]: segfault at 0 ip 0000000000477900 sp 00007fff674d1730 error 6 in bash[400000+da000]
[ 727.545054] pm-hibernate[8894]: segfault at 0 ip 0000000000477900 sp 00007fff8298a860 error 6 in bash[400000+da000]
[ 731.099119] pm-hibernate[8905]: segfault at 0 ip 0000000000477900 sp 00007fff76919de0 error 6 in bash[400000+da000]
[ 734.669372] pm-hibernate[8916]: segfault at 0 ip 0000000000477900 sp 00007fff4f5b3700 error 6 in bash[400000+da000]
[ 738.248239] pm-hibernate[8927]: segfault at 0 ip 0000000000477900 sp 00007fff3f7c4d70 error 6 in bash[400000+da000]
[ 741.816694] pm-hibernate[8950]: segfault at 0 ip 0000000000477900 sp 00007fff28b757d0 error 6 in bash[400000+da000]
[ 745.311532] pm-hibernate[8961]: segfault at 0 ip 0000000000477900 sp 00007fff319232a0 error 6 in bash[400000+da000]
[ 748.936928] pm-hibernate[8972]: segfault at 0 ip 0000000000477900 sp 00007fff8d0da390 error 6 in bash[400000+da000]
[ 752.562089] pm-hibernate[8982]: segfault at 0 ip 0000000000477900 sp 00007fff18368f50 error 6 in bash[400000+da000]
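For anyone comparing their own logs against these: every line above faults at the same instruction pointer, which is easier to see once the fields are pulled apart. Below is a rough Python sketch (not part of the original report) that parses a segfault line in this format and computes the faulting offset inside the mapped binary. The regex and the names `SEGFAULT_RE` and `parse_segfault` are illustrative assumptions, and the error-code decoding assumes the standard x86 page-fault bits (bit 0: page present, bit 1: write, bit 2: user mode).

```python
import re

# Pattern for x86-64 kernel segfault lines as they appear in the log above.
SEGFAULT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S+)\[(?P<pid>\d+)\]: "
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Parse one segfault log line; return a dict, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "object": m.group("obj"),
        # Offset of the faulting instruction from the start of the mapping.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


sample = ("[ 723.970911] pm-hibernate[8884]: segfault at 0 ip 0000000000477900 "
          "sp 00007fff674d1730 error 6 in bash[400000+da000]")
info = parse_segfault(sample)
print(info["object"], hex(info["offset"]), info)
```

Under those assumptions, "error 6" at address 0 reads as a user-mode write to a page that is not present, i.e. a write through a NULL pointer, always at the same offset into bash.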
Could you please try Dave's patch from https://lkml.org/lkml/2012/3/29/72 (the patch itself is http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=3fa016a0b5c5237e9c387fc3249592b2cb5391c6)? I am fairly sure it could solve this.
We believe we finally have the root cause of so many crashes following hibernation. Please update and test, thanks.

commit 3fa016a0b5c5237e9c387fc3249592b2cb5391c6
Author: Dave Airlie <airlied@redhat.com>
Date: Wed Mar 28 10:48:49 2012 +0100

    drm/i915: suspend fbdev device around suspend/hibernate

    Looking at hibernate overwriting I though it looked like a cursor, so I
    tracked down this missing piece to stop the cursor blink timer.

    I've no idea if this is sufficient to fix the hibernate problems people
    are seeing, but please test it.

    Both radeon and nouveau have done this for a long time.

    I've run this personally all night hib/resume cycles with no fails.

    Reviewed-by: Keith Packard <keithp@keithp.com>
    Reported-by: Petr Tesarik <kernel@tesarici.cz>
    Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
    Reported-by: Lots of misc segfaults after hibernate across the world.
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=37142
    Tested-by: Dave Airlie <airlied@redhat.com>
    Tested-by: Bojan Smojver <bojan@rexursive.com>
    Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
    Cc: stable@vger.kernel.org
    Signed-off-by: Dave Airlie <airlied@redhat.com>