Bug 26825

Summary: [965GM] memory corruption after recovering from pm-hibernate
Product: xorg
Reporter: Santiago García Mantiñán <manty>
Component: Driver/intel
Assignee: Carl Worth <cworth>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: kibi, monnier
Version: 7.5 (2009.10)
Hardware: Other
OS: All
URL: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=571741
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                                                          Flags
xorg.0.log                                                           none
This is the xorg log when using kms. This config exhibits the bug.   none
dmesg with kms activated                                             none

Description Santiago García Mantiñán 2010-03-01 14:42:42 UTC
Created attachment 33668 [details]
xorg.0.log

-- chipset: GM965
-- system architecture: x86_64
-- xf86-video-intel/xserver/mesa/libdrm version:

Running Debian testing (squeeze) packages:

xserver-xorg                                 1:7.5+3
xserver-xorg-core                            2:1.7.5-1
xserver-xorg-video-intel                     2:2.9.1-2
libgl1-mesa-dri                              7.7-4
libgl1-mesa-glx                              7.7-4
libglu1-mesa                                 7.7-4
mesa-utils                                   7.7-4
libdrm-intel1                                2.4.18-2
libdrm-radeon1                               2.4.18-2
libdrm2                                      2.4.18-2

-- kernel version: tried several, currently 2.6.32-3-amd64 but also 2.6.33-2-amd64 (see debian bug for details)
-- Linux distribution: Debian squeeze (testing)
-- Machine or mobo model: Compaq Presario C760EM
-- Display connector: Built in TFT (DVI)

3) Reproduce steps.
I boot the machine, log in to X (gdm, then icewm), start four rxvts, iceweasel, pidgin and so on, then run pm-hibernate, then power the machine back on.

The result seems to be memory corruption. The messages I saw during all my tests are in the Debian bug report (571741).
Comment 1 Santiago García Mantiñán 2010-03-01 15:57:25 UTC
Created attachment 33670 [details]
This is the xorg log when using kms. This config exhibits the bug.

The other xorg log I attached before was produced with the same xorg.conf, but with the kernel i915 module loaded without the modeset option. That config doesn't cause any memory corruption. The Xorg log I'm attaching now was produced with i915 loaded with "options i915 modeset=1", and that is the configuration that produces the memory corruption.

If you compare both logs you'll see that there are some differences (extra xorg modules and the like).
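
For anyone who wants to compare the two logs mechanically, here is a minimal sketch. It assumes module loads appear in the log as lines like (II) LoadModule: "glx", which is the usual Xorg log form but has not been checked against these particular attachments. The function names and the file names in the usage comment are made up for illustration.

```python
import re
import sys

# Assumed line format: (II) LoadModule: "glx"
LOAD_MODULE = re.compile(r'LoadModule:\s*"([^"]+)"')


def loaded_modules(log_text):
    """Return the set of module names found in LoadModule lines."""
    return set(LOAD_MODULE.findall(log_text))


def module_diff(log_a, log_b):
    """Return (only_in_a, only_in_b) as sorted lists of module names."""
    a = loaded_modules(log_a)
    b = loaded_modules(log_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_xorg_logs.py no-kms.log kms.log
    with open(sys.argv[1], errors="replace") as f_a:
        text_a = f_a.read()
    with open(sys.argv[2], errors="replace") as f_b:
        text_b = f_b.read()
    only_a, only_b = module_diff(text_a, text_b)
    print("only in", sys.argv[1] + ":", ", ".join(only_a) or "(none)")
    print("only in", sys.argv[2] + ":", ", ".join(only_b) or "(none)")
```

This only lists modules that appear in one log and not the other; it says nothing about which of them, if any, is related to the corruption.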
Comment 2 Santiago García Mantiñán 2010-03-01 15:58:27 UTC
Created attachment 33671 [details]
dmesg with kms activated
Comment 3 monnier 2010-04-19 11:06:00 UTC
FWIW, I'm seeing the exact same symptoms: my machine (an MSI Wind U100 running Debian testing) started to behave strangely (like bash segfaulting, or random apps failing with ld.so errors although those same applications were already running before).  So I first suspected a hardware problem and tried to stress-test the machine (RAM/Disk/CPU) but it seemed "rock solid".  I finally correlated the problem with hibernation, and after a few more days of head scratching I discovered that disabling KMS made the problem disappear.  Most likely the problem started to appear when I upgraded to the newer xserver-xorg-video-intel package that enables KMS by default.
Apparently KMS on this machine causes pm-hibernate to corrupt memory somewhere along the way.
Comment 4 Santiago García Mantiñán 2010-04-20 23:06:21 UTC
I'm also seeing this problem on my work desktop, a Lenovo ThinkCentre M58 6239 with an E7500 CPU and an Intel Q45/Q43.

On this machine the problem is much harder to reproduce. At home, on the Compaq, I can hit it almost whenever I want, but here at work it only happens from time to time, about one time out of seven.

Maybe having just 2 GB of memory at home versus 4 GB on the Lenovo explains the difference in how hard it is to trigger the crash.
Comment 5 monnier 2010-05-04 09:47:50 UTC
> FWIW, I'm seeing the exact same symptoms: my machine (an MSI Wind U100 running

I just got a Thinkpad X201s (with those i7 LM-620 CPU+GPU combos) and see the same problem there.  It's more problematic on that machine, tho, because the X server doesn't work (and renders the machine unusable) without KMS, and because pm-suspend hits another bug (CPU soft-lockup), so there's no way to avoid a shutdown&reboot short of keeping the machine ON all the time :-(
Comment 6 Chris Wilson 2010-07-02 11:03:16 UTC
commit 985b823b919273fe1327d56d2196b4f92e5d0fae
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Jul 2 10:04:42 2010 +1000

    drm/i915: fix hibernation since i915 self-reclaim fixes
    
    Since commit 4bdadb9785696439c6e2b3efe34aa76df1149c83 ("drm/i915:
    Selectively enable self-reclaim"), we've been passing GFP_MOVABLE to the
    i915 page allocator where we weren't before due to some over-eager
    removal of the page mapping gfp_flags games the code used to play.
    
    This caused hibernate on Intel hardware to result in a lot of memory
    corruptions on resume.  See for example
    
      http://bugzilla.kernel.org/show_bug.cgi?id=13811
    
    Reported-by: Evengi Golov (in bugzilla)
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Tested-by: M. Vefa Bicakci <bicave@superonline.com>
    Cc: stable@kernel.org
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
