Bug 28478 - Intermittent graphics lockups due to overflow/loop with Intel driver
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.5 (2009.10)
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Xorg Project Team
Duplicates: 28471
Reported: 2010-06-10 02:51 UTC by mutlu
Modified: 2010-08-08 12:11 UTC

Attachments
Xorg.0.log.old (34.75 KB, text/plain)
2010-06-10 02:51 UTC, mutlu

Description mutlu 2010-06-10 02:51:06 UTC
Created attachment 36189
Xorg.0.log.old

I get seemingly random lockups of the entire graphics system (not able to switch to other virtual consoles) due to EQ overflows in X when using the Intel driver and compositing under KDE 4.4. I am on Arch Linux.

%lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)

kernel26 2.6.33.4
xorg-server 1.7.6
xf86-video-intel 2.10.0
intel-dri 7.7.1
mesa 7.7.1

I use no xorg.conf file

Kernel options: enable_mtrr_cleanup nopat i915.powersave=0
(without (some of?) these the display goes black after resume)

Xorg.0.log.old is attached.


Backtrace:
0: /usr/bin/X (xorg_backtrace+0x3b) [0x809f81b]
1: /usr/bin/X (mieqEnqueue+0x1ab) [0x809856b]
2: /usr/bin/X (xf86PostMotionEventP+0xd2) [0x80a3b22]
3: /usr/lib/xorg/modules/input/evdev_drv.so (0xb72a9000+0x4581) [0xb72ad581]
4: /usr/lib/xorg/modules/input/evdev_drv.so (0xb72a9000+0x487e) [0xb72ad87e]
5: /usr/bin/X (0x8048000+0x6663f) [0x80ae63f]
6: /usr/bin/X (0x8048000+0xf95b4) [0x81415b4]
7: (vdso) (__kernel_sigreturn+0x0) [0xb780f400]
8: /usr/lib/libpixman-1.so.0 (0xb773a000+0x59e5a) [0xb7793e5a]
9: /usr/lib/libpixman-1.so.0 (0xb773a000+0x16373) [0xb7750373]
10: /usr/lib/libpixman-1.so.0 (pixman_blt+0x78) [0xb7775b18]
11: /usr/lib/xorg/modules/libfb.so (fbCopyNtoN+0x24d) [0xb7240bdd]
12: /usr/lib/xorg/modules/drivers/intel_drv.so (0xb724c000+0x35aca) [0xb7281aca]
13: /usr/bin/X (miCopyRegion+0x21b) [0x81a5b4b]
14: /usr/bin/X (miDoCopy+0x44d) [0x81a606d]
15: /usr/lib/xorg/modules/drivers/intel_drv.so (0xb724c000+0x35328) [0xb7281328]
16: /usr/bin/X (0x8048000+0xc65c3) [0x810e5c3]
17: /usr/bin/X (0x8048000+0x3e2d5) [0x80862d5]
18: /usr/bin/X (0x8048000+0x40437) [0x8088437]
19: /usr/bin/X (0x8048000+0x1a705) [0x8062705]
20: /lib/libc.so.6 (__libc_start_main+0xe6) [0xb73f3c76]
21: /usr/bin/X (0x8048000+0x1a2f1) [0x80622f1]
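
Reading the trace bottom-up: frames 8-21 show the server in a software copy (fbCopyNtoN into pixman_blt), frame 7 (__kernel_sigreturn) marks a signal handler interrupting it, and frames 0-6 appear to be the evdev driver posting a motion event from that handler until mieqEnqueue finds the queue full and logs a backtrace. A minimal Python sketch of that failure mode follows, assuming a fixed-capacity queue whose consumer never gets to run. ToyEventQueue and its methods are hypothetical names for illustration, not the X server's API.

```python
from collections import deque


class ToyEventQueue:
    """Fixed-capacity event queue (hypothetical model, not X server code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def enqueue(self, event):
        # Producer side: in the trace this runs inside a signal handler.
        if len(self.events) >= self.capacity:
            # Consumer has not drained the queue, so the event is lost.
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self):
        # Consumer side: only runs when the main loop gets control back.
        drained = list(self.events)
        self.events.clear()
        return drained


# The main loop is stuck, so drain() is never called while events arrive.
queue = ToyEventQueue(capacity=4)
accepted = [queue.enqueue(("motion", i)) for i in range(6)]
```

With a capacity of 4 and six incoming events, the last two are rejected. In the real server the overflow itself is only a symptom: the question is why the main loop stayed in the copy long enough for the queue to fill.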
Comment 1 Chris Wilson 2010-06-10 03:00:23 UTC
Different signal, but this looks remarkably similar to bug 27313 and the patch there should fix the crash.

Though I am actually more interested in knowing which path provoked the fallback.
Comment 2 mutlu 2010-06-10 14:31:02 UTC
Chris, I could patch the driver and see if the (albeit rare) crashes disappear. However, you mention that you are more interested in what triggers the crash. Could you point me to what I could do to be of most help to you?
Comment 3 Chris Wilson 2010-07-01 01:37:43 UTC
This turns out to be another page-fault-of-doom...

08:57 < ohsix> hrm
08:57 < ohsix>             83118.00 - 62.1% : drm_clflush_pages        [drm]
08:57 < ohsix> doesn't look right heh; (from perf top)
08:57 < ohsix>            206911.00 - 64.2% : drm_clflush_pages        [drm]
08:57 < ohsix>             59009.00 - 18.3% : read_hpet
08:58 < ohsix> rebooted back with the "old" kernel and its doing that cpu burning freeze that it did last night
09:03 < ohsix> [ 31003.366] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
Comment 4 Chris Wilson 2010-07-01 06:51:21 UTC
*** Bug 28471 has been marked as a duplicate of this bug. ***
Comment 5 mutlu 2010-07-13 06:54:47 UTC
Chris, I tried the patch you suggested in comment 1 and did not experience the lockup afterwards. By now I have moved on to the 1.8 release (1.8.1.902 at the moment) with a 2.6.34 kernel. No lockups since.

Should I close this? Unfortunately, I don't know ohsix's setup.
Comment 6 Chris Wilson 2010-07-13 07:09:49 UTC
Thanks for testing, I was starting to think this was the page-fault-of-doom and not a simple buffer overrun. We can't close this just yet: the patch hasn't been included upstream because, on review, it should have no effect. Yet it obviously does...
Comment 7 Chris Wilson 2010-08-08 12:11:19 UTC
Yes! A much better fix is finally on its way upstream!


commit e2bf07fe23fd11a2acba609bf34ccc59c5553389
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Aug 7 11:01:24 2010 +0100

    drm/i915: Implement fair lru eviction across both rings. (v2)
    
    Based in a large part upon Daniel Vetter's implementation and adapted
    for handling multiple rings in a single pass.
    
    This should lead to better gtt usage and fixes the page-fault-of-doom
    triggered. The fairness is provided by scanning through the GTT space
    amalgamating space in rendering order. As soon as we have a contiguous
    space in the GTT large enough for the new object (and its alignment),
    evict any object which lies within that space. This should keep more
    objects resident in the GTT.
    
    Doing throughput testing on a PineView machine with cairo-perf-trace
    indicates that there is very little difference with the new LRU scan,
    perhaps a small improvement... Except oddly for the poppler trace.
    
    Reference:
    
      Bug 15911 - Intermittent X crash (freeze)
      https://bugzilla.kernel.org/show_bug.cgi?id=15911
    
      Bug 20152 - cannot view JPG in firefox when running UXA
      https://bugs.freedesktop.org/show_bug.cgi?id=20152
    
      Bug 24369 - Hang when scrolling firefox page with window in front
      https://bugs.freedesktop.org/show_bug.cgi?id=24369
    
      Bug 28478 - Intermittent graphics lockups due to overflow/loop
      https://bugs.freedesktop.org/show_bug.cgi?id=28478
    
    v2: Attempt to clarify the logic and order of eviction through the use
    of comments and macros.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
    Signed-off-by: Eric Anholt <eric@anholt.net>
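
The scan described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the i915 code: the GTT is a flat range of `total_size` units, objects are `(name, start, length)` tuples listed least recently used first, and `align_up`, `find_hole` and `pick_evictions` are hypothetical helper names. Objects are marked one at a time in LRU order; as soon as the free space plus the marked objects contains an aligned hole of the requested size, only the marked objects overlapping that hole are evicted.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def find_hole(total_size, blocked, size, alignment):
    """Return the first aligned offset where `size` units fit without
    overlapping any (start, end) interval in `blocked`, else None."""
    cursor = 0
    for start, end in sorted(blocked):
        candidate = align_up(cursor, alignment)
        if candidate + size <= start:
            return candidate
        cursor = max(cursor, end)
    candidate = align_up(cursor, alignment)
    return candidate if candidate + size <= total_size else None


def pick_evictions(total_size, objects_lru, size, alignment=1):
    """Mark objects in LRU order until a large enough hole exists.

    Returns (offset, names_to_evict), or None if nothing fits even
    with every object marked."""
    for marked_count in range(len(objects_lru) + 1):
        # Objects not yet marked still block the address space.
        blocked = [(s, s + n) for _, s, n in objects_lru[marked_count:]]
        offset = find_hole(total_size, blocked, size, alignment)
        if offset is not None:
            # Evict only the marked objects that overlap the chosen hole.
            victims = [
                name
                for name, s, n in objects_lru[:marked_count]
                if s < offset + size and s + n > offset
            ]
            return offset, victims
    return None


# A is the oldest object, C the newest; units 12-15 start out free.
objects = [("A", 0, 4), ("B", 8, 4), ("C", 4, 4)]
result = pick_evictions(16, objects, size=8, alignment=4)
```

Here `result` is `(8, ["B"])`: A is older than B and gets marked first, but it lies outside the hole at 8-15, so it stays resident. That is the behaviour the commit message credits with keeping more objects in the GTT.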

