Bug 29453 - [Pineview] X crash under composite WM, after gltestperf
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: unspecified
Hardware: All Linux (All)
Importance: high critical
Assigned To: Chris Wilson
Depends on:
Blocks:
Reported: 2010-08-08 20:11 UTC by Yi Sun
Modified: 2011-01-05 16:30 UTC (History)

Attachments
xorg.0.log (39.72 KB, text/plain)
2010-08-08 20:12 UTC, Yi Sun
dmesg information (38.83 KB, text/plain)
2010-08-30 19:23 UTC, Yi Sun
valgrind X (23.21 KB, application/x-zip-compressed)
2010-08-30 19:29 UTC, Yi Sun
Description Yi Sun 2010-08-08 20:11:26 UTC
System Environment:
--------------------------
Kernel: v2.6.35
Mesa:           (master)d38afcd2f286e924e0f9b7f484712ac19e3f98fc

Xserver:       (master)xorg-server-1.8.99.905-12-g8d7b7a0d71e0b89321b3341b781bc8845386def6
Xf86_video_intel:   (master)2.12.0-65g6304cb048c745be81dae13f1d936996e04eaa530

Bug detailed description:
-------------------------
After running gltestperf, we start GNOME. As soon as we enable compiz, GNOME crashes.



Reproduce steps:
----------------
1. xinit &
2. /mesa/demos/gltestperf
3. gnome-session
4. Enable compiz
Comment 1 Yi Sun 2010-08-08 20:12:21 UTC
Created attachment 37711 [details]
xorg.0.log
Comment 2 Gordon Jin 2010-08-09 00:03:17 UTC
The previous gltestperf issue was bug #28296.
Comment 3 Gordon Jin 2010-08-15 19:03:10 UTC
Eric, can you reproduce?
Comment 4 Eric Anholt 2010-08-17 15:15:01 UTC
Generally gltestperf produces really degenerate rendering that triggers the hangcheck timer without actually hanging the gpu.  That may be cascading into other failures here.
Comment 5 Chris Wilson 2010-08-20 02:25:16 UTC
I've tried to reproduce this on my pineview with xorg-edgers + drm-intel-next+ [i.e. with my pending patch series on top of drm-intel-next on top of 2.6.36-rc1], nothing. Not even a squeak in dmesg or Xorg.log.
Comment 6 Chris Wilson 2010-08-20 02:32:20 UTC
Works similarly with stock 2.6.35 from xorg-edgers.
Comment 7 Chris Wilson 2010-08-20 02:46:07 UTC
I've verified that xorg-edgers is a clean build of xf86-video-intel 8b94b35 i.e. page-flips are enabled.

sunyi can you verify the failure still occurs on your test platform? The likely suspects are the xserver [potential invalid dri drawable dereference] or xf86-video-intel [logic error]. I don't think it is related to a GPU hang.
Comment 8 Yi Sun 2010-08-23 19:27:40 UTC
I double-checked with the latest X. Indeed, the issue is an X crash, not a GPU hang.

When X crashes, the backtrace is as follows:
Backtrace:
0: X (xorg_backtrace+0x3b) [0x809a67b]
1: X (mieqEnqueue+0x1ab) [0x80994eb]
2: X (xf86PostKeyEventP+0x7c) [0x80abebc]
3: X (xf86PostKeyboardEvent+0x4b) [0x80abf8b]
4: /opt/X11R7/lib/xorg/modules/input/evdev_drv.so (0xb7244000+0x352a) [0xb724752a]
5: X (0x8048000+0x79c2f) [0x80c1c2f]
6: X (0x8048000+0x118bc4) [0x8160bc4]
7: (vdso) (__kernel_sigreturn+0x0) [0xb7716400]
8: /lib/libc.so.6 (0x4aa30000+0x67dfe) [0x4aa97dfe]
9: /lib/libc.so.6 (0x4aa30000+0x6e261) [0x4aa9e261]
10: /lib/libc.so.6 (0x4aa30000+0x70a65) [0x4aaa0a65]
11: /opt/X11R7/lib/libdrm.so.2 (drmFree+0x24) [0xb75e977c]
12: /opt/X11R7/lib/libdrm.so.2 (drmModeGetPropertyBlob+0xf5) [0xb75ef8f0]
13: /opt/X11R7/lib/xorg/modules/drivers/intel_drv.so (0xb7594000+0x87fb) [0xb759c7fb]
14: X (xf86ProbeOutputModes+0x226) [0x80cff96]
15: X (0x8048000+0x131590) [0x8179590]
16: X (RRGetInfo+0xa9) [0x8105cd9]
17: X (ProcRRGetScreenInfo+0x92) [0x8106f82]
18: X (0x8048000+0xb77b5) [0x80ff7b5]
19: X (0x8048000+0x264e7) [0x806e4e7]
20: X (0x8048000+0x1a28a) [0x806228a]
21: /lib/libc.so.6 (__libc_start_main+0xe6) [0x4aa46bb6]
22: X (0x8048000+0x19e61) [0x8061e61]
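Several frames above are unsymbolized and appear as "module (0xBASE+0xOFFSET) [0xADDRESS]". As a rough aid for reading them, the sketch below parses one such frame and checks that base plus offset equals the printed address. The helper name frame_offset is hypothetical and is not part of the X server or libdrm; it assumes the frame format shown in this backtrace.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one unsymbolized backtrace frame of the form
 *   "N: module (0xBASE+0xOFFSET) [0xADDRESS]"
 * Returns 1 and fills *base and *off when the line has that form and
 * BASE + OFFSET == ADDRESS. Returns 0 for any other line, including
 * symbolized frames such as "X (xorg_backtrace+0x3b) [...]".
 */
static int frame_offset(const char *line, unsigned long *base,
                        unsigned long *off)
{
    int index;
    char module[256];
    unsigned long addr;

    assert(line != NULL && base != NULL && off != NULL);

    /* The literal "0x" in the format consumes the prefix before each %lx. */
    if (sscanf(line, "%d: %255s (0x%lx+0x%lx) [0x%lx]",
               &index, module, base, off, &addr) != 5)
        return 0;

    return *base + *off == addr;
}
```

For frame 5, this yields base 0x8048000 and offset 0x79c2f. The offset (or, for an executable that is not relocated, the absolute address) could then be passed to a symbolizer such as addr2line, assuming binaries with debug information are available.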
Comment 9 Chris Wilson 2010-08-24 13:28:11 UTC
The only drmFree() is on the error path within drmModeGetPropertyBlob() and looks valid. So memory corruption? That might also explain what upset the kernel to trigger the error path as well. Any clues in dmesg?

valgrinding X is worth a shot, though it might disturb the timings sufficiently to hide the cause.
Comment 10 Yi Sun 2010-08-30 19:23:26 UTC
Created attachment 38325 [details]
dmesg information
Comment 11 Yi Sun 2010-08-30 19:29:37 UTC
Created attachment 38326 [details]
valgrind X
Comment 12 Chris Wilson 2010-09-10 07:39:48 UTC
commit 0515256490d5bcd55f85af83b84918d1bfe7f8f8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 10 00:08:58 2010 +0100

    display: Free the EDID blob after we copy it to the output, not before.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
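
The ordering described in the commit message can be illustrated with a minimal sketch. This is not the code from xf86-video-intel or libdrm: struct blob, struct output, blob_create, blob_free and output_set_edid are hypothetical stand-ins chosen only to show the pattern of copying the data first and freeing the source blob afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a property blob returned by the kernel. */
struct blob {
    size_t length;
    unsigned char *data;
};

/* Hypothetical stand-in for the driver's per-output state. */
struct output {
    size_t edid_length;
    unsigned char *edid;
};

/* Allocate a blob holding a copy of src; returns NULL on failure. */
static struct blob *blob_create(const unsigned char *src, size_t length)
{
    struct blob *b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->data = malloc(length);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, src, length);
    b->length = length;
    return b;
}

static void blob_free(struct blob *b)
{
    if (b == NULL)
        return;
    free(b->data);
    free(b);
}

/*
 * Copy the EDID out of the blob while the blob is still alive, and only
 * then free it. Freeing first and copying afterwards would read freed
 * memory. Takes ownership of b; returns 0 on success, -1 on failure.
 */
static int output_set_edid(struct output *out, struct blob *b)
{
    unsigned char *copy;

    assert(out != NULL && b != NULL);

    copy = malloc(b->length);
    if (copy == NULL) {
        blob_free(b);
        return -1;
    }
    memcpy(copy, b->data, b->length);  /* blob is still valid here */

    free(out->edid);                   /* drop any previous EDID copy */
    out->edid = copy;
    out->edid_length = b->length;

    blob_free(b);                      /* free only after the copy */
    return 0;
}
```

With the free moved after the copy, the output owns its own buffer and no longer depends on the lifetime of the blob.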
Comment 13 Yi Sun 2011-01-05 16:30:01 UTC
The issue is gone now, so I am marking it as verified.