Bug 25470

Summary: [i845G] SIGBUS in Intel driver during Firefox usage
Product: xorg
Component: Driver/intel
Status: RESOLVED FIXED
Severity: critical
Priority: medium
Version: git
Hardware: x86 (IA32)
OS: Linux (All)
Reporter: Daniel Richard G. <skunk>
Assignee: Carl Worth <cworth>
QA Contact: Xorg Project Team <xorg-team>
CC: chris, gomyhr
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description Flags
X server log file none

Description Daniel Richard G. 2009-12-05 22:43:56 UTC
Created attachment 31778 [details]
X server log file

Using Firefox, hit Enter in the address bar, X server crashes.

Interesting excerpt from Xorg.log.0.old (attached):

Backtrace:
0: /usr/bin/X(xorg_backtrace+0x3b) [0x8133dbb]
1: /usr/bin/X(xf86SigHandler+0x55) [0x80c1395]
2: [0x3c8400]
3: /usr/lib/xorg/modules/drivers//intel_drv.so [0x2fb574]
4: /usr/bin/X [0x8181800]
5: /usr/bin/X(ProcPutImage+0x159) [0x808a4c9]
6: /usr/bin/X(Dispatch+0x35f) [0x808d1af]
7: /usr/bin/X(main+0x395) [0x8072515]
8: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x3dfb56]
9: /usr/bin/X [0x80719c1]
Saw signal 7.  Server aborting.

This is the same system as the one in

    https://bugs.launchpad.net/bugs/484020

which has batchbuffer dump tarballs, should those contain useful information.

I'm not familiar with Xorg/Intel driver debugging. The only thing I've enabled here is INTEL_DEBUG=batch to help track down that Ubuntu bug. I'd be happy to enable other debugging options on this system, but I don't know what those would be, or which ones are applicable to this bug.
Comment 1 Geir Ove Myhr 2009-12-06 06:08:05 UTC
The backtrace in Xorg.0.log is just a short "summary" backtrace. A backtrace tells what the program was doing at the time of the crash. To get a full backtrace, which includes the values of important variables and the line numbers in the source code, you need to install some debug packages and run gdb. There is some explanation of how to do this in Ubuntu at https://wiki.ubuntu.com/X/Backtracing . The page is a bit messy, since there are many ways of getting the backtrace and not every method works in every case. My personal favourite is to generate a core file (see the post-mortem section) and run gdb on that afterwards.
Comment 2 Daniel Richard G. 2009-12-06 16:56:30 UTC
Geir, thanks for that link. I have a full backtrace:

Core was generated by `/usr/bin/X :0 -br -verbose -auth /var/run/gdm/auth-for-gdm-A7gMYe/database -nol'.
Program terminated with signal 7, Bus error.
#0  0x002e5006 in memcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0  0x002e5006 in memcpy () from /lib/tls/i686/cmov/libc.so.6
#1  0x00000001 in ?? ()
#2  0x001c1574 in uxa_do_put_image (pDrawable=0x9e6e968, pGC=0x9f10c90, 
    depth=24, x=0, y=0, w=48, h=36, leftPad=0, format=2, 
    bits=0x9f34088 "4Fe\377\066Hg\377\067Kj\377\070Mm\377\067Nn\377\067Pr\377\071Rt\377\071Tv\377<Py\377\071Rz\377\066Sz\377\063Ry\377\061Ry\377\062Sz\377\065S|\377\070S\177\377%Tz\377;X\177\377+6V\377\f\017\036\377\a\016\v\377\016\030\f\377\t\020\t\377\t\b\n\377\v\016\022\377\f\024\033\377\036+9\377;Pf\377Jg\206\377Bh\222\377\067c\222\377\062b\226\377:j\216\377=h\217\377@e\221\377Lv\241\377\064d\216\377<k\221\377Ci\214\377So\221\377Al\227\377Am\226\377Fo\226\377Jq\227\377Kp\222\377Mq\217\377Ww\224\377_\200\232\377\063Ed\377\066Hg\377"...)
    at ../../uxa/uxa-accel.c:173
#3  uxa_do_shm_put_image (pDrawable=0x9e6e968, pGC=0x9f10c90, depth=24, x=0, 
    y=0, w=48, h=36, leftPad=0, format=2, 
    bits=0x9f34088 "4Fe\377\066Hg\377\067Kj\377\070Mm\377\067Nn\377\067Pr\377\071Rt\377\071Tv\377<Py\377\071Rz\377\066Sz\377\063Ry\377\061Ry\377\062Sz\377\065S|\377\070S\177\377%Tz\377;X\177\377+6V\377\f\017\036\377\a\016\v\377\016\030\f\377\t\020\t\377\t\b\n\377\v\016\022\377\f\024\033\377\036+9\377;Pf\377Jg\206\377Bh\222\377\067c\222\377\062b\226\377:j\216\377=h\217\377@e\221\377Lv\241\377\064d\216\377<k\221\377Ci\214\377So\221\377Al\227\377Am\226\377Fo\226\377Jq\227\377Kp\222\377Mq\217\377Ww\224\377_\200\232\377\063Ed\377\066Hg\377"...)
    at ../../uxa/uxa-accel.c:218
#4  uxa_put_image (pDrawable=0x9e6e968, pGC=0x9f10c90, depth=24, x=0, y=0, 
    w=48, h=36, leftPad=0, format=2, 
    bits=0x9f34088 "4Fe\377\066Hg\377\067Kj\377\070Mm\377\067Nn\377\067Pr\377\071Rt\377\071Tv\377<Py\377\071Rz\377\066Sz\377\063Ry\377\061Ry\377\062Sz\377\065S|\377\070S\177\377%Tz\377;X\177\377+6V\377\f\017\036\377\a\016\v\377\016\030\f\377\t\020\t\377\t\b\n\377\v\016\022\377\f\024\033\377\036+9\377;Pf\377Jg\206\377Bh\222\377\067c\222\377\062b\226\377:j\216\377=h\217\377@e\221\377Lv\241\377\064d\216\377<k\221\377Ci\214\377So\221\377Al\227\377Am\226\377Fo\226\377Jq\227\377Kp\222\377Mq\217\377Ww\224\377_\200\232\377\063Ed\377\066Hg\377"...)
    at ../../uxa/uxa-accel.c:287
#5  0x08181800 in damagePutImage (pDrawable=0x9e6e968, pGC=0x9f10c90, 
    depth=24, x=0, y=0, w=48, h=36, leftPad=0, format=2, 
    pImage=0x9f34088 "4Fe\377\066Hg\377\067Kj\377\070Mm\377\067Nn\377\067Pr\377\071Rt\377\071Tv\377<Py\377\071Rz\377\066Sz\377\063Ry\377\061Ry\377\062Sz\377\065S|\377\070S\177\377%Tz\377;X\177\377+6V\377\f\017\036\377\a\016\v\377\016\030\f\377\t\020\t\377\t\b\n\377\v\016\022\377\f\024\033\377\036+9\377;Pf\377Jg\206\377Bh\222\377\067c\222\377\062b\226\377:j\216\377=h\217\377@e\221\377Lv\241\377\064d\216\377<k\221\377Ci\214\377So\221\377Al\227\377Am\226\377Fo\226\377Jq\227\377Kp\222\377Mq\217\377Ww\224\377_\200\232\377\063Ed\377\066Hg\377"...)
    at ../../../miext/damage/damage.c:905
#6  0x0808a4c9 in ProcPutImage (client=0x9e5a6c8) at ../../dix/dispatch.c:1917
#7  0x0808d1af in Dispatch () at ../../dix/dispatch.c:456
#8  0x08072515 in main (argc=9, argv=0xbfdccd14, envp=0xbfdccd3c)
    at ../../dix/main.c:397


Relevant package versions:

intel-gpu-tools  1.0.2-1~karmic6
libdrm-intel1  2.4.16~git20091203.db50f512-0ubuntu0sarvatt2~karmic
libdrm-intel1-dbg  2.4.16~git20091203.db50f512-0ubuntu0sarvatt2~karmic
libdrm-radeon1  2.4.16~git20091203.db50f512-0ubuntu0sarvatt2~karmic
libdrm2  2.4.16~git20091203.db50f512-0ubuntu0sarvatt2~karmic
libdrm2-dbg  2.4.16~git20091203.db50f512-0ubuntu0sarvatt2~karmic
xserver-common  2:1.6.5+git20091107+server-1.6-branch.2dbcb06a-0ubuntu0sarvatt~karmic
xserver-xephyr  2:1.6.5+git20091107+server-1.6-branch.2dbcb06a-0ubuntu0sarvatt~karmic
xserver-xorg-core  2:1.6.5+git20091107+server-1.6-branch.2dbcb06a-0ubuntu0sarvatt~karmic
xserver-xorg-core-dbg  2:1.6.5+git20091107+server-1.6-branch.2dbcb06a-0ubuntu0sarvatt~karmic
xserver-xorg-input-evdev  1:2.3.0+git20091107.a0f7f34d-0ubuntu0sarvatt4~karmic
xserver-xorg-input-synaptics  1:1.2.0+git20091107.e6b1a4ef-0ubuntu0sarvatt2~karmic
xserver-xorg-video-intel  2:2.9.99.901+git20091204.415aab47-0ubuntu0tormod~karmic
xserver-xorg-video-intel-dbg  2:2.9.99.901+git20091204.415aab47-0ubuntu0tormod~karmic

(These are from the Ubuntu xorg-edgers PPA)
Comment 3 Daniel Richard G. 2009-12-06 22:12:50 UTC
Same failure mode as before.

Core was generated by `/usr/bin/X :0 -br -verbose -auth /var/run/gdm/auth-for-gdm-Fj5Laa/database -nol'.
Program terminated with signal 7, Bus error.
#0  0x00185006 in memcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0  0x00185006 in memcpy () from /lib/tls/i686/cmov/libc.so.6
#1  0x00000001 in ?? ()
#2  0x00b03574 in uxa_do_put_image (pDrawable=0xa598410, pGC=0xa54b9f0, 
    depth=24, x=0, y=0, w=80, h=60, leftPad=0, format=2, 
    bits=0xa5b380c "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...) at ../../uxa/uxa-accel.c:173
#3  uxa_do_shm_put_image (pDrawable=0xa598410, pGC=0xa54b9f0, depth=24, x=0, 
    y=0, w=80, h=60, leftPad=0, format=2, 
    bits=0xa5b380c ...) at ../../uxa/uxa-accel.c:218
#4  uxa_put_image (pDrawable=0xa598410, pGC=0xa54b9f0, depth=24, x=0, y=0, 
    w=80, h=60, leftPad=0, format=2, 
    bits=0xa5b380c ...) at ../../uxa/uxa-accel.c:287
#5  0x08181800 in damagePutImage (pDrawable=0xa598410, pGC=0xa54b9f0, 
    depth=24, x=0, y=0, w=80, h=60, leftPad=0, format=2, 
    pImage=0xa5b380c ...) at ../../../miext/damage/damage.c:905
#6  0x0808a4c9 in ProcPutImage (client=0xa3b4f50) at ../../dix/dispatch.c:1917
#7  0x0808d1af in Dispatch () at ../../dix/dispatch.c:456
#8  0x08072515 in main (argc=9, argv=0xbfd2bf54, envp=0xbfd2bf7c)
    at ../../dix/main.c:397

The above crash occurred after the following updates:

2009-12-06 19:59:08 status installed libdrm2 2.4.16+git20091206.9707733a-0ubuntu0sarvatt~karmic
2009-12-06 19:59:08 status installed libdrm-intel1 2.4.16+git20091206.9707733a-0ubuntu0sarvatt~karmic
2009-12-06 19:59:08 status installed libdrm-intel1-dbg 2.4.16+git20091206.9707733a-0ubuntu0sarvatt~karmic
2009-12-06 19:59:08 status installed libdrm2-dbg 2.4.16+git20091206.9707733a-0ubuntu0sarvatt~karmic
2009-12-06 19:59:08 status installed libdrm-radeon1 2.4.16+git20091206.9707733a-0ubuntu0sarvatt~karmic
Comment 4 Daniel Richard G. 2009-12-06 22:46:41 UTC
Cc'ing Chris Wilson on this.

Chris, this appears to be the same issue with uxa_put_image() that you came across on the 2nd. Is there an existing report for this?
Comment 5 Chris Wilson 2009-12-07 01:22:49 UTC
Not strictly the bug I ran into recently; this is in fact an older one. The difference is SIGBUS vs SIGSEGV. This bug should be "fixed" with

commit c715089f49844260f1eeae8e3b55af9468ba1325
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 23 00:43:56 2009 +0100

    drm/i915: Handle ERESTARTSYS during page fault
    
    During a page fault and rebinding the buffer there exists a window for a
    signal to arrive during the i915_wait_request() and trigger a
    ERESTARTSYS. This used to be handled by returning SIGBUS and thereby
    killing the application. Try 'cairo-perf-trace & cairo-test-suite' and
    watch X go boom!
    
    The solution as suggested by H. Peter Anvin is to simply return NOPAGE and
    leave the higher layers to spot we did not fill the page and resubmit
    the page fault.
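
For readers wondering why the crash shows up as SIGBUS inside memcpy() rather than as SIGSEGV, here is a minimal userspace sketch. It is not the i915 fault path described in the commit above; it is only an analogue, assuming a Linux/POSIX system, in which a mapping stays valid but the kernel cannot supply the page when it is touched. The function name demo_sigbus_on_failed_fault() and the temporary file path are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;

/* SIGBUS handler: jump back to the point saved by sigsetjmp(). */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_jmp, 1);
}

/*
 * Hypothetical demo, not driver code.
 * Returns 1 if the memcpy() raised SIGBUS, 0 if it completed,
 * and -1 if the setup itself failed.
 */
int demo_sigbus_on_failed_fault(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* illustrative path */
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    volatile int result = -1;
    char *map;
    int fd;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* Back the mapping with one page of file data... */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* ...then remove the backing. The address range is still mapped,
     * but a fault on it can no longer be satisfied. */
    if (ftruncate(fd, 0) != 0) {
        munmap(map, (size_t)page);
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    if (sigsetjmp(bus_jmp, 1) == 0) {
        char buf[16] = { 0 };
        memcpy(map, buf, sizeof buf);   /* the page fault fails here */
        result = 0;
    } else {
        result = 1;                     /* arrived here via SIGBUS */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(map, (size_t)page);
    close(fd);
    return result;
}
```

Calling demo_sigbus_on_failed_fault() from a small main() should return 1 on Linux. An access to an address that is not mapped at all would raise SIGSEGV instead, which is the distinction drawn at the top of this comment.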
    
And for the common cases we now hit an "accelerated" PutImage, which will bypass this fallback:

commit 19d8c0cf50e98909c533ebfce3a0dd3f72b755c1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 29 21:16:49 2009 +0000

    uxa: PutImage acceleration
    
    Avoid waiting on dirty buffer object by streaming the upload to a fresh,
    non-GPU hot buffer and blitting to the destination.
    
    This should help to redress the regression reported in bug 18075:
    
      [UXA] XPutImage performance regression
      https://bugs.freedesktop.org/show_bug.cgi?id=18075
    
    Using the particular synthetic benchmark in question on a g45:
    
    Before:
       9542.910448 Ops/s; put composition (!); 15x15
       5623.271889 Ops/s; put composition (!); 75x75
       1685.520362 Ops/s; put composition (!); 250x250
    
    After:
      40173.865300 Ops/s; put composition (!); 15x15
      28670.280612 Ops/s; put composition (!); 75x75
       4794.368601 Ops/s; put composition (!); 250x250
    
    which while not stellar performance is at least an improvement. As
    anticipated this has little impact on the non-fallback RENDER paths, for
    instance the current cairo-xlib backend is unaffected by this change.
    
And of course there were a few bugs in libdrm that prevented correct error propagation after a failure to map:

commit acb4aa671507aa181b3ff50ccf26a1c0d705a309
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 2 12:40:26 2009 +0000

    intel: Review use of errno.
    
    Hitting this error lead to a segfault:
    
      intel_bufmgr_gem.c:919: Error mapping buffer 48607 (pixmap):
                              Cannot allocate memory.
    
    because the errno was reused as the function return value after being
    reset by the fprintf(), so caller thought the mapping had succeeded. The
    convention established by libdrm is that the return value is the
    negative errno and that uses of libdrm cannot trust the value of errno
    afterwards, but must use the return code.
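
The errno pitfall described in that commit message can be sketched as follows. This is not the libdrm source: fake_map(), noisy_log(), map_buggy() and map_fixed() are hypothetical names, and noisy_log() sets errno to 0 explicitly to simulate, deterministically, a logging call that overwrites errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a mapping call that fails with ENOMEM. */
static int fake_map(void)
{
    errno = ENOMEM;
    return -1;
}

/* Stand-in for a logging call that clobbers errno. The explicit reset
 * is an assumption made here so that the effect is reproducible. */
static void noisy_log(const char *msg)
{
    fprintf(stderr, "Error mapping buffer: %s\n", msg);
    errno = 0;
}

/* Buggy pattern: errno is read only after the logging call. */
int map_buggy(void)
{
    if (fake_map() != 0) {
        noisy_log(strerror(errno));
        return -errno;      /* errno is 0 by now, so this looks like success */
    }
    return 0;
}

/* Fixed pattern: save errno first and return the negative saved value. */
int map_fixed(void)
{
    if (fake_map() != 0) {
        int err = errno;    /* capture before anything can overwrite it */
        noisy_log(strerror(err));
        return -err;
    }
    return 0;
}
```

With these definitions, map_buggy() returns 0 even though the mapping failed, so a caller would go on to use an invalid pointer, while map_fixed() returns -ENOMEM. Callers should test the return value and not consult errno afterwards, which matches the convention stated in the commit message.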

So I *believe* this bug to be fixed, even though I'm sure I've not found all the corner cases that are causing errors elsewhere...
Comment 6 Daniel Richard G. 2009-12-07 07:31:01 UTC
Er... you're aware I'm getting the SIGBUS with bleeding-edge code, yes? I'm using packages compiled by a third party, but the top of the changelog shows

    commit 415aab474edd1425034981306718afd8506445f1
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Fri Dec 4 09:02:36 2009 +0000

I can reproduce this crash fairly easily. I'd be happy to instrument things to help track down what's going on, but I don't know what additional instrumentation/information would be useful in this case.
Comment 7 Chris Wilson 2009-12-07 13:10:26 UTC
I missed the recent version of xf86-video-intel in the list, but are you running a 2.6.32 kernel? I ask because the one known cause of SIGBUS has been "fixed" with

commit c715089f49844260f1eeae8e3b55af9468ba1325
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 23 00:43:56 2009 +0100

    drm/i915: Handle ERESTARTSYS during page fault
Comment 8 Daniel Richard G. 2009-12-08 07:32:18 UTC
At the time, I was running Ubuntu's patched 2.6.31 kernel. I've upgraded to a
mainline 2.6.32 build, and have been unable to reproduce the SIGBUS (or any
other form of crash) so far.

Thanks for the clarification! I'll report back if anything untoward crops up
again.
