Bug 38614 - frequent X crashes with SNA
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Chris Wilson
QA Contact: Xorg Project Team
 
Reported: 2011-06-23 12:27 UTC by nkalkhof
Modified: 2011-06-25 06:31 UTC


Attachments
xorg.log (58.29 KB, text/plain)
2011-06-23 12:27 UTC, nkalkhof

Description nkalkhof 2011-06-23 12:27:43 UTC
Created attachment 48354
xorg.log

Hello,

I've experienced frequent X crashes with the latest xf86-video-intel compiled with --enable-sna and kernel 3.0-rc4-next-20110623. The kernel also contains the patches from this ticket: https://bugs.freedesktop.org/show_bug.cgi?id=38529

I can't reproduce the crashes on demand, but they seem to occur every time I work with Eclipse.

xorg.log spits out something like this:

[   818.268] 0: /usr/bin/X (xorg_backtrace+0x28) [0x496a54]
[   818.268] 1: /usr/bin/X (0x400000+0x5d589) [0x45d589]
[   818.268] 2: /lib/libpthread.so.0 (0x7fadc310b000+0xf4c0) [0x7fadc311a4c0]
[   818.268] 3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fadc0238000+0x47645) [0x7fadc027f645]
[   818.268] 4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fadc0238000+0x3d0c2) [0x7fadc02750c2]
[   818.268] 5: /usr/bin/X (0x400000+0xcd3ab) [0x4cd3ab]
[   818.268] 6: /usr/lib64/xorg/modules/extensions/libextmod.so (0x7fadc0efe000+0xba15) [0x7fadc0f09a15]
[   818.268] 7: /usr/bin/X (0x400000+0xab347) [0x4ab347]
[   818.268] 8: /usr/bin/X (FreeResource+0x101) [0x44956d]
[   818.268] 9: /usr/bin/X (0x400000+0x2b576) [0x42b576]
[   818.268] 10: /usr/bin/X (0x400000+0x2f412) [0x42f412]
[   818.268] 11: /usr/bin/X (0x400000+0x2490a) [0x42490a]
[   818.268] 12: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7fadc205eebd]
[   818.268] 13: /usr/bin/X (0x400000+0x244d9) [0x4244d9]
[   818.268] Segmentation fault at address 0xff35355d

Any Ideas?

Thanks and Regards,
Nic
Comment 1 Chris Wilson 2011-06-23 12:36:02 UTC
At a guess from the sparse backtrace, it looks like the GLX refcounting misery that Adam Jackson fixed in the xserver.

After updating your xserver, can you attach gdb (or pass -core) and then grab the symbols for the backtrace?
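
Until symbols are available, the unsymbolized frames in the Xorg.log backtrace still carry a module path, an offset and an absolute address. The following is a minimal sketch, not part of the original report, that extracts those fields so they can be looked up later with a symbol tool such as addr2line. The function name parse_frames and the returned dictionary layout are assumptions made for illustration; only the log line format is taken from the report above.

```python
import re

# Matches lines such as:
#   [   818.268] 3: /path/intel_drv.so (0x7fadc0238000+0x47645) [0x7fadc027f645]
#   [   818.268] 8: /usr/bin/X (FreeResource+0x101) [0x44956d]
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<base>\w+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text (hypothetical helper)."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a backtrace frame, e.g. the "Segmentation fault" line
        base = match.group("base")
        frames.append({
            "index": int(match.group("index")),
            "module": match.group("module"),
            # The part before '+' is either a load address or a symbol name.
            "symbol": None if base.startswith("0x") else base,
            "offset": int(match.group("offset"), 16),
            "address": int(match.group("address"), 16),
        })
    return frames
```

Frames whose "symbol" is None are the ones that still need debug symbols to be resolved.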
Comment 2 nkalkhof 2011-06-24 00:14:40 UTC
(In reply to comment #1)
> At a guess from the sparse backtrace, it looks like the GLX refcounting misery
> that Adam Jackson fixed in the xserver.
> 
> After updating your xserver, can you attach gdb or pass -core and then grab the
> symbols for the backtrace.

Hi Chris,

The current xorg-server-9999 (dev) build on my Gentoo box seems to be broken. Could you point me to the GLX refcount fix, please (LKML or Bugzilla link)?

Thanks
Nic
Comment 3 Chris Wilson 2011-06-24 00:35:35 UTC
It was in the middle of a series of fixes by Adam Jackson:

commit 6a433b67ca15fd1ea58334e607f867554f227451
Author: Adam Jackson <ajax@redhat.com>
Date:   Mon Mar 28 12:30:09 2011 -0400

    glx: Fix lifetime tracking for pixmaps
    
    GLX pixmaps take a reference on the underlying pixmap; X and GLX pixmap
    IDs can be destroyed in either order with no error.  Only windows need
    to be tracked under both XIDs.
    
    Fixes piglit/glx-pixmap-life.
    
    Reviewed-by: Michel Dänzer <michel@daenzer.net>
    Signed-off-by: Adam Jackson <ajax@redhat.com>
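
The lifetime rule described in this commit message (the GLX pixmap takes its own reference, so the X and GLX IDs can be destroyed in either order) can be illustrated with a toy model. This is a sketch for illustration only, not the xserver code; the class names Pixmap and GLXPixmap and their methods are hypothetical.

```python
class Pixmap:
    """Toy stand-in for an X pixmap with a reference count."""

    def __init__(self):
        self.refcnt = 1      # reference held by the X pixmap ID
        self.freed = False

    def ref(self):
        assert not self.freed, "use after free"
        self.refcnt += 1
        return self

    def unref(self):
        assert not self.freed, "double free"
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True  # storage is released only on the last unref


class GLXPixmap:
    """Toy stand-in for a GLX pixmap wrapping an X pixmap."""

    def __init__(self, pixmap):
        # Taking a reference here is what makes destruction order irrelevant.
        self.pixmap = pixmap.ref()

    def destroy(self):
        self.pixmap.unref()
        self.pixmap = None
```

With this model, destroying the X ID first leaves the pixmap alive until the GLX pixmap is destroyed, and vice versa; without the extra reference, the second destroy would touch freed memory.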
Comment 4 nkalkhof 2011-06-24 00:54:51 UTC
(In reply to comment #3)
> It was in the middle of a series of fixes by Adam Jackson:
> 
> commit 6a433b67ca15fd1ea58334e607f867554f227451
> Author: Adam Jackson <ajax@redhat.com>
> Date:   Mon Mar 28 12:30:09 2011 -0400
> 
>     glx: Fix lifetime tracking for pixmaps
> 
>     GLX pixmaps take a reference on the underlying pixmap; X and GLX pixmap
>     IDs can be destroyed in either order with no error.  Only windows need
>     to be tracked under both XIDs.
> 
>     Fixes piglit/glx-pixmap-life.
> 
>     Reviewed-by: Michel Dänzer <michel@daenzer.net>
>     Signed-off-by: Adam Jackson <ajax@redhat.com>

Hi Chris,

OK, this seems to be fixed in xorg-server 1.10.2, which is the current version on my system.

By the way, top reports approx. 200 MB less memory after firing up X with SNA enabled. Does SNA grab more shared memory for the GPU?

I'll try to attach a gdb session and see if I can get some more backtrace information.

Thx
Nic
Comment 5 Chris Wilson 2011-06-24 01:15:06 UTC
Right, onto the symbols then. The trace looks nigh on identical to the resource miscounting... ;-)

The amount of memory used is primarily dependent upon the number of pixmaps in use. GPU buffers are more or less invisible to top and only show up in the cached page count. I'd only start to worry if /sys/kernel/debug/dri/0/i915_gem_objects showed a leak. Otherwise the memory should be managed as normal by the kernel, swapping on demand.
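
One way to check for the kind of leak mentioned here is to sample the object totals over time and see whether they only ever grow. The sketch below assumes that the file contains a summary line of the form "N objects, M bytes"; that format is an assumption and may differ between kernel versions. The helper names parse_totals and grows_monotonically are hypothetical.

```python
import re

# Assumed summary format: "<count> objects, <size> bytes"
TOTAL_RE = re.compile(r"(\d+)\s+objects,\s+(\d+)\s+bytes")


def parse_totals(text):
    """Return (object_count, byte_count) from the first summary line, or None."""
    match = TOTAL_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def grows_monotonically(samples):
    """True if the byte count strictly increases across all samples.

    samples is a list of (object_count, byte_count) tuples taken over time.
    Steady growth is only a hint of a leak, not proof of one.
    """
    byte_counts = [size for _, size in samples]
    return len(byte_counts) >= 2 and all(
        earlier < later for earlier, later in zip(byte_counts, byte_counts[1:])
    )
```

In use, the text would come from reading /sys/kernel/debug/dri/0/i915_gem_objects at intervals while exercising the desktop, then passing the collected tuples to grows_monotonically.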
Comment 6 nkalkhof 2011-06-24 05:30:09 UTC
(In reply to comment #5)
> Right, onto the symbols then. The trace looks nigh on identical to the resource
> miscounting... ;-)
> 
> The amount of memory used is primarily dependent upon the number of pixmaps in
> use. GPU buffers are more or less invisible to top and only show up in the
> cached page count. I'd only start to worry if
> /sys/kernel/debug/dri/0/i915_gem_objects showed a leak. Otherwise the memory
> should be managed as normal by the kernel, swapping on demand.

Hi Chris,

Well, this is kind of embarrassing, but the attached gdb session revealed only one line of usable information after X crashed (see below). Is there a way to tell gdb to spit out some more information? 8-)

Thx
Nic

(gdb) attach 18125
Attaching to process 18125
Reading symbols from /usr/bin/Xorg...Reading symbols from /usr/lib64/debug/usr/bin/Xorg.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Reading symbols from /lib/libudev.so.0...Reading symbols from /usr/lib64/debug/lib64/libudev.so.0.11.5.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib/libudev.so.0
Reading symbols from /usr/lib/libgcrypt.so.11...Reading symbols from /usr/lib64/debug/usr/lib64/libgcrypt.so.11.7.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgcrypt.so.11
Reading symbols from /lib/libdl.so.2...Reading symbols from /usr/lib64/debug/lib64/libdl-2.13.so.debug...done.
done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libpciaccess.so.0...Reading symbols from /usr/lib64/debug/usr/lib64/libpciaccess.so.0.10.8.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpciaccess.so.0
Reading symbols from /lib/libpthread.so.0...Reading symbols from /usr/lib64/debug/lib64/libpthread-2.13.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /usr/lib/libpixman-1.so.0...Reading symbols from /usr/lib64/debug/usr/lib64/libpixman-1.so.0.22.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpixman-1.so.0
Reading symbols from /usr/lib/libXfont.so.1...Reading symbols from /usr/lib64/debug/usr/lib64/libXfont.so.1.4.1.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXfont.so.1
Reading symbols from /usr/lib/libXau.so.6...Reading symbols from /usr/lib64/debug/usr/lib64/libXau.so.6.0.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXau.so.6
Reading symbols from /usr/lib/libXdmcp.so.6...Reading symbols from /usr/lib64/debug/usr/lib64/libXdmcp.so.6.0.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXdmcp.so.6
Reading symbols from /lib/libm.so.6...Reading symbols from /usr/lib64/debug/lib64/libm-2.13.so.debug...done.
done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/librt.so.1...Reading symbols from /usr/lib64/debug/lib64/librt-2.13.so.debug...done.
done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libc.so.6...Reading symbols from /usr/lib64/debug/lib64/libc-2.13.so.debug...done.
done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libgpg-error.so.0...Reading symbols from /usr/lib64/debug/usr/lib64/libgpg-error.so.0.8.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgpg-error.so.0
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib64/debug/lib64/ld-2.13.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libz.so.1...Reading symbols from /usr/lib64/debug/lib64/libz.so.1.2.5.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /usr/lib/libfreetype.so.6...Reading symbols from /usr/lib64/debug/usr/lib64/libfreetype.so.6.6.2.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libfreetype.so.6
Reading symbols from /lib/libbz2.so.1...Reading symbols from /usr/lib64/debug/lib64/libbz2.so.1.0.6.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib/libbz2.so.1
Reading symbols from /usr/lib/libfontenc.so.1...Reading symbols from /usr/lib64/debug/usr/lib64/libfontenc.so.1.0.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libfontenc.so.1
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /usr/lib64/xorg/modules/extensions/libglx.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/extensions/libglx.so
Reading symbols from /usr/lib64/xorg/modules/extensions/libextmod.so...Reading symbols from /usr/lib64/debug/usr/lib64/xorg/modules/extensions/libextmod.so.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/extensions/libextmod.so
Reading symbols from /usr/lib64/xorg/modules/extensions/libdbe.so...Reading symbols from /usr/lib64/debug/usr/lib64/xorg/modules/extensions/libdbe.so.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/extensions/libdbe.so
Reading symbols from /usr/lib64/xorg/modules/extensions/librecord.so...Reading symbols from /usr/lib64/debug/usr/lib64/xorg/modules/extensions/librecord.so.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/extensions/librecord.so
Reading symbols from /usr/lib64/xorg/modules/extensions/libdri.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/extensions/libdri.so
Reading symbols from /usr/lib/libdrm.so.2...Reading symbols from /usr/lib64/debug/usr/lib64/libdrm.so.2.4.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libdrm.so.2
Reading symbols from /usr/lib64/xorg/modules/extensions/libdri2.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/extensions/libdri2.so
Reading symbols from /usr/lib64/xorg/modules/drivers/intel_drv.so...Reading symbols from /usr/lib64/debug/usr/lib64/xorg/modules/drivers/intel_drv.so.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/drivers/intel_drv.so
Reading symbols from /usr/lib/libdrm_intel.so.1...Reading symbols from /usr/lib64/debug/usr/lib64/libdrm_intel.so.1.0.0.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libdrm_intel.so.1
Reading symbols from /usr/lib64/xorg/modules/libfb.so...Reading symbols from /usr/lib64/debug/usr/lib64/xorg/modules/libfb.so.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/libfb.so
Reading symbols from /usr/lib64/dri/i965_dri.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/dri/i965_dri.so
Reading symbols from /usr/lib/libexpat.so.1...Reading symbols from /usr/lib64/debug/usr/lib64/libexpat.so.1.5.2.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libexpat.so.1
Reading symbols from /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/libstdc++.so.6...Reading symbols from /usr/lib64/debug/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/libstdc++.so.6.0.14.debug...done.
done.
Loaded symbols for /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/libstdc++.so.6
Reading symbols from /usr/lib64/xorg/modules/input/evdev_drv.so...Reading symbols from /usr/lib64/debug/usr/lib64/xorg/modules/input/evdev_drv.so.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/input/evdev_drv.so
Reading symbols from /usr/lib64/xorg/modules/input/synaptics_drv.so...Reading symbols from /usr/lib64/debug/usr/lib64/xorg/modules/input/synaptics_drv.so.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/xorg/modules/input/synaptics_drv.so
0x00007f91b7e0ed13 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
	in ../sysdeps/unix/syscall-template.S
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000000000043e8b2 in SetClipRects ()
(gdb) Detaching from program: /usr/bin/Xorg, process 18125
Comment 7 Chris Wilson 2011-06-24 05:49:33 UTC
Type "bt" (backtrace).

Can you please also make sure your SNA is up to date and includes:

commit 58d7a89b93ba4022f45465e479d2799b8903137a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 24 00:35:30 2011 +0100

    sna: Don't render to invalid surfaces
    
    Fixes a regression from d0362a. In bypassing the is_wedged checked, we
    also ended up bypassing the checks that we could indeed render to the
    target bo. With the result that we were creating GPU buffers for SHM
    surfaces, something that requires Xserver fixes before we can actually
    enable...
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 8 nkalkhof 2011-06-24 06:26:27 UTC
(In reply to comment #7)
> Type "bt" (backtrace).
> 
> Can you please also make sure your SNA is up to date and includes:
> 
> commit 58d7a89b93ba4022f45465e479d2799b8903137a
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Jun 24 00:35:30 2011 +0100
> 
>     sna: Don't render to invalid surfaces
> 
>     Fixes a regression from d0362a. In bypassing the is_wedged checked, we
>     also ended up bypassing the checks that we could indeed render to the
>     target bo. With the result that we were creating GPU buffers for SHM
>     surfaces, something that requires Xserver fixes before we can actually
>     enable...
> 
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


Hi Chris,

OK, I've used xf86-video-intel git, checked out 2011-06-24 14:55:32 (GMT) from the master branch, with --enable-sna. X still drops dead when I work with the Eclipse GUI.

gdb is a little more cooperative with the bt command :)

(gdb) attach 20494
Attaching to process 20494
......
Loaded symbols for /usr/lib64/xorg/modules/input/synaptics_drv.so
0x00007f1b7fa9bd13 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
	in ../sysdeps/unix/syscall-template.S
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007f1b808370ce in pixman_region_fini () from /usr/lib/libpixman-1.so.0
(gdb) bt
#0  0x00007f1b808370ce in pixman_region_fini () from /usr/lib/libpixman-1.so.0
#1  0x00007f1b7dc09620 in __sna_damage_destroy ()
   from /usr/lib64/xorg/modules/drivers/intel_drv.so
#2  0x00007f1b7dbff060 in sna_destroy_pixmap ()
   from /usr/lib64/xorg/modules/drivers/intel_drv.so
#3  0x00000000004cd3ab in damageDestroyPixmap ()
#4  0x00007f1b7e894a15 in XvDestroyPixmap ()
   from /usr/lib64/xorg/modules/extensions/libextmod.so
#5  0x00000000004ab347 in ShmDestroyPixmap ()
#6  0x0000000000449ba0 in FreeClientResources ()
#7  0x000000000042e988 in CloseDownClient ()
#8  0x000000000042f437 in Dispatch ()
#9  0x000000000042490a in main ()
(gdb) detach
Detaching from program: /usr/bin/Xorg, process 20494
(gdb)

By the way, how do I expose /sys/kernel/debug/dri/*?

Regards
Nic
Comment 9 Chris Wilson 2011-06-24 07:44:22 UTC
(In reply to comment #8)
> btw. how do I expose /sys/kernel/debug/dri/* ???

You need to compile CONFIG_DEBUG_FS into your kernel and then "mount -t debugfs debug /sys/kernel/debug"
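
To confirm that debugfs is actually mounted before looking for the dri directory, the mount table can be inspected. The sketch below parses text in the /proc/mounts format (device, mount point, filesystem type, options, ...); the function name debugfs_mount_points is hypothetical.

```python
def debugfs_mount_points(mounts_text):
    """Return every mount point whose filesystem type is debugfs.

    mounts_text is expected to be in /proc/mounts format, one mount per
    line: <device> <mount point> <fs type> <options> <dump> <pass>.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points
```

Passing the contents of /proc/mounts to this function should return a list containing /sys/kernel/debug once the mount command above has succeeded; an empty list means debugfs is not mounted.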
Comment 10 nkalkhof 2011-06-24 08:04:13 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > btw. how do I expose /sys/kernel/debug/dri/* ???
> 
> You need to compile CONFIG_DEBUG_FS into your kernel and then "mount -t debugfs
> debug /sys/kernel/debug"

Hi Chris,

Already did. However, /sys/kernel/debug/ contains no "dri" node, just:
"acpi  bdi  hid  ieee80211  mce  mmc0  sched_features  usb  x86" :(

Nic
Comment 11 Chris Wilson 2011-06-25 05:11:11 UTC
I've installed eclipse on F15. As a novice user, how do I recreate the scenario that is most likely to reproduce your crash?
Comment 12 nkalkhof 2011-06-25 06:03:50 UTC
(In reply to comment #11)
> I've installed eclipse on F15. As a novice user, how do I recreate the scenario
> that is most likely to reproduce your crash?

Hi Chris,

I've managed to make X crash after a couple of minutes by intensive use of Eclipse GUI elements: opening dialogs, clicking around through tabs, compiling a project (CDT) and viewing the console output, starting an Eclipse update, changing views, etc. Sometimes, before X dies, the menu text in the Eclipse pull-down menu is incomplete, i.e. some characters are missing.

Eclipse Version is 3.6.2
Build id: M20110210-1200 with latest CDT and ADT installed.

Maybe opening Firefox and running mplayer-vaapi and an OpenGL game (like prboom-plus) in the background, as I did, contributes to the crash; I'm not sure.

Hope this helps.
Comment 13 Chris Wilson 2011-06-25 06:06:26 UTC
--enable-debug=full reveals all. Eclipse did crash X on me, and has not done so
again since I spotted this typo:

commit 3833ff967766b0b99f1d636c6453de1783a90586
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 25 14:02:50 2011 +0100

    sna: Correct typo in computing damage of PolyPoint

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38614
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
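
The commit message does not say what the typo was, so the following is only a generic illustration of what "computing damage of PolyPoint" involves: finding the bounding box of the drawn points so that the damaged region can be tracked. It assumes absolute point coordinates and a half-open box (x2 and y2 exclusive, as in the X server's box convention). The function name and parameters are hypothetical and this is not the SNA code.

```python
def polypoint_damage_box(points, drawable_x=0, drawable_y=0):
    """Return (x1, y1, x2, y2) covering all points, or None if there are none.

    points is a list of (x, y) tuples relative to the drawable origin.
    The box is half-open: x2 and y2 lie one past the last covered pixel.
    """
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (
        min(xs) + drawable_x,
        min(ys) + drawable_y,
        max(xs) + drawable_x + 1,
        max(ys) + drawable_y + 1,
    )
```

A slip in a calculation like this, such as mixing up a minimum and a maximum or an x and a y, yields a damage box that does not match what was drawn, which is the sort of bookkeeping error that can surface much later when the damage is destroyed.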

