git checkout ff91fd4a2749f1ea8722189aaec938e51bce4222 xorg-x11-server-Xorg-1.12.2-1.fc18.x86_64 While I'd tried to seek in some flash played video I've got this Xorg crash. I've relatively recently renabled hw acceleration usage for it. (gdb) bt #0 0x00007f6ac2f8fc25 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:63 #1 0x00007f6ac2f913d8 in __GI_abort () at abort.c:90 #2 0x00007f6ac2f88a52 in __assert_fail_base (fmt=0x7f6ac30d6388 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f6ac1381010 "damage->extents.x1 == region->extents.x1 && damage->extents.y1 == region->extents.y1 && damage->extents.x2 == region->extents.x2 && damage->extents.y2 == region->extents.y2", file=file@entry=0x7f6ac1380f93 "sna_damage.c", line=line@entry=502, function=function@entry=0x7f6ac13812a0 "__sna_damage_reduce") at assert.c:92 #3 0x00007f6ac2f88b02 in __GI___assert_fail ( assertion=assertion@entry=0x7f6ac1381010 "damage->extents.x1 == region->extents.x1 && damage->extents.y1 == region->extents.y1 && damage->extents.x2 == region->extents.x2 && damage->extents.y2 == region->extents.y2", file=file@entry=0x7f6ac1380f93 "sna_damage.c", line=line@entry=502, function=function@entry=0x7f6ac13812a0 "__sna_damage_reduce") at assert.c:101 #4 0x00007f6ac12f2fa6 in __sna_damage_reduce (damage=damage@entry=0x1e7d340) at sna_damage.c:499 #5 0x00007f6ac12f46f8 in __sna_damage_subtract (region=<optimized out>, damage=0x1e7d340) at sna_damage.c:1021 #6 _sna_damage_subtract (damage=0x1e7d340, region=region@entry=0x7fffde01a310) at sna_damage.c:1070 #7 0x00007f6ac12e60ad in sna_damage_subtract (region=0x7fffde01a310, damage=0x39d7a68) at sna_damage.h:156 #8 sna_put_zpixmap_blt (stride=<optimized out>, bits=0x2531468 "[l\245\377Zk\243\377Yj\242\377[i\240\377Yh\236\377We\233\377S^\225\377NW\215\377EN\204\377=F{\377\065:p\377,1f\377#)^\377\036\"X\377\035\035U\377\034\034S\377\033\033S\377\037\035V\377\"\036Y\377$!\\\377%!]\377&!]\377'!]\377'!]\377(\"^\377)#`\377*%b\377+&c\377,(e\377/*h\377\062.k\377;9t\377CB}\377@?z\377<;u\377\063\065m\377-/g\377(*b\377(*_\377(+^\377+/`\377.2c\377\060\064d\377\063\070g\377\066<j\377\070>m\377\071@o\377:Ap\377<Bq\377;Aq\377"..., h=63, w=63, y=179, x=254, region=0x7fffde01a310, gc=0x39980e0, drawable=0x1f95730) at sna_accel.c:2935 #9 sna_put_image (drawable=0x1f95730, gc=0x39980e0, depth=32, x=254, y=179, w=63, h=63, left=0, format=2, bits=0x2531468 "[l\245\377Zk\243\377Yj\242\377[i\240\377Yh\236\377We\233\377S^\225\377NW\215\377EN\204\377=F{\377\065:p\377,1f\377#)^\377\036\"X\377\035\035U\377\034\034S\377\033\033S\377\037\035V\377\"\036Y\377$!\\\377%!]\377&!]\377'!]\377'!]\377(\"^\377)#`\377*%b\377+&c\377,(e\377/*h\377\062.k\377;9t\377CB}\377@?z\377<;u\377\063\065m\377-/g\377(*b\377(*_\377(+^\377+/`\377.2c\377\060\064d\377\063\070g\377\066<j\377\070>m\377\071@o\377:Ap\377<Bq\377;Aq\377"...) at sna_accel.c:3321 #10 0x0000000000504e92 in ?? () #11 0x0000000000431033 in ?? () #12 0x000000000043442a in ?? () #13 0x0000000000423495 in ?? () #14 0x00007f6ac2f7aba5 in __libc_start_main (main=0x423120, argc=4, ubp_av=0x7fffde01a5f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffde01a5e8) at libc-start.c:225 print damage->extents $1 = {x1 = 0, y1 = 0, x2 = 319, y2 = 243} print region->extents $2 = {x1 = 253, y1 = 178, x2 = 319, y2 = 243} print nboxes $5 = 4 print n $8 = 2 Sorry there is no Xorg function list - some wrong is with current rawhide debuginfo package
Do tell how you tweaked flash for hw acceleration... Is this gnash or adobe? Any chance you can make this crash with debug=full? Reconstructing the contents of the damage to see where the error crept in is the fun part of this.
(In reply to comment #1) > Do tell how you tweaked flash for hw acceleration... Is this gnash or adobe? > > Any chance you can make this crash with debug=full? Reconstructing the contents > of the damage to see where the error crept in is the fun part of this. I've not been doing anything 'special' you might have it probably already enabled by default - I've just disabled in past because with UXA I've been getting various crashes at one point in time - so I've disable acceleration, and now I've just enable it again via 'flash' context menu. flash-plugin-11.2.202.233-release.x86_64 firefox-12.0-1.fc18.x86_64 (For rawhide nspluginwrapper must be uninstalled - otherwise I get nspluginwrapper coredumps) It could be true that there is no XV acceleration anyway so take only as a comment, that I've just switched this flag in adobe flash player. (Hmm how do I check, it's played accelerated ?)
*g* I've never seen it do anything other than PutImage, at best ShmPutImage. Judging by bug reports, it has been known to switch to GL when fullscreen, though not for me.
Also was the crash on any site in particular? Of course, if you know how to reproduce it...
Hopefully these will help pinpoint the perpetrator: commit 1dafb4777f8378c87f34feae667582498220204c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jun 5 21:25:40 2012 +0100 sna/damage: Add some assertions to validate that each damage box is non-empty
I've tried to hit the problem again - but it's not so easy - though I've manage to get the same core already 2 times. With 'full=debug' I'm getting pretty instant problems with those signal handlers (no valgrind involved).cd /ho #1 0x00007f460de35b57 in _L_lock_52 () from /lib64/libc.so.6 #2 0x00007f460de35a38 in __GI__IO_fwrite (buf=0x1325510, size=48, count=1, fp=0xffffffffffffffff) at iofwrite.c:41 #3 0x0000000000468d60 in LogVWrite () #4 0x0000000000468bac in ErrorF () #5 0x00007f460c119ba0 in sna_crtc_set_cursor_position (crtc=0x1345760, x=718, y=257) at sna_display.c:810 #6 0x00000000004b6b44 in ?? () #7 0x000000000055a3c3 in ?? () #8 0x000000000055b3e8 in miPointerSetPosition () #9 0x0000000000448487 in ?? () #10 0x000000000044895d in ?? () #11 0x0000000000449cd3 in GetPointerEvents () #12 0x000000000044a19d in QueuePointerEvents () #13 0x000000000048fae0 in xf86PostMotionEvent () #14 0x00007f460a691d15 in HandleState (pInfo=pInfo@entry=0x17774a0, hw=hw@entry=0x1785690, now=<optimized out>, from_timer=from_timer@entry=0) at synaptics.c:3370 #15 0x00007f460a693d92 in ReadInput (pInfo=0x17774a0) at synaptics.c:1721 #16 0x0000000000480317 in ?? () #17 0x00000000004a44f0 in ?? () #18 <signal handler called> #19 0x00007f460de35a32 in __GI__IO_fwrite (buf=0x7d2a20, size=2, count=1, fp=0x1325430) at iofwrite.c:41 #20 0x0000000000468d60 in LogVWrite () #21 0x0000000000468bac in ErrorF () #22 0x00007f460c1a6a9d in vertices_float_out (ve=0x7f460c41a41c, f=0x7f4608bbc830, max=2) at kgem_debug_gen4.c:180 #23 0x00007f460c1a6b92 in ve_out (ve=0x7f460c41a41c, ptr=0x7f4608bbc830) at kgem_debug_gen4.c:192 #24 0x00007f460c1a6d23 in indirect_vertex_out (kgem=0x7f460fea9208, v=12121) at kgem_debug_gen4.c:233 #25 0x00007f460c1a6e0c in primitive_out (kgem=0x7f460fea9208, data=0x7f460fea9aa0) at kgem_debug_gen4.c:252 #26 0x00007f460c1a836f in kgem_gen4_decode_3d (kgem=0x7f460fea9208, offset=102) at kgem_debug_gen4.c:660 #27 0x00007f460c19e4bf in __kgem_batch_debug (kgem=0x7f460fea9208, nbatch=736) at kgem_debug.c:420 #28 0x00007f460c0d4e73 in _kgem_submit (kgem=0x7f460fea9208) at kgem.c:1810 #29 0x00007f460c0dd29a in kgem_submit (kgem=0x7f460fea9208) at kgem.h:254 #30 0x00007f460c108766 in sna_accel_wakeup_handler (sna=0x7f460fea9010, ready=0x7de540) at sna_accel.c:12465 #31 0x00007f460c11ecff in sna_wakeup_handler (arg=0, wakeupData=0x0, result=1, read_mask=0x7de540) at sna_driver.c:610 #32 0x000000000043839a in WakeupHandler () #33 0x0000000000462414 in WaitForSomething () #34 0x00000000004340f1 in ?? () #35 0x0000000000423495 in ?? () #36 0x00007f460dde9ba5 in __libc_start_main (main=0x423120, argc=4, ubp_av=0x7fffaaf86948, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffaaf86938) at libc-start.c:225 #37 0x000000000042376d in _start () So to fix those - I've disable UseSGIO Server { Option "UseSIGIO" "false" } But I've not managed to convince my very slow desktop with all those debugs to generate a crash again. I'll keep an eye on this - and maybe those assert will help (though I'm not use full debug now)
(In reply to comment #6) > I've tried to hit the problem again - but it's not so easy - though I've manage > to get the same core already 2 times. > > With 'full=debug' I'm getting pretty instant problems with those signal > handlers (no valgrind involved).cd /ho > > #1 0x00007f460de35b57 in _L_lock_52 () from /lib64/libc.so.6 > #2 0x00007f460de35a38 in __GI__IO_fwrite (buf=0x1325510, size=48, count=1, > fp=0xffffffffffffffff) at iofwrite.c:41 > #3 0x0000000000468d60 in LogVWrite () I should mention here it's the place of Xorg deadlock. So the binary is running - but everything stays frozen in the lock above. Just to it clear this is not coredump (unlike with valgrind, where it shutdowns the server)
(In reply to comment #7) > (In reply to comment #6) > > I've tried to hit the problem again - but it's not so easy - though I've manage > > to get the same core already 2 times. > > > > With 'full=debug' I'm getting pretty instant problems with those signal > > handlers (no valgrind involved).cd /ho > > > > #1 0x00007f460de35b57 in _L_lock_52 () from /lib64/libc.so.6 > > #2 0x00007f460de35a38 in __GI__IO_fwrite (buf=0x1325510, size=48, count=1, > > fp=0xffffffffffffffff) at iofwrite.c:41 > > #3 0x0000000000468d60 in LogVWrite () > > I should mention here it's the place of Xorg deadlock. So the binary is running > - but everything stays frozen in the lock above. Just to it clear this is not > coredump (unlike with valgrind, where it shutdowns the server) No worries, this is just the danger of doing anything from a signal handler. As this can only happen with debug=full, it's not critical except that is preventing reporters from getting useful info...
(In reply to comment #8) > (In reply to comment #7) > > (In reply to comment #6) > > > I've tried to hit the problem again - but it's not so easy - though I've manage > > > to get the same core already 2 times. > > > > > > With 'full=debug' I'm getting pretty instant problems with those signal > > > handlers (no valgrind involved).cd /ho > > > > > > #1 0x00007f460de35b57 in _L_lock_52 () from /lib64/libc.so.6 > > > #2 0x00007f460de35a38 in __GI__IO_fwrite (buf=0x1325510, size=48, count=1, > > > fp=0xffffffffffffffff) at iofwrite.c:41 > > > #3 0x0000000000468d60 in LogVWrite () > > > > I should mention here it's the place of Xorg deadlock. So the binary is running > > - but everything stays frozen in the lock above. Just to it clear this is not > > coredump (unlike with valgrind, where it shutdowns the server) > > No worries, this is just the danger of doing anything from a signal handler. As > this can only happen with debug=full, it's not critical except that is > preventing reporters from getting useful info... Maybe the debug code could be extended to skip debug print if called through signal handler then?
I was thinking of just removing the likely candidates for recursion during signal handlers... But it would be possible to wrap all the DBG and block SIGIO during the write... Hmm, there's still potential for other fail though, so I think just removing this deadlock is simplest. (For instance, SIGIO recursion is the best explanation I have so far for the memcheck explosion.)
(In reply to comment #10) > I was thinking of just removing the likely candidates for recursion during > signal handlers... But it would be possible to wrap all the DBG and block SIGIO > during the write... Hmm, there's still potential for other fail though, so I > think just removing this deadlock is simplest. (For instance, SIGIO recursion > is the best explanation I have so far for the memcheck explosion.) Right - masking the SIGIO during debug printing would probably fix the issue. Unsure how efficient is this - and how it changes the timing. But it's already slow - so probably it's not an issue to make it tiny bit slower.
I didn't like either of the options. The root cause is the use of SIGIO by X, but I'm happier with just ignoring the DBG prints for the cursor routines for the time being: commit c58d137d3eeb0e97bfd53e68404e04d9012b5697 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 7 11:18:27 2012 +0100 sna: Suppress DBG prints along potential SIGIO paths As ErrorF/fprintf is not re-entrant due to its mutex we can not use DBG from code that could be called by a signal handler. X's SIGIO handler attempts to move the cursor from within the handler (eek!) and so we need to be careful not to take any locks, such in as the aforementioned fprintf, along the cursor paths. Reported-by: Zdenek Kabelac <zkabelac@redhat.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=50744 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Created attachment 62940 [details] Full bt Happend again today - with git commit 392e33a62d729c64c57699505220b4029e015470 (so asserts should be compiled in - build with --enable-debug=memory) print *region {extents = {x1 = 133, y1 = 0, x2 = 542, y2 = 541}, data = 0x20aa410} print *damage {extents = {x1 = 58, y1 = 0, x2 = 542, y2 = 541}, region = {extents = {x1 = 133, y1 = 0, x2 = 542, y2 = 541}, data = 0x20aa410}, mode = DAMAGE_ADD, remain = 6, dirty = 1, box = 0x216dba4, embedded_box = {list = {next = 0x216db80, prev = 0x216db80}, size = 8, box = {{ x1 = 350, y1 = 224, x2 = 542, y2 = 242}, {x1 = 133, y1 = 242, x2 = 542, y2 = 541}, {x1 = 133, y1 = 0, x2 = 542, y2 = 224}, {x1 = 133, y1 = 224, x2 = 328, y2 = 242}, {x1 = 133, y1 = 0, x2 = 542, y2 = 160}, {x1 = 133, y1 = 160, x2 = 542, y2 = 320}, {x1 = 133, y1 = 320, x2 = 542, y2 = 480}, {x1 = 133, y1 = 480, x2 = 542, y2 = 541}}}}
Created attachment 62941 [details] Xorg log from crashed Xorg I do not have any reliable reproducer - bug it has again happened with some flash.
Since I've still full debugs and core available - I could add fill in more variables from stack.
The minor annoyance is that we've lost the damage->region. However, its boxes appears to be inside damage->embedded_boxes, so can you p nboxes and p free_boxes? If free_boxes!=NULL, it would be worth printing free_boxes[0:nboxes].
(In reply to comment #16) > The minor annoyance is that we've lost the damage->region. However, its boxes > appears to be inside damage->embedded_boxes, so can you p nboxes and p > free_boxes? > If free_boxes!=NULL, it would be worth printing free_boxes[0:nboxes]. (gdb) print boxes $1 = (BoxPtr) 0x216db94 (gdb) print free_boxes $2 = (BoxPtr) 0x0 (gdb) print nboxes $3 = 4 (gdb) print iter->size $4 = 8 (gdb) print damage->remain $5 = 6 (gdb) print n $6 = 2 so probably not big help
And then he realises he has the full bt...
That assertion looks like it should be self-contained to the damage code, yet I've reviewed every line of code there and can't see how we could end up with that assertion failling. :( I've added a few more assertions, and tweaked the likely paths, so if you could update and force DBG for src/sna/sna_damage.c that would be useful. (There is a DEBUG_DAMAGE in src/sna/sna.h, though it is probably easier to just redefine that value in src/sna/sna_damage.c)
Created attachment 62990 [details] Damage log Log from 'startkde' session which gives me same coredump for me - instantly anytime :)
Created attachment 62991 [details] full bt - from startkde crash Just to show the bt from crashed KDE session. (which is btw actually pretty slow with windowmoment) I think my kde is configure to not use compositing (But I'm not really using it often - just been testing something)
Lucky you :-p commit 221534abe2dc04fae8b8fc332104bca275d4863b Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 13 22:48:43 2012 +0100 sna: Only reduce damage addition to a region operation if clean Some paths bypass operating upon the region as they do not have an YX-banded set of boxes and so prefer to defer the costly construction of the region till later. As a result, we have to be careful not to overwrite any existing information if we do operate on the region after setting the dirty boxes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50744 Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (In reply to comment #21) > (which is btw actually pretty slow with windowmoment) > I think my kde is configure to not use compositing > (But I'm not really using it often - just been testing something) It shouldn't be that slow, works acceptably if you turn of GL compositing on PNV for instance. However, given that you are exercising the damage paths implies that you are mixing rendering on the CPU and the GPU and migrating between them. Not ideal. Can you grab a debug=full log of the slow wm, so that I can see if there is anything obvious I can do?
Great, I could confirm that KDE now started without crash, so I assume same applies to flash playback. Also the window movement does not leave 'solitaire-like' trail, so the speed is ok I guess.
Created attachment 63029 [details] New crash, new bt full Doesn't want to go away so easily :) Happened when I've put all the accelerations defines from line 72-91 in sna_accel.c to 0 after some windows movement.
Created attachment 63030 [details] bzip2 damage log This time somewhat larger log - how big part is needed for attachment ?
Created attachment 63031 [details] Another bt
Created attachment 63032 [details] Next bt with new shorter damage log
Created attachment 63033 [details] shorter damage log Just thunderbird started on openbox desktop and mouse moved inside.
Onwards! Deja vu? commit b415ca05c2e1c4f09f85d8769d39e5369661ed3a Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 14 17:06:19 2012 +0100 sna: Reset damage extents after reduction goes to zero If the reduction of the damage clears all of the boxes, we need to reset the -infinite extents so that we continue to accumulate further damage. Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=50744 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk
The shorter log also matches the same bug. \o/
If I mark this bug resolved, that will encourage the next one to bite...
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.