Created attachment 58771 [details] xorg.log When I close a window in KDE 4.8.1, X crashes: 0: /usr/bin/Xorg (xorg_backtrace+0x26) [0x462386] 1: /usr/bin/Xorg (0x400000+0x66b89) [0x466b89] 2: /lib64/libpthread.so.0 (0x7fbaeedae000+0xf270) [0x7fbaeedbd270] 3: /lib64/libc.so.6 (0x7fbaedc3d000+0x127ae0) [0x7fbaedd64ae0] 4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fbaec1e3000+0x353ee) [0x7fbaec2183ee] 5: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fbaec1e3000+0x69f2c) [0x7fbaec24cf2c] 6: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fbaec1e3000+0x59df3) [0x7fbaec23cdf3] 7: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fbaec1e3000+0x5e4a0) [0x7fbaec2414a0] 8: /usr/bin/Xorg (0x400000+0xdeb56) [0x4deb56] 9: /usr/bin/Xorg (0x400000+0xaaa98) [0x4aaa98] 10: /usr/bin/Xorg (0x400000+0xaa8f5) [0x4aa8f5] 11: /usr/bin/Xorg (0x400000+0x166f5d) [0x566f5d] 12: /usr/bin/Xorg (BlockHandler+0x4a) [0x43704a] 13: /usr/bin/Xorg (WaitForSomething+0x11d) [0x45fd8d] 14: /usr/bin/Xorg (0x400000+0x33052) [0x433052] 15: /usr/bin/Xorg (0x400000+0x272ee) [0x4272ee] 16: /lib64/libc.so.6 (__libc_start_main+0xed) [0x7fbaedc5e3bd] 17: /usr/bin/Xorg (0x400000+0x275dd) [0x4275dd] Segmentation fault at address 0x2414 Fatal server error: Caught signal 11 (Segmentation fault). Server aborting I installed debuginfo packages, but I still see those addresses instead of symbols. Why?
(In reply to comment #0) > I installed debuginfo packages, but I still see those addresses instead of > symbols. Why? The X server is licensed under the MIT/X11 licence, whereas the code to decode the symbols in libc/libbd.a is GPL, iirc.
So if you attach gdb to X and trigger the crash, or capture the corefile (you need to pass the -core option on the X commandline) I should be able to identify which bug that was.
But this is likely opensuse problem, because this does not happen with 2.18 compiled from git.
Ah, you're right. SNA is enabled in driver from suse, not in mine.
So the trace points to SNA: #8 <signal handler called> #9 __memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:119 #10 0x00007f319dc023ee in memcpy_blt (src=<optimized out>, dst=<optimized out>, bpp=<optimized out>, src_stride=256, dst_stride=224, src_x=<optimized out>, src_y=0, dst_x=0, dst_y=0, width=56, height=1) at /usr/include/bits/string3.h:52 #11 0x00007f319dc36f2c in sna_write_boxes (sna=0x1e555c0, dst=0x1e96f30, dst_bo=0x1e97bb0, dst_dx=7, dst_dy=-2496, src=0x0, stride=256, src_dx=-1760, src_dy=-999, box=0x288f640, nbox=0) at sna_io.c:788 #12 0x00007f319dc26df3 in blt_put_composite_boxes (sna=<optimized out>, op=<optimized out>, box=<optimized out>, n=<optimized out>) at sna_blt.c:1263 #13 0x00007f319dc2b4a0 in sna_composite (op=1 '\001', src=0x2802db0, mask=0x0, dst=0x288f2d0, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=1760, dst_y=999, width=64, height=53) at sna_composite.c:499 #14 0x00000000004deb56 in damageComposite (op=1 '\001', pSrc=0x2802db0, pMask=0x0, pDst=0x288f2d0, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=1760, yDst=999, width=64, height=53) at damage.c:617 #15 0x00000000004aaa98 in compWindowUpdateAutomatic (pWin=0x24c3440) at compwindow.c:704 #16 compPaintWindowToParent (pWin=0x24c3440) at compwindow.c:734 #17 compPaintChildrenToWindow (pWin=0x1ea2010) at compwindow.c:749 #18 0x0000000000566f5d in compScreenUpdate (pScreen=<optimized out>) at compalloc.c:54 #19 compBlockHandler (i=0, blockData=0x1e555c0, pTimeout=0x7fffae717b58, pReadmask=0x7ec220) at compalloc.c:67 #20 0x000000000043704a in BlockHandler (pTimeout=0x7fffae717b58, pReadmask=0x7ec220) at dixutils.c:385 #21 0x000000000045fd8d in WaitForSomething (pClientsReady=0x2222540) at WaitFor.c:219 #22 0x0000000000433052 in Dispatch () at dispatch.c:367 #23 0x00000000004272ee in main (argc=2, argv=<optimized out>, envp=<optimized out>) at main.c:287
Was that trace from the driver built from git? (./configure --enable-sna --enable-debug)
Looking at that code path, the only thing that leapt out at me was: commit 38b0cc24b45afd4ed57187fb066896bf1e467f45 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Mar 21 01:01:47 2012 +0000 sna: Remove the short-circuiting of move-to-cpu for read if no damage The danger now is that we may have either discarded the shadow pixels or have replaced them with a GTT mapping, either way undesirable and so we should reconstruct the shadow mapping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #7) > Looking at that code path, the only thing that leapt out at me was: > > commit 38b0cc24b45afd4ed57187fb066896bf1e467f45 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Wed Mar 21 01:01:47 2012 +0000 > > sna: Remove the short-circuiting of move-to-cpu for read if no damage Ok, doesn't crash by closing a window anymore with 2.18 plus cherry-pick of this commit. But I got another crash after c-a-backspace: #2 0x00007f90d868c9fe in __assert_fail_base (fmt=<optimized out>, assertion=0x7f90d6cc3f17 "bo->scanout == 0", file=0x7f90d6cc3e80 "kgem.c", line=<optimized out>, function=<optimized out>) at assert.c:94 #3 0x00007f90d868caa2 in __GI___assert_fail (assertion=0x7f90d6cc3f17 "bo->scanout == 0", file=0x7f90d6cc3e80 "kgem.c", line=1070, function=0x7f90d6cc4c30 "__kgem_bo_destroy") at assert.c:103 #4 0x00007f90d6c0c9e5 in __kgem_bo_destroy (kgem=0x1ec9860, bo=0x1f0bc80) at kgem.c:1070 #5 0x00007f90d6c1562d in kgem_bo_destroy (bo=<optimized out>, kgem=0x1ec9860) at kgem.h:275 #6 sna_destroy_private (priv=0x1f0b080, pixmap=0x1f0b000) at sna_accel.c:375 #7 sna_destroy_pixmap (pixmap=0x1f0b000) at sna_accel.c:808 #8 0x00007f90d6c40274 in sna_close_screen (scrnIndex=0, screen=0x1ede330) at sna_driver.c:787 #9 0x00000000004ab3eb in CursorCloseScreen (index=0, pScreen=0x1ede330) at cursor.c:191 #10 0x000000000056945c in AnimCurCloseScreen (index=<optimized out>, pScreen=<optimized out>) at animcur.c:106 #11 0x00000000004273a7 in main (argc=2, argv=<optimized out>, envp=<optimized out>) at main.c:327 Should I try master instead of 2.18 + that single patch?
That assertion failure is another old one. Please do give master a quick run with debugging enabled to see if anything else dies.
(In reply to comment #9) > That assertion failure is another old one. Please do give master a quick run > with debugging enabled to see if anything else dies. Sure, it does almost immediately after kde starts :): file=0x7fbb433ecda5 "sna_damage.h", line=<optimized out>, function=<optimized out>) at assert.c:94 #3 0x00007fbb44db4aa2 in __GI___assert_fail (assertion=0x7fbb433ed0f8 "*damage == ((void *)0) || (*damage)->mode != DAMAGE_ALL", file=0x7fbb433ecda5 "sna_damage.h", line=156, function=0x7fbb433ee560 "sna_damage_subtract") at assert.c:103 #4 0x00007fbb4335d3f9 in sna_damage_subtract (damage=<optimized out>, region=<optimized out>) at sna_damage.h:156 #5 sna_damage_subtract (region=0x7fff51f66dc0, damage=0xe77b40) at sna_composite.c:407 #6 sna_composite (op=1 '\001', src=0xe92290, mask=0x0, dst=0xe76370, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=-9, dst_y=-9, width=512, height=365) at sna_composite.c:491 #7 0x00000000004deb56 in damageComposite (op=1 '\001', pSrc=0xe92290, pMask=0x0, pDst=0xe76370, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=-9, yDst=-9, width=512, height=365) at damage.c:617 #8 0x00000000004aaa98 in compWindowUpdateAutomatic (pWin=0xe78160) at compwindow.c:704 #9 compPaintWindowToParent (pWin=0xe78160) at compwindow.c:734 #10 compPaintChildrenToWindow (pWin=0xefd840) at compwindow.c:749 #11 0x00000000004aa8f5 in compPaintWindowToParent (pWin=0xefd840) at compwindow.c:726 #12 compPaintChildrenToWindow (pWin=0x92e090) at compwindow.c:749 #13 0x0000000000566f5d in compScreenUpdate (pScreen=<optimized out>) at compalloc.c:54 #14 compBlockHandler (i=0, blockData=0x8e1670, pTimeout=0x7fff51f67298, pReadmask=0x7ec220) at compalloc.c:67 #15 0x000000000043704a in BlockHandler (pTimeout=0x7fff51f67298, pReadmask=0x7ec220) at dixutils.c:385 #16 0x000000000045fd8d in WaitForSomething (pClientsReady=0xcae5f0) at WaitFor.c:219 #17 0x0000000000433052 in Dispatch () at dispatch.c:367 #18 0x00000000004272ee in main (argc=2, argv=<optimized out>, envp=<optimized out>) at main.c:287
Thanks! commit 86121a3af9a9fc9a2c76d7ac9f3ec17105d20d80 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Mar 22 11:21:43 2012 +0000 sna: Adjust the damage region for the composite offset References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #11) > Thanks! > > commit 86121a3af9a9fc9a2c76d7ac9f3ec17105d20d80 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Thu Mar 22 11:21:43 2012 +0000 > > sna: Adjust the damage region for the composite offset > > References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Great, that fixes it. Thanks.
Am I right in believing we've gone a whole 24 hours without incident?
(In reply to comment #13) > Am I right in believing we've gone a whole 24 hours without incident? I had a crash yesterday. But there was no trace in xorg.log. If it happens again and I have a trace, I will create a new report anyway.
Thanks. No trace suggests we probably hit an assertion? Also check kdm.log as that should capture the stderr as well, I believe.
(In reply to comment #15) > Thanks. No trace suggests we probably hit an assertion? Also check kdm.log as > that should capture the stderr as well, I believe. Don't you know how to pass -core to xserver that is run by a DM on suse? I believe it should be somewhere in /etc/X11/xdm/.
I don't know, off hand, how to pass extra flags on SuSE. Usually it is the DM conf file, whether that may be. Or on Ubuntu, they are hardcoded!
(In reply to comment #15) > Thanks. No trace suggests we probably hit an assertion? Of course: #3 0x00007f742cefaaa2 in __GI___assert_fail (assertion=0x7f742b531ac6 "bo->base.refcnt == 1", file=0x7f742b5319e2 "kgem.c", line=3505, function=0x7f742b532d50 "kgem_create_buffer") at assert.c:103 #4 0x00007f742b47c3a2 in kgem_create_buffer (kgem=0x14b2c88, size=15744, flags=4, ret=0x7fff1c1ea048) at kgem.c:3505 #5 0x00007f742b4ae840 in sna_read_boxes (sna=0x14b2aa0, src_bo=0x51ed6e0, src_dx=0, src_dy=0, dst=0x4ca2b00, dst_dx=0, dst_dy=0, box=0x4c8c600, nbox=1) at sna_io.c:334 #6 0x00007f742b48a094 in _sna_pixmap_move_to_cpu (pixmap=0x4ca2b00, flags=2) at sna_accel.c:1056 #7 0x00007f742b48bb6a in sna_drawable_move_region_to_cpu (drawable=0x4ca2b00, region=0x7fff1c1ea230, flags=2) at sna_accel.c:1268 #8 0x00007f742b48d08e in sna_get_image (drawable=0x4ca2b00, x=0, y=0, w=164, h=24, format=2, mask=4294967295, dst=0x256ebe0 "") at sna_accel.c:11526 #9 0x000000000055b159 in miSpriteGetImage (pDrawable=0x4ca2b00, sx=0, sy=0, w=164, h=24, format=2, planemask=4294967295, pdstLine=0x256ebe0 "") at misprite.c:434 #10 0x00000000004a8083 in compGetImage (pDrawable=0x4ca2b00, sx=<optimized out>, sy=<optimized out>, w=<optimized out>, h=<optimized out>, format=<optimized out>, planemask=4294967295, pdstLine=0x256ebe0 "") at compinit.c:149 #11 0x00000000004305d7 in DoGetImage (planemask=<optimized out>, height=<optimized out>, width=164, y=0, x=0, drawable=<optimized out>, format=<optimized out>, client=0x186bd40, im_return=<optimized out>) at dispatch.c:2156 #12 ProcGetImage (client=0x186bd40) at dispatch.c:2243 #13 0x00000000004332c9 in Dispatch () at dispatch.c:431 #14 0x00000000004272ee in main (argc=7, argv=<optimized out>, envp=<optimized out>) at main.c:287 Suse & KDM there is DISPLAYMANAGER_KDM_LOCALARGS in /etc/sysconfig/displaymanager. Setting it to whatever X flags passes them to X.
Ok, the assertion actually looks benign: commit a9e9041934b2486e5661ba6c550e6670009f69f3 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Mar 26 16:03:42 2012 +0100 sna: Remove bogus assertion of no references to in-flight upload buffers As we may hold a cached reference to an upload buffer whilst it is in-flight, the assertion that there are no such references to a buffer being reused is no longer true. Those cached references will be released as soon as we retire the buffer during the readback and so we are free to reuse such an upload buffer for immediate readback. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Another assertion hit: #3 0x00007f4b5051eaa2 in __GI___assert_fail ( assertion=0x7f4b4eb58ab8 "damage->extents.x1 == 0 && damage->extents.y1 == 0 && damage->extents.x2 == width && damage->extents.y2 == height", file=0x7f4b4eb588a3 "sna_damage.c", line=953, function=0x7f4b4eb58c60 "_sna_damage_is_all") at assert.c:103 #4 0x00007f4b4eac8f6f in _sna_damage_is_all (damage=<optimized out>, width=<optimized out>, height=<optimized out>) at sna_damage.c:950 #5 0x00007f4b4eaaa20f in sna_damage_is_all (height=<optimized out>, width=<optimized out>, _damage=0x819fd98) at sna_damage.h:122 #6 sna_pixmap_move_area_to_gpu (pixmap=0x3e43050, box=0x7fff2f3078f0, flags=3) at sna_accel.c:1683 #7 0x00007f4b4eaac6a5 in sna_drawable_use_bo (drawable=<optimized out>, box=0x7fff2f307990, damage=0x7fff2f3079a8) at sna_accel.c:1962 #8 0x00007f4b4eab815e in sna_poly_fill_rect (draw=0x86cb860, gc=0x800ae40, n=4, rect=0x81a3874) at sna_accel.c:9936 #9 0x00000000004df685 in damagePolyFillRect (pDrawable=0x86cb860, pGC=0x800ae40, nRects=4, pRects=0x81a3874) at damage.c:1357 #10 0x000000000042fd13 in ProcPolyFillRectangle (client=0x73bf120) at dispatch.c:1858 #11 0x00000000004332c9 in Dispatch () at dispatch.c:431 #12 0x00000000004272ee in main (argc=7, argv=<optimized out>, envp=<optimized out>) at main.c:287
Annoying but virtually impossible to debug with the full debugging log. :( I've obviously missed a couple of checks to make sure that the damage being added is within bounds which should identify the culprit.
(In reply to comment #21) > Annoying but virtually impossible to debug with the full debugging log. :( Note that this one is 100% reproducible right after X starts. It is enough to start Mp3Tag in wine and go to the conversion menu and select "from tag to file name". The dialog which should emerge crashes the server. So if you have debug patches, I can test and report very fast.
Thanks for the offer, I'll hopefully have some more precise debugging prepared for tomorrow.
I've added a few more asserts to hopefully capture where the invalid damage is first added. commit 932743bb333e35d5f6529a701137aad4d7490555 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Apr 1 09:54:43 2012 +0100 sna: Assert that drawing boxes are within bounds More sanity checks required. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Can you compile with --enable-debug=full and reproduce?
(In reply to comment #24) > commit 932743bb333e35d5f6529a701137aad4d7490555 ... > sna: Assert that drawing boxes are within bounds ... > Can you compile with --enable-debug=full and reproduce? Here you go: #3 0x00007f6ce02eeaa2 in __GI___assert_fail ( assertion=0x7f6cde910000 "damage->extents.x1 == 0 && damage->extents.y1 == 0 && damage->extents.x2 == width && damage->extents.y2 == height", file=0x7f6cde90fc09 "sna_damage.c", line=953, function=0x7f6cde9103f0 "_sna_damage_is_all") at assert.c:103 #4 0x00007f6cde855bd8 in _sna_damage_is_all (damage=0x193b4e0, width=385, height=184) at sna_damage.c:950 #5 0x00007f6cde81eb02 in sna_damage_is_all (_damage=0x19569f8, width=385, height=184) at sna_damage.h:122 #6 0x00007f6cde823ee2 in sna_pixmap_move_area_to_gpu (pixmap=0x19c19c0, box=0x7ffff52e34a0, flags=3) at sna_accel.c:1739 #7 0x00007f6cde824c00 in sna_drawable_use_bo (drawable=0x19b92a0, box=0x7ffff52e3540, damage=0x7ffff52e3558) at sna_accel.c:2018 #8 0x00007f6cde840ae9 in sna_poly_fill_rect (draw=0x19b92a0, gc=0x19577d0, n=4, rect=0x7f6cdc758d3c) at sna_accel.c:10010 #9 0x00000000004df685 in damagePolyFillRect (pDrawable=0x19b92a0, pGC=0x19577d0, nRects=4, pRects=0x7f6cdc758d3c) at damage.c:1357 #10 0x000000000042fd13 in ProcPolyFillRectangle (client=0x1994c00) at dispatch.c:1858 #11 0x00000000004332c9 in Dispatch () at dispatch.c:431 #12 0x00000000004272ee in main (argc=2, argv=<optimized out>, envp=<optimized out>) at main.c:287 xorg.log: http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log
This is the error: [ 26105.696] sna_poly_fill_polygon(n=4, PlaneMask: ffffffffffffffff (solid 1), solid fill: 1 [style=0, tileIsPixel=1], alu=3) [ 26105.696] sna_poly_fill_polygon: extents(984, 648), (991, 652), flags=1 [ 26105.696] use_wide_spans? 1 [ 26105.696] sna_drawable_use_bo((984, 648), (991, 652))... [ 26105.696] sna_drawable_use_bo extents (329, 88), (336, 92) [ 26105.696] sna_drawable_use_bo: partial GPU damage with no CPU damage, continuing to use GPU [ 26105.696] sna_pixmap_move_area_to_gpu() [ 26105.696] _sna_damage_is_all(385, 184)*? [ 26105.696] reduce: before region.n=34 [ 26105.696] nboxes=8, residual=7 [ 26105.696] reduce: after region.n=25 [ 26105.696] _sna_damage_is_all: no, not singular [ 26105.696] _sna_damage_is_all(385, 184)? [ 26105.696] _sna_damage_is_all: no, not singular [ 26105.696] sna_drawable_use_bo: using GPU bo with damage? 1 [ 26105.696] sna_blt_fill(alu=3, pixel=0, bpp=32) [ 26105.696] kgem_add_reloc: handle=17, pos=4, delta=0, domains=00028002 [ 26105.696] kgem_add_handle: handle=17, index=0 [ 26105.696] kgem_add_reloc: marking handle=17 dirty [ 26105.696] sna_fill_spans__fill_offset: alu=3, fg=00000000 [ 26105.696] _sna_blt_fill_boxes: 00000000 x 3 [ 26105.696] _sna_damage_add([[(0, 0), (385, 184)]: [(0, 0), (385, 184) x 25: ((0, 0), (385, 80)), ...] ] + [(984, 648), (991, 652)]) [ 26105.696] _sna_damage_create_elt: prev=(remain 8), count=1 [ 26105.696] = [[(0, 0), (991, 652)]: [(0, 0), (385, 184) x 25: ((0, 0), (385, 80)), ...] +[ ...]]* I didn't apply the composite offset to the damage during the FillSpans. Now to find out how. Thanks.
Thanks! commit 0b2651dc04cef8f9692b2557684f044b4980700f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Apr 1 19:43:15 2012 +0100 sna: Apply composite offset to damage for spans fast paths Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
BTW the tree is broken. It doesn't link sna_video_sprite_setup in. I see: undefined symbol: sna_video_sprite_setup (ddd/opt/ddxd/lib64/xorg/modules/drivers/intel_drv.so) and X crashes. I tried the assert patches on top of some previous commit.
In a sense, as auto-autoreconf is broken. Haven't looked at what changed as to why, but for now you need to rerun autogen.sh to rebuild the Makefile. :(
(In reply to comment #28) > BTW the tree is broken. Ok, I had to do: rm `git ls-files -o` and re-run autogen. Autogen alone did not help.
Ok, it works now :).
24 hours without a death. Closing this will be a definite jinx. :)
(In reply to comment #32) > 24 hours without a death. Closing this will be a definite jinx. :) Yup :): #3 0x00007fc3e9013aa2 in __GI___assert_fail (assertion=0x7fc3e764d546 "pixmap_is_scanout(pixmap)", file=0x7fc3e764d4e7 "sna_display.c", line=2179, function=0x7fc3e764d850 "sna_wait_for_scanline") at assert.c:103 #4 0x00007fc3e75c1700 in sna_wait_for_scanline (sna=0x2067a40, pixmap=0x11af0fb0, crtc=0x2077bf0, clip=0x7fff81d24d70) at sna_display.c:2179 #5 0x00007fc3e75e0e79 in sna_video_textured_put_image (scrn=<optimized out>, src_x=0, src_y=0, drw_x=891, drw_y=26, src_w=720, src_h=576, drw_w=1024, drw_h=576, id=842094169, buf=0x7fc3db08d000 "\021\021\024\024\024\024\024\024\024\024\022\022\023\022\023\022\021\021\021\022\022\021\021\021\021\021\020\020\020\020\020\021\021\021\020\020\020\020\021\022\022\022\022\022\022\022\022\022\022\021\021\021\021\021\021\021\021\020\021\021\021\021\021\020\021\020\020\020\020\017\017\017\020\020\017\017\016\016\r\r\016\017\016\016\017\017\020\020\020\020\020\020\020\020\020\020\016\020\021\022\021\020\021\021\016\017\021\r\020\021\020\020\017\017\020\021\020\017\017\021\021\020\017\020\017\016\016\016\021\022\016\017\017\016\020\022\020\017\017\020\021\022\020\016\022\020\020\017\017\016\016\r\020\023\022\022\022\023\023\021\022\023\022\020\020\020\017\016\021\021\021\021\021\021\021\021\017\017\020\020\020\022\023\023\022\021\021\020\020\020\021\022\022\020\021\022\021\020\020\020"..., width=720, height=576, sync=0, clip=0x7fff81d24d70, data=0x208cfd0, drawable=0x6b5d4f0) at sna_video_textured.c:291 #6 0x0000000000531e4e in xf86XVPutImage (client=<optimized out>, pDraw=0x6b5d4f0, pPort=0x208e010, pGC=<optimized out>, src_x=<optimized out>, src_y=<optimized out>, src_w=720, src_h=576, drw_x=0, drw_y=0, drw_w=1024, drw_h=576, format=0x208dd10, data=0x7fc3db08d000 "\021\021\024\024\024\024\024\024\024\024\022\022\023\022\023\022\021\021\021\022\022\021\021\021\021\021\020\020\020\020\020\021\021\021\020\020\020\020\021\022\022\022\022\022\022\022\022\022\022\021\021\021\021\021\021\021\021\020\021\021\021\021\021\020\021\020\020\020\020\017\017\017\020\020\017\017\016\016\r\r\016\017\016\016\017\017\020\020\020\020\020\020\020\020\020\020\016\020\021\022\021\020\021\021\016\017\021\r\020\021\020\020\017\017\020\021\020\017\017\021\021\020\017\020\017\016\016\016\021\022\016\017\017\016\020\022\020\017\017\020\021\022\020\016\022\020\020\017\017\016\016\r\020\023\022\022\022\023\023\021\022\023\022\020\020\020\017\016\021\021\021\021\021\021\021\021\017\017\020\020\020\022\023\023\022\021\021\020\020\020\021\022\022\020\021\022\021\020\020\020"..., sync=0, width=720, height=576) at xf86xv.c:1858 #7 0x00007fc3e85195c4 in ProcXvShmPutImage (client=0x7328380) at xvdisp.c:1091 #8 0x00000000004332c9 in Dispatch () at dispatch.c:431 #9 0x00000000004272ee in main (argc=7, argv=<optimized out>, envp=<optimized out>) at main.c:287
It happens every time I run mplayer or kaffeine. This started occurring after update from KDE 4.8.1 to 4.8.2 (or maybe some other?).
Ok, that should hopefully have just been an assertion failure. commit 4356fae72db3a33935b575edf95c84fbb48072a7 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Apr 6 21:10:50 2012 +0100 sna/video: Only wait upon the scanout pixmap Caught by the addition of the assertion. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once more into the breach... As always please report any issues.
Heh. It won't let me alone :). #3 0x00007fd43458baa2 in __GI___assert_fail (assertion=0x7fd432bc2908 "!kgem_busy(kgem, bo->handle)", file=0x7fd432bc2782 "kgem.c", line=315, function=0x7fd432bc375b "kgem_bo_retire") at assert.c:103 #4 0x00007fd432b07415 in kgem_bo_retire (kgem=<optimized out>, bo=<optimized out>) at kgem.c:315 #5 kgem_bo_retire (kgem=0xcb3c28, bo=0x17c32f0) at kgem.c:311 #6 0x00007fd432b0b118 in kgem_bo_map (kgem=0xcb3c28, bo=0x17c32f0) at kgem.c:3202 #7 0x00007fd432b40f9e in write_boxes_inplace (n=1, box=0x7fff7177f9a0, dst_dy=0, dst_dx=-708, bo=0x17c32f0, src_dy=-26, src_dx=-713, bpp=32, stride=3076, src=0x7fd43085c000, kgem=0xcb3c28) at sna_io.c:478 #8 sna_write_boxes (sna=0xcb3a40, dst=0x17c2fd0, dst_bo=0x17c32f0, dst_dx=-708, dst_dy=0, src=0x7fd43085c000, stride=3076, src_dx=-713, src_dy=-26, box=0x7fff7177f9a0, nbox=1) at sna_io.c:549 #9 0x00007fd432b1ea53 in sna_copy_boxes (src=<optimized out>, dst=<optimized out>, gc=<optimized out>, box=0x7fff7177f9a0, n=1, dx=-708, dy=-26, reverse=0, upsidedown=0, bitplane=0, closure=0x0) at sna_accel.c:3762 #10 0x000000000054cd4a in miCopyRegion (pSrcDrawable=0x1a43b20, pDstDrawable=<optimized out>, pGC=0x17d2b40, pDstRegion=<optimized out>, dx=-713, dy=-26, copyProc=0x7fd432b1d4f0 <sna_copy_boxes>, bitPlane=0, closure=0x0) at micopy.c:137 #11 0x000000000054d242 in miDoCopy (pSrcDrawable=0x1a43b20, pDstDrawable=0x102a590, pGC=0x17d2b40, xIn=20, yIn=20, widthSrc=157, heightSrc=37, xOut=733, yOut=46, copyProc=0x7fd432b1d4f0 <sna_copy_boxes>, bitPlane=0, closure=0x0) at micopy.c:334 #12 0x00007fd432b29bd3 in sna_copy_area (src=0x1a43b20, dst=0x102a590, gc=0x17d2b40, src_x=20, src_y=20, width=157, height=37, dst_x=20, dst_y=20) at sna_accel.c:4005 #13 0x00000000004e133c in damageCopyArea (pSrc=0x1a43b20, pDst=0x102a590, pGC=0x17d2b40, srcx=<optimized out>, srcy=20, width=157, height=37, dstx=20, dsty=20) at damage.c:912 #14 0x000000000042f3c3 in ProcCopyArea (client=0x1754e30) at dispatch.c:1644 #15 0x00000000004332c9 in Dispatch () at dispatch.c:431 #16 0x00000000004272ee in main (argc=7, argv=<optimized out>, envp=<optimized out>) at main.c:287
You didn't experience a GPU hang around that time by any chance? Another (weird) explanation would be passing around an invalid bo->handle. The assertion is a fairly benign warning that my second guessing of the kernel has gone wrong.
commit 09deba927daa96be6230b1c3e1b425622512d8a2 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 10 20:59:36 2012 +0100 sna: Check ioctl return from set-domain Let's not assume it succeeds and so avoid altering our bookkeeping along failure paths. That looks to be one failure path that could have lead to the assertion you saw.
(In reply to comment #38) > You didn't experience a GPU hang around that time by any chance? I don't think so. I booted ~ 40 kernels todays to catch all the bugs, so I don't remember exactly :). Now running with the driver updated.
* goes to poke the bear. Let's hope everything has settled down into place... :)
Ok, I got this one today: #3 0x00007f21141e9cc2 in __GI___assert_fail (assertion=0x7f211281871d "data", file=0x7f2112820ba2 "kgem.c", line=4041, function=0x7f2112822040 "kgem_upload_source_image") at assert.c:103 #4 0x00007f211276ddc8 in kgem_upload_source_image (kgem=<optimized out>, data=0x0, box=0x7fff8ab21480, stride=<optimized out>, bpp=<optimized out>) at kgem.c:4041 #5 0x00007f21127a5207 in upload (box=0x7fff8ab21480, pixmap=0x697fea0, channel=0x7fff8ab21720, sna=0x1c3a9a0) at sna_render.c:468 #6 sna_render_pixmap_bo (sna=0x1c3a9a0, channel=0x7fff8ab21720, pixmap=0x697fea0, x=<optimized out>, y=<optimized out>, w=<optimized out>, h=<optimized out>, dst_x=1344, dst_y=150) at sna_render.c:585 #7 0x00007f21127cb0b8 in gen3_composite_picture (sna=0x1c3a9a0, picture=0x3e8c8a0, op=<optimized out>, channel=0x7fff8ab21720, x=<optimized out>, y=<optimized out>, w=61, h=23, dst_x=1344, dst_y=150, precise=true) at gen3_render.c:2388 #8 0x00007f21127d24c2 in gen3_render_composite (tmp=0x7fff8ab216d0, height=23, width=61, dst_y=150, dst_x=1344, mask_y=0, mask_x=0, src_y=0, src_x=5920, dst=0x2065ce0, mask=0x27e60d0, src=0x3e8c8a0, op=3 '\003', sna=0x1c3a9a0) at gen3_render.c:2832 #9 gen3_render_composite (sna=0x1c3a9a0, op=3 '\003', src=0x3e8c8a0, mask=0x27e60d0, dst=0x2065ce0, src_x=5920, src_y=0, mask_x=0, mask_y=0, dst_x=1344, dst_y=150, width=61, height=23, tmp=0x7fff8ab216d0) at gen3_render.c:2749 #10 0x00007f2112795611 in sna_composite (op=3 '\003', src=0x3e8c8a0, mask=0x27e60d0, dst=0x2065ce0, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=1344, dst_y=150, width=61, height=23) at sna_composite.c:504 #11 0x00000000004dbbbb in damageComposite (op=3 '\003', pSrc=0x3e8c8a0, pMask=0x27e60d0, pDst=0x2065ce0, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=1344, yDst=150, width=61, height=23) at damage.c:617 #12 0x00000000004d69a4 in ProcRenderComposite (client=0x1ff87c0) at render.c:728 #13 0x0000000000433411 in Dispatch () at dispatch.c:431 #14 0x000000000042760a in main (argc=7, argv=0x7fff8ab21b78, envp=<optimized out>) at main.c:287 with: [ 33.163] (II) Loading /opt/ddxd/lib64/xorg/modules/drivers/intel_drv.so [ 33.163] (II) Module intel: vendor="X.Org Foundation" [ 33.163] compiled for 1.10.4, module version = 2.19.0
commit 771090f25db702d25ebbd3f2b44429cf0acfe8fd Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri May 4 20:56:37 2012 +0100 sna: Add a pair of asserts to track down a NULL pointer dereference Looks like the assumption for the location of the data is invalid, allocation failure, perhaps? References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Here it is: #2 0x00007fc999d41c92 in __assert_fail_base (fmt=0x7fc999e77ba0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fc998374150 "gc->ops == (GCOps *)&sna_gc_ops", file=<optimized out>, line=<optimized out>, function=<optimized out>) at assert.c:94 #3 0x00007fc999d41d42 in __GI___assert_fail (assertion=0x7fc998374150 "gc->ops == (GCOps *)&sna_gc_ops", file=0x7fc9983736ed "sna_accel.c", line=6832, function=0x7fc9983747e1 "sna_poly_line") at assert.c:103 #4 0x00007fc9982dfef5 in sna_poly_line (drawable=0x380b070, gc=0x378e030, mode=0, n=5, pt=0x7fffc4278cd0) at sna_accel.c:6832 #5 0x0000000000550b1a in miPolyRectangle (pDraw=0x380b070, pGC=0x378e030, nrects=<optimized out>, pRects=<optimized out>) at mipolyrect.c:171 #6 0x00007fc9982d928a in sna_poly_rectangle (drawable=0x380b070, gc=0x378e030, n=1, r=0x48f483c) at sna_accel.c:8219 #7 0x00000000004f7ee9 in damagePolyRectangle (pDrawable=0x380b070, pGC=<optimized out>, nRects=1, pRects=0x48f483c) at damage.c:1111 #8 0x0000000000434edd in ProcPolyRectangle (client=0x3bfbe60) at dispatch.c:1762 #9 0x0000000000438822 in Dispatch () at dispatch.c:425 #10 0x00000000004278e5 in main (argc=7, argv=0x7fffc4279028, envp=<optimized out>) at main.c:288
Different bug, fortunately this appears just to be an assertion failure rather than anything sinister: commit 596c0a68709a93bb376647c1b566e9df4f23b35d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue May 15 18:53:39 2012 +0100 sna: Assign GCops after checking for fallback to pass sanity checks We assert that prior to installing the fallback GCops the current ops are the default set. This is broken if we point GCops to our GPU ops, but then fallback. So check for the fallback first. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
#3 0x00007fdafb366d42 in __GI___assert_fail (assertion=0x7fdaf9997f98 "bo->flush || !kgem_busy(kgem, bo->handle)", file=0x7fdaf9997302 "kgem.c", line=317, function=0x7fdaf9998209 "kgem_bo_write") at assert.c:103 #4 0x00007fdaf98e2a7f in kgem_bo_write (kgem=0x7fdafd1d7218, bo=0x1eaa740, data=0xa2dac00, length=262144) at kgem.c:317 #5 0x00007fdaf991a9ff in sna_replace (sna=0x7fdafd1d7010, pixmap=0x722a950, _bo=0x722a9c8, src=0xa2dac00, stride=1024) at sna_io.c:1169 #6 0x00007fdaf990c306 in sna_composite (op=1 '\001', src=0x74d66e0, mask=0x0, dst=0x1eaa6a0, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, dst_y=0, width=256, height=256) at sna_composite.c:518 #7 0x00000000004f60f9 in damageComposite (op=1 '\001', pSrc=0x74d66e0, pMask=0x0, pDst=0x1eaa6a0, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=256, height=256) at damage.c:562 #8 0x00000000004ef064 in ProcRenderComposite (client=0x124fce0) at render.c:707 #9 0x0000000000438822 in Dispatch () at dispatch.c:425 #10 0x00000000004278e5 in main (argc=7, argv=0x7fff8a4063f8, envp=<optimized out>) at main.c:288
Could be inconsequential... Either we failed to allocate a new idle buffer and so fallback to simply stalling in order to write to the current active buffer, or that the kernel considers the object to be on the active list despite having declared it to be idle and we've never reused it.. Difficult to tell from a simple stacktrace, a p *bo would help to spot any inconsistencies there, but in this case we also need to know the external state of the kernel.
(In reply to comment #47) > a p *bo would help to spot any inconsistencies there $1 = {proxy = 0x0, list = {next = 0x1eaa748, prev = 0x1eaa748}, request = {next = 0x1eaa758, prev = 0x1eaa758}, vma = {next = 0x1eaa768, prev = 0x1eaa768}, map = 0x7fdaf5c6f000, rq = 0x0, exec = 0x0, binding = {next = 0x0, format = 0, offset = 0}, unique_id = 683661, refcnt = 1, handle = 323, presumed_offset = 178593792, delta = 0, size = {pages = {count = 64, bucket = 6}, bytes = 805306432}, pitch = 1024, tiling = 0, reusable = 1, dirty = 0, domain = 3, needs_flush = 1, vmap = 0, io = 0, flush = 0, scanout = 0, sync = 0, purged = 0} > but in this case we also need to know the external state of the kernel. How this can be grabbed when this happens next time?
(In reply to comment #48) > (In reply to comment #47) > > a p *bo would help to spot any inconsistencies there > > $1 = {proxy = 0x0, list = {next = 0x1eaa748, prev = 0x1eaa748}, request = {next > = 0x1eaa758, prev = 0x1eaa758}, vma = {next = 0x1eaa768, prev = 0x1eaa768}, map > = 0x7fdaf5c6f000, rq = 0x0, exec = 0x0, binding = {next = 0x0, format = 0, > offset = 0}, unique_id = 683661, refcnt = 1, handle = 323, presumed_offset = > 178593792, delta = 0, size = {pages = {count = 64, bucket = 6}, bytes = > 805306432}, pitch = 1024, tiling = 0, reusable = 1, dirty = 0, domain = 3, > needs_flush = 1, vmap = 0, io = 0, flush = 0, scanout = 0, sync = 0, purged = > 0} Hmm, bo->rq == NULL && bo->domain = 3 is not a good sign. Ok, I can see what is causing the confusion here. bo->needs_flush is there to suggest that although the batch retired, we are not sure if the kernel considers it idle yet. In this case, that concern turns out to be true. However, the stall that the assertion is warning about is not likely to be significant - but still better avoided. > > but in this case we also need to know the external state of the kernel. > > How this can be grabbed when this happens next time? With great difficulty. I don't think it can be done as the kernel doesn't have a mapping of bo-to-handle (only handle-to-bo), so although we can dump all the buffers and their active status /debug/dri/0/i915_gem_* (active,flushing in particular), I can't identify which of those kernel objects is the one that we are interested in, and instead have to assume that it is the one with the GTT offset of 178593792.
Created attachment 61757 [details] [review] Track flushing status of live bo Hairy, quite hairy. It would be much simpler to just tweak the assertion not to fire, except for that it defeats the whole busy-tracking...
For reference I've applied a close relative of the attached patch: commit 681c6e72412fff96b203a09be6ac8d393f3489a5 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu May 17 19:43:02 2012 +0100 sna: Track flushing status of live bo Currently we only move a bo with an outstanding kernel flush onto the flushing list if it is no longer in use. This leaves us potentially stalling on a flush if we try then to write to the object believing it to be retired and idle. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Jiri, have you seen anything reminiscent of the last assertion you reported?
(In reply to comment #52) > Jiri, have you seen anything reminiscent of the last assertion you reported? Hmm, I didn't hit any asserts in the past days. So I checked xorg.log. Despite I'm using driver from git with --enable-sna, UXA is used for acceleration... Was this changed so that UXA is the default now?
(In reply to comment #53) > Despite > I'm using driver from git with --enable-sna, UXA is used for acceleration... > Was this changed so that UXA is the default now? Yes, I was encouraged to make the switch a runtime configuration option. So rather than --enable-sna exorcising UXA, they are now installed in parallel. To maintain the old behaviour you can used --enable-sna --disable-uxa, a more genteel manner would be to use --enable-sna --with-default-accel=sna, or to enable sna and add Option "AccelMethod" "SNA" to your xorg.conf under Section "Device"
(In reply to comment #54) > Yes, I was encouraged to make the switch a runtime configuration option. Ok, now running on SNA. Will report in a couple of days...
Haven't seen a single crash with a21bdbe.
Thanks for following up, though something remains fishy as I haven't worked out the reason for the most recent crash. Please keep your eyes peeled for further mistakes.
Heh, I think I have to close this bug to see some more crashes :P. I'm now at 0db789e180b6b01fb8aff547879387058f52a0b9. #3 0x00007f7d921a0cd2 in __GI___assert_fail ( assertion=0x7f7d8e56aeb8 "damage->extents.x1 == region->extents.x1 && damage->extents.y1 == region->extents.y1 && damage->extents.x2 == region->extents.x2 && damage->extents.y2 == region->extents.y2", file=0x7f7d8e56adf3 "sna_damage.c", line=504, function=0x7f7d8e56b210 "__sna_damage_reduce") at assert.c:103 #4 0x00007f7d8e4f10d6 in __sna_damage_reduce (damage=0xb7db790) at sna_damage.c:501 #5 0x00007f7d8e4f2322 in __sna_damage_get_boxes (boxes=<optimized out>, damage=0xb7db790) at sna_damage.c:1366 #6 _sna_damage_get_boxes (damage=0xb7db790, boxes=<optimized out>) at sna_damage.c:1402 #7 0x00007f7d8e4d6852 in sna_damage_get_boxes (boxes=0x7ffff4f21728, damage=<optimized out>) at sna_damage.h:239 #8 _sna_pixmap_move_to_cpu (pixmap=0x3e24dc0, flags=11) at sna_accel.c:1143 #9 0x00007f7d8e4e9652 in sna_pixmap_move_to_cpu (flags=11, pixmap=<optimized out>) at sna.h:442 #10 sna_accel_inactive (sna=0x7f7d94034010) at sna_accel.c:12333 #11 sna_accel_block_handler (sna=0x7f7d94034010, tv=0x7ffff4f217e8) at sna_accel.c:12551 #12 0x000000000043c64a in BlockHandler (pTimeout=0x7ffff4f217e8, pReadmask=0x7e5e20) at dixutils.c:387 #13 0x00000000005616a4 in WaitForSomething (pClientsReady=0x1ff7520) at WaitFor.c:210 #14 0x00000000004385f1 in Dispatch () at dispatch.c:357 #15 0x0000000000427965 in main (argc=7, argv=0x7ffff4f21bf8, envp=<optimized out>) at main.c:288
If that's all the evidence you have :-p commit 221534abe2dc04fae8b8fc332104bca275d4863b Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 13 22:48:43 2012 +0100 sna: Only reduce damage addition to a region operation if clean Some paths bypass operating upon the region as they do not have an YX-banded set of boxes and so prefer to defer the costly construction of the region till later. As a result, we have to be careful not to overwrite any existing information if we do operate on the region after setting the dirty boxes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50744 Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Two new ones with b5db90aa52f10897ad2d7795df94c0e3d2878aea. ============================ #3 0x00007fce017bab56 in malloc_printerr (action=3, str=0x7fce018a6860 "free(): invalid next size (normal)", ptr=<optimized out>) at malloc.c:5007 #4 0x00007fcdfdacb530 in glyphs_format (nlist=nlist@entry=74, list=0x92602a0, list@entry=0x925fe00, glyphs=<optimized out>, glyphs@entry=0x7fff8cdd7000) at sna_glyphs.c:1134 #5 0x00007fcdfdacdc01 in sna_glyphs (op=3 '\003', src=0x925f800, dst=0x9261cc0, mask=0x1111238, src_x=<optimized out>, src_y=<optimized out>, nlist=74, list=0x925fe00, glyphs=0x7fff8cdd7000) at sna_glyphs.c:1434 #6 0x00000000004f696d in damageGlyphs (op=3 '\003', pSrc=0x925f800, pDst=0x9261cc0, maskFormat=0x1111238, xSrc=0, ySrc=0, nlist=74, list=0x925fe00, glyphs=0x7fff8cdd7000) at damage.c:628 #7 0x00000000004efb56 in ProcRenderCompositeGlyphs (client=0x92508f0) at render.c:1389 #8 0x00000000004388a1 in Dispatch () at dispatch.c:428 #9 0x0000000000427965 in main (argc=7, argv=0x7fff8cdd79f8, envp=<optimized out>) at main.c:288 =========================== #5 OsSigHandler (signo=11, sip=0x7fff08782ef0, unused=<optimized out>) at osinit.c:107 #6 <signal handler called> #7 0x0000000000000031 in ?? () #8 0x000000000042d1d9 in FreeCursor (value=0x14dcd60, cid=cid@entry=0) at cursor.c:122 #9 0x00000000004aa45d in xf86_cursors_fini (screen=screen@entry=0x14a3bf0) at xf86Cursors.c:680 #10 0x00007f8a99df64b1 in sna_close_screen (scrnIndex=<optimized out>, screen=0x14a3bf0) at sna_driver.c:761 #11 0x00000000004e80e1 in PictureCloseScreen (index=<optimized out>, pScreen=0x14a3bf0) at picture.c:82 #12 0x00000000004c117b in CursorCloseScreen (index=0, pScreen=0x14a3bf0) at cursor.c:205 #13 0x00000000004f2a06 in AnimCurCloseScreen (index=<optimized out>, pScreen=<optimized out>) at animcur.c:106 #14 0x0000000000427a0b in main (argc=3, argv=0x7fff08783498, envp=<optimized out>) at main.c:328
One down, commit 32e7f4ee64867779b2def6fcd882708d7b0e2cf5 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jul 12 19:39:10 2012 +0100 sna/glyphs: Fix array allocation for list_extents Originally I intended to skip assigning the box on the last list. However, loop simplicity failed and now we run the risk of writing beyond the end of stack_extents, and overwriting the list_extents pointer. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> The second one occurs at for (nscr = 0; nscr < screenInfo.numScreens; nscr++) { pscr = screenInfo.screens[nscr]; ---> (void) (*pscr->UnrealizeCursor) (pDev, pscr, pCurs); } So we are either chasing a NULL pscr, or NULL pscr->UnrealizeCursor. We set UnrealizeCursor through miDCInitialize(), so either that gets clobbered or the pscr is not SNA. Can you print out the locals at that point?
(In reply to comment #61) > The second one occurs at > for (nscr = 0; nscr < screenInfo.numScreens; nscr++) { > pscr = screenInfo.screens[nscr]; > ---> (void) (*pscr->UnrealizeCursor) (pDev, pscr, pCurs); > } > So we are either chasing a NULL pscr, or NULL pscr->UnrealizeCursor. We set > UnrealizeCursor through miDCInitialize(), so either that gets clobbered or the > pscr is not SNA. Can you print out the locals at that point? They are mostly optimized out. But the assembly is: 0x000000000042d1d0 <+64>: mov %rax,%rsi 0x000000000042d1d3 <+67>: callq *0x1b0(%rax) => 0x000000000042d1d9 <+73>: cmp %ebx,0x2c(%r13) rsi is 0x14a3bf0, rax 0x31. WTF? How can they be different? 0x14a3bf0 is what is in screenInfo.screens: (gdb) p screenInfo.screens $31 = {0x14a3bf0, 0x0 <repeats 15 times>} The first member of screens: (gdb) p (struct _Screen)*screenInfo.screens[0] $22 = {myNum = 0, id = 0, x = 0, y = 0, width = 1920, height = 1080, mmWidth = 508, mmHeight = 285, numDepths = 7, rootDepth = 24 '\030' ... CloseScreen = 0x7f8a99df63f0 <sna_close_screen>, ... UnrealizeCursor = 0x54ef00 <miPointerUnrealizeCursor>, ... I don't get it...
What build of the xserver are you using? master is a bit broken at the moment, with both compile failures and segfaults during CloseScreen. What's the simplest scenario to trigger the crash? X -terminate & xdpyinfo?
(In reply to comment #63) > What build of the xserver are you using? master is a bit broken at the moment, > with both compile failures and segfaults during CloseScreen. X.Org X Server 1.12.3 Release Date: 2012-07-09 > What's the simplest scenario to trigger the crash? X -terminate & xdpyinfo? Not really. This happened twice only during work in mp3tag namer under wine.
(In reply to comment #62) > I don't get it... Well, the only explanation is a stack corruption after the crash... Going to update to the latest HEAD.
(In reply to comment #65) > (In reply to comment #62) > > I don't get it... > > Well, the only explanation is a stack corruption after the crash... Or even before the crash if stored rbp was corrupted so that we point rbp to hell. > Going to update to the latest HEAD. Seems to help so far.
(In reply to comment #66) > (In reply to comment #65) > > (In reply to comment #62) > > > I don't get it... > > > > Well, the only explanation is a stack corruption after the crash... > > Or even before the crash if stored rbp was corrupted so that we point rbp to > hell. > > > Going to update to the latest HEAD. > > Seems to help so far. Pointer corruption from the glyphs_format() would seem to fit all manner of ills. Let see if we survive a day or two without crashes... Thanks.
(In reply to comment #67) > Pointer corruption from the glyphs_format() would seem to fit all manner of > ills. Let see if we survive a day or two without crashes... It crashes every time I kill X. It dies in miPointerUnrealizeCursor called as pscr->UnrealizeCursor in FreeCursor. There is an absolute jmp in miPointerUnrealizeCursor so that it is not in the backtrace posted here earlier. The jmp jumps indeed to 0x31. Look, after SetupScreen(pScreen), pScreenPriv (in %rax) is: $10 = {spriteFuncs = 0x1851e40, screenFuncs = 0x7d32a0 <xf86PointerScreenFuncs>, CloseScreen = 0x4f46b0 <damageCloseScreen>, waitForUpdate = 0, showTransparent = 0} and the assembly looks like: => 0x000000000054ef57 <+87>: mov (%rax),%rax 0x000000000054ef5a <+90>: mov 0x8(%rax),%rax 0x000000000054ef5e <+94>: add $0x8,%rsp 0x000000000054ef62 <+98>: jmpq *%rax So it fetches (invalid) spriteFuncs (0x1851e40) into %rax and then it fetches spriteFuncs->UnrealizeCursor and calls it, but (gdb) x 0x1851e40+0x8 0x1851e48: 0x00000031 For all previous invocations (sna is not in backtrace), pScreenPriv looks like: $7 = {spriteFuncs = 0x7d4ca0 <xf86CursorSpriteFuncs>, I.e. it has some sane spriteFuncs: (gdb) p xf86CursorSpriteFuncs $13 = {RealizeCursor = 0x4b2790 <xf86CursorRealizeCursor>, UnrealizeCursor = 0x4b3380 <xf86CursorUnrealizeCursor>,
commit ea9ec18505645dfec85ab96bbbbbc1793830c737 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jul 13 11:39:01 2012 +0100 sna: Split CloseScreen into two phases In order to get the ordering correct we need to free the xf86_cursors before calling the miPointerCloseScreen. This requires us to insert a hook at the top of the CloseScreen chain. However we still require the final CloseScreen hook in order to do the fundamental clean up, hence split the CloseScreen callback into two phases. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> So for the time being, I think we should no longer be exploding...
A new one with 2.20.0-62-gf92a64d: #3 0x00007f0f1d879cd2 in __GI___assert_fail (assertion=assertion@entry=0x7f0f19c495a7 "!(bo->proxy && bo->io)", file=file@entry=0x7f0f19c49595 "kgem.h", line=line@entry=428, function=function@entry=0x7f0f19c4c612 <__PRETTY_FUNCTION__.18086> "kgem_bo_size") at assert.c:103 #4 0x00007f0f19b8f726 in kgem_bo_size (bo=<optimized out>) at kgem.h:428 #5 0x00007f0f19bc4cf9 in sna_blt_alpha_fixup_one (sna=0x7f0f1f70d010, blt=0x7fff264a8240, src_x=0, src_y=0, width=<optimized out>, height=<optimized out>, dst_x=733, dst_y=dst_y@entry=72) at sna_blt.c:400 #6 0x00007f0f19bc4f16 in blt_composite_copy_box_with_alpha (sna=<optimized out>, op=<optimized out>, box=<optimized out>) at sna_blt.c:1112 #7 0x00007f0f19bca5ba in sna_composite (op=1 '\001', src=0x2183330, mask=0x0, dst=0x2127950, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=733, dst_y=72, width=256, height=256) at sna_composite.c:538 #8 0x00000000004f66d9 in damageComposite (op=1 '\001', pSrc=0x2183330, pMask=0x0, pDst=0x2127950, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=733, yDst=72, width=256, height=256) at damage.c:562 #9 0x00000000004ef644 in ProcRenderComposite (client=0x2c87790) at render.c:707 #10 0x00000000004388a1 in Dispatch () at dispatch.c:428 #11 0x0000000000427965 in main (argc=7, argv=0x7fff264a8598, envp=<optimized out>) at main.c:288
Sigh. That was what commit 15d3eea7004822e5cbd48d676692e1b6a2b26d3e Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jul 19 16:22:20 2012 +0100 sna: Handle mixed bo/buffers in assertions was *meant* to fix. Apparently I didn't review the patch hard enough as I made the specialised __kgem_bo_size the generic code and left the assertions inside kgem_bo_size! Patch on its way...
commit 6acc9e6a6e1de2a11597c810e02f793774cef2dd Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Jul 21 12:07:46 2012 +0100 sna: Fix role reversal of __kgem_bo_size() and kgem_bo_size()! Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> And back to dormancy....
And as a side effect of the code in effect not being exercised before: commit 37dfdb0e9e86effc3ca8b590c98aa2382e8f0cea Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Jul 21 13:01:09 2012 +0100 sna: Correct assertion for __kgem_bo_size() Only proxies are measured in bytes not pages. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A new one, 100% reproducible with xdvi using my dvi paper: #3 0x00007f2e024a5cd2 in __GI___assert_fail (assertion=0x7f2dfe87aa2d "e && damage->mode != DAMAGE_ALL", file=0x7f2dfe87a9e3 "& damage->extents.y2 == height", line=751, function=0x7f2dfe87b030 "form_in_use") at assert.c:103 #4 0x00007f2dfe7e9621 in _sna_damage_contains_box__no_reduce (damage=<optimized out>, box=<optimized out>) at sna_damage.c:1302 #5 0x002d000500000002 in ?? () #6 0x0000000007cdc8e0 in ?? () #7 0x0000000000000003 in ?? () #8 0x000000000d6fbf78 in ?? () #9 0x000000000d7fb128 in ?? () #10 0x0000000000000002 in ?? () #11 0x0000000000000005 in ?? () #12 0x000000000a672c49 in ?? () #13 0x000000000000002d in ?? ()
(In reply to comment #74) > #13 0x000000000000002d in ?? () That is somehow mangled. This one is better: #3 0x00007fcc892c7cd2 in __GI___assert_fail (assertion=assertion@entry=0x7fcc8569c6bd "r[0].width && r[0].height", file=file@entry=0x7fcc8569c673 "sna_damage.c", line=line@entry=751, function=function@entry=0x7fcc8569ccb0 <__PRETTY_FUNCTION__.20213> "__sna_damage_add_rectangles") at assert.c:103 #4 0x00007fcc8560afe1 in __sna_damage_add_rectangles (dy=45, dx=5, n=2, r=0x2429398, damage=0x2) at sna_damage.c:751 #5 _sna_damage_add_rectangles (damage=<optimized out>, r=r@entry=0x2429398, n=n@entry=2, dx=dx@entry=5, dy=dy@entry=45) at sna_damage.c:822 #6 0x00007fcc855ea837 in sna_damage_add_rectangles (dy=45, dx=5, n=2, r=0x2429398, damage=0x164b9e8) at sna_damage.h:81 #7 sna_poly_fill_rect_blt (drawable=drawable@entry=0x1b4bd00, bo=<optimized out>, damage=0x164b9e8, pixel=14024704, n=n@entry=2, rect=rect@entry=0x2429398, extents=extents@entry=0x7fffa0d7b360, clipped=false, gc=<error reading variable: Unhandled dwarf expression opcode 0xfa>, gc=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at sna_accel.c:9114 #8 0x00007fcc855febe3 in sna_poly_fill_rect (draw=0x1b4bd00, gc=0x1c57d00, n=2, rect=0x2429398) at sna_accel.c:11513 #9 0x00000000004f7c1d in damagePolyFillRect (pDrawable=0x1b4bd00, pGC=0x1c57d00, nRects=2, pRects=0x2429398) at damage.c:1253 #10 0x000000000043526d in ProcPolyFillRectangle (client=0x2420450) at dispatch.c:1837 #11 0x0000000000438891 in Dispatch () at dispatch.c:428 #12 0x0000000000427965 in main (argc=7, argv=0x7fffa0d7b618, envp=<optimized out>) at main.c:288
This happened with 2.20.3-39-g0c882f4.
I don't think this is the serious bug, just an assertion about the application behaving unexpectedly, commit 7f5a9e30c953c174a4d6bc83e0b0a5e068cbda52 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Aug 15 09:34:24 2012 +0100 sna: Remove zero-size rectangles fro PolyFillRectangle Some applications like to waste precious bw by attempting to fill zero-sized rectangles; remove them upfront rather than process them. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> can you keep debugging after applying that patch?
(In reply to comment #77) > commit 7f5a9e30c953c174a4d6bc83e0b0a5e068cbda52 Looks good.
A new one with 2.20.4-55-gc5b46e4. It happens when KDE 4 is starting: #3 0x00007f1cad7aecd2 in __GI___assert_fail ( assertion=assertion@entry=0x7f1ca9b7fd48 "_bo->exec == &_kgem_dummy_exec", file=file@entry=0x7f1ca9b7ebee "kgem.c", line=line@entry=4606, function=function@entry=0x7f1ca9b802f0 <__PRETTY_FUNCTION__.22514> "kgem_buffer_read_sync") at assert.c:103 #4 0x00007f1ca9ac5d31 in kgem_buffer_read_sync (kgem=kgem@entry=0x7f1caf642248, _bo=_bo@entry=0x2b0c6f0) at kgem.c:4606 #5 0x00007f1ca9afda9b in sna_read_boxes (sna=sna@entry=0x7f1caf642010, src_bo=0x252f810, src_dx=12984, src_dx@entry=0, src_dy=-1120, src_dy@entry=0, dst=dst@entry=0x231fba0, dst_dx=dst_dx@entry=0, dst_dy=dst_dy@entry=0, box=0x26232b0, nbox=nbox@entry=1) at sna_io.c:444 #6 0x00007f1ca9ad494c in _sna_pixmap_move_to_cpu (pixmap=0x231fba0, flags=2) at sna_accel.c:1395 #7 0x00007f1ca9ad8e12 in sna_pixmap_move_to_cpu (flags=<optimized out>, pixmap=<optimized out>) at sna.h:447 #8 sna_drawable_move_to_cpu (drawable=<optimized out>, flags=flags@entry=2) at sna_accel.c:2052 #9 0x00007f1ca9aee1d4 in sna_composite (op=3 '\003', src=0x26265a0, mask=0x0, dst=0x2556ba0, src_x=0, src_y=0, mask_x=1, mask_y=1, dst_x=1, dst_y=1, width=92, height=13) at sna_composite.c:583
commit 71ac12e9b6ed00c28993637aafd5186a2ba26256 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Aug 27 15:49:59 2012 +0100 sna: Assert that the bo is marked as 'flush' when exported to DRI clients Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Yesterday update to 2.20.9-15-gbf44d99, today crash: #3 0x00007f67a85ef0b2 in __GI___assert_fail (assertion=assertion@entry=0x7f67a53db7c9 "priv->cpu == 0", file=file@entry=0x7f67a53db3dd "sna_accel.c", line=line@entry=2786, function=function@entry=0x7f67a53dcbd0 <__PRETTY_FUNCTION__.23726> "sna_drawable_use_bo") at assert.c:101 #4 0x00007f67a532cd30 in sna_drawable_use_bo (drawable=drawable@entry=0x2b586b0, flags=flags@entry=0, box=box@entry=0x7fffb65013a0, damage=damage@entry=0x7fffb6501168) at sna_accel.c:2786 #5 0x00007f67a5334f35 in sna_copy_boxes (src=0x1760c90, dst=0x235d880, gc=0x2b49120, region=0x7fffb65013a0, dx=-1035, dy=-26, bitplane=0, closure=0x0) at sna_accel.c:4446 #6 0x00007f67a531c958 in sna_do_copy (src=src@entry=0x1760c90, dst=dst@entry=0x235d880, gc=gc@entry=0x2b49120, sx=sx@entry=0, sy=sy@entry=32, width=width@entry=9, height=height@entry=16, dx=1035, dx@entry=0, dy=58, dy@entry=32, copy=copy@entry=0x7f67a5334cf0 <sna_copy_boxes>, bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:4978 #7 0x00007f67a5334b5a in sna_copy_area (src=0x1760c90, dst=0x235d880, gc=0x2b49120, src_x=0, src_y=32, width=9, height=16, dst_x=0, dst_y=32) at sna_accel.c:5083 #8 0x000000000051c74d in damageCopyArea (pSrc=0x1760c90, pDst=0x235d880, pGC=0x2b49120, srcx=0, srcy=<optimized out>, width=9, height=16, dstx=0, dsty=32) at damage.c:821 #9 0x00000000004d98cf in doShmPutImage (data=0x7f679db42000 "", dy=32, dx=0, sh=16, sw=9, sy=32, sx=0, h=400, w=<optimized out>, format=<optimized out>, depth=<optimized out>, pGC=0x2b49120, dst=0x235d880) at shm.c:481 #10 ProcShmPutImage (client=0x2b4a0f0) at shm.c:591 #11 0x00000000004da9a5 in ProcShmDispatch (client=0x2b4a0f0) at shm.c:1115 #12 0x000000000043d4e1 in Dispatch () at dispatch.c:428 #13 0x000000000042c01a in main (argc=7, argv=0x7fffb6501898, envp=<optimized out>) at main.c:295
(In reply to comment #81) > Yesterday update to 2.20.9-15-gbf44d99, today crash: Reproducible by running a qemu-kvm machine.
So yesterday! That should be fixed by commit b3a502a7057649b284601bdd5110b136ab1f5fbb Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Oct 3 11:46:57 2012 +0100 sna: Review sna_pixmap->cpu tracking Notably 52b211cb15b3 was triggering a few assertions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A new one from 2.20.12-15-g31eb704. #3 0x00007f5952a460d2 in __GI___assert_fail (assertion=assertion@entry=0x7f594f830e7a "kgem->nbatch", file=file@entry=0x7f594f830b19 "kgem.c", line=line@entry=2211, function=function@entry=0x7f594f832061 <__PRETTY_FUNCTION__.23604> "_kgem_submit") at assert.c:101 #4 0x00007f594f769ccc in _kgem_submit (kgem=0x7f5954ae8228) at kgem.c:2211 #5 0x00007f594f7a29d8 in kgem_set_mode (kgem=0x7f5954ae8228, mode=<optimized out>) at kgem.h:345 #6 sna_emit_wait_for_scanline_gen2 (full_height=true, y2=1078, y1=0, pipe=0, sna=0x7f5954ae8010) at sna_display.c:2833 #7 sna_wait_for_scanline (sna=sna@entry=0x7f5954ae8010, pixmap=pixmap@entry=0x2877a90, crtc=<optimized out>, clip=clip@entry=0x7ffff9fc4580) at sna_display.c:2897 #8 0x00007f594f7ca257 in sna_video_textured_put_image (scrn=<optimized out>, src_x=0, src_y=0, drw_x=3, drw_y=<optimized out>, src_w=624, src_h=352, drw_w=1914, drw_h=1080, id=842094169, buf=0x7f5949ccb000 "((((((((())*))('&&%&&()+******))))*)('&%%%%$$\"!! \037 \037\037\037\037\037\037\036\036\037\037\037\037\037\037\037\037!!!\"\"\"\"\"\"#\"\"#$$$%%%%&')++,./1110000//..---,-----,,--", ',' <repeats 11 times>, "+)))))*))))))'&&&&\"\037 !!\037\035\035\035\036\036\036\036\036\037\037 &+/33455667788989::::99::<>>@@@?="..., width=624, height=352, sync=0, clip=0x7ffff9fc4580, data=0x2853a10, drawable=0x3d2fe70) at sna_video_textured.c:281 #9 0x0000000000494f7c in xf86XVPutImage (client=<optimized out>, pDraw=0x3d2fe70, pPort=0x2854d58, pGC=<optimized out>, src_x=<optimized out>, src_y=<optimized out>, src_w=624, src_h=352, drw_x=3, drw_y=0, drw_w=1914, drw_h=1080, format=0x2854a50, data=0x7f5949ccb000 "((((((((())*))('&&%&&()+******))))*)('&%%%%$$\"!! \037 \037\037\037\037\037\037\036\036\037\037\037\037\037\037\037\037!!!\"\"\"\"\"\"#\"\"#$$$%%%%&')++,./1110000//..---,-----,,--", ',' <repeats 11 times>, "+)))))*))))))'&&&&\"\037 !!\037\035\035\035\036\036\036\036\036\037\037 &+/33455667788989::::99::<>>@@@?="..., sync=0, width=624, height=352) at xf86xv.c:1827 #10 0x00000000004defe5 in ProcXvShmPutImage (client=0x3df5850) at xvdisp.c:1025 ---Type <return> to continue, or q <return> to quit--- #11 0x000000000043d4e1 in Dispatch () at dispatch.c:428 #12 0x000000000042c01a in main (argc=7, argv=0x7ffff9fc4878, envp=<optimized out>) at main.c:295
With the current HEAD, I get an instant death: #3 0x00007fc8a77b50d2 in __GI___assert_fail (assertion=assertion@entry=0x7fc8a45a1508 "box->x2 > box->x1 && box->y2 > box->y1", file=file@entry=0x7fc8a45a0770 "sna_accel.c", line=line@entry=2579, function=function@entry=0x7fc8a45a2030 <__PRETTY_FUNCTION__.23746> "sna_drawable_use_bo") at assert.c:101 #4 0x00007fc8a44ef9f4 in sna_drawable_use_bo (drawable=0x2afa970, flags=flags@entry=1, box=box@entry=0x7fff445adc00, damage=damage@entry=0x7fff445ade40) at sna_accel.c:2579 #5 0x00007fc8a4504229 in sna_blt_composite (sna=sna@entry=0x7fc8a9857010, op=op@entry=12, src=src@entry=0x278d550, dst=dst@entry=0x31f8a60, x=x@entry=0, y=y@entry=0, dst_x=dst_x@entry=0, dst_y=dst_y@entry=0, width=0, height=0, tmp=tmp@entry=0x7fff445ade20, fallback=fallback@entry=false) at sna_blt.c:1720 #6 0x00007fc8a454e89f in gen3_render_composite (tmp=0x7fff445ade20, height=0, width=0, dst_y=0, dst_x=0, mask_y=0, mask_x=0, src_y=0, src_x=0, dst=0x31f8a60, mask=0x0, src=0x278d550, op=12 '\f', sna=0x7fc8a9857010) at gen3_render.c:2815 #7 gen3_render_composite (sna=0x7fc8a9857010, op=12 '\f', src=0x278d550, mask=0x0, dst=0x31f8a60, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, dst_y=0, width=0, height=0, tmp=0x7fff445ade20) at gen3_render.c:2791 #8 0x00007fc8a451574a in glyphs_via_mask (sna=sna@entry=0x7fc8a9857010, op=op@entry=3 '\003', src=src@entry=0x2fabdf0, dst=dst@entry=0x2a8f950, format=format@entry=0x2761f28, src_x=332, src_x@entry=331, src_y=114, src_y@entry=124, nlist=nlist@entry=1, list=list@entry=0x7fff445ae200, glyphs=0x7fff445ae608, glyphs@entry=0x7fff445ae600) at sna_glyphs.c:1100 #9 0x00007fc8a4515fcb in sna_glyphs (op=<optimized out>, src=0x2fabdf0, dst=0x2a8f950, mask=0x2761f28, src_x=<optimized out>, src_y=<optimized out>, nlist=1, list=0x7fff445ae200, glyphs=0x7fff445ae600) at sna_glyphs.c:1688 #10 0x00000000005198b1 in damageGlyphs (op=3 '\003', pSrc=0x2fabdf0, pDst=0x2a8f950, maskFormat=0x2761f28, xSrc=331, ySrc=124, nlist=1, ---Type <return> to continue, or q <return> to quit--- list=0x7fff445ae200, glyphs=0x7fff445ae600) at damage.c:625 #11 0x0000000000512ab6 in ProcRenderCompositeGlyphs (client=0x2fd7fb0) at render.c:1390 #12 0x000000000043d4e1 in Dispatch () at dispatch.c:428 #13 0x000000000042c01a in main (argc=7, argv=0x7fff445aeff8, envp=<optimized out>) at main.c:295
Looks like the new assertions are hitting pay-dirt: commit 4e363906a5ef15e1eb0a387cfb6b3445ac185b9d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Oct 31 22:58:59 2012 +0000 sna: Set a valid box when checking for GPU bo for BLT composite ops Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> commit b924831e445615b82a53b10e1849720e933eddfe Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Oct 31 22:55:56 2012 +0000 sna: Preserve mode if flushing before a scanline wait Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After a quick test to try and reproduce these, I guess the patches are doing their job. Thanks.
This was fast :D. This is with gt2: #3 0x00007fe90689acd2 in __GI___assert_fail ( assertion=assertion@entry=0x7fe902c65f18 "priv->gpu_damage == ((void *)0)", file=file@entry=0x7fe902c65050 "sna_accel.c", line=line@entry=2598, function=function@entry=0x7fe902c66750 <__PRETTY_FUNCTION__.21819> "sna_drawable_use_bo") at assert.c:103 #4 0x00007fe902b9f238 in sna_drawable_use_bo (drawable=drawable@entry=0x1bd0600, flags=9, box=box@entry=0x7fff05d60ab0, damage=damage@entry=0x7fff05d60aa8) at sna_accel.c:2598 #5 0x00007fe902bb7da1 in sna_composite_rectangles (op=0 '\000', dst=0x13d9550, color=0x1ba0648, num_rects=1, rects=0x1ba0650) at sna_composite.c:863 #6 0x00000000004ef755 in ProcRenderFillRectangles (client=0x18393c0) at render.c:1426 #7 0x0000000000438891 in Dispatch () at dispatch.c:428 #8 0x0000000000427965 in main (argc=9, argv=0x7fff05d60cf8, envp=<optimized out>) at main.c:288
Hmm. That is a bad sign. Any chance you can compile with --enable-debug=full and attach the Xorg.log leading to the assertion. Or narrow down the reproduction steps?
(In reply to comment #89) > Hmm. That is a bad sign. Any chance you can compile with --enable-debug=full > and attach the Xorg.log leading to the assertion. Or narrow down the > reproduction steps? I will try to reproduce. All I did was to exit and run psi+ (IM client) again. On next X start they started without any issues. I'll try to restart them later (I'm not at the machine ATM). The specialty is that this is perhaps the only window with opacity set to 90% (other have set that to 100%).
Added a few more assertions to hopefully pinpoint where the error first creeps in: commit 03fb9ded43f9bf8b73f99d5b3a8dc592fe22b523 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Nov 20 21:13:07 2012 +0000 sna: Assert that we do not create a proxy with existent GPU damage References: http://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #91) > commit 03fb9ded43f9bf8b73f99d5b3a8dc592fe22b523 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Tue Nov 20 21:13:07 2012 +0000 > > sna: Assert that we do not create a proxy with existent GPU damage Yes, one of those trigger: #3 0x00007f4197a93cd2 in __GI___assert_fail (assertion=assertion@entry=0x7f4193e60dd0 "priv->gpu_damage == ((void *)0)", file=file@entry=0x7f4193e601b0 "sna_accel.c", line=line@entry=2602, function=function@entry=0x7f4193e61910 <__PRETTY_FUNCTION__.21820> "sna_drawable_use_bo") at assert.c:103 #4 0x00007f4193d9baa8 in sna_drawable_use_bo (drawable=drawable@entry=0x168d3a0, flags=9, box=box@entry=0x7fffc193c900, damage=damage@entry=0x7fffc193c8f8) at sna_accel.c:2602 #5 0x00007f4193db46c1 in sna_composite_rectangles (op=0 '\000', dst=0x12628d0, color=0x167fda8, num_rects=1, rects=0x167fdb0) at sna_composite.c:863 #6 0x00000000004ef755 in ProcRenderFillRectangles (client=0x13fd440) at render.c:1426 #7 0x0000000000438891 in Dispatch () at dispatch.c:428 #8 0x0000000000427965 in main (argc=9, argv=0x7fffc193cb48, envp=<optimized out>) at main.c:288
Hmm, I missed. The stacktrace is the same as before, just pushed down a few lines by the new assertions. Puzzling.
(In reply to comment #93) > Hmm, I missed. The stacktrace is the same as before, just pushed down a few > lines by the new assertions. Puzzling. Ah, you're right. Do you want the full-debug xorg.log now?
(In reply to comment #94) > (In reply to comment #93) > > Hmm, I missed. The stacktrace is the same as before, just pushed down a few > > lines by the new assertions. Puzzling. > > Ah, you're right. Do you want the full-debug xorg.log now? If it is possible to do so, yes please. Otherwise, I will just sit here, sulking in the corner, cowering away from the cluebat.
Created attachment 70368 [details] xorg log with debug=full
(In reply to comment #96) > Created attachment 70368 [details] > xorg log with debug=full Weird. Not a single squeak from a function that actually attaches a proxy. Memory corruption? I've pushed a few more assertions for refcnt to: commit c0c48c7a5aca4d24936efbeaefc7674ada2ef87f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 21 14:46:45 2012 +0000 sna: Add a few refcnt assertions Can you please give that a quick spin, and perhaps valgrind?
Created attachment 70377 [details] valgrind log (In reply to comment #97) > Can you please give that a quick spin, and perhaps valgrind? Nope, it still dies at sna_accel 2603. valgrind log attached.
Still nothing! Next request, can you please attach gdb and p *pixmap, p *priv, p *priv->gpu_bo, p*priv->gpu_bo->proxy when you hit the assertion. If you can grab the full debug log for that session as well that would be useful. Then I can see what those pointers tally with from the logs to try and work out the origin.
(In reply to comment #99) > Still nothing! Next request, can you please attach gdb and With gdb, I'm hitting other assertions. Maybe due to the slowness. Once I hit the newly added assert: #3 0x00007ffff608acd2 in __GI___assert_fail ( assertion=0x7ffff243864a "bo->refcnt", file=0x7ffff2438643 "kgem.h", line=491, function=0x7ffff243b8e0 <__PRETTY_FUNCTION__.17290> "kgem_bo_is_mappable") at assert.c:103 #4 0x00007ffff232836a in kgem_bo_is_mappable (kgem=0x7ffff7f1b218, bo=0x15796c0) at kgem.h:491 #5 0x00007ffff232bbea in kgem_bo_move_to_inactive (kgem=0x7ffff7f1b218, bo=0x15796c0) at kgem.c:1377 #6 0x00007ffff232ccf1 in __kgem_bo_destroy (kgem=0x7ffff7f1b218, bo=0x15796c0) at kgem.c:1640 #7 0x00007ffff2333cae in _kgem_bo_destroy (kgem=0x7ffff7f1b218, bo=0x15796c0) at kgem.c:3692 #8 0x00007ffff23393ee in kgem_bo_destroy (kgem=0x7ffff7f1b218, bo=0x15796c0) ---Type <return> to continue, or q <return> to quit--- at kgem.h:332 #9 0x00007ffff233d278 in sna_destroy_pixmap (pixmap=0x15adfe0) at sna_accel.c:1285 #10 0x00000000004f53df in damageDestroyPixmap (pPixmap=0x15adfe0) at damage.c:1556 #11 0x00007ffff2f101e9 in XvDestroyPixmap (pPix=0x15adfe0) at xvmain.c:371 #12 0x00000000004ccfb8 in ShmDestroyPixmap (pPixmap=0x15adfe0) at shm.c:273 #13 0x0000000000459892 in doFreeResource (res=0x1578c60, skip=0) at resource.c:549 #14 0x0000000000459ff3 in FreeResource (id=62919061, skipDeleteFuncType=skipDeleteFuncType@entry=0) at resource.c:579 #15 0x000000000043434e in ProcFreePixmap (client=0xf27540) at dispatch.c:1418 #16 0x0000000000438891 in Dispatch () at dispatch.c:428 #17 0x0000000000427965 in main (argc=2, argv=0x7fffffffe378,
Assertion retracted: commit 9c627a05247690891062a2c0c1c8f7bbc0273104 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 21 16:03:02 2012 +0000 sna: Remove the kgem_bo_is_mappable refcnt assertion from freed paths A few callers of kgem_bo_is_mappable operate on freed bo, and so need to avoid the assert(bo->refcnt). Can you give hunting with gdb another shot?
(In reply to comment #101) > Can you give hunting with gdb another shot? Now I'm getting: #3 0x00007ffff608acd2 in __GI___assert_fail (assertion=0x7ffff243c9c9 "priv->stride", file=0x7ffff243c80e "sna_accel.c", line=423, function=0x7ffff2442e70 <__PRETTY_FUNCTION__.21293> "sna_pixmap_alloc_cpu") at assert.c:103 #4 0x00007ffff233b3de in sna_pixmap_alloc_cpu (sna=0x7ffff7f1b010, pixmap=0x13d5880, priv=0xf4f4b0, from_gpu=true) at sna_accel.c:423 #5 0x00007ffff233e06e in _sna_pixmap_move_to_cpu (pixmap=0x13d5880, flags=34) at sna_accel.c:1560 #6 0x00007ffff233ed10 in sna_drawable_move_region_to_cpu (drawable=0x13d5880, region=0x7fffffffdfd0, flags=34) at sna_accel.c:1809 #7 0x00007ffff23687ab in sna_get_image (drawable=0x13d5880, x=0, y=0, w=5, h=5, format=2, mask=4294967295, dst=0xd6e970 "") at sna_accel.c:13580 #8 0x00000000005548d3 in miSpriteGetImage (pDrawable=0x13d5880, sx=0, sy=0, w=5, h=5, format=2, planemask=4294967295, pdstLine=0xd6e970 "") at misprite.c:414 #9 0x00000000004bccd0 in compGetImage (pDrawable=0x13d5880, sx=<optimized out>, sy=<optimized out>, w=<optimized out>, h=<optimized out>, format=<optimized out>, planemask=4294967295, pdstLine=0xd6e970 "") at compinit.c:148 #10 0x0000000000435aea in DoGetImage (planemask=<optimized out>, height=5, width=<optimized out>, y=<optimized out>, x=0, drawable=<optimized out>, format=2, client=0x10dcf70, im_return=<optimized out>) at dispatch.c:2128 #11 ProcGetImage (client=0x10dcf70) at dispatch.c:2205 #12 0x0000000000438891 in Dispatch () at dispatch.c:428 #13 0x0000000000427965 in main (argc=2, argv=0x7fffffffe378, envp=<optimized out>) at main.c:288
Bah, sorry about that: commit 44dad490140d85a4c0dcb916030c36a838670c01 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 21 16:22:35 2012 +0000 sna: Do not dispose of a shadow pixmap Fixes regression from 2249e9edc37811c07e2807d6b4def05585b44c22
Created attachment 70381 [details] xorg log from that run (In reply to comment #99) > p *pixmap, p *priv, p *priv->gpu_bo, p*priv->gpu_bo->proxy This is dependent on timing. When all those messages from X's stderr are sent over network, the issue does not happen. As soon as I do 'set args :0 2> /dev/null', it hits whenever I run psi+. (gdb) p *pixmap $1 = {drawable = {type = 1 '\001', class = 0 '\000', depth = 32 ' ', bitsPerPixel = 32 ' ', id = 0, x = 0, y = 0, width = 22, height = 22, pScreen = 0x817890, serialNumber = 4023}, devPrivates = 0x1253950, refcnt = 1, devKind = 88, devPrivate = {ptr = 0x1253980, val = 19216768, uval = 19216768, fptr = 0x1253980}, screen_x = 1065, screen_y = 740, usage_hint = 0} (gdb) p *priv $2 = {pixmap = 0x1253910, gpu_bo = 0x12512e0, cpu_bo = 0x0, gpu_damage = 0x12541d1, cpu_damage = 0x0, ptr = 0x0, list = {next = 0x107f470, prev = 0x107f470}, stride = 0, clear_color = 0, flush = 0, source_count = 5, pinned = 0 '\000', mapped = 0 '\000', shm = 0 '\000', clear = 0 '\000', undamaged = 0 '\000', create = 0 '\000', header = 0 '\000', cpu = 0 '\000'} (gdb) *priv->gpu_bo Undefined command: "". Try "help". (gdb) p *priv->gpu_bo $3 = {proxy = 0x1251090, list = {next = 0x12512e8, prev = 0x12512e8}, request = {next = 0x12512f8, prev = 0x12512f8}, vma = {next = 0x12510b8, prev = 0x12510b8}, map = 0x107f448, rq = 0x0, exec = 0x0, binding = {next = 0x0, format = 192, offset = 0}, unique_id = 950, refcnt = 1, handle = 370, target_handle = 0, presumed_offset = 0, delta = 0, size = { pages = {count = 1936, bucket = 0}, bytes = 1936}, pitch = 88, tiling = 0, reusable = 0, dirty = 0, domain = 3, needs_flush = 0, snoop = 0, io = 1, flush = 0, scanout = 0, purged = 0} (gdb) p *priv->gpu_bo->proxy $4 = {proxy = 0x0, list = {next = 0x1251098, prev = 0x1251098}, request = {next = 0x12510a8, prev = 0x12510a8}, vma = {next = 0x1251308, prev = 0x1251308}, map = 0x7ffff08a2001, rq = 0x0, exec = 0x0, binding = {next = 0x0, format = 0, offset = 0}, unique_id = 0, refcnt = 1, handle = 370, target_handle = 370, presumed_offset = 38563840, delta = 0, size = {pages = {count = 1, bucket = 0}, bytes = 1}, pitch = 0, tiling = 0, reusable = 1, dirty = 0, domain = 0, needs_flush = 0, snoop = 0, io = 1, flush = 0, scanout = 0, purged = 0}
It very much looks like we create a proxy for an incomplete pixmap, in fact it looks like the pixmap is entirely uninitialised. This causes it to skip the proxy removal when the pixmap is later cleared. So commit ee72375ecd4f6d6e756bc361fa512b6675309540 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 21 17:08:53 2012 +0000 sna: Do not migrate uninitialised pixmaps should prevent this, but this is a very strange thing for the application to be performing, so I'm hesitant that this is the final answer.
Reading the debug log, it looks pretty clear that the pixmap is not drawn to between creation and use as a source... However, the earlier patch was not complete and so you also need commit 8da12a00ee653510e1f1a6fecb28dbb36faa8400 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 21 17:28:46 2012 +0000 sna: Skip uploading unintialiased pixmap data *crosses fingers*
(In reply to comment #106) > *crosses fingers* Yes, it works.
Until next time... Thanks for all the digging.
Another one today, with 2.20.15-6-gb82bfcb: #3 0x00007f99ffea73c2 in __GI___assert_fail (assertion=assertion@entry=0x7f99fca892f0 "kgem->nbatch", file=file@entry=0x7f99fca88f99 "kgem.c", line=line@entry=2278, function=function@entry=0x7f99fca8a4d1 <__PRETTY_FUNCTION__.23700> "_kgem_submit") at assert.c:101 #4 0x00007f99fc9beb80 in _kgem_submit (kgem=0x7f9a01d5d228) at kgem.c:2278 #5 0x00007f99fca20dcf in kgem_set_mode (kgem=0x7f9a01d5d228, mode=<optimized out>) at kgem.h:348 #6 sna_video_textured_put_image (scrn=<optimized out>, src_x=0, src_y=0, drw_x=0, drw_y=<optimized out>, src_w=640, src_h=360, drw_w=1920, drw_h=1080, id=842094169, buf=0x7f99f797e000 "\a1 \033$$\031#0)#\037\033\027\030\034\027\034#&&$##(&%%(('%\037\"%&&&)+(,-(%'('&&&&&&&&&'()+,-..-+(%\" \037*)&$#$%')('&#\"$'('&$\"\"%&%&%\"#\031\024\022\033\037\"$\036\036\036\036\036\037\037\037 \037\037\037\037\037\037 ! \037\037\036\036\036\036\035\035\035\035\035\035\035\035\035\034\034\034\034\034\034\034\033\032\031\032\033\033\034\034\035\033\030\027\027\027\030\032\031\031\030\030\027\027\030\032\035\034\033\033\032\031\031\031\031\032\033\034\035\034\033\031"..., width=640, height=360, sync=0, clip=0x7fffc27bd430, data=0x2234ad0, drawable=0x33c0d00) at sna_video_textured.c:278 #7 0x0000000000494f9c in xf86XVPutImage (client=<optimized out>, pDraw=0x33c0d00, pPort=0x2235e48, pGC=<optimized out>, src_x=<optimized out>, src_y=<optimized out>, src_w=640, src_h=360, drw_x=0, drw_y=0, drw_w=1920, drw_h=1080, format=0x2235b40, data=0x7f99f797e000 "\a1 \033$$\031#0)#\037\033\027\030\034\027\034#&&$##(&%%(('%\037\"%&&&)+(,-(%'('&&&&&&&&&'()+,-..-+(%\" \037*)&$#$%')('&#\"$'('&$\"\"%&%&%\"#\031\024\022\033\037\"$\036\036\036\036\036\037\037\037 \037\037\037\037\037\037 ! \037\037\036\036\036\036\035\035\035\035\035\035\035\035\035\034\034\034\034\034\034\034\033\032\031\032\033\033\034\034\035\033\030\027\027\027\030\032\031\031\030\030\027\027\030\032\035\034\033\033\032\031\031\031\031\032\033\034\035\034\033\031"..., sync=0, width=640, height=360) at xf86xv.c:1827 #8 0x00000000004df015 in ProcXvShmPutImage (client=0x52979d0) at xvdisp.c:1025 #9 0x000000000043d501 in Dispatch () at dispatch.c:428 #10 0x000000000042c03a in main (argc=7, argv=0x7fffc27bd728, envp=<optimized out>) at main.c:295
commit f522fbe7c98ffad86126c3666b2d9f7e616480b8 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Dec 17 23:04:25 2012 +0000 sna: Refine check for an unset context switch So it appears that we end up performing a context switch on an empty batch, but already has a mode. This is caught later, too late, by assertions. However, we can change the guards slightly to prevent those assertions without altering the code too greatly. And I can then think how to detect where we are setting a mode on the batch but doing no work - which is likely masking a bigger bug. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> That should prevent that assertion, but there is still a mistake in the code somewhere. I'll try adding a few more assertions to flush it out -- it is likely harmless, but worth investigating. Thanks.
This is with 2.20.16-32-g1f4ede0. #3 0x00007fe7ac3353c2 in __GI___assert_fail (assertion=assertion@entry=0x7fe7a912673a "size", file=file@entry=0x7fe7a911a3d9 "kgem.c", line=line@entry=4811, function=function@entry=0x7fe7a911bec0 <__PRETTY_FUNCTION__.24228> "kgem_create_buffer") at assert.c:101 #4 0x00007fe7a9050af9 in kgem_create_buffer (kgem=kgem@entry=0x7fe7ae1eb228, size=0, flags=flags@entry=3, ret=ret@entry=0x7fffccc23838) at kgem.c:4811 #5 0x00007fe7a90ae209 in sna_video_copy_data (sna=sna@entry=0x7fe7ae1eb010, video=video@entry=0x1114060, frame=frame@entry=0x7fffccc23900, buf=buf@entry=0x7fe7a3142000 '\022' <repeats 48 times>, '\023' <repeats 48 times>, "\022\022\022\022\022\022\022\023\023\023\023\023\023\023\023\023\024\024\024\024\024\024\024\024\033\031\025\022\021\020\020\021\021\021\021\021\021\021\021\021", '\022' <repeats 16 times>, "\020\020\021\021\021\021\022\022\022\025\025\027\030\032\033\034 #!\035\034 !\035\034\035\033\032\031\030\031\033\034\035\036\037\036\037 \037%*169;>"...) at sna_video.c:509 #6 0x00007fe7a90b0930 in sna_video_textured_put_image (scrn=0x10f9140, src_x=0, src_y=0, drw_x=891, drw_y=26, src_w=720, src_h=576, drw_w=1024, drw_h=576, id=842094169, buf=0x7fe7a3142000 '\022' <repeats 48 times>, '\023' <repeats 48 times>, "\022\022\022\022\022\022\022\023\023\023\023\023\023\023\023\023\024\024\024\024\024\024\024\024\033\031\025\022\021\020\020\021\021\021\021\021\021\021\021\021", '\022' <repeats 16 times>, "\020\020\021\021\021\021\022\022\022\025\025\027\030\032\033\034 #!\035\034 !\035\034\035\033\032\031\030\031\033\034\035\036\037\036\037 \037%*169;>"..., width=720, height=576, sync=0, clip=0x7fffccc23a20, data=0x1114060, drawable=0x234a4d0) at sna_video_textured.c:275 #7 0x0000000000494f9c in xf86XVPutImage (client=<optimized out>, pDraw=0x234a4d0, pPort=0x11153a8, pGC=<optimized out>, src_x=<optimized out>, src_y=<optimized out>, src_w=720, src_h=576, drw_x=0, drw_y=0, drw_w=1024, drw_h=576, format=0x11150a0, data=0x7fe7a3142000 '\022' <repeats 48 times>, '\023' <repeats 48 times>, "\022\022\022\022\022\022\022\023\023\023\023\023\023\023\023\023\024\024\024\024\024\024\024\024\033\031\025\022\021\020\020\021\021\021\021\021\021\021\021\021", '\022' <repeats 16 times>, "\020\020\021\021\021\021\022\022\022\025\025\027\030\032\033\034 #!\035\034 !\035\034\035\033\032\031\030\031\033\034\035\036\037\036\037 \037%*169;>"..., sync=0, width=720, height=576) at xf86xv.c:1827 #8 0x00000000004df015 in ProcXvShmPutImage (client=0x2531750) at xvdisp.c:1025 #9 0x000000000043d501 in Dispatch () at dispatch.c:428 #10 0x000000000042c03a in main (argc=7, argv=0x7fffccc23d18, envp=<optimized out>) at main.c:295
Ok, that doesn't make any sense as we should not have done anything between initialising the frame->size and using it there to create the buffer (with the failed assertion of size==0). Can you reproduce this? And possibly attach the full debug log?
(In reply to comment #112) > Can you reproduce this? And possibly > attach the full debug log? Yep, 100% reproducible by running mplayer. Will attach later.
(In reply to comment #112) > And possibly attach the full debug log? It's 78 megs, 3 megs compressed. I put it here: http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.xz
As always the unexplainable is a typo! Many thanks for the debug log. commit 52fd223fc970118cbdcb31f9574414debc905e9c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Dec 21 21:36:30 2012 +0000 sna/video: Initialise alignment for video ports > 0 We repeatedly set the alignment value on the first port, rather than once for each. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oh fuck. This destroyed my conference presentation: #2 0x00007fab73f79c22 in __assert_fail_base (fmt=0x7fab740afb60 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fab7034e181 <__PRETTY_FUNCTION__.19893+33> "ce", file=file@entry=0x7fab7034e14d "", line=line@entry=159, function=function@entry=0x7fab7034e947 <__PRETTY_FUNCTION__.10673+7> "rivateAddr") at assert.c:94 #3 0x00007fab73f79cd2 in __GI___assert_fail (assertion=0x7fab7034e181 <__PRETTY_FUNCTION__.19893+33> "ce", file=0x7fab7034e14d "", line=159, function=0x7fab7034e947 <__PRETTY_FUNCTION__.10673+7> "rivateAddr") at assert.c:103 #4 0x00007fab702b9eab in sna_output_backlight_get (output=<error reading variable: Unhandled dwarf expression opcode 0xfa>, output=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at sna_display.c:259 I don't think there is anything which can be done about as this was an older snapshot. However I updated I got when I moved with a firefox window (IVB, gl composition): #3 0x00007fd140c16cd2 in __GI___assert_fail (assertion=assertion@entry=0x7fd13cfe6471 "bo->reusable", file=file@entry=0x7fd13cfe6359 "kgem.c", line=line@entry=1870, function=function@entry=0x7fd13cfe7a00 <__PRETTY_FUNCTION__.21565> "kgem_retire__flushing") at assert.c:103 #4 0x00007fd13cf1b9d6 in kgem_retire__flushing (kgem=0x7fd142aa9210) at kgem.c:1870 #5 kgem_retire (kgem=kgem@entry=0x7fd142aa9210) at kgem.c:2033 #6 0x00007fd13cf49f94 in sna_accel_wakeup_handler (sna=sna@entry=0x7fd142aa9010) at sna_accel.c:14331 #7 0x00007fd13cf5d1b6 in sna_wakeup_handler (arg=<optimized out>, wakeupData=<optimized out>, result=<optimized out>, read_mask=0x7e5e40 <LastSelectMask>) at sna_driver.c:653 #8 0x000000000043c84a in WakeupHandler (result=result@entry=1, pReadmask=pReadmask@entry=0x7e5e40 <LastSelectMask>) at dixutils.c:424 #9 0x0000000000561ab4 in WaitForSomething (pClientsReady=pClientsReady@entry=0x18023e0) at WaitFor.c:224 #10 0x0000000000438611 in Dispatch () at dispatch.c:357 #11 0x0000000000427965 in main (argc=9, argv=0x7ffff4d27698, envp=<optimized out>) at main.c:288
The first one happened when I tried to set resolution of the LCD display. Nicely reproducible with: for aa in `seq 1 5`; do xrandr -s 1024x768; xrandr -s 1366x768; done the trace for the new snapshot is: #3 0x00007ffd7da9acd2 in __GI___assert_fail (assertion=assertion@entry=0x7ffd79e71418 "bo->delta", file=file@entry=0x7ffd79e6a359 "kgem.c", line=line@entry=3488, function=function@entry=0x7ffd79e6bd82 <__PRETTY_FUNCTION__.21912> "kgem_create_2d") at assert.c:103 #4 0x00007ffd79da278d in kgem_create_2d (kgem=<optimized out>, width=1366, height=768, bpp=32, tiling=1, flags=flags@entry=17) at kgem.c:3488 #5 0x00007ffd79e4a219 in sna_dri_create_buffer (draw=0x1722410, attachment=1, format=32) at sna_dri.c:303 #6 0x00007ffd7cfa69f3 in allocate_or_reuse_buffer (pDraw=0x1722410, pPriv=pPriv@entry=0x1722630, attachment=attachment@entry=1, format=format@entry=32, dimensions_match=dimensions_match@entry=0, buffer=buffer@entry=0x210a460, ds=<error reading variable: Unhandled dwarf expression opcode 0xfa>, ds=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at dri2.c:399 #7 0x00007ffd7cfa763d in do_get_buffers (pDraw=<optimized out>, width=width@entry=0x7ffff0dc46d0, height=height@entry=0x7ffff0dc46d8, attachments=0x2187084, attachments@entry=0x218707c, count=1, out_count=out_count@entry=0x7ffff0dc46e0, has_format=has_format@entry=1) at dri2.c:476 #8 0x00007ffd7cfa79a0 in DRI2GetBuffersWithFormat (pDraw=<optimized out>, width=width@entry=0x7ffff0dc46d0, height=height@entry=0x7ffff0dc46d8, attachments=attachments@entry=0x218707c, count=<optimized out>, out_count=out_count@entry=0x7ffff0dc46e0) at dri2.c:593 #9 0x00007ffd7cfa9068 in ProcDRI2GetBuffersWithFormat (client=0x16626a0) at dri2ext.c:299 #10 ProcDRI2Dispatch (client=0x16626a0) at dri2ext.c:564 #11 0x00000000004388c1 in Dispatch () at dispatch.c:428 #12 0x0000000000427965 in main (argc=9, argv=0x7ffff0dc48e8, envp=<optimized out>) at main.c:288
(In reply to comment #117) > The first one happened when I tried to set resolution of the LCD display. > Nicely reproducible with: > for aa in `seq 1 5`; do xrandr -s 1024x768; xrandr -s 1366x768; done commit 0507d55dd1bc8fedae524a410a9e7b53f1dad920 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 21 16:24:49 2013 +0000 sna: Only add bound scanouts to the scanout list If we never used the bo as an actual scanout it will never have had been moved to the uncached domain and so we can return it back to the system cache. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #116) > Oh fuck. This destroyed my conference presentation: > #2 0x00007fab73f79c22 in __assert_fail_base (fmt=0x7fab740afb60 "%s%s%s:%u: > %s%sAssertion `%s' failed.\n%n", > assertion=assertion@entry=0x7fab7034e181 <__PRETTY_FUNCTION__.19893+33> > "ce", file=file@entry=0x7fab7034e14d "", line=line@entry=159, > function=function@entry=0x7fab7034e947 <__PRETTY_FUNCTION__.10673+7> > "rivateAddr") at assert.c:94 > #3 0x00007fab73f79cd2 in __GI___assert_fail (assertion=0x7fab7034e181 > <__PRETTY_FUNCTION__.19893+33> "ce", file=0x7fab7034e14d "", > line=159, function=0x7fab7034e947 <__PRETTY_FUNCTION__.10673+7> > "rivateAddr") at assert.c:103 > #4 0x00007fab702b9eab in sna_output_backlight_get (output=<error reading > variable: Unhandled dwarf expression opcode 0xfa>, > output=<error reading variable: Unhandled dwarf expression opcode 0xfa>) > at sna_display.c:259 > > I don't think there is anything which can be done about as this was an older > snapshot. I'm not sure just what assert you managed to hit there. I would say it tried to resolve the backtrace against the wrong source tree. However I updated I got when I moved with a firefox window (IVB, > gl composition): > #3 0x00007fd140c16cd2 in __GI___assert_fail > (assertion=assertion@entry=0x7fd13cfe6471 "bo->reusable", > file=file@entry=0x7fd13cfe6359 "kgem.c", line=line@entry=1870, > function=function@entry=0x7fd13cfe7a00 <__PRETTY_FUNCTION__.21565> > "kgem_retire__flushing") at assert.c:103 > #4 0x00007fd13cf1b9d6 in kgem_retire__flushing (kgem=0x7fd142aa9210) at > kgem.c:1870 Puzzling, but this will paper over the nuisance: commit 10f549332e315cfe2cc86aadab94a95ae6757c34 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 21 16:48:34 2013 +0000 sna: Free a non-reusable bo if it expires on the flushing list Still no sure just how the bo ends up there, but as there seems to be the occasional malinger, just free it. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #119) > > I don't think there is anything which can be done about as this was an older > > snapshot. > > I'm not sure just what assert you managed to hit there. I would say it tried > to resolve the backtrace against the wrong source tree. Yeah, I updated, then ran gdb. But it should be the same as in comment 117. OK, the xrandr -s issue is gone now. Cannot tell anything about the other as I have no idea how to reproduce that. What am I seeing right now in this "Additional Comments" window that some text is hiding and showing when I hover the "Save Changes" button. "as this was an older" in particular.
(In reply to comment #120) > What am I seeing right now in this "Additional Comments" window that some > text is hiding and showing when I hover the "Save Changes" button. "as this > was an older" in particular. commit c8817e24a9d97110a961c3803290e38ff5cbfc9a Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jan 22 09:06:50 2013 +0000 sna/gen7: Fix inversion of bool return code from CA pass As we inverted the predicate, we no longer restored the original operation after performing a CA pass - glyph would randomly become white. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The previous issues seem to be gone with 2d92d8e. I got another crash right now though: #3 0x00007fae724eccd2 in __GI___assert_fail (assertion=assertion@entry=0x7fae6e8bf180 "!priv->flush || !list_is_empty(&priv->list)", file=file@entry=0x7fae6e8be1ce "sna_accel.c", line=line@entry=2265, function=function@entry=0x7fae6e8bf980 <__PRETTY_FUNCTION__.21789> "sna_drawable_move_region_to_cpu") at assert.c:103 #4 0x00007fae6e80f206 in sna_drawable_move_region_to_cpu (drawable=drawable@entry=0x2ff5eb0, region=region@entry=0x7fff13f15c50, flags=flags@entry=1) at sna_accel.c:2265 #5 0x00007fae6e81aab2 in sna_put_zpixmap_blt (stride=5464, bits=0x7fae6777c000 ".2/\377~\201~\377~\201~\377\271\274\271\377\334\337\334\377\276\301\276\377\276\301\276\377\276\301\276\377\314\317\314\377\306\311\306\377\303\307\304\377\303\307\304\377\302\305\303\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377"..., h=768, w=1366, y=0, x=0, region=0x7fff13f15c50, drawable=0x3470fc0, gc=<optimized out>) at sna_accel.c:3417 #6 sna_put_image (drawable=0x3470fc0, gc=0x35e4550, depth=24, x=0, y=0, w=1366, h=768, left=0, format=2, bits=0x7fae6777c000 ".2/\377~\201~\377~\201~\377\271\274\271\377\334\337\334\377\276\301\276\377\276\301\276\377\276\301\276\377\314\317\314\377\306\311\306\377\303\307\304\377\303\307\304\377\302\305\303\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377"...) at sna_accel.c:3795 #7 0x00000000004f9aa2 in damagePutImage (pDrawable=0x3470fc0, pGC=0x35e4550, depth=24, x=0, y=0, w=<optimized out>, h=768, leftPad=0, format=2, pImage=0x7fae6777c000 ".2/\377~\201~\377~\201~\377\271\274\271\377\334\337\334\377\276\301\276\377\276\301\276\377\276\301\276\377\314\317\314\377\306\311\306\377\303\307\304\377\303\307\304\377\302\305\303\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377\305\310\305\377"...) at damage.c:795 #8 0x00000000004cc3bc in ProcShmPutImage (client=0x3357540) at shm.c:583 #9 0x00000000004cd335 in ProcShmDispatch (client=0x3357540) at shm.c:1108 #10 0x00000000004388c1 in Dispatch () at dispatch.c:428 #11 0x0000000000427965 in main (argc=9, argv=0x7fff13f160a8, envp=<optimized out>) at main.c:288
Hmm, that assert it to verify that if we a have pixmap exported through DRI2 (that we need to synchronize with before the next reply to the client) that it is in the list to be checked upon sending replies. To get to that assert without adding it to the flushing-list, we would have to have been passed an Pixmap that was already damaged and so in theory already in the flushing list. Can you reproduce this easily enough to record a debug=full Xorg.log? I have to hunt whereabouts in the earlier path is was either not added to the list as expected, or unexpectedly removed from that.
(In reply to comment #123) > Can you reproduce this easily enough to record a debug=full Xorg.log? Fairly harder to reproduce with debug full. But it happened eventually. http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old.xz It happens when I am constantly switching a flash video in firefox to fullscreen and back.
Thanks, commit b0c3170c1092d01b4937f352a3962854785ee549 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Jan 27 19:09:38 2013 +0000 sna: Add the pixmap to the flushing list when creating for inplace CPU writes Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We should be in the clear for the moment. Until the next time!
Not for that long ;). Setting VGA1 output to 1920x1080 gives me this: #3 0x00007f1b853c9cd2 in __GI___assert_fail (assertion=assertion@entry=0x7f1b8179f758 "bo->delta", file=file@entry=0x7f1b81798379 "kgem.c", line=line@entry=1713, function=function@entry=0x7f1b81799b40 <__PRETTY_FUNCTION__.22007> "__kgem_bo_destroy") at assert.c:103 #4 0x00007f1b816c3be5 in __kgem_bo_destroy (kgem=0x7f1b8725c210, bo=0x297b240) at kgem.c:1713 #5 0x00007f1b816c482b in _kgem_bo_destroy (kgem=kgem@entry=0x7f1b8725c210, bo=<optimized out>) at kgem.c:4031 #6 0x00007f1b81776a9c in kgem_bo_destroy (bo=<optimized out>, kgem=0x7f1b8725c210) at kgem.h:356 #7 _sna_dri_destroy_buffer (buffer=0x2c01a20, sna=0x7f1b8725c010) at sna_dri.c:430 #8 _sna_dri_destroy_buffer (sna=0x7f1b8725c010, buffer=0x2c01a20) at sna_dri.c:398 #9 0x00007f1b848d5ebf in update_dri2_drawable_buffers (pPriv=pPriv@entry=0x2179060, pDraw=pDraw@entry=0x2178e40, buffers=buffers@entry=0x2b14920, out_count=out_count@entry=2, width=width@entry=0x7fff34c47680, height=height@entry=0x7fff34c47688) at dri2.c:426 #10 0x00007f1b848d67ca in do_get_buffers (pDraw=<optimized out>, width=width@entry=0x7fff34c47680, height=height@entry=0x7fff34c47688, attachments=0x2a79664, attachments@entry=0x2a7965c, count=1, out_count=out_count@entry=0x7fff34c47690, has_format=has_format@entry=1) at dri2.c:536 #11 0x00007f1b848d69a0 in DRI2GetBuffersWithFormat (pDraw=<optimized out>, width=width@entry=0x7fff34c47680, height=height@entry=0x7fff34c47688, attachments=attachments@entry=0x2a7965c, count=<optimized out>, out_count=out_count@entry=0x7fff34c47690) at dri2.c:593 #12 0x00007f1b848d8068 in ProcDRI2GetBuffersWithFormat (client=0x2090900) at dri2ext.c:299 #13 ProcDRI2Dispatch (client=0x2090900) at dri2ext.c:564 #14 0x00000000004388c1 in Dispatch () at dispatch.c:428 #15 0x0000000000427965 in main (argc=9, argv=0x7fff34c47898, envp=<optimized out>) at main.c:288
I can not see how that can occur. I went through every place we touch either bo->scanout or bo->delta to see where the mixup may occur, and no clue. So please try with commit a31fd03bd4c87c48dc3ca15e3082e29348224b8c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jan 30 17:26:28 2013 +0000 sna: Add a bunch of assertions to make sure we do not misplace scanouts and --enable-debug=full. Sorry :(
(In reply to comment #128) > So please try with I'll try later. In the meantime I have another one with a31fd03: #3 0x00007f606bf0dcd2 in __GI___assert_fail (assertion=assertion@entry=0x7f60682db81e "!kgem_busy(kgem, bo->handle)", file=file@entry=0x7f60682db6f9 "kgem.c", line=line@entry=1492, function=function@entry=0x7f60682dce00 <__PRETTY_FUNCTION__.21968> "kgem_bo_move_to_inactive") at assert.c:103 #4 0x00007f6068206e99 in kgem_bo_move_to_inactive (bo=0x1e016d0, kgem=0x7f606dda0210) at kgem.c:1492 #5 __kgem_bo_destroy (kgem=0x7f606dda0210, bo=0x1e016d0) at kgem.c:1792 #6 0x00007f606820782b in _kgem_bo_destroy (kgem=kgem@entry=0x7f606dda0210, bo=<optimized out>) at kgem.c:4039 #7 0x00007f6068220394 in kgem_bo_destroy (bo=<optimized out>, kgem=0x7f606dda0210) at kgem.h:356 #8 sna_destroy_pixmap (pixmap=0x1d889e0) at sna_accel.c:1320 #9 sna_destroy_pixmap (pixmap=0x1d889e0) at sna_accel.c:1299 #10 0x00000000004f540f in damageDestroyPixmap (pPixmap=0x1d889e0) at damage.c:1556 #11 0x00007f6068d931e9 in XvDestroyPixmap (pPix=0x1d889e0) at xvmain.c:371 #12 0x00000000004ccfe8 in ShmDestroyPixmap (pPixmap=0x1d889e0) at shm.c:273 #13 0x00000000004598c2 in doFreeResource (res=0x13afb60, skip=0) at resource.c:549 #14 0x000000000045a023 in FreeResource (id=27583811, skipDeleteFuncType=skipDeleteFuncType@entry=0) at resource.c:579 #15 0x000000000043437e in ProcFreePixmap (client=0xed1900) at dispatch.c:1418 #16 0x00000000004388c1 in Dispatch () at dispatch.c:428 #17 0x0000000000427965 in main (argc=9, argv=0x7ffffa49b258, envp=<optimized out>) at main.c:288
(In reply to comment #129) > (In reply to comment #128) > > So please try with > > I'll try later. In the meantime I have another one with a31fd03: > #3 0x00007f606bf0dcd2 in __GI___assert_fail > (assertion=assertion@entry=0x7f60682db81e "!kgem_busy(kgem, bo->handle)", > file=file@entry=0x7f60682db6f9 "kgem.c", line=line@entry=1492, > function=function@entry=0x7f60682dce00 <__PRETTY_FUNCTION__.21968> > "kgem_bo_move_to_inactive") at assert.c:103 > #4 0x00007f6068206e99 in kgem_bo_move_to_inactive (bo=0x1e016d0, I believe I have fixed that with: commit 033f75e5bd94e226e719f87ed4e0091845384679 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jan 31 01:38:01 2013 +0000 sna: Stage retirement through the flushing list If the kernel replies that a bo is still busy, stage its retirement through the flushing list to be certain that we never stall on a subsequent write. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> It appears to be a consequence of the READ-READ optimisations.
> commit 033f75e5bd94e226e719f87ed4e0091845384679 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Thu Jan 31 01:38:01 2013 +0000 > > sna: Stage retirement through the flushing list That very commit causes an instant death. 033f75e5b^ is OK. #3 0x00007fb8842d5cd2 in __GI___assert_fail (assertion=assertion@entry=0x7fb8806a3f71 "!bo->needs_flush", file=file@entry=0x7fb8806a3e39 "kgem.c", line=line@entry=413, function=function@entry=0x7fb8806a539d <__PRETTY_FUNCTION__.21647> "kgem_bo_retire") at assert.c:103 #4 0x00007fb8805d3705 in kgem_bo_retire (kgem=kgem@entry=0x7fb886168210, bo=bo@entry=0x2b42f70) at kgem.c:413 #5 0x00007fb8805d69fb in kgem_bo_sync__cpu (kgem=kgem@entry=0x7fb886168210, bo=bo@entry=0x2b42f70) at kgem.c:4721 #6 0x00007fb88061825f in write_boxes_inplace__tiled (kgem=kgem@entry=0x7fb886168210, src=src@entry=0x7fb87dcdd000 "", stride=stride@entry=1568, bpp=bpp@entry=32, src_dx=src_dx@entry=0, src_dy=src_dy@entry=0, bo=bo@entry=0x2b42f70, dst_dx=dst_dx@entry=0, dst_dy=dst_dy@entry=0, box=box@entry=0x7fff1365e9c0, n=n@entry=1) at sna_io.c:527 #7 0x00007fb88061a456 in write_boxes_inplace (n=1, box=0x7fff1365e9c0, dst_dy=0, dst_dx=0, bo=0x2b42f70, src_dy=0, src_dx=0, bpp=32, stride=1568, src=0x7fb87dcdd000, kgem=0x7fb886168210) at sna_io.c:552 #8 sna_write_boxes (sna=sna@entry=0x7fb886168010, dst=dst@entry=0x2b42ad0, dst_bo=0x2b42f70, dst_dx=dst_dx@entry=0, dst_dy=dst_dy@entry=0, src=0x7fb87dcdd000, stride=1568, src_dx=src_dx@entry=0, src_dy=src_dy@entry=0, box=box@entry=0x7fff1365e9c0, nbox=nbox@entry=1) at sna_io.c:652 #9 0x00007fb8805f867c in sna_copy_boxes (src=0x2b52150, dst=0x2ae2c90, gc=0x2ae34b0, region=0x7fff1365e9c0, dx=-487, dy=-614, bitplane=0, closure=0x0) at sna_accel.c:4485 #10 0x00007fb8805dd828 in sna_do_copy (src=0x2b52150, dst=0x2ae2c90, gc=0x2ae34b0, sx=sx@entry=4, sy=4, width=28, height=height@entry=28, dx=491, dx@entry=4, dy=618, dy@entry=4, copy=0x7fb8805f71f0 <sna_copy_boxes>, bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:4761 #11 0x00007fb8805dda85 in sna_copy_area (src=<optimized out>, dst=<optimized out>, gc=<optimized out>, src_x=4, src_y=<optimized out>, width=<optimized out>, height=28, dst_x=4, dst_y=4) at sna_accel.c:4839 #12 0x00000000004f985d in damageCopyArea (pSrc=0x2b52150, pDst=0x2ae2c90, pGC=0x2ae34b0, srcx=4, srcy=<optimized out>, width=28, height=28, dstx=4, dsty=4) at damage.c:824 #13 0x000000000043495b in ProcCopyArea (client=0x2a68e60) at dispatch.c:1623 #14 0x00000000004388c1 in Dispatch () at dispatch.c:428 #15 0x0000000000427965 in main (argc=9, argv=0x7fff1365ecf8, envp=<optimized out>) at main.c:288
Wow, I think that actually caught a subtle bug. commit 0a08de1f02577aef0da289108270c1b35e5d9703 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jan 31 16:39:47 2013 +0000 sna: After removing the bo from a batch, check whether it is still busy If we transfer a bo to the current batch, then subsequently discard it, we lose the information about its current active state. Try to recover this information, by querying the kernel and adding it to the flushing list if necessary. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I think I have the cause of the scanout confusion: commit 38376b56cfe0dfc603bce48e37432622ef9a0135 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jan 31 17:29:10 2013 +0000 sna: Remember to move scanouts to the scanout cache after retiring Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Thanks.
Still no cookies: #3 0x00007f2552a1ecd2 in __GI___assert_fail (assertion=assertion@entry=0x7f254eded15e "!bo->needs_flush", file=file@entry=0x7f254eded019 "kgem.c", line=line@entry=413, function=function@entry=0x7f254edee57d <__PRETTY_FUNCTION__.21647> "kgem_bo_retire") at assert.c:103 #4 0x00007f254ed1c8d5 in kgem_bo_retire (kgem=kgem@entry=0x7f25548b1210, bo=bo@entry=0x25b8f30) at kgem.c:413 #5 0x00007f254ed1fbcb in kgem_bo_sync__cpu (kgem=kgem@entry=0x7f25548b1210, bo=bo@entry=0x25b8f30) at kgem.c:4743 #6 0x00007f254ed6143f in write_boxes_inplace__tiled (kgem=kgem@entry=0x7f25548b1210, src=src@entry=0x7f254c402000 "", stride=stride@entry=1568, bpp=bpp@entry=32, src_dx=src_dx@entry=0, src_dy=src_dy@entry=0, bo=bo@entry=0x25b8f30, dst_dx=dst_dx@entry=0, dst_dy=dst_dy@entry=0, box=box@entry=0x7ffffa388d40, n=n@entry=1) at sna_io.c:527 #7 0x00007f254ed63636 in write_boxes_inplace (n=1, box=0x7ffffa388d40, dst_dy=0, dst_dx=0, bo=0x25b8f30, src_dy=0, src_dx=0, bpp=32, stride=1568, src=0x7f254c402000, kgem=0x7f25548b1210) at sna_io.c:552 #8 sna_write_boxes (sna=sna@entry=0x7f25548b1010, dst=dst@entry=0x25b8a90, dst_bo=0x25b8f30, dst_dx=dst_dx@entry=0, dst_dy=dst_dy@entry=0, src=0x7f254c402000, stride=1568, src_dx=src_dx@entry=0, src_dy=src_dy@entry=0, box=box@entry=0x7ffffa388d40, nbox=nbox@entry=1) at sna_io.c:652 #9 0x00007f254ed4185c in sna_copy_boxes (src=0x2582950, dst=0x2583a20, gc=0x25841f0, region=0x7ffffa388d40, dx=-487, dy=-614, bitplane=0, closure=0x0) at sna_accel.c:4485 #10 0x00007f254ed26a08 in sna_do_copy (src=0x2582950, dst=0x2583a20, gc=0x25841f0, sx=sx@entry=4, sy=4, width=28, height=height@entry=28, dx=491, dx@entry=4, dy=618, dy@entry=4, copy=0x7f254ed403d0 <sna_copy_boxes>, bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:4761 #11 0x00007f254ed26c65 in sna_copy_area (src=<optimized out>, dst=<optimized out>, gc=<optimized out>, src_x=4, src_y=<optimized out>, width=<optimized out>, height=28, dst_x=4, dst_y=4) at sna_accel.c:4839 #12 0x00000000004f985d in damageCopyArea (pSrc=0x2582950, pDst=0x2583a20, pGC=0x25841f0, srcx=4, srcy=<optimized out>, width=28, height=28, dstx=4, dsty=4) at damage.c:824 #13 0x000000000043495b in ProcCopyArea (client=0x24dee60) at dispatch.c:1623 #14 0x00000000004388c1 in Dispatch () at dispatch.c:428 #15 0x0000000000427965 in main (argc=9, argv=0x7ffffa389078, envp=<optimized out>) at main.c:288
How about: commit 1239e012ae6d4f00ce73f32d7244905a601170ea Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jan 31 19:18:17 2013 +0000 sna: Make sure the needs_flush is always accompanied by a tracking request
We should be safe for this very moment :).
I got this one on g33 with older commit 2.20.19-47-gde28027. Maybe this is fixed by some of the later patches (I updated now): #3 0x00007fae72bde392 in __GI___assert_fail (assertion=assertion@entry=0x7fae6f9c424c "!priv->mapped", file=file@entry=0x7fae6f9c40ce "sna_accel.c", line=line@entry=1942, function=function@entry=0x7fae6f9c5a80 <__PRETTY_FUNCTION__.24382> "sna_drawable_move_region_to_cpu") at assert.c:101 #4 0x00007fae6f90afc6 in sna_drawable_move_region_to_cpu (drawable=drawable@entry=0x1272740, region=<optimized out>, flags=6) at sna_accel.c:1942 #5 0x00007fae6f90b1f5 in sna_get_image (drawable=0x1272740, x=0, y=0, w=1, h=1, format=2, mask=4294967295, dst=0x172d210 "") at sna_accel.c:13474 #6 0x000000000057fb93 in miSpriteGetImage (pDrawable=0x1272740, sx=0, sy=0, w=1, h=1, format=2, planemask=4294967295, pdstLine=0x172d210 "") at misprite.c:414 #7 0x00000000004ca080 in compGetImage (pDrawable=0x1272740, sx=<optimized out>, sy=<optimized out>, w=<optimized out>, h=<optimized out>, format=<optimized out>, planemask=4294967295, pdstLine=0x172d210 "") at compinit.c:148 #8 0x000000000043a701 in DoGetImage (planemask=<optimized out>, height=1, width=<optimized out>, y=<optimized out>, x=0, drawable=<optimized out>, format=2, client=0x2295a80, im_return=<optimized out>) at dispatch.c:2127 #9 ProcGetImage (client=0x2295a80) at dispatch.c:2202 #10 0x000000000043d541 in Dispatch () at dispatch.c:428 #11 0x000000000042c06a in main (argc=7, argv=0x7fffc46ef398, envp=<optimized out>) at main.c:295
I believe that assert(priv->mapped) was addressed by: commit 04d48fee713e7bbc9cdf4f09855f6663a4bdc59f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jan 30 11:46:20 2013 +0000 sna: Fix errors found from asserts in a66c5f9ed51e As always many thanks for the reports, and sorry there have been so many!
g33 and 2.21.0-3-g9640640: #3 0x00007f3b6b446392 in __GI___assert_fail (assertion=0x7f3b6822c24c ">refcnt == 1", file=0x7f3b6822c0ce "riv->mapped", line=1942, function=0x7f3b6822da80 <__PRETTY_FUNCTION__.24507+16> "e_upload") at assert.c:101 #4 0x00007f3b68172fc6 in sna_get_image (drawable=0x1f17aa0, x=0, y=0, w=32602640, h=<optimized out>, format=32571, mask=1, dst=0x7f3b681731f5 <sna_get_image+997> "\300y\330A\366G\030\001t\321I\213W\bH\213L$p\213rdH\201\301\020\002") at sna_accel.c:13513 #5 0x0000000000000000 in ?? ()
Another one: #3 0x00007f5808315392 in __GI___assert_fail (assertion=assertion@entry=0x7f58050fa5e0 "(&bo->base)->size.pages.count == alloc", file=file@entry=0x7f58050f9219 "kgem.c", line=line@entry=5313, function=function@entry=0x7f58050fada0 <__PRETTY_FUNCTION__.24782> "kgem_create_buffer") at assert.c:101 #4 0x00007f580502cec7 in kgem_create_buffer (kgem=kgem@entry=0x7f580a1cb220, size=size@entry=4992, flags=flags@entry=1, ret=ret@entry=0x7fff19886cd8) at kgem.c:5313 #5 0x00007f580502cfff in kgem_create_buffer_2d (kgem=0x7f580a1cb220, width=width@entry=192, height=height@entry=25, bpp=bpp@entry=8, flags=flags@entry=1, ret=ret@entry=0x7fff19886cd8) at kgem.c:5358 #6 0x00007f580503b4e7 in sna_pixmap_create_upload (screen=screen@entry=0x232c280, width=width@entry=192, height=height@entry=25, depth=depth@entry=8, flags=flags@entry=1) at sna_accel.c:2970 #7 0x00007f580508353d in trapezoids_fallback (op=op@entry=3 '\003', src=src@entry=0x2eeda70, dst=dst@entry=0x335a980, maskFormat=maskFormat@entry=0x232cc68, xSrc=xSrc@entry=388, ySrc=ySrc@entry=691, ntrap=ntrap@entry=16, traps=traps@entry=0x33a8be0) at sna_trapezoids.c:2642 #8 0x00007f580508a8c3 in sna_composite_trapezoids (op=3 '\003', src=0x2eeda70, dst=0x335a980, maskFormat=0x232cc68, xSrc=<optimized out>, ySrc=<optimized out>, ntrap=16, traps=0x33a8be0) at sna_trapezoids.c:6594 #9 0x00000000005132cb in ProcRenderTrapezoids (client=0x275f690) at render.c:759 #10 0x000000000043d541 in Dispatch () at dispatch.c:428 #11 0x000000000042c06a in main (argc=7, argv=0x7fff198870c8, envp=<optimized out>) at main.c:295
Thanks for grabbing the clean stacktrace. Looks to have only been an assertion validation failure (at least one path failed to update the variable that was being checked, and which was only being used in the assertion and debug statement). commit a8cfddd280b5220f23565b21c91f3f7dd10bbe91 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Feb 5 22:06:03 2013 +0000 sna: Tidy buffer allocation size assertions Rather than perilously update a local variable with the allocated size, just use the size of the bo in the assertion that is large enough to satisfy the allocation request. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This happened with 2.21.0-14-ga8cfddd, looks like it is fixed with 2.21.2-1-g301aebf (I'm not sure how to reproduce)? #3 0x00007ff4fbe34392 in __GI___assert_fail (assertion=assertion@entry=0x7ff4f8c19538 "(&bo->base)->size.pages.count >= alloc", file=file@entry=0x7ff4f8c18199 "kgem.c", line=line@entry=5307, function=function@entry=0x7ff4f8c19d00 <__PRETTY_FUNCTION__.24781> "kgem_create_buffer") at assert.c:101 #4 0x00007ff4f8b4bf1f in kgem_create_buffer (kgem=kgem@entry=0x7ff4fdcea220, size=size@entry=1936, flags=flags@entry=3, ret=ret@entry=0x7fffc629d528) at kgem.c:5307 #5 0x00007ff4f8b4c04f in kgem_create_buffer_2d (kgem=kgem@entry=0x7ff4fdcea220, width=width@entry=44, height=height@entry=44, bpp=bpp@entry=8, flags=flags@entry=3, ret=ret@entry=0x7fffc629d528) at kgem.c:5353 #6 0x00007ff4f8b4c1de in kgem_upload_source_image (kgem=kgem@entry=0x7ff4fdcea220, data=0x1f24460, box=box@entry=0x7fffc629d580, stride=<optimized out>, bpp=8) at kgem.c:5405 #7 0x00007ff4f8b93437 in upload (box=0x7fffc629d580, pixmap=0x1f243e0, channel=0x7fffc629d890, sna=0x7ff4fdcea010) at sna_render.c:527 #8 sna_render_pixmap_bo (sna=sna@entry=0x7ff4fdcea010, channel=channel@entry=0x7fffc629d890, pixmap=0x1f243e0, x=x@entry=0, y=y@entry=0, w=<optimized out>, w@entry=44, h=<optimized out>, h@entry=44, dst_x=dst_x@entry=1426, dst_y=dst_y@entry=36) at sna_render.c:659 #9 0x00007ff4f8bc183b in gen3_composite_picture (sna=sna@entry=0x7ff4fdcea010, picture=picture@entry=0x1e8e940, op=op@entry=0x7fffc629d7d0, channel=channel@entry=0x7fffc629d890, x=<optimized out>, y=<optimized out>, w=w@entry=44, h=h@entry=44, dst_x=dst_x@entry=1426, dst_y=dst_y@entry=36, precise=true) at gen3_render.c:2642 #10 0x00007ff4f8bc7a93 in gen3_render_composite (tmp=0x7fffc629d7d0, height=44, width=44, dst_y=36, dst_x=1426, mask_y=0, mask_x=0, src_y=36, src_x=-10200, dst=0x179e1b0, mask=0x1e8e940, src=0x1f1f9e0, op=3 '\003', sna=0x7ff4fdcea010) at gen3_render.c:3105 #11 gen3_render_composite (sna=0x7ff4fdcea010, op=3 '\003', src=0x1f1f9e0, mask=0x1e8e940, dst=0x179e1b0, src_x=-10200, src_y=36, mask_x=0, mask_y=0, dst_x=1426, dst_y=36, width=44, height=44, tmp=0x7fffc629d7d0) at gen3_render.c:3010 #12 0x00007ff4f8b7c30a in sna_composite (op=3 '\003', src=0x1f1f9e0, mask=0x1e8e940, dst=0x179e1b0, src_x=1426, src_y=36, mask_x=0, mask_y=0, dst_x=1426, dst_y=36, width=44, height=44) at sna_composite.c:621 #13 0x0000000000519779 in damageComposite (op=3 '\003', pSrc=0x1f1f9e0, pMask=0x1e8e940, pDst=0x179e1b0, xSrc=1426, ySrc=36, xMask=0, yMask=0, xDst=1426, yDst=36, width=44, height=44) at damage.c:559 #14 0x00000000005126f4 in ProcRenderComposite (client=0x1a484d0) at render.c:708 #15 0x000000000043d541 in Dispatch () at dispatch.c:428 #16 0x000000000042c06a in main (argc=7, argv=0x7fffc629dc88, envp=<optimized out>) at main.c:295
(In reply to comment #142) > This happened with 2.21.0-14-ga8cfddd, looks like it is fixed with > 2.21.2-1-g301aebf (I'm not sure how to reproduce)? > > #3 0x00007ff4fbe34392 in __GI___assert_fail > (assertion=assertion@entry=0x7ff4f8c19538 "(&bo->base)->size.pages.count >= > alloc", file=file@entry=0x7ff4f8c18199 "kgem.c", line=line@entry=5307, > function=function@entry=0x7ff4f8c19d00 <__PRETTY_FUNCTION__.24781> > "kgem_create_buffer") at assert.c:101 > out>) at main.c:295 commit f4cff22afae598f41adf36cd149223d1f7dd6b6e Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Feb 6 15:15:36 2013 +0000 sna: Relax the buffer size assertion to only be larger than required Not all paths request alloc pages, a few just request sufficient pages for the original size. So we can only assert that condition is satisfied. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ok, a new one with 2.21.2-1-g301aebf: #3 0x00007f1b93a6f392 in __GI___assert_fail (assertion=assertion@entry=0x7f1b9022a8cd "bo->flush", file=file@entry=0x7f1b9022c42e "sna_accel.c", line=line@entry=1275, function=function@entry=0x7f1b9022dc70 <__PRETTY_FUNCTION__.24309> "sna_add_flush_pixmap") at assert.c:101 #4 0x00007f1b9016b631 in sna_add_flush_pixmap (sna=<optimized out>, priv=<optimized out>, bo=<optimized out>) at sna_accel.c:1275 #5 0x00007f1b90170fef in sna_drawable_move_region_to_cpu (drawable=drawable@entry=0x19c8e20, region=region@entry=0x7fffe7ef0bf0, flags=<optimized out>, flags@entry=1) at sna_accel.c:2268 #6 0x00007f1b9017dac2 in sna_put_zpixmap_blt (stride=600, bits=0x7f1b8d00c000 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., h=76, w=150, y=0, x=0, region=0x7fffe7ef0bf0, drawable=0x285b640, gc=<optimized out>) at sna_accel.c:3437 #7 sna_put_image (drawable=0x285b640, gc=0x28686f0, depth=24, x=0, y=0, w=150, h=76, left=0, format=2, bits=0x7f1b8d00c000 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...) at sna_accel.c:3815 #8 0x000000000051cac2 in damagePutImage (pDrawable=0x285b640, pGC=0x28686f0, depth=24, x=0, y=0, w=<optimized out>, h=76, leftPad=0, format=2, pImage=0x7f1b8d00c000 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...) at damage.c:792 #9 0x00000000004d9c0d in ProcShmPutImage (client=0x2261c90) at shm.c:584 #10 0x00000000004daaa5 in ProcShmDispatch (client=0x2261c90) at shm.c:1115 #11 0x000000000043d541 in Dispatch () at dispatch.c:428 #12 0x000000000042c06a in main (argc=7, argv=0x7fffe7ef1048, envp=<optimized out>) at main.c:295
Hmm, not sure just how that can happen. The flush flag there is used to track a bo exported through DRI, so that's where I suspect I lost track of the priv->flush/bo->flush: commit 696347a6260f3741ad3e6238db0fd285384bb753 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 11 09:58:58 2013 +0000 sna/dri: Only clear the GPU bo flush flag when the last export is destroyed Add various assertions around DRI to be sure that we don't change the GPU bo for a pixmap that does not match our flush bookkeeping. Then be more careful during the DRI2Buffer destroy that we don't accidentally remove the flush flag from a shared bo. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #145) > Hmm, not sure just how that can happen. So the newly added assert: #3 0x00007f27b8ae5392 in __GI___assert_fail (assertion=assertion@entry=0x7f27b52ac996 "priv->gpu_bo->flush", file=file@entry=0x7f27b52ac8a0 "sna_dri.c", line=line@entry=181, function=function@entry=0x7f27b52ad3a0 <__PRETTY_FUNCTION__.22928> "sna_pixmap_set_dri") at assert.c:101 #4 0x00007f27b52806b8 in sna_pixmap_set_dri (pixmap=0x281e060, sna=0x7f27ba99b010) at sna_dri.c:181 #5 sna_dri_create_buffer (draw=<optimized out>, attachment=0, format=32) at sna_dri.c:283 #6 0x00000000005608fc in create_buffer (format=32, attachment=0, pDraw=0x2c9a9e0) at dri2.c:448 #7 allocate_or_reuse_buffer (pDraw=pDraw@entry=0x2c9a9e0, pPriv=pPriv@entry=0x2d2a6c0, attachment=attachment@entry=0, format=format@entry=32, dimensions_match=dimensions_match@entry=1, buffer=buffer@entry=0x2d2a7a0, ds=<optimized out>) at dri2.c:495 #8 0x0000000000561296 in do_get_buffers (pDraw=0x2c9a9e0, width=width@entry=0x7fff2ae34520, height=height@entry=0x7fff2ae34528, attachments=0x2c2c424, attachments@entry=0x2c2c41c, count=1, out_count=out_count@entry=0x7fff2ae34530, has_format=has_format@entry=1) at dri2.c:573 #9 0x00000000005616b0 in DRI2GetBuffersWithFormat (pDraw=<optimized out>, width=width@entry=0x7fff2ae34520, height=height@entry=0x7fff2ae34528, attachments=attachments@entry=0x2c2c41c, count=<optimized out>, out_count=out_count@entry=0x7fff2ae34530) at dri2.c:690 #10 0x0000000000563130 in ProcDRI2GetBuffersWithFormat (client=0x2cff4a0) at dri2ext.c:306 #11 ProcDRI2Dispatch (client=0x2cff4a0) at dri2ext.c:608 #12 0x000000000043d541 in Dispatch () at dispatch.c:428 #13 0x000000000042c06a in main (argc=7, argv=0x7fff2ae34738, envp=<optimized out>) at main.c:295
Still looking for clues, but I realised that since we now cache the DRI2Buffer for the pixmap, this whole code path is now rather fishy: commit 58b61bcd02467d2b08b2dd194ab6884437b83fa6 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 11 11:05:02 2013 +0000 sna/dri: Tighten the assertions that we have one DRI2Buffer per Pixmap
#3 0x00007f0bc8c14392 in __GI___assert_fail (assertion=assertion@entry=0x7f0bc53dbaf8 "sna_pixmap_get_buffer(pixmap) == buffer", file=file@entry=0x7f0bc53db980 "sna_dri.c", line=line@entry=378, function=function@entry=0x7f0bc53dc4a0 <__PRETTY_FUNCTION__.22960> "sna_dri_create_buffer") at assert.c:101 #4 0x00007f0bc53aef5c in sna_dri_create_buffer (draw=<optimized out>, attachment=<optimized out>, format=32) at sna_dri.c:378 #5 0x00000000005608fc in create_buffer (format=32, attachment=0, pDraw=0x1d81cd0) at dri2.c:448 #6 allocate_or_reuse_buffer (pDraw=pDraw@entry=0x1d81cd0, pPriv=pPriv@entry=0x14b08a0, attachment=attachment@entry=0, format=format@entry=32, dimensions_match=dimensions_match@entry=1, buffer=buffer@entry=0x14b09b0, ds=<optimized out>) at dri2.c:495 #7 0x0000000000561296 in do_get_buffers (pDraw=0x1d81cd0, width=width@entry=0x7fff218ba6e0, height=height@entry=0x7fff218ba6e8, attachments=0x186fe98, attachments@entry=0x186fe90, count=1, out_count=out_count@entry=0x7fff218ba6f0, has_format=has_format@entry=1) at dri2.c:573 #8 0x00000000005616b0 in DRI2GetBuffersWithFormat (pDraw=<optimized out>, width=width@entry=0x7fff218ba6e0, height=height@entry=0x7fff218ba6e8, attachments=attachments@entry=0x186fe90, count=<optimized out>, out_count=out_count@entry=0x7fff218ba6f0) at dri2.c:690 #9 0x0000000000563130 in ProcDRI2GetBuffersWithFormat (client=0x184de00) at dri2ext.c:306 #10 ProcDRI2Dispatch (client=0x184de00) at dri2ext.c:608 #11 0x000000000043d541 in Dispatch () at dispatch.c:428 #12 0x000000000042c06a in main (argc=7, argv=0x7fff218ba8f8, envp=<optimized out>) at main.c:295
Ah hem. Which version of Xorg are you using?
Can you grab a disassembly of sna_dri_create_buffer()? In particular, the assert that fails here is: sna_pixmap_set_buffer(pixmap, buffer); assert(sna_pixmap_get_buffer(pixmap) == buffer); which should be equivalent to : ((void **)dixGetPrivateAddr(&pixmap->devPrivates, &sna_pixmap_key))[2] = buffer; assert(((void **)dixGetPrivateAddr(&pixmap->devPrivates, *sna_pixmap_key))[2] == buffer); For that to fail is rather worrying.
(In reply to comment #150) > Can you grab a disassembly of sna_dri_create_buffer()? > > In particular, the assert that fails here is: > > sna_pixmap_set_buffer(pixmap, buffer); > assert(sna_pixmap_get_buffer(pixmap) == buffer); > > which should be equivalent to : > > ((void **)dixGetPrivateAddr(&pixmap->devPrivates, &sna_pixmap_key))[2] = > buffer; > assert(((void **)dixGetPrivateAddr(&pixmap->devPrivates, > *sna_pixmap_key))[2] == buffer); > > For that to fail is rather worrying. I think you overestimated attribute(const) on sna_pixmap_get_buffer. GCC removes the second call to sna_pixmap_get_buffer because the `pixmap' parameter is the same as for the first one, so it evaluated the call to be superfluous...
(In reply to comment #151) > I think you overestimated attribute(const) on sna_pixmap_get_buffer. GCC > removes the second call to sna_pixmap_get_buffer because the `pixmap' > parameter is the same as for the first one, so it evaluated the call to be > superfluous... No more crashes now. --- a/src/sna/sna_dri.c +++ b/src/sna/sna_dri.c @@ -189,7 +189,7 @@ static struct kgem_bo *sna_pixmap_set_dri(struct sna *sna, return priv->gpu_bo; } -constant static inline void *sna_pixmap_get_buffer(PixmapPtr pixmap) +static inline void *sna_pixmap_get_buffer(PixmapPtr pixmap) { assert(pixmap->refcnt); return ((void **)__get_private(pixmap, sna_pixmap_key))[2];
const: Many functions do not examine any values except their arguments, and have no effects except the return value. Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
commit 0b116a1d023ad7cc148d91f268dbf71452dbd9dd Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 11 12:36:43 2013 +0000 sna: Fix inaccurate use of __attribute__((const)) 'const' is only allowed to use the function parameters and not allowed to access global memory - that includes not allowed to deference its arguments... Thanks to Jiri Slaby for spotting my mistake.
Bah, a new one: #2 0x00007f30bf8ed2e2 in __assert_fail_base (fmt=0x7f30bfa2a7f0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f30bc0a8ade "bo->flush", file=file@entry=0x7f30bc0b49e0 "sna_dri.c", line=line@entry=494, function=function@entry=0x7f30bc0b5282 <__PRETTY_FUNCTION__.23004> "set_bo") at assert.c:92 #3 0x00007f30bf8ed392 in __GI___assert_fail (assertion=assertion@entry=0x7f30bc0a8ade "bo->flush", file=file@entry=0x7f30bc0b49e0 "sna_dri.c", line=line@entry=494, function=function@entry=0x7f30bc0b5282 <__PRETTY_FUNCTION__.23004> "set_bo") at assert.c:101 #4 0x00007f30bc0879b7 in set_bo (pixmap=0x2662060, bo=bo@entry=0x3766cc0) at sna_dri.c:494 #5 0x00007f30bc087c08 in sna_dri_page_flip (sna=sna@entry=0x7f30c17a3010, info=0x3057c50) at sna_dri.c:1108 #6 0x00007f30bc08919d in sna_dri_schedule_flip (data=0x2a07580, func=0x7f30bc315220 <__glXdriSwapEvent>, remainder=0, divisor=<optimized out>, target_msc=0x7fff8b934878, pipe=0, back=0x3ebdb90, front=0x3aca110, draw=0x2a6ea70, client=0x36bc5b0) at sna_dri.c:1940 #7 sna_dri_schedule_swap (client=0x36bc5b0, draw=0x2a6ea70, front=0x3aca110, back=0x3ebdb90, target_msc=0x7fff8b934878, divisor=<optimized out>, remainder=0, func=0x7f30bc315220 <__glXdriSwapEvent>, data=0x2a07580) at sna_dri.c:2095 #8 0x0000000000561cc4 in DRI2SwapBuffers (client=client@entry=0x36bc5b0, pDraw=0x2a6ea70, target_msc=target_msc@entry=0, divisor=divisor@entry=0, remainder=remainder@entry=0, swap_target=swap_target@entry=0x7fff8b934878, func=func@entry=0x7f30bc315220 <__glXdriSwapEvent>, data=data@entry=0x2a07580) at dri2.c:1178 #9 0x00007f30bc315206 in __glXDRIdrawableSwapBuffers (client=0x36bc5b0, drawable=0x2a07580) at glxdri2.c:239 #10 0x00007f30bc3088f2 in __glXDisp_SwapBuffers (cl=0x36bc6f0, pc=<optimized out>) at glxcmds.c:1653 #11 0x00007f30bc30b863 in __glXDispatch (client=<optimized out>) at glxext.c:581 #12 0x000000000043d541 in Dispatch () at dispatch.c:428 #13 0x000000000042c06a in main (argc=7, argv=0x7fff8b934ae8, envp=<optimized out>) at main.c:295
(In reply to comment #155) > Bah, a new one: > > #2 0x00007f30bf8ed2e2 in __assert_fail_base (fmt=0x7f30bfa2a7f0 "%s%s%s:%u: > %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f30bc0a8ade > "bo->flush", file=file@entry=0x7f30bc0b49e0 "sna_dri.c", > line=line@entry=494, function=function@entry=0x7f30bc0b5282 > <__PRETTY_FUNCTION__.23004> "set_bo") at assert.c:92 > #3 0x00007f30bf8ed392 in __GI___assert_fail > (assertion=assertion@entry=0x7f30bc0a8ade "bo->flush", > file=file@entry=0x7f30bc0b49e0 "sna_dri.c", line=line@entry=494, > function=function@entry=0x7f30bc0b5282 <__PRETTY_FUNCTION__.23004> "set_bo") > at assert.c:101 > #4 0x00007f30bc0879b7 in set_bo (pixmap=0x2662060, bo=bo@entry=0x3766cc0) > at sna_dri.c:494 > #5 0x00007f30bc087c08 in sna_dri_page_flip (sna=sna@entry=0x7f30c17a3010, > info=0x3057c50) at sna_dri.c:1108 > #6 0x00007f30bc08919d in sna_dri_schedule_flip (data=0x2a07580, > func=0x7f30bc315220 <__glXdriSwapEvent>, remainder=0, divisor=<optimized > out>, target_msc=0x7fff8b934878, pipe=0, back=0x3ebdb90, front=0x3aca110, That looks like the culprit we set out to find this morning. Hmm, I wonder if info->cache has a use-after-free problem.
No epiphany yet, though I've thrown in a few more assertions around one suspect.
I've pushed some more assertions to try and catch this even early. Please do give them a run and let me know what happens.
Any further sightings?
(In reply to comment #159) > Any further sightings? Not at all. It has not crashed since then yet.
Pretty sure I've caught the assertion failures with: commit 1f16d854264ea923303b79379266bd789fd9dd4d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 18 14:30:55 2013 +0000 sna/dri: Prevent swapping a decoupled DRI2Buffer If the DRI2Buffer is no longer valid for the Drawable, for example the window had just been reparent, just complete the swap without triggering any assertions.
An update yesterday (2.21.3-90-g93ecd56), and boom. sna: Assert that inplace operations are done on coherent maps -- they are not ;) #3 0x00007faac4116392 in __GI___assert_fail (assertion=assertion@entry=0x7faac08d3410 "!((uintptr_t)(priv->gpu_bo->map) & 1) || sna->kgem.has_llc", file=file@entry=0x7faac08d248e "sna_accel.c", line=line@entry=1567, function=function@entry=0x7faac08d3e60 <__PRETTY_FUNCTION__.24361> "_sna_pixmap_move_to_cpu") at assert.c:101 #4 0x00007faac080cfab in _sna_pixmap_move_to_cpu (pixmap=pixmap@entry=0x2a80140, flags=flags@entry=5) at sna_accel.c:1567 #5 0x00007faac080e6a0 in sna_drawable_move_region_to_cpu (drawable=drawable@entry=0x2a80140, region=region@entry=0x7fff2562fc40, flags=5, flags@entry=1) at sna_accel.c:1969 #6 0x00007faac081b882 in sna_put_zpixmap_blt (stride=8512, bits=0x7faaacfed000 "\025)H", h=1416, w=2128, y=0, x=0, region=0x7fff2562fc40, drawable=0x2a80140, gc=<optimized out>) at sna_accel.c:3482 #7 sna_put_image (drawable=0x2a80140, gc=0x34879a0, depth=24, x=0, y=0, w=2128, h=1416, left=0, format=2, bits=0x7faaacfed000 "\025)H") at sna_accel.c:3860 #8 0x000000000051cac2 in damagePutImage (pDrawable=0x2a80140, pGC=0x34879a0, depth=24, x=0, y=0, w=<optimized out>, h=1416, leftPad=0, format=2, pImage=0x7faaacfed000 "\025)H") at damage.c:792 #9 0x00000000004d9c0d in ProcShmPutImage (client=0x336f360) at shm.c:584 #10 0x00000000004daaa5 in ProcShmDispatch (client=0x336f360) at shm.c:1115 #11 0x000000000043d541 in Dispatch () at dispatch.c:428 #12 0x000000000042c06a in main (argc=7, argv=0x7fff25630098, envp=<optimized out>) at main.c:295
Hmm, I hope that is just a lax assert.... commit 25a63b32c7b549ffb3c7f14de4bf2731d044eb39 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Mar 11 10:42:08 2013 +0000 sna: Tighten checking for coherent maps
A quite day, I guess it was just a lax assert and not the tip of an iceberg. I had cold hopes.
I'm preparing for a presentation and I also tried to connect an external VGA to try if it works. Again, I switched a resolution and boom. I will present with UXA like the last time ;). This was 2.21.3-69-g4be7253. #3 0x00007fade04cd392 in __GI___assert_fail (assertion=assertion@entry=0x7faddd2bd778 "sna_pixmap_from_drawable(draw)->flush", file=file@entry=0x7faddd2bd380 "sna_dri.c", line=line@entry=926, function=function@entry=0x7faddd2be0d0 <__PRETTY_FUNCTION__.22751> "sna_dri_copy_region") at assert.c:101 #4 0x00007faddd28fafe in sna_dri_copy_region (draw=0x10e2de0, region=0x22997a0, dst_buffer=0x10e3b80, src_buffer=<optimized out>) at sna_dri.c:926 #5 0x0000000000560a2f in dri2_copy_region (pDraw=0x10e2de0, pRegion=0x22997a0, pDest=0x10e3b80, pSrc=0x10e3b30) at dri2.c:890 #6 0x000000000056118b in DRI2CopyRegion (pDraw=0x10e2de0, pRegion=0x22997a0, dest=0, src=1) at dri2.c:931 #7 0x000000000056338b in ProcDRI2CopyRegion (client=0x104ed10) at dri2ext.c:329 #8 ProcDRI2Dispatch (client=0x104ed10) at dri2ext.c:606 #9 0x000000000043d541 in Dispatch () at dispatch.c:428 #10 0x000000000042c06a in main (argc=9, argv=0x7fff6744c508, envp=<optimized out>) at main.c:295
Hit the same problem earlier with reconsfiguring displays: commit c5b901a635a9c8c74017682d17cfcd93031907b4 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Mar 16 10:54:51 2013 +0000 sna/dri: Clear flush flag upon bo destroy
(In reply to comment #166) > Hit the same problem earlier with reconsfiguring displays: > > commit c5b901a635a9c8c74017682d17cfcd93031907b4 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Sat Mar 16 10:54:51 2013 +0000 > > sna/dri: Clear flush flag upon bo destroy The current HEAD does not work for me at all. The windows are not redrawn -- when I write to xterm, the letters don't appear for example. Until I turn off compositing in KDE4.
Hmm. The addition tests added to sna_dri.c were to check that the client had called GetBuffers after a reconfiguration event i.e. to reject an attempt to copy the backbuffer from one pixmap to the frontbuffer of another. The importance of that other than chasing stale pointers, is that is makes sure that all the auxiliary state is consistent. If I am correct, then it is possible that kwin is missing a InvalidateNotify and not updating its set of buffers for a Drawable. Can you please refresh the Xorg.0.log so that I know what versions are hitting the issue?
Created attachment 76654 [details] xorg.log ivb
Just to be certain. switching back to HEAD^ restores kwin behaviour?
(In reply to comment #170) > Just to be certain. switching back to HEAD^ restores kwin behaviour? Not at all. But 2.21.4 works flawlessly.
Hopefully just pushed a fix for the fix. Whoops indeed.
(In reply to comment #172) > Hopefully just pushed a fix for the fix. Whoops indeed. Yeah, it indeed fixes it. Thx.
Today's update to 2.21.6 and boom: #3 0x00007f0267651392 in __GI___assert_fail (assertion=assertion@entry=0x7f02644370f8 "priv->cpu == 0 || (priv->mapped && ((uintptr_t)(priv->gpu_bo->map) & 1))", file=file@entry=0x7f026443625e "sna_accel.c", line=line@entry=3299, function=function@entry=0x7f0264438070 <__PRETTY_FUNCTION__.24534> "sna_pixmap_move_to_gpu") at assert.c:101 #4 0x00007f026436a46c in sna_pixmap_move_to_gpu (pixmap=pixmap@entry=0x20b99f0, flags=flags@entry=82) at sna_accel.c:3299 #5 0x00007f02643a3a4d in move_to_gpu (pixmap=pixmap@entry=0x20b99f0, box=box@entry=0x7fff396a3810, blt=blt@entry=false) at sna_render.c:501 #6 0x00007f02643a4010 in __sna_render_pixmap_bo (sna=sna@entry=0x7f0269507010, pixmap=pixmap@entry=0x20b99f0, box=box@entry=0x7fff396a3810, blt=blt@entry=false) at sna_render.c:576 #7 0x00007f02643a4225 in sna_render_pixmap_bo (sna=sna@entry=0x7f0269507010, channel=channel@entry=0x7fff396a3ac8, pixmap=pixmap@entry=0x20b99f0, x=x@entry=0, y=y@entry=0, w=<optimized out>, w@entry=16, h=<optimized out>, h@entry=16, dst_x=dst_x@entry=33, dst_y=dst_y@entry=255) at sna_render.c:671 #8 0x00007f0264407a9d in gen7_composite_picture (sna=sna@entry=0x7f0269507010, picture=picture@entry=0x2208980, channel=channel@entry=0x7fff396a3ac8, x=x@entry=0, y=y@entry=0, w=w@entry=16, h=h@entry=16, dst_x=dst_x@entry=33, dst_y=dst_y@entry=255, precise=true) at gen7_render.c:1952 #9 0x00007f026440f3b0 in gen7_render_composite (tmp=0x7fff396a3a70, height=16, width=16, dst_y=255, dst_x=33, msk_y=0, msk_x=0, src_y=0, src_x=15048, dst=0x2032240, mask=0x0, src=0x2208980, op=3 '\003', sna=0x7f0269507010) at gen7_render.c:2360 #10 gen7_render_composite (sna=0x7f0269507010, op=<optimized out>, src=0x2208980, mask=0x0, dst=0x2032240, src_x=15048, src_y=0, msk_x=0, msk_y=0, dst_x=33, dst_y=255, width=16, height=16, tmp=0x7fff396a3a70) at gen7_render.c:2315 #11 0x00007f026438b85a in sna_composite (op=3 '\003', src=0x2208980, mask=0x0, dst=0x2032240, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=33, dst_y=255, width=16, height=16) at sna_composite.c:638 #12 0x0000000000519779 in damageComposite (op=3 '\003', pSrc=0x2208980, pMask=0x0, pDst=0x2032240, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=33, yDst=255, width=16, height=16) at damage.c:559 #13 0x00000000005126f4 in ProcRenderComposite (client=0x1bd3660) at render.c:708 #14 0x000000000043d541 in Dispatch () at dispatch.c:428 #15 0x000000000042c06a in main (argc=9, argv=0x7fff396a3f28, envp=<optimized out>) at main.c:295
It's a tricky one - I think harmless as it is only reporting that a hint is stale - but it means that an earlier action flagged the pixmap as being last operated on using the CPU yet the damage is not consistent with such an event. If you can reproduce it easily under --enable-debug=full, I would like to see that log.
Created attachment 77581 [details] xorg.log with full debug (In reply to comment #175) > If you can reproduce it easily under --enable-debug=full, I would like to > see that log. Here you go.
Ah. commit 86efddd9e4b921fcfc1a4b7492ba9174b84de64c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Apr 8 10:54:56 2013 +0100 sna: Remove assertions for mapped GPU bo if priv->cpu after GPU bo creation The condition that we have a mapped GPU bo iff priv->cpu is only true if we have a GPU bo at the time of using the CPU -- if we create the GPU bo during move_to_gpu, then that assertion is bogus.
It's gone now.
Update of my desktop today to 2.21.6-9-gafe6128: #3 0x00007f5e7dc61382 in __GI___assert_fail (assertion=assertion@entry=0x7f5e7a421480 "priv->gpu_damage == ((void *)0)", file=file@entry=0x7f5e7a420c3e "sna_accel.c", line=line@entry=1795, function=function@entry=0x7f5e7a422760 <__PRETTY_FUNCTION__.24375> "_sna_pixmap_move_to_cpu") at assert.c:101 #4 0x00007f5e7a3585a6 in _sna_pixmap_move_to_cpu (pixmap=0x193aa80, flags=2) at sna_accel.c:1795 #5 0x00007f5e7a35daab in sna_pixmap_move_to_cpu (flags=2, pixmap=<optimized out>) at sna.h:475 #6 sna_drawable_move_to_cpu (drawable=<optimized out>, flags=flags@entry=2) at sna_accel.c:2374 #7 0x00007f5e7a376b8b in sna_composite (op=1 '\001', src=0x1cfe800, mask=0x1d015d0, dst=0x1cfa2d0, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, dst_y=0, width=21, height=21) at sna_composite.c:693 #8 0x0000000000519779 in damageComposite (op=1 '\001', pSrc=0x1cfe800, pMask=0x1d015d0, pDst=0x1cfa2d0, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=21, height=21) at damage.c:559 #9 0x00007f5e7a39e387 in trapezoid_mask_converter (op=op@entry=1 '\001', src=src@entry=0x1cfe800, dst=dst@entry=0x1cfa2d0, maskFormat=<optimized out>, src_x=src_x@entry=10, src_y=src_y@entry=0, ntrap=ntrap@entry=16, traps=traps@entry=0x1cf22f0) at sna_trapezoids.c:4872 #10 0x00007f5e7a3a80d2 in sna_composite_trapezoids (op=1 '\001', src=0x1cfe800, dst=0x1cfa2d0, maskFormat=<optimized out>, xSrc=<optimized out>, ySrc=<optimized out>, ntrap=16, traps=0x1cf22f0) at sna_trapezoids.c:6480 #11 0x00000000005132cb in ProcRenderTrapezoids (client=0x1bac6d0) at render.c:759 #12 0x000000000043d541 in Dispatch () at dispatch.c:428 #13 0x000000000042c06a in main (argc=7, argv=0x7fff10cc64b8, envp=<optimized out>) at main.c:295
Ok, forgot about one type of buffer that uses both CPU/GPU all-damage, so refined the assertion. (It's slightly disappointing that you end up in that path though, I would have hoped it spotted that it was going to operate in place and use a faster path.) commit aff14a23f98b1230a9053a1b8434f5cc9f766d14 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Apr 12 09:49:47 2013 +0100 sna: Refine assertion about the existence of CPU damage when GPU damaged One particular buffer type is quite cunning and simultaneously sets both CPU/GPU as all damaged (the upload buffer) so be aware when making bold assertions. Hopefully that silences the assertion.
(In reply to comment #180) > Hopefully that silences the assertion. Yeah.
You haven't heard of me lately. Everything has changed after today's update to 2.21.7-5-g87295b1 :): #3 0x00007f52d7dee382 in __GI___assert_fail (assertion=assertion@entry=0x7f52d45af070 "(((uintptr_t)(src_priv->gpu_damage))&1)", file=file@entry=0x7f52d45ad99e "sna_accel.c", line=line@entry=4632, function=function@entry=0x7f52d45affb4 <__PRETTY_FUNCTION__.26304> "sna_copy_boxes") at assert.c:101 #4 0x00007f52d44f30f2 in sna_copy_boxes (src=0x1aeaf80, dst=0x1aea330, gc=0x1b03370, region=0x7fff82e13e10, dx=0, dy=0, bitplane=0, closure=0x0) at sna_accel.c:4632 #5 0x00007f52d44d5278 in sna_do_copy (src=src@entry=0x1aeaf80, dst=dst@entry=0x1aea330, gc=gc@entry=0x1b03370, sx=sx@entry=0, sy=sy@entry=0, width=width@entry=400, height=height@entry=225, dx=dx@entry=0, dy=dy@entry=0, copy=0x7f52d44f1300 <sna_copy_boxes>, bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:5184 #6 0x00007f52d44d54ff in sna_copy_area (src=0x1aeaf80, dst=0x1aea330, gc=0x1b03370, src_x=0, src_y=0, width=400, height=225, dst_x=0, dst_y=0) at sna_accel.c:5262 #7 0x000000000043953b in ProcCopyArea (client=0x151df30) at dispatch.c:1626 #8 0x000000000043d5a6 in Dispatch () at dispatch.c:432 #9 0x000000000042bf8a in main (argc=7, argv=0x7fff82e140f8, envp=<optimized out>) at main.c:295
Hmm. Try assert(src_priv->cpu_damage == NULL) instead.
(In reply to comment #183) > Hmm. Try assert(src_priv->cpu_damage == NULL) instead. That seems to do the trick.
commit dbe2b5896ec2cd00a456236467b221c9fafdd2f7 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri May 24 12:03:41 2013 +0100 sna: Correct assertion for reusing COW Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A new one with 2.21.9-27-gcafa33a: #3 0x00007f6e8d770382 in __GI___assert_fail (assertion=assertion@entry=0x7f6e89f30d28 "priv->cpu == 0 || !(((uintptr_t)(priv->gpu_damage))&1)", file=file@entry=0x7f6e89f2f99e "sna_accel.c", line=line@entry=2066, function=function@entry=0x7f6e89f317c0 <__PRETTY_FUNCTION__.25838> "_sna_pixmap_move_to_cpu") at assert.c:101 #4 0x00007f6e89e652dd in _sna_pixmap_move_to_cpu (pixmap=0xc6c820, flags=<optimized out>) at sna_accel.c:2066 #5 0x00007f6e89e6accb in sna_pixmap_move_to_cpu (flags=2, pixmap=<optimized out>) at sna.h:493 #6 sna_drawable_move_to_cpu (drawable=<optimized out>, flags=flags@entry=2) at sna_accel.c:2629 #7 0x00007f6e89e84d3b in sna_composite (op=1 '\001', src=0x106a600, mask=0x1068b50, dst=0x1063c00, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, dst_y=0, width=21, height=21) at sna_composite.c:693 #8 0x0000000000518cb9 in damageComposite (op=1 '\001', pSrc=0x106a600, pMask=0x1068b50, pDst=0x1063c00, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=21, height=21) at damage.c:563 #9 0x00007f6e89eabc97 in trapezoid_mask_converter (op=op@entry=1 '\001', src=src@entry=0x106a600, dst=dst@entry=0x1063c00, maskFormat=<optimized out>, src_x=src_x@entry=10, src_y=src_y@entry=0, ntrap=ntrap@entry=16, traps=traps@entry=0x7f6e86841814) at sna_trapezoids.c:4894 #10 0x00007f6e89eb6702 in sna_composite_trapezoids (op=1 '\001', src=0x106a600, dst=0x1063c00, maskFormat=<optimized out>, xSrc=<optimized out>, ySrc=<optimized out>, ntrap=16, traps=0x7f6e86841814) at sna_trapezoids.c:6502 #11 0x00000000005126ab in ProcRenderTrapezoids (client=0xed4d20) at render.c:759 #12 0x000000000043d5a6 in Dispatch () at dispatch.c:432 #13 0x000000000042bf8a in main (argc=7, argv=0x7fff65a6fda8, envp=<optimized out>) at main.c:295
commit 7ebcf94736926d4c7bb5ba1627be0a62b15c1663 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Jun 15 12:27:19 2013 +0100 sna: Be explicit in not marking an upload buffer as being on the cpu Since the upload buffer is special in that it simultaneously exists on both the CPU and GPU, we need to be extremely careful in marking the cpu hint flag and asserting it. In particular, the new stricter assert was firing in the CompositeTrapezoids fallback path. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I got this one with c361b449cc3ec15819883afc220aad8823c0072d: #3 0x00007fc6526a9392 in __GI___assert_fail (assertion=assertion@entry=0x7fc64f48c91f "!src_priv->mapped", file=file@entry=0x7fc64f48c35e "sna_accel.c", line=line@entry=1738, function=function@entry=0x7fc64f48f0f0 <__PRETTY_FUNCTION__.25851> "sna_pixmap_make_cow") at assert.c:101 #4 0x00007fc64f3cfd36 in sna_pixmap_make_cow (dst_priv=<optimized out>, src_priv=0x2333c10, sna=0x7fc65455d000) at sna_accel.c:1738 #5 sna_copy_boxes (src=0x26e27d0, dst=0x245dbd0, gc=0x282d4f0, region=0x7fff82884ac0, dx=0, dy=0, bitplane=0, closure=0x0) at sna_accel.c:5061 #6 0x00007fc64f3aff58 in sna_do_copy (src=src@entry=0x26e27d0, dst=dst@entry=0x245dbd0, gc=gc@entry=0x282d4f0, sx=sx@entry=0, sy=sy@entry=0, width=width@entry=220, height=height@entry=163, dx=dx@entry=0, dy=dy@entry=0, copy=0x7fc64f3cd7d0 <sna_copy_boxes>, bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:5568 #7 0x00007fc64f3b01df in sna_copy_area (src=0x26e27d0, dst=0x245dbd0, gc=0x282d4f0, src_x=0, src_y=0, width=220, height=163, dst_x=0, dst_y=0) at sna_accel.c:5646 #8 0x000000000051c8ad in damageCopyArea (pSrc=0x26e27d0, pDst=0x245dbd0, pGC=0x282d4f0, srcx=0, srcy=<optimized out>, width=220, height=163, dstx=0, dsty=0) at damage.c:821 #9 0x00000000004394eb in ProcCopyArea (client=0x1e9d8a0) at dispatch.c:1622 #10 0x000000000043d541 in Dispatch () at dispatch.c:428 #11 0x000000000042c06a in main (argc=9, argv=0x7fff82884e28, envp=<optimized out>) at main.c:295
commit 5aaab9ea0310d48bb1a1ca20308d1c9721a9de3f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Jul 6 15:10:22 2013 +0100 sna: Relax assertion that the source of the cloned pixmap cannot be mapped I was being overzealous at the time of making the COW and trying to be sure that we would never write through a mapping. Then I started to allow clones to be mapped (for reads) and missed relaxing this assertion. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This one with 2.21.12-2-g884dd89 happened today: #3 0x00007fd052dcc4f2 in __GI___assert_fail (assertion=assertion@entry=0x7fd04f351912 "size", file=file@entry=0x7fd04f342c14 "kgem.c", line=line@entry=5527, function=function@entry=0x7fd04f344960 <__PRETTY_FUNCTION__.25850> "kgem_create_buffer") at assert.c:101 #4 0x00007fd04f262856 in kgem_create_buffer (kgem=kgem@entry=0x7fd054ca3000, size=size@entry=0, flags=flags@entry=1, ret=ret@entry=0x7fff744fa418) at kgem.c:5527 #5 0x00007fd04f262aff in kgem_create_buffer_2d (kgem=0x7fd054ca3000, width=width@entry=5, height=height@entry=1, bpp=bpp@entry=1, flags=flags@entry=1, ret=ret@entry=0x7fff744fa418) at kgem.c:5885 #6 0x00007fd04f272699 in sna_pixmap_create_upload (screen=screen@entry=0x152f390, width=width@entry=5, height=height@entry=1, depth=depth@entry=1, flags=flags@entry=1) at sna_accel.c:3483 #7 0x00007fd04f2b6ded in triangles_fallback (op=op@entry=3 '\003', src=src@entry=0x2214870, dst=dst@entry=0x2215450, maskFormat=maskFormat@entry=0x1529b80, xSrc=xSrc@entry=0, ySrc=ySrc@entry=0, n=n@entry=1, tri=tri@entry=0x7fff744fa5e0) at sna_trapezoids.c:7439 #8 0x00007fd04f2cbdfd in trifan_fallback (points=0x2421e10, n=82, ySrc=0, xSrc=0, maskFormat=0x1529b80, dst=0x2215450, src=0x2214870, op=3 '\003') at sna_trapezoids.c:7886 #9 sna_composite_trifan (op=3 '\003', src=0x2214870, dst=0x2215450, maskFormat=<optimized out>, xSrc=0, ySrc=0, n=82, points=0x2421e10) at sna_trapezoids.c:7907 #10 0x000000000050af78 in ProcRenderTriFan (client=0x18b76b0) at render.c:876 #11 0x000000000043c99e in Dispatch () at dispatch.c:432 #12 0x000000000042bfc5 in main (argc=7, argv=0x7fff744fa878, envp=<optimized out>) at main.c:295
Somebody tried to use a TriFan! Who? The problem looks like I am missing a round up for a bpp=1 multiply.
commit 4a447c514b3193897e0d3ec4b3cf75f346d7438e Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jul 16 08:58:21 2013 +0100 sna: Create a pixmap in ordinary system memory for depth=1 uploads Since we will not sample depth=1 pixmaps from the GPU, we may as well directly allocate these in system memory and avoid tickling the upload cache. This then avoids an issue within the size calculation code which makes the assumption that bpp>=8. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With 2.21.14-24-g6f3999f I got this while browsing the wikipedia: #3 0x00007fc6526a9392 in __GI___assert_fail (assertion=0x7fc64f48c91f "x1)", file=0x7fc64f48c35e "", line=1738, function=0x7fc64f48f0f0 "") at assert.c:101 #4 0x00007fc64f3cfd36 in sna_put_zpixmap_blt (stride=<optimized out>, bits=0x0, h=0, w=40773584, y=0, x=1, region=0x25ff6b0, drawable=<optimized out>, gc=<optimized out>) at sna_accel.c:4260 #5 sna_put_image (drawable=<optimized out>, gc=0x20028888, depth=42128624, x=0, y=38132688, w=40773584, h=0, left=0, format=0, bits=0x0) at sna_accel.c:4610 #6 0x0000000000000000 in ?? ()
Hmm, that stacktrace is a little worse for wear. Can you see if you can reproduce the bug with -O0?
(In reply to comment #194) > Hmm, that stacktrace is a little worse for wear. Can you see if you can > reproduce the bug with -O0? Unfortunately it is very hard to reproduce. Nevertheless I still have the core file. But what prevented gdb to unwind the stack properly? It does not look like an optimization is the cause...
(In reply to comment #195) > It does not look like an optimization is the cause... Of course, it is my stupidity. I had two core dumps in the directory and passed "core*" to gdb. Here comes the right one: #3 0x00007fe4e827f392 in __GI___assert_fail (assertion=assertion@entry=0x7fe4e5067d98 "cache->glyphs[p->pos >> 1] == p", file=file@entry=0x7fe4e5067cae "sna_glyphs.c", line=line@entry=2225, function=function@entry=0x7fe4e5067ea0 <__PRETTY_FUNCTION__.23963> "sna_glyph_unrealize") at assert.c:101 #4 0x00007fe4e4fc8870 in sna_glyph_unrealize (screen=0x1af6ab0, glyph=0x3534e10) at sna_glyphs.c:2225 #5 0x00000000005087bc in FreeGlyphPicture (glyph=glyph@entry=0x3534e10) at glyph.c:244 #6 0x000000000050884c in FreeGlyph (format=<optimized out>, glyph=<optimized out>) at glyph.c:276 #7 FreeGlyph (glyph=0x3534e10, format=<optimized out>) at glyph.c:249 #8 0x00000000005088cf in FreeGlyphSet (value=0x34e2180, gid=<optimized out>) at glyph.c:482 #9 0x000000000045fe22 in doFreeResource (res=0x3344560, skip=0) at resource.c:873 #10 0x0000000000460901 in FreeResource (id=59640181, skipDeleteFuncType=skipDeleteFuncType@entry=0) at resource.c:903 #11 0x000000000051210e in ProcRenderFreeGlyphSet (client=0x20ff870) at render.c:993 #12 0x000000000043d541 in Dispatch () at dispatch.c:428 #13 0x000000000042c06a in main (argc=9, argv=0x7fffbdce6238, envp=<optimized out>) at main.c:295
Hmm, ok. That assertion makes a lot more sense and is a consequence of commit 021c59f6243c51871f664e5fdd43aa00a809033a Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Aug 10 13:15:48 2013 +0000 sna/glyphs: Micro-optimise glyph drawing When you don't have many cycles to play with, every one counts. Here we make sure we cache negative lookups for large glyphs. During the free, I am not handling the negative cache correctly.
commit 5775731b6489c63212914163f51b7a576df70658 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Aug 12 22:24:47 2013 +0100 sna/glyphs: Do not attempt to release the negative cache lookup Following commit 021c59f6243c51871f664e5fdd43aa00a809033a Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Aug 10 13:15:48 2013 +0000 sna/glyphs: Micro-optimise glyph drawing we need to be careful during sna_glyph_release() that the atlas points into the glyph cache before releasing it. Fortunately caught by an assertion. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ok, today I got one with 2.21.14-54-g4486ae2. Nicely reproducible by running evince with a specific pdf. #3 0x00007fb99926b392 in __GI___assert_fail (assertion=assertion@entry=0x7fb99604de7a "!w->winSize.data", file=file@entry=0x7fb99604dc4e "sna_accel.c", line=line@entry=5644, function=function@entry=0x7fb996050b34 <__PRETTY_FUNCTION__.26501> "sna_do_copy") at assert.c:101 #4 0x00007fb995f75391 in sna_do_copy (src=src@entry=0x232f5d0, dst=dst@entry=0x233d8a0, gc=gc@entry=0x2333570, sx=2, sx@entry=0, sy=33, sy@entry=0, width=width@entry=596, height=height@entry=565, dx=dx@entry=0, dy=dy@entry=0, copy=0x7fb995f8c740 <sna_copy_boxes>, bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:5644 #5 0x00007fb995f7546f in sna_copy_area (src=0x232f5d0, dst=0x233d8a0, gc=0x2333570, src_x=0, src_y=0, width=596, height=565, dst_x=0, dst_y=0) at sna_accel.c:5752 #6 0x000000000051c8ad in damageCopyArea (pSrc=0x232f5d0, pDst=0x233d8a0, pGC=0x2333570, srcx=0, srcy=<optimized out>, width=596, height=565, dstx=0, dsty=0) at damage.c:821 #7 0x00000000004394eb in ProcCopyArea (client=0x22fc0d0) at dispatch.c:1622 #8 0x000000000043d541 in Dispatch () at dispatch.c:428 #9 0x000000000042c06a in main (argc=3, argv=0x7ffff6e1d898, envp=<optimized out>) at main.c:295
Created attachment 85046 [details] pdf to reproduce
commit 9ac895fccba72c6d4c0e754ab5298ece7dc8632d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Sep 2 13:03:33 2013 +0100 sna: Process winSize region Sometimes the window size is not a simple box, but a full region. In which case we do need to process it and not just assert that it is a box! Reported-by: Jiri Slaby <jirislaby@gmai.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A new one with 2.99.901. I switched off effects, turned them on, and boom. #3 0x00007fc134316392 in __GI___assert_fail (assertion=assertion@entry=0x7fc1310f7718 "bo->rq == ((struct kgem_request *)((uintptr_t)(kgem->next_request) | (kgem->ring)))", file=file@entry=0x7fc1310f68a4 "kgem.c", line=line@entry=4796, function=function@entry=0x7fc1310f84ba <__PRETTY_FUNCTION__.25650> "kgem_add_reloc") at assert.c:101 #4 0x00007fc13100f231 in kgem_add_reloc (kgem=kgem@entry=0x7fc1361a7000, pos=pos@entry=16369, bo=<optimized out>, bo@entry=0x2785d50, read_write_domain=131074, delta=<optimized out>, delta@entry=0) at kgem.c:4796 #5 0x00007fc1310c9618 in gen7_bind_bo (sna=sna@entry=0x7fc1361a7000, bo=0x2785d50, width=882, height=733, format=<optimized out>, is_dst=is_dst@entry=true) at gen7_render.c:1281 #6 0x00007fc1310cc833 in gen7_emit_fill_state (sna=sna@entry=0x7fc1361a7000, op=op@entry=0x7fff7f5a6f60) at gen7_render.c:3276 #7 0x00007fc1310ccb42 in gen7_render_fill (sna=0x7fc1361a7000, alu=<optimized out>, dst=0x2950110, dst_bo=0x2785d50, color=8420732, op=0x7fff7f5a6f60) at gen7_render.c:3587 #8 0x00007fc131034af0 in sna_fill_init_blt (pixel=<optimized out>, alu=<optimized out>, bo=<optimized out>, pixmap=<optimized out>, sna=<optimized out>, fill=0x7fff7f5a6f60) at sna_accel.c:407 #9 sna_poly_fill_polygon (draw=0x25d41e0, gc=0x2399490, shape=0, mode=0, n=6, pt=0x2642e94) at sna_accel.c:10507 #10 0x000000000051af73 in damageFillPolygon (pDrawable=0x25d41e0, pGC=0x2399490, shape=0, mode=<optimized out>, npt=<optimized out>, ppt=<optimized out>) at damage.c:1214 #11 0x0000000000439d3a in ProcFillPoly (client=0x22dfe20) at dispatch.c:1813 #12 0x000000000043d541 in Dispatch () at dispatch.c:428 #13 0x000000000042c06a in main (argc=9, argv=0x7fff7f5a73e8, envp=<optimized out>) at main.c:295
Impossible! /o\ That means we found a bo that still thought it was in the last batch on another ring... I presume it is highly reproducible for you. Would it possible to get a full debug log? (I suspect that will change timing enough to render the effort void.)
(In reply to comment #203) > Impossible! /o\ > > That means we found a bo that still thought it was in the last batch on > another ring... I presume it is highly reproducible for you. Not really, it happened only once... I will try after some suspend/resume cycles if it starts happening.
I have another one with 2.99.902-1-g218c990 today though. I switched to VT1 immediately after an xdm restart and see: #2 0x00007f42ee8842e2 in __assert_fail_base (fmt=0x7f42ee9c17f0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f42eb66aa04 "sna->scrn->vtSema", file=file@entry=0x7f42eb66a94d "sna_display.c", line=line@entry=220, function=function@entry=0x7f42eb66b4a8 <__PRETTY_FUNCTION__.24673> "get_fb") at assert.c:92 #3 0x00007f42ee884392 in __GI___assert_fail (assertion=assertion@entry=0x7f42eb66aa04 "sna->scrn->vtSema", file=file@entry=0x7f42eb66a94d "sna_display.c", line=line@entry=220, function=function@entry=0x7f42eb66b4a8 <__PRETTY_FUNCTION__.24673> "get_fb") at assert.c:101 #4 0x00007f42eb5bde98 in get_fb (sna=sna@entry=0x7f42f0715000, bo=bo@entry=0x10c4e30, width=1920, height=1848) at sna_display.c:220 #5 0x00007f42eb5bf39a in sna_crtc_attach (crtc=0xd22140) at sna_display.c:1265 #6 sna_crtc_set_mode_major (crtc=crtc@entry=0xd22140, mode=mode@entry=0xd22158, rotation=<optimized out>, x=<optimized out>, y=<optimized out>) at sna_display.c:1441 #7 0x00007f42eb5c3499 in sna_mode_resize (scrn=0xd20500, width=1920, height=1848) at sna_display.c:2791 #8 0x00000000004bd615 in xf86RandR12ScreenSetSize (pScreen=0xd27ac0, width=1920, height=1848, mmWidth=507, mmHeight=488) at xf86RandR12.c:699 #9 0x000000000050474b in ProcRRSetScreenSize (client=0xfdbec0) at rrscreen.c:286 #10 0x000000000043d541 in Dispatch () at dispatch.c:428 #11 0x000000000042c06a in main (argc=9, argv=0x7fff41855df8, envp=<optimized out>) at main.c:295
Ok, that last bug detected in resize() looks fairly benign (in effect had the assertion not fired, it should have been a no-op, at worst an fb-id leak). commit 6a6517c748c33a73c92ca67cae9bfb00d48d2382 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Sep 7 22:07:04 2013 +0100 sna: Only update the CRTCs after a resize if master It appears possible to race the framebuffer resize with a VT switch and so end up attempting to update the CRTCs whilst master. The code complains, but in reality we can just ignore the requested change until we VT switch back and then apply the updates upon restoration of master. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> will prevent that occurring again in future. Still none the wiser as to the previous issue.
Another one with 2.99.902-5-gf25235a: #3 0x00007f549a56e392 in __GI___assert_fail (assertion=assertion@entry=0x7f5497352190 "sna_damage_contains_box(src_priv->gpu_damage, ®ion->extents) == PIXMAN_REGION_IN", file=file@entry=0x7f549735014e "sna_accel.c", line=line@entry=5112, function=function@entry=0x7f54973531f0 <__PRETTY_FUNCTION__.26478> "sna_copy_boxes__inplace") at assert.c:101 #4 0x00007f549728fe4c in sna_copy_boxes__inplace (dst_priv=<optimized out>, dst_pixmap=0x2b4c6b0, dy=1400, dx=0, src_priv=0x2a57e90, src_pixmap=0x24d0b60, alu=3, region=0x7fff2ebffe20, sna=0x7f549c3ff000) at sna_accel.c:5112 #5 sna_copy_boxes (src=0x24d0b60, dst=0x2b4c6b0, gc=0x25ecaa0, region=<optimized out>, dx=0, dy=1400, bitplane=0, closure=0x0) at sna_accel.c:5635 #6 0x00007f54972740b9 in sna_do_copy (src=src@entry=0x24d0b60, dst=dst@entry=0x2b4c6b0, gc=gc@entry=0x25ecaa0, sx=sx@entry=0, sy=sy@entry=1400, width=width@entry=50, height=height@entry=2, dx=dx@entry=0, dy=dy@entry=0, copy=0x7f549728d000 <sna_copy_boxes>, bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:5851 #7 0x00007f54972743c1 in sna_copy_area (src=0x24d0b60, dst=0x2b4c6b0, gc=0x25ecaa0, src_x=0, src_y=1400, width=50, height=2, dst_x=0, dst_y=0) at sna_accel.c:5929 #8 0x000000000051c8ad in damageCopyArea (pSrc=0x24d0b60, pDst=0x2b4c6b0, pGC=0x25ecaa0, srcx=0, srcy=<optimized out>, width=50, height=2, dstx=0, dsty=0) at damage.c:821 #9 0x00000000004394eb in ProcCopyArea (client=0x23e24f0) at dispatch.c:1622 #10 0x000000000043d541 in Dispatch () at dispatch.c:428 #11 0x000000000042c06a in main (argc=9, argv=0x7fff2ec00188, envp=<optimized out>) at main.c:295
commit 6d5df67b3e4ed5d5b9f1aabf01b5657d4c9eeac7 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Sep 9 11:19:19 2013 +0100 sna: Remember to offset the box before asserting damage When using sna_copy_boxes__inplace(), we need to remember that the region is in destination space, so we need to offset the boxes when comparing against the source. The assertion forgot to do so, and so failed as soon as it met a little complexity. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #208) > commit 6d5df67b3e4ed5d5b9f1aabf01b5657d4c9eeac7 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Mon Sep 9 11:19:19 2013 +0100 > > sna: Remember to offset the box before asserting damage Arr: sna_accel.c: In function 'sna_copy_boxes__inplace': sna_accel.c:5112:2: error: too many arguments to function 'sna_damage_contains_box' In file included from sna.h:103:0, from sna_accel.c:33: sna_damage.h:210:19: note: declared here sna_accel.c:5113:2: error: too many arguments to function 'sna_damage_contains_box' In file included from sna.h:103:0, from sna_accel.c:33: sna_damage.h:210:19: note: declared here
A new one when I ran a command which produces output of 38 megabytes in the xterm. This is with 2.99.902-5-gf25235a. #3 0x00007f4cc7da8392 in __GI___assert_fail (assertion=assertion@entry=0x7f4cc4b89018 "bo->rq == ((struct kgem_request *)((uintptr_t)(kgem->next_request) | (kgem->ring)))", file=file@entry=0x7f4cc4b881a4 "kgem.c", line=line@entry=4796, function=function@entry=0x7f4cc4b89dba <__PRETTY_FUNCTION__.25652> "kgem_add_reloc") at assert.c:101 #4 0x00007f4cc4aa0321 in kgem_add_reloc (kgem=kgem@entry=0x7f4cc9c39000, pos=pos@entry=16369, bo=<optimized out>, bo@entry=0x3a1ca60, read_write_domain=131074, delta=<optimized out>, delta@entry=0) at kgem.c:4796 #5 0x00007f4cc4b5af78 in gen7_bind_bo (sna=sna@entry=0x7f4cc9c39000, bo=0x3a1ca60, width=1916, height=556, format=<optimized out>, is_dst=is_dst@entry=true) at gen7_render.c:1281 #6 0x00007f4cc4b5e203 in gen7_emit_fill_state (sna=sna@entry=0x7f4cc9c39000, op=op@entry=0x7fff56a380a0) at gen7_render.c:3285 #7 0x00007f4cc4b5fcb0 in gen7_render_fill_one (sna=0x7f4cc9c39000, dst=0x2871d50, bo=<optimized out>, color=<optimized out>, x1=10, y1=26, x2=1906, y2=546, alu=3 '\003') at gen7_render.c:3677 #8 0x00007f4cc4aaa155 in sna_poly_fill_rect_blt (drawable=0x41d2a80, bo=<optimized out>, damage=0x0, pixel=0, n=n@entry=1, rect=rect@entry=0x3198170, extents=extents@entry=0x7fff56a39570, clipped=false, gc=0x24ddf10, gc=0x24ddf10) at sna_accel.c:10420 #9 0x00007f4cc4acc977 in sna_poly_fill_rect (draw=<optimized out>, gc=0x24ddf10, n=1, rect=0x3198170) at sna_accel.c:12969 #10 0x000000000051acdd in damagePolyFillRect (pDrawable=0x41d2a80, pGC=0x24ddf10, nRects=1, pRects=0x3198170) at damage.c:1250 #11 0x00000000005718f0 in miPaintWindow (pWin=<optimized out>, pWin@entry=0x41d2a80, prgn=prgn@entry=0x7fff56a396b0, what=what@entry=0) at miexpose.c:641 #12 0x0000000000587add in miClearToBackground (pWin=0x41d2a80, x=<optimized out>, y=<optimized out>, w=<optimized out>, h=<optimized out>, generateExposures=0) at miwindow.c:115 #13 0x000000000043937d in ProcClearToBackground (client=0x3a44920) at dispatch.c:1589 #14 0x000000000043d541 in Dispatch () at dispatch.c:428 #15 0x000000000042c06a in main (argc=9, argv=0x7fff56a398d8, envp=<optimized out>) at main.c:295
The impossible happened twice. I'm still none the wiser as to how it even happened once.
(In reply to comment #211) > The impossible happened twice. I'm still none the wiser as to how it even > happened once. Just a shot in the dark. Have you considered WYSINWYX? Some kind of compiler unexpected optimization. Like the bug we fixed in 0b116a1d023a?
More asserts, maybe we will catch the problem earlier: commit a934b2455a7b67d124903ab086a323169f0b4d69 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Sep 25 12:06:56 2013 +0100 sna: Add a few more asserts around tracking of bo->exec vs bo->request References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I think this is it, as it crashes every time I start X now. #3 0x00007f363ee1f392 in __GI___assert_fail (assertion=assertion@entry=0x7f363bc0e150 "flip->scanout[1].bo->refcnt == 1", file=file@entry=0x7f363bc0d7a0 "sna_dri.c", line=line@entry=1629, function=function@entry=0x7f363bc0e4d0 <__PRETTY_FUNCTION__.24795> "sna_dri_flip_event") at assert.c:101 #4 0x00007f363bbe0405 in sna_dri_flip_event (flip=0x1809bd0, sna=0x7f3640cb1000) at sna_dri.c:1629 #5 sna_dri_page_flip_handler (sna=sna@entry=0x7f3640cb1000, event=event@entry=0x7fffddd4ec90) at sna_dri.c:1732 #6 0x00007f363bb5afb3 in sna_mode_wakeup (sna=0x7f3640cb1000) at sna_display.c:4274 #7 0x000000000044188c in WakeupHandler (result=result@entry=1, pReadmask=pReadmask@entry=0x81bfa0 <LastSelectMask>) at dixutils.c:426 #8 0x000000000058cd54 in WaitForSomething (pClientsReady=pClientsReady@entry=0x15abbc0) at WaitFor.c:224 #9 0x000000000043d291 in Dispatch () at dispatch.c:357 #10 0x000000000042c06a in main (argc=9, argv=0x7fffddd4f518, envp=<optimized out>) at main.c:295
That assertion should be silenced by commit 22557b88d3c0bb8f883db7d0d0bf85f1255c65c9 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Oct 4 14:14:19 2013 +0100 sna/dri: Only move the scanout to the flip cache if destroyed Under certain circumstances, we can retire the pageflip before the client has updated its back buffers. This leads us to pre-emptively moving the scanout to the cache, when all we need to do (and will do) is report the same back buffer back to the client. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Though I suspect it is a recent artifact, so I'm not overly optimistic that it is the root cause behind the bizarre earlier assert.
(In reply to comment #215) > That assertion should be silenced by > > commit 22557b88d3c0bb8f883db7d0d0bf85f1255c65c9 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Fri Oct 4 14:14:19 2013 +0100 It does not crash, but a multihead screen output does not work, so I cannot use it. (see bug 70132)
Got this right now with 2.99.903-47-g082c087 #3 0x00007faeebbc1392 in __GI___assert_fail (assertion=assertion@entry=0x7faee89aa230 "sna->render.vertex_used <= sna->render.vertex_size", file=file@entry=0x7faee89ab838 "gen4_vertex.c", line=line@entry=67, function=function@entry=0x7faee89abb20 <__PRETTY_FUNCTION__.23394> "gen4_vertex_finish") at assert.c:101 #4 0x00007faee895c098 in gen4_vertex_finish (sna=sna@entry=0x7faeeda52000) at gen4_vertex.c:67 #5 0x00007faee889b7d5 in gen7_get_rectangles__flush (sna=sna@entry=0x7faeeda52000, op=op@entry=0x7fff771f38b0) at gen7_render.c:1394 #6 0x00007faee8978c74 in gen7_get_rectangles (emit_state=0x7faee8977a10 <gen7_emit_composite_state>, want=1, op=0x7fff771f38b0, sna=0x7faeeda52000) at gen7_render.c:1411 #7 gen7_render_composite_box (sna=0x7faeeda52000, op=0x7fff771f38b0, box=<optimized out>) at gen7_render.c:1555 #8 0x00007faee88e900e in sna_composite (op=<optimized out>, src=<optimized out>, mask=0x0, dst=0x3398690, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=51, dst_y=399, width=16, height=16) at sna_composite.c:732 #9 0x00000000005197a9 in damageComposite (op=3 '\003', pSrc=0x2f3f930, pMask=0x0, pDst=0x3398690, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=51, yDst=399, width=16, height=16) at damage.c:559 #10 0x0000000000512724 in ProcRenderComposite (client=0x2a88de0) at render.c:708 #11 0x000000000043d541 in Dispatch () at dispatch.c:428 #12 0x000000000042c06a in main (argc=9, argv=0x7fff771f3d68, envp=<optimized out>) at main.c:295
It's a new assert, so let me double check that it is sane and has the boundary condition correct. Could you check if you can easily trigger it? I may need a full-debug log to piece together the fail... But it should be easy enough to spot, maybe with one or two more asserts.
(In reply to comment #218) > It's a new assert, so let me double check that it is sane and has the > boundary condition correct. Could you check if you can easily trigger it? I > may need a full-debug log to piece together the fail... But it should be > easy enough to spot, maybe with one or two more asserts. It happens occasionally, once, twice a day. So a debug log is impractical, it would have gigs. So if you could add some more asserts, I would test.
Looks to have been a useful assert, some rejigging later: commit d1cb39d0a86390b7ef704422a5b91f9a012d24a5 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Oct 9 10:32:02 2013 +0100 sna/gen3+: Flush vertex buffer after computing resize Upon aligning the buffer, we may enlarge the vbo to accomodate the vertex alignment and push the current index past the end of the buffer. Move the space check from before the alignment computation to afterwards. Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #220) > Looks to have been a useful assert, some rejigging later: > > commit d1cb39d0a86390b7ef704422a5b91f9a012d24a5 Ok, so now I have: #3 0x00007fdecaec7392 in __GI___assert_fail (assertion=assertion@entry=0x7fdec7ca6418 "bo->rq == ((struct kgem_request *)((uintptr_t)(kgem->next_request) | (kgem->ring)))", file=file@entry=0x7fdec7ca5594 "kgem.c", line=line@entry=4908, function=function@entry=0x7fdec7ca725a <__PRETTY_FUNCTION__.25691> "kgem_add_reloc") at assert.c:101 #4 0x00007fdec7bb144f in kgem_add_reloc (kgem=kgem@entry=0x7fdeccd58000, pos=pos@entry=16369, bo=<optimized out>, bo@entry=0x243a950, read_write_domain=131074, delta=<optimized out>, delta@entry=0) at kgem.c:4908 #5 0x00007fdec7c79b31 in gen7_bind_bo (sna=sna@entry=0x7fdeccd58000, bo=0x243a950, width=922, height=1080, format=<optimized out>, is_dst=is_dst@entry=true) at gen7_render.c:1301 #6 0x00007fdec7c7aad3 in gen7_emit_fill_state (sna=sna@entry=0x7fdeccd58000, op=op@entry=0x7fffa0feb3a0) at gen7_render.c:3328 #7 0x00007fdec7c802fc in gen7_render_fill (sna=0x7fdeccd58000, alu=3 '\003', dst=0x3254be0, dst_bo=0x243a950, color=16711680, op=0x7fffa0feb3a0) at gen7_render.c:3645 #8 0x00007fdec7be1d40 in sna_fill_init_blt (alu=<optimized out>, fill=0x7fffa0feb3a0, pixel=16711680, bo=<optimized out>, pixmap=<optimized out>, sna=<optimized out>) at sna_accel.c:427 #9 sna_poly_line_blt (clipped=true, extents=0x7fffa0feb360, pt=0x7fffa0fec5e0, n=5, mode=0, pixel=16711680, damage=0x0, bo=<optimized out>, drawable=0x3513330, gc=<optimized out>) at sna_accel.c:8215 #10 sna_poly_line (drawable=0x3513330, gc=0x34e2c70, mode=0, n=5, pt=0x7fffa0fec5e0) at sna_accel.c:8606 #11 0x000000000057c9ba in miPolyRectangle (pDraw=pDraw@entry=0x3513330, pGC=pGC@entry=0x34e2c70, nrects=nrects@entry=1, pRects=pRects@entry=0x35436dc) at mipolyrect.c:171 #12 0x00007fdec7bda3ee in sna_poly_rectangle (drawable=0x3513330, gc=0x34e2c70, n=1, r=0x35436dc) at sna_accel.c:10291 #13 0x000000000051b599 in damagePolyRectangle (pDrawable=0x3513330, pGC=<optimized out>, nRects=1, pRects=0x35436dc) at damage.c:1108 #14 0x0000000000439b1d in ProcPolyRectangle (client=0x34ec490) at dispatch.c:1764 #15 0x000000000043d541 in Dispatch () at dispatch.c:428 #16 0x000000000042c06a in main (argc=9, argv=0x7fffa0fec928, envp=<optimized out>) at main.c:295
Well the good news is that bizarre assert for the stale rq is the same stacktraces for all 3 occurrences. Hopefully that means that path is broken rather than just coincidence.
(In reply to comment #222) > Well the good news is that bizarre assert for the stale rq is the same > stacktraces for all 3 occurrences. Hopefully that means that path is broken > rather than just coincidence. Oh, is it the one which is hard to understand? Well, I can reproduce this one quite easily ;). Do you want a debug trace?
Yes, if it is reproducible with the debug log, then please do.
(In reply to comment #224) > Yes, if it is reproducible with the debug log, then please do. Ok, it's at: http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old.xz
Thanks, this should fix it commit b16999da214803a026d7246ec4aa62031e82dfc8 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Oct 9 23:54:57 2013 +0100 sna: Upon unwinding operations, also check for exec objects to clear If we cancel an operation after partially committing it, we may leave the batch bookkeeping in an inconsistent state with an exec object with a zero-length batch. Ordinarily, this would not be an issue as we could pass the extra object to the next batch. However, if we switch rings, we need to clear the extra objects as they are currently flagged as being on the wrong ring, leading to hilarity. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> I think that's the last of your current batch of assertions...
(In reply to comment #226) > I think that's the last of your current batch of assertions... It looks promising.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.