Bug 56180

Summary: xf86-video-intel 2.20.10 dies frequently
Product: xorg Reporter: nkalkhof
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
xorg.log none

Description nkalkhof 2012-10-19 13:30:58 UTC
Created attachment 68797 [details]
xorg.log

hello,

xf86-video-intel 2.20.10 dies frequently on SNB when working with Firefox. Crash appears to be random while loading webpages with images.

gdb backtrace:
#0  0x00007ffd398a9fa5 in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x00007ffd398ab428 in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x00007ffd398a2f92 in __assert_fail_base () from /lib/libc.so.6
No symbol table info available.
#3  0x00007ffd398a3042 in __assert_fail () from /lib/libc.so.6
No symbol table info available.
#4  0x00007ffd3861ccef in sna_damage_add (region=<optimized out>, 
    damage=<optimized out>) at sna_damage.h:48
No locals.
#5  0x00007ffd38661e03 in sna_composite (op=1 '\001', src=0x1470dc0, mask=0x0, 
    dst=<optimized out>, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, 
    dst_y=0, width=12, height=10) at sna_composite.c:540
        x = <optimized out>
        y = <optimized out>
        pixmap = 0x7fffdb173d60
        sna = 0x7ffd3b819010
        priv = 0x8
        tmp = {blt = 0x7ffd38659b90 <blt_put_composite__cpu>, 
          box = 0x7ffd38659af0 <blt_put_composite_box__cpu>, 
          boxes = 0x7ffd38659a30 <blt_put_composite_boxes__cpu>, 
          done = 0x7ffd38659ec0 <nop_done>, damage = 0x13da9f0, op = 0, dst = {
            pixmap = 0x13db520, format = 537004168, bo = 0x0, x = 0, y = 0, 
            width = 12, height = 945}, src = {bo = 0x0, transform = 0x0, 
            width = 0, height = 0, pict_format = 0, card_format = 0, 
            filter = 0, repeat = 0, is_affine = 0, is_solid = 0, 
            is_linear = 0, is_opaque = 0, alpha_fixup = 0, rb_reversed = 0, 
            offset = {0, 0}, scale = {0, 0}, embedded_transform = {matrix = {{
                  0, 0, 0}, {0, 0, 0}, {0, 0, 0}}}, u = {gen2 = {pixel = 0, 
                linear_dx = 0, linear_dy = 0, linear_offset = 0}, gen3 = {
                type = 0, mode = 0, constants = 0}}}, mask = {bo = 0x0, 
            transform = 0x0, width = 0, height = 0, pict_format = 0, 
            card_format = 0, filter = 0, repeat = 0, is_affine = 0, 
            is_solid = 0, is_linear = 0, is_opaque = 0, alpha_fixup = 0, 
            rb_reversed = 0, offset = {0, 0}, scale = {0, 0}, 
            embedded_transform = {matrix = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}}}, 
            u = {gen2 = {pixel = 0, linear_dx = 0, linear_dy = 0, 
                linear_offset = 0}, gen3 = {type = 0, mode = 0, 
                constants = 0}}}, is_affine = 0, has_component_alpha = 0, 
          need_magic_ca_pass = 0, rb_reversed = 0, floats_per_vertex = 0, 
          floats_per_rect = 0, prim_emit = 0x0, redirect = {real_bo = 0x0, 
            real_damage = 0x0, damage = 0x0, box = {x1 = 0, y1 = 0, x2 = 0, 
              y2 = 0}}, u = {blt = {src_pixmap = 0x14733d0, sx = 0, sy = 0, 
              inplace = 0, overwrites = 0, bpp = 0, cmd = 0, br13 = 0, 
              pitch = {0, 0}, pixel = 0, bo = {0x0, 0x0}}, gen3 = {
              constants = {3.65877011e-38, 0, 0, 0, 0, 0, 0, 0}, 
              num_constants = 0}, gen4 = {wm_kernel = 21443536, ve_id = 0}, 
            gen5 = {wm_kernel = 21443536, ve_id = 0}, gen6 = {
              flags = 21443536}, gen7 = {flags = 21443536}}, priv = 0x0}
        flags = <optimized out>
        region = {extents = {x1 = 0, y1 = 0, x2 = 12, y2 = 10}, data = 0x0}
        dx = <optimized out>
        dy = 0
#6  0x0000000000514eb9 in damageComposite ()
No symbol table info available.
#7  0x000000000050de44 in ProcRenderComposite ()
No symbol table info available.
#8  0x000000000043b241 in Dispatch ()
No symbol table info available.
#9  0x0000000000429e6a in main ()
No symbol table info available.

configure options:
configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-dependency-tracking --docdir=/usr/share/doc/xf86-video-intel-9999 --enable-dri --disable-glamor --enable-sna --disable-uxa --enable-udev --disable-xvmc -enable-debug

xorg.log attached.

Could someone please confirm this?

Best regards
Nic
Comment 1 Chris Wilson 2012-10-19 13:50:11 UTC
Can you double-check that you have the right debug symbols? That stack trace isn't lining up neatly with the code.
Comment 2 Chris Wilson 2012-10-19 14:06:26 UTC
Spotted one path that could result in tmp.damage != NULL while tmp.dst.bo == NULL:

commit 299232bdb69c8c2b6231905e0f45e9cfe74fe09a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 19 15:02:00 2012 +0100

    sna: Reorder final checks for using the BO and setting the damage pointer
    
    When we return NULL from sna_drawable_use_bo(), the expectation is that
    the damage pointer is also NULL. However, one SHM path leaked.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=56180
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
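
To make the expectation in that commit message concrete, here is a minimal C sketch of the invariant (no BO means no damage pointer). All names in it (sketch_bo, sketch_damage, sketch_op, sketch_use_bo) are hypothetical stand-ins; this is not the driver's real data layout or the actual body of sna_drawable_use_bo().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real SNA structs. */
struct sketch_bo { int handle; };
struct sketch_damage { int dirty; };

struct sketch_op {
    struct sketch_bo *dst_bo;
    struct sketch_damage **damage;
};

/* Invariant from the commit message: no BO implies no damage pointer. */
static bool sketch_op_is_consistent(const struct sketch_op *op)
{
    return op->dst_bo != NULL || op->damage == NULL;
}

/*
 * Hypothetical lookup: on success return the BO and hand back the damage
 * pointer; on every failure path clear the damage pointer before returning.
 */
static struct sketch_bo *sketch_use_bo(struct sketch_bo *candidate,
                                       struct sketch_damage **tracker,
                                       bool usable,
                                       struct sketch_damage ***damage_out)
{
    assert(damage_out != NULL);

    *damage_out = tracker;      /* set optimistically */
    if (!usable) {
        *damage_out = NULL;     /* the step a leaking path would skip */
        return NULL;
    }
    return candidate;
}
```

A caller can then rely on sketch_op_is_consistent() holding after every call, whichever path was taken.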
Comment 3 nkalkhof 2012-10-19 15:15:58 UTC
Hi Chris,

Right, the gdb debug symbols don't seem to line up with the code, but I have no idea how that could have happened. There are no other versions of intel_drv.so lying around anywhere.
I checked again, and even with your latest patch X still dies. The debug info still seems to be out of alignment. Any idea what could have caused this? I am debugging remotely, since the local console is also dead after X dies.

Program received signal SIGABRT, Aborted.
0x00007f7a12e39fa5 in raise () from /lib/libc.so.6
(gdb) backtrace full
#0  0x00007f7a12e39fa5 in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x00007f7a12e3b428 in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x00007f7a12e32f92 in __assert_fail_base () from /lib/libc.so.6
No symbol table info available.
#3  0x00007f7a12e33042 in __assert_fail () from /lib/libc.so.6
No symbol table info available.
#4  0x00007f7a11b75509 in sna_damage_add (region=<optimized out>, 
    damage=<optimized out>) at sna_damage.h:48
No locals.
#5  0x00007f7a11bd7693 in sna_composite (op=1 '\001', src=0xf5d4b0, mask=0x0, 
    dst=<optimized out>, src_x=0, src_y=0, mask_x=0, mask_y=0, dst_x=0, 
    dst_y=0, width=12, height=10) at sna_composite.c:540
        x = <optimized out>
        y = <optimized out>
        pixmap = 0x7fff4f357a10
        sna = 0x7f7a14db2010
        priv = 0x8
        tmp = {blt = 0x7f7a11bcf420 <blt_put_composite__cpu>, 
          box = 0x7f7a11bcf380 <blt_put_composite_box__cpu>, 
          boxes = 0x7f7a11bcf2c0 <blt_put_composite_boxes__cpu>, 
          done = 0x7f7a11bcf750 <nop_done>, damage = 0xe45ca0, op = 0, dst = {
            pixmap = 0xf56350, format = 537004168, bo = 0x0, x = 0, y = 0, 
            width = 12, height = 916}, src = {bo = 0x0, transform = 0x0, 
            width = 0, height = 0, pict_format = 0, card_format = 0, 
            filter = 0, repeat = 0, is_affine = 0, is_solid = 0, 
            is_linear = 0, is_opaque = 0, alpha_fixup = 0, rb_reversed = 0, 
            offset = {0, 0}, scale = {0, 0}, embedded_transform = {matrix = {{
                  0, 0, 0}, {0, 0, 0}, {0, 0, 0}}}, u = {gen2 = {pixel = 0, 
                linear_dx = 0, linear_dy = 0, linear_offset = 0}, gen3 = {
                type = 0, mode = 0, constants = 0}}}, mask = {bo = 0x0, 
            transform = 0x0, width = 0, height = 0, pict_format = 0, 
            card_format = 0, filter = 0, repeat = 0, is_affine = 0, 
            is_solid = 0, is_linear = 0, is_opaque = 0, alpha_fixup = 0, 
            rb_reversed = 0, offset = {0, 0}, scale = {0, 0}, 
            embedded_transform = {matrix = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}}}, 
            u = {gen2 = {pixel = 0, linear_dx = 0, linear_dy = 0, 
                linear_offset = 0}, gen3 = {type = 0, mode = 0, 
                constants = 0}}}, is_affine = 0, has_component_alpha = 0, 
          need_magic_ca_pass = 0, rb_reversed = 0, floats_per_vertex = 0, 
          floats_per_rect = 0, prim_emit = 0x0, redirect = {real_bo = 0x0, 
            real_damage = 0x0, damage = 0x0, box = {x1 = 0, y1 = 0, x2 = 0, 
              y2 = 0}}, u = {blt = {src_pixmap = 0xefe5f0, sx = 0, sy = 0, 
              inplace = 0, overwrites = 0, bpp = 0, cmd = 0, br13 = 0, 
              pitch = {0, 0}, pixel = 0, bo = {0x0, 0x0}}, gen3 = {
              constants = {2.20311696e-38, 0, 0, 0, 0, 0, 0, 0}, 
              num_constants = 0}, gen4 = {wm_kernel = 15721968, ve_id = 0}, 
            gen5 = {wm_kernel = 15721968, ve_id = 0}, gen6 = {
              flags = 15721968}, gen7 = {flags = 15721968}}, priv = 0x0}
        flags = <optimized out>
        region = {extents = {x1 = 0, y1 = 0, x2 = 12, y2 = 10}, data = 0x0}
        dx = <optimized out>
        dy = 0
#6  0x0000000000514eb9 in damageComposite ()
No symbol table info available.
#7  0x000000000050de44 in ProcRenderComposite ()
No symbol table info available.
#8  0x000000000043b241 in Dispatch ()
No symbol table info available.
#9  0x0000000000429e6a in main ()
No symbol table info available.
(gdb) cont
Continuing.

Regards
Nic
Comment 4 Chris Wilson 2012-10-19 15:23:16 UTC
Either gcc is being too aggressive and the dwarf info no longer matches, or there is a residual debug.so that gdb is picking up. Can you see if you can reproduce with --enable-debug=full?
Comment 5 Chris Wilson 2012-10-19 15:24:11 UTC
Ho hum, and try

diff --git a/src/sna/sna_blt.c b/src/sna/sna_blt.c
index 7410eb1..fd97255 100644
--- a/src/sna/sna_blt.c
+++ b/src/sna/sna_blt.c
@@ -1899,6 +1899,7 @@ put:
                                if (tmp->dst.bo == priv->cpu_bo) {
                                        DBG(("%s: forcing the stall to overwrite
                                        tmp->dst.bo = NULL;
+                                       tmp->damage = NULL;
                                }
                        }
                }
Comment 6 Chris Wilson 2012-10-19 17:15:07 UTC
commit f4c32af48b0c92a48131090886a6a6b6c45dbe34
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 19 16:29:19 2012 +0100

    sna: Clear the damage along with the BO when forcing the stall for inplace BLT
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56180
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Any progress?
Comment 7 nkalkhof 2012-10-20 06:24:04 UTC
Hi Chris,

So far I haven't been able to reproduce the crash using the --enable-debug=full switch. However, your latest patch seems to fix the issue even without the debug switch. I'll give it a try for a couple of days, and if I don't see any more related crashes I'll mark this bug resolved, ok?

Thx and Regards
Nic
Comment 8 Chris Wilson 2012-10-20 07:46:04 UTC
One step ahead of you :)

Considering that those paths would be timing critical, it is understandable that running debug=full would mask the issue. However, I'm pretty sure that I understood the issue, and with the extra assertions in place, the bug has to be the one fixed by:

commit f4c32af48b0c92a48131090886a6a6b6c45dbe34
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 19 16:29:19 2012 +0100

    sna: Clear the damage along with the BO when forcing the stall for inplace BLT

Between sna_drawable_use_bo() and sna_damage_add(), the damage must have become reduced, and that is only possible through an intervening move-to-cpu/-gpu, of which the above is the only guilty party.
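
As a rough illustration of that sequence, the toy model below caches a damage pointer, lets a migration intervene, and then checks a precondition before adding damage. Everything in it (toy_pixmap, toy_op, toy_prepare, toy_move_to_cpu, toy_damage_add) is a hypothetical simplification; it does not reproduce the real SNA code or the exact assertion at sna_damage.h:48.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; names and fields are hypothetical, not the SNA structs. */
struct toy_pixmap {
    bool gpu_valid;     /* true while the GPU copy may be written */
    int gpu_damage;     /* number of damaged boxes tracked for the GPU copy */
};

struct toy_op {
    struct toy_pixmap *target;  /* stands in for tmp.dst.bo */
    int *damage;                /* stands in for tmp.damage */
};

/* Step 1: choose the target and cache its damage pointer. */
static void toy_prepare(struct toy_op *op, struct toy_pixmap *p)
{
    op->target = p->gpu_valid ? p : NULL;
    op->damage = p->gpu_valid ? &p->gpu_damage : NULL;
}

/* Intervening migration: GPU damage is reduced and the target is dropped. */
static void toy_move_to_cpu(struct toy_op *op, struct toy_pixmap *p,
                            bool clear_damage)
{
    p->gpu_damage = 0;
    p->gpu_valid = false;
    op->target = NULL;
    if (clear_damage)
        op->damage = NULL;  /* the fix; without it the pointer goes stale */
}

/* Precondition: a cached damage pointer is only valid with a live target. */
static bool toy_damage_add_ok(const struct toy_op *op)
{
    return op->damage == NULL || op->target != NULL;
}

/* Step 2: record damage, asserting the precondition first. */
static void toy_damage_add(struct toy_op *op)
{
    assert(toy_damage_add_ok(op));
    if (op->damage)
        ++*op->damage;
}
```

With clear_damage set to false the precondition fails after the migration, which mirrors the tmp.damage != NULL, tmp.dst.bo == NULL state seen in the backtrace; with it set to true the later damage add becomes a harmless no-op.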
