Bug 54127 - Assertion on Gen6 [SNA]
Summary: Assertion on Gen6 [SNA]
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Xorg Project Team
 
Reported: 2012-08-27 17:51 UTC by Clemens Eisserer
Modified: 2012-08-27 21:31 UTC (History)

Attachments
xorg log (265.17 KB, application/octet-stream)
2012-08-27 18:19 UTC, Clemens Eisserer
xorg log (2.55 MB, application/x-bzip2)
2012-08-27 18:37 UTC, Clemens Eisserer

Description Clemens Eisserer 2012-08-27 17:51:21 UTC
When running 2.20.5-11-g3c6758f with SNA on a Gen6 notebook (running Fedora 17 with latest updates), I frequently hit the following assertion:


Program received signal SIGABRT, Aborted.
0x00000034f2635925 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00000034f2635925 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000034f26370d8 in __GI_abort () at abort.c:91
#2  0x00000034f262e6a2 in __assert_fail_base (fmt=0x34f2778188 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7f867393d428 "list_is_empty(&kgem->requests)", file=file@entry=0x7f867393ccce "kgem.c", 
    line=line@entry=1774, function=function@entry=0x7f867393df62 "__kgem_is_idle") at assert.c:94
#3  0x00000034f262e752 in __GI___assert_fail (assertion=assertion@entry=0x7f867393d428 "list_is_empty(&kgem->requests)", 
    file=file@entry=0x7f867393ccce "kgem.c", line=line@entry=1774, function=function@entry=0x7f867393df62 "__kgem_is_idle")
    at assert.c:103
#4  0x00007f8673866d66 in __kgem_is_idle (kgem=0x7f86735a6238) at kgem.c:1774
#5  0x00007f86738f197b in kgem_is_idle (kgem=<optimized out>) at kgem.h:269
#6  can_switch_to_blt (sna=<optimized out>) at gen6_render.c:2385
#7  can_switch_to_blt (sna=<optimized out>) at gen6_render.c:2374
#8  0x00007f86738f58ae in prefer_blt_fill (bo=0x1d080e0, sna=0x7f86735a6010) at gen6_render.c:3644
#9  gen6_render_fill_boxes (sna=0x7f86735a6010, op=<optimized out>, format=<optimized out>, color=0x1b65bec, dst=0x1c50a30, 
    dst_bo=0x1d080e0, box=0x1d23c70, n=2) at gen6_render.c:3671
#10 0x00007f86738989b4 in sna_composite_rectangles (op=1 '\001', dst=0x1d1df50, color=0x1b65bec, num_rects=2, rects=0x1b65bf4)
    at sna_composite.c:861
#11 0x00000000004fb135 in ?? ()
#12 0x000000000043444a in ?? ()
#13 0x0000000000423485 in ?? ()
#14 0x00000034f2621735 in __libc_start_main (main=0x423110, argc=8, ubp_av=0x7fffb2db4cb8, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffb2db4ca8) at libc-start.c:226
#15 0x000000000042375d in _start ()
Comment 1 Chris Wilson 2012-08-27 18:01:11 UTC
Frequently enough to enable debug=full and hit it?

The implication behind the assert is that although we tested the youngest request and found that it was no longer active, a test on one of the older ones declared that it was still busy, so retirement returned early and left the request list non-empty. Hmm.
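
The condition described above can be modelled in a few lines. This is only a sketch under stated assumptions: `struct request`, `retire_in_order()` and `idle_assumption_holds()` are hypothetical names invented for illustration and are not taken from kgem.c, and the `busy` flag stands in for whatever the kernel reports about each request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a submitted batch; not the driver's real struct. */
struct request {
	bool busy;	/* in a real driver the kernel would answer this */
};

/* Retire completed requests starting from the oldest (index 0) and stop
 * at the first one that is still busy. Returns how many requests remain. */
static size_t retire_in_order(const struct request *rq, size_t count)
{
	size_t first = 0;

	assert(rq != NULL || count == 0);
	while (first < count && !rq[first].busy)
		first++;
	return count - first;
}

/* Returns true when "youngest request idle implies every request idle"
 * holds for this list, i.e. a list-is-empty assertion would not fire. */
static bool idle_assumption_holds(const struct request *rq, size_t count)
{
	if (count == 0 || rq[count - 1].busy)
		return true;	/* nothing to check, or simply still busy */
	return retire_in_order(rq, count) == 0;
}
```

With a list such as { idle, busy, idle } (oldest first), the youngest entry is idle but in-order retirement stops at the second entry, so two requests remain and the assumption fails. That is the shape of state the assertion in the backtrace appears to have caught.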
Comment 2 Clemens Eisserer 2012-08-27 18:04:28 UTC
Sure, I'll give it a try soon.

I noticed sporadic crashes even without --enable-debug, and even with 2.20.5, so I started testing again ;)
Comment 3 Chris Wilson 2012-08-27 18:12:19 UTC
Brown paper bag commit 8e10a5b348a37feadcf935ec7694e46cc0802bdf for the crashes
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Aug 26 14:53:12 2012 +0100

    sna/gen6+: Do not call sna_blt_composite() after prepping the composite op
    
    As sna_blt_composite() will overwrite parts of the composite op as it
    checks whether or not it can execute that operation, it will lead to a
    crash as the normal render path finds the op corrupt. (The BLT
    conversion functions cater for the cases where we may wish to switch
    pipelines after choosing src/dst bo.)
    
    Reported-by: rei4dan@gmail.com
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 4 Clemens Eisserer 2012-08-27 18:19:40 UTC
Created attachment 66185
xorg log
Comment 5 Chris Wilson 2012-08-27 18:23:21 UTC
Was that the same assertion failure?
Comment 6 Clemens Eisserer 2012-08-27 18:26:36 UTC
I haven't had a look; I just waited until X was terminated.
I'll give it another try.
Comment 7 Clemens Eisserer 2012-08-27 18:35:03 UTC
Program received signal SIGSEGV, Segmentation fault.
0x00000034f2648377 in _IO_vfprintf_internal (s=s@entry=0x7fffd1cf7f80, 
    format=<optimized out>, 
    format@entry=0x7f1ee9fc7588 "%s: source is already on the gpu\n", 
    ap=ap@entry=0x7fffd1cf8548) at vfprintf.c:1576
1576		  process_string_arg (((struct printf_spec *) NULL));
(gdb) bt
#0  0x00000034f2648377 in _IO_vfprintf_internal (s=s@entry=0x7fffd1cf7f80, 
    format=<optimized out>, 
    format@entry=0x7f1ee9fc7588 "%s: source is already on the gpu\n", 
    ap=ap@entry=0x7fffd1cf8548) at vfprintf.c:1576
#1  0x00000034f2706d20 in ___vsnprintf_chk (s=0x7fffd1cf8100 "", 
    maxlen=<optimized out>, flags=1, slen=<optimized out>, 
    format=0x7f1ee9fc7588 "%s: source is already on the gpu\n", 
    args=0x7fffd1cf8548) at vsnprintf_chk.c:65
#2  0x00000000004700da in Xvscnprintf ()
#3  0x0000000000469627 in LogVMessageVerb ()
#4  0x0000000000468fec in ErrorF ()
#5  0x00007f1ee9ed9a76 in sna_poly_fill_rect (draw=0x2a4eb50, gc=0x264ed50, 
    n=3, rect=0x2a94510) at sna_accel.c:11634
#6  0x000000000050363d in ?? ()
#7  0x0000000000552100 in miPaintWindow ()
#8  0x0000000000552248 in miWindowExposures ()
#9  0x0000000000495e3c in ?? ()
#10 0x0000000000568698 in miHandleValidateExposures ()
#11 0x0000000000569612 in miSetShape ()
#12 0x00000000004d34d9 in ?? ()
#13 0x00000000004d383d in ?? ()
#14 0x00000000004d41e5 in ?? ()
#15 0x000000000043444a in ?? ()
#16 0x0000000000423485 in ?? ()
#17 0x00000034f2621735 in __libc_start_main (main=0x423110, argc=8, 
    ubp_av=0x7fffd1cf8ba8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffd1cf8b98) at libc-start.c:226
#18 0x000000000042375d in _start ()
Comment 8 Clemens Eisserer 2012-08-27 18:37:27 UTC
Created attachment 66186
xorg log

That log corresponds to the segfault.
Comment 9 Chris Wilson 2012-08-27 18:40:12 UTC
This should get you past that crash, and further into the debug:

commit 8218e5da2b177ca9cd0e2b1e7dbe114e5ef2ebf0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 27 19:36:03 2012 +0100

    sna: Fix crash with broken DBG missing one of its arguments
    
    Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54127
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
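
The segfault in comment 7 dies inside vfprintf while it handles the "%s" in "%s: source is already on the gpu\n", which is consistent with a format string that has one more conversion than it has arguments. The sketch below shows that class of bug and one way to have the compiler catch it. `dbg_format()` and `log_source_on_gpu()` are hypothetical helpers written for this illustration, not the driver's DBG macro, and the `format` attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debug formatter. The format attribute (GCC/Clang) lets
 * -Wformat warn when the arguments do not match the conversions. */
#if defined(__GNUC__)
__attribute__((format(printf, 3, 4)))
#endif
static int dbg_format(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* Broken pattern, left as a comment because it is undefined behaviour:
 * the "%s" has no matching argument, so the formatter reads a garbage
 * pointer from the argument list.
 *
 *	dbg_format(buf, len, "%s: source is already on the gpu\n");
 */

/* Corrected pattern: every conversion has a matching argument. */
static int log_source_on_gpu(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return dbg_format(buf, len, "%s: source is already on the gpu\n",
			  __func__);
}
```

Debug-only statements like this are easy to break because they are compiled out of normal builds, so a format attribute on the logging function is a cheap guard.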
Comment 10 Clemens Eisserer 2012-08-27 18:44:43 UTC
#0  0x00000034f2635925 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000034f26370d8 in __GI_abort () at abort.c:91
#2  0x00000034f262e6a2 in __assert_fail_base (
    fmt=0x34f2778188 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fe8308569e0 "list_is_empty(&kgem->requests)", 
    file=file@entry=0x7fe830855cee "kgem.c", line=line@entry=1774, 
    function=function@entry=0x7fe830858f9c "__kgem_is_idle") at assert.c:94
#3  0x00000034f262e752 in __GI___assert_fail (
    assertion=0x7fe8308569e0 "list_is_empty(&kgem->requests)", 
    file=0x7fe830855cee "kgem.c", line=1774, 
    function=0x7fe830858f9c "__kgem_is_idle") at assert.c:103
#4  0x00007fe83073c54c in __kgem_is_idle (kgem=0x7fe83046a240) at kgem.c:1774
#5  0x00007fe8307f9037 in kgem_is_idle (kgem=0x7fe83046a240) at kgem.h:269
#6  0x00007fe830800409 in can_switch_to_blt (sna=0x7fe83046a010)
    at gen6_render.c:2385
#7  0x00007fe830804653 in prefer_blt_fill (sna=0x7fe83046a010, bo=0x1494650)
    at gen6_render.c:3644
#8  0x00007fe830804785 in gen6_render_fill_boxes (sna=0x7fe83046a010, 
    op=1 '\001', format=537004168, color=0x7fe82f6c1a5c, dst=0x163e530, 
    dst_bo=0x1494650, box=0x7fffc45158b0, n=1) at gen6_render.c:3671
#9  0x00007fe830784537 in sna_composite_rectangles (op=1 '\001', 
    dst=0x161f110, color=0x7fe82f6c1a5c, num_rects=1, rects=0x7fe82f6c1a64)
    at sna_composite.c:861
#10 0x00000000004fb135 in ?? ()
#11 0x000000000043444a in ?? ()
#12 0x0000000000423485 in ?? ()
#13 0x00000034f2621735 in __libc_start_main (main=0x423110, argc=8, 
    ubp_av=0x7fffc4515b38, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffc4515b28) at libc-start.c:226
#14 0x000000000042375d in _start ()
Comment 11 Clemens Eisserer 2012-08-27 18:48:00 UTC
Please find the corresponding log file at: http://93.83.133.214/Xorg.21.log.7za
Comment 12 Chris Wilson 2012-08-27 19:23:24 UTC
Ok, that debug log corresponds with your assertion. The sequence of events is consistent with my intended design; it's just that the kernel is throwing a spanner in the works here, I guess. Let me add another DBG and see if I can shed a little more light on this.
Comment 13 Chris Wilson 2012-08-27 20:08:12 UTC
Ok, I can reproduce this locally. Obviously my logic is not as foolproof as I thought.
Comment 14 Chris Wilson 2012-08-27 20:29:06 UTC
Of course! This didn't take into account the multiple rings on a device like SandyBridge. Hmm.
Comment 15 Chris Wilson 2012-08-27 21:15:48 UTC
That turned out to be a very useful assertion indeed.

commit 96a921487ef00db03a12bec7b0821410d6b74c31
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 27 21:50:32 2012 +0100

    sna: Track outstanding requests per-ring
    
    In order to properly track when the GPU is idle, we need to account for
    the completion order that may differ on architectures like SandyBridge
    with multiple mostly independent rings.
    
    Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54127
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
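
The idea in the commit message above, tracking outstanding requests per ring, can be sketched as follows. This is a toy model rather than the code from the commit: `struct tracker`, `tracker_submit()`, `ring_is_idle()` and `tracker_is_idle()` are hypothetical names, two rings are assumed purely for illustration, and the model assumes that requests on a single ring complete in submission order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS   2	/* assumption: e.g. a render ring and a BLT ring */
#define MAX_PENDING 8

/* Outstanding requests for one ring, oldest first. */
struct ring_queue {
	bool busy[MAX_PENDING];
	size_t count;
};

struct tracker {
	struct ring_queue ring[NUM_RINGS];
};

/* Record a request on the given ring; 'busy' stands in for the state
 * the kernel would report. */
static void tracker_submit(struct tracker *t, int ring, bool busy)
{
	struct ring_queue *q;

	assert(ring >= 0 && ring < NUM_RINGS);
	q = &t->ring[ring];
	assert(q->count < MAX_PENDING);
	q->busy[q->count++] = busy;
}

/* Within one ring the youngest request finishing implies the older
 * ones on that ring have finished too (assumed in-order completion). */
static bool ring_is_idle(const struct ring_queue *q)
{
	return q->count == 0 || !q->busy[q->count - 1];
}

/* The device is idle only if every ring is idle; the youngest request
 * overall says nothing about the other rings. */
static bool tracker_is_idle(const struct tracker *t)
{
	for (int i = 0; i < NUM_RINGS; i++)
		if (!ring_is_idle(&t->ring[i]))
			return false;
	return true;
}
```

For example, if an older request is still running on ring 0 while a younger one on ring 1 has already finished, `ring_is_idle()` is true for ring 1 but `tracker_is_idle()` stays false until ring 0 drains. A single shared list checked only at its youngest entry would have reported idle in that state.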
Comment 16 Clemens Eisserer 2012-08-27 21:31:25 UTC
Thanks :)

