Bug 55700

Summary: [SNA 865g] sna/kgem.c:2192: Assertion `kgem->nbatch <= ((kgem)->batch_size-1)' failed
Product: xorg
Reporter: tka <tka>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description tka 2012-10-06 19:56:01 UTC
After bug #55455 was fixed, I have not seen any screen corruption. Unfortunately, the X server is killed from time to time due to an assertion failure.

xf86-video-intel version: git 3680aa4976407886eb4be9878d5296d5a1fadccf

Backtrace:

Program received signal SIGABRT, Aborted.
#0  0xb759b424 in __kernel_vsyscall ()
#1  0xb7234ff7 in __GI_raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0xb7236494 in __GI_abort () at abort.c:91
#3  0xb722e699 in __assert_fail_base (
    fmt=0xb7355d68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0xb70380d8 "kgem->nbatch <= ((kgem)->batch_size-1)", 
    file=0xb7037ad0 "/var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/kgem.c", line=2192, 
    function=0xb7038f9f <__PRETTY_FUNCTION__.24035> "_kgem_submit")
    at assert.c:94
#4  0xb722e743 in __GI___assert_fail (
    assertion=0xb70380d8 "kgem->nbatch <= ((kgem)->batch_size-1)", 
    file=0xb7037ad0 "/var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/kgem.c", line=2192, 
    function=0xb7038f9f <__PRETTY_FUNCTION__.24035> "_kgem_submit")
    at assert.c:103
#5  0xb6f5b4d6 in _kgem_submit (kgem=0xb6ed41a0)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/kgem.c:2192
#6  0xb6f8c03a in sna_blt_fill_boxes (sna=0xb6ed4008, alu=3 '\003', 
    bo=0xb9060930, bpp=32, pixel=4292466646, box=0xb902e568, nbox=4)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/sna_blt.c:2290
#7  0xb6fd065c in gen2_render_fill_boxes_try_blt (op=1 '\001', n=4, 
    box=0xb902e568, dst_bo=0xb9060930, color=0xb90a063c, format=537036936, 
    sna=0xb6ed4008, dst=<optimized out>)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/gen2_render.c:2437
#8  gen2_render_fill_boxes_try_blt (op=1 '\001', n=4, box=0xb902e568, 
    dst_bo=0xb9060930, color=0xb90a063c, format=537036936, sna=0xb6ed4008, 
    dst=<optimized out>)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/gen2_render.c:3209
#9  gen2_render_fill_boxes (sna=0xb6ed4008, op=<optimized out>, 
    format=537036936, color=0xb90a063c, dst=0xb908b990, dst_bo=0xb9060930, 
    box=0xb902e568, n=4)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/gen2_render.c:2506
#10 0xb6f8f71e in sna_composite_rectangles (op=1 '\001', dst=0xb9093768, 
    color=0xb90a063c, num_rects=4, rects=0xb90a0644)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.9/work/xf86-video-intel-2.20.9/src/sna/sna_composite.c:874
#11 0xb76e8153 in CompositeRects (op=1 '\001', pDst=0xb9093768, 
    color=0xb90a063c, nRect=4, rects=0xb90a0644)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/render/picture.c:1626
#12 0xb76ed940 in ProcRenderFillRectangles (client=0xb8fd7420)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/render/render.c:1427
#13 0xb76e8993 in ProcRenderDispatch (client=0xb8fd7420)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/render/render.c:1989
#14 0xb75f8ad1 in Dispatch ()
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/dix/dispatch.c:428
#15 0xb75e5baa in main (argc=9, argv=0xbfcd2fc4, envp=0xbfcd2fec)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/dix/main.c:295
Continuing.

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
Comment 1 Chris Wilson 2012-10-06 20:56:35 UTC
You are hitting this because 865g has to use tiny batches due to a hardware bug...

However, the error lies in the earlier operation that failed to account correctly for the space it used. Can you please reproduce with --enable-debug=full?
Comment 2 tka 2012-10-06 22:47:17 UTC
(In reply to comment #1)
> Can you please reproduce with --enable-debug=full?

Interestingly, I cannot. I can reproduce it with plain --enable-debug: I just have to run something that writes a lot to a terminal (rsync) and start Firefox. Within a few seconds of using Firefox, the assertion fails. However, that does not work with --enable-debug=full. So I will continue using --enable-debug=full and hope to catch it eventually.
Comment 3 Chris Wilson 2012-10-07 08:01:16 UTC
Added a pair of assertions that I hope will fire earlier:

commit d2a26adc8e7b02aea204101f207f740bbde62414
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Oct 7 08:59:32 2012 +0100

    sna/gen2: Add a couple of assertions to track down a batch overflow
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=55700
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 4 tka 2012-10-07 15:17:43 UTC
Still no luck with --enable-debug=full. With plain --enable-debug, it is still the assertion in kgem.c:2192 that fails. By the way, at the time of the failure, kgem->nbatch=1024 and kgem->batch_size=1024, so it is off by one.
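
The arithmetic behind "off by one" can be spelled out in a small sketch. The struct and function names below are hypothetical stand-ins, not the driver's own definitions; only the condition `nbatch <= batch_size - 1` and the two values 1024 are taken from this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two kgem fields quoted in this report. */
struct batch_state {
    uint32_t nbatch;     /* dwords written so far */
    uint32_t batch_size; /* total dwords in the batch buffer */
};

/* Mirrors the failing condition: nbatch <= batch_size - 1. */
static bool batch_within_limit(const struct batch_state *b)
{
    return b->nbatch <= b->batch_size - 1;
}

/* Number of dwords written past the limit (0 if within it). */
static uint32_t batch_overrun(const struct batch_state *b)
{
    uint32_t limit = b->batch_size - 1;
    return b->nbatch > limit ? b->nbatch - limit : 0;
}
```

With `nbatch = 1024` and `batch_size = 1024`, the limit is 1023, so the check fails and the overrun is exactly one dword.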
Comment 5 Chris Wilson 2012-10-07 15:57:39 UTC
Just to be clear: you continue to get the same assertion in _kgem_submit() after applying today's new assertions?

In which case I've miscounted how many dwords are required for an op...
Comment 6 Chris Wilson 2012-10-07 16:11:43 UTC
If you can also dump the last 20 or so elements of the batch, that should help identify who overran.
Comment 7 tka 2012-10-07 16:25:27 UTC
(In reply to comment #5)
> Just to be clear: you continue to get the same assertion in _kgem_submit()
> after applying today's new assertions?
Yes, with today's patches applied, I still trigger the same assertion as before.

(In reply to comment #6)
> If you can also dump the last 20 or so elements of the batch, that should
> help identify who overran.
I will do this later today.
Comment 8 tka 2012-10-07 18:49:49 UTC
(gdb) p kgem->batch[1000]@25
$3 = {4278255360, 4278255360, 0, 0, 1228931073, 62128128, 62193673, 
  1228931073, 62193664, 62980097, 1228931073, 62193672, 62980105, 1228931073, 
  62980096, 63045641, 1425014790, 63709184, 1265, 63112448, 16777216, 0, 64, 
  42991616, 0}

(Everything else is as before.)
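
gdb prints the dwords in decimal, which hides repeated command headers. A minimal sketch for re-printing the same dump in hex follows; the helper names `format_dword` and `dump_batch` are made up for this illustration, and the values are copied from the gdb output above.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The 25 dwords printed by gdb above, starting at batch[1000]. */
static const uint32_t dump[] = {
    4278255360u, 4278255360u, 0u, 0u, 1228931073u, 62128128u, 62193673u,
    1228931073u, 62193664u, 62980097u, 1228931073u, 62193672u, 62980105u,
    1228931073u, 62980096u, 63045641u, 1425014790u, 63709184u, 1265u,
    63112448u, 16777216u, 0u, 64u, 42991616u, 0u,
};

/* Format one dword as fixed-width hex; buf must hold at least 11 bytes. */
static int format_dword(uint32_t value, char *buf, size_t len)
{
    return snprintf(buf, len, "0x%08" PRIx32, value);
}

/* Print each dword with its absolute batch index (base + offset). */
static void dump_batch(FILE *out, const uint32_t *dwords, size_t base,
                       size_t count)
{
    char text[11];

    for (size_t i = 0; i < count; i++) {
        format_dword(dwords[i], text, sizeof(text));
        fprintf(out, "batch[%zu] = %s\n", base + i, text);
    }
}
```

Calling `dump_batch(stdout, dump, 1000, 25)` shows, for instance, that the value 1228931073 (0x49400001) recurs four times and that batch[1016] is 0x54f00006.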
Comment 9 Chris Wilson 2012-10-07 21:46:03 UTC
Thanks for grabbing the batch, I believe this should fix it:

commit 2ac3776be85d857a57ce7b742e52cd6091d2befb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Oct 7 22:41:25 2012 +0100

    sna: Check that we have sufficient space for a copy when replacing a fill
    
    Reported-by: Timo Kamph <timo@kamph.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55700
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
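
The general pattern the commit subject describes, checking for room before emitting a command that is larger than the one originally planned, might be sketched as below. This is not the driver's code: `struct batch`, `batch_has_room`, and `batch_emit` are hypothetical, and the command lengths of 6 and 8 dwords in the note that follows are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size batch; the last dword stays reserved. */
struct batch {
    uint32_t data[1024];
    uint32_t used; /* dwords written so far */
    uint32_t size; /* capacity in dwords, at most 1024 here */
};

/* True if ndwords more dwords fit without touching the reserved slot. */
static bool batch_has_room(const struct batch *b, uint32_t ndwords)
{
    return b->used + ndwords <= b->size - 1;
}

/* Append a command only if it fits; on failure nothing is written and
 * the caller is expected to submit the batch and retry. */
static bool batch_emit(struct batch *b, const uint32_t *cmd, uint32_t ndwords)
{
    if (!batch_has_room(b, ndwords))
        return false;
    memcpy(&b->data[b->used], cmd, ndwords * sizeof(*cmd));
    b->used += ndwords;
    return true;
}
```

With 1016 dwords already used in a 1024-dword batch, a 6-dword command still fits, but an 8-dword command does not. A space check made for the smaller command therefore does not cover the larger one.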
Comment 10 tka 2012-10-07 22:56:02 UTC
After a first test, it seems to be fixed. I will test it more thoroughly tomorrow.

However, kgem_check_exec_and_reloc() is undeclared in sna_io.c. The calls were introduced in commit 1a5d5b9a.
Comment 11 tka 2012-10-08 22:12:12 UTC
Looks good, no failure so far. Thanks.
