Bug 19911

Summary:	[i965] Xorg lockup with incorrect usage of VBOs
Product:	Mesa	Reporter:	Peter Clifton <pcjc2>
Component:	Drivers/DRI/i965	Assignee:	Eric Anholt <eric>
Status:	RESOLVED FIXED	QA Contact:	Xorg Project Team <xorg-team>
Severity:	normal
Priority:	high	CC:	arekm, bgamari, kedgedev
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	20277
Attachments:	Test case to trigger the crash; Xorg log (for driver information

Description Peter Clifton 2009-02-02 09:41:24 UTC

With careful introduction of programming errors, it is possible to lock up the Xorg server through broken use of VBOs inside a direct rendering client.

The backtrace is:

#0  0xb807f430 in __kernel_vsyscall ()
#1  0xb7d04ce9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7b0aadd in drmIoctl () from /usr/lib/libdrm.so.2
#3  0xb7b0aee2 in drmCommandNone () from /usr/lib/libdrm.so.2
#4  0xb7aa115f in I830BlockHandler (i=0, blockData=0x0, pTimeout=0xbfb9ace8, 
    pReadmask=0x81f73e0) at ../../src/i830_driver.c:2623
#5  0x0817bf1b in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, 
    pTimeout=0xbfb9ace8, pReadmask=0x81f73e0) at ../../render/animcur.c:222
#6  0x08144eb8 in compBlockHandler (i=0, blockData=0x0, pTimeout=0xbfb9ace8, 
    pReadmask=0x81f73e0) at ../../composite/compinit.c:158
#7  0x08091088 in BlockHandler (pTimeout=0xbfb9ace8, pReadmask=0x81f73e0)
    at ../../dix/dixutils.c:384
#8  0x081318a4 in WaitForSomething (pClientsReady=0xa3618f8)
    at ../../os/WaitFor.c:215
#9  0x0808d18e in Dispatch () at ../../dix/dispatch.c:367
#10 0x080721bd in main (argc=10, argv=0xbfb9ae34, envp=Cannot access memory at address 0x6460

Comment 1 Peter Clifton 2009-02-02 09:44:34 UTC

Created attachment 22489 [details]
Test case to trigger the crash

This might not be the minimal test-case, but I've so far been unable to un-wedge the GPU once this lock-up has occurred - so testing each crash requires a full reboot.

Comment 2 Peter Clifton 2009-02-02 09:46:32 UTC

Created attachment 22490 [details]
Xorg log (for driver information

Comment 3 Peter Clifton 2009-02-02 09:50:19 UTC

I couldn't reproduce it when my second buffer was created on the stack, only when I malloc / free the second buffer. Perhaps this gives some clues.

Comment 4 Roman Jarosz 2009-02-12 14:35:25 UTC

I see the same thing when using UXA (random freezes). It would be great if you could fix this, because UXA is unusable with this bug and EXA is slow :(

My backtrace:
#0  0x00007f114589b027 in ioctl () from /lib/libc.so.6
No symbol table info available.
#1  0x00007f1144b31c63 in drmIoctl (fd=11, request=25688, arg=0x0)
    at xf86drm.c:187
        ret = -1
#2  0x00007f1144b31f66 in drmCommandNone (fd=11,
    drmCommandIndex=<value optimized out>) at xf86drm.c:2313
No locals.
#3  0x00007f11446ac798 in I830BlockHandler (i=0, blockData=0x0,
    pTimeout=0x7fff503afd78, pReadmask=0x7d4e60) at i830_driver.c:2737
        flushed = <value optimized out>
        pScreen = (ScreenPtr) 0x862f40
        pScrn = (ScrnInfoPtr) 0x813ce0
        pI830 = (I830Ptr) 0x8163d0
#4  0x0000000000530a38 in AnimCurScreenBlockHandler (screenNum=0,
    blockData=0x0, pTimeout=0x7fff503afd78, pReadmask=0x7d4e60)
    at animcur.c:222
        pScreen = (ScreenPtr) 0x862f40
        as = (AnimCurScreenPtr) 0x35ce280
        dev = (DeviceIntPtr) 0x0
        now = 0
        soonest = 4294967295
#5  0x00000000004fcaae in compBlockHandler (i=0, blockData=0x0,
    pTimeout=0x7fff503afd78, pReadmask=0x7d4e60) at compinit.c:158
        pScreen = (ScreenPtr) 0x862f40
        cs = (CompScreenPtr) 0x35b7660
#6  0x000000000044f2fb in BlockHandler (pTimeout=0x7fff503afd78,
    pReadmask=0x7d4e60) at dixutils.c:384
        i = 1
#7  0x00000000004ead91 in WaitForSomething (pClientsReady=0x3662910)
    at WaitFor.c:215
        i = 86708032
        waittime = {tv_sec = 996394, tv_usec = 117000}
        wt = (struct timeval *) 0x7fff503afd60
        timeout = <value optimized out>
        clientsReadable = {fds_bits = {0 <repeats 16 times>}}
        clientsWritable = {fds_bits = {86708032, 56535488, 512, 79663048, 32,
    32, 0, 32, 110854160, 5192572, 140734539431104, 5174049, 31,
    140734539431184, 0, 110854160}}
        curclient = <value optimized out>
        selecterr = 0
        nready = <value optimized out>
        devicesReadable = {fds_bits = {40, 65518993, 1073741825,
    140734539430996, 16, 65536, 140734539431324, 40, 139712160475648,
    79663048, 16, 140734539431028, 16, 56583792, 57026832, 5332129}}
        now = 86708032
        someReady = 0
#8  0x000000000044b750 in Dispatch () at dispatch.c:367
        result = 0
        client = (ClientPtr) 0x52b0f40
        nready = -1
        start_tick = <value optimized out>
#9  0x00000000004319fd in main (argc=10, argv=0x7fff503aff58,
    envp=<value optimized out>) at main.c:397
        i = 1
        alwaysCheckForInput = {0, 1}
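
Editorial note on the backtrace above: frame #1 shows drmIoctl being called with request=25688. The following is a minimal sketch, assuming the standard Linux x86 ioctl bit layout, of how that number can be split into its fields. The struct and function names are hypothetical and are not part of libdrm. The value decodes to type 'd' (the DRM ioctl type) and command number 0x58, i.e. driver command index 0x18 once the driver-private base of 0x40 is subtracted. That index appears to correspond to DRM_I915_GEM_THROTTLE in i915_drm.h, but treat that mapping as an assumption to verify against the headers in use.

```c
/* Hypothetical helper: split a Linux ioctl request number into its fields.
 * Assumes the x86 layout: nr in bits 0-7, type in bits 8-15,
 * size in bits 16-29, direction in bits 30-31. */
struct ioctl_fields {
    unsigned dir;   /* 0 = none, 1 = write, 2 = read, 3 = read/write */
    unsigned size;  /* size of the argument struct, 0 for "command none" */
    unsigned type;  /* subsystem letter, 'd' for DRM */
    unsigned nr;    /* command number within the subsystem */
};

static struct ioctl_fields decode_ioctl(unsigned long request)
{
    struct ioctl_fields f;
    f.nr   = (unsigned)(request & 0xff);
    f.type = (unsigned)((request >> 8) & 0xff);
    f.size = (unsigned)((request >> 16) & 0x3fff);
    f.dir  = (unsigned)((request >> 30) & 0x3);
    return f;
}

/* DRM driver-private commands start at 0x40 (DRM_COMMAND_BASE in drm.h).
 * Returns the driver command index, or -1 if this is not a driver command. */
static int driver_command_index(struct ioctl_fields f)
{
    if (f.type != 'd' || f.nr < 0x40)
        return -1;
    return (int)(f.nr - 0x40);
}
```

Calling decode_ioctl(25688) and passing the result to driver_command_index() gives the index of the command that the X server is blocked in.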

Comment 5 Eric Anholt 2009-02-25 11:14:22 UTC

Yeah, while you're passing garbage to GL, from the spec it sounds like we should not render (or kill your app), but not hang the GPU.

Comment 6 Ian Romanick 2009-02-25 14:06:12 UTC

(In reply to comment #2)
> Created an attachment (id=22490) [details]
> Xorg log (for driver information
> 

(In reply to comment #1)
> Created an attachment (id=22489) [details]
> Test case to trigger the crash
> 
> This might not be the minimal test-case, but I've so far been unable to
> un-wedge the GPU once this lock-up has occurred - so testing each crash
> requires a full reboot.

I was discussing this with Eric on IRC.  Looking at the glDrawArrays man page, the second draw call shouldn't do *anything* because GL_VERTEX_ARRAY is disabled:

       When glDrawArrays is called, it uses count sequential elements from
       each enabled array to construct a sequence of geometric primitives,
       beginning with element first. mode specifies what kind of primitives
       are constructed, and how the array elements construct those primitives.
       If GL_VERTEX_ARRAY is not enabled, no geometric primitives are
       generated.

Comment 7 Ian Romanick 2009-02-25 14:08:11 UTC

(In reply to comment #6)

> I was discussing this with Eric on IRC.  Looking at the glDrawArrays man page,
> the second draw call shouldn't do *anything* because GL_VERTEX_ARRAY is

And by second I mean the second one that draws GL_TRIANGLES.  This is actually the third call to glDrawArrays.

Comment 8 Eric Anholt 2009-02-25 16:29:32 UTC

Thanks for the great testcase!  piglit test added that reproduces the problem, patches sent out for review.

Comment 9 Brian Paul 2009-03-02 11:33:52 UTC

I've committed a modified version of Eric's patch from the mesa3d-mailing list (posted 2/25/09) that no-ops the glDrawArrays() call when there's no enabled vertex position array.

Commit 97dd2ddbd97ba95e8bc8ab572ec05e8081556e1e

Peter, could you test Mesa/master with this change and your original test case?
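
Editorial note: the behaviour described in this comment (treating the draw call as a no-op when no vertex position array is enabled) can be modelled as below. This is a simplified sketch and not the code from the commit; sketch_array_state and sketch_draw_arrays are hypothetical names, and the real Mesa state tracking is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the GL client array state. */
struct sketch_array_state {
    bool vertex_array_enabled;  /* models glEnableClientState(GL_VERTEX_ARRAY) */
};

/* Returns the number of vertices that would be handed to the hardware.
 * A return value of 0 means the call was dropped before reaching the GPU. */
static size_t sketch_draw_arrays(const struct sketch_array_state *st,
                                 int first, int count)
{
    /* Invalid ranges never reach the hardware in this model. */
    if (first < 0 || count <= 0)
        return 0;

    /* No enabled position array: generate no primitives at all,
     * matching the man page text quoted in comment 6. */
    if (!st->vertex_array_enabled)
        return 0;

    return (size_t)count;
}
```

With vertex_array_enabled set to false, every call returns 0, so stale pointers in other arrays are never dereferenced by the draw.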

Comment 10 Arkadiusz Miskiewicz 2009-04-02 04:31:38 UTC

Bug #19740 looks to be the same issue, and I just hit it with Mesa 7.4, which contains the commit from comment #9 (details in bug #19740).

Comment 11 Arkadiusz Miskiewicz 2009-04-03 16:03:22 UTC

I also hit this with Mesa master from 1 hour ago:

0x00007fc499f48327 in ioctl () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fc499f48327 in ioctl () from /lib64/libc.so.6
#1  0x00007fc4987241c3 in drmIoctl (fd=7, request=25688, arg=0x0) at xf86drm.c:187
#2  0x00007fc4987244c6 in drmCommandNone (fd=7, drmCommandIndex=<value optimized out>) at xf86drm.c:2313
#3  0x00007fc49829c838 in I830BlockHandler (i=<value optimized out>, blockData=0x0, pTimeout=0x7fffa3efda88, pReadmask=0x7d1ea0) at i830_driver.c:2655
#4  0x000000000052d4b8 in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0x7fffa3efda88, pReadmask=0x7d1ea0) at animcur.c:222
#5  0x00000000004f93fe in compBlockHandler (i=0, blockData=0x0, pTimeout=0x7fffa3efda88, pReadmask=0x7d1ea0) at compinit.c:158
#6  0x000000000044b170 in BlockHandler (pTimeout=0x7fffa3efda88, pReadmask=0x7d1ea0) at dixutils.c:384
#7  0x00000000004e7661 in WaitForSomething (pClientsReady=0x5571860) at WaitFor.c:215
#8  0x00000000004474f0 in Dispatch () at dispatch.c:367
#9  0x000000000042d63d in main (argc=7, argv=0x7fffa3efdc68, envp=<value optimized out>) at main.c:397

Comment 12 Arkadiusz Miskiewicz 2009-04-08 23:56:14 UTC

I'm using mesa from master (fetched at 20090408), xserver 1.6, intel driver from master, recent linux kernel from git, GM45.

My system locks up with test program #1. I need to run & stop & run #1 several times for lockup to happen. Running once is not enough.

I also applied http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg06658.html but it didn't help.

Backtrace is different:

0x00007fb060db6327 in ioctl () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fb060db6327 in ioctl () from /lib64/libc.so.6
#1  0x00007fb05eedcb05 in drm_intel_gem_bo_map_gtt (bo=0x5100d30) at intel_bufmgr_gem.c:721
#2  0x00007fb05f12e5ed in i830_uxa_prepare_access (pixmap=0x51fd370, access=UXA_ACCESS_RW) at i830_exa.c:865
#3  0x00007fb05f14d8c4 in uxa_check_poly_fill_rect (pDrawable=0x51fd370, pGC=0x3cc4800, nrect=1, prect=0x7fff6ad6a090) at uxa-unaccel.c:255
#4  0x00007fb05f14a84e in uxa_create_alpha_picture (pScreen=0xf2cba0, pDst=<value optimized out>, pPictFormat=0xf2d988, width=7, height=7) at uxa-render.c:841
#5  0x00007fb05f14ae4c in uxa_trapezoids (op=8 '\b', pSrc=0x4624ae0, pDst=0x51fce30, maskFormat=0xf2d988, xSrc=10, ySrc=7, ntrap=49, traps=0x4e78d64) at uxa-render.c:909
#6  0x000000000052ad8d in ProcRenderTrapezoids (client=0x3fcccf0) at render.c:782
#7  0x00000000004477bc in Dispatch () at dispatch.c:437
#8  0x000000000042d63d in main (argc=7, argv=0x7fff6ad6a368, envp=<value optimized out>) at main.c:397

Comment 13 Eric Anholt 2009-07-15 13:38:45 UTC

This works fine on my G45 and GM45 at this point.  Peter, do you still have the problem?

Comment 14 Eric Anholt 2009-08-03 17:58:27 UTC

I gave up on getting the 100% solution I wanted, and came up with:

commit d7430d942f6c7950a92367aeb13b80cf76ccad78
Author: Eric Anholt <eric@anholt.net>
Date:   Mon Aug 3 17:55:14 2009 -0700

    i965: Assert that the offset in the VBO is below the VBO size.
    
    This avoids sending a bad buffer address to the GPU due to programmer error,
    and is permitted by the ARB_vbo spec.  Note that we still have the opportunity
    to dereference past the end of the GPU, because we aren't clipping to a
    correct _MaxElement, but that appears to be harder than it should be.  This
    gets us the 90% solution.
    
    Bug #19911.

This fixes it again on my GM45 (which was lucking out somehow for a while, and then started failing).
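
Editorial note: the check described in the commit message, and the gap it leaves open, can be illustrated with the sketch below. It is not the i965 driver code; sketch_vbo and the function names are hypothetical, and arithmetic overflow is ignored for brevity. The first predicate is the "90%" check (the starting offset lies inside the buffer). The second shows what a full range check would also need: the last element read must end inside the buffer, which requires knowing the correct maximum element index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vertex buffer object: only its size matters. */
struct sketch_vbo {
    size_t size;  /* size of the buffer in bytes */
};

/* The check from the commit message: the offset must be below the VBO size. */
static bool sketch_offset_in_vbo(const struct sketch_vbo *vbo, size_t offset)
{
    return offset < vbo->size;
}

/* Stricter check that the commit explicitly does not attempt: every one of
 * `count` elements, `stride` bytes apart and `elem_size` bytes long, must
 * lie fully inside the buffer. */
static bool sketch_range_in_vbo(const struct sketch_vbo *vbo, size_t offset,
                                size_t stride, size_t elem_size, size_t count)
{
    if (count == 0)
        return true;
    if (!sketch_offset_in_vbo(vbo, offset))
        return false;
    return offset + (count - 1) * stride + elem_size <= vbo->size;
}

/* Compute the address to program, asserting on the programmer error. */
static size_t sketch_vertex_address(const struct sketch_vbo *vbo,
                                    size_t base, size_t offset)
{
    assert(sketch_offset_in_vbo(vbo, offset));
    return base + offset;
}
```

For a 64-byte buffer, an offset of 32 passes the first check, yet three 12-byte elements starting there would end at byte 68, which only the stricter range check catches.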
