Bug 33046 - [bisected]glean/pixelFormats and 3 oglc cases segfault
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 7.10
Hardware: All Linux (All)
Importance: high major
Assigned To: Ian Romanick
Depends on:
Blocks:
Reported: 2011-01-12 23:38 UTC by fangxun
Modified: 2011-04-06 02:39 UTC (History)

Attachments
Don't leak the tex object miptree when replacing it (998 bytes, patch)
2011-03-28 10:19 UTC, Ian Romanick

Description fangxun 2011-01-12 23:38:58 UTC
System Environment:
--------------------------
Arch:           x86_64
Platform:       piketon
Libdrm:		(master)2.4.23-4-gbad5242a59aa8e31cf10749e2ac69b3c66ef7da0
Mesa:		(master)1e4f412242391000eea3fd28452865c3d27f987d
Xserver:	(master)xorg-server-1.9.99.901-94-g6358a60065eef167d4e5f4afd981ff26deeba80d
Xf86_video_intel:(master)2.14.0-7-gfd9235ebe03a01982238cdd6e8b55f613e14b6af
Kernel:	(drm-intel-next) 34da1327c3814781925396fa10c42f596588ff76


Bug detailed description:
-------------------------
The 3 oglc cases are pxstore-tex, blend-separate and texRect. The issue happens on both piketon and pineview.
Dmesg shows: oglconform[13146]: segfault at 0 ip 00007fa91b7dd590 sp 00007fff12e0b660 error 4 in i965_dri.so[7fa91b6c5000+36e000]
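The dmesg line above gives the faulting instruction pointer and the mapping of i965_dri.so as base+size, so the offset of the crash inside the library is simply ip minus base. A minimal sketch of that arithmetic follows; the helper name fault_offset is hypothetical and is not part of any tool mentioned in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from Mesa or the kernel): compute the offset of a
 * faulting instruction pointer inside a mapped library, using the
 * "name[base+size]" values printed in the dmesg segfault line.
 * Returns 0 and stores the offset on success, -1 if ip is outside the mapping. */
int fault_offset(uint64_t ip, uint64_t base, uint64_t size, uint64_t *offset)
{
    if (ip < base || ip - base >= size)
        return -1;
    *offset = ip - base;
    return 0;
}
```

With the values from the line above (ip 0x7fa91b7dd590, base 0x7fa91b6c5000, size 0x36e000) this yields offset 0x118590. Assuming the mapping starts at file offset 0 and the library was built with debug info, that offset could then be handed to a symbolizer such as addr2line -e i965_dri.so.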

Bisect finds that 48024fb44cbbccd0c688949084ef249d3c1208ab is the first bad commit:
commit 48024fb44cbbccd0c688949084ef249d3c1208ab
Author: Eric Anholt <eric@anholt.net>
Date:   Mon Jan 10 12:05:14 2011 -0800

    intel: When making a new teximage miptree, make a full one.

    If we hit this path, we're level 1+ and the base level got allocated
    as a single level instead of a full tree (so we don't match
    intelObj->mt).  This tries to recover from that so that we end up with
    2 allocations and 1 validation blit (old -> new) instead of
    allocations equal to number of levels and levels - 1 blits.
Comment 1 Gordon Jin 2011-03-01 00:48:22 UTC
Any comments? Should we evaluate a revert?
Comment 2 Ian Romanick 2011-03-01 18:28:27 UTC
Here's a backtrace from one of the tests.  The NULL dstAddr in _mesa_texstore_argb8888 seems suspect.  The odd thing is that the test consistently runs for all of the visuals up to 0x93.  I don't see anything special about that visual.  In fact, it looks identical to 0x92, which does not crash.

#0  0x00007ffff640b314 in _mesa_texstore_argb8888 (ctx=0x2110df0, dims=1, 
    baseInternalFormat=6408, dstFormat=MESA_FORMAT_ARGB8888, dstAddr=0x0, 
    dstXoffset=0, dstYoffset=0, dstZoffset=0, dstRowStride=64, 
    dstImageOffsets=0x7ff1b90, srcWidth=4, srcHeight=1, srcDepth=1, 
    srcFormat=6408, srcType=5126, srcAddr=0x81fe530, srcPacking=0x2120038)
    at main/texstore.c:1582
#1  0x00007ffff6412ddb in _mesa_texstore (ctx=0x2110df0, dims=1, 
    baseInternalFormat=6408, dstFormat=MESA_FORMAT_ARGB8888, dstAddr=0x0, 
    dstXoffset=0, dstYoffset=0, dstZoffset=0, dstRowStride=64, 
    dstImageOffsets=0x7ff1b90, srcWidth=4, srcHeight=1, srcDepth=1, 
    srcFormat=6408, srcType=5126, srcAddr=0x81fe530, srcPacking=0x2120038)
    at main/texstore.c:4195
#2  0x00007ffff6300d11 in intelTexImage (ctx=0x2110df0, dims=1, target=3552, 
    level=0, internalFormat=6408, width=4, height=1, depth=1, border=0, 
    format=6408, type=5126, pixels=0x81fe530, unpack=0x2120038, 
    texObj=0x57892b0, texImage=0x6ee6e90, imageSize=0, compressed=0 '\000')
    at intel_tex_image.c:493
#3  0x00007ffff6300f77 in intelTexImage1D (ctx=0x2110df0, target=3552, 
    level=0, internalFormat=6408, width=4, border=0, format=6408, type=5126, 
    pixels=0x81fe530, unpack=0x2120038, texObj=0x57892b0, texImage=0x6ee6e90)
    at intel_tex_image.c:558
#4  0x00007ffff63fd6a8 in teximage (ctx=0x2110df0, dims=1, target=3552, 
    level=0, internalFormat=6408, width=4, height=1, depth=1, border=0, 
    format=6408, type=5126, pixels=0x81fe530) at main/teximage.c:2448
#5  0x00007ffff63fd8c4 in _mesa_TexImage1D (target=3552, level=0, 
    internalFormat=6408, width=4, border=0, format=6408, type=5126, 
    pixels=0x81fe530) at main/teximage.c:2496
#6  0x00000000006e83c5 in ClearAll (test=0x574b8c0)
    at /home/idr/devel/graphics/oglconform_31/src/OGLconform/pxstore-tex.c:552
#7  0x00000000006e8c48 in TestStorage (w=4, h=4, baseFormat=0x1915548, 
    baseType=0x1915770, texFormat=0x1917380, store=0x7fffffffba70, 
    testName=0x7fffffffb670 "Exercising 4x4 4-byte alignment.", 
    clampComponents=1 '\001')
    at /home/idr/devel/graphics/oglconform_31/src/OGLconform/pxstore-tex.c:736
#8  0x00000000006e9600 in TexImageStorageExec ()
    at /home/idr/devel/graphics/oglconform_31/src/OGLconform/pxstore-tex.c:972
#9  0x0000000000da5771 in callFunctionHandleExceptionsInner (funcWithParams=0, 
    func=0x6e8deb <TexImageStorageExec()>, params=0x7fffffffbcc0, 
    msg=0x7fffffffbb70 "during pxstore-tex test execution. \n\nIntel Conformance (pxstore-tex) failed.\n")
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/driverExceptionHandling.cpp:33
#10 0x0000000000da592a in callFunctionHandleExceptions (funcWithParams=0, 
    func=0x6e8deb <TexImageStorageExec()>, params=0x7fffffffbcc0)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/driverExceptionHandling.cpp:62
#11 0x0000000000da4b4a in DriverExec (Func=0x6e8deb <TexImageStorageExec()>, 
    FuncWithParams=0, params=0x7fffffffbcc0)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/driver.c:436
#12 0x0000000000da4c7c in DriverExecRGB (
    Func=0x6e8deb <TexImageStorageExec()>, FuncWithParams=0, 
    testRecord=0x1dbdfc0, testFilter=0x0)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/driver.c:480
#13 0x0000000000da5173 in Driver (testSchedule=0x1dbdfc0, testFilter=0x0)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/driver.c:597
#14 0x0000000000d74f9f in Exec (ptr=0x1cd63a0)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/shell.c:965
#15 0x0000000000da22fe in tkExec (Func=0xd74f45 <Exec(TK_EventRec*)>)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/ctkx.c:322
#16 0x0000000000d6da6b in tkShellExecute (windSizeX=100, windSizeY=100, 
    ExecFunc=0xd74f45 <Exec(TK_EventRec*)>, testSchedule=0x1dbdfc0)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/ctkshell.c:216
#17 0x0000000000d75101 in main (argc=3, argv=0x7fffffffe068)
    at /home/idr/devel/graphics/oglconform_31/src/CONFSHEL/shell.c:1031
Comment 3 Ian Romanick 2011-03-26 12:15:41 UTC
I spent a bit more time looking at this yesterday.  I have a couple additional data points.

The crash only happens after the test has been running for a long time.  If I just run with visual 0x93, it never crashes.

I have also seen a different crash.  This one occurs in prepare_constant_buffer (brw_curbe.c) when it tries to memcpy into the buffer.  brw->curbe.curbe_bo->virtual is NULL, and the memcpy faults.  This one happens a lot less frequently, and I don't have a full backtrace handy.
Comment 4 Ian Romanick 2011-03-28 10:19:59 UTC
Created attachment 44940
Don't leak the tex object miptree when replacing it

When the miptree for an object was replaced with the miptree created for an individual image, the old miptree was leaked.  I believe that Eric assumed intel_miptree_reference would release the old miptree.

I wonder if a better fix would be to change intel_miptree_reference so that it does unreference the old miptree.  Opinions?
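To illustrate the alternative raised here, the following is a minimal sketch of a symmetric refcounting interface in which the "reference" helper itself drops whatever the destination slot held before. All names (struct tree, tree_create, tree_release, tree_reference) are hypothetical stand-ins; this is not the intel_miptree code and makes no claim about how Mesa implements it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted miptree; not the Mesa struct. */
struct tree {
    int refcount;
};

/* Allocate a tree holding one reference (owned by the caller). */
struct tree *tree_create(void)
{
    struct tree *t = calloc(1, sizeof(*t));
    if (t)
        t->refcount = 1;
    return t;
}

/* Drop the reference held through *ptr and clear the pointer.
 * The tree is freed when the last reference goes away. */
void tree_release(struct tree **ptr)
{
    struct tree *t = *ptr;
    if (t && --t->refcount == 0)
        free(t);
    *ptr = NULL;
}

/* Make *dst refer to src. Unlike an "assign only" helper, this releases
 * the tree previously held in *dst, so replacing a tree cannot leak it. */
void tree_reference(struct tree **dst, struct tree *src)
{
    if (*dst == src)
        return;
    if (src)
        src->refcount++;   /* take the new reference first */
    tree_release(dst);     /* then drop the old one */
    *dst = src;
}
```

With this shape, replacing the tree stored in a slot is a single call, and the caller does not have to remember to release the old tree first.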
Comment 5 Ian Romanick 2011-03-28 20:19:06 UTC
commit e21beaeb10711a38276d704c4e058cb07f9b23e6
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Mon Mar 28 10:13:25 2011 -0700

    intel: Don't leak the tex object miptree when replacing it
    
    Eventually the miptree refcounting interface should be cleaned up.
    The asymmetry dramatically increases the probability of bugs like
    this.  It should be made to look like libdrm refcounting or the
    refcounting style used in other parts of Mesa.
    
    Fixes https://bugs.freedesktop.org/show_bug.cgi?id=33046
    
    Reviewed-by: Eric Anholt <eric@anholt.net>
Comment 6 fangxun 2011-03-30 20:32:16 UTC
Three oglc cases (pxstore-tex, blend-separate and texRect) were fixed by the fix commit.

Glean pixelFormats still fails. It passed before the first bad commit 48024fb and segfaults after it.
Before the fix commit e21bea, dmesg shows: "glean[4867]: segfault at 0 ip 00007fa75e0198ef sp 00007fffb4b28900 error 6 in i965_dri.so[7fa75df0a000+38b000]".
After the fix commit e21bea, dmesg shows: "glean[16967] trap divide error ip:7fe7a1812ac1 sp:7fff6ff94220 error:0 in i965_dri.so[7fe7a17cf000+38e000]".

Should I file a new bug for this case?
Comment 7 Ian Romanick 2011-04-02 17:45:02 UTC
(In reply to comment #6)
> Three oglc cases(pxstore-tex, blend-separate and texRect) was fixed by the
> fixed commit. 
> 
> Glean pixelFormats still fails. It passed before the first bad commit 48024fb
> and segfault after this commit.
> Before the fixed commit e21bea, it shows in dmesg: "glean[4867]: segfault at 0
> ip 00007fa75e0198ef sp 00007fffb4b28900 error 6 in
> i965_dri.so[7fa75df0a000+38b000]".  
> After the fixed commit e21bea, it shows in dmesg: "glean[16967] trap divide
> error ip:7fe7a1812ac1 sp:7fff6ff94220 error:0 in
> i965_dri.so[7fe7a17cf000+38e000]".
> 
> Should I file a new bug for this case?

Yes, please.

The pixelFormats failure is a different issue.  This *should* have bisected to 0be36997.  That patch makes drivers that previously advertised GL_MESA_texture_signed_rgba magically start advertising GL_EXT_texture_snorm.  However, these drivers don't magically know about the formats added with GL_EXT_texture_snorm, so they run into problems when they encounter them.  This causes the i965 driver to hit an assertion failure when glean's pixelFormats test gives it an internal format of GL_RED_SNORM:

glean: main/teximage.c:2384: _mesa_choose_texture_format: Assertion `f != MESA_FORMAT_NONE' failed.
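The failure mode described above can be sketched as a format table that has no entry for a format the advertised extension introduces, so the lookup falls through to "none". The code below is an illustration under that assumption only; the enum, struct and function names are hypothetical and are not Mesa's, and only the two numeric constants correspond to the real GL_RGBA and GL_RED_SNORM values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware format identifiers; not Mesa's MESA_FORMAT_* enum. */
enum hw_format { HW_FORMAT_NONE = 0, HW_FORMAT_ARGB8888, HW_FORMAT_R_SNORM8 };

#define EX_RGBA      0x1908  /* numeric value of GL_RGBA */
#define EX_RED_SNORM 0x8F90  /* numeric value of GL_RED_SNORM */

/* One row of a driver's internal-format to hardware-format mapping. */
struct format_entry {
    unsigned internal_format;
    enum hw_format hw;
};

/* Look up a hardware format. HW_FORMAT_NONE means the driver has no
 * mapping, which is the condition the assertion above trips on. */
enum hw_format choose_format(const struct format_entry *table, size_t n,
                             unsigned internal_format)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].internal_format == internal_format)
            return table[i].hw;
    return HW_FORMAT_NONE;
}

/* An extension is only safe to advertise if every internal format it
 * introduces resolves to a real hardware format in this driver's table. */
int can_advertise(const struct format_entry *table, size_t n,
                  const unsigned *required, size_t n_required)
{
    for (size_t i = 0; i < n_required; i++)
        if (choose_format(table, n, required[i]) == HW_FORMAT_NONE)
            return 0;
    return 1;
}
```

A driver whose table only knows EX_RGBA would fail can_advertise for a required list containing EX_RED_SNORM, which is the mismatch between advertised extension and supported formats that the comment describes.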
Comment 8 Gordon Jin 2011-04-02 22:46:05 UTC
I assume this fix hasn't been cherry-picked to 7.10 branch.
Comment 9 Ian Romanick 2011-04-04 09:18:15 UTC
The changes that caused the segfaults only exist in master.
Comment 10 fangxun 2011-04-06 02:39:04 UTC
Filed new bug 36022 to track the glean pixelFormats failure.