Bug 72926 - [REGRESSION,swrast] Memory-related crash with anti-aliasing enabled
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/X11
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: mesa-dev
Reported: 2013-12-20 18:25 UTC by Peter Wu
Modified: 2014-01-20 18:05 UTC (History)


Attachments
gdb bt full (62.60 KB, text/plain), 2013-12-20 18:25 UTC, Peter Wu
api trace output (gzipped) (1.58 MB, application/x-gzip), 2013-12-27 12:17 UTC, Peter Wu
valgrind apitrace mesa-a3ae5dc7dd5c2f8893f86a920247e690e550ebd4 (36.65 KB, text/plain), 2013-12-28 15:16 UTC, Peter Wu
Output for `LIBGL_ALWAYS_SOFTWARE=1 valgrind glretrace -v java.trace` (408.55 KB, text/plain), 2013-12-30 15:27 UTC, Peter Wu
gdb bt full for smaller C program "robot" (6.31 KB, text/plain), 2014-01-13 21:48 UTC, Peter Wu
smaller apitrace output for "robot" program (190.87 KB, application/octet-stream), 2014-01-13 21:50 UTC, Peter Wu
gdb debug session (with more details) (13.04 KB, text/plain), 2014-01-14 17:12 UTC, Peter Wu
Small test program (robot.c) (1.33 KB, text/plain), 2014-01-14 19:07 UTC, Peter Wu
proposed fix for the bug (1.79 KB, text/plain), 2014-01-14 22:25 UTC, Brian Paul

Description Peter Wu 2013-12-20 18:25:59 UTC
Created attachment 91053 [details]
gdb bt full

After upgrading Mesa from 9.2.4 to 10.0.1, my Java program using JOGL crashes with a memory corruption error.

The attached GDB log was generated with Mesa a3ae5dc7dd5c2f8893f86a920247e690e550ebd4 ("draw: make sure that the stages setup outputs"), built with --enable-debug.

I enforce software rendering because it gives me an order of magnitude better fps than i965 (glReadPixels is slow):

    LIBGL_ALWAYS_SOFTWARE=1 java -cp ... RobotRace

Some versions of my program (differing only by a new member variable, with no other side effects) crash immediately. Other versions crash after the center point in gl.glLookAt() is modified. Let me know if you need more details (source, etc.).

Bisection leads to:
a3ae5dc7dd5c2f8893f86a920247e690e550ebd4 is the first bad commit
commit a3ae5dc7dd5c2f8893f86a920247e690e550ebd4
Author: Zack Rusin <zackr@vmware.com>
Date:   Fri Aug 9 10:11:31 2013 -0400

    draw: make sure that the stages setup outputs
    
    Calling the prepare outputs cleans up the slot assignments
    for outputs, unfortunately aapoint and aaline didn't have
    code to reset their slots after the initial setup, this
    was messing up our slot assignments. The unfilled stage
    was just missing the initial assignment of the face slot.
    This fixes all of the reported piglit failures.
    
    Signed-off-by: Zack Rusin <zackr@vmware.com>
    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

:040000 040000 fb87dfd2039663da7ff0fa6f12a5b0668fecee7f fc98438608d4df5bd64ff651bf9098aaabc5a262 M      src

LLVM: 3.3
Mesa: 10.0.1 (gdb from a3ae5dc7dd5c2f8893f86a920247e690e550ebd4)
JOGL: 2.1-b1135-20131101
Linux: v3.13-rc2-208-g8ecffd7
Xorg: 1.14.5
OpenJDK: 7.u45_2.4.3
Comment 1 Alexander Monakov 2013-12-27 12:01:45 UTC
I'd recommend running it under valgrind and comparing the logs from before and after the bad commit.

I'd also recommend capturing an apitrace trace and checking whether the problem is reproducible under glretrace.  If so, it may be more useful to run valgrind on glretrace rather than on the original application.

Finally, with i965 glReadPixels into a PBO should not be slow.
Comment 2 Peter Wu 2013-12-27 12:17:14 UTC
Created attachment 91216 [details]
api trace output (gzipped)

Java has no love for valgrind; it hits the 10 million error count before a window even gets displayed.

apitrace was a great suggestion. The attached gzip-compressed API trace can be replayed to trigger a crash.

(The trace was generated without software rendering. That should not matter, right? Otherwise the program would have crashed in the middle of tracing.)
Comment 3 Alexander Monakov 2013-12-28 14:38:15 UTC
I don't have llvmpipe installed so I can't investigate further.  Have you tried running glretrace under valgrind before/after the offending commit?
Comment 4 Peter Wu 2013-12-28 15:16:37 UTC
Created attachment 91257 [details]
valgrind apitrace mesa-a3ae5dc7dd5c2f8893f86a920247e690e550ebd4

Steps to reproduce:
# good commit: 98d2498404ba69a3efc1c765b1a1885d151181ed
# bad commit:
git checkout a3ae5dc7dd5c2f8893f86a920247e690e550ebd4
NOCONFIGURE=1 ./autogen.sh
./configure --prefix=/tmp/mesa-root --with-gallium-drivers=swrast --with-dri-drivers=i965 --with-llvm-shared-libs --enable-gallium-llvm --enable-shared-glapi --enable-dri --enable-glx --enable-texture-float
make install

Run with:
LIBGL_DEBUG=verbose \
LD_LIBRARY_PATH=/tmp/mesa-root/lib \
LIBGL_DRIVERS_PATH=/tmp/mesa-root/lib/dri \
LIBGL_ALWAYS_SOFTWARE=1 \
valgrind glretrace /tmp/java.trace

This was tested with LLVM 3.3. The two conditional jump warnings are also present with the "good" commit; everything after them is new (starting from "Invalid write").
Comment 5 Peter Wu 2013-12-30 15:27:49 UTC
Created attachment 91336 [details]
Output for `LIBGL_ALWAYS_SOFTWARE=1 valgrind glretrace -v java.trace`

Same config, but with `-v` added to `glretrace`.
Comment 6 Peter Wu 2014-01-13 21:48:01 UTC
Created attachment 92000 [details]
gdb bt full for smaller C program "robot"
Comment 7 Peter Wu 2014-01-13 21:50:47 UTC
Created attachment 92001 [details]
smaller apitrace output for "robot" program

This is a smaller test case. The previous gdb output (for "robot") was generated using Mesa 10.0.2 + LLVM 3.4.

./configure line:

LDFLAGS='-fsanitize=address -lasan' \
CFLAGS='-g -O0 -fsanitize=address -fno-omit-frame-pointer' \
CXXFLAGS="$CFLAGS" \
./configure --enable-debug --prefix=/tmp/mesa-root \
--with-gallium-drivers=swrast --with-llvm-shared-libs \
--enable-gallium-llvm --enable-shared-glapi --enable-dri \
--enable-glx --with-dri-drivers=
Comment 8 Peter Wu 2014-01-13 22:34:43 UTC
Bisecting with the small program (via glretrace), built with ASAN, -O0 and -g, still points to the same faulty commit:

a3ae5dc7dd5c2f8893f86a920247e690e550ebd4 is the first bad commit
commit a3ae5dc7dd5c2f8893f86a920247e690e550ebd4
Author: Zack Rusin <zackr@vmware.com>
Date:   Fri Aug 9 10:11:31 2013 -0400

    draw: make sure that the stages setup outputs
    
    Calling the prepare outputs cleans up the slot assignments
    for outputs, unfortunately aapoint and aaline didn't have
    code to reset their slots after the initial setup, this
    was messing up our slot assignments. The unfilled stage
    was just missing the initial assignment of the face slot.
    This fixes all of the reported piglit failures.
    
    Signed-off-by: Zack Rusin <zackr@vmware.com>
    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

:040000 040000 fb87dfd2039663da7ff0fa6f12a5b0668fecee7f fc98438608d4df5bd64ff651bf9098aaabc5a262 M      src

git bisect log:

git bisect start
# bad: [277dbf08b0e78fe6cff0fc751768a6f3d33e61f7] glsl: Remove exec_list iterators now that nothing uses them.
git bisect bad 277dbf08b0e78fe6cff0fc751768a6f3d33e61f7
# skip: [3e385d1bc314a50c9572b04210c4d6ac1b0a7381] docs: Add release notes for the 9.2.4 release.
git bisect skip 3e385d1bc314a50c9572b04210c4d6ac1b0a7381
# good: [3e385d1bc314a50c9572b04210c4d6ac1b0a7381] docs: Add release notes for the 9.2.4 release.
git bisect good 3e385d1bc314a50c9572b04210c4d6ac1b0a7381
# skip: [9f07ca11c1797ac12de1e1c6aef13cf58824b5f5] mesa: Dispatch ARB_framebuffer_object and EXT_framebuffer_object differently
git bisect skip 9f07ca11c1797ac12de1e1c6aef13cf58824b5f5
# skip: [9f07ca11c1797ac12de1e1c6aef13cf58824b5f5] mesa: Dispatch ARB_framebuffer_object and EXT_framebuffer_object differently
git bisect skip 9f07ca11c1797ac12de1e1c6aef13cf58824b5f5
# bad: [8d4ecbccd6a5608005b5c8f473d9a44dbde0b08d] i965: Remove #define name from PCI ID table.
git bisect bad 8d4ecbccd6a5608005b5c8f473d9a44dbde0b08d
# bad: [7086636358b611a2bb124253e1fe870107e1cecb] nvc0/ir: fix use after free in texture barrier insertion pass
git bisect bad 7086636358b611a2bb124253e1fe870107e1cecb
# bad: [e858921d527bfcbbda27760f781c25cab469e852] ilo: implement new float comparison instructions
git bisect bad e858921d527bfcbbda27760f781c25cab469e852
# bad: [e858921d527bfcbbda27760f781c25cab469e852] ilo: implement new float comparison instructions
git bisect bad e858921d527bfcbbda27760f781c25cab469e852
# good: [6065a87bce0c3fb0d9694c381c5a31b63e1f0300] glsl: Cross-validate GS layout qualifiers while intrastage linking.
git bisect good 6065a87bce0c3fb0d9694c381c5a31b63e1f0300
# good: [6065a87bce0c3fb0d9694c381c5a31b63e1f0300] glsl: Cross-validate GS layout qualifiers while intrastage linking.
git bisect good 6065a87bce0c3fb0d9694c381c5a31b63e1f0300
# good: [331a8fa41d174c74afe58f43a5943627398eac6b] gallium-egl: Simplify native_wayland_drm_bufmgr_helper interface
git bisect good 331a8fa41d174c74afe58f43a5943627398eac6b
# good: [2c32c3985ca6232a81d21feb9ac6443145b42d0e] i965/fs: Consider predicated SEL instructions as whole variable writes.
git bisect good 2c32c3985ca6232a81d21feb9ac6443145b42d0e
# good: [438cc6bc49d109f9ddeed6a741c4f0b8f1c4ffe2] mesa: Make detach_renderbuffer available outside fbobject.c
git bisect good 438cc6bc49d109f9ddeed6a741c4f0b8f1c4ffe2
# good: [336351e971d6232bbed11d9812ebf05341b6aa36] glsl/ast: Check that geometry shader interface block inputs are arrays.
git bisect good 336351e971d6232bbed11d9812ebf05341b6aa36
# good: [98d2498404ba69a3efc1c765b1a1885d151181ed] glsl: Fix incorrect pattern matching in ir_set_program_inouts
git bisect good 98d2498404ba69a3efc1c765b1a1885d151181ed
# bad: [c6c55ad3e967f3d151c24795a99634b297c13fde] gallivm: fix border color with normalized texture formats
git bisect bad c6c55ad3e967f3d151c24795a99634b297c13fde
# bad: [27cedd8aecccea808a35ef297477cac5fe87e476] llvmpipe: fix pipeline statistics with a null ps
git bisect bad 27cedd8aecccea808a35ef297477cac5fe87e476
# bad: [a3ae5dc7dd5c2f8893f86a920247e690e550ebd4] draw: make sure that the stages setup outputs
git bisect bad a3ae5dc7dd5c2f8893f86a920247e690e550ebd4
# first bad commit: [a3ae5dc7dd5c2f8893f86a920247e690e550ebd4] draw: make sure that the stages setup outputs
Comment 9 Peter Wu 2014-01-14 17:12:31 UTC
Created attachment 92053 [details]
gdb debug session (with more details)

The address v0 is invalid according to AddressSanitizer. This looks fishy:

#2  0x00007ffff0ae213f in lp_setup_draw_elements (vbr=0x60740000a100, indices=0x605200001b80, nr=6) at lp_setup_vbuf.c:188
188              setup->triangle( setup,
(gdb) p i
$30 = 5
(gdb) p stride
$31 = 32
(gdb) p (ushort[64])indices[0]
$32 = {0, 0, 0, 48910, 1, 2, 4, 3, 0, 5, 3, 4, 6, 5, 4, 7, 5, 6, 8, 9, 10, 11, 9, 8, 12, 11, 8, 13, 11, 12, 14, 13, 12, 15, 13, 14, 0 <repeats 28 times>}

Playing a bit with breakpoints and watchpoints did not really enlighten me. See the GDB session; all I know now is that something wrong entered llvm_pipeline_generic().
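
The suspicious index can be made concrete with a small sketch. It assumes (this is not confirmed by anything in this report) that an indexed triangle list is walked three indices at a time and that each vertex address is computed as base + index * stride. The helper names `vertex_offset` and `first_bad_triangle`, and the vertex count passed in, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of a vertex inside a vertex buffer,
 * assuming vertices are stored back to back, `stride` bytes apart. */
size_t vertex_offset(unsigned short index, size_t stride)
{
    return (size_t)index * stride;
}

/* Hypothetical helper: walk an indexed triangle list three indices at a
 * time (assumed loop shape: i = 2, 5, 8, ...) and return the loop
 * position i of the first triangle that references a vertex outside a
 * buffer holding `num_vertices` vertices, or -1 if all are in range. */
int first_bad_triangle(const unsigned short *indices, int nr,
                       unsigned num_vertices)
{
    int i, k;

    assert(indices != NULL);
    for (i = 2; i < nr; i += 3) {
        /* A triangle at position i uses indices[i-2], [i-1] and [i]. */
        for (k = i - 2; k <= i; k++) {
            if (indices[k] >= num_vertices)
                return i;
        }
    }
    return -1;
}
```

With the values from the gdb session above (nr=6, stride=32, indices 0, 0, 0, 48910, 1, 2) and an assumed 16 vertices, the triangle handled at i=5 starts with index 48910, which corresponds to a byte offset of 48910 * 32 = 1565120. That would be consistent with ASAN reporting v0 as invalid, though it does not explain where the bad index came from.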
Comment 10 Peter Wu 2014-01-14 19:07:15 UTC
Created attachment 92081 [details]
Small test program (robot.c)

Here is a small test program that crashes with Mesa 10.0 (and master). It crashes immediately on startup with a heap-buffer-overflow, according to ASAN.

Some notes:
- It has something to do with anti-aliasing (with GL_LINE_SMOOTH disabled, it runs fine).
- The problem is probably related to negative vertices and clipping. If the triangle vertex (-10,100) is changed to (-1,100), it still crashes but (0,100) is fine.
- It depends on a combination of vertices: if I remove two vertices for GL_LINES, it no longer crashes.

Another hint is the following assertion failure when replacing GL_LINES by GL_POINTS:

lp_setup_vbuf.c:112:lp_setup_unmap_vertices: Assertion `setup->vertex_buffer_size >= (max_index+1) * setup->vertex_size' failed.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007fffefc004a5 in _debug_assert_fail (expr=expr@entry=0x7ffff01f9be0 "setup->vertex_buffer_size >= (max_index+1) * setup->vertex_size", file=file@entry=0x7ffff01f9ba0 "lp_setup_vbuf.c", line=line@entry=112, 
    function=function@entry=0x7ffff01f9d20 <__func__.15180> "lp_setup_unmap_vertices") at util/u_debug.c:278
278           os_abort();
#0  0x00007fffefc004a5 in _debug_assert_fail (expr=expr@entry=0x7ffff01f9be0 "setup->vertex_buffer_size >= (max_index+1) * setup->vertex_size", file=file@entry=0x7ffff01f9ba0 "lp_setup_vbuf.c", line=line@entry=112, 
    function=function@entry=0x7ffff01f9d20 <__func__.15180> "lp_setup_unmap_vertices") at util/u_debug.c:278
#1  0x00007fffeff5be20 in lp_setup_unmap_vertices (vbr=0x60740000a100, min_index=<optimized out>, max_index=<optimized out>) at lp_setup_vbuf.c:112
#2  0x00007fffefb10aeb in vbuf_flush_vertices (vbuf=vbuf@entry=0x601e00009820) at draw/draw_pipe_vbuf.c:323
#3  0x00007fffefb112a9 in vbuf_flush (stage=0x601e00009820, flags=4) at draw/draw_pipe_vbuf.c:392
#4  0x00007fffefaf1ab1 in aaline_flush (stage=0x60360000e4c0, flags=4) at draw/draw_pipe_aaline.c:734
#5  0x00007fffefafe27a in clip_flush (stage=0x602a0001f660, flags=4) at draw/draw_pipe_clip.c:796
#6  0x00007fffefaeb064 in draw_pipeline_flush (draw=draw@entry=0x60680001b100, flags=flags@entry=4) at draw/draw_pipe.c:349
#7  0x00007fffefad26a8 in draw_do_flush (draw=draw@entry=0x60680001b100, flags=flags@entry=4) at draw/draw_context.c:741
#8  0x00007fffefad0571 in draw_flush (draw=draw@entry=0x60680001b100) at draw/draw_context.c:234
#9  0x00007fffeff0da0a in llvmpipe_draw_vbo (pipe=0x606e0001c300, info=0x7fffffffdf10) at lp_draw_arrays.c:155
#10 0x00007fffefacc1e8 in cso_draw_vbo (cso=0x60640001a500, info=info@entry=0x7fffffffdf10) at cso_cache/cso_context.c:1400
#11 0x00007fffef7ed6bb in st_draw_vbo (ctx=<optimized out>, prims=<optimized out>, nr_prims=<optimized out>, ib=<optimized out>, index_bounds_valid=<optimized out>, min_index=<optimized out>, max_index=<optimized out>, 
    tfb_vertcount=<optimized out>, indirect=<optimized out>) at state_tracker/st_draw.c:290
#12 0x00007fffef72f418 in vbo_exec_vtx_flush (exec=exec@entry=0x608800012e48, keepUnmapped=keepUnmapped@entry=1 '\001') at vbo/vbo_exec_draw.c:399
#13 0x00007fffef71c8bc in vbo_exec_FlushVertices_internal (exec=exec@entry=0x608800012e48, unmap=unmap@entry=1 '\001') at vbo/vbo_exec_api.c:555
#14 0x00007fffef72151d in vbo_exec_FlushVertices (ctx=0x7fffe9eb1800, flags=1) at vbo/vbo_exec_api.c:1164
#15 0x00007fffef3ecdc6 in _mesa_flush (ctx=ctx@entry=0x7fffe9eb1800) at main/context.c:1666
#16 0x00007fffef3ed051 in _mesa_Flush () at main/context.c:1701
#17 0x00007ffff4bf1677 in glFlush () at ../../../src/mapi/glapi/glapi_mapi_tmp.h:2968
#18 0x0000000000400ef8 in display () at robot.c:29
#19 0x00007ffff48b4ac4 in ?? () from /usr/lib/libglut.so.3
#20 0x00007ffff48b8329 in fgEnumWindows () from /usr/lib/libglut.so.3
#21 0x00007ffff48b507d in glutMainLoopEvent () from /usr/lib/libglut.so.3
#22 0x00007ffff48b58e5 in glutMainLoop () from /usr/lib/libglut.so.3
#23 0x0000000000401035 in main (argc=1, argv=0x7fffffffe468) at robot.c:57
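
The failed assertion states a simple size invariant, sketched here in isolation. `vertices_fit` is a hypothetical stand-in, not Mesa code, and the buffer size used in the note after the block is an illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the condition in the failed assertion:
 * a buffer of `buffer_size` bytes can hold vertices 0..max_index only
 * if it has room for (max_index + 1) vertices of `vertex_size` bytes. */
bool vertices_fit(size_t buffer_size, unsigned max_index, size_t vertex_size)
{
    assert(vertex_size > 0);
    return buffer_size >= ((size_t)max_index + 1) * vertex_size;
}
```

For example, with 32-byte vertices a 512-byte buffer holds indices 0 through 15. Any larger max_index violates the invariant, which is the condition the debug build trapped on in the backtrace above.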
Comment 11 Roland Scheidegger 2014-01-14 19:38:36 UTC
(In reply to comment #10)
> Here is a small test program that crashes with Mesa 10.0 (and master). On
> startup, it immediately crashes. A heap-buffer-overflow according to ASAN.
> 
> Some notes:
> - It has something to do with anti-aliasing (with GL_LINE_SMOOTH disabled,
> it runs fine).
Ahh yes, this makes sense: it must be due to the additional pipeline stage that draw injects for smooth lines (it would probably crash with other additional pipeline stages too, not just the one for AA lines). I suspect the interaction between front face injection and those additional stages is simply broken, as this probably wasn't tested all that much.

> - The problem is probably related to negative vertices and clipping. If the
> triangle vertex (-10,100) is changed to (-1,100), it still crashes but
> (0,100) is fine.
> - It is an combination of vertices, if I remove two vertices for GL_LINES,
> then it won't crash.
> 
> Another hint is the following assertion failure when replacing GL_LINES by
> GL_POINTS:
> 
> lp_setup_vbuf.c:112:lp_setup_unmap_vertices: Assertion
> `setup->vertex_buffer_size >= (max_index+1) * setup->vertex_size' failed.
Comment 12 Brian Paul 2014-01-14 22:25:02 UTC
Created attachment 92096 [details]
proposed fix for the bug

Can you try this patch?  It fixes your test program for me.
Comment 13 Peter Wu 2014-01-14 23:25:29 UTC
(In reply to comment #12)
> Created attachment 92096 [details]
> proposed fix for the bug
> 
> Can you try this patch?  It fixes your test program for me.

Works great. I tested it with mesa master 8c4a9f631d7438aeaf56785401891d0773792123 and on top of 10.0.2. Neither version crashes anymore, whatever I tried for some minutes (my Java program, another C program and the attached apitrace).

I verified that 10.0.2 still crashes without the patch. Can this patch be backported to 10.0? I have not run the piglit tests, though.
Comment 14 Brian Paul 2014-01-15 01:34:13 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 92096 [details]
> > proposed fix for the bug
> > 
> > Can you try this patch?  It fixes your test program for me.
> 
> Works great, tested it with mesa master
> 8c4a9f631d7438aeaf56785401891d0773792123 and on top of 10.0.2. Both versions
> do not crash anymore, whatever I try for some minutes (my java program,
> another C program and the attached apitrace).
> 
> I verified that 10.0.2 still crashes without the patch. Can this patch be
> backported to 10.0? I have not ran piglit tests though.

I'll post the patch for review on mesa-dev and tag it for inclusion in the next 10.0.x release.
Comment 15 Brian Paul 2014-01-20 18:05:50 UTC
Patch committed to master: ad814d04ca5d579538885a595331b5b27caefd2a