Bug 106928 - When starting a match Rocket League crashes on "Go"
Summary: When starting a match Rocket League crashes on "Go"
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600 (show other bugs)
Version: 18.0
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-15 12:21 UTC by terikss
Modified: 2018-09-07 18:28 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Core dump from gdb while running Mesa 18.1.1-3 (385.10 MB, application/x-xz)
2018-06-15 12:21 UTC, terikss
Details
Steam output from start of steam to after game crash. (14.52 KB, text/plain)
2018-06-15 12:59 UTC, terikss
Details
apitrace of google maps in google chrome that asserts (10.92 MB, application/x-xz)
2018-06-26 22:16 UTC, ubizjak
Details
apitrace of google maps in google chrome that asserts with current mainline mesa (10.99 MB, application/x-xz)
2018-06-28 18:30 UTC, ubizjak
Details

Description terikss 2018-06-15 12:21:10 UTC
Created attachment 140170 [details]
Core dump from gdb while running Mesa 18.1.1-3

Overview:
When starting a match in Rocket League it counts down 3, 2, 1, Go. The game always crashes on Go.
Different game modes such as training where the countdown does not occur does not crash the game.
I have tried different graphics settings in Rocket League ranging from lowest to highest. Changing graphics settings makes no difference the game always crashes in the same place.

The reason I suspect the r600 driver is the cause of the issue (or at least part of it) is because of the back trace from gdb which starts with the following and looks like it includes a function name prefixed with "r600".

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f52a1f85561 in __GI_abort () at abort.c:79
#2  0x00007f528e46fecd in std::__replacement_assert (__condition=0x7f528e697818 "__builtin_expect(__n < this->size(), true)", __function=<synthetic pointer>, __line=932, 
    __file=0x7f528e699680 "/usr/include/c++/8/bits/stl_vector.h") at /usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2389
#3  std::vector<r600_sb::value*, std::allocator<r600_sb::value*> >::operator[] (__n=2, this=0x7f51a1912120) at /usr/include/c++/8/bits/stl_vector.h:932
#4  r600_sb::expr_handler::fold_alu_op3 (this=0x7f51a1717ba8, n=...) at sb/sb_expr.cpp:952

Steps to Reproduce:
1. Start Rocket League
2. Click Play.
3. Click Play Local.
4. Click Exhibition.
5. Click Create Match.
6. Wait until the match has loaded, countdown begins and it says "Go".

Actual Results:
The game crashes.

Expected Results:
Game to not crash and playable match to begin.

Build Date & Hardware:
Graphics card: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770]

I'm running Fedora 28. The crash can be recreated with both Mesa 18.0.2-1 and Mesa 18.1.1-3 from https://apps.fedoraproject.org/packages/mesa.


Attached is a core dump from gdb and the output from steam from when starting Steam to after the game has crashed. The core dump and output is from when running Mesa 18.1.1-3.
Comment 1 terikss 2018-06-15 12:59:05 UTC
Created attachment 140173 [details]
Steam output from start of steam to after game crash.
Comment 2 Roland Scheidegger 2018-06-22 16:26:32 UTC
Definitely looks like a r600 issue, in the sb code (I suppose it would work with R600_DEBUG=nosb).
An apitrace might be helpful (it should be possible to capture one with nosb so it doesn't crash and then replaying it without that should crash).
Unless someone has some ideas what could be wrong with the sb code there...
Comment 3 ubizjak 2018-06-26 15:50:38 UTC
(In reply to terikss from comment #1)
> Created attachment 140173 [details]
> Steam output from start of steam to after game crash.

I hit the same assert:

/usr/include/c++/8/bits/stl_vector.h:932: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = r600_sb::value*; _Alloc = std::allocator<r600_sb::value*>; std::vector<_Tp, _Alloc>::reference = r600_sb::value*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.

with Google maps on Fedora 28 with mesa 18.0.5 ([AMD/ATI] Cypress XT [Radeon HD 5870]). Please see [1] for more information.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1580019
Comment 4 ubizjak 2018-06-26 16:27:06 UTC
(In reply to ubizjak from comment #3)

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1580019

As reported in [1], full backtrace with debug info reads:

Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
0x000055b7d4b054b2 in mozalloc_abort(char const*) ()
(gdb) bt
#0  0x000055b7d4b054b2 in mozalloc_abort(char const*) ()
#1  0x000055b7d4af97b6 in mozilla::detail::ConditionVariableImpl::ConditionVariableImpl() [clone .cold.16] ()
#2  0x00007f94d056f965 in std::__replacement_assert(char const*, int, char const*, char const*) (__condition=0x7f94d0786638 "__builtin_expect(__n < this->size(), true)", __function=<synthetic pointer>, __line=932, __file=0x7f94d07884a0 "/usr/include/c++/8/bits/stl_vector.h")
    at /usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2389
#3  0x00007f94d056f965 in std::vector<r600_sb::value*, std::allocator<r600_sb::value*> >::operator[](unsigned long) (__n=2, this=0x7f94d904c820)
    at /usr/include/c++/8/bits/stl_vector.h:932
#4  0x00007f94d056f965 in r600_sb::expr_handler::fold_alu_op3(r600_sb::alu_node&) (this=0x7f94f3deeea8, n=...) at sb/sb_expr.cpp:951
#5  0x00007f94d056c0cd in r600_sb::expr_handler::try_fold(r600_sb::node*) (n=<optimized out>, this=<optimized out>) at sb/sb_expr.cpp:145
#6  0x00007f94d056c0cd in r600_sb::expr_handler::try_fold(r600_sb::value*) (this=<optimized out>, v=0x7f94d90616f0) at sb/sb_expr.cpp:136
#7  0x00007f94d058e3fc in r600_sb::value_table::add_value(r600_sb::value*) (this=0x7f94f3deee70, v=<optimized out>, v@entry=0x7f94d90616f0)
    at sb/sb_valtable.cpp:149
#8  0x00007f94d0573381 in r600_sb::gvn::process_op(r600_sb::node&, bool) (this=0x7ffeac463420, n=..., rewrite=rewrite@entry=true) at sb/sb_gvn.cpp:222
#9  0x00007f94d057356e in r600_sb::gvn::visit(r600_sb::alu_node&, bool) (this=<optimized out>, n=..., enter=<optimized out>) at sb/sb_gvn.cpp:97
#10 0x00007f94d0577570 in r600_sb::vpass::run_on(r600_sb::container_node&) (this=0x7ffeac463420, n=...) at sb/sb_ir.h:888
#11 0x00007f94d0577651 in r600_sb::vpass::run() (this=0x7ffeac463420) at sb/sb_pass.cpp:44
#12 0x00007f94d05670ba in r600_sb_bytecode_process () at sb/sb_core.cpp:205
#13 0x00007f94d0533bcd in r600_pipe_shader_create (ctx=ctx@entry=0x7f94fa76f000, shader=shader@entry=0x7f94d903f000, key=...) at r600_shader.c:215
#14 0x00007f94d053de64 in r600_shader_select (ctx=ctx@entry=0x7f94fa76f000, sel=0x7f94d903c000, dirty=dirty@entry=0x7ffeac464175) at r600_state_common.c:869
#15 0x00007f94d053f077 in r600_update_derived_state (rctx=0x7f94fa76f000) at r600_state_common.c:1687
#16 0x00007f94d053f077 in r600_draw_vbo (ctx=0x7f94fa76f000, info=0x7ffeac464490) at r600_state_common.c:1968
#17 0x00007f94d031e710 in u_vbuf_draw_vbo (mgr=0x7f94f90af000, info=<optimized out>) at util/u_vbuf.c:1150
#18 0x00007f94d0109a2f in st_draw_vbo (ctx=<optimized out>, prims=0x7ffeac464550, nr_prims=<optimized out>, ib=0x0, index_bounds_valid=<optimized out>, min_index=<optimized out>, max_index=<optimized out>, tfb_vertcount=0x0, stream=0, indirect=0x0) at state_tracker/st_draw.c:227
#19 0x00007f94d00ce70b in vbo_draw_arrays (ctx=ctx@entry=0x7f94ed8c2000, mode=mode@entry=4, start=start@entry=1320, count=count@entry=87, numInstances=numInstances@entry=1, baseInstance=baseInstance@entry=0, drawID=0) at vbo/vbo_exec_array.c:486
#20 0x00007f94d00ce79a in vbo_exec_DrawArraysInstanced (mode=4, start=1320, count=87, numInstances=1) at vbo/vbo_exec_array.c:677
#21 0x00007f96a8c92855 in mozilla::WebGLContext::DrawArraysInstanced(unsigned int, int, int, int, char const*) () at /usr/lib64/firefox/libxul.so
#22 0x00007f96a892f967 in mozilla::dom::WebGLRenderingContextBinding::drawArrays(JSContext*, JS::Handle<JSObject*>, mozilla::WebGLContext*, JSJitMethodCallArgs const&) () at /usr/lib64/firefox/libxul.so
#23 0x00007f96a8c0e1c9 in mozilla::dom::GenericBindingMethod(JSContext*, unsigned int, JS::Value*) () at /usr/lib64/firefox/libxul.so
#24 0x000021a97a2b79c6 in  ()
#25 0x0000000000000000 in  ()

The value of alu_node& n in frame #4, where things go downhill:

(gdb) f 4
#4  r600_sb::expr_handler::fold_alu_op3 (this=0x7f94f3deeea8, n=...) at sb/sb_expr.cpp:951
951             value* v2 = n.src[2]->gvalue();
(gdb) list
946                             return true;
947             }
948
949             value* v0 = n.src[0]->gvalue();
950             value* v1 = n.src[1]->gvalue();
951             value* v2 = n.src[2]->gvalue();
952
953             /* LDS instructions look like op3 with no dst - don't fold. */
954             if (!n.dst[0])
955                     return false;

(gdb) p n
$1 = (r600_sb::alu_node &) @0x7f94d904c7d0: {<r600_sb::node> = {_vptr.node = 0x7f94d0b692f8 <vtable for r600_sb::alu_node+16>, prev = 0x7f94d904c738, 
    next = 0x7f94d904c938, parent = 0x7f94d904c000, type = r600_sb::NT_OP, subtype = r600_sb::NST_ALU_INST, flags = r600_sb::NF_EMPTY, pred = 0x0, 
    dst = std::vector of length 1, capacity 1 = {0x7f94d90616f0}, src = std::vector of length 2, capacity 3 = {0x7f94d905d110, 0x7f94d905cf70}}, bc = {
    op_ptr = 0x7f94d0b628a0 <r600_alu_op_table>, op = 0, src = {{sel = 252, chan = 2, neg = 0, abs = 0, rel = 0, value = {{i = 0, u = 0, f = 0}}}, {sel = 0, 
        chan = 0, neg = 0, abs = 0, rel = 0, value = {{i = 0, u = 0, f = 0}}}, {sel = 252, chan = 2, neg = 0, abs = 0, rel = 0, value = {{i = 0, u = 0, 
            f = 0}}}}, dst_gpr = 6, dst_chan = 2, dst_rel = 0, clamp = 0, omod = 0, bank_swizzle = 0, index_mode = 0, last = 1, pred_sel = 0, fog_merge = 0, 
    write_mask = 0, update_exec_mask = 0, update_pred = 0, slot = 2, lds_idx_offset = 0, slot_flags = AF_VS}}
Comment 5 ubizjak 2018-06-26 22:16:58 UTC
Created attachment 140354 [details]
apitrace of google maps in google chrome that asserts

$ apitrace replay chrome.trace
132 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869314, attribList = {}) = 169869315
132: warning: unsupported glXCreateWindow call
133 @0 glXDestroyWindow(dpy = 0xb9b58c2b800, window = 169869315)
133: warning: unsupported glXDestroyWindow call
156 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869316, attribList = {}) = 169869317
156: warning: unsupported glXCreateWindow call
692 @0 glXDestroyWindow(dpy = 0xb9b58c2b800, window = 169869317)
692: warning: unsupported glXDestroyWindow call
716 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869322, attribList = {}) = 169869323
716: warning: unsupported glXCreateWindow call
1588 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869324, attribList = {}) = 169869331
1588: warning: unsupported glXCreateWindow call
/usr/include/c++/8/bits/stl_vector.h:932: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = r600_sb::value*; _Alloc = std::allocator<r600_sb::value*>; std::vector<_Tp, _Alloc>::reference = r600_sb::value*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
apitrace: warning: caught signal 6
113301: error: caught an unhandled exception
/usr/bin/glretrace+0x299350
/lib64/libpthread.so.0+0x11fbf
/lib64/libc.so.6: gsignal+0x10b
/lib64/libc.so.6: abort+0x12a
/usr/lib64/dri/r600_dri.so+0x6c4964
/usr/lib64/dri/r600_dri.so+0x6c10cc
/usr/lib64/dri/r600_dri.so+0x6e33fb
/usr/lib64/dri/r600_dri.so+0x6c8380
/usr/lib64/dri/r600_dri.so+0x6c856d
/usr/lib64/dri/r600_dri.so+0x6cc56f
/usr/lib64/dri/r600_dri.so+0x6cc650
/usr/lib64/dri/r600_dri.so+0x6bc0b9
/usr/lib64/dri/r600_dri.so+0x688bcc
/usr/lib64/dri/r600_dri.so+0x692e63
/usr/lib64/dri/r600_dri.so+0x694076
/usr/lib64/dri/r600_dri.so+0x47370f
/usr/lib64/dri/r600_dri.so+0x25ea2e
/usr/lib64/dri/r600_dri.so+0x22370a
/usr/lib64/dri/r600_dri.so+0x223a4b
/usr/bin/glretrace+0x1325ae
/usr/bin/glretrace+0x649de
/usr/bin/glretrace+0x65184
/usr/bin/glretrace+0x5fbcf
/lib64/libc.so.6: __libc_start_main+0xea
/usr/bin/glretrace: _start+0x29
?
apitrace: info: taking default action for signal 6
Comment 6 Roland Scheidegger 2018-06-27 00:38:39 UTC
At a quick try I can't reproduce this on my HD 5750. Albeit I tried chromium, and probably not the same version.
The replay doesn't assert neither, though I get lots of link errors due to usage of glProgramBinary it seems (also needed gl/glsl version overrides). Might need to disable ARB_get_program_binary when capturing?
From the backtrace it sort of looks like trying to fold a alu3 op with only 2 sources but I've no idea really. I hope it's not compiler dependent (valgrind was of no use neither).
Since this appears to be a regression, you could try to git bisect.
Comment 7 ubizjak 2018-06-28 18:06:51 UTC
(In reply to Roland Scheidegger from comment #6)
> At a quick try I can't reproduce this on my HD 5750. Albeit I tried
> chromium, and probably not the same version.
> The replay doesn't assert neither, though I get lots of link errors due to
> usage of glProgramBinary it seems (also needed gl/glsl version overrides).
> Might need to disable ARB_get_program_binary when capturing?
> From the backtrace it sort of looks like trying to fold a alu3 op with only
> 2 sources but I've no idea really. I hope it's not compiler dependent
> (valgrind was of no use neither).
> Since this appears to be a regression, you could try to git bisect.

Please configure the build with:

CXXFLAGS="-Wp,-D_GLIBCXX_ASSERTIONS" ./autogen.sh

Rest assured that the assert will be triggered. I have tried with build of Mesa 18.0.5 (git-ca0037aaef). Just invoke firefox from command line and navigate to Google Maps.
Comment 8 ubizjak 2018-06-28 18:30:24 UTC
Created attachment 140382 [details]
apitrace of google maps in google chrome that asserts with current mainline mesa

Attached trace that fails with current mainline

commit e83cd38eac26167cdab9205f17fd276c73daf3fb (HEAD -> master, origin/master, origin/HEAD)

when configured with _GLIBCXX_ASSERTIONS.

There are a bunch of harmless warnings, but the trace works with R600_DEBUG=nosb and fails without:

132 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452226, attribList = {}) = 182452227
132: warning: unsupported glXCreateWindow call
133 @0 glXDestroyWindow(dpy = 0xb154f920800, window = 182452227)
133: warning: unsupported glXDestroyWindow call
156 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452228, attribList = {}) = 182452229
156: warning: unsupported glXCreateWindow call
694 @0 glXDestroyWindow(dpy = 0xb154f920800, window = 182452229)
694: warning: unsupported glXDestroyWindow call
718 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452234, attribList = {}) = 182452235
718: warning: unsupported glXCreateWindow call
1591 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452236, attribList = {}) = 182452243
1591: warning: unsupported glXCreateWindow call
4762 @0 glProgramBinary(program = 6, binaryFormat = 34655, binary = blob(5017), length = 5017)
4762: warning: link failed
4879 @0 glProgramBinary(program = 9, binaryFormat = 34655, binary = blob(4629), length = 4629)
4879: warning: link failed
4966 @0 glProgramBinary(program = 12, binaryFormat = 34655, binary = blob(5177), length = 5177)
4966: warning: link failed
60680 @0 glProgramBinary(program = 24, binaryFormat = 34655, binary = blob(6349), length = 6349)
60680: warning: link failed
61187 @0 glProgramBinary(program = 27, binaryFormat = 34655, binary = blob(11341), length = 11341)
61187: warning: link failed
67815 @0 glProgramBinary(program = 30, binaryFormat = 34655, binary = blob(6993), length = 6993)
67815: warning: link failed
/usr/include/c++/8/bits/stl_vector.h:932: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = r600_sb::value*; _Alloc = std::allocator<r600_sb::value*>; std::vector<_Tp, _Alloc>::reference = r600_sb::value*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
apitrace: warning: caught signal 6
68048: error: caught an unhandled exception
/usr/bin/glretrace+0x299350
/lib64/libpthread.so.0+0x11fbf
/lib64/libc.so.6: gsignal+0x10b
/lib64/libc.so.6: abort+0x12a
/ssd/uros/git/mesa/lib/gallium//r600_dri.so+0x639afa
...
Comment 9 Roland Scheidegger 2018-06-29 00:09:54 UTC
(In reply to ubizjak from comment #7)
> Please configure the build with:
> 
> CXXFLAGS="-Wp,-D_GLIBCXX_ASSERTIONS" ./autogen.sh

That didn't do anything neither. However I figured out the problem more or less in the code, and some googling said that using -D_GLIBCXX_DEBUG should make it trigger reliably, and indeed it does...
The issue is that (you already showed that actually)
src = std::vector of length 2, capacity 3 = {0x7f94d905d110, 0x7f94d905cf70}}
And trying to access element src[2].
There's an early exit in the function if src.size() is < 3. Since this didn't hit, apparently fold_assoc() resized the vector. And indeed it can do that (there's an explicit n->src.resize(2) somewhere, and it would still return false in this case).

I think something like this should do:
diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp
index 1df78da660..c77b9f2d7d 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -945,6 +945,8 @@ bool expr_handler::fold_alu_op3(alu_node& n) {
        if (!sh.safe_math && (n.bc.op_ptr->flags & AF_M_ASSOC)) {
                if (fold_assoc(&n))
                        return true;
+               else if (n.src.size() < 3)
+                       return fold_alu_op2(n);
        }
 
        value* v0 = n.src[0]->gvalue();

But I'm not entirely convinced it's really the right thing to do (maybe what fold_assoc() did isn't quite what it's supposed to do?).
It fixes the particular fold_alu_op3 crash for me, but the shader (not sure it's actually the same one) crashes later anyway:
/usr/include/c++/4.8/debug/safe_iterator.h:225:
Error: attempt to copy from a singular iterator.

Objects involved in the operation:
    iterator "this" @ 0x0x7ffff3c71e80 {
      type = 
Thread 1 "glretrace" received signal SIGSEGV, Segmentation fault.
...
#3  0x00007ffff36538cb in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<r600_sb::region_node**, std::__cxx1998::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > >, std::__debug::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > >::operator= (this=0x7fffffffb840, __x=) at /usr/include/c++/4.8/debug/safe_iterator.h:221
#4  0x00007ffff365376d in std::reverse_iterator<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<r600_sb::region_node**, std::__cxx1998::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > >, std::__debug::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > > >::operator= (this=0x7fffffffb840) at /usr/include/c++/4.8/bits/stl_iterator.h:96
#5  r600_sb::if_conversion::run (this=0x7fffffffbee0) at sb/sb_if_conversion.cpp:46
#6  0x00007ffff3632765 in r600_sb_bytecode_process (rctx=0x10e4660, bc=0x1980bf0, pshader=0x1980be8, dump_bytecode=1, optimize=1) at sb/sb_core.cpp:195

I don't know though if that's just due to the D_GLIBCXX_DEBUG thing or it will also cause crashes without it in some other libstdc++ versions... (in any case, it probably should be fixed, but this code isn't my area of expertise).
Comment 10 Dave Airlie 2018-06-29 02:57:36 UTC
https://patchwork.freedesktop.org/series/45627/

should fix the second breakage, after Roland's fix for the first one.
Comment 11 ubizjak 2018-06-29 07:16:24 UTC
(In reply to Roland Scheidegger from comment #9)
> (In reply to ubizjak from comment #7)
> > Please configure the build with:
> > 
> > CXXFLAGS="-Wp,-D_GLIBCXX_ASSERTIONS" ./autogen.sh
> 
> That didn't do anything neither. However I figured out the problem more or
> less in the code, and some googling said that using -D_GLIBCXX_DEBUG should
> make it trigger reliably, and indeed it does...

Great ;)

> The issue is that (you already showed that actually)
> src = std::vector of length 2, capacity 3 = {0x7f94d905d110, 0x7f94d905cf70}}
> And trying to access element src[2].
> There's an early exit in the function if src.size() is < 3. Since this
> didn't hit, apparently fold_assoc() resized the vector. And indeed it can do
> that (there's an explicit n->src.resize(2) somewhere, and it would still
> return false in this case).
> 
> I think something like this should do:
> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp
> b/src/gallium/drivers/r600/sb/sb_expr.cpp
> index 1df78da660..c77b9f2d7d 100644
> --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> @@ -945,6 +945,8 @@ bool expr_handler::fold_alu_op3(alu_node& n) {
>         if (!sh.safe_math && (n.bc.op_ptr->flags & AF_M_ASSOC)) {
>                 if (fold_assoc(&n))
>                         return true;
> +               else if (n.src.size() < 3)
> +                       return fold_alu_op2(n);
>         }
>  
>         value* v0 = n.src[0]->gvalue();

I wonder if we should fix expr_handler::fold_assoc instead. Digging through the code, fold_assoc is called only from expr_handler::fold_alu_op2 and expr_handler::fold_alu_op3. When true is returned, it triggers an early exit from expr_handler::fold_alu_op{2,3} functions.

The part we are looking for in expr_handler::fold_assoc is:

		} else { // MULADD => ADD
			n->src[0] = n->src[2];
			n->bc.src[0] = n->bc.src[2];
			n->src[1] = sh.get_const_value(cr);
			memset(&n->bc.src[1], 0, sizeof(bc_alu_src));

			n->src.resize(2);
			n->bc.set_op(ALU_OP2_ADD);
		}

So, let's call fold_alu_op2 here and return true to trigger early exit in expr_handler::fold_alu_op3. This is what the code a couple of lines above the presented code does when an operand degenerates to mov.

The (effectively the same patch as yours) proposed patch would be:

diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp
index 7a5d62c8e8..a609d1377f 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) {
 
                        n->src.resize(2);
                        n->bc.set_op(ALU_OP2_ADD);
+                       fold_alu_op2(*n);
+                       return true;
                }
        } else if (last_arg >= 0) {
                n->src[0] = a->src[last_arg];

WDYT?

On a side note, maybe -D_GLIBCXX_ASSERTIONS should be added to mesa testsuite. This is the flag that Fedora 28 builds use by default now, so it would be beneficial to catch these bugs early in the development cycle, before they reach users.
Comment 12 Roland Scheidegger 2018-06-29 15:09:36 UTC
(In reply to ubizjak from comment #11)
> The (effectively the same patch as yours) proposed patch would be:
> 
> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp
> b/src/gallium/drivers/r600/sb/sb_expr.cpp
> index 7a5d62c8e8..a609d1377f 100644
> --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> @@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) {
>  
>                         n->src.resize(2);
>                         n->bc.set_op(ALU_OP2_ADD);
> +                       fold_alu_op2(*n);
> +                       return true;
>                 }
>         } else if (last_arg >= 0) {
>                 n->src[0] = a->src[last_arg];
> 
> WDYT?

I am not quite convinced it's ok to return true (in fold_alu_op3) if the expression hasn't really been folded. You are quite right that just above it looks similar, but all other places always return the return value of fold_alu_op2 when calling into it from fold_alu_op3.
(Not saying it isn't correct, just saying I can't tell...)

> On a side note, maybe -D_GLIBCXX_ASSERTIONS should be added to mesa
> testsuite. This is the flag that Fedora 28 builds use by default now, so it
> would be beneficial to catch these bugs early in the development cycle,
> before they reach users.
I was actually going to suggest to enable -D_GLIBCXX_DEBUG for debug builds always, but it wouldn't work in general (due to not being able to link anything which hasn't been compiled with it, for instance llvm). So using -D_GLIBCXX_ASSERTIONS instead (which should be link-compatible) looks like a good idea to me, albeit it didn't do anything for me (too old libstdc++ I suppose). I'm also not quite sure which issues it actually catches (vs the DEBUG one) but even if it just catches some that should be a plus...
Comment 13 ubizjak 2018-06-29 19:09:41 UTC
(In reply to Roland Scheidegger from comment #12)
> (In reply to ubizjak from comment #11)
> > The (effectively the same patch as yours) proposed patch would be:
> > 
> > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp
> > b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > index 7a5d62c8e8..a609d1377f 100644
> > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > @@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) {
> >  
> >                         n->src.resize(2);
> >                         n->bc.set_op(ALU_OP2_ADD);
> > +                       fold_alu_op2(*n);
> > +                       return true;
> >                 }
> >         } else if (last_arg >= 0) {
> >                 n->src[0] = a->src[last_arg];
> > 
> > WDYT?
> 
> I am not quite convinced it's ok to return true (in fold_alu_op3) if the
> expression hasn't really been folded. You are quite right that just above it
> looks similar, but all other places always return the return value of
> fold_alu_op2 when calling into it from fold_alu_op3.
> (Not saying it isn't correct, just saying I can't tell...)

Beeing a newcomer, I'm also not too familiar with this code, and there are no comments on what the return value really means. Instead of guessing, is it possible for you to invite the author or other knowledgeable people from mesa community to the discussion in this bugreport?

Looking at the commit logs of src/gallium/drivers/r600/sb, there are quite some experts that can perhaps help here...
Comment 14 Dave Airlie 2018-06-29 20:11:31 UTC
I think Roland's first patch is correct, just call fold_alu_op2 if we get back 2 sources from fold_assoc. return true is for when we've finished all folding on that instruction, so I don't think that's correct without calling fold_alu_op2
Comment 15 ubizjak 2018-06-29 20:23:45 UTC
(In reply to Dave Airlie from comment #14)
> I think Roland's first patch is correct, just call fold_alu_op2 if we get
> back 2 sources from fold_assoc. return true is for when we've finished all
> folding on that instruction, so I don't think that's correct without calling
> fold_alu_op2

LGTM then (it also fixes the failure for me), but can we please drop the "else" from the patch, it is really confusing.
Comment 16 Roland Scheidegger 2018-06-29 20:45:54 UTC
(In reply to Dave Airlie from comment #14)
> I think Roland's first patch is correct, just call fold_alu_op2 if we get
> back 2 sources from fold_assoc. return true is for when we've finished all
> folding on that instruction, so I don't think that's correct without calling
> fold_alu_op2
Well it would still be called (in fold_assoc instead) in this case so the only difference in the end really is that fold_alu_op3 would return false instead of true if fold_alu_op2 (called when fold_assoc reduced number of src to 2) returned false.
If the return value is just indicating if we're finished with folding, then I suppose it does not actually matter for correctness either way
This code was all written by Vadim Girlin, but I don't think he's still around here.
Comment 17 Roland Scheidegger 2018-06-29 20:54:57 UTC
(In reply to ubizjak from comment #15)
> (In reply to Dave Airlie from comment #14)
> > I think Roland's first patch is correct, just call fold_alu_op2 if we get
> > back 2 sources from fold_assoc. return true is for when we've finished all
> > folding on that instruction, so I don't think that's correct without calling
> > fold_alu_op2
> 
> LGTM then (it also fixes the failure for me), but can we please drop the
> "else" from the patch, it is really confusing.

Sure that can be done. (I was actually also wondering if the test for n.src.size() < 3 at the beginning of the function is actually necessary, are we really supposed to have already fewer sources when this is called? But I couldn't quite answer that. This code is a bit confusing to me.)
Comment 18 Miroslav Šustek 2018-07-04 07:50:56 UTC
(In reply to Roland Scheidegger from comment #12)
> (In reply to ubizjak from comment #11)
> > The (effectively the same patch as yours) proposed patch would be:
> > 
> > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp
> > b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > index 7a5d62c8e8..a609d1377f 100644
> > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > @@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) {
> >  
> >                         n->src.resize(2);
> >                         n->bc.set_op(ALU_OP2_ADD);
> > +                       fold_alu_op2(*n);
> > +                       return true;
> >                 }
> >         } else if (last_arg >= 0) {
> >                 n->src[0] = a->src[last_arg];
> > 
> > WDYT?
> 
> I am not quite convinced it's ok to return true (in fold_alu_op3) if the
> expression hasn't really been folded. You are quite right that just above it
> looks similar, but all other places always return the return value of
> fold_alu_op2 when calling into it from fold_alu_op3.
> (Not saying it isn't correct, just saying I can't tell...)

I guess it is OK. There already is the same pattern in fold_mul_add where it tries to fold:
> ADD x, MUL(y, z)
into:
> MULADD y, z, x

and when it succeeds, it does:
>
>     fold_alu_op3(*n);
>     return true;

That means that no matter if fold_alu_op3 folds the instruction any further, fold_mul_add returns true because it converted ADD and MUL into MULADD.

In fold_alu_op2 where fold_mul_add is called, the return value is used in an early return condition:
> 	if (n.bc.op == ALU_OP2_ADD) {
> 		if (fold_mul_add(&n))
> 			return true;
> 	}

It seems OK to me to return true after calling fold_alu_op2 in fold_alu_op3 as a sign that the node should not be folded as op3 anymore.

Also, the return value of fold_alu_op* is thrown away anyway here:
> bool expr_handler::try_fold(value* v) {
> 	assert(!v->gvn_source);
> 
> 	if (v->def)
Here expr_handler::try_fold(node* n) calls the fold_alu_op* methods:
> 		try_fold(v->def);
> 
> 	if (v->gvn_source)
> 		return true;
> 
> 	return false;
> }

I like the patch that ubizjak@gmail.com posted:
> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp
> index 1df78da660..8cbff6f577 100644
> --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> @@ -723,6 +723,8 @@ bool expr_handler::fold_assoc(alu_node *n) {
>  
>                         n->src.resize(2);
>                         n->bc.set_op(ALU_OP2_ADD);
> +                       fold_alu_op2(n);
> +                       return true;
>                 }
>         } else if (last_arg >= 0) {
>                 n->src[0] = a->src[last_arg];
Comment 19 Miroslav Šustek 2018-07-10 07:57:00 UTC
By the way, I can reproduce the same problem when I open Google Maps in Firefox. The tab crashes during loading. After few crashes it stops to use WebGL and then it starts to work.
Comment 20 Miroslav Šustek 2018-07-10 08:15:54 UTC
Sorry for the unnecessary bump. I did not notice that a patch is already in the staging/18.1 branch: https://github.com/mesa3d/mesa/commit/d68f2d7edeedebf2db58b28a9ac6befb7757afb7
Comment 21 Andrés Gómez García 2018-09-07 17:36:24 UTC
Roland, is this fixed by?

commit 817efd89685efc6b5866e09cbdad01c4ff21c737
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jul 4 04:44:17 2018 +0200

    r600/sb: fix crash in fold_alu_op3
    
    fold_assoc() called from fold_alu_op3() can lower the number of src to 2,
    which then leads to an invalid access to n.src[2]->gvalue().
    This didn't seem to have caused much harm in the past, but on Fedora 28
    it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used, although
    with libstdc++ 4.8.5 this didn't do anything, -D_GLIBCXX_DEBUG was
    needed to show the issue).
    
    An alternative fix would be to instead call fold_alu_op2() from within
    fold_assoc() when the number of src is reduced and return always TRUE
    from fold_assoc() in this case, with the only actual difference being
    the return value from fold_alu_op3() then. I'm not sure what the return
    value actually should be in this case (or whether it even can make a
    difference).
    
    https://bugs.freedesktop.org/show_bug.cgi?id=106928
    Cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Dave Airlie <airlied@redhat.com>
Comment 22 Roland Scheidegger 2018-09-07 18:28:47 UTC
Yes, should be fixed.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.