Summary: | When starting a match Rocket League crashes on "Go" | ||
---|---|---|---|
Product: | Mesa | Reporter: | terikss |
Component: | Drivers/Gallium/r600 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | normal | ||
Priority: | medium | CC: | agomez, sroland, ubizjak |
Version: | 18.0 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
See Also: | https://bugzilla.redhat.com/show_bug.cgi?id=1580019 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Core dump from gdb while running Mesa 18.1.1-3
Steam output from start of steam to after game crash. apitrace of google maps in google chrome that asserts apitrace of google maps in google chrome that asserts with current mainline mesa |
Description
terikss
2018-06-15 12:21:10 UTC
Created attachment 140173 [details]
Steam output from start of steam to after game crash.
Definitely looks like a r600 issue, in the sb code (I suppose it would work with R600_DEBUG=nosb). An apitrace might be helpful (it should be possible to capture one with nosb so it doesn't crash and then replaying it without that should crash). Unless someone has some ideas what could be wrong with the sb code there... (In reply to terikss from comment #1) > Created attachment 140173 [details] > Steam output from start of steam to after game crash. I hit the same assert: /usr/include/c++/8/bits/stl_vector.h:932: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = r600_sb::value*; _Alloc = std::allocator<r600_sb::value*>; std::vector<_Tp, _Alloc>::reference = r600_sb::value*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed. with Google maps on Fedora 28 with mesa 18.0.5 ([AMD/ATI] Cypress XT [Radeon HD 5870]). Please see [1] for more information. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1580019 (In reply to ubizjak from comment #3) > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1580019 As reported in [1], full backtrace with debug info reads: Thread 1 "firefox" received signal SIGSEGV, Segmentation fault. 0x000055b7d4b054b2 in mozalloc_abort(char const*) () (gdb) bt #0 0x000055b7d4b054b2 in mozalloc_abort(char const*) () #1 0x000055b7d4af97b6 in mozilla::detail::ConditionVariableImpl::ConditionVariableImpl() [clone .cold.16] () #2 0x00007f94d056f965 in std::__replacement_assert(char const*, int, char const*, char const*) (__condition=0x7f94d0786638 "__builtin_expect(__n < this->size(), true)", __function=<synthetic pointer>, __line=932, __file=0x7f94d07884a0 "/usr/include/c++/8/bits/stl_vector.h") at /usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2389 #3 0x00007f94d056f965 in std::vector<r600_sb::value*, std::allocator<r600_sb::value*> >::operator[](unsigned long) (__n=2, this=0x7f94d904c820) at /usr/include/c++/8/bits/stl_vector.h:932 #4 0x00007f94d056f965 in r600_sb::expr_handler::fold_alu_op3(r600_sb::alu_node&) (this=0x7f94f3deeea8, n=...) at sb/sb_expr.cpp:951 #5 0x00007f94d056c0cd in r600_sb::expr_handler::try_fold(r600_sb::node*) (n=<optimized out>, this=<optimized out>) at sb/sb_expr.cpp:145 #6 0x00007f94d056c0cd in r600_sb::expr_handler::try_fold(r600_sb::value*) (this=<optimized out>, v=0x7f94d90616f0) at sb/sb_expr.cpp:136 #7 0x00007f94d058e3fc in r600_sb::value_table::add_value(r600_sb::value*) (this=0x7f94f3deee70, v=<optimized out>, v@entry=0x7f94d90616f0) at sb/sb_valtable.cpp:149 #8 0x00007f94d0573381 in r600_sb::gvn::process_op(r600_sb::node&, bool) (this=0x7ffeac463420, n=..., rewrite=rewrite@entry=true) at sb/sb_gvn.cpp:222 #9 0x00007f94d057356e in r600_sb::gvn::visit(r600_sb::alu_node&, bool) (this=<optimized out>, n=..., enter=<optimized out>) at sb/sb_gvn.cpp:97 #10 0x00007f94d0577570 in r600_sb::vpass::run_on(r600_sb::container_node&) (this=0x7ffeac463420, n=...) at sb/sb_ir.h:888 #11 0x00007f94d0577651 in r600_sb::vpass::run() (this=0x7ffeac463420) at sb/sb_pass.cpp:44 #12 0x00007f94d05670ba in r600_sb_bytecode_process () at sb/sb_core.cpp:205 #13 0x00007f94d0533bcd in r600_pipe_shader_create (ctx=ctx@entry=0x7f94fa76f000, shader=shader@entry=0x7f94d903f000, key=...) at r600_shader.c:215 #14 0x00007f94d053de64 in r600_shader_select (ctx=ctx@entry=0x7f94fa76f000, sel=0x7f94d903c000, dirty=dirty@entry=0x7ffeac464175) at r600_state_common.c:869 #15 0x00007f94d053f077 in r600_update_derived_state (rctx=0x7f94fa76f000) at r600_state_common.c:1687 #16 0x00007f94d053f077 in r600_draw_vbo (ctx=0x7f94fa76f000, info=0x7ffeac464490) at r600_state_common.c:1968 #17 0x00007f94d031e710 in u_vbuf_draw_vbo (mgr=0x7f94f90af000, info=<optimized out>) at util/u_vbuf.c:1150 #18 0x00007f94d0109a2f in st_draw_vbo (ctx=<optimized out>, prims=0x7ffeac464550, nr_prims=<optimized out>, ib=0x0, index_bounds_valid=<optimized out>, min_index=<optimized out>, max_index=<optimized out>, tfb_vertcount=0x0, stream=0, indirect=0x0) at state_tracker/st_draw.c:227 #19 0x00007f94d00ce70b in vbo_draw_arrays (ctx=ctx@entry=0x7f94ed8c2000, mode=mode@entry=4, start=start@entry=1320, count=count@entry=87, numInstances=numInstances@entry=1, baseInstance=baseInstance@entry=0, drawID=0) at vbo/vbo_exec_array.c:486 #20 0x00007f94d00ce79a in vbo_exec_DrawArraysInstanced (mode=4, start=1320, count=87, numInstances=1) at vbo/vbo_exec_array.c:677 #21 0x00007f96a8c92855 in mozilla::WebGLContext::DrawArraysInstanced(unsigned int, int, int, int, char const*) () at /usr/lib64/firefox/libxul.so #22 0x00007f96a892f967 in mozilla::dom::WebGLRenderingContextBinding::drawArrays(JSContext*, JS::Handle<JSObject*>, mozilla::WebGLContext*, JSJitMethodCallArgs const&) () at /usr/lib64/firefox/libxul.so #23 0x00007f96a8c0e1c9 in mozilla::dom::GenericBindingMethod(JSContext*, unsigned int, JS::Value*) () at /usr/lib64/firefox/libxul.so #24 0x000021a97a2b79c6 in () #25 0x0000000000000000 in () The value of alu_node& n in frame #4, where things go downhill: (gdb) f 4 #4 r600_sb::expr_handler::fold_alu_op3 (this=0x7f94f3deeea8, n=...) at sb/sb_expr.cpp:951 951 value* v2 = n.src[2]->gvalue(); (gdb) list 946 return true; 947 } 948 949 value* v0 = n.src[0]->gvalue(); 950 value* v1 = n.src[1]->gvalue(); 951 value* v2 = n.src[2]->gvalue(); 952 953 /* LDS instructions look like op3 with no dst - don't fold. */ 954 if (!n.dst[0]) 955 return false; (gdb) p n $1 = (r600_sb::alu_node &) @0x7f94d904c7d0: {<r600_sb::node> = {_vptr.node = 0x7f94d0b692f8 <vtable for r600_sb::alu_node+16>, prev = 0x7f94d904c738, next = 0x7f94d904c938, parent = 0x7f94d904c000, type = r600_sb::NT_OP, subtype = r600_sb::NST_ALU_INST, flags = r600_sb::NF_EMPTY, pred = 0x0, dst = std::vector of length 1, capacity 1 = {0x7f94d90616f0}, src = std::vector of length 2, capacity 3 = {0x7f94d905d110, 0x7f94d905cf70}}, bc = { op_ptr = 0x7f94d0b628a0 <r600_alu_op_table>, op = 0, src = {{sel = 252, chan = 2, neg = 0, abs = 0, rel = 0, value = {{i = 0, u = 0, f = 0}}}, {sel = 0, chan = 0, neg = 0, abs = 0, rel = 0, value = {{i = 0, u = 0, f = 0}}}, {sel = 252, chan = 2, neg = 0, abs = 0, rel = 0, value = {{i = 0, u = 0, f = 0}}}}, dst_gpr = 6, dst_chan = 2, dst_rel = 0, clamp = 0, omod = 0, bank_swizzle = 0, index_mode = 0, last = 1, pred_sel = 0, fog_merge = 0, write_mask = 0, update_exec_mask = 0, update_pred = 0, slot = 2, lds_idx_offset = 0, slot_flags = AF_VS}} Created attachment 140354 [details]
apitrace of google maps in google chrome that asserts
$ apitrace replay chrome.trace
132 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869314, attribList = {}) = 169869315
132: warning: unsupported glXCreateWindow call
133 @0 glXDestroyWindow(dpy = 0xb9b58c2b800, window = 169869315)
133: warning: unsupported glXDestroyWindow call
156 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869316, attribList = {}) = 169869317
156: warning: unsupported glXCreateWindow call
692 @0 glXDestroyWindow(dpy = 0xb9b58c2b800, window = 169869317)
692: warning: unsupported glXDestroyWindow call
716 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869322, attribList = {}) = 169869323
716: warning: unsupported glXCreateWindow call
1588 @0 glXCreateWindow(dpy = 0xb9b58c2b800, config = 0xb9b58fbd1c0, win = 169869324, attribList = {}) = 169869331
1588: warning: unsupported glXCreateWindow call
/usr/include/c++/8/bits/stl_vector.h:932: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = r600_sb::value*; _Alloc = std::allocator<r600_sb::value*>; std::vector<_Tp, _Alloc>::reference = r600_sb::value*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
apitrace: warning: caught signal 6
113301: error: caught an unhandled exception
/usr/bin/glretrace+0x299350
/lib64/libpthread.so.0+0x11fbf
/lib64/libc.so.6: gsignal+0x10b
/lib64/libc.so.6: abort+0x12a
/usr/lib64/dri/r600_dri.so+0x6c4964
/usr/lib64/dri/r600_dri.so+0x6c10cc
/usr/lib64/dri/r600_dri.so+0x6e33fb
/usr/lib64/dri/r600_dri.so+0x6c8380
/usr/lib64/dri/r600_dri.so+0x6c856d
/usr/lib64/dri/r600_dri.so+0x6cc56f
/usr/lib64/dri/r600_dri.so+0x6cc650
/usr/lib64/dri/r600_dri.so+0x6bc0b9
/usr/lib64/dri/r600_dri.so+0x688bcc
/usr/lib64/dri/r600_dri.so+0x692e63
/usr/lib64/dri/r600_dri.so+0x694076
/usr/lib64/dri/r600_dri.so+0x47370f
/usr/lib64/dri/r600_dri.so+0x25ea2e
/usr/lib64/dri/r600_dri.so+0x22370a
/usr/lib64/dri/r600_dri.so+0x223a4b
/usr/bin/glretrace+0x1325ae
/usr/bin/glretrace+0x649de
/usr/bin/glretrace+0x65184
/usr/bin/glretrace+0x5fbcf
/lib64/libc.so.6: __libc_start_main+0xea
/usr/bin/glretrace: _start+0x29
?
apitrace: info: taking default action for signal 6
At a quick try I can't reproduce this on my HD 5750. Albeit I tried chromium, and probably not the same version. The replay doesn't assert neither, though I get lots of link errors due to usage of glProgramBinary it seems (also needed gl/glsl version overrides). Might need to disable ARB_get_program_binary when capturing? From the backtrace it sort of looks like trying to fold a alu3 op with only 2 sources but I've no idea really. I hope it's not compiler dependent (valgrind was of no use neither). Since this appears to be a regression, you could try to git bisect. (In reply to Roland Scheidegger from comment #6) > At a quick try I can't reproduce this on my HD 5750. Albeit I tried > chromium, and probably not the same version. > The replay doesn't assert neither, though I get lots of link errors due to > usage of glProgramBinary it seems (also needed gl/glsl version overrides). > Might need to disable ARB_get_program_binary when capturing? > From the backtrace it sort of looks like trying to fold a alu3 op with only > 2 sources but I've no idea really. I hope it's not compiler dependent > (valgrind was of no use neither). > Since this appears to be a regression, you could try to git bisect. Please configure the build with: CXXFLAGS="-Wp,-D_GLIBCXX_ASSERTIONS" ./autogen.sh Rest assured that the assert will be triggered. I have tried with build of Mesa 18.0.5 (git-ca0037aaef). Just invoke firefox from command line and navigate to Google Maps. Created attachment 140382 [details]
apitrace of google maps in google chrome that asserts with current mainline mesa
Attached trace that fails with current mainline
commit e83cd38eac26167cdab9205f17fd276c73daf3fb (HEAD -> master, origin/master, origin/HEAD)
when configured with _GLIBCXX_ASSERTIONS.
There are a bunch of harmless warnings, but the trace works with R600_DEBUG=nosb and fails without:
132 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452226, attribList = {}) = 182452227
132: warning: unsupported glXCreateWindow call
133 @0 glXDestroyWindow(dpy = 0xb154f920800, window = 182452227)
133: warning: unsupported glXDestroyWindow call
156 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452228, attribList = {}) = 182452229
156: warning: unsupported glXCreateWindow call
694 @0 glXDestroyWindow(dpy = 0xb154f920800, window = 182452229)
694: warning: unsupported glXDestroyWindow call
718 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452234, attribList = {}) = 182452235
718: warning: unsupported glXCreateWindow call
1591 @0 glXCreateWindow(dpy = 0xb154f920800, config = 0xb154fcc10e0, win = 182452236, attribList = {}) = 182452243
1591: warning: unsupported glXCreateWindow call
4762 @0 glProgramBinary(program = 6, binaryFormat = 34655, binary = blob(5017), length = 5017)
4762: warning: link failed
4879 @0 glProgramBinary(program = 9, binaryFormat = 34655, binary = blob(4629), length = 4629)
4879: warning: link failed
4966 @0 glProgramBinary(program = 12, binaryFormat = 34655, binary = blob(5177), length = 5177)
4966: warning: link failed
60680 @0 glProgramBinary(program = 24, binaryFormat = 34655, binary = blob(6349), length = 6349)
60680: warning: link failed
61187 @0 glProgramBinary(program = 27, binaryFormat = 34655, binary = blob(11341), length = 11341)
61187: warning: link failed
67815 @0 glProgramBinary(program = 30, binaryFormat = 34655, binary = blob(6993), length = 6993)
67815: warning: link failed
/usr/include/c++/8/bits/stl_vector.h:932: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = r600_sb::value*; _Alloc = std::allocator<r600_sb::value*>; std::vector<_Tp, _Alloc>::reference = r600_sb::value*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
apitrace: warning: caught signal 6
68048: error: caught an unhandled exception
/usr/bin/glretrace+0x299350
/lib64/libpthread.so.0+0x11fbf
/lib64/libc.so.6: gsignal+0x10b
/lib64/libc.so.6: abort+0x12a
/ssd/uros/git/mesa/lib/gallium//r600_dri.so+0x639afa
...
(In reply to ubizjak from comment #7) > Please configure the build with: > > CXXFLAGS="-Wp,-D_GLIBCXX_ASSERTIONS" ./autogen.sh That didn't do anything neither. However I figured out the problem more or less in the code, and some googling said that using -D_GLIBCXX_DEBUG should make it trigger reliably, and indeed it does... The issue is that (you already showed that actually) src = std::vector of length 2, capacity 3 = {0x7f94d905d110, 0x7f94d905cf70}} And trying to access element src[2]. There's an early exit in the function if src.size() is < 3. Since this didn't hit, apparently fold_assoc() resized the vector. And indeed it can do that (there's an explicit n->src.resize(2) somewhere, and it would still return false in this case). I think something like this should do: diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp index 1df78da660..c77b9f2d7d 100644 --- a/src/gallium/drivers/r600/sb/sb_expr.cpp +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp @@ -945,6 +945,8 @@ bool expr_handler::fold_alu_op3(alu_node& n) { if (!sh.safe_math && (n.bc.op_ptr->flags & AF_M_ASSOC)) { if (fold_assoc(&n)) return true; + else if (n.src.size() < 3) + return fold_alu_op2(n); } value* v0 = n.src[0]->gvalue(); But I'm not entirely convinced it's really the right thing to do (maybe what fold_assoc() did isn't quite what it's supposed to do?). It fixes the particular fold_alu_op3 crash for me, but the shader (not sure it's actually the same one) crashes later anyway: /usr/include/c++/4.8/debug/safe_iterator.h:225: Error: attempt to copy from a singular iterator. Objects involved in the operation: iterator "this" @ 0x0x7ffff3c71e80 { type = Thread 1 "glretrace" received signal SIGSEGV, Segmentation fault. ... #3 0x00007ffff36538cb in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<r600_sb::region_node**, std::__cxx1998::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > >, std::__debug::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > >::operator= (this=0x7fffffffb840, __x=) at /usr/include/c++/4.8/debug/safe_iterator.h:221 #4 0x00007ffff365376d in std::reverse_iterator<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<r600_sb::region_node**, std::__cxx1998::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > >, std::__debug::vector<r600_sb::region_node*, std::allocator<r600_sb::region_node*> > > >::operator= (this=0x7fffffffb840) at /usr/include/c++/4.8/bits/stl_iterator.h:96 #5 r600_sb::if_conversion::run (this=0x7fffffffbee0) at sb/sb_if_conversion.cpp:46 #6 0x00007ffff3632765 in r600_sb_bytecode_process (rctx=0x10e4660, bc=0x1980bf0, pshader=0x1980be8, dump_bytecode=1, optimize=1) at sb/sb_core.cpp:195 I don't know though if that's just due to the D_GLIBCXX_DEBUG thing or it will also cause crashes without it in some other libstdc++ versions... (in any case, it probably should be fixed, but this code isn't my area of expertise). https://patchwork.freedesktop.org/series/45627/ should fix the second breakage, after Roland's fix for the first one. (In reply to Roland Scheidegger from comment #9) > (In reply to ubizjak from comment #7) > > Please configure the build with: > > > > CXXFLAGS="-Wp,-D_GLIBCXX_ASSERTIONS" ./autogen.sh > > That didn't do anything neither. However I figured out the problem more or > less in the code, and some googling said that using -D_GLIBCXX_DEBUG should > make it trigger reliably, and indeed it does... Great ;) > The issue is that (you already showed that actually) > src = std::vector of length 2, capacity 3 = {0x7f94d905d110, 0x7f94d905cf70}} > And trying to access element src[2]. > There's an early exit in the function if src.size() is < 3. Since this > didn't hit, apparently fold_assoc() resized the vector. And indeed it can do > that (there's an explicit n->src.resize(2) somewhere, and it would still > return false in this case). > > I think something like this should do: > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp > b/src/gallium/drivers/r600/sb/sb_expr.cpp > index 1df78da660..c77b9f2d7d 100644 > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp > @@ -945,6 +945,8 @@ bool expr_handler::fold_alu_op3(alu_node& n) { > if (!sh.safe_math && (n.bc.op_ptr->flags & AF_M_ASSOC)) { > if (fold_assoc(&n)) > return true; > + else if (n.src.size() < 3) > + return fold_alu_op2(n); > } > > value* v0 = n.src[0]->gvalue(); I wonder if we should fix expr_handler::fold_assoc instead. Digging through the code, fold_assoc is called only from expr_handler::fold_alu_op2 and expr_handler::fold_alu_op3. When true is returned, it triggers an early exit from expr_handler::fold_alu_op{2,3} functions. The part we are looking for in expr_handler::fold_assoc is: } else { // MULADD => ADD n->src[0] = n->src[2]; n->bc.src[0] = n->bc.src[2]; n->src[1] = sh.get_const_value(cr); memset(&n->bc.src[1], 0, sizeof(bc_alu_src)); n->src.resize(2); n->bc.set_op(ALU_OP2_ADD); } So, let's call fold_alu_op2 here and return true to trigger early exit in expr_handler::fold_alu_op3. This is what the code a couple of lines above the presented code does when an operand degenerates to mov. The (effectively the same patch as yours) proposed patch would be: diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp index 7a5d62c8e8..a609d1377f 100644 --- a/src/gallium/drivers/r600/sb/sb_expr.cpp +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp @@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) { n->src.resize(2); n->bc.set_op(ALU_OP2_ADD); + fold_alu_op2(*n); + return true; } } else if (last_arg >= 0) { n->src[0] = a->src[last_arg]; WDYT? On a side note, maybe -D_GLIBCXX_ASSERTIONS should be added to mesa testsuite. This is the flag that Fedora 28 builds use by default now, so it would be beneficial to catch these bugs early in the development cycle, before they reach users. (In reply to ubizjak from comment #11) > The (effectively the same patch as yours) proposed patch would be: > > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp > b/src/gallium/drivers/r600/sb/sb_expr.cpp > index 7a5d62c8e8..a609d1377f 100644 > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp > @@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) { > > n->src.resize(2); > n->bc.set_op(ALU_OP2_ADD); > + fold_alu_op2(*n); > + return true; > } > } else if (last_arg >= 0) { > n->src[0] = a->src[last_arg]; > > WDYT? I am not quite convinced it's ok to return true (in fold_alu_op3) if the expression hasn't really been folded. You are quite right that just above it looks similar, but all other places always return the return value of fold_alu_op2 when calling into it from fold_alu_op3. (Not saying it isn't correct, just saying I can't tell...) > On a side note, maybe -D_GLIBCXX_ASSERTIONS should be added to mesa > testsuite. This is the flag that Fedora 28 builds use by default now, so it > would be beneficial to catch these bugs early in the development cycle, > before they reach users. I was actually going to suggest to enable -D_GLIBCXX_DEBUG for debug builds always, but it wouldn't work in general (due to not being able to link anything which hasn't been compiled with it, for instance llvm). So using -D_GLIBCXX_ASSERTIONS instead (which should be link-compatible) looks like a good idea to me, albeit it didn't do anything for me (too old libstdc++ I suppose). I'm also not quite sure which issues it actually catches (vs the DEBUG one) but even if it just catches some that should be a plus... (In reply to Roland Scheidegger from comment #12) > (In reply to ubizjak from comment #11) > > The (effectively the same patch as yours) proposed patch would be: > > > > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp > > b/src/gallium/drivers/r600/sb/sb_expr.cpp > > index 7a5d62c8e8..a609d1377f 100644 > > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp > > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp > > @@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) { > > > > n->src.resize(2); > > n->bc.set_op(ALU_OP2_ADD); > > + fold_alu_op2(*n); > > + return true; > > } > > } else if (last_arg >= 0) { > > n->src[0] = a->src[last_arg]; > > > > WDYT? > > I am not quite convinced it's ok to return true (in fold_alu_op3) if the > expression hasn't really been folded. You are quite right that just above it > looks similar, but all other places always return the return value of > fold_alu_op2 when calling into it from fold_alu_op3. > (Not saying it isn't correct, just saying I can't tell...) Beeing a newcomer, I'm also not too familiar with this code, and there are no comments on what the return value really means. Instead of guessing, is it possible for you to invite the author or other knowledgeable people from mesa community to the discussion in this bugreport? Looking at the commit logs of src/gallium/drivers/r600/sb, there are quite some experts that can perhaps help here... I think Roland's first patch is correct, just call fold_alu_op2 if we get back 2 sources from fold_assoc. return true is for when we've finished all folding on that instruction, so I don't think that's correct without calling fold_alu_op2 (In reply to Dave Airlie from comment #14) > I think Roland's first patch is correct, just call fold_alu_op2 if we get > back 2 sources from fold_assoc. return true is for when we've finished all > folding on that instruction, so I don't think that's correct without calling > fold_alu_op2 LGTM then (it also fixes the failure for me), but can we please drop the "else" from the patch, it is really confusing. (In reply to Dave Airlie from comment #14) > I think Roland's first patch is correct, just call fold_alu_op2 if we get > back 2 sources from fold_assoc. return true is for when we've finished all > folding on that instruction, so I don't think that's correct without calling > fold_alu_op2 Well it would still be called (in fold_assoc instead) in this case so the only difference in the end really is that fold_alu_op3 would return false instead of true if fold_alu_op2 (called when fold_assoc reduced number of src to 2) returned false. If the return value is just indicating if we're finished with folding, then I suppose it does not actually matter for correctness either way This code was all written by Vadim Girlin, but I don't think he's still around here. (In reply to ubizjak from comment #15) > (In reply to Dave Airlie from comment #14) > > I think Roland's first patch is correct, just call fold_alu_op2 if we get > > back 2 sources from fold_assoc. return true is for when we've finished all > > folding on that instruction, so I don't think that's correct without calling > > fold_alu_op2 > > LGTM then (it also fixes the failure for me), but can we please drop the > "else" from the patch, it is really confusing. Sure that can be done. (I was actually also wondering if the test for n.src.size() < 3 at the beginning of the function is actually necessary, are we really supposed to have already fewer sources when this is called? But I couldn't quite answer that. This code is a bit confusing to me.) (In reply to Roland Scheidegger from comment #12) > (In reply to ubizjak from comment #11) > > The (effectively the same patch as yours) proposed patch would be: > > > > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp > > b/src/gallium/drivers/r600/sb/sb_expr.cpp > > index 7a5d62c8e8..a609d1377f 100644 > > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp > > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp > > @@ -714,6 +714,8 @@ bool expr_handler::fold_assoc(alu_node *n) { > > > > n->src.resize(2); > > n->bc.set_op(ALU_OP2_ADD); > > + fold_alu_op2(*n); > > + return true; > > } > > } else if (last_arg >= 0) { > > n->src[0] = a->src[last_arg]; > > > > WDYT? > > I am not quite convinced it's ok to return true (in fold_alu_op3) if the > expression hasn't really been folded. You are quite right that just above it > looks similar, but all other places always return the return value of > fold_alu_op2 when calling into it from fold_alu_op3. > (Not saying it isn't correct, just saying I can't tell...) I guess it is OK. There already is the same pattern in fold_mul_add where it tries to fold: > ADD x, MUL(y, z) into: > MULADD y, z, x and when it succeeds, it does: > > fold_alu_op3(*n); > return true; That means that no matter if fold_alu_op3 folds the instruction any further, fold_mul_add returns true because it converted ADD and MUL into MULADD. In fold_alu_op2 where fold_mul_add is called, the return value is used in an early return condition: > if (n.bc.op == ALU_OP2_ADD) { > if (fold_mul_add(&n)) > return true; > } It seems OK to me to return true after calling fold_alu_op2 in fold_alu_op3 as a sign that the node should not be folded as op3 anymore. Also, the return value of fold_alu_op* is thrown away anyway here: > bool expr_handler::try_fold(value* v) { > assert(!v->gvn_source); > > if (v->def) Here expr_handler::try_fold(node* n) calls the fold_alu_op* methods: > try_fold(v->def); > > if (v->gvn_source) > return true; > > return false; > } I like the patch that ubizjak@gmail.com posted: > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp > index 1df78da660..8cbff6f577 100644 > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp > @@ -723,6 +723,8 @@ bool expr_handler::fold_assoc(alu_node *n) { > > n->src.resize(2); > n->bc.set_op(ALU_OP2_ADD); > + fold_alu_op2(n); > + return true; > } > } else if (last_arg >= 0) { > n->src[0] = a->src[last_arg]; By the way, I can reproduce the same problem when I open Google Maps in Firefox. The tab crashes during loading. After few crashes it stops to use WebGL and then it starts to work. Sorry for the unnecessary bump. I did not notice that a patch is already in the staging/18.1 branch: https://github.com/mesa3d/mesa/commit/d68f2d7edeedebf2db58b28a9ac6befb7757afb7 Roland, is this fixed by? commit 817efd89685efc6b5866e09cbdad01c4ff21c737 Author: Roland Scheidegger <sroland@vmware.com> Date: Wed Jul 4 04:44:17 2018 +0200 r600/sb: fix crash in fold_alu_op3 fold_assoc() called from fold_alu_op3() can lower the number of src to 2, which then leads to an invalid access to n.src[2]->gvalue(). This didn't seem to have caused much harm in the past, but on Fedora 28 it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used, although with libstdc++ 4.8.5 this didn't do anything, -D_GLIBCXX_DEBUG was needed to show the issue). An alternative fix would be to instead call fold_alu_op2() from within fold_assoc() when the number of src is reduced and return always TRUE from fold_assoc() in this case, with the only actual difference being the return value from fold_alu_op3() then. I'm not sure what the return value actually should be in this case (or whether it even can make a difference). https://bugs.freedesktop.org/show_bug.cgi?id=106928 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Dave Airlie <airlied@redhat.com> Yes, should be fixed. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.