Bug 98595

Summary: glsl: ralloc assertion "info->canary == CANARY" failed
Product: Mesa
Reporter: Jonathan Gray <jsg>
Component: glsl-compiler
Assignee: mesa-dev
Status: RESOLVED WONTFIX
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: OpenBSD
Whiteboard:
i915 platform:
i915 features:
Attachments: proposed fix (or work-around)

Description Jonathan Gray 2016-11-05 02:54:20 UTC
With Mesa git (0c17b0b6f089e325de6a3f871c8d799326be4202) I now see the following assertion triggered on OpenBSD/amd64 with i965 and Broadwell (Intel HD Graphics 5500).

X.Org X Server 1.18.4
Release Date: 2016-07-19
X Protocol Version 11, Revision 0
Build Operating System: OpenBSD 6.0 amd64 
Current Operating System: OpenBSD stan.jsg.id.au 6.0 GENERIC.MP#2 amd64
Build Date: 19 October 2016  01:54:48PM
 
Current version of pixman: 0.34.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat Nov  5 13:39:12 2016
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/X11R6/share/X11/xorg.conf.d"
Mesa warning: Kernel 4.1 required to properly query GPU properties.

assertion "info->canary == CANARY" failed: file "ralloc.c", line 84, function "get_header"

Program received signal SIGABRT, Aborted.
0x000008d39043788a in thrkill () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
Current language:  auto; currently asm
(gdb) bt
#0  0x000008d39043788a in thrkill () at <stdin>:2
#1  0x000008d3903e9559 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x000008d390411914 in *_libc___assert2 (file=Variable "file" is not available.
) at /usr/src/lib/libc/gen/assert.c:52
#3  0x000008d2be38d4f5 in get_header (ptr=Variable "ptr" is not available.
) at ralloc.c:84
#4  0x000008d2be38d50d in ralloc_set_destructor (ptr=Variable "ptr" is not available.
) at ralloc.c:347
#5  0x000008d2be3618e6 in glsl_symbol_table::add_type (this=0x8d36c6f91b0, name=0x8d38a69c8f0 "void", t=0x8d2beaba7e0) at glsl/glsl_symbol_table.cpp:30
#6  0x000008d2be3303ea in _mesa_glsl_initialize_types (state=0x8d325402830) at glsl/builtin_types.cpp:270
#7  0x000008d2be3578a5 in _mesa_glsl_parse (state=0x8d325402830) at glsl_parser.yy:311
#8  0x000008d2be360313 in _mesa_glsl_compile_shader (ctx=0x8d2baac2030, shader=0x8d2daf2d230, dump_ast=false, dump_hir=false) at glsl/glsl_parser_extras.cpp:1911
#9  0x000008d2be1c3c72 in _mesa_compile_shader (ctx=0x8d2baac2030, sh=0x8d2daf2d230) at main/shaderapi.c:1034
#10 0x000008d372d30986 in glamor_compile_glsl_prog () from /usr/X11R6/lib/modules/libglamoregl.so
#11 0x000008d372d30f1c in glamor_init_finish_access_shaders () from /usr/X11R6/lib/modules/libglamoregl.so
#12 0x000008d372d2c328 in glamor_init () from /usr/X11R6/lib/modules/libglamoregl.so
#13 0x000008d2f9fb5a99 in ScreenInit () from /usr/X11R6/lib/modules/drivers/modesetting_drv.so
#14 0x000008d0a390d60a in AddScreen () from /usr/X11R6/bin/Xorg
#15 0x000008d0a395b58c in InitOutput () from /usr/X11R6/bin/Xorg
#16 0x000008d0a3918448 in dix_main () from /usr/X11R6/bin/Xorg
#17 0x000008d0a3901942 in _start () from /usr/X11R6/bin/Xorg
#18 0x0000000000000000 in ?? ()
Comment 1 Timothy Arceri 2016-11-05 07:30:01 UTC

*** This bug has been marked as a duplicate of bug 98592 ***
Comment 2 Marek Olšák 2016-11-05 17:52:59 UTC
For some reason, OpenBSD thinks symbol_table_entry has a non-trivial destructor and calls ralloc_set_destructor. However, it doesn't have a non-trivial destructor. This looks like a C++ compiler bug.
Comment 3 Mark Janes 2016-11-05 18:03:28 UTC
I can't say that this is a duplicate of bug 98592, which was fixed for intel hardware on linux.
Comment 4 Marek Olšák 2016-11-05 18:07:56 UTC
It's not a duplicate of that bug.

This bug is different. It looks like the HAS_TRIVIAL_DESTRUCTOR macro isn't done correctly on OpenBSD.
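
To illustrate what such a check is meant to decide, here is a minimal sketch. It uses the standard C++11 trait std::is_trivially_destructible rather than Mesa's own macro, and both struct names are hypothetical stand-ins, not Mesa types.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a plain entry type: no user-declared destructor
// and only trivially destructible members, so its destructor is trivial.
struct plain_entry {
    const void *type;
    int kind;
};

// Hypothetical counter-example: the std::string member must be destroyed,
// so the implicitly generated destructor is non-trivial.
struct owning_entry {
    std::string name;
};

// An allocator wrapper only needs to register a destructor callback
// when this returns true.
template <typename T>
constexpr bool needs_destructor_callback()
{
    return !std::is_trivially_destructible<T>::value;
}

static_assert(!needs_destructor_callback<plain_entry>(), "trivial destructor");
static_assert(needs_destructor_callback<owning_entry>(), "non-trivial destructor");
```

If a compiler answered "non-trivial" for a type like plain_entry, the wrapper would register a callback that is never needed, which matches the symptom described in comment 2.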
Comment 5 Jonathan Gray 2016-11-06 00:47:41 UTC
Indeed the abort is not triggered with gcc 4.9 but is with gcc 4.2.1.

For gcc < 4.4.3 and for clang, HAS_TRIVIAL_DESTRUCTOR is set to __has_trivial_destructor(T); otherwise it is (false).

Everything worked last weekend, though that was before cc6aa1d161280f10ded7834d1ec2413bc97589fe.
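
A sketch of why the (false) fallback matters, under stated assumptions: the macro name, the C++11 condition, and the counter below are all hypothetical, and the standard trait is swapped in for the compiler intrinsic. It only shows the pattern of "use a real check where available, otherwise assume non-trivial".

```cpp
#include <cassert>
#include <new>

// Hypothetical macro modelled on the description above: use a real check
// where the toolchain offers one, otherwise fall back to (false).
#if __cplusplus >= 201103L || (defined(_MSVC_LANG) && _MSVC_LANG >= 201103L)
#  include <type_traits>
#  define SKETCH_HAS_TRIVIAL_DESTRUCTOR(T) (std::is_trivially_destructible<T>::value)
#else
#  define SKETCH_HAS_TRIVIAL_DESTRUCTOR(T) (false)
#endif

struct sketch_pod_node { int value; };

// Counts how often a destructor callback would have been registered.
static int sketch_registrations = 0;

// Stand-in for an allocator's operator new: it registers a destructor
// callback unless the macro proves the destructor is trivial.
template <typename T>
T *sketch_construct(void *storage)
{
    if (!SKETCH_HAS_TRIVIAL_DESTRUCTOR(T))
        ++sketch_registrations;  // with the (false) fallback this always runs
    return new (storage) T();
}
```

With the fallback in effect, every type takes the registration path, including types that would otherwise skip it. That would be consistent with the abort appearing only on the older compiler.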
Comment 6 Timothy Arceri 2016-11-06 05:44:25 UTC
(In reply to Jonathan Gray from comment #5)
> Indeed the abort is not triggered with gcc 4.9 but is with gcc 4.2.1.
> 
> For gcc < 4.4.3 and for clang, HAS_TRIVIAL_DESTRUCTOR is set to
> __has_trivial_destructor(T); otherwise it is (false).
> 
> Everything worked last weekend, though that was before
> cc6aa1d161280f10ded7834d1ec2413bc97589fe.

Did you bisect to that commit? Apart from me jumping the gun (sorry about that) and marking this as a duplicate of a bug caused by that change, it doesn't look like it should have anything to do with this bug.

More likely it has to do with the linear allocator changes that landed 6 days ago, e.g.
23e373eb4f9ca374313306701890642c30e8877e
Comment 7 Jonathan Gray 2016-11-06 11:11:31 UTC
Sorry, I picked that commit just from looking over the commit log.  An actual bisect results in
"a4a93103fb8f5c21c4cd17e89f07badfab14c0ab is the first bad commit".

glsl: use the linear allocator for ast_node and derived classes
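
One plausible reading of the backtrace together with this bisect result: a pointer handed out by a linear (arena) allocator has no per-object header in front of it, so a header lookup that checks a canary fails. The following is a simplified model of that situation, not Mesa's ralloc; every name and the canary value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified model of a header-prefixed allocator (hypothetical names).
static const unsigned SKETCH_CANARY = 0x5A1106u;

struct alignas(std::max_align_t) sketch_header {
    unsigned canary;
    void (*destructor)(void *);
};

// Allocate a payload preceded by a header carrying the canary.
void *sketch_alloc(std::size_t size)
{
    void *raw = std::malloc(sizeof(sketch_header) + size);
    if (!raw)
        return nullptr;
    sketch_header h = { SKETCH_CANARY, nullptr };
    std::memcpy(raw, &h, sizeof h);
    return static_cast<unsigned char *>(raw) + sizeof h;
}

void sketch_free(void *ptr)
{
    std::free(static_cast<unsigned char *>(ptr) - sizeof(sketch_header));
}

// Returns true only if the bytes just before ptr look like a header.
bool sketch_has_header(const void *ptr)
{
    sketch_header h;
    std::memcpy(&h, static_cast<const unsigned char *>(ptr) - sizeof h, sizeof h);
    return h.canary == SKETCH_CANARY;
}

// Carve a sub-allocation out of one larger block, the way an arena
// allocator does: the child has no header of its own in front of it.
bool sketch_demo_arena_pointer_fails_check()
{
    unsigned char *arena = static_cast<unsigned char *>(sketch_alloc(256));
    if (!arena)
        return false;
    std::memset(arena, 0, 256);
    void *child = arena + 64;  // preceded by zeroed arena bytes, not a header
    bool parent_ok = sketch_has_header(arena);
    bool child_ok = sketch_has_header(child);
    sketch_free(arena);
    return parent_ok && !child_ok;
}
```

In this model the arena block itself passes the check while the child carved out of it does not, which is the kind of mismatch an assertion on the canary would catch.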
Comment 8 Brian Paul 2016-11-09 16:40:20 UTC
FYI: I'm seeing the same assertion/regression with Visual Studio.
Comment 9 Brian Paul 2016-11-09 18:03:25 UTC
Created attachment 127868
proposed fix (or work-around)

This fixes things for me with MSVC.  I'm not 100% sure whether there are any negative side effects since I'm still learning my way through ralloc and the new linear allocator.

Comments?
Comment 10 Brian Paul 2016-11-09 22:06:34 UTC
I pushed 5b92008ae279962dc09bcf98c9e5511a325a2bd9, which is a tweaked version of George Kyriazis's patch.  He says he has a v2 patch.  And since this is still broken with older gcc, I'm leaving this open for now.
Comment 11 Jonathan Gray 2016-11-09 23:03:57 UTC
The proposed fix stops the assertion for me.

With i965 I now hit a different problem that seems to be unrelated: the (OpenBSD) kernel logs
error: [drm:pid19649:i915_context_is_banned] *ERROR* context hanging too fast, declaring banned!

Running Xorg with softpipe is fine though.
Comment 12 Brian Paul 2016-11-14 17:32:57 UTC
(In reply to Jonathan Gray from comment #11)
> The proposed fix stops the assertion for me.
> 
> With i965 I now hit a different problem that seems to be unrelated: the
> (OpenBSD) kernel logs
> error: [drm:pid19649:i915_context_is_banned] *ERROR* context hanging too
> fast, declaring banned!
> 
> Running Xorg with softpipe is fine though.

If it's an unrelated issue, please create a new bug report and close this one.  Thanks.
Comment 13 Jonathan Gray 2016-11-16 22:48:00 UTC
The "proposed fix" patch attached here does not seem to have been applied to master yet?
Comment 14 Brian Paul 2016-11-16 22:54:49 UTC
(In reply to Jonathan Gray from comment #13)
> The "proposed fix" patch attached here does not seem to have been applied to
> master yet?

A different, better fix was committed and that fixed the assertion.
Comment 15 Jonathan Gray 2016-11-16 23:31:16 UTC
It remains broken here with latest master (a456ea17fb460a68e28c13dd4b7086dc4309f410).  Is the fix you are referring to the commit related to Microsoft compilers or a different one?

Starting program: /usr/X11R6/bin/glxgears 
assertion "info->canary == CANARY" failed: file "ralloc.c", line 84, function "get_header"

Program received signal SIGABRT, Aborted.
0x000002b372f3794a in thrkill () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
Current language:  auto; currently asm
(gdb) bt
#0  0x000002b372f3794a in thrkill () at <stdin>:2
#1  0x000002b372f4a499 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x000002b372f69d04 in *_libc___assert2 (file=Variable "file" is not available.
) at /usr/src/lib/libc/gen/assert.c:52
#3  0x000002b32df6e9e5 in get_header (ptr=Variable "ptr" is not available.
) at ralloc.c:84
#4  0x000002b32df6e9fd in ralloc_set_destructor (ptr=Variable "ptr" is not available.
) at ralloc.c:347
#5  0x000002b32df42db6 in glsl_symbol_table::add_type (this=0x2b2f1c9b230, name=0x2b34faa4730 "void", 
    t=0x2b32e64b360) at glsl/glsl_symbol_table.cpp:30
#6  0x000002b32df1179a in _mesa_glsl_initialize_types (state=0x2b2d8467830) at glsl/builtin_types.cpp:270
#7  0x000002b32dcf5b00 in _mesa_get_fixed_func_fragment_program (ctx=0x2b34c248000)
    at main/ff_fragment_shader.cpp:1226
#8  0x000002b32dda5c95 in _mesa_update_state_locked (ctx=0x2b34c248000) at main/state.c:169
#9  0x000002b32dda5d95 in _mesa_update_state (ctx=0x2b34c248000) at main/state.c:497
#10 0x000002b32dcbaa80 in _mesa_Clear (mask=16640) at main/clear.c:172
#11 0x000002b0ba00192e in ?? () from /usr/X11R6/bin/glxgears
#12 0x000002b0ba00256d in ?? () from /usr/X11R6/bin/glxgears
#13 0x000002b0ba0007d2 in ?? () from /usr/X11R6/bin/glxgears
#14 0x0000000000000000 in ?? ()
Comment 16 Emil Velikov 2016-11-29 18:51:07 UTC
Not sure how exactly things are supposed to behave if we have a C++ construct wrapped in extern "C".

Namely: shouldn't the DECLARE_ALLOC_CXX_OPERATORS_TEMPLATE macro be outside of the extern "C" wrapper?
Comment 17 Marek Olšák 2016-12-09 00:22:05 UTC
Macros are unaffected by extern "C".
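
A small sketch of that point, with hypothetical names throughout: a macro defined inside an extern "C" block is only recorded by the preprocessor, so the linkage specification applies where the macro is expanded, not where it is defined.

```cpp
#include <cassert>

extern "C" {

// Defined inside extern "C"; the preprocessor stores it as plain text.
#define SKETCH_DECLARE_GETTER(field)                            \
    template <typename T>                                       \
    T field##_as() const { return static_cast<T>(field); }

// An ordinary declaration in the same block does get C linkage.
int sketch_c_function(int x);

}  // extern "C"

// The macro expands here, outside the extern "C" block, so the member
// template it produces is ordinary C++.
struct sketch_node {
    int weight;
    SKETCH_DECLARE_GETTER(weight)
};

extern "C" int sketch_c_function(int x)
{
    return x + 1;
}
```

The struct compiles even though templates cannot have C linkage, because nothing template-related is actually declared inside the extern "C" block.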
