Bug 108933 - Unreal Tournament (UT99) segfault on opengl init
Summary: Unreal Tournament (UT99) segfault on opengl init
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core (show other bugs)
Version: git
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-03 18:59 UTC by network723
Modified: 2018-12-12 17:28 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
valgrind log (339.05 KB, text/x-log)
2018-12-06 07:44 UTC, network723
Workaround for mesa crashing on UT99 because of static global constructor from C++ iostream (4.14 KB, patch)
2018-12-08 14:51 UTC, iive

Description network723 2018-12-03 18:59:58 UTC
Unreal Tournament crashes on OpenGL context creation. I tried both the stock OpenGLDrv.so and the UTPG one.

Mesa git 89b4798c0619a2ba99046d5ad36f0e6851625f7a; I tried both radeonsi and llvmpipe, with the same result.


The game used to work with Mesa 18.0.

Program received signal SIGSEGV, Segmentation fault.
0xeee7b5e2 in ?? () from /usr/lib/libstdc++.so.6
(gdb) bt
#0  0xeee7b5e2 in ?? () from /usr/lib/libstdc++.so.6
#1  0xeedeaa4a in bool std::has_facet<std::ctype<char> >(std::locale const&) ()
   from /usr/lib/libstdc++.so.6
#2  0xeeddca1f in std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) () from /usr/lib/libstdc++.so.6
#3  0xeeddce8b in std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) () from /usr/lib/libstdc++.so.6
#4  0xeed82018 in std::ios_base::Init::Init() () from /usr/lib/libstdc++.so.6
#5  0xf464dbcc in _GLOBAL__sub_I_st_glsl_to_tgsi_array_merge.cpp () from /usr/lib/dri/swrast_dri.so
#6  0xf7fe5d3b in call_init.part () from /lib/ld-linux.so.2
#7  0xf7fe5e47 in _dl_init () from /lib/ld-linux.so.2
#8  0xf7fea3f2 in dl_open_worker () from /lib/ld-linux.so.2
#9  0xf7985c9b in _dl_catch_error () from /lib/libc.so.6
#10 0xf7fe9a69 in _dl_open () from /lib/ld-linux.so.2
#11 0xf7f7dc65 in dlopen_doit () from /lib/libdl.so.2
#12 0xf7985c9b in _dl_catch_error () from /lib/libc.so.6
#13 0xf7f7e36e in _dlerror_run () from /lib/libdl.so.2
#14 0xf7f7dcee in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#15 0xf55a532c in loader_open_driver () from /usr/lib/libGLX_mesa.so.0
#16 0xf559a95b in driOpenDriver () from /usr/lib/libGLX_mesa.so.0
#17 0xf5599eac in driswCreateScreen () from /usr/lib/libGLX_mesa.so.0
#18 0xf55758de in __glXInitialize () from /usr/lib/libGLX_mesa.so.0
#19 0xf55709d5 in GetGLXPrivScreenConfig () from /usr/lib/libGLX_mesa.so.0
#20 0xf5571428 in glXChooseVisual () from /usr/lib/libGLX_mesa.so.0
#21 0xf7b5dc10 in X11_GL_GetVisual () from ./libSDL-1.1.so.0
#22 0xf7b62e85 in X11_CreateWindow () from ./libSDL-1.1.so.0
#23 0xf7b636cd in X11_SetVideoMode () from ./libSDL-1.1.so.0
#24 0xf7b53c1c in SDL_SetVideoMode () from ./libSDL-1.1.so.0
#25 0xf6b6467a in USDLViewport::ResizeViewport(unsigned int, int, int, int) () from ./SDLDrv.so
#26 0xf6157721 in UOpenGLRenderDevice::SetRes(int, int, int, int) () from ./OpenGLDrv.so
#27 0xf61573a7 in UOpenGLRenderDevice::Init(UViewport *, int, int, int, int) () from ./OpenGLDrv.so
#28 0xf6b641ed in USDLViewport::TryRenderDevice(char const *, int, int, int, int) ()
   from ./SDLDrv.so
#29 0xf6b64f97 in USDLViewport::OpenWindow(unsigned int, int, int, int, int, int) ()
   from ./SDLDrv.so
#30 0xf7da63a2 in UGameEngine::Init(void) () from ./Engine.so
#31 0x0804d7a6 in _start ()
Comment 1 iive 2018-12-06 01:50:42 UTC
Mesa developers need to be able to reproduce the problem and they are not going to install UT to debug this. So we'll have to do a bit more investigating on this one.


There are a few things for you to try:

1. See if changing your locale has an effect on the crash, i.e. the "LANG" and "LC_*" environment variables. If it makes a difference, make sure that you have the locale files and that they are not corrupted. (Keep the suspect files; you may need to report a bug to glibc or gcc.)

2. The problem might be random memory corruption. Since both games are native Linux binaries, try running them under valgrind. It might produce a lot of noise (uses of uninitialized values ...), but if there is an out-of-bounds write or a use-after-free, it should catch it.

3. Try git bisect. First see if you can use existing releases to narrow down when things broke, then use them as the good and bad points to find which commit broke it. Do 18.0 and 18.2, then maybe 18.1. There is no need to try different patch versions (aka 18.0.1/2/3/4/...), as they live in separate branches that get changes backported. Do check 18.0, just to be sure that it still works.
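Point 1 can also be checked without the game. The sketch below uses hypothetical helper names and calls std::has_facet<std::ctype<char>>, the libstdc++ function in frame #1 of the backtrace, on the classic "C" locale and on the locale the environment (LANG / LC_*) selects. It only exercises the system libstdc++'s locale machinery, so it is a sanity check, not a reproduction.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: the same has_facet check that frames #1/#2 show
// std::basic_ios::_M_cache_locale performing.
bool has_ctype_facet(const std::locale& loc) {
  return std::has_facet<std::ctype<char>>(loc);
}

// Hypothetical helper: builds the locale selected by the environment.
// std::locale("") throws std::runtime_error if that locale cannot be
// constructed, e.g. because its locale files are missing or broken.
bool environment_locale_ok() {
  try {
    const std::locale env("");
    return has_ctype_facet(env);
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

Running it once with the normal environment and once with LANG=C shows whether the locale setup alone makes a difference.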


I don't have the Linux version installed at the moment, and it would take some manual tweaking to trick the Loki installer into running on x64 with Unreal Anthology. Running the Windows UT99 under wine crashes at startup; Unreal.log shows that it crashed in glGetString(GL_EXTENSIONS). That might be totally unrelated... but it reminds me of a bug where the game does not have a big enough buffer to hold the whole GL_EXTENSIONS string.
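The fixed-buffer bug mentioned above usually comes from copying the whole glGetString(GL_EXTENSIONS) result into a char array sized for the extension lists of its era. A minimal sketch of a bounds-checked copy, using a hypothetical helper and a plain C string standing in for the driver's result (no real GL calls):

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies the extension string into a fixed-size buffer
// without overflowing it. Returns true if the whole string fit, false if it
// was truncated. An old renderer doing strcpy(buf, ext) would instead write
// past the end of buf once the driver's list outgrew sizeof(buf).
bool copy_extensions(const char* ext, char* buf, std::size_t cap) {
  if (cap == 0) return false;
  const std::size_t len = std::strlen(ext);
  const std::size_t n = len < cap - 1 ? len : cap - 1;
  std::memcpy(buf, ext, n);
  buf[n] = '\0';  // always NUL-terminate
  return n == len;
}
```

Truncation can cut an extension name in half, so a caller that parses the list would do better to copy it into a std::string or truncate at the last space.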

The game writes its log files to ~/.loki/ut/System/*.log; see if there is anything relevant there.
Comment 2 network723 2018-12-06 07:44:04 UTC
Created attachment 142739
valgrind log

(In reply to iive from comment #1)
> 1. See if changing your locale has effect on the crash.

I've tried the C locale and got the same result. I also tried different kernel versions, in case it had something to do with the recent CPU vulnerability fixes, but that's not the cause either.

> 2. The problem might be random memory corruption, since both games are
> native for linux, try running them under valgrind.

I'm not competent enough to use valgrind correctly, but it's interesting that running 'valgrind ./ut-bin' gives a SIGILL fault rather than SIGSEGV. Also, the fault is in Core.so (a game component), yet the game works fine with its built-in software renderer (SDLSoftDrv.so).
 
> 3. Try git bisect.

Bisecting this involves a lot of cross-compiling; unfortunately, I can't afford that right now :(
 
> Unreal.log shows that it crashed while glGetString(GL_EXTENSIONS).

An overly long GL extension list was definitely a problem at some point, but there are plenty of fixed OpenGLDrv.so libraries all over the Internet (their changelogs explicitly mention it). I tried all of the ones I could find, with no change.
Comment 3 iive 2018-12-06 14:46:31 UTC
I got UT99 working.

valgrind doesn't show anything more; the first error is the one from the report.

While narrowing down which Mesa releases work, I found mesa-18.1.7 working and mesa-18.2.0 not working. However, the bisect failed.
Even mesa-18.0.0, when compiled now, produces a broken build.

Since I did a major system update before compiling my mesa-18.2.0, it makes sense that the bug is related to gcc/glibc.

I was using libstdc++.so.6.25, so I switched to an older libstdc++.so.6.24 instead. The problem remained. (I'm sure the older one was used, because I initially forgot to fix the link and got a different error.)

I tried compiling with -O0, but the bug remains. So it is not a miscompilation per se; it is more likely something related to the ABI/API.

Here is the backtrace of current Mesa 19.0.0-devel (git-3b2ad8b290)
---
#1  0xf1be6ba8 in bool std::has_facet<std::ctype<char> >(std::locale const&) () from /usr/lib/libstdc++.so.6
#2  0xf1bd6f1a in std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) () from /usr/lib/libstdc++.so.6
#3  0xf1bd7399 in std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) () from /usr/lib/libstdc++.so.6
#4  0xf1b75563 in std::ios_base::Init::Init() () from /usr/lib/libstdc++.so.6
#5  0xf5485d4b in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /usr/include/c++/8.2.0/iostream:74
#6  0xf5485d93 in _GLOBAL__sub_I_st_glsl_to_tgsi_temprename.cpp(void) () at state_tracker/st_glsl_to_tgsi_temprename.cpp:1426
#7  0xf595cc82 in __do_global_ctors_aux () from /usr/lib/xorg/modules/dri/r600_dri.so
#8  0xf2230dc0 in ?? () from /usr/lib/libLLVM-6.0.so
#9  0xf514c025 in _init () from /usr/lib/xorg/modules/dri/r600_dri.so
#10 0xf4df26bc in ?? () from /usr/lib/libLLVM-6.0.so
---

The disassembly of the function in frame #1 looks like this:
---
   [...]
   0xf1be6b99 <+73>:    mov    -0x2a8(%ebx),%eax
   0xf1be6b9f <+79>:    mov    %eax,0x4(%esp)
   0xf1be6ba3 <+83>:    call   0xf1b5a570 <__dynamic_cast@plt>
=> 0xf1be6ba8 <+88>:    test   %eax,%eax
   0xf1be6baa <+90>:    setne  %al
   0xf1be6bad <+93>:    add    $0x18,%esp
   0xf1be6bb0 <+96>:    pop    %ebx
   0xf1be6bb1 <+97>:    ret    
---

Frame #6 points at the closing brace of the dump_instruction() debug function.
Hollowing out the whole function (with #if/#endif) just moved the line number.

I'm not that familiar with C++ or with debugging it. Maybe somebody could weigh in.
It seems that during initialization something calls the global constructors, and one of them needs something that is not yet initialized.
Comment 4 Michel Dänzer 2018-12-06 15:44:27 UTC
Which version of g++/gcc are you using? Some 8.2 snapshots have a bug which causes mis-compilation of Mesa code: https://bugzilla.redhat.com/show_bug.cgi?id=1645400
Comment 5 iive 2018-12-06 18:07:34 UTC
The upgrade was from gcc-7.3.0 to gcc-8.2.0. You can see the 8.2.0 include path in the backtrace.

I don't think we can blame gcc-8.2.0 for the Red Hat bug report, as they use a development snapshot. (Stable GCC releases are nowadays always x.y.0.)

To repeat: I also compiled with -O0 in both CFLAGS and CXXFLAGS.
Comment 6 iive 2018-12-08 14:51:41 UTC
Created attachment 142752
Workaround for mesa crashing on UT99 because of static global constructor from C++ iostream

Just a few observations so far.

1.
I said that using libstdc++.so.24 gave a different error. That's because I swapped it in after compiling Mesa.
If Mesa is compiled with g++-8.2 and libstdc++.so.24, it works with UT99.

2.
Another workaround is to completely remove every "#include <iostream>".
It is used in 3 places.
"st_glsl_to_tgsi.cpp" and "st_glsl_to_tgsi_temprename.cpp": for these, just put "#define NDEBUG 1" at the top of the files and all debug output is disabled.
Things are more complicated for "st_glsl_to_tgsi_array_merge.cpp/h". Debugging there is disabled by default, but not all printing functions are compiled out, so you need to add a bunch of extra #if/#endif guards to disable them. Note that the header file also contains inline functions.
(Also, I'm on an older LLVM, so if LLVM ever uses iostream, it may cause the same problem.)
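A minimal sketch of the kind of guard described in point 2, using a hypothetical debug_dump() helper rather than Mesa's real dump functions. When NDEBUG is defined, this translation unit never includes <iostream>, so it also never gets the per-file "static ios_base::Init __ioinit;" object that GCC 8's <iostream> defines.

```cpp
#ifndef NDEBUG
#include <iostream>  // only debug builds pull in <iostream> and its static init object
#endif
#include <string>

#ifdef NDEBUG
constexpr bool kDebugOutput = false;
#else
constexpr bool kDebugOutput = true;
#endif

// Hypothetical debug printer in the style of dump_instruction().
// Returns whether anything was printed.
bool debug_dump(const std::string& what) {
#ifndef NDEBUG
  std::cerr << what << '\n';
  return true;
#else
  (void)what;  // release build: compiled out entirely
  return false;
#endif
}
```

Guarding the #include itself, not just the calls, is what keeps the static constructor out of release builds; inline functions in headers need the same treatment.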

3.
As to why "include <iostream>" causes/triggers the problem.
A bit of googling turned up this problem:
https://isocpp.org/wiki/faq/ctors#static-init-order

Static global constructors are called before main(), but their relative order across translation units is unspecified. If one depends on another, they may be called in the wrong order.

<iostream> has this line: "static ios_base::Init __ioinit;". If it looks familiar, that's because you've seen it in the backtrace.
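The __ioinit object follows what is often called the "nifty counter" (Schwarz counter) idiom: every translation unit that includes the header gets its own static guard object, and whichever guard is constructed first initializes the shared streams. The sketch below reproduces the pattern with hypothetical names (Registry, RegistryInit); libstdc++'s actual std::ios_base::Init differs in detail.

```cpp
#include <new>
#include <string>

// Hypothetical shared resource, standing in for std::cout and friends.
struct Registry {
  std::string state = "ready";
};

// In a real library these three would be defined once, in a single .cpp file.
// They have no dynamic initializers, so they are already zeroed before any
// constructor below runs.
alignas(Registry) unsigned char registry_storage[sizeof(Registry)];
Registry* registry_ptr = nullptr;
int nifty_counter = 0;

// Analogous to std::ios_base::Init: the first guard constructed builds the
// resource in place, the last one destroyed tears it down.
struct RegistryInit {
  RegistryInit() {
    if (nifty_counter++ == 0) {
      registry_ptr = new (registry_storage) Registry();
    }
  }
  ~RegistryInit() {
    if (--nifty_counter == 0) {
      registry_ptr->~Registry();
      registry_ptr = nullptr;
    }
  }
};

// In a header, this line would appear in every including translation unit,
// like "static ios_base::Init __ioinit;".
static RegistryInit registry_init;

Registry& registry() { return *registry_ptr; }
```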

For now, the main question is why it fails only with UT99 and not with other programs. I suspect the problem may be linked to dlopen() usage combined with the application not loading libstdc++ itself. Unfortunately, I'm having trouble finding simple demo programs that do that. (And even then, there might be something more to it.)
Comment 7 Gustaw Smolarczyk 2018-12-08 15:55:38 UTC
Static initialization order is undefined between translation units (i.e. source files) but it is defined within one translation unit - it is the global variable definition order. Since #include <iostream> defines (not declares) a static global variable with initializer, you can safely use std::cout and friends from other static initializers that are defined after the <iostream> include.
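The ordering rule within one translation unit can be shown in a single file: globals are dynamically initialized in definition order, so an object defined after #include <iostream> may already use std::cout in its constructor. Tracer and init_order() are hypothetical names; init_order() uses a function-local static (the "construct on first use" remedy from the isocpp FAQ linked in comment 6), so it is safe to call from any static initializer.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Construct on first use: the vector is created the first time this runs,
// regardless of which static initializer calls it first.
std::vector<std::string>& init_order() {
  static std::vector<std::string> order;
  return order;
}

struct Tracer {
  explicit Tracer(const char* name) {
    init_order().push_back(name);
    // Safe: this definition comes after #include <iostream> in this file.
    std::cout << "initialized " << name << '\n';
  }
};

// Within this translation unit, these run in definition order.
Tracer first_global("first");
Tracer second_global("second");
```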

The segfault looks like a mismatch in the standard library. Does UT99 use the /usr/lib/libstdc++.so.6 or does it use a local version? If it's the latter, what happens when you force it to use the distro one?
Comment 8 iive 2018-12-08 19:33:50 UTC
(In reply to Gustaw Smolarczyk from comment #7)
> Static initialization order is undefined between translation units (i.e.
> source files) but it is defined within one translation unit - it is the
> global variable definition order. Since #include <iostream> defines (not
> declares) a static global variable with initializer, you can safely use
> std::cout and friends from other static initializers that are defined after
> the <iostream> include.

I've already tried placing "#include <locale>" before "iostream", but it has no effect. I retested it to be sure.


> The segfault looks like a mismatch in the standard library. Does UT99 use
> the /usr/lib/libstdc++.so.6 or does it use a local version? If it's the
> latter, what happens when you force it to use the distro one?

The game does not link against the dynamic libstdc++; most likely it was statically linked. The binaries are dated 2006.

The binaries themselves could be obtained from the freely available linux installer(s), but they do not contain enough to reproduce the crash. (You need some of the game files).
Comment 9 Gustaw Smolarczyk 2018-12-08 20:32:16 UTC
(In reply to iive from comment #8)
> I've already tried placing "#include <locale>" before "iostream", but it has
> no effect. I retested it to be sure.

That won't do anything. The locale stuff is handled by libstdc++.so itself, the include order in the mesa source file doesn't matter.

> The game does not link to dynamic libstdc++ , most likely it had been
> statically linked. The binaries are dated from 2006.
> 
> The binaries themselves could be obtained from the freely available linux
> installer(s), but they do not contain enough to reproduce the crash. (You
> need some of the game files).

The Core.so binary seems to export the __dynamic_cast symbol. It suggests that it has been statically linked with some old libstdc++ library that is incompatible with the most recent one.

It might be impossible to run it correctly with any library written in C++. Mesa and LLVM usually avoid using RTTI, so the <iostream> might be the only thing that struggles. However, that is still a work-around. Some other driver might still not work correctly.
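For reference, this is the kind of C++ that needs __dynamic_cast at run time. With g++ (Itanium C++ ABI), a downcast through a polymorphic base pointer like the one below is typically compiled into a call to __dynamic_cast, the same call visible in the frame #1 disassembly, and it relies on RTTI; g++ rejects it under -fno-rtti. The class names are illustrative only.

```cpp
// Illustrative polymorphic hierarchy (hypothetical names).
struct Facet { virtual ~Facet() = default; };
struct CtypeFacet : Facet {};
struct OtherFacet : Facet {};

// Run-time checked downcast; with g++ this usually becomes a call to
// __dynamic_cast in the C++ runtime (libstdc++ / libsupc++).
bool is_ctype(const Facet* f) {
  return dynamic_cast<const CtypeFacet*>(f) != nullptr;
}
```

std::has_facet in frame #1 performs a cast of this kind on the locale's facet table, which is why Mesa's <iostream> initialization reaches __dynamic_cast even though Mesa itself avoids RTTI.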

I am not sure if removing the iostream sub-library usage from mesa is acceptable in general.
Comment 10 iive 2018-12-08 22:41:05 UTC
(In reply to Gustaw Smolarczyk from comment #9)
> The Core.so binary seems to export the __dynamic_cast symbol. It suggests
> that it has been statically linked with some old libstdc++ library that is
> incompatible with the most recent one.

You solved the mystery!

It makes sense since this is the last called function in the disassembly.

I can confirm that the issue does go away after changing the string "__dynamic_cast" to "__dynamicZcast" in Core.so .

Can you recommend a cleaner way to remove that?


> It might be impossible to run it correctly with any library written in C++.
> Mesa and LLVM usually avoid using RTTI, so the <iostream> might be the only
> thing that struggles. However, that is still a work-around. Some other
> driver might still not work correctly.

How about versioning __dynamic_cast?
If it behaves differently in different versions...

Another solution might be linking the Mesa plugins statically against g++'s libstdc++. I'm not sure if that is supported at the moment. It would have been very useful back when Steam shipped an older version.


> I am not sure if removing the iostream sub-library usage from mesa is
> acceptable in general.

Mesa3D is mostly written in C. There are a few parts in C++, and it seems that the files I've patched are the only ones using "iostream". Since iostream is used only for debugging, it is feasible to disable it in release builds.

But as I've said before, I don't know what LLVM compiled with latest libstdc++ would do.
Comment 11 iive 2018-12-08 23:07:20 UTC
(In reply to iive from comment #10)

> I can confirm that the issue does go away after changing the string
> "__dynamic_cast" to "__dynamicZcast" in Core.so .

That doesn't work. It is enough for the game to load, render the intro and let you into the menu. When you start an actual game, however, the game exits with an "undefined symbol" error.
Comment 12 Gustaw Smolarczyk 2018-12-09 01:51:55 UTC
(In reply to iive from comment #10)
> You solved the mystery!
> 
> It makes sense since this is the last called function in the disassembly.
> 
> I can confirm that the issue does go away after changing the string
> "__dynamic_cast" to "__dynamicZcast" in Core.so .
> 
> Can you recommend a more clean way to remove that?

As you have already found, that doesn't work unless the symbol is unused.

You could try patching all of the binaries that reference __dynamic_cast, but I can't promise it would work correctly in the end.

> How about versioning __dynamic_cast?
> If it behaves differently in different version...

That's a question for libstdc++ developers. It's possible that UT was statically linked against libstdc++.so.5 which is completely incompatible with libstdc++.so.6 that is used today. If it wasn't, that might imply there is some kind of a bug in compatibility between different libstdc++.so.6 versions. Also, I recall there being strange stuff going on with RTTI and static libstdc++ (like dynamic casts not working correctly across libraries), though I don't think they would end up with a crash...

> Another solution might be linking Mesa plugins statically to the g++
> listdc++. I'm not sure if this is supported atm. It would have been very
> useful when Steam used older version.

Right, but I don't think it's currently supported. It would increase the disk and memory usage for everything that uses mesa. You would also need to do the same for LLVM, unless you use a driver that doesn't need it (like i965).

Maybe just adding -static-libstdc++ to the linker options would suffice. It is currently used for the SCons build on Windows.

> Mesa3D is mostly written in C. There are few parts in C++ and it seems that
> the files I've patched are the only ones using "iostream". Since iostream is
> used only for debugging, it is feasible to disable it on release builds.

<iostream> might not be the only include that you need to be wary about. It might be that any ios thing is dangerous, like fstream.

Making mesa compatible with applications that link statically against any libstdc++ might be desirable, but that needs to be discussed. You might want to send your patch to the mailing list [1] if you want to trigger the discussion.

> But as I've said before, I don't know what LLVM compiled with latest
> libstdc++ would do.

I believe recent libstdc++ versions are compatible with each other. Moreover, I think LLVM is 99% of the time linked against /usr/lib/libstdc++.so.6, so it will use the most recent version all the time, even if it was compiled with an older version.

On a slightly unrelated topic, UT99 seems to work fine (at least for me) while run on wine (or Steam's proton). You might want to try this path as a work-around.

[1] https://www.mesa3d.org/submittingpatches.html
Comment 13 network723 2018-12-09 09:03:01 UTC
(In reply to Gustaw Smolarczyk from comment #12)
> On a slightly unrelated topic, UT99 seems to work fine (at least for me)
> while run on wine (or Steam's proton). You might want to try this path as a
> work-around.

On my system, UT stops working in wine after you run it a few times, until the next reboot. This seems to be unrelated to Mesa, as it behaves like this with any renderer, including the software one and a third-party d3d9 one paired with Gallium Nine.
(Just a side note: I don't really care about playing it, but I think checking whether older native programs still work is generally a good idea.)
Comment 14 iive 2018-12-12 17:28:44 UTC
I'm still puzzled by a number of things related to this bug and the export of "__dynamic_cast".

If I rename it to "__dyn_cast_old" in Core.so, the game exits with "undefined __dyn_cast_old". It does so at the start of "some" game levels. It turned out this might be related to lazy binding.

Using "LD_BIND_NOW=1" causes the "undefined" error to happen at load time.


You said that it is an export, so I assumed that some of the other modules might use it, but the original string appears in only two other modules: NullDrv.so and NullRender.so. These are not (usually) used by the game and are not loaded. Making the same change in them makes no difference.

Nothing else in the game contains the string "dynamic_cast"... I'm starting to suspect some UUID or hashes might be involved.

objdump -dtT Core.so seems to show both __dynamic_cast@PLT and a __dynamic_cast@@BASE function that contains executable code, probably from the static library.


Playing with LD_DEBUG=all, it seems like if libstdc++ is loaded at some point, the __dynamic_cast of that library would be used.
Here are excerpts from the log:

---
      6449:     symbol=__dynamic_cast;  lookup in file=./ut-bin [0]
      6449:     symbol=__dynamic_cast;  lookup in file=/lib/libdl.so.2 [0]
      6449:     symbol=__dynamic_cast;  lookup in file=/lib/libnsl.so.1 [0]
      6449:     symbol=__dynamic_cast;  lookup in file=/lib/libpthread.so.0 [0]
      6449:     symbol=__dynamic_cast;  lookup in file=./Engine.so [0]
      6449:     symbol=__dynamic_cast;  lookup in file=./Core.so [0]
      6449:     binding file ./Core.so [0] to ./Core.so [0]: normal symbol `__dynamic_cast'
[...]
      6449:     binding file /usr/lib/libstdc++.so.6 [0] to /usr/lib/libstdc++.so.6 [0]: normal symbol `_ZNSo9_M_insertIxEERSoT_' [GLIBCXX_3.4.9]
      6449:     symbol=__dynamic_cast;  lookup in file=./ut-bin [0]
      6449:     symbol=__dynamic_cast;  lookup in file=/lib/libdl.so.2 [0]
      6449:     symbol=__dynamic_cast;  lookup in file=/lib/libnsl.so.1 [0]
      6449:     symbol=__dynamic_cast;  lookup in file=/lib/libpthread.so.0 [0]
      6449:     symbol=__dynamic_cast;  lookup in file=./Engine.so [0]
      6449:     symbol=__dynamic_cast;  lookup in file=./Core.so [0]
      6449:     binding file /usr/lib/libstdc++.so.6 [0] to ./Core.so [0]: normal symbol `__dynamic_cast' [CXXABI_1.3]
---

The last part is the most interesting to me, because it shows that the symbol is bound with a specific (old) version, CXXABI_1.3, meaning that it should already be versioned.
Or rather, that this function should be used only with this ABI.

I also confirmed, by tracing the call at the assembly level in gdb, that Core.so's __dynamic_cast is the one used. It is the Core.so variant that tries to execute code at an address that is not mapped in memory (that's why valgrind reports an illegal instruction; under valgrind, all memory accesses through it succeed).
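The LD_DEBUG output above can also be cross-checked from inside a process. The sketch below is a hypothetical helper (Linux/glibc only) that asks the dynamic linker which loaded object provides a symbol in the default global lookup scope, using dlsym(RTLD_DEFAULT, ...) and dladdr(). Called from a process that has Core.so loaded, it would be expected to name Core.so for __dynamic_cast, matching the binding lines above; that expectation is an assumption, not something tested in this thread. On glibc older than 2.34, link with -ldl.

```cpp
#include <dlfcn.h>  // dlsym, dladdr, RTLD_DEFAULT (GNU extensions; g++ defines _GNU_SOURCE)
#include <string>

// Returns the file name of the loaded object that provides `symbol` in the
// default (global) lookup scope, or "" if the symbol is not found.
std::string symbol_provider(const char* symbol) {
  void* addr = dlsym(RTLD_DEFAULT, symbol);
  if (addr == nullptr) return "";
  Dl_info info{};
  if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) return "";
  return info.dli_fname;
}
```

Note that a library dlopen()ed without RTLD_GLOBAL (as the Mesa driver loads libstdc++) is not part of this default scope, so the answer can differ from what that library's own references bind to.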

I also tried `LD_PRELOAD=/usr/lib/libstdc++.so.6 ./ut-bin`. It lets the game start, but it segfaults when you start certain levels (i.e. when the game calls __dynamic_cast() on its own, just as with lazy binding).

I may try to bisect... if I manage to clone the repo, compile a working version, etc.
IMHO this is a gcc/libstdc++ regression and it should be reported to them.

