Bug 66955 - Running the game FTL with LIBGL_ALWAYS_INDIRECT=y set causes the Xserver to crash.
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-16 09:54 UTC by Phil Armstrong
Modified: 2013-07-18 10:54 UTC (History)
CC List: 2 users

Attachments
gdb backtrace of all threads (4.22 KB, text/plain)
2013-07-16 10:26 UTC, Phil Armstrong
Backtrace with dri symbols (5.03 KB, text/plain)
2013-07-16 10:42 UTC, Phil Armstrong

Description Phil Armstrong 2013-07-16 09:54:53 UTC
System: Radeon HD 5770, AMD Phenom II. Debian Linux, kernel 3.9.8, mesa 9.1.4, libdrm-radeon 2.4.45, xserver-xorg-video-radeon 6.14.4.

Running the Linux version of the game FTL causes the Xserver to segfault.

The backtrace I get is:

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x7fba2ad6dd06]
1: /usr/bin/Xorg (0x7fba2abef000+0x182859) [0x7fba2ad71859]
2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fba29f14000+0xf210) [0x7fba29f23210]
3: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7fba256e5000+0x10c7a7) [0x7fba257f17a7]
4: /usr/lib/xorg/modules/extensions/libglx.so (0x7fba273a2000+0xddb1) [0x7fba273afdb1]
5: /usr/lib/xorg/modules/extensions/libglx.so (0x7fba273a2000+0x3c223) [0x7fba273de223]
6: /usr/bin/Xorg (0x7fba2abef000+0x52e61) [0x7fba2ac41e61]
7: /usr/bin/Xorg (0x7fba2abef000+0x41ec5) [0x7fba2ac30ec5]
8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0x7fba28b9f995]
9: /usr/bin/Xorg (0x7fba2abef000+0x4219d) [0x7fba2ac3119d]

Segmentation fault at address 0x2d6a83b0

I'll see if I can get a better backtrace by installing the dbg packages.

No errors appear in the dmesg output - this appears to be a userspace crash.
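
Frames 1 to 7 and 9 of the backtrace above are unresolved: each gives only a module path, a load address plus an offset, and the absolute address. A minimal sketch of pulling the module-relative offset out of such a frame, so it can be fed to a symbolizer once debug symbols are installed, might look like the following. The function names `parse_frame` and `first_frame_in` are hypothetical, and the regular expression is only an assumption based on the ten frames shown here, not a description of every format Xorg can print.

```python
import re

# Matches frames such as:
#   3: /usr/lib/.../r600_dri.so (0x7fba256e5000+0x10c7a7) [0x7fba257f17a7]
#   0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x7fba2ad6dd06]
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<base>[^+()]+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]\s*$"
)


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if no match."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    base = match.group("base")
    frame = {
        "index": int(match.group("index")),
        "module": match.group("module"),
        "offset": int(match.group("offset"), 16),
        "address": int(match.group("address"), 16),
        "symbol": None,
        "module_offset": None,
    }
    if base.startswith("0x"):
        # Unresolved frame: base is the module load address, so the offset
        # is relative to the start of the module.
        frame["module_offset"] = frame["offset"]
    else:
        # Resolved frame: base is a symbol name and the offset is relative
        # to that symbol, not to the module.
        frame["symbol"] = base
    return frame


def first_frame_in(lines, module_suffix):
    """Return the first parsed frame whose module path ends with module_suffix."""
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame["module"].endswith(module_suffix):
            return frame
    return None
```

For frame 3 this yields a module offset of 0x10c7a7 inside r600_dri.so, which is the value a tool such as `addr2line -e <path to r600_dri.so> 0x10c7a7` would need, assuming matching debug symbols are available.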
Comment 1 Phil Armstrong 2013-07-16 10:25:43 UTC
Turns out the real problem is that FTL bundles a version of libstdc++ that the DRI drivers won't link against.

It looks like the net result is that *no* DRI drivers (not even swrast) can be loaded, and the Xserver dies when trying to invoke the first GLX call.

Here's the output of the program with LIBGL_DEBUG=verbose :

$ cat libgl_debug-output2.txt 
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/r600_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying ${ORIGIN}/dri/tls/r600_dri.so
libGL: OpenDriver: trying ${ORIGIN}/dri/r600_dri.so
libGL error: dlopen ${ORIGIN}/dri/r600_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying /usr/lib/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/dri/r600_dri.so
libGL error: dlopen /usr/lib/dri/r600_dri.so failed (/usr/lib/dri/r600_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: r600_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: r600
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying ${ORIGIN}/dri/tls/swrast_dri.so
libGL: OpenDriver: trying ${ORIGIN}/dri/swrast_dri.so
libGL error: dlopen ${ORIGIN}/dri/swrast_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying /usr/lib/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/dri/swrast_dri.so
libGL error: dlopen /usr/lib/dri/swrast_dri.so failed (/usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":3"
      after 10305 requests (10305 known processed) with 0 events remaining.


If I remove the bundled libstdc++.so & use the system one then everything works as expected.

Obviously this is still an Xorg crash bug though: the server ought not to crash if a userspace program fails to load a glx driver!
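
The LIBGL_DEBUG=verbose output above repeats the same dlopen failure for every search path, which makes the actual cause easy to miss. As a rough sketch, the failures can be grouped by cause with a small parser. The names `dlopen_failures`, `DLOPEN_FAIL_RE` and `VERSION_RE` are hypothetical, and the patterns are assumptions derived only from the lines quoted in this comment, not from any documented libGL log format.

```python
import re

# "libGL error: dlopen <driver> failed (<reason>)"
DLOPEN_FAIL_RE = re.compile(
    r"^libGL error: dlopen (?P<driver>\S+) failed \((?P<reason>.*)\)\s*$"
)

# "<library>: version `<version>' not found (required by <needed_by>)"
VERSION_RE = re.compile(
    r"^(?P<library>\S+): version `(?P<version>[^']+)' not found "
    r"\(required by (?P<needed_by>[^)]+)\)$"
)


def dlopen_failures(lines):
    """Collect dlopen failures and, where present, the missing symbol version."""
    failures = []
    for line in lines:
        match = DLOPEN_FAIL_RE.match(line)
        if match is None:
            continue
        entry = {
            "driver": match.group("driver"),
            "reason": match.group("reason"),
            "library": None,
            "version": None,
            "needed_by": None,
        }
        detail = VERSION_RE.match(entry["reason"])
        if detail is not None:
            # A symbol-version mismatch: record which library lacks which
            # version tag and which dependency asked for it.
            entry.update(detail.groupdict())
        failures.append(entry)
    return failures
```

Applied to the log above, every version-mismatch entry names the bundled libstdc++.so.6 as the library, GLIBCXX_3.4.15 as the missing version and libLLVM-3.2.so.1 as the dependency that needs it. The remaining entries are plain "No such file or directory" misses for /usr/lib/dri.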
Comment 2 Phil Armstrong 2013-07-16 10:26:21 UTC
Created attachment 82474 [details]
gdb backtrace of all threads

Backtrace
Comment 3 Phil Armstrong 2013-07-16 10:42:49 UTC
Created attachment 82475 [details]
Backtrace with dri symbols
Comment 4 Phil Armstrong 2013-07-16 10:47:19 UTC
Changed title to reflect underlying bug: I don't believe that the user should be able to make the XServer process segfault by substituting the wrong libstdc++ library when running an ordinary user process.
Comment 5 Marek Olšák 2013-07-16 13:17:51 UTC
The solution is simple: don't use an older libstdc++. Lots of closed source apps do that, which breaks them if the Mesa driver was linked against a newer version. There is nothing we can do about that.
Comment 6 Marek Olšák 2013-07-16 13:23:34 UTC
Oh and by the way, the X crash seems to be caused by indirect rendering, which had been broken according to what Keith Packard said at XDC2012, IIRC. The issue will be trivially resolved by nuking indirect rendering if I understood Keith's plan correctly, leaving you with no rendering whatsoever and an even stronger incentive to delete the old libstdc++.
Comment 7 Phil Armstrong 2013-07-16 14:07:34 UTC
(In reply to comment #5)
> The solution is simple: don't use an older libstdc++. Lots of closed source
> apps do that, which breaks them if the Mesa driver was linked against a
> newer version. There is nothing we can do about that.

Obviously the solution is simple: I've already implemented it.

But seriously? You think a hard Xserver crash caused by a userspace client is NotABug?

I don't believe that it's reasonable for the Xserver to crash in this fashion: Refuse to run the application? Sure. Fall back to software rendering? Why not. Segfault and kill the entire desktop? That doesn't seem very user friendly to me frankly.
Comment 8 Marek Olšák 2013-07-16 14:28:59 UTC
Sorry for my hastiness. If you have a fix, that's great! Feel free to send it to the appropriate mailing list. Thanks!
Comment 9 Phil Armstrong 2013-07-16 14:58:58 UTC
Sadly I was only referring to the 'dump the old libstdc++' solution to the immediate problem of the Xserver crashing. I don't have a patch at this point.

If the reality is 'this problem is real, but will be going away in the next Xorg release because that rendering pipeline is going away' so it's a WONTFIX (or rather a WILLFIXINLATERRELEASE perhaps), then maybe that's ok. I do (personally) think crashing is always a bug however: the user shouldn't be able to crash the Xserver like this. (Segfaults smell of potential security issues too, even if in reality anyone with the level of access required to trigger this one can probably punch any number of holes in the system.)

I'll take a look at the code and see what's going on, but if it turns out that the code in question is going away anyway then maybe it should be simply ripped out forthwith if it crashes like this?
Comment 10 Ian Romanick 2013-07-17 17:35:11 UTC
Is the libstdc++ that the Xserver sees replaced?  If the server is picking up the wrong libstdc++... yeah, don't do that.  You wouldn't replace parts of your car engine with random parts from a different model, would you? :)

If the driver loaded by the server is still using the correct system libstdc++, it should work fine.  Two things to try:

1. Try running the app with LIBGL_ALWAYS_INDIRECT=y.  Does it still crash the server?

2. Try collecting an apitrace of the application.  You'll probably have to run it via ssh or from the console.  Otherwise everything will be forcibly killed when the server dies, and you won't get a trace.  It might be sketchy anyway.  Does replaying the trace on an unmangled system still crash the server?
Comment 11 Phil Armstrong 2013-07-17 22:41:31 UTC
> Is the libstdc++ that the Xserver sees replaced?  If the server is picking up
> the wrong libstdc++... yeah, don't do that.  You wouldn't replace parts of
> your car engine with random parts from a different model, would you? :)

Nope, the Xserver is being linked against the system libstdc++ - it's being launched by gdm3 in a completely stock fashion.

The only place the older libstdc++ is being used is when the binary in question is run: the shell script wrapper sets LD_LIBRARY_PATH to point to a directory of support libs, including the old libstdc++. I'm running it from a terminal which in turn is running on the desktop of the original Xsession launched by gdm3.
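
To illustrate why the wrapper's LD_LIBRARY_PATH matters only for processes that inherit it, here is a simplified sketch of how a directory list of that kind shadows a system library. The helper `find_in_ld_library_path` is hypothetical. It only walks the LD_LIBRARY_PATH entries in order and, as stated assumptions, ignores everything else the real dynamic loader consults (RPATH/RUNPATH, the ld.so cache, default directories) and skips empty entries.

```python
import os


def find_in_ld_library_path(soname, env=None):
    """Return the first LD_LIBRARY_PATH candidate for soname, or None.

    Simplified model: only LD_LIBRARY_PATH is searched, in order.
    """
    env = os.environ if env is None else env
    for directory in env.get("LD_LIBRARY_PATH", "").split(":"):
        if not directory:
            # Simplification: empty entries are skipped here.
            continue
        candidate = os.path.join(directory, soname)
        if os.path.exists(candidate):
            return candidate
    return None
```

Called with the environment the wrapper script sets up, `find_in_ld_library_path("libstdc++.so.6", env)` would be expected to return the bundled copy. Called with the environment of a process started by gdm3, which never saw that variable, it returns None and the system copy is used.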

If you look at the error messages from the program, it appears that the r600_dri.so (or any of the other mesa drivers) can't load as a result, because they're trying to link against the old libstdc++ (thanks to the LD_LIBRARY_PATH). I suspect the Xserver crashes because it tries to call into them anyway, despite the fact that the dlopen() call failed.
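
The behaviour being asked for here (check the result of dlopen() and refuse or fall back rather than call into a driver that never loaded) can be sketched as follows. This is only an illustration of the pattern, not Xserver or Mesa code: the real code is C, the function names `load_first_driver` and `start_rendering` are hypothetical, and Python's `ctypes.CDLL` merely stands in for dlopen(). Whether the server actually makes this mistake is a suspicion at this point in the thread, not an established fact.

```python
import ctypes


def load_first_driver(candidate_paths):
    """Try each path in order.

    Returns (path, handle, errors). path and handle are None when every
    candidate failed; errors lists (path, message) for each failed attempt.
    """
    errors = []
    for path in candidate_paths:
        try:
            handle = ctypes.CDLL(path)
        except OSError as exc:
            # The load failed: record why and move on to the next candidate
            # instead of keeping a handle that cannot be used.
            errors.append((path, str(exc)))
            continue
        return path, handle, errors
    return None, None, errors


def start_rendering(candidate_paths):
    """Refuse cleanly when no driver could be loaded."""
    path, handle, errors = load_first_driver(candidate_paths)
    if handle is None:
        return "refused: no driver could be loaded"
    return f"using {path}"
```

With two paths that do not exist, `start_rendering` returns the refusal string and `load_first_driver` reports one error per path, which is the "refuse to run the application" outcome rather than a segfault.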

I'll try the INDIRECT thing in the morning, if I get a chance. I doubt the API trace will kill the Xserver, because removing the old libstdc++ from the LD_LIBRARY_PATH of the binary works just fine, although I suppose the binary could be looking at GL features and changing its behaviour depending on what's available. This is doubtful though, as the OpenGL usage is very basic: judging from watching the program in action, it's just texture blits and scaling. Can't hurt to try of course!
Comment 12 Phil Armstrong 2013-07-17 22:44:28 UTC
NB. To put this another way, why is the Xserver letting a userspace program decide which libraries it should link its own glx drivers against? Isn't that asking for trouble?
Comment 13 Alan Coopersmith 2013-07-17 22:57:14 UTC
(In reply to comment #12)
> NB. To put this another way, why is the Xserver letting a userspace program
> decide which libraries it should link its own glx drivers against?

It shouldn't, unless that program is doing something like ldconfig to change
the global linker configuration underneath the X server - the X server relies
on the system loader & dlopen() to find its libraries.
Comment 14 Phil Armstrong 2013-07-17 23:07:33 UTC
Given that the program is being run as an ordinary unprivileged user, it shouldn't be playing games with ldconfig.
Comment 15 Ian Romanick 2013-07-17 23:23:12 UTC
(In reply to comment #14)
> Given that the program is being run as an ordinary unprivileged user, it
> shouldn't be playing games with ldconfig.

It seems unlikely that it is, and that's why I've asked for those tests.  Removing the actual application and the old libstdc++ from the equation (by using the apitrace with forced indirect rendering) will confirm whether this is a legit Xserver (or Mesa driver) bug or a system configuration issue.
Comment 16 Phil Armstrong 2013-07-18 06:55:24 UTC
So, running the Xserver as usual (i.e. unchanged from the stock install) and running FTL linked against the old libstdc++ but with LIBGL_ALWAYS_INDIRECT=y causes the Xserver to crash as before.

I'm fairly sure that the setting is having an effect, because if I also set LIBGL_DEBUG=verbose, I don't get any extra output, whereas if I just set LIBGL_DEBUG & not LIBGL_ALWAYS_INDIRECT, I get the expected debugging output as seen above.

I'm installing apitrace as I type.
Comment 17 Phil Armstrong 2013-07-18 08:02:23 UTC
Replaying the crashing trace recorded by apitrace does not cause the Xserver to crash, which seems unsurprising since everything is fine if the binary in question is linked against the system libstdc++ instead of the older bundled one. During the replay, we're recreating the latter situation, so it seems consistent that the Xserver is fine.
Comment 18 Michel Dänzer 2013-07-18 08:49:57 UTC
(In reply to comment #17)
> Replaying the crashing trace recorded by apitrace does not cause the Xserver
> to crash, [...]

What if you replay it with LIBGL_ALWAYS_INDIRECT=y?

I think this is just a normal indirect rendering bug, and libstdc++ only matters insofar as the bad one causes the app to fall back to indirect rendering.
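
The hypothesis in this comment can be written down as a tiny model and checked against the results reported so far in the thread. This is a sketch of the reasoning only. The function names and the `OBSERVED` table are hypothetical, and the table encodes just three results: the bundled libstdc++ without the variable crashes the server (description and comment 1), the system libstdc++ without the variable works (comment 1), and the bundled libstdc++ with LIBGL_ALWAYS_INDIRECT=y crashes (comment 16).

```python
def uses_indirect_rendering(bundled_libstdcxx, force_indirect):
    """Hypothesis: a failed driver load makes the app fall back to indirect."""
    # With the bundled libstdc++ no DRI driver can be loaded, so the client
    # ends up on the indirect path even when it was not forced.
    return force_indirect or bundled_libstdcxx


def predicts_server_crash(bundled_libstdcxx, force_indirect):
    """Hypothesis: the server crashes exactly when rendering is indirect."""
    return uses_indirect_rendering(bundled_libstdcxx, force_indirect)


# (bundled_libstdcxx, force_indirect) -> server crashed?
OBSERVED = {
    (True, False): True,    # original report
    (False, False): False,  # bundled libstdc++ removed, comment 1
    (True, True): True,     # comment 16
}


def consistent_with(observations):
    """True if the hypothesis matches every recorded observation."""
    return all(
        predicts_server_crash(bundled, forced) == crashed
        for (bundled, forced), crashed in observations.items()
    )
```

The model agrees with all three recorded results, and for the one untested combination, system libstdc++ with LIBGL_ALWAYS_INDIRECT=y, it predicts a crash. That is the case that separates this hypothesis from a libstdc++-specific explanation.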
Comment 19 Phil Armstrong 2013-07-18 09:19:08 UTC
Ah, I'd missed that case!
Comment 20 Phil Armstrong 2013-07-18 09:29:29 UTC
(Hit post accidentally there)

OK, so if I just set LIBGL_ALWAYS_INDIRECT, and link the binary against the usual system libraries, not the bundled ones (verified with ldd and by running the binary directly, not via any shell scripts), then the Xserver crashes.

Replaying the trace doesn't seem to trigger the crash though. It claims that the final call is 'incomplete' so perhaps I'm missing some crucial data?
Comment 21 Phil Armstrong 2013-07-18 09:38:51 UTC
Running apitrace on the trace generated by running

$ DISPLAY=:0.0 LIBGL_ALWAYS_INDIRECT=y apitrace trace ./amd64/bin/FTL

from an ssh shell (which kills the Xserver)

gives me the following output when I replay the trace:

$ LIBGL_ALWAYS_INDIRECT=y apitrace replay FTL.trace 
apitrace: warning: caught signal 11
11813: error: caught an unhandled exception
apitrace: info: taking default action for signal 11

but the Xserver remains live. The trace is 85 MB or so.
Comment 22 Phil Armstrong 2013-07-18 09:41:01 UTC
tail of the trace dump is as follows:

11773 glGenTextures(n = 1, textures = &1870)
11774 glBindTexture(target = GL_TEXTURE_2D, texture = 1870)
11775 glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_RGBA, width = 32, height = 32, border = 0, format = GL_RGBA, type = GL_UNSIGNED_BYTE, pixels = blob(4096))
11776 glTexParameterf(target = GL_TEXTURE_2D, pname = GL_TEXTURE_MIN_FILTER, param = GL_NEAREST)
11777 glTexParameterf(target = GL_TEXTURE_2D, pname = GL_TEXTURE_MAG_FILTER, param = GL_NEAREST)
11778 glGenLists(range = 256) = 1
11779 glGenTextures(n = 256, textures = ?) // incomplete
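
In a long dump the interesting line is the one marked incomplete, i.e. the call that was in flight when the server went away. A minimal sketch for locating such calls in dump text like the tail above follows. The names `parse_call` and `incomplete_calls` are hypothetical, and the line layout (call number, function name, then an optional trailing `// incomplete`) is assumed from the seven lines shown here rather than taken from apitrace documentation.

```python
import re

# A dump line starts with the call number and the GL function name.
CALL_START_RE = re.compile(r"^(?P<number>\d+)\s+(?P<function>\w+)\(")


def parse_call(line):
    """Return call number, function name and incomplete flag, or None."""
    match = CALL_START_RE.match(line)
    if match is None:
        return None
    return {
        "number": int(match.group("number")),
        "function": match.group("function"),
        # The dump marks a call that never returned with a trailing comment.
        "incomplete": line.rstrip().endswith("// incomplete"),
    }


def incomplete_calls(lines):
    """Return the parsed calls that the dump marks as incomplete."""
    calls = (parse_call(line) for line in lines)
    return [call for call in calls if call is not None and call["incomplete"]]
```

On the tail above this returns a single entry: call 11779, glGenTextures.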
Comment 23 Phil Armstrong 2013-07-18 09:46:20 UTC
Sorry: the mismatch in numbers is because the replay came from a different dump. Running it on the dump I posted gives the expected

$ ~/Code/apitrace/build/apitrace replay FTL.trace 
apitrace: warning: caught signal 11
11779: error: caught an unhandled exception
apitrace: info: taking default action for signal 11
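
To cross-check a replay against a dump, the signal number and the failing call number can be pulled out of the replay output. The sketch below assumes only the two message shapes quoted in these comments, and the function name `summarize_replay` is hypothetical. On Linux, signal 11 is SIGSEGV.

```python
import re

SIGNAL_RE = re.compile(r"^apitrace: warning: caught signal (\d+)\s*$", re.MULTILINE)
ERROR_RE = re.compile(r"^(\d+): error: (.*)$", re.MULTILINE)


def summarize_replay(output):
    """Extract the caught signal and the first failing call from replay output."""
    signal_match = SIGNAL_RE.search(output)
    error_match = ERROR_RE.search(output)
    return {
        "signal": int(signal_match.group(1)) if signal_match else None,
        "failed_call": int(error_match.group(1)) if error_match else None,
        "message": error_match.group(2).strip() if error_match else None,
    }
```

For the output above, the failing call is 11779, the same number as the call marked incomplete at the end of the dump in comment 22.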

