| Summary: | Running the game FTL with LIBGL_ALWAYS_INDIRECT=y set causes the Xserver to crash. | | |
|---|---|---|---|
| Product: | Mesa | Reporter: | Phil Armstrong <phil> |
| Component: | Mesa core | Assignee: | mesa-dev |
| Status: | RESOLVED NOTOURBUG | QA Contact: | |
| Severity: | normal | | |
| Priority: | medium | CC: | cworth, idr |
| Version: | unspecified | | |
| Hardware: | x86-64 (AMD64) | | |
| OS: | Linux (All) | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |
| Attachments: | gdb backtrace of all threads; Backtrace with dri symbols | | |
Description
Phil Armstrong
2013-07-16 09:54:53 UTC
Turns out the real problem is that FTL bundles a version of libstdc++ that the DRI drivers won't link against. It looks like the net result is that *no* DRI drivers (not even swrast) can be loaded, and the Xserver dies when trying to invoke the first GLX call.

Here's the output of the program with LIBGL_DEBUG=verbose:

$ cat libgl_debug-output2.txt
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/r600_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying ${ORIGIN}/dri/tls/r600_dri.so
libGL: OpenDriver: trying ${ORIGIN}/dri/r600_dri.so
libGL error: dlopen ${ORIGIN}/dri/r600_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying /usr/lib/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/dri/r600_dri.so
libGL error: dlopen /usr/lib/dri/r600_dri.so failed (/usr/lib/dri/r600_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: r600_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: r600
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying ${ORIGIN}/dri/tls/swrast_dri.so
libGL: OpenDriver: trying ${ORIGIN}/dri/swrast_dri.so
libGL error: dlopen ${ORIGIN}/dri/swrast_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying /usr/lib/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/dri/swrast_dri.so
libGL error: dlopen /usr/lib/dri/swrast_dri.so failed (/usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":3" after 10305 requests (10305 known processed) with 0 events remaining.

If I remove the bundled libstdc++.so & use the system one then everything works as expected.

Obviously this is still an Xorg crash bug though: the server ought not to crash if a userspace program fails to load a glx driver!

Created attachment 82474 [details]
gdb backtrace of all threads
Backtrace
Created attachment 82475 [details]
Backtrace with dri symbols
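The LIBGL_DEBUG=verbose output in the description repeats the same dlopen failure for several driver paths, which makes the two distinct causes (the bundled libstdc++ lacking `GLIBCXX_3.4.15`, and paths that simply do not exist) easy to miss. The following is a minimal sketch that groups the failed paths by the reason dlopen reported. It assumes one libGL message per line, as in the reflowed log; the function name `summarize_dlopen_failures` and the regular expression are illustrative and not part of Mesa or any other tool.

```python
import re
from collections import defaultdict

# Matches lines of the form seen in the log above (format assumed from that log):
#   libGL error: dlopen <path> failed (<reason>)
DLOPEN_FAILURE = re.compile(
    r"^libGL error: dlopen (?P<path>\S+) failed \((?P<reason>.*)\)\s*$"
)


def summarize_dlopen_failures(log_text):
    """Group the driver paths that failed to load by dlopen's stated reason."""
    failures = defaultdict(list)
    for line in log_text.splitlines():
        match = DLOPEN_FAILURE.match(line.strip())
        if match:
            failures[match.group("reason")].append(match.group("path"))
    return dict(failures)


# A few lines copied from the log in the description.
SAMPLE_LOG = """\
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/r600_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL error: dlopen /usr/lib/dri/r600_dri.so failed (/usr/lib/dri/r600_dri.so: cannot open shared object file: No such file or directory)
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
"""

if __name__ == "__main__":
    for reason, paths in summarize_dlopen_failures(SAMPLE_LOG).items():
        print(f"{len(paths)} driver path(s) failed: {reason}")
        for path in paths:
            print(f"    {path}")
```

Grouping by reason shows that both the hardware driver and swrast fail for the same libstdc++ version reason, consistent with the observation that no DRI driver can be loaded at all.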
Changed title to reflect underlying bug: I don't believe that the user should be able to make the XServer process segfault by substituting the wrong libstdc++ library when running an ordinary user process.

The solution is simple: don't use an older libstdc++. Lots of closed source apps do that, which breaks them if the Mesa driver was linked against a newer version. There is nothing we can do about that.

Oh and by the way, the X crash seems to be caused by indirect rendering, which had been broken according to what Keith Packard said at XDC2012, IIRC. The issue will be trivially resolved by nuking indirect rendering if I understood Keith's plan correctly, leaving you with no rendering whatsoever and an even stronger incentive to delete the old libstdc++.

(In reply to comment #5)
> The solution is simple: don't use an older libstdc++. Lots of closed source
> apps do that, which breaks them if the Mesa driver was linked against a
> newer version. There is nothing we can do about that.

Obviously the solution is simple: I've already implemented it. But seriously? You think a hard Xserver crash caused by a userspace client is NotABug?

I don't believe that it's reasonable for the Xserver to crash in this fashion: Refuse to run the application? Sure. Fall back to software rendering? Why not. Segfault and kill the entire desktop? That doesn't seem very user friendly to me frankly.

Sorry for my hastiness. If you have a fix, that's great! Feel free to send it to the appropriate mailing list. Thanks!

Sadly I was only referring to the 'dump the old libstdc++' solution to the immediate problem of the Xserver crashing. I don't have a patch at this point.

If the reality is 'this problem is real, but will be going away in the next Xorg release because that rendering pipeline is going away' so it's a WONTFIX (or rather a WILLFIXINLATERRELEASE perhaps), then maybe that's ok.
I do (personally) think crashing is always a bug however: the user shouldn't be able to crash the Xserver like this. (Segfaults smell of potential security issues too, even if in reality anyone with the level of access required to trigger this one can probably punch any number of holes in the system.)

I'll take a look at the code and see what's going on, but if it turns out that the code in question is going away anyway then maybe it should be simply ripped out forthwith if it crashes like this?

Is the libstdc++ that the Xserver sees replaced? If the server is picking up the wrong libstdc++... yeah, don't do that. You wouldn't replace parts of your car engine with random parts from a different model, would you? :)

If the driver loaded by the server is still using the correct system libstdc++, it should work fine. Two things to try:

1. Try running the app with LIBGL_ALWAYS_INDIRECT=y. Does it still crash the server?

2. Try collecting an apitrace of the application. You'll probably have to run it via ssh or from the console. Otherwise everything will be forcibly killed when the server dies, and you won't get a trace. It might be sketchy anyway. Does replaying the trace on an unmangled system still crash the server?

> Is the libstdc++ that the Xserver sees replaced? If the server is picking up
> the wrong libstdc++... yeah, don't do that. You wouldn't replace parts of
> your car engine with random parts from a different model, would you? :)

Nope, the Xserver is being linked against the system libstdc++ - it's being launched by gdm3 in a completely stock fashion. The only place the older libstdc++ is being used is when the binary in question is run: the shell script wrapper sets LD_LIBRARY_PATH to point to a directory of support libs, including the old libstdc++. I'm running it from a terminal which in turn is running on the desktop of the original Xsession launched by gdm3.
If you look at the error messages from the program, it appears that the r600_dri.so (or any of the other mesa drivers) can't load as a result, because they're trying to link against the old libstdc++ (thanks to the LD_LIBRARY_PATH). I suspect the Xserver crashes because it tries to call into them anyway, despite the fact that the dlopen() call failed.

I'll try the INDIRECT thing in the morning, if I get a chance. I doubt the API trace will kill the Xserver, because removing the old libstdc++ from the LD_LIBRARY_PATH of the binary works just fine, although I suppose the binary could be looking at GL features and changing its behaviour depending on what's available: this is doubtful though as the OpenGL usage is very basic. It's just texture blits and scaling from watching the program in action. Can't hurt to try of course!

NB. To put this another way, why is the Xserver letting a userspace program decide which libraries it should link its own glx drivers against? Isn't that asking for trouble?

(In reply to comment #12)
> NB. To put this another way, why is the Xserver letting a userspace program
> decide which libraries it should link its own glx drivers against?

It shouldn't, unless that program is doing something like ldconfig to change the global linker configuration underneath the X server - the X server relies on the system loader & dlopen() to find its libraries.

Given that the program is being run as an ordinary unprivileged user, it shouldn't be playing games with ldconfig.

(In reply to comment #14)
> Given that the program is being run as an ordinary unprivileged user, it
> shouldn't be playing games with ldconfig.

It seems unlikely that it is, and that's why I've asked for those tests. Removing the actual application and the old libstdc++ from the equation (by using the apitrace with force indirect-rendering) will confirm whether or not this is a legit Xserver (or Mesa driver) bug or a system configuration issue.
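The dlopen errors quoted in the description say the bundled libstdc++.so.6 lacks the version tag `GLIBCXX_3.4.15` that libLLVM needs. One rough way to check which tags a given libstdc++ carries is to scan the file for `GLIBCXX_<number>` strings, much like running `strings` on it and filtering for GLIBCXX. The sketch below does that on raw bytes; the function names are hypothetical, and the scan is only a heuristic, since it does not parse the ELF version tables and would also match tags that a library merely requires.

```python
import re

# Version tags look like GLIBCXX_3.4 or GLIBCXX_3.4.15; names such as
# GLIBCXX_FORCE_NEW are skipped because a digit must follow the underscore.
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated GLIBCXX version tuples found in blob."""
    versions = set()
    for match in GLIBCXX_TAG.finditer(blob):
        text = match.group(1).decode("ascii")
        versions.add(tuple(int(part) for part in text.split(".")))
    return sorted(versions)


def provides(blob, required):
    """Heuristic: does blob mention the tag GLIBCXX_<required>?"""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(blob)


if __name__ == "__main__":
    # Synthetic stand-in for an old libstdc++; not data from the actual file.
    old_lib = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.14\x00GLIBCXX_FORCE_NEW\x00"
    print(glibcxx_versions(old_lib))
    print(provides(old_lib, "3.4.15"))
```

To inspect a real library, the bytes could be read with `open(path, "rb").read()` and passed to `provides(blob, "3.4.15")`, once for the bundled copy and once for the system copy.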
So, running the Xserver as usual (ie, unchanged from stock install) & running FTL linked against the old libstdc++ but with LIBGL_ALWAYS_INDIRECT=y causes the Xserver to crash as before. I'm fairly sure that the setting is having an effect, because if I also set LIBGL_DEBUG=verbose, I don't get any extra output, whereas if I just set LIBGL_DEBUG & not LIBGL_ALWAYS_INDIRECT, I get the expected debugging output as seen above.

I'm installing apitrace as I type.

Replaying the crashing trace recorded by apitrace does not cause the Xserver to crash, which seems unsurprising since everything is fine if the binary in question is linked against the system libstdc++ instead of the older bundled one. During the replay, we're recreating the latter situation, so it seems consistent that the Xserver is fine.

(In reply to comment #17)
> Replaying the crashing trace recorded by apitrace does not cause the Xserver
> to crash, [...]

What if you replay it with LIBGL_ALWAYS_INDIRECT=y? I think this is just a normal indirect rendering bug, and libstdc++ only matters insofar as the bad one causes the app to fall back to indirect rendering.

Ah, I'd missed that case!

(Hit post accidentally there)

OK, so if I just set LIBGL_ALWAYS_INDIRECT, and link the binary against the usual system libraries, not the bundled ones (verified with ldd & running the binary directly, not via any shellscripts) then the Xserver crashes.

Replaying the trace doesn't seem to trigger the crash though. It claims that the final call is 'incomplete' so perhaps I'm missing some crucial data?

Running apitrace on the trace generated by running

$ DISPLAY=:0.0 LIBGL_ALWAYS_INDIRECT=y apitrace trace ./amd64/bin/FTL

from an ssh shell (which kills the Xserver) gives me the following output when I replay the trace:

$ LIBGL_ALWAYS_INDIRECT=y apitrace replay FTL.trace
apitrace: warning: caught signal 11
11813: error: caught an unhandled exception
apitrace: info: taking default action for signal 11

but the Xserver remains live.
The trace is 85Mb or so. Tail of the trace dump is as follows:

11773 glGenTextures(n = 1, textures = &1870)
11774 glBindTexture(target = GL_TEXTURE_2D, texture = 1870)
11775 glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_RGBA, width = 32, height = 32, border = 0, format = GL_RGBA, type = GL_UNSIGNED_BYTE, pixels = blob(4096))
11776 glTexParameterf(target = GL_TEXTURE_2D, pname = GL_TEXTURE_MIN_FILTER, param = GL_NEAREST)
11777 glTexParameterf(target = GL_TEXTURE_2D, pname = GL_TEXTURE_MAG_FILTER, param = GL_NEAREST)
11778 glGenLists(range = 256) = 1
11779 glGenTextures(n = 256, textures = ?) // incomplete

Sorry: the mismatch in numbers is because the replay came from a different dump. Running it on the dump I posted gives the expected

$ ~/Code/apitrace/build/apitrace replay FTL.trace
apitrace: warning: caught signal 11
11779: error: caught an unhandled exception
apitrace: info: taking default action for signal 11
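With a trace of this size, finding the call that never returned by eye is tedious. The following is a small sketch that scans dump text for calls marked `// incomplete` and reports their call numbers, so the result can be compared against the number in the replay error. It assumes the one-call-per-line layout shown in the excerpt above; the function name `incomplete_calls` is illustrative, and real dump output may differ (for example, if it contains colour escape codes).

```python
import re

# A dump line is assumed to look like: "<call number> <function>(<args>) ..."
CALL_LINE = re.compile(r"^(?P<number>\d+)\s+(?P<name>\w+)\((?P<rest>.*)$")


def incomplete_calls(dump_text):
    """Return (call_number, function_name) for calls marked '// incomplete'."""
    found = []
    for line in dump_text.splitlines():
        match = CALL_LINE.match(line.strip())
        if match and match.group("rest").rstrip().endswith("// incomplete"):
            found.append((int(match.group("number")), match.group("name")))
    return found


# The last three lines of the dump excerpt quoted above.
SAMPLE_DUMP = """\
11777 glTexParameterf(target = GL_TEXTURE_2D, pname = GL_TEXTURE_MAG_FILTER, param = GL_NEAREST)
11778 glGenLists(range = 256) = 1
11779 glGenTextures(n = 256, textures = ?) // incomplete
"""

if __name__ == "__main__":
    for number, name in incomplete_calls(SAMPLE_DUMP):
        print(f"call {number} ({name}) did not complete")
```

On the excerpt, the only incomplete call is 11779, `glGenTextures` with n = 256, which matches the call number in the replay error for the posted dump.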