Summary: | llvm symbols leak through, cause trouble with software rendering in llvm-linked software | ||
---|---|---|---|
Product: | Mesa | Reporter: | Tobias Schlüter <tobi> |
Component: | Drivers/Gallium/llvmpipe | Assignee: | mesa-dev |
Status: | RESOLVED MOVED | QA Contact: | mesa-dev |
Severity: | normal | ||
Priority: | medium | CC: | 0xe2.0x9a.0x9b, jfonseca, vedran |
Version: | 10.1 | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Test-case for global symbol contamination due to LLVM dependency. |
Description
Tobias Schlüter
2015-11-25 15:58:25 UTC
I said over and over again that building Mesa drivers with a shared LLVM library (as opposed to statically linked) was a bad idea. You need to build with --disable-llvm-shared-libs and lobby the Linux distributions to not do it.

In addition to that, we probably also need to use a LD version script to ensure that LLVM symbols don't pop in the dynamic symbol table.

Hmm, I'm pretty sure that I removed all of those a year or two ago. And looking at the patches in said report, it seems that it was a problem on their end: they were not hiding the (should be) internal symbols.

The only thing that can remotely cause problems is that we dlopen(RTLD_GLOBAL) the module which internally references LLVM. You can try the following patch, although we cannot get it upstream without some serious work or we'll break a lot of applications.

diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index 8a56385..2c2eef6 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -103,7 +103,7 @@ driOpenDriver(const char *driverName)
    int len;

    /* Attempt to make sure libGL symbols will be visible to the driver */
-   glhandle = dlopen(GL_LIB_NAME, RTLD_NOW | RTLD_GLOBAL);
+   glhandle = dlopen(GL_LIB_NAME, RTLD_NOW | RTLD_LOCAL);

    libPaths = NULL;
    if (geteuid() == getuid()) {
@@ -131,14 +131,14 @@ driOpenDriver(const char *driverName)
       snprintf(realDriverName, sizeof realDriverName,
                "%.*s/tls/%s_dri.so", len, p, driverName);
       InfoMessageF("OpenDriver: trying %s\n", realDriverName);
-      handle = dlopen(realDriverName, RTLD_NOW | RTLD_GLOBAL);
+      handle = dlopen(realDriverName, RTLD_NOW | RTLD_LOCAL);
 #endif

       if (handle == NULL) {
          snprintf(realDriverName, sizeof realDriverName,
                   "%.*s/%s_dri.so", len, p, driverName);
          InfoMessageF("OpenDriver: trying %s\n", realDriverName);
-         handle = dlopen(realDriverName, RTLD_NOW | RTLD_GLOBAL);
+         handle = dlopen(realDriverName, RTLD_NOW | RTLD_LOCAL);
       }

       if (handle != NULL)

(In reply to Jose Fonseca from comment #1)
> In addition to that, we probably also need to use a LD version script to
> ensure that LLVM symbols don't pop in the dynamic symbol table.

We have had those for a while. Atm only the autotools build uses them (hint hint scons).

Also note that as of version 3.6, LLVM uses versioned symbols, which may help for this issue.

Thank you for the quick response! I'll take the comments to mean that this shouldn't happen with very recent versions of libmesa and llvm. It does happen with fairly recent versions though.

We verified on a colleague's system that the same error appears on Ubuntu 15.05 with libmesa 10.5.9 and libllvm 3.6 (there are some differences in the backtrace because we used a slightly simpler way of triggering the bug, as documented in the ROOT bug report I was linking to). I would be surprised if Ubuntu uses scons for the build, but I don't know.

Backtrace:
===========================================================
#5 0x00007f5c3a87de31 in llvm::cl::AddLiteralOption(llvm::cl::Option&, char const*) () from /home/ritter/belle2/externals/development/Linux_x86_64/opt/root/lib/libCling.so
#6 0x00007f5c2d944e3a in llvm::PassRegistry::enumerateWith(llvm::PassRegistrationListener*) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#7 0x00007f5c2d6e86a8 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#8 0x00007f5c3d4585ba in ?? () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f5c3d4586cb in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007f5c3d45d587 in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x00007f5c3d458464 in ?? () from /lib64/ld-linux-x86-64.so.2
#12 0x00007f5c3d45c9a3 in ?? () from /lib64/ld-linux-x86-64.so.2
#13 0x00007f5c3bea5fc9 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#14 0x00007f5c3d458464 in ?? () from /lib64/ld-linux-x86-64.so.2
#15 0x00007f5c3bea662d in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#16 0x00007f5c3bea6061 in dlopen () from /lib/x86_64-linux-gnu/libdl.so.2
#17 0x00007f5c3938c47f in cling::DynamicLibraryManager::loadLibrary(std::string const&, bool) () from /home/ritter/belle2/externals/development/Linux_x86_64/opt/root/lib/libCling.so
...
============================================================

I don't agree with Emil that it is a problem on the ROOT side (even though they installed a workaround): one should be able to use libmesa without being afraid that symbols leak through that are not part of libmesa's public interface.

(In reply to Tobias Schlüter from comment #5)
> Thank you for the quick response! I'll take the comments to mean that this
> shouldn't happen with very recent versions of libmesa and llvm. It does
> happen with fairly recent versions though.
>

A "global" (affecting all drivers) version script was added in mesa circa 10.2.

> We verified on a colleague's system that the same error appears on Ubuntu
> 15.05 with libmesa 10.5.9 and libllvm 3.6, (there are some differences in
> the backtrace because we used a slightly simpler way of triggering the bug
> as documented in the ROOT bug report I was linking to). I would be
> surprised if Ubuntu uses scons for the build, but I don't know.

Pretty much every Linux distribution uses the autotools.

Check, if ever in doubt about the exposed symbols.
libGL itself
$ nm -CD --defined-only /lib/libGL.so | grep -v " gl"
00000000002969c8 b __bss_start
00000000002969c8 d _edata
0000000000297480 b _end
0000000000074c54 T _fini
000000000004e9d0 T _glapi_create_table_from_handle
0000000000017338 T _init

DRI module, used by libGL (do check all the "*_dri.so" found in /lib)
$ nm -CD --defined-only /lib/xorg/modules/dri/swrast_dri.so
0000000000642e60 T amdgpu_winsys_create
0000000000a62480 B __driDriverExtensions
0000000000073510 T __driDriverGetExtensions_kms_swrast
00000000000735d0 T __driDriverGetExtensions_nouveau
00000000000735f0 T __driDriverGetExtensions_r300
0000000000073610 T __driDriverGetExtensions_r600
0000000000073630 T __driDriverGetExtensions_radeonsi
0000000000073440 T __driDriverGetExtensions_swrast
0000000000073650 T __driDriverGetExtensions_vmwgfx
00000000003fa060 T nouveau_drm_screen_create
000000000062c9b0 T radeon_drm_winsys_create

As you can see, nothing from LLVM/Clang is explicitly exported/leaked.

>
> I don't agree with Emil that it is a problem on the ROOT side (even though
> the installed a workaround): one should be able to use libmesa without being
> afraid that symbols leak through that are not part of libmesa's public
> interface.

First and foremost I would encourage you to try with statically linked LLVM as Jose suggested.

On the actual issue and who's "doing things wrong" it's a combination of bugs and suboptimal decisions:
1 To avoid pollution of (conflicts in) global namespace, people must make sure that they hide their symbols - as done with the ROOT report. Imho this is a must fix.
2 Mesa should not dlopen its module with RTLD_GLOBAL. This in itself may pollute the global namespace (haven't checked), despite that we've hidden the exported symbols.
3 Using private libraries conflicting (incompatible) with the system ones is a very bad idea.
If needed one can 1) statically link the private ones in their application or 2) do LD_PRELOAD/LD_LIBRARY_PATH/RPATH magic to ensure the correct libraries are loaded.

Fixing this on the client side is trivial (both 1 and 3.1). I'm afraid that fixing the one in mesa is a lot more convoluted than the patch in comment 2 - we tried and quickly had to revert it :-(

Not saying that mesa is perfect - on the contrary we should drop the RTLD_GLOBAL hack. Then again others could do their fair share in keeping things sane.

I wonder if Tobias is using XLIB-based libGL.so.1, as opposed to the DRI based libGL.so.1.

(In reply to Jose Fonseca from comment #7)
> I wonder if Tobias is using XLIB-based libGL.so.1, as opposed to the DRI
> based libGL.so.1.

I fixed that one around mesa 10.3 - see src/gallium/targets/libgl-xlib/libgl-xlib.sym. I've been a good boy cleaning things up there :-P

(In reply to Emil Velikov from comment #6)
> Check, if ever in doubt about the exposed symbols.
>
> libGL itself
> $ nm -CD --defined-only /lib/libGL.so | grep -v " gl" [...]
>
> DRI module, used by libGL (do check all the "*_dri.so" found in /lib)
> $ nm -CD --defined-only /lib/xorg/modules/dri/swrast_dri.so [...]
>
> As you can see, nothing from LLVM/Clang is explicitly exported/leaked.

My understanding from Tobias' description is that the LLVM symbols from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1 are clashing with a custom LLVM build for cling -- https://root.cern.ch/cling-build-instructions

The real question is what symbols libLLVM-3.6.so provides, and

$ nm -CD --defined-only /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1

shows a lot of them.

LLVM is used in lots of projects nowadays -- language interpreters, etc. So are OpenGL drivers -- they get loaded in all sorts of processes. So when distros decided to use a shared LLVM for the opengl drivers to save a few bytes, they basically gave the finger to everybody who needs to use bleeding edge LLVM and OpenGL...
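The question of which symbols a given shared object exports can be checked mechanically by filtering the nm output for names in the llvm namespace. What follows is a minimal Python sketch, assuming nm from binutils is on the PATH. The helper names defined_dynamic_symbols and llvm_symbols are made up for illustration; only the nm flags are taken from the commands quoted in this thread.

```python
import subprocess


def defined_dynamic_symbols(path):
    """Run `nm -CD --defined-only` on a shared object and return its raw output."""
    result = subprocess.run(
        ["nm", "-CD", "--defined-only", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def llvm_symbols(nm_output):
    """Return the demangled names in nm output that mention the llvm namespace."""
    found = []
    for line in nm_output.splitlines():
        # Each line is "<address> <type> <demangled name>"; the name may contain spaces.
        parts = line.split(None, 2)
        if len(parts) == 3 and "llvm::" in parts[2]:
            found.append(parts[2])
    return found


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_llvm_syms.py /path/to/foo.so ...
    for path in sys.argv[1:]:
        names = llvm_symbols(defined_dynamic_symbols(path))
        print(f"{path}: {len(names)} exported symbols mention llvm::")
```

Going by the listings in this thread, one would expect a count of zero for swrast_dri.so and a large count for the system libLLVM, but that depends on how the libraries on a given system were built.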
(In reply to Emil Velikov from comment #3)
> > In addition to that, we probably also need to use a LD version script to
> > ensure that LLVM symbols don't pop in the dynamic symbol table.
> We have those for a while.

I'm not sure it helps if the problem is /usr/lib/x86_64-linux-gnu/libLLVM-3.4.so.1 . The only solution is if the system libLLVM-X.X.so has a unique symbol version/namespace, or maybe the RTLD_LOCAL as you mentioned.

> Atm only the autotools build uses them (hint hint
> scons).

Sure. I'll look into it.

Probably no longer relevant per #8, and I don't know how to tell different libGL.so.1 apart with certainty, but I guess you can do this based on ldd output, so here goes (DRI appears, Xlib doesn't):

$ ldd /usr/lib/x86_64-linux-gnu/mesa/libGL.so
linux-vdso.so.1 => (0x00007ffe71b89000)
libglapi.so.0 => /usr/lib/x86_64-linux-gnu/libglapi.so.0 (0x00007f118cf53000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f118cd41000)
libXdamage.so.1 => /usr/lib/x86_64-linux-gnu/libXdamage.so.1 (0x00007f118cb3e000)
libXfixes.so.3 => /usr/lib/x86_64-linux-gnu/libXfixes.so.3 (0x00007f118c938000)
libX11-xcb.so.1 => /usr/lib/x86_64-linux-gnu/libX11-xcb.so.1 (0x00007f118c736000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f118c401000)
libxcb-glx.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-glx.so.0 (0x00007f118c1ea000)
libxcb-dri2.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri2.so.0 (0x00007f118bfe5000)
libxcb-dri3.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri3.so.0 (0x00007f118bde2000)
libxcb-present.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-present.so.0 (0x00007f118bbdf000)
libxcb-sync.so.1 => /usr/lib/x86_64-linux-gnu/libxcb-sync.so.1 (0x00007f118b9d9000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f118b7ba000)
libxshmfence.so.1 => /usr/lib/x86_64-linux-gnu/libxshmfence.so.1 (0x00007f118b5b8000)
libXxf86vm.so.1 => /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1 (0x00007f118b3b2000)
libdrm.so.2 => /usr/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007f118b1a6000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f118af88000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f118ad84000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f118a9bf000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f118a7bb000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f118a5b5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f118d3e0000)
$

Concerning #6, from which I learnt a lot, thank you for that, two comments:
1) I wouldn't consider it good practice to redefine LLVM as a "system library" and then blame problems related to it on users
2) I guess I could read your comment as saying that the bug lies with LLVM, as they (at least up to version 3.6) don't version their symbols correctly (though from a user-experience side there is an issue with libmesa, even if it is inherited)

I have pointed the ROOT people to this bug report, I hope they can draw their conclusions from this. We're currently setting up mesa 11 and the latest version of ROOT, and I will report back.

ps concerning the choices about who should statically link, I agree with #9 (submitted while I was putting this comment together).

(In reply to Jose Fonseca from comment #9)
> $ nm -CD --defined-only /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
>
> shows lot of them.
>

I have the very vague memory that older versions of LLVM exported internal symbols as well. Perhaps this bug is related to that defect?

> So when
> distro decided to use a shared LLVM for the opengl drivers to save a few
> bytes, they basically gave the finger to everybody who needs to use bleeding
> edge LLVM and OpenGL...
>

Not much we can do there I'm afraid. Distros have their policies and we have the config switch for people to go the route they like.

>
> (In reply to Emil Velikov from comment #3)
> > > In addition to that, we probably also need to use a LD version script to
> > > ensure that LLVM symbols don't pop in the dynamic symbol table.
> > We have those for a while.
>
> I'm not sure it helps if the problem is is
> /usr/lib/x86_64-linux-gnu/libLLVM-3.4.so.1 . The only solution is if the
> system libLLVM-X.X.so an unique symbol version/namespace, or maybe the
> RTLD_LOCAL as you mentioned.
>

Michel mentioned that llvm 3.6 has versioned symbols, although I cannot see any tag in my Archlinux cmake llvm 3.7 build. I do see a very nasty looking RPATH though - $ORIGIN/../lib ... ouch.

> > Atm only the autotools build uses them (hint hint
> > scons).
>
> Sure. I'll look into it.

Thank you!

(In reply to Tobias Schlüter from comment #10)
> Probably no longer relevant per #8, and I don't know how to tell different
> libGL.so.1 apart with certainty, but I guess you can do this based on ldd
> output, so here goes (DRI appears, Xlib doesn't):
> $ ldd /usr/lib/x86_64-linux-gnu/mesa/libGL.so
> libglapi.so.0 => /usr/lib/x86_64-linux-gnu/libglapi.so.0

Based on this I'd say

> (0x00007f118cf53000)
> libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f118cd41000)
> libXdamage.so.1 => /usr/lib/x86_64-linux-gnu/libXdamage.so.1
> (0x00007f118cb3e000)
> libXfixes.so.3 => /usr/lib/x86_64-linux-gnu/libXfixes.so.3
> (0x00007f118c938000)
> libX11-xcb.so.1 => /usr/lib/x86_64-linux-gnu/libX11-xcb.so.1
> (0x00007f118c736000)
> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f118c401000)
> libxcb-glx.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-glx.so.0
> (0x00007f118c1ea000)
> libxcb-dri2.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri2.so.0
> (0x00007f118bfe5000)
> libxcb-dri3.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri3.so.0
> (0x00007f118bde2000)
> libxcb-present.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-present.so.0
> (0x00007f118bbdf000)
> libxcb-sync.so.1 => /usr/lib/x86_64-linux-gnu/libxcb-sync.so.1
> (0x00007f118b9d9000)
> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f118b7ba000)
> libxshmfence.so.1 => /usr/lib/x86_64-linux-gnu/libxshmfence.so.1
> (0x00007f118b5b8000)
> libXxf86vm.so.1 => /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1
> (0x00007f118b3b2000)
> libdrm.so.2 => /usr/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007f118b1a6000)
> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x00007f118af88000)
> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f118ad84000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f118a9bf000)
> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f118a7bb000)
> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6
> (0x00007f118a5b5000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f118d3e0000)
> $
>
> Concerning #6, from which I learnt a lot, thank you for that, two comments:

Glad to hear.

> 1) I wouldn't consider it good practice to redefine LLVM as a "system
> library" and then blame problems related to it on users

In my view once any library is installed system wide it can be considered "system". As to who did that and why, that is not a question for me to answer ;-)

> 2) I guess I could read your comment as saying that the bug lies with LLVM
> as they (at least up to version 3.6) don't version their symbols correct
> (though from a user-experience side there is an issue with libmesa, even if
> it is inherited)
>

That plus the possible RTLD_GLOBAL issue. On the good side I should be looking into our end soonish.

> I have pointed the ROOT people to this bug report, I hope they can draw
> their conclusions from this. We're currently setting up mesa 11 and the
> latest version of ROOT, and I will report back.
>
> ps concerning the choices about who should statically link, I agree with #9
> (submitted while I was putting this comment together).

Great, good luck (in an honest, non sarcastic way).

(In reply to Emil Velikov from comment #12)
> (In reply to Tobias Schlüter from comment #10)
> > I have pointed the ROOT people to this bug report, I hope they can draw
> > their conclusions from this. We're currently setting up mesa 11 and the
> > latest version of ROOT, and I will report back.
> Great, good luck (in an honest, non sarcastic way).

Thanks, they seem to work together, at least the simple crash-provoking mechanism fails to trigger the crash there.

FWIW, IME LLVM's symbol versioning seems to only help if all LLVM libraries involved have versioned symbols. I did verify at one point with gambas that radeonsi using a different LLVM library from gambas worked if both LLVM libraries had versioned symbols but broke if the LLVM library used by gambas didn't have versioned symbols.

BTW, the LLVM symbol versioning was introduced by http://www.llvm.org/viewvc/llvm-project?view=revision&revision=214418 , but I'm not sure if that's already active with a cmake build as well or if something needs to be done for that.

The commit mentioned addresses only their autoconf (more like gnumake really) build. Current upstream covers both builds with what should be identical auto-generated (?) LD version scripts.

Small aside (C symbols): on my system llvm 3.7 provides some 130+ that are not part of the API. This is 20% out of the total ~800 (note: haven't checked for preprocessor macros).

Created attachment 128707 [details]
Test-case for global symbol contamination due to LLVM dependency.
I've run into a variant of this bug where when using swrast, symbols from a shared object linked by LLVM (libedit) conflict with symbols from a shared object used by a client application (readline/python). In general opening a shared object with RTLD_GLOBAL appears to propagate that flag to all dependencies of that object, which means that even with LLVM adopting versioned symbols problems may still occur due to LLVM dependencies. I've attached a reduced test-case which shows the issue (llvm-libedit-bug.py).

Is it possible to dlopen DRI drivers as RTLD_LOCAL? The original RTLD_LOCAL patch had to be reverted (#79469) due to client software not dlopening libglapi.so, but opening the DRI drivers as RTLD_LOCAL seems like it should still work.

Failing that, if building with --disable-llvm-shared-libs fixes the issue, it should either be the default or (preferably) required. Unfortunately I can't seem to get that option to work for me on 62a819184141133478cfdcfa76b62d5bb7e14fd5 with the set of configure options listed below (ldd shows that swrast_dri.so still imports libLLVM-3.9.so.1, and the test case still fails with the same root cause):

./configure --build=x86_64-linux-gnu \
    --prefix=/usr \
    --includedir=/usr/include \
    --mandir=/usr/share/man \
    --infodir=/usr/share/info \
    --sysconfdir=/etc \
    --localstatedir=/var \
    --disable-silent-rules \
    --libdir=/usr/lib/x86_64-linux-gnu \
    --libexecdir=/usr/lib/x86_64-linux-gnu \
    --disable-dependency-tracking \
    --disable-llvm-shared-libs \
    "--with-dri-drivers= nouveau i915 i965 r200 radeon" \
    --with-dri-driverdir=/usr/lib/x86_64-linux-gnu/dri \
    --with-dri-searchpath=/usr/lib/x86_64-linux-gnu/dri:/usr/lib/dri \
    --with-sha1=libmd \
    "--with-vulkan-drivers= intel radeon" \
    --enable-osmesa \
    --enable-texture-float \
    --disable-xvmc \
    --enable-driglx-direct \
    --enable-dri3 \
    "--with-egl-platforms=x11 wayland drm" \
    --enable-xa \
    --enable-opencl \
    --enable-opencl-icd \
    --enable-gallium-llvm \
    --enable-vdpau \
    --enable-va \
    --enable-gallium-extra-hud \
    --enable-lmsensors \
    "--with-gallium-drivers= nouveau svga virgl r600 r300 radeonsi swrast" \
    ac_cv_path_LLVM_CONFIG=llvm-config-3.9

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/236.
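For anyone wanting to experiment with the dlopen flag this thread keeps coming back to, the two modes can be tried from Python through ctypes, which exposes RTLD_LOCAL and RTLD_GLOBAL. This is only a rough sketch and not Mesa's loader: the helper names dlopen_mode and load_library are made up here, and libm merely stands in for a driver module.

```python
import ctypes
import ctypes.util


def dlopen_mode(make_global):
    """Pick the dlopen visibility flag: RTLD_GLOBAL shares symbols, RTLD_LOCAL keeps them private."""
    return ctypes.RTLD_GLOBAL if make_global else ctypes.RTLD_LOCAL


def load_library(name, make_global=False):
    """Load a shared object. With the default (RTLD_LOCAL) its symbols are not
    made available to resolve references in subsequently loaded objects."""
    return ctypes.CDLL(name, mode=dlopen_mode(make_global))


if __name__ == "__main__":
    # libm is only a stand-in; find_library may return None on some systems.
    path = ctypes.util.find_library("m")
    if path is not None:
        libm = load_library(path)  # opened with RTLD_LOCAL
        libm.cos.restype = ctypes.c_double
        libm.cos.argtypes = [ctypes.c_double]
        print(libm.cos(0.0))
```

Symbols looked up through the returned handle still resolve in either mode; what changes is whether other objects loaded later can bind to them, which is the property the reverted RTLD_LOCAL patch was after.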