Summary: | [PATCH][regression][bisect] Xorg fails to start after f50aa21456d82c8cb6fbaa565835f1acc1720a5d | ||
---|---|---|---|
Product: | Mesa | Reporter: | Laurent carlier <lordheavym> |
Component: | Drivers/Gallium/swr | Assignee: | mesa-dev |
Status: | RESOLVED FIXED | QA Contact: | mesa-dev |
Severity: | blocker | ||
Priority: | medium | CC: | andyrtr, bero, nick.tenney, timothy.o.rowley |
Version: | 17.2 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
xorg log file with the segfault
Workaround
debug output from knob initialization
Fix |
Description
Laurent carlier
2017-07-18 23:06:29 UTC
mesa is built with:

./autogen.sh --prefix=/usr \
  --sysconfdir=/etc \
  --with-dri-driverdir=/usr/lib/xorg/modules/dri \
  --with-gallium-drivers=r300,r600,radeonsi,nouveau,svga,swrast,virgl,swr \
  --with-dri-drivers=i915,i965,r200,radeon,nouveau,swrast \
  --with-platforms=x11,drm,wayland \
  --with-vulkan-drivers=intel,radeon \
  --disable-xvmc \
  --enable-llvm \
  --enable-llvm-shared-libs \
  --enable-shared-glapi \
  --enable-libglvnd \
  --enable-libunwind \
  --enable-lmsensors \
  --enable-egl \
  --enable-glx \
  --enable-glx-tls \
  --enable-gles1 \
  --enable-gles2 \
  --enable-gbm \
  --enable-dri \
  --enable-gallium-osmesa \
  --enable-gallium-extra-hud \
  --enable-texture-float \
  --enable-xa \
  --enable-vdpau \
  --enable-omx \
  --enable-nine \
  --enable-opencl \
  --enable-opencl-icd \
  --with-clang-libdir=/usr/lib

Building without swr fixes the problem.

Seems like some binary has unresolved symbols - unw_get_proc_name at least. AFAICT it cannot happen for the DRI module, and since you're not using SWR none of its backends should be attempted, let alone loaded.

Please check all the binaries for "undefined symbol" via
$ ldd -r $binary

Thanks!
got this:

[lordh@lordh-pc lib]$ ldd -r libswrAVX2.so.0.0.0
    linux-vdso.so.1 (0x00007ffefe1b9000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f8206c77000)
    libm.so.6 => /usr/lib/libm.so.6 (0x00007f8206965000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f82065bf000)
    /usr/lib64/ld-linux-x86-64.so.2 (0x000055844c2fe000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f82063a8000)
undefined symbol: pthread_create (./libswrAVX2.so.0.0.0)
undefined symbol: pthread_setaffinity_np (./libswrAVX2.so.0.0.0)
undefined symbol: pthread_setname_np (./libswrAVX2.so.0.0.0)

[lordh@lordh-pc lib]$ ldd -r libswrAVX.so.0.0.0
    linux-vdso.so.1 (0x00007ffdd86e7000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f6e74166000)
    libm.so.6 => /usr/lib/libm.so.6 (0x00007f6e73e54000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f6e73aae000)
    /usr/lib64/ld-linux-x86-64.so.2 (0x0000557dc176b000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f6e73897000)
undefined symbol: pthread_create (./libswrAVX.so.0.0.0)
undefined symbol: pthread_setaffinity_np (./libswrAVX.so.0.0.0)
undefined symbol: pthread_setname_np (./libswrAVX.so.0.0.0)

Right, so I may have misread the Xorg.log, but at least the AVX binaries will not have unresolved symbols, props to https://patchwork.freedesktop.org/patch/168170/

Possible symbol collision comes to mind, but I'm not working on either SWR or radeonsi :-\

Tim, can you please have a look?

The patchset fixes the unresolved symbols, but segfault is still here. I will try to grab a better backtrace.
With debug symbols, backtrace is a bit different:

[442527.173] (EE) Backtrace:
[442527.174] (EE) 0: /usr/lib/xorg-server/Xorg (OsSigHandler+0x2a) [0x5645b1d61fba]
[442527.174] (EE) 1: /usr/lib/libpthread.so.0 (funlockfile+0x50) [0x7fe7639f982f]
[442527.174] (EE) 2: /usr/lib/libc.so.6 (strlen+0x26) [0x7fe7636c48c6]
[442527.175] (EE) 3: /usr/lib/xorg/modules/dri/radeonsi_dri.so (_ZN8KnobBase30autoExpandEnvironmentVariablesERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x120) [0x7fe75eb34630]
[442527.175] (EE) 4: /usr/lib/xorg/modules/dri/radeonsi_dri.so (_ZN11GlobalKnobsC1Ev+0x13b) [0x7fe75eb34fbb]
[442527.175] (EE) 5: /usr/lib/xorg/modules/dri/radeonsi_dri.so (_GLOBAL__sub_I_gen_knobs.cpp+0x10) [0x7fe75e3f25f0]
[442527.176] (EE) 6: /lib64/ld-linux-x86-64.so.2 (call_init.part.0+0x9a) [0x7fe765c5237a]
[442527.176] (EE) 7: /lib64/ld-linux-x86-64.so.2 (_dl_init+0x76) [0x7fe765c52486]
[442527.176] (EE) 8: /lib64/ld-linux-x86-64.so.2 (dl_open_worker+0x38e) [0x7fe765c5693e]
[442527.177] (EE) 9: /usr/lib/libc.so.6 (_dl_catch_error+0x84) [0x7fe763769e44]
[442527.177] (EE) 10: /lib64/ld-linux-x86-64.so.2 (_dl_open+0xca) [0x7fe765c5615a]
[442527.177] (EE) unw_get_proc_name failed: no unwind info found [-10]
[442527.177] (EE) 11: /usr/lib/libdl.so.2 (?+0xca) [0x7fe7652c3f1a]
[442527.177] (EE) 12: /usr/lib/libc.so.6 (_dl_catch_error+0x84) [0x7fe763769e44]
[442527.178] (EE) 13: /usr/lib/libdl.so.2 (dlerror+0x2e7) [0x7fe7652c4827]
[442527.178] (EE) 14: /usr/lib/libdl.so.2 (dlopen+0x42) [0x7fe7652c3f42]
[442527.178] (EE) 15: /usr/lib/libgbm.so.1 (dri_open_driver.isra.5+0x1b4) [0x7fe75fef8984]
[442527.178] (EE) 16: /usr/lib/libgbm.so.1 (dri_screen_create_dri2+0x2c) [0x7fe75fef8aac]
[442527.178] (EE) 17: /usr/lib/libgbm.so.1 (dri_device_create+0x168) [0x7fe75fef8f28]
[442527.178] (EE) 18: /usr/lib/libgbm.so.1 (gbm_create_device+0x57) [0x7fe75fef6e07]
[442527.178] (EE) 19: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (_init+0x7ffd) [0x7fe7603218dd]
[442527.179] (EE) 20: /usr/lib/xorg-server/Xorg (InitOutput+0xb10) [0x5645b1c3edc0]
[442527.179] (EE) 21: /usr/lib/xorg-server/Xorg (dix_main+0x1e2) [0x5645b1bfbb92]
[442527.179] (EE) 22: /usr/lib/libc.so.6 (__libc_start_main+0xea) [0x7fe7636624ca]
[442527.179] (EE) 23: /usr/lib/xorg-server/Xorg (_start+0x2a) [0x5645b1be553a]
[442527.179] (EE)
[442527.179] (EE) Segmentation fault at address 0x0
[442527.179] (EE) Fatal server error:
[442527.179] (EE) Caught signal 11 (Segmentation fault). Server aborting
[442527.179] (EE)
[442527.179] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.

This is unrelated to the radeonsi driver -- the exact same commit causes a similar crash here on a laptop with an Intel and Nouveau GPU.

(EE) Backtrace:
(EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x33) [0x56095d]
(EE) 1: /usr/libexec/Xorg (0x400000+0x164414) [0x564414]
(EE) 2: /lib64/libpthread.so.0 (0x3547e00000+0xfb70) [0x3547e0fb70]
(EE) 3: /lib64/libc.so.6 (0x3547a00000+0x113131) [0x3547b13131]
(EE) 4: /usr/lib64/dri/nouveau_dri.so (0x7f8127608000+0xa74c4e) [0x7f812807cc4e]
(EE) 5: /usr/lib64/dri/nouveau_dri.so (0x7f8127608000+0xa752e1) [0x7f812807d2e1]
(EE) 6: /usr/lib64/dri/nouveau_dri.so (0x7f8127608000+0x77ff0) [0x7f812767fff0]
(EE) 7: /lib64/ld-linux-x86-64.so.2 (0x3547600000+0xc0c9) [0x354760c0c9]
(EE) 8: /lib64/ld-linux-x86-64.so.2 (0x3547600000+0xc1d0) [0x354760c1d0]
(EE) 9: /lib64/ld-linux-x86-64.so.2 (0x3547600000+0xf6dc) [0x354760f6dc]
(EE) 10: /lib64/libc.so.6 (_dl_catch_error+0x72) [0x3547aee17b]
(EE) 11: /lib64/ld-linux-x86-64.so.2 (0x3547600000+0xedc1) [0x354760edc1]
(EE) 12: /lib64/libdl.so.2 (0x3548200000+0x1006) [0x3548201006]
(EE) 13: /lib64/libc.so.6 (_dl_catch_error+0x72) [0x3547aee17b]
(EE) 14: /lib64/libdl.so.2 (0x3548200000+0x1505) [0x3548201505]
(EE) 15: /lib64/libdl.so.2 (dlopen+0x35) [0x3548201043]
(EE) 16: /usr/lib64/libgbm.so.1 (0x7f81285af000+0x4ab4) [0x7f81285b3ab4]
(EE) 17: /usr/lib64/libgbm.so.1 (0x7f81285af000+0x4bf4) [0x7f81285b3bf4]
(EE) 18: /usr/lib64/libgbm.so.1 (0x7f81285af000+0x52a8) [0x7f81285b42a8]
(EE) 19: /usr/lib64/libgbm.so.1 (0x7f81285af000+0x2e1b) [0x7f81285b1e1b]
(EE) 20: /usr/lib64/libgbm.so.1 (gbm_create_device+0x39) [0x7f81285b1e89]
(EE) 21: /usr/lib64/xorg/modules/libglamoregl.so (glamor_egl_init+0x80) [0x7f8128630ca0]
(EE) 22: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x7f812a2ca000+0x7d72) [0x7f812a2d1d72]
(EE) 23: /usr/libexec/Xorg (InitOutput+0x1660) [0x46be93]
(EE) 24: /usr/libexec/Xorg (0x400000+0x20de1) [0x420de1]
(EE) 25: /lib64/libc.so.6 (__libc_start_main+0x15a) [0x3547a21de3]
(EE) 26: /usr/libexec/Xorg (_start+0x2a) [0x420aba]
(EE)
(EE) Segmentation fault at address 0x0

Created attachment 133549
Workaround

This "fixes" it (forward-port of reverting the commit causing the problem, applies cleanly on 17.2.0-rc4) -- but obviously it isn't a perfect fix because it brings back the problems the original commit was meant to solve. Certainly better than X crashing on startup though ;)

Created attachment 133557
debug output from knob initialization

Looking at Laurent's backtrace, it appears to be a problem with the initialization of the swr knobs structure (global C++ object constructor). Not sure why that ends up crashing.

I'm not set up for running an X server with a built dri driverset; if you could try running with the following patch, the messages might help point to what's happening. Thanks.

The output of the patch doesn't look too helpful to me:

(II) glamor: OpenGL accelerated X.org driver based.
SWR_DEBUG env /tmp/Rast/DebugOutput
SWR_DEBUG env ${HOME}/.swr/jitcache
(EE)
(EE) Backtrace:
(EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x33) [0x56095d]
(EE) 1: /usr/libexec/Xorg (0x400000+0x164414) [0x564414]
(EE) 2: /lib64/libpthread.so.0 (0x3210e00000+0xfb70) [0x3210e0fb70]
(EE) 3: /lib64/libc.so.6 (0x3210a00000+0x113341) [0x3210b13341]
(EE) 4: /usr/lib64/dri/nouveau_dri.so (0x7fa196120000+0xade2d8) [0x7fa196bfe2d8]
(EE) 5: /usr/lib64/dri/nouveau_dri.so (0x7fa196120000+0xadec23) [0x7fa196bfec23]
(EE) 6: /usr/lib64/dri/nouveau_dri.so (0x7fa196120000+0x74086) [0x7fa196194086]
(EE) 7: /lib64/ld-linux-x86-64.so.2 (0x3210600000+0xc0c2) [0x321060c0c2]
(EE) 8: /lib64/ld-linux-x86-64.so.2 (0x3210600000+0xc1c9) [0x321060c1c9]
(EE) 9: /lib64/ld-linux-x86-64.so.2 (0x3210600000+0xf6d5) [0x321060f6d5]
(EE) 10: /lib64/libc.so.6 (_dl_catch_error+0x72) [0x3210aee38b]
(EE) 11: /lib64/ld-linux-x86-64.so.2 (0x3210600000+0xedba) [0x321060edba]
(EE) 12: /lib64/libdl.so.2 (0x3211200000+0x1006) [0x3211201006]
(EE) 13: /lib64/libc.so.6 (_dl_catch_error+0x72) [0x3210aee38b]
(EE) 14: /lib64/libdl.so.2 (0x3211200000+0x1505) [0x3211201505]
(EE) 15: /lib64/libdl.so.2 (dlopen+0x35) [0x3211201043]
(EE) 16: /usr/lib64/libgbm.so.1 (0x371a000000+0x4a34) [0x371a004a34]
(EE) 17: /usr/lib64/libgbm.so.1 (0x371a000000+0x5197) [0x371a005197]
(EE) 18: /usr/lib64/libgbm.so.1 (0x371a000000+0x2dab) [0x371a002dab]
(EE) 19: /usr/lib64/libgbm.so.1 (gbm_create_device+0x44) [0x371a002e14]
(EE) 20: /usr/lib64/xorg/modules/libglamoregl.so (glamor_egl_init+0x80) [0x7fa1989efca0]
(EE) 21: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x7fa198a2d000+0x7d72) [0x7fa198a34d72]
(EE) 22: /usr/libexec/Xorg (InitOutput+0x1660) [0x46be93]
(EE) 23: /usr/libexec/Xorg (0x400000+0x20de1) [0x420de1]
(EE) 24: /lib64/libc.so.6 (__libc_start_main+0x15a) [0x3210a21de3]
(EE) 25: /usr/libexec/Xorg (_start+0x2a) [0x420aba]
(EE)
(EE) Segmentation fault at address 0x0
(EE) Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting

This is with 17.2-rc5. Still works correctly with my workaround patch applied.

The debug output was more useful than I initially thought ;)
The crash happens when getenv() in GetEnv() returns NULL, leading to the std::string constructor getting a NULL pointer.

Created attachment 133894
Fix

Here's a fix... Probably should (in addition to this) catch the cache directory pointing somewhere invalid though...

Obviously sddm, kdm, lightdm, gdm and friends won't have $HOME set when starting X...

(In reply to Bernhard Rosenkraenzer from comment #13)
> Created attachment 133894
> Fix
>
> Here's a fix... Probably should (in addition to this) catch the cache
> directory pointing somewhere invalid though...
>
> Obviously sddm, kdm, lightdm, gdm and friends won't have $HOME set when
> starting X...

Bernhard, please send git patches to the list [1]. Do include the following two lines in the commit message.

CC: Tim Rowley <timothy.o.rowley@intel.com>
Fixes: a25093de718 ("swr/rast: Implement JIT shader caching to disk")

[1] https://www.mesa3d.org/submittingpatches.html

Fixed with commit 21e271024d8e050b75361c2da2e5783100f2e87b |
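For readers following the diagnosis above, here is a minimal sketch of the failure mode and a null-safe alternative. It is not the code from the attached fix or from commit 21e271024d8e050b75361c2da2e5783100f2e87b; the names SafeGetEnv and ExpandEnvVars are hypothetical, and the `${VAR}` expansion only loosely mirrors what the autoExpandEnvironmentVariables frame in the backtrace suggests.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not the actual swr code): read an environment
// variable without ever handing a null pointer to std::string.
std::string SafeGetEnv(const std::string& name)
{
    const char* value = std::getenv(name.c_str());
    // std::getenv returns a null pointer when the variable is unset.
    // Constructing std::string from a null pointer is undefined behavior
    // and typically crashes in strlen(), as seen in the backtrace.
    return value ? std::string(value) : std::string();
}

// Hypothetical expansion of ${VAR} references in a knob default value.
// Unset variables expand to an empty string instead of crashing.
std::string ExpandEnvVars(std::string text)
{
    std::size_t start = 0;
    while ((start = text.find("${", start)) != std::string::npos)
    {
        const std::size_t end = text.find('}', start);
        if (end == std::string::npos)
            break; // unterminated reference: leave the rest untouched

        const std::string name  = text.substr(start + 2, end - start - 2);
        const std::string value = SafeGetEnv(name);
        text.replace(start, end - start + 1, value);
        start += value.size(); // resume after the substituted text
    }
    return text;
}
```

With $HOME unset, as under a display manager, this sketch turns `${HOME}/.swr/jitcache` into `/.swr/jitcache` rather than crashing. That path is probably not writable, which is the "cache directory pointing somewhere invalid" case the comment above says should be handled separately.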