Created attachment 136731 [details] logcat with chrome app crash Hi, while testing Android 8.1 build with latest mesa 17.4.0devel and libdrm 2.4.89 SIGSEGV MAPERR due to NULL pointer dereference happens with Chrome (and Firefox browser) apps, here is logcat extract for chrome --------- beginning of crash ... F libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x4 in tid 4851 (RenderThread), pid 4503 (.android.chrome) ... I crash_dump32: obtaining output fd from tombstoned, type: kDebuggerdTombstone ... I /system/bin/tombstoned: received crash request for pid 4503 ... I crash_dump32: performing dump of process 4503 (target tid = 4851) ... F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ... F DEBUG : Build fingerprint: 'Android-x86/android_x86_64/x86_64:8.1.0/OPM1.171019.011/utente12162000:eng/test-keys' ... F DEBUG : Revision: '0' ... F DEBUG : ABI: 'x86' ... F DEBUG : pid: 4503, tid: 4851, name: RenderThread >>> com.android.chrome <<< ... F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x4 ... F DEBUG : Cause: null pointer dereference ... F DEBUG : eax 00000000 ebx cca69cf4 ecx 00000009 edx bfbff004 ... F DEBUG : esi 00000000 edi 00000010 ... F DEBUG : xcs 00000023 xds 0000002b xes 0000002b xfs 0000006b xss 0000002b ... F DEBUG : eip cc78d089 ebp bfbfefb8 esp bfbfefb8 flags 00010286 ... F DEBUG : ... F DEBUG : backtrace: ... F DEBUG : #00 pc 00651089 /system/vendor/lib/dri/i965_dri.so (build_id_length+9) ... F DEBUG : #01 pc 00041ae2 /system/vendor/lib/dri/i965_dri.so (brw_program_binary_init+130) ... F DEBUG : #02 pc 0002dd50 /system/vendor/lib/dri/i965_dri.so (brwCreateContext+864) ... F DEBUG : #03 pc 003d7777 /system/vendor/lib/dri/i965_dri.so (driCreateContextAttribs+663) ... F DEBUG : #04 pc 000112a1 /android/system/vendor/lib/egl/libGLES_mesa.so ... F DEBUG : #05 pc 00004096 /android/system/vendor/lib/egl/libGLES_mesa.so ... F DEBUG : #06 pc 0000ef37 /android/system/lib/libEGL.so Analysing the backtrace with addr2line the NULL pointer is retured at src/util/build_id.c:117 utente@utente-Terabyte:~/oreo-x86_kernel_test$ addr2line -Cfe out/target/product/x86_64/symbols/system/vendor/lib/dri/i965_dri.so 00651089 build_id_length external/mesa/src/util/build_id.c:117 <= here it is 00041ae2 brw_program_binary_init external/mesa/src/mesa/drivers/dri/i965/brw_program_binary.c:53 0002dd50 brw_init_driver_functions external/mesa/src/mesa/drivers/dri/i965/brw_context.c:334 003d7777 driCreateContextAttribs external/mesa/src/mesa/drivers/dri/common/dri_util.c:479 utente@utente-Terabyte:~/oreo-x86_kernel_test$ addr2line -Cfe out/target/product/x86_64/symbols/system/vendor/lib/egl/libGLES_mesa.so 000112a1 dri2_create_context external/mesa/src/egl/drivers/dri2/egl_dri2.c:1332 00004096 eglCreateContext external/mesa/src/egl/main/eglapi.c:767 utente@utente-Terabyte:~/oreo-x86_kernel_test$ addr2line -Cfe out/target/product/x86_64/symbols/system/lib/libEGL.so 0000ef37 eglCreateContext frameworks/native/opengl/libs/EGL/eglApi.cpp:896 Recent changes in src/util/build_id.c may have removed some Android required code, or i965 needs to take into account the NULL case. Thanks for feedback Mauro
I haven't yet found a proper fix bug here's a workaround you can consider to use: https://github.com/intel/external-mesa/commit/df502da74998b45fbdd82455c7c9c98242db6d86 The problem is with the address comparison check in build_id callback (build_id_find_nhdr_callback). This comparison fails on Android when running 32bit. One alternative would be to always add '0x8000' to dlpi_addr during comparison on 32bit but that seems a bit weird.
Regarding the length, I've been also considering if we are missing required build flags somewhere. We are setting it currently in these places: src/intel/Android.vulkan.mk:LOCAL_LDFLAGS += -Wl,--build-id=sha1 Android.mk:MESA_DRI_LDFLAGS := -Wl,--build-id=sha1 It *should* be enough I think .. but maybe not.
Hi Tapani, Thanks a lot for sharing the workaround It seams a regression introduced by 5c98d3825ccbed9054a1bb2de607116b2b31d48b "util: Query build-id by symbol address, not library name". Is Chad Versace already having a look? In the former coding of build_id_find_nhdr_callback() there was a comment: - /* The first object visited by callback is the main program. - * Android's libc returns a NULL pointer for the first executable. - */ - if (info->dlpi_name == NULL) - return 0; and NULL was checked; does last return 0 mean that nothing was done for Android, if Android libc returns systematically NULL? In any case, I think code should also be robust to unconformant libraries and should not crash. Just a question for my knowledge, where does the add '0x8000' to dlpi_addr during comparison on 32bit comes from, is it due to some "Android thing"? Mauro
(In reply to Mauro Rossi from comment #3) > Hi Tapani, > > Thanks a lot for sharing the workaround > > It seams a regression introduced by 5c98d3825ccbed9054a1bb2de607116b2b31d48b > "util: Query build-id by symbol address, not library name". > Is Chad Versace already having a look? As far as I know, no. > In the former coding of build_id_find_nhdr_callback() there was a comment: > > - /* The first object visited by callback is the main program. > - * Android's libc returns a NULL pointer for the first executable. > - */ > - if (info->dlpi_name == NULL) > - return 0; > > and NULL was checked; does last return 0 mean that nothing was done for > Android, if Android libc returns systematically NULL? This callback gets called for binary itself + libraries. This NULL check is not needed anymore since address check is better guard (if it worked ..). It works fine on desktop, both 32bit and 64bit. On Android it also works fine on 64bit. > In any case, I think code should also be robust to unconformant libraries > and should not crash. Agreed, but if it's a linker bug I think then we want to get it fixed in bionic as well. Let's try to avoid duct-taping if possible. > Just a question for my knowledge, where does the add '0x8000' to dlpi_addr > during comparison on 32bit comes from, is it due to some "Android thing"? That seems to be the amount of offset there is so that comparison will work. I haven't figured out why this is.
This commit should fix it properly: https://github.com/me176c-dev/android_external_mesa/commit/07b637bcef660b3bf12d6854153692026ef3c698 Can you test this patch, Mauro? The main problem is that the build-id code doesn't handle the (unusual) case that there is an offset between the base address and the LOAD segment (as indicated by the virtual address in the ELF header). On Android 32-bit, the offset comes from the "relocation_packer"[1] in the Android build system. It seems to re-pack the binary and adds the offset for whatever reason. The ELF header tells us the offset so we just need to look through the header and add the offset to the base address (dlpi_addr) before comparing it to dli_fbase. [1] https://android.googlesource.com/platform/bionic/+/master/tools/relocation_packer
Hi Stephan, with the patch mentioned in comment #5 the problem is resolved, Chrome app and Firefox app are not crashing anymore. Please proceed in having the fix merged in mesa 18.0 cycle Thanks Mauro
Pushed to master. It should find its way to the stable releases in due time. commit 02e2009b929a0f101540b9b55c5f0ed859d1b3be Author: Stephan Gerhold <stephan@gerhold.net> Date: Wed Jan 24 15:13:24 2018 +0100 util/build-id: Fix address comparison for binaries with LOAD vaddr > 0
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.