Bug 104642 - Android: NULL pointer dereference with i965 mesa-dev, seems build_id_length related
Summary: Android: NULL pointer dereference with i965 mesa-dev, seems build_id_length r...
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: x86-64 (AMD64) other
: medium blocker
Assignee: Tapani Pälli
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-15 16:03 UTC by Mauro Rossi
Modified: 2018-02-05 17:04 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
logcat with chrome app crash (1.16 MB, text/plain)
2018-01-15 16:03 UTC, Mauro Rossi
Details

Description Mauro Rossi 2018-01-15 16:03:06 UTC
Created attachment 136731 [details]
logcat with chrome app crash

Hi,

while testing Android 8.1 build with latest mesa 17.4.0devel and libdrm 2.4.89

SIGSEGV MAPERR due to NULL pointer dereference happens with Chrome (and Firefox browser) apps, here is logcat extract for chrome


--------- beginning of crash
... F libc    : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x4 in tid 4851 (RenderThread), pid 4503 (.android.chrome)
... I crash_dump32: obtaining output fd from tombstoned, type: kDebuggerdTombstone
... I /system/bin/tombstoned: received crash request for pid 4503
... I crash_dump32: performing dump of process 4503 (target tid = 4851)
... F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
... F DEBUG   : Build fingerprint: 'Android-x86/android_x86_64/x86_64:8.1.0/OPM1.171019.011/utente12162000:eng/test-keys'
... F DEBUG   : Revision: '0'
... F DEBUG   : ABI: 'x86'
... F DEBUG   : pid: 4503, tid: 4851, name: RenderThread  >>> com.android.chrome <<<
... F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x4
... F DEBUG   : Cause: null pointer dereference
... F DEBUG   :     eax 00000000  ebx cca69cf4  ecx 00000009  edx bfbff004
... F DEBUG   :     esi 00000000  edi 00000010
... F DEBUG   :     xcs 00000023  xds 0000002b  xes 0000002b  xfs 0000006b  xss 0000002b
... F DEBUG   :     eip cc78d089  ebp bfbfefb8  esp bfbfefb8  flags 00010286
... F DEBUG   : 
... F DEBUG   : backtrace:
... F DEBUG   :     #00 pc 00651089  /system/vendor/lib/dri/i965_dri.so (build_id_length+9)
... F DEBUG   :     #01 pc 00041ae2  /system/vendor/lib/dri/i965_dri.so (brw_program_binary_init+130)
... F DEBUG   :     #02 pc 0002dd50  /system/vendor/lib/dri/i965_dri.so (brwCreateContext+864)
... F DEBUG   :     #03 pc 003d7777  /system/vendor/lib/dri/i965_dri.so (driCreateContextAttribs+663)
... F DEBUG   :     #04 pc 000112a1  /android/system/vendor/lib/egl/libGLES_mesa.so
... F DEBUG   :     #05 pc 00004096  /android/system/vendor/lib/egl/libGLES_mesa.so
... F DEBUG   :     #06 pc 0000ef37  /android/system/lib/libEGL.so


Analysing the backtrace with addr2line the NULL pointer is retured at 
src/util/build_id.c:117

utente@utente-Terabyte:~/oreo-x86_kernel_test$ addr2line -Cfe out/target/product/x86_64/symbols/system/vendor/lib/dri/i965_dri.so
00651089
build_id_length
external/mesa/src/util/build_id.c:117  <= here it is
00041ae2
brw_program_binary_init
external/mesa/src/mesa/drivers/dri/i965/brw_program_binary.c:53
0002dd50
brw_init_driver_functions
external/mesa/src/mesa/drivers/dri/i965/brw_context.c:334
003d7777
driCreateContextAttribs
external/mesa/src/mesa/drivers/dri/common/dri_util.c:479


utente@utente-Terabyte:~/oreo-x86_kernel_test$ addr2line -Cfe out/target/product/x86_64/symbols/system/vendor/lib/egl/libGLES_mesa.so
000112a1
dri2_create_context
external/mesa/src/egl/drivers/dri2/egl_dri2.c:1332
00004096
eglCreateContext
external/mesa/src/egl/main/eglapi.c:767


utente@utente-Terabyte:~/oreo-x86_kernel_test$ addr2line -Cfe out/target/product/x86_64/symbols/system/lib/libEGL.so
0000ef37
eglCreateContext
frameworks/native/opengl/libs/EGL/eglApi.cpp:896


Recent changes in src/util/build_id.c may have removed some Android required code, or i965 needs to take into account the NULL case.

Thanks for feedback
Mauro
Comment 1 Tapani Pälli 2018-01-16 05:59:15 UTC
I haven't yet found a proper fix bug here's a workaround you can consider to use:

https://github.com/intel/external-mesa/commit/df502da74998b45fbdd82455c7c9c98242db6d86

The problem is with the address comparison check in build_id callback (build_id_find_nhdr_callback). This comparison fails on Android when running 32bit. One alternative would be to always add '0x8000' to dlpi_addr during comparison on 32bit but that seems a bit weird.
Comment 2 Tapani Pälli 2018-01-16 06:45:43 UTC
Regarding the length, I've been also considering if we are missing required build flags somewhere. We are setting it currently in these places:

src/intel/Android.vulkan.mk:LOCAL_LDFLAGS += -Wl,--build-id=sha1
Android.mk:MESA_DRI_LDFLAGS := -Wl,--build-id=sha1

It *should* be enough I think .. but maybe not.
Comment 3 Mauro Rossi 2018-01-16 10:53:45 UTC
Hi Tapani,

Thanks a lot for sharing the workaround

It seams a regression introduced by 5c98d3825ccbed9054a1bb2de607116b2b31d48b "util: Query build-id by symbol address, not library name".
Is Chad Versace already having a look?

In the former coding of build_id_find_nhdr_callback() there was a comment:

-   /* The first object visited by callback is the main program.
-    * Android's libc returns a NULL pointer for the first executable.
-    */
-   if (info->dlpi_name == NULL)
-      return 0;

and NULL was checked; does last return 0 mean that nothing was done for Android, if Android libc returns systematically NULL?

In any case, I think code should also be robust to unconformant libraries and should not crash.

Just a question for my knowledge, where does the add '0x8000' to dlpi_addr during comparison on 32bit comes from, is it due to some "Android thing"?

Mauro
Comment 4 Tapani Pälli 2018-01-16 12:56:06 UTC
(In reply to Mauro Rossi from comment #3)
> Hi Tapani,
> 
> Thanks a lot for sharing the workaround
> 
> It seams a regression introduced by 5c98d3825ccbed9054a1bb2de607116b2b31d48b
> "util: Query build-id by symbol address, not library name".
> Is Chad Versace already having a look?

As far as I know, no.
 
> In the former coding of build_id_find_nhdr_callback() there was a comment:
> 
> -   /* The first object visited by callback is the main program.
> -    * Android's libc returns a NULL pointer for the first executable.
> -    */
> -   if (info->dlpi_name == NULL)
> -      return 0;
> 
> and NULL was checked; does last return 0 mean that nothing was done for
> Android, if Android libc returns systematically NULL?

This callback gets called for binary itself + libraries. This NULL check is not needed anymore since address check is better guard (if it worked ..). It works fine on desktop, both 32bit and 64bit. On Android it also works fine on 64bit.
 
> In any case, I think code should also be robust to unconformant libraries
> and should not crash.

Agreed, but if it's a linker bug I think then we want to get it fixed in bionic as well. Let's try to avoid duct-taping if possible.
 
> Just a question for my knowledge, where does the add '0x8000' to dlpi_addr
> during comparison on 32bit comes from, is it due to some "Android thing"?

That seems to be the amount of offset there is so that comparison will work. I haven't figured out why this is.
Comment 5 Stephan Gerhold 2018-01-22 16:45:13 UTC
This commit should fix it properly: https://github.com/me176c-dev/android_external_mesa/commit/07b637bcef660b3bf12d6854153692026ef3c698

Can you test this patch, Mauro?

The main problem is that the build-id code doesn't handle the (unusual) case that there is an offset between the base address and the LOAD segment (as indicated by the virtual address in the ELF header). On Android 32-bit, the offset comes from the "relocation_packer"[1] in the Android build system. It seems to re-pack the binary and adds the offset for whatever reason.

The ELF header tells us the offset so we just need to look through the header and add the offset to the base address (dlpi_addr) before comparing it to dli_fbase.

[1] https://android.googlesource.com/platform/bionic/+/master/tools/relocation_packer
Comment 6 Mauro Rossi 2018-01-23 23:46:44 UTC
Hi Stephan,

with the patch mentioned in comment #5
the problem is resolved, Chrome app and Firefox app are not crashing anymore.

Please proceed in having the fix merged in mesa 18.0 cycle

Thanks
Mauro
Comment 7 Emil Velikov 2018-02-05 17:04:04 UTC
Pushed to master. It should find its way to the stable releases in due time.

commit 02e2009b929a0f101540b9b55c5f0ed859d1b3be
Author: Stephan Gerhold <stephan@gerhold.net>
Date:   Wed Jan 24 15:13:24 2018 +0100

    util/build-id: Fix address comparison for binaries with LOAD vaddr > 0


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.