Created attachment 140796 [details] logcat of api_overhead test Hi, while performing some test on vulkan radv HAL on Android 8.1 (oreo-x86) build a segfault was noticed and traced with 3Dmark vulkan benchmark. The build is experimental and based on hwcomposer.drm + gralloc.gbm and tested on HD7790 gpu (Bonaire) The same 3Dmark vulkan benchmarks work without segfault on Intel anv running on Skylake GT2, so it is better to investigate why radv crashes. Inspection of crash with addr2line -Cfe returns last line of code: src/amd/vulkan/radv_image.c: 1289 Mauro Rossi android-x86 team --------- beginning of crash 07-22 21:31:15.938 4949 4995 F libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x1c in tid 4995 (Thread-9), pid 4949 (cation:workload) ... 07-22 21:31:15.956 5007 5007 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** 07-22 21:31:15.956 5007 5007 F DEBUG : Build fingerprint: 'Android-x86/android_x86_64/x86_64:8.1.0/OPM6.171019.030.E1/maur07141051:userdebug/test-keys' 07-22 21:31:15.956 5007 5007 F DEBUG : Revision: '0' 07-22 21:31:15.956 5007 5007 F DEBUG : ABI: 'x86' 07-22 21:31:15.956 5007 5007 F DEBUG : pid: 4949, tid: 4995, name: Thread-9 >>> com.futuremark.dmandroid.application:workload <<< 07-22 21:31:15.956 5007 5007 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x1c 07-22 21:31:15.956 5007 5007 F DEBUG : Cause: null pointer dereference 07-22 21:31:15.956 5007 5007 F DEBUG : eax 00000000 ebx c65b690c ecx c2a6d0ac edx 00000097 07-22 21:31:15.956 5007 5007 F DEBUG : esi c2a6d0ac edi c1fbbb00 07-22 21:31:15.956 5007 5007 F DEBUG : xcs 00000023 xds 0000002b xes 0000002b xfs 0000006b xss 0000002b 07-22 21:31:15.956 5007 5007 F DEBUG : eip c64e0d47 ebp c2b7eb38 esp c2b7eaf0 flags 00010286 07-22 21:31:16.071 5007 5007 F DEBUG : 07-22 21:31:16.071 5007 5007 F DEBUG : backtrace: 07-22 21:31:16.071 5007 5007 F DEBUG : #00 pc 000f1d47 /system/vendor/lib/hw/vulkan.radv.so (radv_image_create+199) 07-22 21:31:16.071 5007 5007 F DEBUG : #01 pc 000f2f26 /system/vendor/lib/hw/vulkan.radv.so (radv_CreateImage+166) 07-22 21:31:16.071 5007 5007 F DEBUG : #02 pc 0000b59a /android/system/lib/libvulkan.so
Created attachment 140797 [details] logcat of slingshot extreeme vulkan test
Do you have a hash of the upstream git version you based this on? (just so I can see how the lines correlate in radv_image.c)
Hi Bas, the problem was obeserved in the last month, but the relevant parts radv_image.c and radv_android.c did not change, so anything between 3da693b and HEAD the addr2line output is not always reliable in terms of line of code, looking at the logcat the call causing the segfault is radv_image_create() at line 1261, the symbols resolved by crash dump should be reliable, based on previous experiences. I had a look at differences between radv and anv and I saw two of them: 1. wsi extension implementation is different, so I tried to revert a50f93e (radv/image: Implement the wsi "extension") but the problem was not solved 2. radv_image_create() passes .scanout parameter and anv_image_create() does not. I have not tried yet to remove .scanout in radv_image_create(), but is it worth an attempt? Mauro
No, on ANDROID, the call to radv_image_from_gralloc should happen for wsi images and wsi_info should be always NULL. Is ANDROID properly defined?
>Is ANDROID properly defined? Yes, because it is defined by the build system itself, without need of passing the LOCAL_CFLAGS := -DANDROID Mauro
Created attachment 140802 [details] extended logcat to see what happens before and after extended logcat to see what happens before and after
Looking at the full backtrace it looks like plain texture creation, nothing Android specific so I'll need to debug the specific app sometime. 07-22 21:31:16.071 5007 5007 F DEBUG : backtrace: 07-22 21:31:16.071 5007 5007 F DEBUG : #00 pc 000f1d47 /system/vendor/lib/hw/vulkan.radv.so (radv_image_create+199) 07-22 21:31:16.071 5007 5007 F DEBUG : #01 pc 000f2f26 /system/vendor/lib/hw/vulkan.radv.so (radv_CreateImage+166) 07-22 21:31:16.071 5007 5007 F DEBUG : #02 pc 0000b59a /android/system/lib/libvulkan.so 07-22 21:31:16.071 5007 5007 F DEBUG : #03 pc 001488fe /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::native_memory_utility::create_texture(VkImageCreateInfo const&)+78) 07-22 21:31:16.071 5007 5007 F DEBUG : #04 pc 0013ffe9 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::vulkan_device::create_texture(eva::graphics::texture_desc const&, eva::graphics::subresource_data const*)+233) 07-22 21:31:16.071 5007 5007 F DEBUG : #05 pc 001458cf /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so 07-22 21:31:16.071 5007 5007 F DEBUG : #06 pc 0013e474 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::create_dds_texture_from_memory(std::function<eva::graphics::native_texture (eva::graphics::texture_desc const&, eva::graphics::subresource_data const*)>, eva::graphics::texture_desc*, unsigned char const*, DXGI_FORMAT, unsigned int, unsigned int, unsigned int)+2484) 07-22 21:31:16.071 5007 5007 F DEBUG : #07 pc 00145b8c /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::vulkan_resource_factory::load_texture(boost::filesystem::path const&, DXGI_FORMAT, int)+316) 07-22 21:31:16.071 5007 5007 F DEBUG : #08 pc 001ad794 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::assets::texture_loader::load(std::string const&) const+1412) 07-22 21:31:16.071 5007 5007 F DEBUG : #09 pc 00153e69 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::ui::loading_screen_contents::loading_screen_contents(eva::ui::loading_screen_creation_parameters const&, eva::globals&)+649) 07-22 21:31:16.071 5007 5007 F DEBUG : #10 pc 0011f1a1 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (mobile_player::player::create_loading_screen()+161) 07-22 21:31:16.071 5007 5007 F DEBUG : #11 pc 00125563 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (_ZN13mobile_player6playerC1ERN3eva7globalsESsNS1_4math6vectorIiLj2EEESsRNS1_2ui13invalidatableERNS7_12message_pumpEP13ANativeWindowiNS1_8graphics8platformE+6083) 07-22 21:31:16.071 5007 5007 F DEBUG : #12 pc 00118e37 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (android_native::native_application::create_player(std::string, bool, ANativeWindow*)+375) 07-22 21:31:16.071 5007 5007 F DEBUG : #13 pc 0011cd0e /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (activity::handle_cmd(int)+206) 07-22 21:31:16.071 5007 5007 F DEBUG : #14 pc 0011ce15 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (activity::handle_cmd(android_app*, int)+37) 07-22 21:31:16.071 5007 5007 F DEBUG : #15 pc 0025e46c /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so 07-22 21:31:16.071 5007 5007 F DEBUG : #16 pc 0011cb9e /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (activity::run()+126) 07-22 21:31:16.071 5007 5007 F DEBUG : #17 pc 0011cf02 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (android_main+98) 07-22 21:31:16.071 5007 5007 F DEBUG : #18 pc 0025ddd4 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
In the code between ANDROID braces, is it correct that code falls through in case of !gralloc_info or should it bail out? Just a question as I may be very wrong and confused #ifdef ANDROID const VkNativeBufferANDROID *gralloc_info = vk_find_struct_const(pCreateInfo->pNext, NATIVE_BUFFER_ANDROID); if (gralloc_info) /* this line 1252 */ return radv_image_from_gralloc(device, pCreateInfo, gralloc_info, pAllocator, pImage); #endif
Yes, images not explicitly using the path for Android cross-api interop should just follow the normal path.
If you're in a position to get more useful info, it would be great if you could dump the *pCreateInfo arg of the crashing radv_CreateImage.
Hi Bas, thanks for your instructions I've tracked the line crashing in the radv_image.c file in master branch utente@utente-MS-7576:~/oreo-x86_hwc1$ addr2line -Cfe out/target/product/x86_64/symbols/system/vendor/lib/hw/vulkan.radv.so 000f3407 vk_format_get_nr_components external/mesa/src/amd/vulkan/vk_format.h:535 000f45e6 radv_CreateImage external/mesa/src/amd/vulkan/radv_image.c:1261 ^C I don't trust too much addr2lib when the most recent code line is an header, so I added ALOGD lines with LOG_TAG RADV in order to have the incremental inspection of radv_image_from_gralloc() arguments to expose the argument affected by null pointer, here is what I did: From f2db3b5a70aaf66e9bf60c42da99560e6e5b6c13 Mon Sep 17 00:00:00 2001 From: Mauro Rossi <issor.oruam@gmail.com> Date: Thu, 26 Jul 2018 23:48:54 +0200 Subject: [PATCH] DO NOT MERGE: add incremental dump of radv_image_from_gralloc() arguments Looking at the line crashing we will identity the argument with null pointer --- src/amd/vulkan/radv_image.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 826f898d28..b1141c71e2 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -25,6 +25,9 @@ * IN THE SOFTWARE. */ +#define LOG_TAG "RADV" + +#include <cutils/log.h> #include "radv_debug.h" #include "radv_private.h" #include "vk_format.h" @@ -1248,7 +1251,11 @@ radv_CreateImage(VkDevice device, #ifdef ANDROID const VkNativeBufferANDROID *gralloc_info = vk_find_struct_const(pCreateInfo->pNext, NATIVE_BUFFER_ANDROID); - + ALOGD("radv_image_from_gralloc() device=0x%x", device); + ALOGD("radv_image_from_gralloc() *pCreateInfo=0x%x", *pCreateInfo); + ALOGD("radv_image_from_gralloc() gralloc_info=0x%x", gralloc_info); + ALOGD("radv_image_from_gralloc() *pAllocator=0x%x", *pAllocator); + ALOGD("radv_image_from_gralloc() *pImage=0x%x", *pImage); if (gralloc_info) return radv_image_from_gralloc(device, pCreateInfo, gralloc_info, pAllocator, pImage); -- 2.17.1 The output in the logcat is: ... D RADV : radv_image_from_gralloc() device= ... D RADV : radv_image_from_gralloc() *pCreateInfo=0xe ... D RADV : radv_image_from_gralloc() gralloc_info=0xc502a0e8 ... F libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x10 in tid 5548 (Thread-9), pid 5501 (cation:workload) The crash happened with pAllocator, so this argument is the null pointer. Mauro
Note that *pCreateInfo and *gralloc_info are structs, not integers so printing them this way is not very useful, since you are missing most of the data (also gralloc_info is a pointer, printing that is not very useful either). pAllocator is allowed to be NULL and it likely does not really matter for this issue.
Hi, but the sequence of ALOGD customized LOG_TAG RADV log does tell that the line invoking pAllocator argument was the first not being executed, and thus is involved in the segfault, could you please give an advice on how to modify the ALOGD arguments in a way it may be more useful? Thanks Mauro
Created attachment 140879 [details] [review] Hack test case that avoids the segfault Hi Bas, I (truly) don't know how this was possible, because it is all based on trial and error, but the changes in the attachment allow to avoid the segfault and to proceed in both 3DMARK Slingshot Extreme and 3DMARK API Overhead, the benchmark now proceed without segfaults. The changes are inspired from anv code with some changes as explained in the commit message. Probably the problem is elsewhere, I'd like to understand why these changes work. Mauro
Hi Bas, the problem is not happening anymore, in both mesa 18.2 and 18.3 branches. The bug report can be closed. Mauro
Thanks for letting us know. Closing!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.