Crash happens when trying to login to google play store on android *OR* when trying to launch one of the 3DMark benchmarks (uses chrome webview as well) What happens is that aux_surface is NULL in 'get_fast_clear_rect'. backtrace: --- 8< --------------------- 07-26 11:29:45.832 3126 3126 F DEBUG : pid: 2951, tid: 3071, name: Chrome_InProcGp >>> com.google.android.gms.ui <<< 07-26 11:29:45.832 3126 3126 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x10 07-26 11:29:45.832 3126 3126 F DEBUG : rax 0000000100000001 rbx 0000000000000100 rcx 0000000000000000 rdx 000077230a5036c0 07-26 11:29:45.832 3126 3126 F DEBUG : rsi 0000000000000000 rdi 000077230a5030c0 07-26 11:29:45.832 3126 3126 F DEBUG : r8 0000000000000620 r9 0000000000000001 r10 0000000000000000 r11 0000000000000780 07-26 11:29:45.832 3126 3126 F DEBUG : r12 000077230a503808 r13 000077230a5037e0 r14 000077230a503808 r15 0000000000000000 07-26 11:29:45.832 3126 3126 F DEBUG : cs 0000000000000033 ss 000000000000002b 07-26 11:29:45.832 3126 3126 F DEBUG : rip 000077230df54c58 rbp 0000000000000780 rsp 000077230a5030a0 eflags 0000000000010246 07-26 11:29:45.840 3126 3126 F DEBUG : 07-26 11:29:45.840 3126 3126 F DEBUG : backtrace: 07-26 11:29:45.840 3126 3126 F DEBUG : #00 pc 00000000004a2c58 /system/lib64/dri/i965_dri.so (blorp_fast_clear+168) 07-26 11:29:45.840 3126 3126 F DEBUG : #01 pc 0000000000060f78 /system/lib64/dri/i965_dri.so (brw_blorp_clear_color+1960) 07-26 11:29:45.840 3126 3126 F DEBUG : #02 pc 0000000000063e20 /system/lib64/dri/i965_dri.so (brw_clear+752) 07-26 11:29:45.840 3126 3126 F DEBUG : #03 pc 0000000000c7a1bf /system/app/WebViewGoogle/WebViewGoogle.apk (offset 0xa6f000) --- 8< --------------------- bisected to this commit: --- 8< --------------------- commit f7f2fa8eb12ea60c1190ebf3c66901830150f4fd Author: Jason Ekstrand <jason.ekstrand@intel.com> Date: Thu Jun 15 22:58:49 2017 -0700 i965/miptree: Rework aux enabling This commit replaces the complex and confusing set of disable flags with two fairly straightforward fields which describe the intended auxiliary surface usage and whether or not the miptree supports fast clears. Right now, supports_fast_clear can be entirely derived from aux_usage but that will not always be the case. This commit makes functional changes. One of these changes is that it re-enables multisampled fast-clears which were accidentally disabled in cec30a666930ddb8476a9452a89364a24979ff62 around a year ago. Fixing this improves the SynMark v7 DeferredAA test by around ~3% on some gen9 hardware. This commit also gets us closer to enabling CCS_E for window-system buffers which are Y-tiled. Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason, let me know if you want some additional information. Any tips how to debug this appreciated!
The first thing to do would be to run with a debug build of the driver so you can get a pepper backtrace. Without knowing how the android bits tie into the miptree code, I couldn't say for sure. My best guess is that someone is calling miptree_create_for_bo but not calling intel_miptree_alloc_aux.
Here's a bit more symbols, it originates from a glClear call. ---- 8< ---- 07-27 05:14:51.060 3240 3240 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** 07-27 05:14:51.060 3240 3240 F DEBUG : Build fingerprint: 'AndroidIA/androidia_64/androidia_64:7.1.1/NMF26Q/tpalli07260826:eng/test-keys' 07-27 05:14:51.060 3240 3240 F DEBUG : Revision: '0' 07-27 05:14:51.060 3240 3240 F DEBUG : ABI: 'x86_64' 07-27 05:14:51.060 3240 3240 F DEBUG : pid: 3072, tid: 3143, name: Chrome_InProcGp >>> com.google.android.gms.ui <<< 07-27 05:14:51.060 3240 3240 F DEBUG : signal 8 (SIGFPE), code 1 (FPE_INTDIV), fault addr 0x7ea19552a844 07-27 05:14:51.060 3240 3240 F DEBUG : rax 0000000000000000 rbx 00007ea195667dbc rcx 0000000000000000 rdx 0000000000000000 07-27 05:14:51.060 3240 3240 F DEBUG : rsi 0000000000000000 rdi 0000000000000000 07-27 05:14:51.060 3240 3240 F DEBUG : r8 0000000000000000 r9 0000000000000000 r10 00007ea1b626fc70 r11 0000000000000246 07-27 05:14:51.060 3240 3240 F DEBUG : r12 00007ea191ae0ea8 r13 00007ea18d9094dc r14 00007ea191ae0eb0 r15 00007ea191ae0eb4 07-27 05:14:51.060 3240 3240 F DEBUG : cs 0000000000000033 ss 000000000000002b 07-27 05:14:51.060 3240 3240 F DEBUG : rip 00007ea19552a844 rbp 00007ea191ae0e40 rsp 00007ea191ae0de0 eflags 0000000000010246 07-27 05:14:51.212 3240 3240 F DEBUG : 07-27 05:14:51.212 3240 3240 F DEBUG : backtrace: 07-27 05:14:51.212 3240 3240 F DEBUG : #00 pc 0000000000725844 /system/lib64/dri/i965_dri.so (get_fast_clear_rect+404) 07-27 05:14:51.212 3240 3240 F DEBUG : #01 pc 00000000007255d8 /system/lib64/dri/i965_dri.so (blorp_fast_clear+328) 07-27 05:14:51.212 3240 3240 F DEBUG : #02 pc 00000000000615d6 /system/lib64/dri/i965_dri.so (do_single_blorp_clear+1542) 07-27 05:14:51.212 3240 3240 F DEBUG : #03 pc 0000000000060fae /system/lib64/dri/i965_dri.so (brw_blorp_clear_color+206) 07-27 05:14:51.212 3240 3240 F DEBUG : #04 pc 000000000006671e /system/lib64/dri/i965_dri.so (brw_clear+494) 07-27 05:14:51.212 3240 3240 F DEBUG : #05 pc 0000000000318a32 /system/lib64/dri/i965_dri.so (_mesa_Clear+802) 07-27 05:14:51.212 3240 3240 F DEBUG : #06 pc 0000000000c7a1bf /system/app/WebViewGoogle/WebViewGoogle.apk (offset 0xa6f000)
I'm not sure if this is necessarily related but I can reproduce one failure also on desktop that bisects to this commit, Piglit tests egl-create-pbuffer-surface and egl-create-largest-pbuffer-surface start to throw this assertion: egl-create-pbuffer-surface: intel_mipmap_tree.c:1754: intel_miptree_alloc_ccs: Assertion `mt->aux_usage == ISL_AUX_USAGE_CCS_E || mt->aux_usage == ISL_AUX_USAGE_CCS_D' failed. before this commit test passes fine.
I did some more debugging and here is my 'braindump' on the matter. I found out that both of Piglit and Playstore failures are likely related as they both happen with pbuffer usage where we have doubleBufferMode 0 (Visual in gl_framebuffer), so maybe something that is done during flush/invalidate (in swapbuffers) is now not done, or the clear logic does not understand single buffered in some manner? I could not figure out a place where I could set aux_usage as ISL_AUX_USAGE_NONE based on the Visual setting since that gets lost somewhere but may be present when mt is initially created (?)
Created attachment 133071 [details] [review] runtime workaround Here's a workaround that fixes the Playstore crash, here we check visual during clear and use _mesa_meta_glsl_Clear instead of blorp.
With further debugging I've noticed that during the crash in blorp_fast_clear, aux surface format field can have complete garbage in it, sometimes 0x0 and sometimes random number, for example '0xd1a72000' (which goes way too big). When it works this field has ISL_FORMAT_GEN9_CCS_32BPP (0x38b) on Joule. So .. I'll try to understand and hunt down how this format field lives.
Created attachment 133167 [details] [review] workaround that works! workaround that fixes both desktop and android issues (earlier workaround caused dEQP regression)
I think this patch should fix it: https://patchwork.freedesktop.org/patch/170058/
Created attachment 133222 [details] [review] hopeful fix Jason, what do you think of this one? This fixes the Piglit issue. I will test this also on Android.
yep, patch in comment #10 fixes issues seen both on android and desktop
(In reply to Tapani Pälli from comment #10) > Created attachment 133222 [details] [review] [review] > hopeful fix > > Jason, what do you think of this one? This fixes the Piglit issue. I will > test this also on Android. Ok, that's really weird. That should already be handled by the mt->supports_fasrfclear check. How is mt->aux_usage == NONE but mt->supports_fast_clear == true? The only place can_fast_clear is set is in choose_auz_usage? A watchpoint in Feb should be able to answer that question.
IIRC aux buffer creation can succeed first but then this 'make shareable' func later marks mt with aux usage NONE. I can try to verify this.
(In reply to Tapani Pälli from comment #13) > IIRC aux buffer creation can succeed first but then this 'make shareable' > func later marks mt with aux usage NONE. I can try to verify this. I was able to reproduce on Haswell with X11. Yes, it was make_shareable that was the culpret though I still think the DRI2 path is a bit sketchy. I've got a new patch on the list which will hopefully also fix the android issue.
commit e7a52cc381d275b4ab8ee2fb230e32ab97090daf Author: Jason Ekstrand <jason.ekstrand@intel.com> Date: Thu Aug 3 09:48:55 2017 -0700 i965/miptree: Set supports_fast_clear = false in make_shareable The make_shareable function deletes the aux buffer and then whacks aux_usage to ISL_AUX_USAGE_NONE but not unsetting supports_fast_clear. Since we only look at supports_fast_clear to decide whether or not to do fast clears, this was causing assertion failures. Reported-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101925 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.