Bug 108782 - Android: i965/brw_draw affected by segfault in intel_disable_rb_aux_buffer()
Summary: Android: i965/brw_draw affected by segfault in intel_disable_rb_aux_buffer()
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-18 21:32 UTC by Mauro Rossi
Modified: 2018-12-14 08:32 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Mauro Rossi 2018-11-18 21:32:52 UTC
Many Android applications are affected

F DEBUG   : pid: 7402, tid: 7421, name: RenderThread  >>> jackpal.androidterm <<<
F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x68
F DEBUG   : Cause: null pointer dereference
F DEBUG   : 
F DEBUG   : backtrace:
F DEBUG   :     #00 pc 0003c16a  /system/vendor/lib/dri/i965_dri.so (intel_disable_rb_aux_buffer+138)
F DEBUG   :     #01 pc 0003b954  /system/vendor/lib/dri/i965_dri.so (brw_predraw_resolve_inputs+980)
F DEBUG   :     #02 pc 0003c526  /system/vendor/lib/dri/i965_dri.so (brw_draw_prims+774)


utente@utente-Giga:~/oreo-x86_kernel$ addr2line -Cfe out/target/product/x86_64/symbols/system/vendor/lib/dri/i965_dri.so

0003c16a  /system/vendor/lib/dri/i965_dri.so (intel_disable_rb_aux_buffer+138)
0003b954  /system/vendor/lib/dri/i965_dri.so (brw_predraw_resolve_inputs+980)
0003c526  /system/vendor/lib/dri/i965_dri.so (brw_draw_prims+774)

intel_disable_rb_aux_buffer
external/mesa/src/mesa/drivers/dri/i965/brw_draw.c:366
brw_predraw_resolve_inputs
external/mesa/src/mesa/drivers/dri/i965/brw_draw.c:532
brw_prepare_drawing
external/mesa/src/mesa/drivers/dri/i965/brw_draw.c:841

The null pointer is irb->mt at line 366 however adding a check prior to evaluation of irb->mt->bo avoids the segfault (I can hear music of games instead of getting the error/segfault) but it produces black rendering :-)

[attempted patch]
utente@utente-Giga:~/oreo-x86_kernel/external/mesa$ git diff
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index bc0b3683a2..3a921e1dea 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -362,11 +362,12 @@ intel_disable_rb_aux_buffer(struct brw_context *brw,
    for (unsigned i = 0; i < fb->_NumColorDrawBuffers; i++) {
       struct intel_renderbuffer *irb =
          intel_renderbuffer(fb->_ColorDrawBuffers[i]);
-
-      if (irb && irb->mt->bo == tex_mt->bo &&
-          irb->mt_level >= min_level &&
-          irb->mt_level < min_level + num_levels) {
-         found = draw_aux_buffer_disabled[i] = true;
+      if (irb && irb->mt) {
+        if (irb->mt->bo == tex_mt->bo &&
+             irb->mt_level >= min_level &&
+             irb->mt_level < min_level + num_levels) {
+           found = draw_aux_buffer_disabled[i] = true;
+         }
       }
    }
 

Please assist in finding a solution. Older versions of code seem to have a more explicit handling in miptree with bool disable_aux, the simplifications seem to cause systematic crashes in many apps e.g. Olympus Rising

Please also be aware that there are several cases in i965 where {irb,stencil_irb,depth_irb}->mt are causing SIGSEGV MAPERR with null pointer dereference, maybe this happens only with Android, but it is very severe problem in there.

Another thing to check is if having const inside for loops is correct.
Please have a look at the latest i965 commits in my development branch
and instruct if some of them should be pushed to mesa-dev/18.3 ML:

https://github.com/maurossi/mesa/commits/19.0.0-devel_w46

Mauro
Comment 1 Tapani Pälli 2018-11-19 07:54:15 UTC
Which hardware is being used here? I haven't seen any similar problems when running Android Celadon on Kabylake with latest Mesa. Not that I've been testing many apps but GL/vulkan benchmarks work fine.
Comment 2 Mauro Rossi 2018-11-19 09:43:53 UTC
The HW is Lenovo T460 with Intel HD Graphics 530 (Skylake GT2)

As other info I'm using kernels 4.19 and 4.20rc, latest libdrm and the problem appears in both mesa 18.3 branch and in master

Do you have any significant changes to deal with separate RenderThread in Android-IA(Celadon)?

I've also tried to comment the if (rendering) {...} at line 532 in brw_draw.c, but, obviously, I get the same result textures are rendered 'black',
do you know if bool rendering effectively represent all cases where intel_disable_rb_aux_buffer() should be called
or if there are some factors to take into account?

What should be the behavior in brw_draw.c @line 366 when irb->mt is null and evaluation of irb->mt->bo causes the segfault?

Available for testing patches/hacks.

Mauro
Comment 3 Tapani Pälli 2018-11-19 11:14:56 UTC
(In reply to Mauro Rossi from comment #2)
> The HW is Lenovo T460 with Intel HD Graphics 530 (Skylake GT2)
> 
> As other info I'm using kernels 4.19 and 4.20rc, latest libdrm and the
> problem appears in both mesa 18.3 branch and in master
> 
> Do you have any significant changes to deal with separate RenderThread in
> Android-IA(Celadon)?

I don't think we have any changes related to this.
 
> I've also tried to comment the if (rendering) {...} at line 532 in
> brw_draw.c, but, obviously, I get the same result textures are rendered
> 'black',
> do you know if bool rendering effectively represent all cases where
> intel_disable_rb_aux_buffer() should be called
> or if there are some factors to take into account?
> 
> What should be the behavior in brw_draw.c @line 366 when irb->mt is null and
> evaluation of irb->mt->bo causes the segfault?
> 

That case should not happen, there should always be a miptree allocated to a render buffer. Something is broken already earlier if we can end up in such a situation.
Comment 4 Tapani Pälli 2018-11-19 11:28:19 UTC
(In reply to Tapani Pälli from comment #3)
> (In reply to Mauro Rossi from comment #2)
> > 
> > What should be the behavior in brw_draw.c @line 366 when irb->mt is null and
> > evaluation of irb->mt->bo causes the segfault?
> > 
> 
> That case should not happen, there should always be a miptree allocated to a
> render buffer. Something is broken already earlier if we can end up in such
> a situation.

What I mean is that some of the miptree_create*() functions may have failed already earlier (for some reason) and this is where the actual bug might be lurking.
Comment 5 Mauro Rossi 2018-11-19 15:16:23 UTC
Hi, 

I will add some debug printout to further investigate,
but in the meantime could you please check
src/mesa/drivers/dri/i965/intel_mipmap_tree.c @line757
if the ISL_AUX_USAGE_CCS_E should taken into account to release the mt?

   /* Create the auxiliary surface up-front. CCS_D, on the other hand, can only
    * compress clear color so we wait until an actual fast-clear to allocate
    * it.
    */
   if (mt->aux_usage != ISL_AUX_USAGE_CCS_D &&
       !intel_miptree_alloc_aux(brw, mt)) {
      intel_miptree_release(&mt);
      return NULL;
   }

   return mt;
}
Comment 6 Mauro Rossi 2018-12-13 21:43:03 UTC
Hi,

this very tricky problem is caused 
by the android/gralloc_handle.h structure in libdrm project, 
which has different sizes for 32bit libs allocated handles and 64bit,
thus causing crashes in 64 bit builds only.

This can be closed and I'm opening a bug on mesa/drm project.

Mauro
Comment 7 Tapani Pälli 2018-12-14 08:32:34 UTC
Thanks Mauro fo(In reply to Mauro Rossi from comment #6)
> Hi,
> 
> this very tricky problem is caused 
> by the android/gralloc_handle.h structure in libdrm project, 
> which has different sizes for 32bit libs allocated handles and 64bit,
> thus causing crashes in 64 bit builds only.
> 
> This can be closed and I'm opening a bug on mesa/drm project.
> 
> Mauro

Thanks Mauro for figuring this out!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.