Bug 104546 - Crash happens when running compute pipeline after calling glxMakeCurrent two times
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Topi Pohjolainen
QA Contact: Intel 3D Bugs Mailing List
Depends on:
Reported: 2018-01-09 06:48 UTC by xinghua
Modified: 2018-02-01 07:38 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:

a simple test case (3.10 KB, text/x-csrc)
2018-01-09 06:49 UTC, xinghua
patch (640 bytes, text/plain)
2018-01-16 09:06 UTC, vadym

Description xinghua 2018-01-09 06:48:55 UTC
Steps to reproduce:
1. Save the attached C source file as test.c;
2. Build it with "gcc -o test test.c -lX11 -lepoxy";
3. Run "./test"; it crashes.

Actually, I reduced the above test case from the ANGLE project, where the crash was first found. To reproduce the issue with ANGLE, follow these steps:
1. Download the ANGLE source code from https://github.com/google/angle;
2. Build it following the ./doc/DevSetup.md document;
3. Run "./angle_deqp_gles31_tests --gtest_filter=dEQP_GLES31.Default/functional_synchronization_in_invocation_image_write_read".

I also debugged this issue in Mesa. The crash comes from dereferencing a NULL pointer; the call stack is:
#0  0x00007ffff356d818 in intel_disable_rb_aux_buffer (brw=0x7ffff7fc9040, tex_mt=0x5555558eb910, min_level=0, num_levels=4294967295, usage=0x7ffff3b8c0ad "as a shader image") at brw_draw.c:359
#1  0x00007ffff356dc8e in brw_predraw_resolve_inputs (brw=0x7ffff7fc9040, rendering=false) at brw_draw.c:444
#2  0x00007ffff35644b1 in brw_dispatch_compute_common (ctx=0x7ffff7fc9040) at brw_compute.c:180
#3  0x00007ffff35646f0 in brw_dispatch_compute (ctx=0x7ffff7fc9040, num_groups=0x7fffffffdd9c) at brw_compute.c:234
#4  0x00007ffff30786cd in dispatch_compute (no_error=false, num_groups_z=1, num_groups_y=2, num_groups_x=1) at main/compute.c:265
#5  _mesa_DispatchCompute (num_groups_x=1, num_groups_y=2, num_groups_z=1) at main/compute.c:280

In intel_disable_rb_aux_buffer(), the renderbuffer's miptree has not been created, so irb->mt is a NULL pointer:
   if (irb && irb->mt->bo == tex_mt->bo &&       // crash here
          irb->mt_level >= min_level &&
          irb->mt_level < min_level + num_levels)

When glXMakeCurrent is called the second time, the second drawable replaces the first one. In intelMakeCurrent() (i965/brw_context.c), brw->ctx.ViewportInitialized is set to true during the first glXMakeCurrent call, so the second call does not call intel_prepare_render() to update the renderbuffers, and the miptree is never created.
Since only the compute pipeline is used here, it seems the renderbuffer should not need a real memory buffer or miptree. Is that right? Is this a Mesa bug, or are we using the APIs incorrectly?
Comment 1 xinghua 2018-01-09 06:49:28 UTC
Created attachment 136625
a simple test case
Comment 2 vadym 2018-01-15 14:15:39 UTC
Tested on HP ZBook with Ubuntu 16.04 LTS.

The issue reproduces with exactly the same crash on Mesa 17.4-devel (42f421c).

I'll try to bisect to find the first bad commit.

My HW info:

Manufacturer: HP
Product Name: HP ZBook 14u G4
Serial Number: 5CG7442WKZ
UUID: C8764034-3466-44AD-4BD7-92CFB70F9AD0
Wake-up Type: Power Switch
SKU Number: 1LL55AV
Family: 103C_5336AN HP EliteBook

# glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2) x86/MMX/SSE2 (0x5916)
    Version: 17.4.0
    Accelerated: yes
    Video memory: 1536MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2) x86/MMX/SSE2
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.4.0-devel (git-42f421cbbf)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.0 Mesa 17.4.0-devel (git-42f421cbbf)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 17.4.0-devel (git-42f421cbbf)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
Comment 3 vadym 2018-01-15 16:10:53 UTC
I've just found the first bad commit for this issue:

f5859b45b1686e8116380d870f48432495fb19c7 is the first bad commit
commit f5859b45b1686e8116380d870f48432495fb19c7
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Tue Jun 27 18:10:31 2017 +0300

    i965/miptree: Switch remaining surfaces to isl
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Comment 4 xinghua 2018-01-16 03:20:43 UTC
The patch (f5859b4...) appears to have already been included in the Ubuntu 17.10 system graphics driver. I found that all compute shader cases in ANGLE's angle_end2end_tests that call glDispatchCompute crash.
I'm not sure whether other apps will crash in the future on Ubuntu 17.10.
Comment 5 vadym 2018-01-16 09:06:04 UTC
Created attachment 136746
patch
Comment 6 vadym 2018-01-16 09:07:54 UTC
Hi xinghua,

Can you please test the attached patch?
Comment 7 Topi Pohjolainen 2018-01-16 11:52:21 UTC
I took a closer look at how the ISL-based version differs from the old logic. The older version simply didn't call intel_disable_rb_aux_buffer() from brw_predraw_resolve_inputs(), because the texture object being examined has no auxiliary buffer at all, i.e., the first condition "tex_obj->mt->aux_usage == ISL_AUX_USAGE_CCS_E" is false. With ISL, however, we choose to enable an auxiliary buffer for the miptree backing the texture object and therefore end up calling intel_disable_rb_aux_buffer().

The actual mistake is considering renderbuffers at all for the compute pipeline. The old version had the same flaw; it was just lucky enough not to get that far.

Jason recently revised some of the surrounding logic, and I need to check whether that already fixes this. Even if it does, we will probably need something for stable. I'll look into that.
Comment 8 Topi Pohjolainen 2018-01-24 09:35:32 UTC
Fix pushed into master:

commit ec4bb693a0175744465f272a8bcea2db043ba1bc (HEAD -> 104546, origin_push/master, origin/master, origin/HEAD, fdo_push/jenkins)
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Tue Jan 16 14:17:00 2018 +0200

    i965: Don't try to disable render aux buffers for compute
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104546
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
