Bug 107806

Summary: glsl_get_natural_size_align_bytes() ABORT with GfxBench Vulkan AztecRuins
Product: Mesa
Component: Drivers/Vulkan/intel
Status: VERIFIED FIXED
Severity: normal
Priority: medium
Version: git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Reporter: Eero Tamminen <eero.t.tamminen>
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
CC: jason

Description Eero Tamminen 2018-09-03 13:55:29 UTC
Setup:
* Ubuntu 18.04 (seems to enable extra stack checks for gcc compared to 16.04)
* Git versions of Mesa, X & kernel
* GfxBench v5 command line version

Test-case:
- AztecRuins Vulkan:
  bin/testfw_app --gfx vulkan --gl_api vulkan --width 1920 --height 1080 --test_id vulkan_5_normal

Expected outcome:
* The 18.04 (gcc 7.3) build runs as well as the 16.04 (gcc 5.4) build does

Actual outcome:
* The stack checks detect an issue, which shows up in gdb as follows:
-------------------------------
*** stack smashing detected ***: <unknown> terminated

Thread 2 "testfw_app" received signal SIGABRT, Aborted.
[Switching to LWP 7224]
0x00007ffff6ddc917 in raise () from /usr/lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6ddc917 in raise () from /usr/lib64/libc.so.6
#1  0x00007ffff6dbe50d in abort () from /usr/lib64/libc.so.6
#2  0x00007ffff6e24c06 in ?? () from /usr/lib64/libc.so.6
#3  0x00007ffff6ec6ea1 in ?? () from /usr/lib64/libc.so.6
#4  0x00007ffff6ec6e62 in __stack_chk_fail () from /usr/lib64/libc.so.6
#5  0x00007ffff4b353c3 in glsl_get_natural_size_align_bytes (type=<optimized out>, size=<optimized out>, align=<optimized out>)
    at src/compiler/nir_types.cpp:568
#6  0x00007ffff4ad40c6 in nir_opt_large_constants (shader=shader@entry=0x7ffff20b7210, 
    size_align=0x7ffff4b35250 <glsl_get_natural_size_align_bytes(glsl_type const*, unsigned int*, unsigned int*)>, size_align@entry=0x0, 
    threshold=threshold@entry=32) at src/compiler/nir/nir_opt_large_constants.c:208
#7  0x00007ffff49e442f in brw_preprocess_nir (compiler=compiler@entry=0x7ffff000bd40, nir=nir@entry=0x7ffff20b7210)
    at src/intel/compiler/brw_nir.c:680
#8  0x00007ffff48edc1d in anv_shader_compile_to_nir (mem_ctx=<optimized out>, module=<optimized out>, entrypoint_name=<optimized out>, stage=<optimized out>, 
    spec_info=0x0, pipeline=<optimized out>, pipeline=<optimized out>) at src/intel/vulkan/anv_pipeline.c:222
#9  0x00007ffff48ee068 in anv_pipeline_compile_graphics (pipeline=pipeline@entry=0x7ffff2ff1bb0, cache=cache@entry=0x7ffff3569860, 
    info=info@entry=0x7ffff56111d0) at src/intel/vulkan/anv_pipeline.c:970
#10 0x00007ffff48ef625 in anv_pipeline_init (pipeline=pipeline@entry=0x7ffff2ff1bb0, device=device@entry=0x7ffff017fbb0, cache=cache@entry=0x7ffff3569860, 
    pCreateInfo=pCreateInfo@entry=0x7ffff56111d0, alloc=0x7ffff017fbb8, alloc@entry=0x0) at src/intel/vulkan/anv_pipeline.c:1446
#11 0x00007ffff49357a1 in gen9_graphics_pipeline_create (pPipeline=0x7ffff2c3c850, pAllocator=0x0, pCreateInfo=0x7ffff56111d0, cache=0x7ffff3569860, 
    _device=0x7ffff017fbb0) at src/intel/vulkan/genX_pipeline.c:1734
#12 gen9_CreateGraphicsPipelines (_device=0x7ffff017fbb0, pipelineCache=0x7ffff3569860, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, 
    pPipelines=0x7ffff2c3c850) at src/intel/vulkan/genX_pipeline.c:1944
-------------------------------

Looking at the aborting code lines:
-------------------------------
glsl_get_natural_size_align_bytes(const struct glsl_type *type,
                                  unsigned *size, unsigned *align)
{
   switch (type->base_type) {
...
   case GLSL_TYPE_SAMPLER:
   case GLSL_TYPE_ATOMIC_UINT:
   case GLSL_TYPE_SUBROUTINE:
   case GLSL_TYPE_IMAGE:
   case GLSL_TYPE_VOID:
   case GLSL_TYPE_ERROR:
   case GLSL_TYPE_INTERFACE:
   case GLSL_TYPE_FUNCTION:
      unreachable("type does not have a natural size");
   }
}
-------------------------------

I assume it's the unreachable() call that's directly triggering the stack smash report, rather than this being stack corruption that happened elsewhere.
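
To make the suspected failure mode concrete: as far as I understand Mesa's unreachable() macro, in a release (NDEBUG) build it reduces to a compiler hint such as __builtin_unreachable(), so actually reaching it is undefined behavior and control can end up in whatever code follows the switch, here apparently the __stack_chk_fail() call. The sketch below is not Mesa code. The names toy_base_type and toy_get_natural_size_align_bytes are hypothetical, and the only purpose is to show a variant that reports opaque types back to the caller instead of treating them as unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for glsl_base_type (assumption). */
enum toy_base_type {
   TOY_TYPE_UINT,
   TOY_TYPE_FLOAT,
   TOY_TYPE_DOUBLE,
   TOY_TYPE_SAMPLER,   /* opaque: has no natural size */
};

/* Fills *size and *align and returns true for types with a natural size.
 * Returns false, leaving the outputs untouched, for opaque types, so the
 * caller can skip them instead of running into undefined behavior. */
static bool
toy_get_natural_size_align_bytes(enum toy_base_type base_type,
                                 unsigned *size, unsigned *align)
{
   assert(size != NULL && align != NULL);

   switch (base_type) {
   case TOY_TYPE_UINT:
   case TOY_TYPE_FLOAT:
      *size = 4;
      *align = 4;
      return true;
   case TOY_TYPE_DOUBLE:
      *size = 8;
      *align = 8;
      return true;
   case TOY_TYPE_SAMPLER:
      return false;
   }

   /* Defensive default for values outside the enum. */
   return false;
}
```

With a contract like this, a caller iterating over local variables could simply skip any variable for which the function returns false.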

The gdb output seems to confirm that the type used here is indeed a sampler ("_sampler2DShadow_type"):
-------------------------------
...
(gdb) up
#6  0x00007ffff4ad40c6 in nir_opt_large_constants (shader=shader@entry=0x7ffff20b7210, 
    size_align=0x7ffff4b35250 <glsl_get_natural_size_align_bytes(glsl_type const*, unsigned int*, unsigned int*)>, size_align@entry=0x0, 
    threshold=threshold@entry=32) at src/compiler/nir/nir_opt_large_constants.c:208
208	src/compiler/nir/nir_opt_large_constants.c: No such file or directory.
(gdb) info locals
info = 0x7ffff2da1444
var_size = 4
var_align = 4
var = 0x7ffff2e06bf0
impl = <optimized out>
num_locals = <optimized out>
var_infos = 0x7ffff2da1430
first_block = <optimized out>
b = {cursor = {option = 4116723276, {block = 0x7ffff2f41ce0, instr = 0x7ffff2f41ce0}}, exact = 11, shader = 0x7ffff4ad6008 <nir_opt_undef+296>, 
  impl = 0x7ffff56036c7}
(gdb) print *info
$1 = {is_constant = true, found_read = false}
(gdb) print *var
$2 = {node = {next = 0x7ffff2e06e00, prev = 0x7ffff2e069e0}, type = 0x7ffff4e10920 <glsl_type::_sampler2DShadow_type>, name = 0x7ffff2e06cc0 "param", data = {
    mode = nir_var_local, read_only = 0, centroid = 0, sample = 0, patch = 0, invariant = 0, always_active_io = 0, interpolation = 0, origin_upper_left = 0, 
    pixel_center_integer = 0, location_frac = 0, compact = 0, fb_fetch_output = 0, bindless = 0, explicit_binding = 0, explicit_xfb_buffer = 0, 
    explicit_xfb_stride = 0, explicit_offset = 0, depth_layout = nir_depth_layout_none, location = -1, driver_location = 0, stream = 0, index = 10, 
    descriptor_set = 0, binding = 0, offset = 0, xfb_buffer = 0, xfb_stride = 0, how_declared = 0, image = {read_only = false, write_only = false, 
      coherent = false, _volatile = false, restrict_flag = false, format = 0}}, num_state_slots = 0, state_slots = 0x0, constant_initializer = 0x0, 
  interface_type = 0x0, num_members = 0, members = 0x0}
-------------------------------

(I also got the same result from another GfxBench build, with a different backend, on Clear Linux / Wayland instead of Ubuntu 18.04 / X, when using the same Mesa build.)

I've tested this on HSW & BXT and it happens on both, so I think it's a generic issue.  The GL & GLES versions of Aztec Ruins work fine, so the issue is Anvil-specific.

I've tested this with Mesa 743e11c10b from a week ago and Mesa 233718a199 from today (I hadn't tested an 18.04 build earlier, so I have no idea whether this is a regression).

Comment 1 Eero Tamminen 2018-09-03 14:24:11 UTC
Lionel has a patch for this:
  https://patchwork.freedesktop.org/patch/245761/

Comment 2 Lionel Landwerlin 2018-09-05 10:05:04 UTC
Fixed in:

commit 07a2098a708a2bdba1697cbfeb435533b828d5c4
Author: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date:   Thu Aug 23 14:34:19 2018 +0100

    intel: compiler: remove dead local variables at optimization pass
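
Going only by the commit title and the gdb dump in the description (the sampler-typed local "param" has found_read = false), the idea is presumably that unused locals are dropped before the large-constants pass tries to size them. The following is a rough sketch of that idea under those assumptions, not the actual patch. toy_var, toy_remove_dead_locals and toy_all_sizable are hypothetical names and do not exist in Mesa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a NIR local variable (not nir_variable). */
struct toy_var {
   const char *name;
   bool is_opaque;      /* e.g. a sampler, which has no natural size */
   unsigned num_uses;   /* how many instructions reference the variable */
};

/* Compacts the array in place, keeping only variables that are still
 * referenced, and returns the new element count. */
static size_t
toy_remove_dead_locals(struct toy_var *vars, size_t count)
{
   assert(vars != NULL || count == 0);

   size_t kept = 0;
   for (size_t i = 0; i < count; i++) {
      if (vars[i].num_uses > 0)
         vars[kept++] = vars[i];
   }
   return kept;
}

/* Returns true if a later pass could size every remaining variable. */
static bool
toy_all_sizable(const struct toy_var *vars, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (vars[i].is_opaque)
         return false;
   }
   return true;
}
```

In this toy model, an unused sampler local makes toy_all_sizable() fail until toy_remove_dead_locals() has run, which mirrors the ordering problem suggested by the backtrace.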

Comment 3 Eero Tamminen 2018-09-05 10:34:14 UTC
Verified, works fine now.

(HSW GT2 seems to have some other issue; the Aztec Ruins Vulkan version fails there randomly.)
