Bug 104246 - Talos Principle Vulkan version crash: spirv_to_nir() returns NULL entry_point
Summary: Talos Principle Vulkan version crash: spirv_to_nir() returns NULL entry_point
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Jason Ekstrand
QA Contact: Intel 3D Bugs Mailing List
Reported: 2017-12-13 16:32 UTC by Eero Tamminen
Modified: 2017-12-19 13:24 UTC

Description Eero Tamminen 2017-12-13 16:32:09 UTC
Setup:
- KBL GT3e
- Ubuntu 16.04
- Mesa git version
- Latest Talos Principle available from Steam downloaded
- Steam game launch options set to use Vulkan: "%command% +gfxStrAPI VLK"
- Talos Gfx options set to high GPU speed

Test-case:
- Start Talos Principle

Expected outcome:
- Talos starts, as it does with Mesa at the "mesa-17.3.0" tag

Actual outcome:
- Talos Principle segfaults before showing anything

The crash is a NULL pointer dereference during SPIR-V to NIR compilation of a fragment shader:
---------------------------------------------------------
Thread 1 "Talos" received signal SIGSEGV, Segmentation fault.
anv_shader_compile_to_nir (pipeline=0x5142730, pipeline=0x5142730, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, entrypoint_name=0x7fffffff90d0 "", 
    module=0x3c69600, mem_ctx=0x37a8170) at ../../../src/intel/vulkan/anv_pipeline.c:153
153	   nir_shader *nir = entry_point->shader;
(gdb) bt
#0  anv_shader_compile_to_nir (pipeline=0x5142730, pipeline=0x5142730, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, entrypoint_name=0x7fffffff90d0 "", 
    module=0x3c69600, mem_ctx=0x37a8170) at ../../../src/intel/vulkan/anv_pipeline.c:153
#1  anv_pipeline_compile (pipeline=pipeline@entry=0x5142730, mem_ctx=mem_ctx@entry=0x37a8170, module=module@entry=0x3c69600, 
    entrypoint=entrypoint@entry=0x237b915 "main", stage=stage@entry=MESA_SHADER_FRAGMENT, spec_info=spec_info@entry=0x0, prog_data=0x7fffffff90d0, 
    map=0x7fffffff8ff0) at ../../../src/intel/vulkan/anv_pipeline.c:395
#2  0x00007fffe6056162 in anv_pipeline_compile_fs (pipeline=pipeline@entry=0x5142730, cache=cache@entry=0x3923c20, info=info@entry=0x7fffecabf8f0, 
    module=module@entry=0x3c69600, entrypoint=0x237b915 "main", spec_info=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:871
#3  0x00007fffe605793e in anv_pipeline_init (pipeline=pipeline@entry=0x5142730, device=device@entry=0x3c059c0, cache=cache@entry=0x3923c20, 
    pCreateInfo=pCreateInfo@entry=0x7fffecabf8f0, alloc=0x3c059c8, alloc@entry=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:1347
#4  0x00007fffe61f28cf in gen9_graphics_pipeline_create (pPipeline=0x7fffffffcd80, pAllocator=0x0, pCreateInfo=0x7fffecabf8f0, cache=0x3923c20, 
    _device=0x3c059c0) at ../../../src/intel/vulkan/genX_pipeline.c:1661
#5  gen9_CreateGraphicsPipelines (_device=0x3c059c0, pipelineCache=0x3923c20, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, pPipelines=0x7fffffffcd80)
    at ../../../src/intel/vulkan/genX_pipeline.c:1864

(gdb) list anv_shader_compile_to_nir
...
149	   nir_function *entry_point =
150	      spirv_to_nir(spirv, module->size / 4,
151	                   spec_entries, num_spec_entries,
152	                   stage, entrypoint_name, &spirv_options, nir_options);
153	   nir_shader *nir = entry_point->shader;

(gdb) disassemble
Dump of assembler code for function anv_pipeline_compile:
...
   0x00007fffe6055a50 <+256>:	callq  0x7fffe63fa130 <spirv_to_nir>
=> 0x00007fffe6055a55 <+261>:	mov    0x18(%rax),%rbx
   0x00007fffe6055a59 <+265>:	mov    0x20(%rsp),%rdi

(gdb) info registers rax rbx
rax            0x0	0
rbx            0x0	0
---------------------------------------------------------


In case it matters, here are variable values & struct contents:
---------------------------------------------------------
(gdb) info locals
device = <optimized out>
spec_entries = 0x0
spirv_options = {lower_workgroup_access_to_offsets = true, caps = {float64 = true, image_ms_array = false, tessellation = true, draw_parameters = true, 
    image_read_without_format = false, image_write_without_format = true, int64 = true, multiview = true, variable_pointers = true, storage_16bit = true}, 
  debug = {func = 0x0, private_data = 0x0}}
entry_point = <optimized out>
nir = <optimized out>
compiler = 0x39d2330
nir_options = 0x7fffe644afc0 <scalar_nir_options>
spirv = 0x3c69618
num_spec_entries = 0

(gdb) print *module
$7 = {sha1 = "Y%cewe\242\022\065\064\225\t\354ͥ\222\222A\333 ", size = 1664, data = 0x3c69618 "\003\002#\a"}

(gdb) print *nir_options
$1 = {lower_fdiv = true, lower_ffma = false, fuse_ffma = false, lower_flrp32 = false, lower_flrp64 = true, lower_fpow = false, lower_fsat = false, 
  lower_fsqrt = false, lower_fmod32 = true, lower_fmod64 = false, lower_bitfield_extract = true, lower_bitfield_insert = true, lower_uadd_carry = true, 
  lower_usub_borrow = true, lower_negate = false, lower_sub = true, lower_scmp = true, lower_idiv = false, fdot_replicates = false, lower_ffract = false, 
  lower_pack_half_2x16 = true, lower_pack_unorm_2x16 = true, lower_pack_snorm_2x16 = true, lower_pack_unorm_4x8 = true, lower_pack_snorm_4x8 = true, 
  lower_unpack_half_2x16 = true, lower_unpack_unorm_2x16 = true, lower_unpack_snorm_2x16 = true, lower_unpack_unorm_4x8 = true, lower_unpack_snorm_4x8 = true, 
  lower_extract_byte = false, lower_extract_word = false, native_integers = true, vertex_id_zero_based = true, lower_cs_local_index_from_id = false, 
  use_interpolated_input_intrinsics = true, max_unroll_iterations = 32}
---------------------------------------------------------


I got this debug output by prefixing the Steam launch options with:
  gdbserver 127.0.0.1:1234

and then, in another terminal, running:
  (gdb) target remote :1234
Comment 1 Eero Tamminen 2017-12-13 17:15:46 UTC
A manual git bisect identified the following as the first bad commit:
---------------------------------------------------------
commit a7c2be9944a9e2028a02fcfbab501891293401b1
Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Wed Dec 6 09:14:20 2017 -0800
Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
CommitDate: Mon Dec 11 22:28:34 2017 -0800

    spirv: Add type validation for OpSelect
    
    Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
---------------------------------------------------------
Comment 2 Jason Ekstrand 2017-12-14 02:44:51 UTC
FYI: Ever since 94ca8e04adf681b0cad6ade1c9f28856efe35ae6, most SPIR-V errors result in spirv_to_nir bailing cleanly and returning NULL.  anv_shader_compile_to_nir then dereferences the NULL and crashes.  A more useful backtrace would be one taken from a breakpoint set on _vtn_fail.
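
The failure mode described above can be sketched in isolation. This is only a minimal illustration, not the Mesa code: every `sketch_*` name is hypothetical, and the translator is a stub that returns NULL on demand. It shows a caller checking the returned entry point before touching `->shader`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for nir_shader / nir_function. */
struct sketch_shader { int num_instructions; };
struct sketch_function { struct sketch_shader *shader; };

/* Hypothetical translator stub: returns NULL when the module fails
 * validation, mimicking a clean bail-out. */
static struct sketch_function *
sketch_translate(struct sketch_function *candidate, int module_is_valid)
{
   return module_is_valid ? candidate : NULL;
}

/* Caller that checks the result instead of dereferencing it blindly. */
static struct sketch_shader *
sketch_compile(struct sketch_function *candidate, int module_is_valid)
{
   assert(candidate != NULL); /* precondition of this sketch only */

   struct sketch_function *entry_point =
      sketch_translate(candidate, module_is_valid);
   if (entry_point == NULL) {
      fprintf(stderr, "sketch: translation failed, no entry point\n");
      return NULL;
   }
   return entry_point->shader;
}
```

With a valid module `sketch_compile()` hands back the shader; with an invalid one it reports the failure and returns NULL rather than segfaulting.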

Arguably, it may be better to add an abort() to _vtn_fail when built in debug mode because the NULL dereference is kind-of mean.
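
A rough sketch of that idea, assuming a setjmp/longjmp style bail-out (the builder dump in comment 4 does show a `fail_jump` jump buffer). The names `sketch_builder`, `sketch_fail`, `sketch_translate` and the `SKETCH_ABORT_ON_FAIL` macro are all made up for illustration and are not Mesa API.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical builder that carries the jump target and last error. */
struct sketch_builder {
   jmp_buf fail_jump;
   const char *last_error;
};

/* Record and print the error, then either abort (debug-style builds,
 * so the backtrace points at the real failure) or jump back to the
 * translator entry so it can return an error cleanly. */
static void
sketch_fail(struct sketch_builder *b, const char *msg)
{
   b->last_error = msg;
   fprintf(stderr, "sketch: %s\n", msg);
#ifdef SKETCH_ABORT_ON_FAIL
   abort();
#endif
   longjmp(b->fail_jump, 1);
}

/* Returns 0 on success, -1 if validation failed. */
static int
sketch_translate(struct sketch_builder *b, int module_is_valid)
{
   assert(b != NULL);

   if (setjmp(b->fail_jump))
      return -1; /* reached via longjmp() from sketch_fail() */

   if (!module_is_valid)
      sketch_fail(b, "invalid OpSelect condition type");

   return 0;
}
```

Building with `-DSKETCH_ABORT_ON_FAIL` makes the process stop inside `sketch_fail()` itself, which is the behaviour suggested above; without it the failure propagates as a return code the caller has to check.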
Comment 3 Eero Tamminen 2017-12-14 09:29:35 UTC
My dev SSD was completely corrupted this morning (fsck.ext4 has been listing inodes with issues for the last half an hour).

-> I won't be able to provide a better backtrace before next year (without a working setup it would take too much time, so things other than Steam get priority until then). :-/
Comment 4 Eero Tamminen 2017-12-14 16:55:14 UTC
Miraculously, the SSD eventually got back into fully working condition (that has never happened to me before with that many errors from fsck).

Here's the backtrace you asked for:
----------------------------------------------
Thread 1 "Talos" hit Breakpoint 1, _vtn_fail (b=b@entry=0x5169c60, file=file@entry=0x7fffe6828cc0 "../../../src/compiler/spirv/spirv_to_nir.c", 
    line=line@entry=3517, 
    fmt=fmt@entry=0x7fffe6829980 "Condition type of OpSelect must be a scalar or vector of Boolean type. It must have the same number of components as Result Type") at ../../../src/compiler/spirv/spirv_to_nir.c:112
112	{
(gdb) bt
#0  _vtn_fail (b=b@entry=0x5169c60, file=file@entry=0x7fffe6828cc0 "../../../src/compiler/spirv/spirv_to_nir.c", line=line@entry=3517, 
    fmt=fmt@entry=0x7fffe6829980 "Condition type of OpSelect must be a scalar or vector of Boolean type. It must have the same number of components as Result Type") at ../../../src/compiler/spirv/spirv_to_nir.c:112
#1  0x00007fffe67b5f0c in vtn_handle_body_instruction (b=0x5169c60, opcode=<optimized out>, w=0x3c67bfc, count=<optimized out>)
    at ../../../src/compiler/spirv/spirv_to_nir.c:3514
#2  0x00007fffe67ae7a6 in vtn_foreach_instruction (b=b@entry=0x5169c60, start=<optimized out>, end=end@entry=0x3c67c60, 
    handler=handler@entry=0x7fffe67b4a00 <vtn_handle_body_instruction>) at ../../../src/compiler/spirv/spirv_to_nir.c:323
#3  0x00007fffe67c31e1 in vtn_emit_cf_list (b=b@entry=0x5169c60, cf_list=cf_list@entry=0x5121f38, switch_fall_var=switch_fall_var@entry=0x0, 
    has_switch_break=has_switch_break@entry=0x0, handler=handler@entry=0x7fffe67b4a00 <vtn_handle_body_instruction>) at ../../../src/compiler/spirv/vtn_cfg.c:703
#4  0x00007fffe67c3562 in vtn_function_emit (b=b@entry=0x5169c60, func=func@entry=0x5121f10, 
    instruction_handler=instruction_handler@entry=0x7fffe67b4a00 <vtn_handle_body_instruction>) at ../../../src/compiler/spirv/vtn_cfg.c:878
#5  0x00007fffe67b6394 in spirv_to_nir (words=<optimized out>, words@entry=0x3c675e8, word_count=416, spec=spec@entry=0x0, num_spec=num_spec@entry=0, 
    stage=stage@entry=MESA_SHADER_FRAGMENT, entry_point_name=<optimized out>, options=0x7fffffff8c80, nir_options=0x7fffe6806fc0 <scalar_nir_options>)
    at ../../../src/compiler/spirv/spirv_to_nir.c:3742
#6  0x00007fffe6411a55 in anv_shader_compile_to_nir (pipeline=0x5123fd0, pipeline=0x5123fd0, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, 
    entrypoint_name=0x7fffffff8e40 "", module=0x3c675d0, mem_ctx=0x3c674b0) at ../../../src/intel/vulkan/anv_pipeline.c:149
#7  anv_pipeline_compile (pipeline=pipeline@entry=0x5123fd0, mem_ctx=mem_ctx@entry=0x3c674b0, module=module@entry=0x3c675d0, 
    entrypoint=entrypoint@entry=0x237b915 "main", stage=stage@entry=MESA_SHADER_FRAGMENT, spec_info=spec_info@entry=0x0, prog_data=0x7fffffff8e40, 
    map=0x7fffffff8d60) at ../../../src/intel/vulkan/anv_pipeline.c:395
#8  0x00007fffe6412162 in anv_pipeline_compile_fs (pipeline=pipeline@entry=0x5123fd0, cache=cache@entry=0x3af2090, info=info@entry=0x7fffec7ac9b0, 
    module=module@entry=0x3c675d0, entrypoint=0x237b915 "main", spec_info=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:871
#9  0x00007fffe641393e in anv_pipeline_init (pipeline=pipeline@entry=0x5123fd0, device=device@entry=0x3bfde10, cache=cache@entry=0x3af2090, 
    pCreateInfo=pCreateInfo@entry=0x7fffec7ac9b0, alloc=0x3bfde18, alloc@entry=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:1347
#10 0x00007fffe65ae8cf in gen9_graphics_pipeline_create (pPipeline=0x7fffffffcaf0, pAllocator=0x0, pCreateInfo=0x7fffec7ac9b0, cache=0x3af2090, 
    _device=0x3bfde10) at ../../../src/intel/vulkan/genX_pipeline.c:1661
#11 gen9_CreateGraphicsPipelines (_device=0x3bfde10, pipelineCache=0x3af2090, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, pPipelines=0x7fffffffcaf0)
    at ../../../src/intel/vulkan/genX_pipeline.c:1864

(gdb) up
#1  0x00007fffe67b5f0c in vtn_handle_body_instruction (b=0x5169c60, opcode=<optimized out>, w=0x3c67bfc, count=<optimized out>)
    at ../../../src/compiler/spirv/spirv_to_nir.c:3514
3514	      vtn_fail_if(sel_val->type->type != sel_type,

(gdb) info locals
ssa = <optimized out>
sel_type = <optimized out>
res_type = <optimized out>

(gdb) print *b
$1 = {nb = {cursor = {option = nir_cursor_after_instr, {block = 0x516a5d0, instr = 0x516a5d0}}, exact = false, shader = 0x5125820, impl = 0x51220d0}, 
  fail_jump = {{__jmpbuf = {37206293, -845490108815733866, 85082064, 140737488326208, 63337960, 63337960, 845490111369350038, 845546221558336406}, 
      __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 16 times>}}}}, spirv = 0x3c675e8, shader = 0x5125820, options = 0x7fffffff8c80, block = 0x0, 
  spirv_offset = 1556, file = 0x0, line = -1, col = -1, const_table = 0x5122830, phi_table = 0x5122950, num_specializations = 0, specializations = 0x0, 
  value_id_bound = 24916, values = 0x516e490, entry_point_stage = MESA_SHADER_FRAGMENT, entry_point_name = 0x237b915 "main", entry_point = 0x51a5968, 
  origin_upper_left = true, pixel_center_integer = false, func = 0x0, functions = {head_sentinel = {next = 0x5121f10, prev = 0x0}, tail_sentinel = {next = 0x0, 
      prev = 0x5121f10}}, func_param_idx = 0, has_loop_continue = false}

(gdb) c
Continuing.

Thread 1 "Talos" received signal SIGSEGV, Segmentation fault.
anv_shader_compile_to_nir (pipeline=0x5123fd0, pipeline=0x5123fd0, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, entrypoint_name=0x7fffffff8e40 "", 
    module=0x3c675d0, mem_ctx=0x3c674b0) at ../../../src/intel/vulkan/anv_pipeline.c:153
153	   nir_shader *nir = entry_point->shader;
----------------------------------------------
Comment 5 Jason Ekstrand 2017-12-15 03:56:47 UTC
There's a patch on the list to fix this:

https://patchwork.freedesktop.org/patch/193453/
Comment 6 Jason Ekstrand 2017-12-18 17:50:27 UTC
This is fixed by the following commit:

commit 3be382cd7cb637f463a4618dc19d87d66a644b0e
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Dec 14 19:53:05 2017 -0800

    spirv: Relax the validation conditions of OpSelect
    
    The Talos Principle contains shaders with an OpSelect between two
    vectors where the condition is a scalar boolean.  This is technically
    against the spec but nir_builder gracefully handles it by splatting
    out the condition to all the channels.  So long as the condition is a
    boolean, just emit a warning instead of failing.
    
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104246
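
For readers unfamiliar with the term, the relaxed check and the "splatting" behaviour described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual spirv_to_nir or nir_builder code: the `sketch_*` types and functions are hypothetical, and values are modelled as plain float arrays.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_verdict { SKETCH_OK, SKETCH_WARN, SKETCH_FAIL };

/* Hypothetical, simplified type description. */
struct sketch_type {
   bool is_bool;
   unsigned components;
};

/* Relaxed validation: a non-boolean condition still fails, while a
 * boolean condition with a mismatched component count only warns. */
static enum sketch_verdict
sketch_check_select(struct sketch_type cond, struct sketch_type result)
{
   if (!cond.is_bool)
      return SKETCH_FAIL;
   if (cond.components == result.components)
      return SKETCH_OK;
   return SKETCH_WARN;
}

/* Per-channel select. A scalar condition is "splatted": the single
 * boolean is reused for every channel of the result. */
static void
sketch_select(const bool *cond, unsigned cond_components,
              const float *a, const float *b,
              float *out, unsigned components)
{
   /* This sketch only handles the scalar and exact-match cases. */
   assert(cond_components == 1 || cond_components == components);

   for (unsigned i = 0; i < components; i++) {
      bool c = cond[cond_components == 1 ? 0 : i];
      out[i] = c ? a[i] : b[i];
   }
}
```

In the Talos case the condition is a scalar boolean and the operands are vectors, so the check yields a warning and the select applies the same condition to all channels.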
Comment 7 Eero Tamminen 2017-12-19 13:24:03 UTC
Verified with Mesa git tip.

