Setup:
- KBL GT3e
- Ubuntu 16.04
- Mesa git version
- Latest Talos Principle available from Steam downloaded
- Steam game launch options set to use Vulkan: "%command% +gfxStrAPI VLK"
- Talos Gfx options set to high GPU speed

Test-case:
- Start Talos Principle

Expected outcome:
- Talos starts, like with Mesa commit "mesa-17.3.0"

Actual outcome:
- Talos Principle segfaults before showing anything

Crash is because of NULL pointer access in spirv->nir fragment shader compilation:
---------------------------------------------------------
Thread 1 "Talos" received signal SIGSEGV, Segmentation fault.
anv_shader_compile_to_nir (pipeline=0x5142730, pipeline=0x5142730, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, entrypoint_name=0x7fffffff90d0 "", module=0x3c69600, mem_ctx=0x37a8170) at ../../../src/intel/vulkan/anv_pipeline.c:153
153	   nir_shader *nir = entry_point->shader;
(gdb) bt
#0  anv_shader_compile_to_nir (pipeline=0x5142730, pipeline=0x5142730, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, entrypoint_name=0x7fffffff90d0 "", module=0x3c69600, mem_ctx=0x37a8170) at ../../../src/intel/vulkan/anv_pipeline.c:153
#1  anv_pipeline_compile (pipeline=pipeline@entry=0x5142730, mem_ctx=mem_ctx@entry=0x37a8170, module=module@entry=0x3c69600, entrypoint=entrypoint@entry=0x237b915 "main", stage=stage@entry=MESA_SHADER_FRAGMENT, spec_info=spec_info@entry=0x0, prog_data=0x7fffffff90d0, map=0x7fffffff8ff0) at ../../../src/intel/vulkan/anv_pipeline.c:395
#2  0x00007fffe6056162 in anv_pipeline_compile_fs (pipeline=pipeline@entry=0x5142730, cache=cache@entry=0x3923c20, info=info@entry=0x7fffecabf8f0, module=module@entry=0x3c69600, entrypoint=0x237b915 "main", spec_info=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:871
#3  0x00007fffe605793e in anv_pipeline_init (pipeline=pipeline@entry=0x5142730, device=device@entry=0x3c059c0, cache=cache@entry=0x3923c20, pCreateInfo=pCreateInfo@entry=0x7fffecabf8f0, alloc=0x3c059c8, alloc@entry=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:1347
#4  0x00007fffe61f28cf in gen9_graphics_pipeline_create (pPipeline=0x7fffffffcd80, pAllocator=0x0, pCreateInfo=0x7fffecabf8f0, cache=0x3923c20, _device=0x3c059c0) at ../../../src/intel/vulkan/genX_pipeline.c:1661
#5  gen9_CreateGraphicsPipelines (_device=0x3c059c0, pipelineCache=0x3923c20, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, pPipelines=0x7fffffffcd80) at ../../../src/intel/vulkan/genX_pipeline.c:1864
(gdb) list anv_shader_compile_to_nir
...
149	   nir_function *entry_point =
150	      spirv_to_nir(spirv, module->size / 4,
151	                   spec_entries, num_spec_entries,
152	                   stage, entrypoint_name, &spirv_options, nir_options);
153	   nir_shader *nir = entry_point->shader;
(gdb) disassemble
Dump of assembler code for function anv_pipeline_compile:
...
   0x00007fffe6055a50 <+256>:	callq  0x7fffe63fa130 <spirv_to_nir>
=> 0x00007fffe6055a55 <+261>:	mov    0x18(%rax),%rbx
   0x00007fffe6055a59 <+265>:	mov    0x20(%rsp),%rdi
(gdb) info registers rax rbx
rax            0x0	0
rbx            0x0	0
---------------------------------------------------------

In case it matters, here are variable values & struct contents:
---------------------------------------------------------
(gdb) info locals
device = <optimized out>
spec_entries = 0x0
spirv_options = {lower_workgroup_access_to_offsets = true, caps = {float64 = true, image_ms_array = false, tessellation = true, draw_parameters = true, image_read_without_format = false, image_write_without_format = true, int64 = true, multiview = true, variable_pointers = true, storage_16bit = true}, debug = {func = 0x0, private_data = 0x0}}
entry_point = <optimized out>
nir = <optimized out>
compiler = 0x39d2330
nir_options = 0x7fffe644afc0 <scalar_nir_options>
spirv = 0x3c69618
num_spec_entries = 0
(gdb) print *module
$7 = {sha1 = "Y%cewe\242\022\065\064\225\t\354ͥ\222\222A\333 ", size = 1664, data = 0x3c69618 "\003\002#\a"}
(gdb) print *nir_options
$1 = {lower_fdiv = true, lower_ffma = false, fuse_ffma = false, lower_flrp32 = false, lower_flrp64 = true, lower_fpow = false, lower_fsat = false, lower_fsqrt = false, lower_fmod32 = true, lower_fmod64 = false, lower_bitfield_extract = true, lower_bitfield_insert = true, lower_uadd_carry = true, lower_usub_borrow = true, lower_negate = false, lower_sub = true, lower_scmp = true, lower_idiv = false, fdot_replicates = false, lower_ffract = false, lower_pack_half_2x16 = true, lower_pack_unorm_2x16 = true, lower_pack_snorm_2x16 = true, lower_pack_unorm_4x8 = true, lower_pack_snorm_4x8 = true, lower_unpack_half_2x16 = true, lower_unpack_unorm_2x16 = true, lower_unpack_snorm_2x16 = true, lower_unpack_unorm_4x8 = true, lower_unpack_snorm_4x8 = true, lower_extract_byte = false, lower_extract_word = false, native_integers = true, vertex_id_zero_based = true, lower_cs_local_index_from_id = false, use_interpolated_input_intrinsics = true, max_unroll_iterations = 32}
---------------------------------------------------------

Debug output I got by prefixing launch options with:
  gdbserver 127.0.0.1:1234
And in another terminal doing:
  (gdb) target remote :1234
Manual git bisect gave the following as the first bad commit:
---------------------------------------------------------
commit a7c2be9944a9e2028a02fcfbab501891293401b1
Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Wed Dec 6 09:14:20 2017 -0800
Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
CommitDate: Mon Dec 11 22:28:34 2017 -0800

    spirv: Add type validation for OpSelect

    Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
---------------------------------------------------------
FYI: Ever since 94ca8e04adf681b0cad6ade1c9f28856efe35ae6, most SPIR-V errors result in spirv_to_nir bailing cleanly and returning NULL. anv_shader_compile_to_nir then dereferences the NULL and crashes. It would be more useful if you set a breakpoint on _vtn_fail and posted the backtrace from there. Arguably, it may be better to add an abort() to _vtn_fail when built in debug mode, because the NULL dereference is kind-of mean.
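To illustrate the failure mode described above, here is a minimal, self-contained sketch of a translator that bails out via longjmp and returns NULL, plus a caller that checks the result instead of dereferencing it. This is not Mesa code: every toy_* name and the TOY_ABORT_ON_FAIL macro are hypothetical stand-ins, and the only things taken from this report are the fail_jump field visible in the vtn_builder dump and the abort() idea suggested above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for nir_function and vtn_builder. */
struct toy_function {
    const char *name;
};

struct toy_builder {
    jmp_buf fail_jump;     /* where toy_fail() unwinds to */
    const char *fail_msg;  /* last failure message, NULL if none */
};

/* Report a validation error and unwind to the translator entry point.
 * With -DTOY_ABORT_ON_FAIL the process stops right here, so a debugger
 * shows the failing check instead of a later NULL dereference. */
void toy_fail(struct toy_builder *b, const char *msg)
{
    b->fail_msg = msg;
    fprintf(stderr, "toy translator: validation failed: %s\n", msg);
#ifdef TOY_ABORT_ON_FAIL
    abort();
#endif
    longjmp(b->fail_jump, 1);
}

/* Toy translator: returns the entry point, or NULL if validation failed. */
struct toy_function *toy_translate(struct toy_builder *b, int shader_is_valid)
{
    static struct toy_function entry = { "main" };

    assert(b != NULL);
    b->fail_msg = NULL;

    if (setjmp(b->fail_jump))
        return NULL;  /* reached again via longjmp() from toy_fail() */

    if (!shader_is_valid)
        toy_fail(b, "condition type does not match result type");

    return &entry;
}

/* Caller that checks for NULL before touching the result.
 * Returns 0 on success and -1 if translation failed. */
int toy_compile(int shader_is_valid)
{
    struct toy_builder b;
    struct toy_function *entry = toy_translate(&b, shader_is_valid);

    if (entry == NULL)
        return -1;  /* propagate the error instead of crashing */

    printf("compiled entry point \"%s\"\n", entry->name);
    return 0;
}
```

Calling toy_compile(0) returns -1 rather than segfaulting. Building with -DTOY_ABORT_ON_FAIL trades the clean bail-out for an immediate stop at the failing check, which is the debug-build behaviour proposed above.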
My dev SSD was completely corrupted this morning (fsck.ext4 has been listing inodes with issues for the last half an hour), so I won't be able to provide a better backtrace before next year. Without a working setup it would take too much time, so things other than Steam get priority until then. :-/
Miraculously, the SSD eventually got back into fully working condition (that has never happened to me before with that many errors from fsck). Here's the backtrace you asked for:
----------------------------------------------
Thread 1 "Talos" hit Breakpoint 1, _vtn_fail (b=b@entry=0x5169c60, file=file@entry=0x7fffe6828cc0 "../../../src/compiler/spirv/spirv_to_nir.c", line=line@entry=3517, fmt=fmt@entry=0x7fffe6829980 "Condition type of OpSelect must be a scalar or vector of Boolean type. It must have the same number of components as Result Type") at ../../../src/compiler/spirv/spirv_to_nir.c:112
112	{
(gdb) bt
#0  _vtn_fail (b=b@entry=0x5169c60, file=file@entry=0x7fffe6828cc0 "../../../src/compiler/spirv/spirv_to_nir.c", line=line@entry=3517, fmt=fmt@entry=0x7fffe6829980 "Condition type of OpSelect must be a scalar or vector of Boolean type. It must have the same number of components as Result Type") at ../../../src/compiler/spirv/spirv_to_nir.c:112
#1  0x00007fffe67b5f0c in vtn_handle_body_instruction (b=0x5169c60, opcode=<optimized out>, w=0x3c67bfc, count=<optimized out>) at ../../../src/compiler/spirv/spirv_to_nir.c:3514
#2  0x00007fffe67ae7a6 in vtn_foreach_instruction (b=b@entry=0x5169c60, start=<optimized out>, end=end@entry=0x3c67c60, handler=handler@entry=0x7fffe67b4a00 <vtn_handle_body_instruction>) at ../../../src/compiler/spirv/spirv_to_nir.c:323
#3  0x00007fffe67c31e1 in vtn_emit_cf_list (b=b@entry=0x5169c60, cf_list=cf_list@entry=0x5121f38, switch_fall_var=switch_fall_var@entry=0x0, has_switch_break=has_switch_break@entry=0x0, handler=handler@entry=0x7fffe67b4a00 <vtn_handle_body_instruction>) at ../../../src/compiler/spirv/vtn_cfg.c:703
#4  0x00007fffe67c3562 in vtn_function_emit (b=b@entry=0x5169c60, func=func@entry=0x5121f10, instruction_handler=instruction_handler@entry=0x7fffe67b4a00 <vtn_handle_body_instruction>) at ../../../src/compiler/spirv/vtn_cfg.c:878
#5  0x00007fffe67b6394 in spirv_to_nir (words=<optimized out>, words@entry=0x3c675e8, word_count=416, spec=spec@entry=0x0, num_spec=num_spec@entry=0, stage=stage@entry=MESA_SHADER_FRAGMENT, entry_point_name=<optimized out>, options=0x7fffffff8c80, nir_options=0x7fffe6806fc0 <scalar_nir_options>) at ../../../src/compiler/spirv/spirv_to_nir.c:3742
#6  0x00007fffe6411a55 in anv_shader_compile_to_nir (pipeline=0x5123fd0, pipeline=0x5123fd0, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, entrypoint_name=0x7fffffff8e40 "", module=0x3c675d0, mem_ctx=0x3c674b0) at ../../../src/intel/vulkan/anv_pipeline.c:149
#7  anv_pipeline_compile (pipeline=pipeline@entry=0x5123fd0, mem_ctx=mem_ctx@entry=0x3c674b0, module=module@entry=0x3c675d0, entrypoint=entrypoint@entry=0x237b915 "main", stage=stage@entry=MESA_SHADER_FRAGMENT, spec_info=spec_info@entry=0x0, prog_data=0x7fffffff8e40, map=0x7fffffff8d60) at ../../../src/intel/vulkan/anv_pipeline.c:395
#8  0x00007fffe6412162 in anv_pipeline_compile_fs (pipeline=pipeline@entry=0x5123fd0, cache=cache@entry=0x3af2090, info=info@entry=0x7fffec7ac9b0, module=module@entry=0x3c675d0, entrypoint=0x237b915 "main", spec_info=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:871
#9  0x00007fffe641393e in anv_pipeline_init (pipeline=pipeline@entry=0x5123fd0, device=device@entry=0x3bfde10, cache=cache@entry=0x3af2090, pCreateInfo=pCreateInfo@entry=0x7fffec7ac9b0, alloc=0x3bfde18, alloc@entry=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:1347
#10 0x00007fffe65ae8cf in gen9_graphics_pipeline_create (pPipeline=0x7fffffffcaf0, pAllocator=0x0, pCreateInfo=0x7fffec7ac9b0, cache=0x3af2090, _device=0x3bfde10) at ../../../src/intel/vulkan/genX_pipeline.c:1661
#11 gen9_CreateGraphicsPipelines (_device=0x3bfde10, pipelineCache=0x3af2090, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, pPipelines=0x7fffffffcaf0) at ../../../src/intel/vulkan/genX_pipeline.c:1864
(gdb) up
#1  0x00007fffe67b5f0c in vtn_handle_body_instruction (b=0x5169c60, opcode=<optimized out>, w=0x3c67bfc, count=<optimized out>) at ../../../src/compiler/spirv/spirv_to_nir.c:3514
3514	      vtn_fail_if(sel_val->type->type != sel_type,
(gdb) info locals
ssa = <optimized out>
sel_type = <optimized out>
res_type = <optimized out>
(gdb) print *b
$1 = {nb = {cursor = {option = nir_cursor_after_instr, {block = 0x516a5d0, instr = 0x516a5d0}}, exact = false, shader = 0x5125820, impl = 0x51220d0}, fail_jump = {{__jmpbuf = {37206293, -845490108815733866, 85082064, 140737488326208, 63337960, 63337960, 845490111369350038, 845546221558336406}, __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 16 times>}}}}, spirv = 0x3c675e8, shader = 0x5125820, options = 0x7fffffff8c80, block = 0x0, spirv_offset = 1556, file = 0x0, line = -1, col = -1, const_table = 0x5122830, phi_table = 0x5122950, num_specializations = 0, specializations = 0x0, value_id_bound = 24916, values = 0x516e490, entry_point_stage = MESA_SHADER_FRAGMENT, entry_point_name = 0x237b915 "main", entry_point = 0x51a5968, origin_upper_left = true, pixel_center_integer = false, func = 0x0, functions = {head_sentinel = {next = 0x5121f10, prev = 0x0}, tail_sentinel = {next = 0x0, prev = 0x5121f10}}, func_param_idx = 0, has_loop_continue = false}
(gdb) c
Continuing.

Thread 1 "Talos" received signal SIGSEGV, Segmentation fault.
anv_shader_compile_to_nir (pipeline=0x5123fd0, pipeline=0x5123fd0, spec_info=0x0, stage=MESA_SHADER_FRAGMENT, entrypoint_name=0x7fffffff8e40 "", module=0x3c675d0, mem_ctx=0x3c674b0) at ../../../src/intel/vulkan/anv_pipeline.c:153
153	   nir_shader *nir = entry_point->shader;
----------------------------------------------
There's a patch on the list to fix this: https://patchwork.freedesktop.org/patch/193453/
This is fixed by the following commit:

commit 3be382cd7cb637f463a4618dc19d87d66a644b0e
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Dec 14 19:53:05 2017 -0800

    spirv: Relax the validation conditions of OpSelect

    The Talos Principle contains shaders with an OpSelect between two
    vectors where the condition is a scalar boolean.  This is technically
    against the spec but nir_builder gracefully handles it by splatting
    out the condition to all the channels.  So long as the condition is a
    boolean, just emit a warning instead of failing.

    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104246
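As a rough illustration of the relaxed rule in that commit message, the sketch below validates an OpSelect-like operation and then applies it with a scalar condition splatted across all channels. It is not the Mesa implementation: all toy_* names are hypothetical, and treating a boolean vector of the wrong width as a hard failure is an assumption of this sketch, since the commit message only says that a boolean condition yields a warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified type description (not Mesa's glsl_type). */
enum toy_base_type { TOY_BOOL, TOY_FLOAT };

struct toy_type {
    enum toy_base_type base;
    unsigned components;  /* 1 = scalar, 2..4 = vector */
};

enum toy_result { TOY_OK, TOY_WARN, TOY_FAIL };

/* Relaxed check: a scalar boolean condition on a vector result only warns. */
enum toy_result toy_validate_select(struct toy_type cond, struct toy_type result)
{
    if (cond.base != TOY_BOOL)
        return TOY_FAIL;  /* a non-boolean condition is always rejected */

    if (cond.components == result.components)
        return TOY_OK;    /* the form the spec asks for */

    if (cond.components == 1) {
        fprintf(stderr, "warning: scalar condition used with a vector select\n");
        return TOY_WARN;  /* tolerated, the condition gets splatted */
    }

    return TOY_FAIL;      /* assumption: mismatched vector widths still fail */
}

/* Component-wise select: out[i] = cond ? a[i] : b[i].
 * A scalar condition (cond_components == 1) is reused for every channel. */
void toy_select(const bool *cond, unsigned cond_components,
                const float *a, const float *b, float *out, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        bool c = (cond_components == 1) ? cond[0] : cond[i];
        out[i] = c ? a[i] : b[i];
    }
}
```

With a scalar true condition and two 4-component inputs, toy_select copies the first input to all four output channels, which is the splatting behaviour the commit message attributes to nir_builder.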
Verified with Mesa git tip.