Setup: * Ubuntu 18.04 * Latest Mesa from git * Latest X or Weston from git * drm-tip git kernel v4.20 or newer * GfxBench v5 (X or Wayland build, GOLD2 or GOLD3 version) Test-case: * Run AztecRuins (normal or high) Vulkan version: bin/testfw_app --gfx vulkan --gl_api vulkan --width 1920 --height 1080 --fullscreen 1 --test_id vulkan_5_normal Result: * Segfault when run from Gdb (abort when run without as GfxBench catches them and aborts): -------------------------------------- ... compile light_directional_cube.frag... done SPIR-V WARNING: In file ../../../../../source2/_repos/git___anongit_freedesktop_org_git_mesa_mesa/src/compiler/spirv/vtn_variables.c:2258 OpStore of a sampler detected. Doing on-the-fly copy propagation to workaround the problem. 9408 bytes into the SPIR-V binary Thread 2 "testfw_app" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff3339700 (LWP 10251)] -------------------------------------- This doesn't happen with GL or GLES versions, only Vulkan. This regression started between following commits: * 2019-01-07 17:07:33 0cc01f45e7: glsl: correct typo in GLSL compilation error message * 2019-01-08 17:51:46 d0c6ef2793: nir: rename global/local to private/function memory Valgrind says following: -------------------------------------- SPIR-V WARNING: In file src/compiler/spirv/vtn_variables.c:2258 OpStore of a sampler detected. Doing on-the-fly copy propagation to workaround the problem. 9408 bytes into the SPIR-V binary ==10457== Invalid read of size 4 ==10457== at 0xA54E458: try_lower_tex_ycbcr (anv_nir_lower_ycbcr_textures.c:328) ==10457== by 0xA54E458: anv_nir_lower_ycbcr_textures (anv_nir_lower_ycbcr_textures.c:464) ==10457== by 0xA55372F: anv_pipeline_lower_nir (anv_pipeline.c:500) ==10457== by 0xA555FEE: anv_pipeline_compile_graphics (anv_pipeline.c:1034) ==10457== by 0xA5577BB: anv_pipeline_init (anv_pipeline.c:1463) ==10457== by 0xA676675: gen9_graphics_pipeline_create (genX_pipeline.c:1787) ==10457== by 0xA676675: gen9_CreateGraphicsPipelines (genX_pipeline.c:2001) ==10457== by 0x917FCC: ??? (in bin/testfw_app) ==10457== Address 0x3c is not stack'd, malloc'd or (recently) free'd ==10457== ==10457== ==10457== Process terminating with default action of signal 6 (SIGABRT) ==10457== at 0x7239E97: raise (raise.c:51) ==10457== by 0x723B800: abort (abort.c:79) ==10457== by 0x6872B5: ??? (in bin/testfw_app) ==10457== by 0x623888F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.27.so) ==10457== by 0xA54E457: try_lower_tex_ycbcr (anv_nir_lower_ycbcr_textures.c:396) ==10457== by 0xA54E457: anv_nir_lower_ycbcr_textures (anv_nir_lower_ycbcr_textures.c:464) ==10457== by 0xA55372F: anv_pipeline_lower_nir (anv_pipeline.c:500) ==10457== by 0xA555FEE: anv_pipeline_compile_graphics (anv_pipeline.c:1034) ==10457== by 0xA5577BB: anv_pipeline_init (anv_pipeline.c:1463) ==10457== by 0xA676675: gen9_graphics_pipeline_create (genX_pipeline.c:1787) ==10457== by 0xA676675: gen9_CreateGraphicsPipelines (genX_pipeline.c:2001) --------------------------------------
Gdb tells: -------------------------------------- (gdb) bt #0 try_lower_tex_ycbcr (tex=0x7fffef8e2430, builder=0x7ffff3327b70, layout=0x7fffef8249c0) at src/intel/vulkan/anv_nir_lower_ycbcr_textures.c:328 #1 anv_nir_lower_ycbcr_textures (shader=shader@entry=0x7fffee3037e0, layout=layout@entry=0x7fffef8249c0) at src/intel/vulkan/anv_nir_lower_ycbcr_textures.c:464 #2 0x00007ffff23d7730 in anv_pipeline_lower_nir (pipeline=pipeline@entry=0x7fffef005620, mem_ctx=mem_ctx@entry=0x7fffef133700, stage=stage@entry=0x7ffff33313b0, layout=layout@entry=0x7fffef8249c0) at src/intel/vulkan/anv_pipeline.c:500 #3 0x00007ffff23d9fef in anv_pipeline_compile_graphics (pipeline=pipeline@entry=0x7fffef005620, cache=cache@entry=0x7fffef0d1300, info=info@entry=0x7ffff33371d0) at src/intel/vulkan/anv_pipeline.c:1034 #4 0x00007ffff23db7bc in anv_pipeline_init (pipeline=pipeline@entry=0x7fffef005620, device=device@entry=0x7fffec37f460, cache=cache@entry=0x7fffef0d1300, pCreateInfo=pCreateInfo@entry=0x7ffff33371d0, alloc=0x7fffec37f468, alloc@entry=0x0) at src/intel/vulkan/anv_pipeline.c:1463 #5 0x00007ffff24fa676 in gen9_graphics_pipeline_create (pPipeline=0x7fffee3562d0, pAllocator=0x0, pCreateInfo=0x7ffff33371d0, cache=0x7fffef0d1300, _device=0x7fffec37f460) at src/intel/vulkan/genX_pipeline.c:1787 #6 gen9_CreateGraphicsPipelines (_device=0x7fffec37f460, pipelineCache=0x7fffef0d1300, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, pPipelines=0x7fffee3562d0) at src/intel/vulkan/genX_pipeline.c:2001 (gdb) disassemble try_lower_tex_ycbcr 0x00007ffff23d1549 <+281>: mov 0x80(%r14),%ecx 0x00007ffff23d1550 <+288>: mov 0x78(%r14),%rsi 323: 0x00007ffff23d1581 <+337>: lea 0x44f7f8(%rip),%rcx 0x00007ffff23d1588 <+344>: lea 0x44f489(%rip),%rsi 0x00007ffff23d158f <+351>: lea 0x44f52a(%rip),%rdi 0x00007ffff23d1596 <+358>: mov $0x143,%edx 0x00007ffff23d159b <+363>: callq 0x7ffff239bbf0 <__assert_fail@plt> 0x00007ffff23d15a0 <+368>: test %eax,%eax 0x00007ffff23d15a2 <+370>: js 0x7ffff23d1581 <anv_nir_lower_ycbcr_textures+337> 0x00007ffff23d15a4 <+372>: cltq 0x00007ffff23d15a6 <+374>: shl $0x6,%rax 0x00007ffff23d15aa <+378>: add %rax,%rsi 324-327: 0x00007ffff23d162c <+508>: movslq 0x3c(%rdx),%rax 0x00007ffff23d163c <+524>: shl $0x4,%rax 328: => 0x00007ffff23d2458 <+4136>: mov 0x3c,%eax 0x00007ffff23d245f <+4143>: ud2 0x00007ffff23d2461 <+4145>: nopl 0x0(%rax) (gdb) info registers rdx rax eax rdx 0x7fffef424aa0 140737207487136 rax 0x5 5 eax 0x5 5 (gdb) up #1 anv_nir_lower_ycbcr_textures (shader=shader@entry=0x7fffee3037e0, layout=layout@entry=0x7fffef8249c0) at src/intel/vulkan/anv_nir_lower_ycbcr_textures.c:464 464 in src/intel/vulkan/anv_nir_lower_ycbcr_textures.c (gdb) info locals tex = <optimized out> instr = 0x7fffef8e2430 __next = 0x7fffeef001c0 block = <optimized out> function_progress = false builder = {cursor = {option = nir_cursor_before_block, {block = 0x0, instr = 0x0}}, exact = false, shader = 0x7fffee3037e0, impl = 0x7fffef919690} function = <optimized out> progress = <optimized out> (gdb) print *instr $2 = {node = {next = 0x7fffeef001c0, prev = 0x7fffeef00090}, block = 0x7fffef92d170, type = nir_instr_type_tex, pass_flags = 1 '\001', index = 0} (gdb) print *__next $3 = {node = {next = 0x7fffeef002f0, prev = 0x7fffef8e2430}, block = 0x7fffef92d170, type = nir_instr_type_alu, pass_flags = 1 '\001', index = 0} -------------------------------------- Source code corresponding to try_lower_tex_ycbcr disassembly around the crash is: -------------------------------------- 322: int deref_src_idx = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref); 323: assert(deref_src_idx >= 0); 324: nir_deref_instr *deref = nir_src_as_deref(tex->src[deref_src_idx].src); 326: nir_variable *var = nir_deref_instr_get_variable(deref); 327: const struct anv_descriptor_set_layout *set_layout = 328: layout->set[var->data.descriptor_set].layout; -------------------------------------- To me it seems that deref got invalid address.
Problem here is that there is a cast between texture instruction and the variable : vec1 32 ssa_72 = deref_var &shadow_map (uniform sampler2DShadow) vec1 32 ssa_73 = deref_cast (sampler2DShadow *)ssa_72 (function sampler2DShadow) vec2 32 ssa_74 = vec2 ssa_57, ssa_58 vec1 32 ssa_75 = tex ssa_73 (texture_deref), ssa_73 (sampler_deref), ssa_74 (coord), ssa_59 (comparator), But the nir_deref_instr_get_variable() will stop on a cast and return NULL. Maybe Jason has more background on what to do here?
I am getting so tired of maintaining this workaround.... In any case, patches are now on the list that work around it some more: https://patchwork.freedesktop.org/series/55099/
Fixed by the following commit in master: commit b57c1ec4219f01bfdb98bcab8fca4c44e87bd1a4 Author: Jason Ekstrand <jason.ekstrand@intel.com> Date: Fri Jan 11 14:17:24 2019 -0600 spirv: Whack sampler/image pointers to uniform A long time in a galaxy far far away, there was a GLSLang bug with how it handled samplers passed in as function parameters. (The bug can be found here: https://github.com/KhronosGroup/glslang/issues/179.) Unfortunately, that version was shipped in several apps and has been causing heartburn for our SPIR-V parser ever since. Recent changes to NIR uncovered a moderately old bug in how we work around this issue. In particular, we ended up with a deref_cast from uniform to local which is not a no-op cast so nir_opt_deref wasn't getting rid of the cast. The only reason why it worked before was because someone just happened to call nir_fixup_deref_modes which "fixed" the cast (that shouldn't be happening) and then a later round of copy-prop would get rid of it. The fact that the deref_cast survived that long without causing trouble for other parts of NIR is a bit surprising. Just whacking the mode of the pointer seems to fix it fairly unobtrusively. Currently, only apps with this bug will have a local variable containing an image or sampler.
Verified, works fine on all Vulkan platforms again.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.