| Summary: | GfxBench AztecRuins Vulkan version Segfault | ||
|---|---|---|---|
| Product: | Mesa | Reporter: | Eero Tamminen <eero.t.tamminen> |
| Component: | Drivers/Vulkan/intel | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
| Status: | VERIFIED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
| Severity: | normal | ||
| Priority: | medium | CC: | jason |
| Version: | git | Keywords: | regression |
| Hardware: | Other | ||
| OS: | All | ||
| Whiteboard: | |||
| i915 platform: | i915 features: | ||
Gdb tells:
--------------------------------------
(gdb) bt
#0 try_lower_tex_ycbcr (tex=0x7fffef8e2430, builder=0x7ffff3327b70, layout=0x7fffef8249c0)
at src/intel/vulkan/anv_nir_lower_ycbcr_textures.c:328
#1 anv_nir_lower_ycbcr_textures (shader=shader@entry=0x7fffee3037e0,
layout=layout@entry=0x7fffef8249c0)
at src/intel/vulkan/anv_nir_lower_ycbcr_textures.c:464
#2 0x00007ffff23d7730 in anv_pipeline_lower_nir (pipeline=pipeline@entry=0x7fffef005620,
mem_ctx=mem_ctx@entry=0x7fffef133700, stage=stage@entry=0x7ffff33313b0,
layout=layout@entry=0x7fffef8249c0)
at src/intel/vulkan/anv_pipeline.c:500
#3 0x00007ffff23d9fef in anv_pipeline_compile_graphics (pipeline=pipeline@entry=0x7fffef005620,
cache=cache@entry=0x7fffef0d1300, info=info@entry=0x7ffff33371d0)
at src/intel/vulkan/anv_pipeline.c:1034
#4 0x00007ffff23db7bc in anv_pipeline_init (pipeline=pipeline@entry=0x7fffef005620,
device=device@entry=0x7fffec37f460, cache=cache@entry=0x7fffef0d1300,
pCreateInfo=pCreateInfo@entry=0x7ffff33371d0, alloc=0x7fffec37f468, alloc@entry=0x0)
at src/intel/vulkan/anv_pipeline.c:1463
#5 0x00007ffff24fa676 in gen9_graphics_pipeline_create (pPipeline=0x7fffee3562d0, pAllocator=0x0,
pCreateInfo=0x7ffff33371d0, cache=0x7fffef0d1300, _device=0x7fffec37f460)
at src/intel/vulkan/genX_pipeline.c:1787
#6 gen9_CreateGraphicsPipelines (_device=0x7fffec37f460, pipelineCache=0x7fffef0d1300, count=1,
pCreateInfos=<optimized out>, pAllocator=0x0, pPipelines=0x7fffee3562d0)
at src/intel/vulkan/genX_pipeline.c:2001
(gdb) disassemble try_lower_tex_ycbcr
0x00007ffff23d1549 <+281>: mov 0x80(%r14),%ecx
0x00007ffff23d1550 <+288>: mov 0x78(%r14),%rsi
323:
0x00007ffff23d1581 <+337>: lea 0x44f7f8(%rip),%rcx
0x00007ffff23d1588 <+344>: lea 0x44f489(%rip),%rsi
0x00007ffff23d158f <+351>: lea 0x44f52a(%rip),%rdi
0x00007ffff23d1596 <+358>: mov $0x143,%edx
0x00007ffff23d159b <+363>: callq 0x7ffff239bbf0 <__assert_fail@plt>
0x00007ffff23d15a0 <+368>: test %eax,%eax
0x00007ffff23d15a2 <+370>: js 0x7ffff23d1581 <anv_nir_lower_ycbcr_textures+337>
0x00007ffff23d15a4 <+372>: cltq
0x00007ffff23d15a6 <+374>: shl $0x6,%rax
0x00007ffff23d15aa <+378>: add %rax,%rsi
324-327:
0x00007ffff23d162c <+508>: movslq 0x3c(%rdx),%rax
0x00007ffff23d163c <+524>: shl $0x4,%rax
328:
=> 0x00007ffff23d2458 <+4136>: mov 0x3c,%eax
0x00007ffff23d245f <+4143>: ud2
0x00007ffff23d2461 <+4145>: nopl 0x0(%rax)
(gdb) info registers rdx rax eax
rdx 0x7fffef424aa0 140737207487136
rax 0x5 5
eax 0x5 5
(gdb) up
#1 anv_nir_lower_ycbcr_textures (shader=shader@entry=0x7fffee3037e0,
layout=layout@entry=0x7fffef8249c0)
at src/intel/vulkan/anv_nir_lower_ycbcr_textures.c:464
464 in src/intel/vulkan/anv_nir_lower_ycbcr_textures.c
(gdb) info locals
tex = <optimized out>
instr = 0x7fffef8e2430
__next = 0x7fffeef001c0
block = <optimized out>
function_progress = false
builder = {cursor = {option = nir_cursor_before_block, {block = 0x0, instr = 0x0}}, exact = false,
shader = 0x7fffee3037e0, impl = 0x7fffef919690}
function = <optimized out>
progress = <optimized out>
(gdb) print *instr
$2 = {node = {next = 0x7fffeef001c0, prev = 0x7fffeef00090}, block = 0x7fffef92d170,
type = nir_instr_type_tex, pass_flags = 1 '\001', index = 0}
(gdb) print *__next
$3 = {node = {next = 0x7fffeef002f0, prev = 0x7fffef8e2430}, block = 0x7fffef92d170,
type = nir_instr_type_alu, pass_flags = 1 '\001', index = 0}
--------------------------------------
Source code corresponding to try_lower_tex_ycbcr disassembly around the crash is:
--------------------------------------
322: int deref_src_idx = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
323: assert(deref_src_idx >= 0);
324: nir_deref_instr *deref = nir_src_as_deref(tex->src[deref_src_idx].src);
326: nir_variable *var = nir_deref_instr_get_variable(deref);
327: const struct anv_descriptor_set_layout *set_layout =
328: layout->set[var->data.descriptor_set].layout;
--------------------------------------
To me it seems that deref got invalid address.
Problem here is that there is a cast between texture instruction and the variable : vec1 32 ssa_72 = deref_var &shadow_map (uniform sampler2DShadow) vec1 32 ssa_73 = deref_cast (sampler2DShadow *)ssa_72 (function sampler2DShadow) vec2 32 ssa_74 = vec2 ssa_57, ssa_58 vec1 32 ssa_75 = tex ssa_73 (texture_deref), ssa_73 (sampler_deref), ssa_74 (coord), ssa_59 (comparator), But the nir_deref_instr_get_variable() will stop on a cast and return NULL. Maybe Jason has more background on what to do here? I am getting so tired of maintaining this workaround.... In any case, patches are now on the list that work around it some more: https://patchwork.freedesktop.org/series/55099/ Fixed by the following commit in master: commit b57c1ec4219f01bfdb98bcab8fca4c44e87bd1a4 Author: Jason Ekstrand <jason.ekstrand@intel.com> Date: Fri Jan 11 14:17:24 2019 -0600 spirv: Whack sampler/image pointers to uniform A long time in a galaxy far far away, there was a GLSLang bug with how it handled samplers passed in as function parameters. (The bug can be found here: https://github.com/KhronosGroup/glslang/issues/179.) Unfortunately, that version was shipped in several apps and has been causing heartburn for our SPIR-V parser ever since. Recent changes to NIR uncovered a moderately old bug in how we work around this issue. In particular, we ended up with a deref_cast from uniform to local which is not a no-op cast so nir_opt_deref wasn't getting rid of the cast. The only reason why it worked before was because someone just happened to call nir_fixup_deref_modes which "fixed" the cast (that shouldn't be happening) and then a later round of copy-prop would get rid of it. The fact that the deref_cast survived that long without causing trouble for other parts of NIR is a bit surprising. Just whacking the mode of the pointer seems to fix it fairly unobtrusively. Currently, only apps with this bug will have a local variable containing an image or sampler. Verified, works fine on all Vulkan platforms again. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Setup: * Ubuntu 18.04 * Latest Mesa from git * Latest X or Weston from git * drm-tip git kernel v4.20 or newer * GfxBench v5 (X or Wayland build, GOLD2 or GOLD3 version) Test-case: * Run AztecRuins (normal or high) Vulkan version: bin/testfw_app --gfx vulkan --gl_api vulkan --width 1920 --height 1080 --fullscreen 1 --test_id vulkan_5_normal Result: * Segfault when run from Gdb (abort when run without as GfxBench catches them and aborts): -------------------------------------- ... compile light_directional_cube.frag... done SPIR-V WARNING: In file ../../../../../source2/_repos/git___anongit_freedesktop_org_git_mesa_mesa/src/compiler/spirv/vtn_variables.c:2258 OpStore of a sampler detected. Doing on-the-fly copy propagation to workaround the problem. 9408 bytes into the SPIR-V binary Thread 2 "testfw_app" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff3339700 (LWP 10251)] -------------------------------------- This doesn't happen with GL or GLES versions, only Vulkan. This regression started between following commits: * 2019-01-07 17:07:33 0cc01f45e7: glsl: correct typo in GLSL compilation error message * 2019-01-08 17:51:46 d0c6ef2793: nir: rename global/local to private/function memory Valgrind says following: -------------------------------------- SPIR-V WARNING: In file src/compiler/spirv/vtn_variables.c:2258 OpStore of a sampler detected. Doing on-the-fly copy propagation to workaround the problem. 9408 bytes into the SPIR-V binary ==10457== Invalid read of size 4 ==10457== at 0xA54E458: try_lower_tex_ycbcr (anv_nir_lower_ycbcr_textures.c:328) ==10457== by 0xA54E458: anv_nir_lower_ycbcr_textures (anv_nir_lower_ycbcr_textures.c:464) ==10457== by 0xA55372F: anv_pipeline_lower_nir (anv_pipeline.c:500) ==10457== by 0xA555FEE: anv_pipeline_compile_graphics (anv_pipeline.c:1034) ==10457== by 0xA5577BB: anv_pipeline_init (anv_pipeline.c:1463) ==10457== by 0xA676675: gen9_graphics_pipeline_create (genX_pipeline.c:1787) ==10457== by 0xA676675: gen9_CreateGraphicsPipelines (genX_pipeline.c:2001) ==10457== by 0x917FCC: ??? (in bin/testfw_app) ==10457== Address 0x3c is not stack'd, malloc'd or (recently) free'd ==10457== ==10457== ==10457== Process terminating with default action of signal 6 (SIGABRT) ==10457== at 0x7239E97: raise (raise.c:51) ==10457== by 0x723B800: abort (abort.c:79) ==10457== by 0x6872B5: ??? (in bin/testfw_app) ==10457== by 0x623888F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.27.so) ==10457== by 0xA54E457: try_lower_tex_ycbcr (anv_nir_lower_ycbcr_textures.c:396) ==10457== by 0xA54E457: anv_nir_lower_ycbcr_textures (anv_nir_lower_ycbcr_textures.c:464) ==10457== by 0xA55372F: anv_pipeline_lower_nir (anv_pipeline.c:500) ==10457== by 0xA555FEE: anv_pipeline_compile_graphics (anv_pipeline.c:1034) ==10457== by 0xA5577BB: anv_pipeline_init (anv_pipeline.c:1463) ==10457== by 0xA676675: gen9_graphics_pipeline_create (genX_pipeline.c:1787) ==10457== by 0xA676675: gen9_CreateGraphicsPipelines (genX_pipeline.c:2001) --------------------------------------