Bug 104213 - NULL pointer access crashes on compiling Vulkan compute shaders after "anv: Add support for the variablePointers feature"
Summary: NULL pointer access crashes on compiling Vulkan compute shaders after "anv: A...
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Jason Ekstrand
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-11 15:50 UTC by Eero Tamminen
Modified: 2017-12-19 15:21 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Eero Tamminen 2017-12-11 15:50:02 UTC
Between following commits:
2017-12-05 13:47:04 UTC: 20d37da597 "Android: enable noreturn and returns_nonnull attributes"
2017-12-06 18:31:33 UTC: 31d403160f "meson: fix keyword argument in declare_dependency()"

Mesa has started to segfault to NULL pointer access during compute shader compilation.  This happens at least with Sacha Willems Compute N-body demo and GfxBench Aztec Ruins Vulkan version.

Crash is here:
-------------------------------------------------
Program received signal SIGSEGV, Segmentation fault.
anv_shader_compile_to_nir (pipeline=0x8e2570, pipeline=0x8e2570, spec_info=0x7073b0, stage=MESA_SHADER_COMPUTE, entrypoint_name=0x7fffffffb380 "", 
    module=0x8ee0a0, mem_ctx=0x823770) at src/intel/vulkan/anv_pipeline.c:153
153	   nir_shader *nir = entry_point->shader;
(gdb) bt
#0  anv_shader_compile_to_nir (pipeline=0x8e2570, pipeline=0x8e2570, spec_info=0x7073b0, stage=MESA_SHADER_COMPUTE, entrypoint_name=0x7fffffffb380 "", 
    module=0x8ee0a0, mem_ctx=0x823770) at src/intel/vulkan/anv_pipeline.c:153
#1  anv_pipeline_compile (pipeline=pipeline@entry=0x8e2570, mem_ctx=mem_ctx@entry=0x823770, module=module@entry=0x8ee0a0, 
    entrypoint=entrypoint@entry=0x498ae2 "main", stage=stage@entry=MESA_SHADER_COMPUTE, spec_info=spec_info@entry=0x7fffffffe410, prog_data=0x7fffffffb380, 
    map=0x7fffffffb2d0) at src/intel/vulkan/anv_pipeline.c:395
#2  0x00007ffff5e332ac in anv_pipeline_compile_cs (pipeline=pipeline@entry=0x8e2570, cache=cache@entry=0x826350, info=info@entry=0x7fffffffe4e0, 
    module=0x8ee0a0, entrypoint=0x498ae2 "main", spec_info=0x7fffffffe410) at src/intel/vulkan/anv_pipeline.c:994
#3  0x00007ffff5fc0d07 in compute_pipeline_create (_device=_device@entry=0x8b22e0, cache=cache@entry=0x826350, pCreateInfo=pCreateInfo@entry=0x7fffffffe4e0, 
    pAllocator=pAllocator@entry=0x0, pPipeline=pPipeline@entry=0x6dda30) at src/intel/vulkan/genX_pipeline.c:1770
#4  0x00007ffff5fd37f6 in gen9_CreateComputePipelines (_device=0x8b22e0, pipelineCache=0x826350, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, 
    pPipelines=0x6dda30) at src/intel/vulkan/genX_pipeline.c:1895
#5  0x00007ffff7bb0c65 in vkCreateComputePipelines () from libvulkan.so.1
#6  0x0000000000462e8c in VulkanExample::prepareCompute() ()
#7  0x0000000000463f12 in VulkanExample::prepare() ()
#8  0x000000000044ec4a in main ()
(gdb) info locals
device = <optimized out>
spec_entries = 0x8db240
spirv_options = {lower_workgroup_access_to_offsets = true, caps = {float64 = true, image_ms_array = false, tessellation = true, draw_parameters = true, 
    image_read_without_format = false, image_write_without_format = true, int64 = true, multiview = true, variable_pointers = true, storage_16bit = true}, 
  debug = {func = 0x0, private_data = 0x0}}
entry_point = <optimized out>
nir = <optimized out>
compiler = 0x7073b0
nir_options = 0x7ffff6226580 <scalar_nir_options>
spirv = 0x8ee0b8
num_spec_entries = 4
(gdb) disassemble 
Dump of assembler code for function anv_pipeline_compile:
...
   0x00007ffff5e31a4b <+251>:	mov    0x18(%rsp),%rdx
   0x00007ffff5e31a50 <+256>:	callq  0x7ffff61d5e20 <spirv_to_nir>
=> 0x00007ffff5e31a55 <+261>:	mov    0x18(%rax),%rbx
   0x00007ffff5e31a59 <+265>:	mov    0x20(%rsp),%rdi
...
(gdb) info registers rax rbx
rax            0x0	0
rbx            0x4	4
-------------------------------------------------

Vulkan validation layers don't give any errors.
Comment 1 Eero Tamminen 2017-12-11 15:51:05 UTC
Valgrind shows few warnings before the crash, don't know whether they're relevant for the crash (will check more after bisecting):
-------------------------------------------------
==21860== Conditional jump or move depends on uninitialised value(s)
==21860==    at 0x6DDCB08: fs_visitor::bank_conflict_cycles(fs_inst const*) const (brw_fs_bank_conflicts.cpp:906)
==21860==    by 0x6E5981F: fs_instruction_scheduler::issue_time(backend_instruction*) (brw_schedule_instructions.cpp:1546)
==21860==    by 0x6E5F57B: instruction_scheduler::schedule_instructions(bblock_t*) (brw_schedule_instructions.cpp:1602)
==21860==    by 0x6E5F7B4: instruction_scheduler::run(cfg_t*) (brw_schedule_instructions.cpp:1704)
==21860==    by 0x6E5F9B0: fs_visitor::schedule_instructions(instruction_scheduler_mode) (brw_schedule_instructions.cpp:1730)
==21860==    by 0x6DFB8B5: fs_visitor::allocate_registers(unsigned int, bool) (brw_fs.cpp:6071)
==21860==    by 0x6DFE1BC: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6454)
==21860==    by 0x6DFF1CA: brw_compile_fs (brw_fs.cpp:6789)
==21860==    by 0x6EB170C: blorp_compile_fs (blorp.c:194)
==21860==    by 0x6EC179C: brw_blorp_get_blit_kernel (blorp_blit.c:1331)
==21860==    by 0x6EC179C: try_blorp_blit (blorp_blit.c:1859)
==21860==    by 0x6EC179C: do_blorp_blit (blorp_blit.c:2000)
==21860==    by 0x6EC4C12: blorp_copy (blorp_blit.c:2525)
==21860==    by 0x6B81E1E: copy_buffer_to_image (anv_blorp.c:391)
==21860== 
==21860== Conditional jump or move depends on uninitialised value(s)
==21860==    at 0x6D06DA4: cmd_buffer_subpass_transition_layouts (genX_cmd_buffer.c:2963)
==21860==    by 0x6D06E6A: gen9_cmd_buffer_set_subpass (genX_cmd_buffer.c:3106)
==21860==    by 0x6D1451D: gen9_CmdBeginRenderPass (genX_cmd_buffer.c:3145)
==21860==    by 0x4E6C01E: vkCmdBeginRenderPass (in /home/testrunner/work/VulkanTools/build/loader/libvulkan.so.1.0.65)
==21860==    by 0x483280: VulkanTextOverlay::updateCommandBuffers() (in computenbody)
==21860==    by 0x482FF9: VulkanTextOverlay::endTextUpdate() (in computenbody)
==21860==    by 0x477BE0: VulkanExampleBase::updateTextOverlay() (in computenbody)
==21860==    by 0x476FC8: VulkanExampleBase::prepare() (in computenbody)
==21860==    by 0x463EB1: VulkanExample::prepare() (in computenbody)
==21860==    by 0x44EC49: main (in computenbody)
-------------------------------------------------
Comment 2 Eero Tamminen 2017-12-11 16:11:49 UTC
"bank_conflict_cycles" cannot affect this, it came with later commit.

Bisect gave following as the commit causing segfaults:
-----------------------------------------------------------
commit 8761a04d0d9332d9c0c99164faf855fc3c741f7c
Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Wed Oct 18 18:02:49 2017 -0700
Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
CommitDate: Tue Dec 5 22:01:54 2017 -0800

    anv: Add support for the variablePointers feature
    
    Not to be confused with variablePointersStorageBuffer which is the
    subset of VK_KHR_variable_pointers required to enable the extension.
    This means we now have "full" support for variable pointers.
    
    Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
-----------------------------------------------------------
Comment 3 Mark Janes 2017-12-11 18:19:28 UTC
Thanks for bisecting this Eero.  Please assign bugs to the author of the bisected commit, otherwise the devs may not notice.
Comment 4 Jason Ekstrand 2017-12-12 03:12:55 UTC
This is fixed by the following commit:

commit 24f019fd6911c360f32ebdb474375212706fff62
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Dec 11 15:31:22 2017 -0800

    spirv: Allow ignoring decorations for workgroup variables
    
    Since we switched over to lowering SLM access directly in SPIR-V -> NIR,
    we no longer have vtn_variables for SLM.  It's all safe as with UBOs and
    SSBOs but we need to let it through in the assert.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104213
    Fixes: 8761a04d0d9332d9c0c99164faf855fc3c741f7c
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Comment 5 Eero Tamminen 2017-12-12 08:51:57 UTC
Verified, it fixes the crashes!

(The other Sacha Willems' Vulkan compute demos didn't crash, only the N-body one, I forgot to mention that earlier.)


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.