Bug 104338

Summary: NULL pointer access crash on Sascha Willems' Vulkan raytracing demo after "spirv: Add basic type validation for OpLoad, OpStore, and OpCopyMemory"
Product: Mesa
Reporter: Eero Tamminen <eero.t.tamminen>
Component: Drivers/DRI/i965
Assignee: Jason Ekstrand <jason>
Status: VERIFIED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=104213
https://github.com/SaschaWillems/Vulkan/issues/345
https://bugs.freedesktop.org/show_bug.cgi?id=99507
Whiteboard:
i915 platform:
i915 features:
Attachments: attachment-4956-0.html

Description Eero Tamminen 2017-12-19 15:21:09 UTC
Mesa has started to segfault on a NULL pointer access while compiling the compute shader of Sascha Willems' "raytracing" demo.  I'm not seeing it in other cases, unlike with bug 104213.

Bisecting points to this commit as the one where the crashes started:
--------------------------------------------------------
commit 6737b1b859aadad64e5fe04a92d196a672413e06
Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Tue Dec 5 22:51:53 2017 -0800
Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
CommitDate: Mon Dec 11 22:28:34 2017 -0800

    spirv: Add basic type validation for OpLoad, OpStore, and OpCopyMemory
    
    Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
--------------------------------------------------------

The crash is due to OpStore validation:
--------------------------------------------------------
(gdb) break _vtn_fail
Function "_vtn_fail" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_vtn_fail) pending.

(gdb) run
Starting program: raytracing 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, _vtn_fail (b=0x8a3b70, file=file@entry=0x7ffff6249f10 "../../../src/compiler/spirv/vtn_variables.c", line=line@entry=2009, 
    fmt=fmt@entry=0x7ffff624a7c8 "Value and pointer types of OpStore do not match") at ../../../src/compiler/spirv/spirv_to_nir.c:112

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
anv_shader_compile_to_nir (pipeline=0x8a2e60, pipeline=0x8a2e60, spec_info=0x6bdd10, stage=MESA_SHADER_COMPUTE, entrypoint_name=0x7fffffffb3c0 "", 
    module=0x8a7dd0, mem_ctx=0x879c40) at ../../../src/intel/vulkan/anv_pipeline.c:153
153	   nir_shader *nir = entry_point->shader;
(gdb) bt
#0  anv_shader_compile_to_nir (pipeline=0x8a2e60, pipeline=0x8a2e60, spec_info=0x6bdd10, stage=MESA_SHADER_COMPUTE, entrypoint_name=0x7fffffffb3c0 "", 
    module=0x8a7dd0, mem_ctx=0x879c40) at ../../../src/intel/vulkan/anv_pipeline.c:153
#1  anv_pipeline_compile (pipeline=pipeline@entry=0x8a2e60, mem_ctx=mem_ctx@entry=0x879c40, module=module@entry=0x8a7dd0, 
    entrypoint=entrypoint@entry=0x45e14a "main", stage=stage@entry=MESA_SHADER_COMPUTE, spec_info=spec_info@entry=0x0, prog_data=0x7fffffffb3c0, 
    map=0x7fffffffb310) at ../../../src/intel/vulkan/anv_pipeline.c:395
#2  0x00007ffff5e323cc in anv_pipeline_compile_cs (pipeline=pipeline@entry=0x8a2e60, cache=cache@entry=0x8792a0, info=info@entry=0x7fffffffe4d0, 
    module=0x8a7dd0, entrypoint=0x45e14a "main", spec_info=0x0) at ../../../src/intel/vulkan/anv_pipeline.c:1019
#3  0x00007ffff5fbfe27 in compute_pipeline_create (_device=_device@entry=0x868c00, cache=cache@entry=0x8792a0, pCreateInfo=pCreateInfo@entry=0x7fffffffe4d0, 
    pAllocator=pAllocator@entry=0x0, pPipeline=pPipeline@entry=0x696890) at ../../../src/intel/vulkan/genX_pipeline.c:1770
#4  0x00007ffff5fd2916 in gen9_CreateComputePipelines (_device=0x868c00, pipelineCache=0x8792a0, count=1, pCreateInfos=<optimized out>, pAllocator=0x0, 
    pPipelines=0x696890) at ../../../src/intel/vulkan/genX_pipeline.c:1895
#5  0x00007ffff798ec65 in vkCreateComputePipelines () from VulkanTools/build/loader/libvulkan.so.1
#6  0x00000000004387c8 in VulkanExample::prepareCompute() ()
#7  0x00000000004393f9 in VulkanExample::prepare() ()
#8  0x0000000000432f92 in main ()

(gdb) info locals
device = <optimized out>
spec_entries = 0x0
spirv_options = {lower_workgroup_access_to_offsets = true, caps = {float64 = true, image_ms_array = false, tessellation = true, draw_parameters = true, 
    image_read_without_format = false, image_write_without_format = true, int64 = true, multiview = true, variable_pointers = true, storage_16bit = true}, 
  debug = {func = 0x0, private_data = 0x0}}
entry_point = <optimized out>
nir = <optimized out>
compiler = 0x6bdd10
nir_options = 0x7ffff6226000 <scalar_nir_options>
spirv = 0x8a7de8
num_spec_entries = 0

(gdb) disassemble 
Dump of assembler code for function anv_pipeline_compile:
...
   0x00007ffff5e30a50 <+256>:	callq  0x7ffff61d5170 <spirv_to_nir>
=> 0x00007ffff5e30a55 <+261>:	mov    0x18(%rax),%rbx
   0x00007ffff5e30a59 <+265>:	mov    0x20(%rsp),%rdi

(gdb) info registers rax rbx
rax            0x0	0
rbx            0x0	0
--------------------------------------------------------

Does this check need also relaxing?
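
The trace above shows the register holding the return value of spirv_to_nir (rax) being 0 right after the call, followed by a load through it for entry_point->shader. A minimal sketch of that failure pattern and a defensive caller is given below. It is not Mesa code and not the fix that was applied to this bug: the types struct shader and struct function and the functions translate() and compile_to_shader() are hypothetical stand-ins used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real types; names are illustrative only. */
struct shader { int num_instructions; };
struct function { struct shader *shader; };

/* Mimics a translator that returns NULL when input validation fails
 * (assumption: types_match == 0 models a failed OpStore type check). */
struct function *translate(int types_match)
{
    static struct shader sh = { 42 };
    static struct function fn = { &sh };
    return types_match ? &fn : NULL;
}

/* Caller that checks the result before dereferencing it, instead of
 * reading entry_point->shader unconditionally. */
struct shader *compile_to_shader(int types_match)
{
    struct function *entry_point = translate(types_match);
    if (entry_point == NULL) {
        fprintf(stderr, "translation failed, skipping dereference\n");
        return NULL;
    }
    return entry_point->shader;
}
```

With this shape, a validation failure surfaces to the caller as a NULL result that can be turned into an error code rather than a segfault. Whether the validation itself is too strict for the input is a separate question.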
Comment 1 Eero Tamminen 2018-01-02 11:08:55 UTC
(In reply to Eero Tamminen from comment #0)
> Does this check need also relaxing?

The reason I ask is this comment:
"we've seen glslang (even the latest from master) generating invalid SPIR-V code for your Raytracing demo, it gets a type conversion wrong on the Sphere type"

in the Raytracing demo bug:
  https://github.com/SaschaWillems/Vulkan/issues/345
Comment 2 Jason Ekstrand 2018-01-02 15:01:20 UTC
Created attachment 136494 [details]
attachment-4956-0.html

Yeah, I've run into similar issues with DOOM. I sent a patch last night 
which should help with it.  I'm actually planning to revamp it a bit and 
resend.


Comment 3 Jason Ekstrand 2018-01-08 23:16:51 UTC
This should be fixed by the following commit:

commit 154668e79c4556ba0eda4751d6a14a45b9242a90
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jan 1 20:00:02 2018 -0800

    spirv: Loosen the validation for load/store type matching
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104338
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104424
    Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Comment 4 Eero Tamminen 2018-01-09 09:25:39 UTC
Verified. The Raytracing demo works fine with Mesa git head, and did not with a commit about 12 hours older.
