Bug 103241

Summary: Anv crashes when using 64-bit vertex inputs
Product: Mesa Reporter: Józef Kucia <joseph.kucia>
Component: Drivers/Vulkan/intelAssignee: Alejandro Piñeiro (freenode IRC: apinheiro) <apinheiro>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: jason, jasuarez, marky
Version: git   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Reproducer

Description Józef Kucia 2017-10-12 11:57:03 UTC
"elem_count" isn't calculated correctly for 64bit vertex inputs in emit_vertex_input(). This leads to crash/assert() in the driver code.

The problem is that vs_prog_data->inputs_read may contain only parts of varyings mark as used (see try_mask_partial_io() in src/compiler/nir/nir_gather_info.c). On the other hand, for vs_prog_data->double_inputs_read whole variables are always marked as used. The element count is computed as _builtin_popcount(elements) - _builtin_popcount(elements_double) / 2, but popcount(elements_double) / 2 > popcount(elements) when only portions of double variables are mark as used.                         

This can be reproduced using the VkPositiveLayerTest.CreatePipeline64BitAttributesPositive test from Vulkan-LoaderAndValidationLayers: 

./vk_layer_validation_tests --gtest_filter='VkPositiveLayerTest.CreatePipeline64BitAttributesPositive'
Note: Google Test filter = VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from VkPositiveLayerTest
[ RUN      ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
vk_layer_validation_tests: ./genxml/gen9_pack.h:66: __gen_uint: Assertion `v <= max' failed.
Aborted
Comment 1 Jason Ekstrand 2017-12-02 17:13:10 UTC
*** Bug 103851 has been marked as a duplicate of this bug. ***
Comment 2 Jason Ekstrand 2017-12-02 18:10:26 UTC
There's now a patch on the list to fix this.  Sorry it's taken so long:

https://patchwork.freedesktop.org/patch/191534/
Comment 3 Jason Ekstrand 2018-05-11 18:20:42 UTC
It's been almost 6 months since I last looked at this (sorry about that).  Here's what I remember from when I was looking at it before.

The fundamental problem here seems to be in the way we're trying to handle dvec4 (and dvec3) vertex attributes in GL and Vulkan.  In GL a dvec4 vertex attribute takes up one API slot though it consumes two slots for the purposes of limits.  In Vulkan, it takes up two API slots.

When we did attrib64 for Vulkan, we decided that NIR should just use the Vulkan convention (which is what TGSI does as well) and we should try to contain the GL convention to GL.  To do this we have a double_inputs_read field which is used to flag dvec3 and dvec4 inputs to vertex shaders so that we can map back and forth.  In glsl_to_nir we map one direction and then we map the other direction when we do the vertex attribute setup in i965.  So far so good.

When I tried to fix this bug 6 months ago, I wrote a patch which makes nir_gather_info more properly handle double_inputs_read so that it wouldn't flag partially used things as being a dvec4 unnecessarily.  This caused problems because suddenly, nir_gather_info started disagreeing with glsl_to_nir in some cases and we started failing GL tests.  The problem was that glsl_to_nir would decide on some mapping and then we would manage to eliminate some variable uses during NIR optimizations and nir_gather_info would come up with a different double_inputs_read map.  The new map (which was a subset of the old) would then be used for state setup and the mapping would mismatch and everything would explode.

I think what we need to do is to somehow compute the mapping in glsl_to_nir (or earlier) and then hang on to that mapping and leave it unchanged and then use it for state setup.  This mapping should probably live in gl_shader_program or something like that.  We need some sort of a mapping for Vulkan but they probably aren't the same thing.

Alejandro, could you look at this?
Comment 4 Alejandro Piñeiro (freenode IRC: apinheiro) 2018-05-12 06:35:26 UTC
(In reply to Jason Ekstrand from comment #3)

> 
> Alejandro, could you look at this?

In any normal circumstance I would not have any problem to work on it. But this issue seems tricky and I'm about to start a parental leave (and "about to start" means that I don't know it exactly but can happen at any given moment) so I would prefer to not start to work on it, and need to stop without finishing it.

As it is Saturday, I'm adding Juan on the CC for now, that also worked on the implementation of va64. This Monday we will decide who at the Igalia team will take a look to this bug.
Comment 5 Juan A. Suarez 2018-05-15 10:10:31 UTC
I'll take a look at this issue.
Comment 6 Juan A. Suarez 2018-05-15 13:54:57 UTC
I've been trying to reproduce this issue, using Mesa and Vulkan-LoaderAndValidationLayers master versions, but so far it works fine.

Probably it was fixed in some of the commits, but I don't know in which one, as I didn't find any commit to be able to reproduce it.
Comment 7 Józef Kucia 2018-08-03 18:27:02 UTC
It is still reproducible:

$ VK_LAYER_PATH=../layers/ git-mesa-vk-intel ./vk_layer_validation_tests --gtest_filter='VkPositiveLayerTest.CreatePipeline64BitAttributesPositive'
Note: Google Test filter = VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from VkPositiveLayerTest
[ RUN      ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
vk_layer_validation_tests: ./genxml/gen9_pack.h:72: __gen_uint: Assertion `v <= max' failed.
Aborted


18.1-branchpoint-2179-g1c7a2433b270
Comment 8 Lionel Landwerlin 2018-08-04 21:13:32 UTC
(In reply to Józef Kucia from comment #7)
> It is still reproducible:
> 
> $ VK_LAYER_PATH=../layers/ git-mesa-vk-intel ./vk_layer_validation_tests
> --gtest_filter='VkPositiveLayerTest.CreatePipeline64BitAttributesPositive'
> Note: Google Test filter =
> VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from VkPositiveLayerTest
> [ RUN      ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
> vk_layer_validation_tests: ./genxml/gen9_pack.h:72: __gen_uint: Assertion `v
> <= max' failed.
> Aborted
> 
> 
> 18.1-branchpoint-2179-g1c7a2433b270

Works fine for me too at :

commit 6754c2e83d79f93b3a4c8c1c55ca4c5e3b965645 (HEAD, tag: 18.1-branchpoint)
Author: Dylan Baker <dylan@pnwbakers.com>
Date:   Fri Apr 20 18:52:55 2018 -0700

    autotools: Include new meson files
    
    Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>


I couldn't find the commit you're referring to (1c7a2433b270).
Comment 9 Józef Kucia 2018-08-06 17:42:30 UTC
(In reply to Lionel Landwerlin from comment #8)
> Works fine for me too at :
> 
> commit 6754c2e83d79f93b3a4c8c1c55ca4c5e3b965645 (HEAD, tag: 18.1-branchpoint)
> Author: Dylan Baker <dylan@pnwbakers.com>
> Date:   Fri Apr 20 18:52:55 2018 -0700
> 
>     autotools: Include new meson files
>     
>     Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
>     Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
> 

Could you please attach the output produced by vk_layer_validation_tests for VkPositiveLayerTest.CreatePipeline64BitAttributesPositive?

> I couldn't find the commit you're referring to (1c7a2433b270).

My Mesa tree could have some additional patches applied but those are not related.
Comment 10 Lionel Landwerlin 2018-08-06 22:58:12 UTC
Indeed, I might be missing an extension :

$ ./vk_layer_validation_tests --gtest_filter=VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
Note: Google Test filter = VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from VkPositiveLayerTest
[ RUN      ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
             Did not find VK_LAYER_LUNARG_device_profile_api layer; skipped.
[       OK ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive (8 ms)
[----------] 1 test from VkPositiveLayerTest (8 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (12 ms total)
[  PASSED  ] 1 test.
Comment 11 Józef Kucia 2018-08-07 07:08:02 UTC
(In reply to Lionel Landwerlin from comment #10)
>              Did not find VK_LAYER_LUNARG_device_profile_api layer; skipped.
Yes, the test is skipped. The VK_LAYER_LUNARG_device_profile_api is available in Vulkan-ValidationLayers/tests/layers/. If you build the validation layers yourself it should be enough to run ./vk_layer_validation_tests with VK_LAYER_PATH=../layers/
Comment 12 Lionel Landwerlin 2018-08-07 08:26:48 UTC
$ VK_LAYER_PATH=../layers/ ./vk_layer_validation_tests --gtest_filter=VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
Note: Google Test filter = VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from VkPositiveLayerTest
[ RUN      ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive
/home/djdeath/src/mesa-src/Vulkan-LoaderAndValidationLayers/tests/vkrenderframework.cpp:236: Failure
Expected equality of these values:
  VK_SUCCESS
    Which is: 0
  err
    Which is: -9
VK_ERROR_INCOMPATIBLE_DRIVER
/home/djdeath/src/mesa-src/Vulkan-LoaderAndValidationLayers/tests/layer_validation_tests.cpp:27399: Failure
Expected: InitFramework(myDbgFunc, m_errorMonitor) doesn't generate new fatal failures in the current thread.
  Actual: it does.
[  FAILED  ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive (61 ms)
[----------] 1 test from VkPositiveLayerTest (61 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (61 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VkPositiveLayerTest.CreatePipeline64BitAttributesPositive

 1 FAILED TEST

It doesn't seem to reach anv_CreateInstance.
Comment 13 Józef Kucia 2018-08-08 16:01:39 UTC
Created attachment 141010 [details]
Reproducer

I'm attaching a standalone reproducer.

It needs VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT for FORMAT_R64G64B64A64_SFLOAT.

I am to reproduce the problem on Intel(R) HD Graphics 630 (Kaby Lake GT2).
Comment 14 Mark Janes 2018-09-11 15:41:30 UTC
Jason, you mentioned 64-bit attributes in your last MSR.  Did you address this bug?
Comment 15 Jason Ekstrand 2018-09-11 15:43:30 UTC
This was fixed by the following commit in master:

commit 7b26741806c521279a1b83f2eae40a277d806626
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Sep 4 13:58:01 2018 -0500

    anv/pipeline: Only consider double elements which actually exist
    
    The brw_vs_prog_data::double_inputs_read field comes directly from
    shader_info::double_inputs which may contain inputs which are not
    actually read.  Instead of using it directly, AND it with inputs_read
    which is only things which are read.  Otherwise, we may end up
    subtracting too many elements when computing elem_count.
    
    Cc: mesa-stable@lists.freedesktop.org
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103241
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.