Bug 110815 - Segfault vkCreateDescriptorPool in The-Forge on RADV
Summary: Segfault vkCreateDescriptorPool in The-Forge on RADV
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/radeon
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
Reported: 2019-06-02 22:53 UTC by Alex Fuller
Modified: 2019-06-13 12:22 UTC

Attachments
attachment-24768-0.html (5.07 KB, text/html)
2019-06-03 00:25 UTC, Bas Nieuwenhuizen

Description Alex Fuller 2019-06-02 22:53:41 UTC
Hello,

I am using the oibaf mesa-vulkan-drivers, which I believe track the latest git (I just updated to the 30th May 2019 build, 19.2~git1906010730.755906~oibaf~b, running Linux kernel 5.0.2 on Ubuntu 18.04). I have recently been getting segfaults in vkCreateDescriptorPool with The-Forge, which previously worked perfectly on multiple OSes and Vulkan implementations, including RADV:
https://github.com/ConfettiFX/The-Forge
I maintain a 'lite' edition that is easier for me to embed into other projects, and which is probably easier to test with:
https://github.com/boberfly/The-Forge-Lite

I narrowed it down to this line, which crashes in the RADV Vulkan driver when running the 01_Transformations example:
https://github.com/boberfly/The-Forge-Lite/blob/v1.27/src/Renderer/Vulkan/Vulkan.cpp#L477
called from:
https://github.com/boberfly/The-Forge-Lite/blob/v1.27/src/Examples/01_Transformations/01_Transformations.cpp#L226

The error I get is pretty vague:
[1185298.570160] MainThread[39052]: segfault at c ip 00007fe119abd604 sp 00007ffc01e6afc8 error 4 in libvulkan_radeon.so[7fe11992b000+230000]
[1185298.570179] Code: 24 18 85 d2 41 89 94 24 00 02 00 00 0f 84 37 01 00 00 48 8d 44 24 10 45 31 f6 31 ed 45 31 ed 31 db 48 89 04 24 90 49 8b 57 18 <4a> 8b 34 32 43 89 5c 74 0c 8b 4e 04 4b 89 34 74 85 c9 74 2e 83 e9

I can only say that roughly a month ago this was running fine on an older RADV version. I am using the latest Vulkan 1.1.106 SDK from LunarG as well.

Hardware: HP z620 w/ 2x Xeon E5-2680v2, AMD Radeon Vega Frontier Edition 16GB

I hope I have enough information here to explain the issue.
Comment 1 Bas Nieuwenhuizen 2019-06-03 00:25:30 UTC
Created attachment 144416: attachment-24768-0.html

So AFAIU this is a framework, right? Are there any demo apps using it that reproduce the issue?

Comment 2 Alex Fuller 2019-06-03 00:29:36 UTC
Hi Bas,

Yep, it's a framework. For the lite edition I bundle one of their unit tests, 01_Transformations, which triggers the bug:
https://github.com/boberfly/The-Forge-Lite/blob/v1.27/src/Examples/01_Transformations/01_Transformations.cpp#L226

The crash happens inside the Vulkan backend here:
https://github.com/boberfly/The-Forge-Lite/blob/v1.27/src/Renderer/Vulkan/Vulkan.cpp#L477

And just to reiterate, this was working great about a month ago.

Cheers
Comment 3 Alex Fuller 2019-06-03 01:02:42 UTC
If this helps, the debugger reports the following pool-size count and maxSets just before the vkCreateDescriptorPool call:
poolCreateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolCreateInfo.pNext = NULL;
poolCreateInfo.poolSizeCount = 12;
poolCreateInfo.pPoolSizes = pPoolSizes; // debugger shows: .descriptorCount = 12, .type = VK_DESCRIPTOR_TYPE_SAMPLER
poolCreateInfo.flags = 0;
poolCreateInfo.maxSets = 33;

The last argument looks like a valid empty calloc'd VkDescriptorPool being passed in.

Cheers.
Comment 4 Bas Nieuwenhuizen 2019-06-03 13:50:27 UTC
From that description, it sounds like poolSizeCount is wrong. It should equal the number of structs in pPoolSizes, not the sum of their descriptorCount values, so it sounds like it should be 1.

From spec:

"poolSizeCount is the number of elements in pPoolSizes."

Curious that it worked before.
Comment 5 Alex Fuller 2019-06-03 17:30:43 UTC
Hi Bas,

I am not in front of that computer right now, but I believe the function is being passed an array of size CONF_DESCRIPTOR_TYPE_RANGE_SIZE, which is VK_DESCRIPTOR_TYPE_RANGE_SIZE, or VK_DESCRIPTOR_TYPE_RANGE_SIZE + 1 if VK_NV_RAY_TRACING_SPEC_VERSION is defined in the headers. The debugger wasn't clear about whether it was an array; it was probably just showing the first element's contents.
Comment 6 Alex Fuller 2019-06-04 03:38:07 UTC
I managed to do some testing, and I can now trigger the bug. Calling vkCreateDescriptorPool with 11 pool sizes (the default VK_DESCRIPTOR_TYPE_RANGE_SIZE) works fine, which matches the behaviour I had before. Because I updated the Vulkan headers, the NV ray tracing extension is now defined in them, and a preprocessor define in The-Forge codebase raises the count to 12. Their comment on this:
"//+1 for Acceleration Structure because it is not counted by VK_DESCRIPTOR_TYPE_RANGE_SIZE"

So the segfault suggests RADV has a limit of VK_DESCRIPTOR_TYPE_RANGE_SIZE pool sizes when creating pools. Is there some hardware limitation? I am pretty sure ConfettiFX test on AMD's Vulkan driver, and as far as I know they haven't reported an issue.

Cheers.
Comment 7 Alex Fuller 2019-06-05 08:32:57 UTC
Hello again,

I didn't realise it was so simple to build debug versions of Mesa/RADV, so I went ahead and did this.

I've found the bug:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/vulkan/radv_descriptor_set.c#L656

It is caused by RADV not handling VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_NV, which makes sense, since RADV does not expose VK_NV_ray_tracing. I filed a bug report with The-Forge about it, as I doubt there is a clean way to fix this in RADV. I'll leave this bug open in case you think otherwise.

Cheers!
Comment 8 Samuel Pitoiset 2019-06-13 12:22:55 UTC
This is definitely a bug in The-Forge. VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_NV is part of VK_NV_ray_tracing, so if the extension isn't exposed, you shouldn't try to use it. Also, as Bas mentioned, poolSizeCount seems totally wrong.

Anyway, AFAIU The-Forge doesn't want to support RADV, so closing.

