105098 – [RADV] GPU freeze with simple Vulkan App

Summary: [RADV] GPU freeze with simple Vulkan App

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Vulkan/radeon
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	mesa-dev
QA Contact:	mesa-dev

Reported:	2018-02-14 19:41 UTC by Lukas Kahnert
Modified:	2018-02-16 11:08 UTC
CC List:	0 users

Description Lukas Kahnert 2018-02-14 19:41:44 UTC

I've started developing with Vulkan. I finally got something on screen, but after a few seconds the system freezes.
I also tried the source code from Vulkan Tutorial (https://github.com/Overv/VulkanTutorial/tree/master/code), which triggers the same behaviour.
A few seconds after starting an app that renders fine, the display freezes and the load LEDs on the GPU are all lit (meaning 100% GPU load).
It's still possible to connect via ssh, but dmesg shows nothing, and a hard reset is needed because the shutdown process also hangs. It looks like an infinite loop inside the GPU.
On my laptop (AMDGPU OLAND) it runs without freezing (the output is a bit corrupted, but that's another issue).

Tried with vanilla kernel 4.15 and the drm-next-4.17-wip branch, LLVM 5.0 and 6.0, and mesa 18.0.0-RC4 and git.
It's always the same problem, regardless of whether it runs under X11/XCB or Wayland.
If I comment out vkQueueSubmit (so nothing is rendered), the output is garbage but it doesn't hang. Maybe an issue with queue handling?

Comment 1 Bas Nieuwenhuizen 2018-02-14 21:21:20 UTC

Hi, I can't really comment on what is going wrong with your application specifically, due to lack of details.

That the hang does not happen if you skip the vkQueueSubmit does not tell us much: you're essentially not telling the driver to do anything, and it can't hang when nothing runs on the GPU.

What you can try:

1) Don't specify the semaphores during the submit. For simple apps our driver should mostly give the correct result even without them. That rules out semaphores as the problem.
2) See if you can remove/comment out commands that you record in the command buffer, and check whether the hang still occurs.
3) Run validation layers. (I realize that due to the hang you may need to redirect the output to a file and check the file after reboot)
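Step 3 can be approximated with a small wrapper script. The sketch below is an illustration under stated assumptions, not something from this thread: it enables a validation layer through the Vulkan loader's VK_INSTANCE_LAYERS environment variable and mirrors the app's output into a log file, calling os.fsync after every line so messages reach the disk before a hang forces a reset. The function name run_with_validation is hypothetical, and the layer name is an assumption that depends on the SDK version (VK_LAYER_KHRONOS_validation on current SDKs; 2018-era SDKs used VK_LAYER_LUNARG_standard_validation).

```python
import os
import subprocess
import sys

# Assumed layer name: VK_LAYER_KHRONOS_validation on current SDKs;
# 2018-era SDKs used VK_LAYER_LUNARG_standard_validation instead.
DEFAULT_LAYER = "VK_LAYER_KHRONOS_validation"


def run_with_validation(cmd, log_path, layer=DEFAULT_LAYER):
    """Hypothetical helper: run cmd with a validation layer enabled and
    mirror its output to log_path.

    Every line is flushed and fsync'ed so it is on disk before a possible
    GPU hang forces a hard reset. Returns the child's exit code.
    """
    # The Vulkan loader reads VK_INSTANCE_LAYERS to enable extra layers.
    env = dict(os.environ, VK_INSTANCE_LAYERS=layer)
    with open(log_path, "w") as log, subprocess.Popen(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # messages may go to stdout or stderr
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible in the ssh session
            log.write(line)
            log.flush()
            os.fsync(log.fileno())  # push the line past the page cache to disk
    # Leaving the Popen context waits for the child, so returncode is set.
    return proc.returncode
```

For example, run_with_validation(["./15_hello_triangle"], "vk.log") and read vk.log after rebooting. A C or C++ program writing to a pipe usually block-buffers its own stdout, so messages can still be lost inside the app; prefixing the command with stdbuf -oL (GNU coreutils) may help there.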

wrt VulkanTutorial, what exactly did you build and run? I have trouble matching C++ files to shaders (since there are more C++ files than shaders), and I'd prefer not to have to read the entire tutorial.

Comment 2 Lukas Kahnert 2018-02-14 23:59:22 UTC

From the VulkanTutorial source most demos trigger the freeze.
For example the triangle demo 15_hello_triangle.cpp with 09_shader_base.frag and 09_shader_base.vert as shaders.
The command buffers are very simple, and if I comment out vkCmdDraw it also doesn't freeze (but then the GPU also does nothing).
Commenting out the semaphores at submit changes nothing. The validation layers report warnings because of the missing semaphores, but after a couple of seconds it still freezes.
I piped the output and read it over ssh, but there are no validation layer reports. It's as if the GPU stops working at that moment.
I know it's a hardware-specific problem because it works on Intel ANV and on RADV (AMD OLAND).
Is it possible that the GPU hangs if there are too many draw calls in a short time frame?

Comment 3 Bas Nieuwenhuizen 2018-02-15 00:14:48 UTC

What GPU do you have in your hanging system?

FWIW I can reproduce the 15*.cpp + 09* shaders hang here, let me see what I can come up with.

Comment 4 Lukas Kahnert 2018-02-15 00:48:37 UTC

I use a Vega64 Liquid Edition

I found out that if you change fragColor or outColor in the shader source to a fixed value, the issue is not triggered (it doesn't freeze immediately). Maybe the bug is hidden in the gl_VertexIndex variable or the color array itself.

Comment 5 Bas Nieuwenhuizen 2018-02-15 20:45:02 UTC

The non-constant indexing was indeed the issue. Normally we use certain instructions for those, but Vega switched to different instructions, and a bug in LLVM causes a hang with them.

We had a workaround for that but it turned out it did not trigger for this shader due to declaring the array in global scope.

https://patchwork.freedesktop.org/patch/205018/

should fix the issue. Does this also fix your app?
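The comment above does not say how the workaround avoids the problematic instructions. One general compiler technique for sidestepping non-constant indexing is to rewrite a[i] as a chain of comparisons in which every array access uses a constant index. The Python sketch below is a conceptual, hypothetical illustration of that idea for a three-entry color table like the one the triangle shader indexes with gl_VertexIndex; the color values and function names are assumptions, and this is not Mesa's or LLVM's code.

```python
# Hypothetical three-entry color table (values assumed for illustration).
COLORS = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]


def color_indirect(vertex_index):
    """Non-constant indexing, analogous to colors[gl_VertexIndex]."""
    return COLORS[vertex_index]


def color_lowered(vertex_index):
    """Same result for valid indices, but every access uses a constant index.

    The runtime index only decides which of the constant-index loads is kept.
    """
    result = COLORS[0]  # constant index 0
    if vertex_index == 1:
        result = COLORS[1]  # constant index 1
    if vertex_index == 2:
        result = COLORS[2]  # constant index 2
    return result
```

Unlike the direct version, the lowered one silently returns the first entry for an out-of-range index instead of failing.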

Comment 6 Lukas Kahnert 2018-02-15 21:26:23 UTC

My app is more or less the same as the triangle demo (I'm trying to learn Vulkan, but this issue was definitely not normal for invalid API usage ;)).

With this patch it doesn't hang anymore and works as expected. Thanks :)

Comment 7 Adrià Cereto i Massagué 2018-02-16 07:47:06 UTC

The same behaviour can be observed on my Vega 56 when trying to run some games through DXVK.

I'll try the patch to see whether it fixes it in that case too.

Comment 8 Bas Nieuwenhuizen 2018-02-16 11:08:38 UTC

Thanks, the fix is in git master now.

wrt dxvk, if you still have issues after the patch, could you open a new bug with more details?
