Bug 105049

Summary: Invalid usage of vulkan pipeline barriers can cause amdgpu to deadlock
Product: Mesa Reporter: Hal Gentz <zegentzy>
Component: Drivers/Vulkan/radeon Assignee: mesa-dev
Status: RESOLVED NOTOURBUG QA Contact: mesa-dev
Severity: normal    
Priority: medium    
Version: 17.3   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: src code for my program + its CMakeLists.txt and the script I build it with
results from running `vulkaninfo`
results from running `pacaur -Q | grep "vulkan\|mesa\|xf86"`
what the program outputs when run from ssh with x-forwarding (program crashes before it can lock the gpu)
results from running `glxinfo`
results from running `glinfo`
A call trace from the kernel because the process was hung

Description Hal Gentz 2018-02-12 02:12:39 UTC
Created attachment 137278 [details]
src code for my program + its CMakeLists.txt and the script I build it with

So I was messing around with my neat little Vulkan program and managed to make the amdgpu driver freeze. TTY consoles still work, but if you switch to a TTY with Xorg running you can't get out or do anything. Any new Xorg servers you start will stay frozen, etc. Audio still plays.

Expect:
My faulty program should've just crashed.

What Happened:
Nothing graphical responded.

Attached:

calltrace: A call trace from the kernel because the process was hung
glinfo: results from running `glinfo`
glxinfo: results from running `glxinfo`
logfromssh: what the program outputs when run from ssh with x-forwarding (program crashes before it can lock the gpu)
pacaur: results from running `pacaur -Q | grep "vulkan\|mesa\|xf86"`
vulkaninfo: results from running `vulkaninfo`
debuginfo.7z: src code for my program + its CMakeLists.txt and the script I build it with
Comment 1 Hal Gentz 2018-02-12 02:13:09 UTC
Created attachment 137279 [details]
results from running `vulkaninfo`
Comment 2 Hal Gentz 2018-02-12 02:13:28 UTC
Created attachment 137280 [details]
results from running `pacaur -Q | grep "vulkan\|mesa\|xf86"`
Comment 3 Hal Gentz 2018-02-12 02:13:47 UTC
Created attachment 137281 [details]
what the program outputs when run from ssh with x-forwarding (program crashes before it can lock the gpu)
Comment 4 Hal Gentz 2018-02-12 02:14:04 UTC
Created attachment 137282 [details]
results from running `glxinfo`
Comment 5 Hal Gentz 2018-02-12 02:14:24 UTC
Created attachment 137283 [details]
results from running `glinfo`
Comment 6 Hal Gentz 2018-02-12 02:14:41 UTC
Created attachment 137284 [details]
A call trace from the kernel because the process was hung
Comment 7 Samuel Pitoiset 2018-12-18 16:17:21 UTC
Well, the validation layers report a bunch of errors with your app, like:

Validation layer: (F8)  [ VUID-vkCmdEndRenderPass-commandBuffer-cmdpool ] Object: 0x5638b8e4f6b0 (Type = 6) | Cannot call vkCmdEndRenderPass() on a command buffer allocated from a pool without VK_QUEUE_GRAPHICS_BIT capabilities.. The Vulkan spec states: The VkCommandPool that commandBuffer was allocated from must support graphics operations (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCmdEndRenderPass-commandBuffer-cmdpool)

Validation layer: (F8)  [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object: 0x5638b8279a80 (Type = 6) | Submitted command buffer expects image 0x3 (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL--instead, image 0x3's current layout is VK_IMAGE_LAYOUT_UNDEFINED.

Validation layer: (F8)  [ VUID-vkCmdBeginRenderPass-commandBuffer-cmdpool ] Object: 0x5638b8e5a3f0 (Type = 6) | Cannot call vkCmdBeginRenderPass() on a command buffer allocated from a pool without VK_QUEUE_GRAPHICS_BIT capabilities.. The Vulkan spec states: The VkCommandPool that commandBuffer was allocated from must support graphics operations (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCmdBeginRenderPass-commandBuffer-cmdpool)
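
The two cmdpool errors above both come down to the command pool's queue family lacking VK_QUEUE_GRAPHICS_BIT. A minimal sketch of picking a graphics-capable family, assuming the flags have already been read with vkGetPhysicalDeviceQueueFamilyProperties: the `QueueFamily` struct and `findGraphicsFamily` are hypothetical stand-ins so the snippet compiles without the Vulkan headers, and this is not taken from the attached program.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VkQueueFamilyProperties (only the fields used here).
struct QueueFamily {
    std::uint32_t queueFlags;
    std::uint32_t queueCount;
};

// Same numeric values as VkQueueFlagBits in the Vulkan headers.
constexpr std::uint32_t kGraphicsBit = 0x1;  // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit  = 0x2;  // VK_QUEUE_COMPUTE_BIT
constexpr std::uint32_t kTransferBit = 0x4;  // VK_QUEUE_TRANSFER_BIT

// Returns the index of the first family that supports graphics, or -1 if none does.
int findGraphicsFamily(const std::vector<QueueFamily>& families) {
    for (std::uint32_t i = 0; i < families.size(); ++i) {
        const bool hasQueues = families[i].queueCount > 0;
        const bool hasGraphics = (families[i].queueFlags & kGraphicsBit) != 0;
        if (hasQueues && hasGraphics) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In a real application the returned index would go into VkCommandPoolCreateInfo::queueFamilyIndex for any pool whose command buffers record render passes.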

As Vulkan is a low-level API, it's not surprising that the GPU hangs if you do something bad. The first step is to *always* enable the validation layers in order to make sure your application is correct. If after fixing all errors the GPU still hangs with RADV, it might be a problem in the driver.
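
For illustration, a rough sketch of choosing which validation layer to request at instance creation. It assumes the list of available layer names has already been gathered with vkEnumerateInstanceLayerProperties; `pickValidationLayer` is a hypothetical helper, written against plain strings so it compiles without the Vulkan headers.

```cpp
#include <string>
#include <vector>

// Returns the validation layer name to request, or an empty string if neither
// known layer is installed. Prefers the newer Khronos layer and falls back to
// the older LunarG meta-layer.
std::string pickValidationLayer(const std::vector<std::string>& available) {
    const char* preferred[] = {
        "VK_LAYER_KHRONOS_validation",
        "VK_LAYER_LUNARG_standard_validation",
    };
    for (const char* want : preferred) {
        for (const std::string& have : available) {
            if (have == want) {
                return want;
            }
        }
    }
    return "";
}
```

The chosen name would then be passed through VkInstanceCreateInfo::ppEnabledLayerNames. Setting the VK_INSTANCE_LAYERS environment variable to the layer name before launching the program is an alternative that needs no code change.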

Feel free to re-open if the problem still happens after making sure you use the API the right way.

Thanks!
