Bug 105049 - Invalid usage of vulkan pipeline barriers can cause amdgpu to deadlock
Summary: Invalid usage of vulkan pipeline barriers can cause amdgpu to deadlock
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/radeon
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
Reported: 2018-02-12 02:12 UTC by Hal Gentz
Modified: 2018-12-18 16:17 UTC
0 users

Attachments
src code for my program + its CMakeLists.txt and the script I build it with (30.82 KB, application/x-7z-compressed)
2018-02-12 02:12 UTC, Hal Gentz
results from running `vulkaninfo` (99.81 KB, text/plain)
2018-02-12 02:13 UTC, Hal Gentz
results from running `pacaur -Q | grep "vulkan\|mesa\|xf86"` (640 bytes, text/plain)
2018-02-12 02:13 UTC, Hal Gentz
what the program outputs when run from ssh with x-forwarding (program crashes before it can lock the gpu) (10.80 KB, text/plain)
2018-02-12 02:13 UTC, Hal Gentz
results from running `glxinfo` (52.75 KB, text/plain)
2018-02-12 02:14 UTC, Hal Gentz
results from running `glinfo` (5.63 KB, text/plain)
2018-02-12 02:14 UTC, Hal Gentz
A call trace from the kernel because the process was hung (1.40 KB, text/plain)
2018-02-12 02:14 UTC, Hal Gentz

Description Hal Gentz 2018-02-12 02:12:39 UTC
Created attachment 137278 [details]
src code for my program + its CMakeLists.txt and the script I build it with

So I was messing around with my neat little Vulkan program and managed to make the amdgpu driver freeze. TTY consoles still work, but if you switch to a TTY with Xorg running you can't get out or do anything. Any new Xorg servers you start will stay frozen, etc. Audio still plays.

Expect:
My faulty program should've just crashed.

What Happened:
Nothing graphical responded.

Attached:

calltrace: A call trace from the kernel because the process was hung
glinfo: results from running `glinfo`
glxinfo: results from running `glxinfo`
logfromssh: what the program outputs when run from ssh with x-forwarding (program crashes before it can lock the gpu)
pacaur: results from running `pacaur -Q | grep "vulkan\|mesa\|xf86"`
vulkaninfo: results from running `vulkaninfo`
debuginfo.7z: src code for my program + its CMakeLists.txt and the script I build it with
Comment 1 Hal Gentz 2018-02-12 02:13:09 UTC
Created attachment 137279 [details]
results from running `vulkaninfo`
Comment 2 Hal Gentz 2018-02-12 02:13:28 UTC
Created attachment 137280 [details]
results from running `pacaur -Q | grep "vulkan\|mesa\|xf86"`
Comment 3 Hal Gentz 2018-02-12 02:13:47 UTC
Created attachment 137281 [details]
what the program outputs when run from ssh with x-forwarding (program crashes before it can lock the gpu)
Comment 4 Hal Gentz 2018-02-12 02:14:04 UTC
Created attachment 137282 [details]
results from running `glxinfo`
Comment 5 Hal Gentz 2018-02-12 02:14:24 UTC
Created attachment 137283 [details]
results from running `glinfo`
Comment 6 Hal Gentz 2018-02-12 02:14:41 UTC
Created attachment 137284 [details]
A call trace from the kernel because the process was hung
Comment 7 Samuel Pitoiset 2018-12-18 16:17:21 UTC
Well, the validation layers report a bunch of errors with your app, like:

Validation layer: (F8)  [ VUID-vkCmdEndRenderPass-commandBuffer-cmdpool ] Object: 0x5638b8e4f6b0 (Type = 6) | Cannot call vkCmdEndRenderPass() on a command buffer allocated from a pool without VK_QUEUE_GRAPHICS_BIT capabilities.. The Vulkan spec states: The VkCommandPool that commandBuffer was allocated from must support graphics operations (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCmdEndRenderPass-commandBuffer-cmdpool)

Validation layer: (F8)  [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object: 0x5638b8279a80 (Type = 6) | Submitted command buffer expects image 0x3 (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL--instead, image 0x3's current layout is VK_IMAGE_LAYOUT_UNDEFINED.

Validation layer: (F8)  [ VUID-vkCmdBeginRenderPass-commandBuffer-cmdpool ] Object: 0x5638b8e5a3f0 (Type = 6) | Cannot call vkCmdBeginRenderPass() on a command buffer allocated from a pool without VK_QUEUE_GRAPHICS_BIT capabilities.. The Vulkan spec states: The VkCommandPool that commandBuffer was allocated from must support graphics operations (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCmdBeginRenderPass-commandBuffer-cmdpool)
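
The two cmdpool errors above mean the command buffer was allocated from a pool whose queue family lacks graphics support. Below is a minimal sketch of picking a graphics-capable family, assuming the per-family flags have already been read from the `queueFlags` field filled in by vkGetPhysicalDeviceQueueFamilyProperties. The names `findGraphicsQueueFamily` and `kQueueGraphicsBit` are hypothetical; the constant only mirrors the value of VK_QUEUE_GRAPHICS_BIT so the snippet compiles without the Vulkan headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: mirrors VK_QUEUE_GRAPHICS_BIT (0x00000001) from the Vulkan
// headers; redefined here only to keep the sketch free of <vulkan/vulkan.h>.
constexpr std::uint32_t kQueueGraphicsBit = 0x00000001u;

// Hypothetical helper: returns the index of the first queue family whose
// flags include graphics support, or -1 if no family supports graphics.
int findGraphicsQueueFamily(const std::vector<std::uint32_t>& queueFlagsPerFamily) {
    for (std::size_t i = 0; i < queueFlagsPerFamily.size(); ++i) {
        if ((queueFlagsPerFamily[i] & kQueueGraphicsBit) != 0u) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In a real application the returned index would go into VkCommandPoolCreateInfo::queueFamilyIndex for the pool that allocates command buffers recording render passes.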

As Vulkan is a low-level API, it's not surprising that the GPU hangs if you do something bad. The first step is to *always* enable the validation layers in order to make sure your application is correct. If after fixing all errors the GPU still hangs with RADV, it might be a problem in the driver.
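
As a rough illustration of enabling the validation layers, here is a sketch of choosing which layer name to request. It assumes the list of installed layer names has already been collected via vkEnumerateInstanceLayerProperties; `pickValidationLayer` is a hypothetical helper, not part of Vulkan. It prefers VK_LAYER_KHRONOS_validation and falls back to the older VK_LAYER_LUNARG_standard_validation meta-layer shipped with SDKs from around the time of this report.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the validation layer name to enable, or an
// empty string when neither known validation layer is installed.
std::string pickValidationLayer(const std::vector<std::string>& availableLayers) {
    // Candidates in order of preference: current unified layer first,
    // then the older LunarG meta-layer.
    const char* const candidates[] = {
        "VK_LAYER_KHRONOS_validation",
        "VK_LAYER_LUNARG_standard_validation",
    };
    for (const char* candidate : candidates) {
        if (std::find(availableLayers.begin(), availableLayers.end(), candidate) !=
            availableLayers.end()) {
            return candidate;
        }
    }
    return "";
}
```

The chosen name would then be passed through VkInstanceCreateInfo::ppEnabledLayerNames (with enabledLayerCount set to 1) when calling vkCreateInstance. An empty result suggests the validation layers are not installed on the system.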

Feel free to re-open if the problem still happens after making sure you use the API the right way.

Thanks!

