Bug 99692

Summary: [radv] Mostly broken on Hawaii PRO/CIK ASICs
Product: Mesa
Component: Drivers/Vulkan/radeon
Version: git
Hardware: Other
OS: All
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Reporter: Kai <kai>
Assignee: mesa-dev
QA Contact: mesa-dev
CC: 0xe2.0x9a.0x9b
Whiteboard:
i915 platform:
i915 features:
Attachments: vulkanscene example showing corruption
vulkaninfo output
xdpyinfo output
dota2 gdb backgrace

Description Kai 2017-02-06 12:58:41 UTC
Created attachment 129359 [details]
vulkanscene example showing corruption

I've been trying to use radv with my Hawaii PRO GPU for some time now, but it seems more or less broken depending on the application. The Talos Principle now starts and shows a loading screen (with heavy visual corruption; the corruption pattern is the same as with the demos, see below), but at the point where the main menu should be displayed the whole system freezes and no longer reacts to any input.
Most of Sascha Willems' demos (<https://github.com/SaschaWillems/Vulkan>) exhibit corrupted rendering (flickering blocks/lines, see attached screenshot).

I know radv is not complete and not meant for prime time, but this seems more like an inherent incompatibility with CIK-generation GPUs, given that Dave and Bas seem to be able to run Talos, Doom, etc. on their (newer) ASICs. This is especially so since dmesg is spammed with tons of GPU fault errors while running many of the Vulkan demos (the following messages were triggered by running the triangle example):
> [ 7048.432078] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0e8b9014
> [ 7048.432080] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100474
> [ 7048.432081] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B090014
> [ 7048.432081] VM fault (0x14, vmid 5) at page 1049716, write from 'CB2' (0x43423200) (144)
These messages all look very similar; the part that changes is "write from 'CB2'", where I can see other CBs as well (CB3, CB6 and CB7 are currently the predominant ones besides CB2). The fault address and status seem to stay constant. The first line (GPU fault detected) has different values, though.
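
To check which client blocks show up most often in such a log, the "VM fault" summary lines can be tallied with a small script. The following is only a minimal sketch: the regular expression is derived from the single log line quoted above, not from the kernel source, and the function names are made up for illustration. It also shows two relations that can be read off the quoted sample: the hex value after the client name is the name in ASCII, and the decimal page number equals the fault address in hex.

```python
import re
from collections import Counter

# Assumption: this pattern is inferred from the one "VM fault" line quoted
# in this report; other kernel versions may format the message differently.
VM_FAULT_RE = re.compile(
    r"VM fault \((?P<status>0x[0-9a-fA-F]+), vmid (?P<vmid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<client>[^']*)' \((?P<client_id>0x[0-9a-fA-F]+)\)"
)


def decode_client_id(client_id: str) -> str:
    """Read the 32-bit client ID as big-endian ASCII, dropping NUL padding."""
    raw = int(client_id, 16).to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def tally_fault_clients(dmesg_text: str) -> Counter:
    """Count VM faults per client block (e.g. CB2, CB3) in dmesg output."""
    counts = Counter()
    for match in VM_FAULT_RE.finditer(dmesg_text):
        counts[match["client"]] += 1
    return counts


# The sample line is copied from the dmesg excerpt in this report.
sample = (
    "[ 7048.432081] VM fault (0x14, vmid 5) at page 1049716, "
    "write from 'CB2' (0x43423200) (144)"
)

print(tally_fault_clients(sample))      # Counter({'CB2': 1})
print(decode_client_id("0x43423200"))   # CB2
print(hex(1049716))                     # 0x100474, the reported fault address
```

Feeding the full output of `dmesg` into `tally_fault_clients` would give a per-block count, which makes it easier to compare runs of different demos.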


The stack I'm using (Debian testing as a base) is:
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/02264bc6f9 + attachment 127922 [details] [review] (bug 97988) and revert of 7b32ae4df5bc19c378598d6a950a6019fa64ece6 (see bug 99542)
libdrm: Git:master/d4b8344363 (tag libdrm-2.4.75)
LLVM: SVN:trunk/r294119 (5.0 devel) + <https://reviews.llvm.org/D26348?download=true> (bug 97988)
X.Org: 2:1.19.1-4
Linux: 4.9.8
Firmware (firmware-amd-graphics): 20160824-1
libclc: Git:master/2ec7d80d5e
DDX (xserver-xorg-video-amdgpu): 1.2.0-1+b1

Let me know if you need anything else.
Comment 1 Kai 2017-02-06 12:59:04 UTC
Created attachment 129360 [details]
vulkaninfo output
Comment 2 Kai 2017-02-06 12:59:26 UTC
Created attachment 129361 [details]
xdpyinfo output
Comment 3 Nicholas Disiere 2017-02-10 01:43:58 UTC
I can confirm that this visual corruption is present on my R9 290 when radeon is blacklisted on the current Arch kernel. It affects the desktop with and without the compositor, and causes Vulkan games to crash.
Comment 4 Luzipher 2017-02-10 17:13:20 UTC
The Radeon R9 290X (Hawaii XT) is affected by this as well (on today's mesa master).
Comment 5 Luke A. Guest 2017-02-10 17:57:42 UTC
I haven't tried it in months, as I've been using the amdgpu-pro stack due to the same rendering issues on my R9 390. I did inform Dave and Bas about it back then.
Comment 6 John Owen 2017-02-10 19:45:28 UTC
Can confirm the same with a 290 on Linux 4.9.8 using mesa-git, amdgpu-git.

Sascha Willems' triangle demo flickers and clicking 'maximise' on the window results in this error in console:

Fatal : VkResult is "ERROR_OUT_OF_DATE_KHR" in /opt/Vulkan/triangle/triangle.cpp at line 333
triangle: /opt/Vulkan/triangle/triangle.cpp:333: void VulkanExample::draw(): Assertion `res == VK_SUCCESS' failed.
Aborted (core dumped)
Comment 7 Dave Airlie 2017-02-14 06:12:07 UTC
I've posted some patches to the mesa development list that should hopefully fix the VM faults. Please test them.
Comment 8 Dave Airlie 2017-02-14 06:12:47 UTC
https://patchwork.freedesktop.org/series/19593/

is the series.
Comment 9 Kai 2017-02-14 13:52:23 UTC
(In reply to Dave Airlie from comment #8)
> https://patchwork.freedesktop.org/series/19593/
> 
> is the series.

I can confirm that all the issues I reported in comment #0 are resolved by applying that series. You can have my:
  Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

The full stack I used (Debian testing as a base) is:
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/956556b3c3 + attachment 127922 [details] [review] (bug 97988), <https://patchwork.freedesktop.org/patch/138473/> (see bug 99542) and <https://patchwork.freedesktop.org/series/19593/>
libdrm: Git:master/d4b8344363 (tag libdrm-2.4.75)
LLVM: SVN:trunk/r294982 (5.0 devel) + <https://reviews.llvm.org/D26348?download=true> (bug 97988)
X.Org: 2:1.19.1-4
Linux: 4.9.9
Firmware (firmware-amd-graphics): 20160824-1
libclc: Git:master/2ec7d80d5e
DDX (xserver-xorg-video-amdgpu): 1.2.0-1+b1
Comment 10 Jan Ziak (http://atom-symbol.net) 2017-02-14 16:14:37 UTC
Created attachment 129602 [details]
dota2 gdb backgrace
Comment 11 Jan Ziak (http://atom-symbol.net) 2017-02-14 16:41:58 UTC
(In reply to Kai from comment #9)
> The full stack I used was(Debian testing as a base) is:
> GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
> Mesa: Git:master/956556b3c3 + attachment 127922 [details] [review] (bug
> 97988), <https://patchwork.freedesktop.org/patch/138473/> (see bug 99542)
> and <https://patchwork.freedesktop.org/series/19593/>
> libdrm: Git:master/d4b8344363 (tag libdrm-2.4.75)
> LLVM: SVN:trunk/r294982 (5.0 devel) +
> <https://reviews.llvm.org/D26348?download=true> (bug 97988)

Is LLVM-5.0-devel required? I am using LLVM-4.0.0_rc1 and even vulkaninfo is terminating with an assertion error.
Comment 12 Kai 2017-02-14 16:56:58 UTC
(In reply to Jan Ziak from comment #11)
> (In reply to Kai from comment #9)
> > The full stack I used was(Debian testing as a base) is:
> > GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
> > Mesa: Git:master/956556b3c3 + attachment 127922 [details] [review] (bug
> > 97988), <https://patchwork.freedesktop.org/patch/138473/> (see bug 99542)
> > and <https://patchwork.freedesktop.org/series/19593/>
> > libdrm: Git:master/d4b8344363 (tag libdrm-2.4.75)
> > LLVM: SVN:trunk/r294982 (5.0 devel) +
> > <https://reviews.llvm.org/D26348?download=true> (bug 97988)
> 
> Is LLVM-5.0-devel required? I am using LLVM-4.0.0_rc1 and even vulkaninfo is
> terminating with an assertion error.

Not to my knowledge, and not intentionally, I would venture to say; otherwise the proposed changes that behave differently depending on whether you have LLVM < 4.0.0 or not wouldn't make much sense. Or it could just be a broken assertion somewhere? My non-asserting build here works now.

In any case, your issue looks like a different problem from what this bug was/is about (graphical corruption and VM faults with radv and CIK ASICs). Please file a separate report for it.
Comment 13 Dave Airlie 2017-02-14 19:14:57 UTC
Fixes pushed, thanks for testing.
Comment 14 Bas Nieuwenhuizen 2017-02-14 19:31:58 UTC
(In reply to Jan Ziak from comment #11)
> (In reply to Kai from comment #9)
> > The full stack I used was(Debian testing as a base) is:
> > GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
> > Mesa: Git:master/956556b3c3 + attachment 127922 [details] [review] (bug
> > 97988), <https://patchwork.freedesktop.org/patch/138473/> (see bug 99542)
> > and <https://patchwork.freedesktop.org/series/19593/>
> > libdrm: Git:master/d4b8344363 (tag libdrm-2.4.75)
> > LLVM: SVN:trunk/r294982 (5.0 devel) +
> > <https://reviews.llvm.org/D26348?download=true> (bug 97988)
> 
> Is LLVM-5.0-devel required? I am using LLVM-4.0.0_rc1 and even vulkaninfo is
> terminating with an assertion error.

If we detect 4.0, we use a patch that got backported to the 4.0 branch after rc1, so you might need to use rc2+ or an SVN version.
Comment 15 Jan Ziak (http://atom-symbol.net) 2017-02-14 19:33:16 UTC
(In reply to Kai from comment #12)
> (In reply to Jan Ziak from comment #11)
> > Is LLVM-5.0-devel required? I am using LLVM-4.0.0_rc1 and even vulkaninfo is
> > terminating with an assertion error.
> 
> Not to my knowledge and not intentionally I would venture to say, otherwise
> proposed changes including different behaviour depending on whether you have
> LLVM < 4.0.0 or not wouldn't make too much sense. Or it could just be a
> broken assertion somewhere? My non-asserting build here works now.

llvm-git works without issues.

> In any case your issue looks like a different problem from what this bug
> was/is about (graphical corruption and VM faults with radv and CIK ASICS),
> please file a separate report for it.

It is true that it was a problem different from this bug.

----

Back to this bug:

Dota2 -vulkan (observer mode) is working without issues on my R9-390.
