110603 – Blocky and black opacity/alpha using RADV on some games

Summary: Blocky and black opacity/alpha using RADV on some games

Status:	RESOLVED MOVED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Vulkan/radeon
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium minor
Assignee:	mesa-dev
QA Contact:	mesa-dev

Reported:	2019-05-04 03:07 UTC by Lucas Francesco
Modified:	2019-09-18 19:56 UTC
CC List:	1 user

Attachments
the bug itself on dota2 with vulkan enabled (306.79 KB, image/png) 2019-05-04 03:07 UTC, Lucas Francesco
screenshot (265.81 KB, image/jpeg) 2019-05-06 09:01 UTC, Samuel Pitoiset
screenshot taken with scrot (1.48 MB, image/png) 2019-05-06 20:31 UTC, Lucas Francesco
Some notable parts of a renderdoc capture (1.53 MB, application/zip) 2019-07-28 17:27 UTC, tivoboma

Description Lucas Francesco 2019-05-04 03:07:37 UTC

Created attachment 144153 [details]
the bug itself on dota2 with vulkan enabled

I am experiencing a RADV bug on both Arch and Gentoo Linux, with LLVM 7+ (I can't test on LLVM 6).

I can't reproduce it on Ubuntu 18.10 on the same system with the same hardware (strangely, it works flawlessly there; I didn't test with 19.10, but I can give it a go if needed). I tried nuking Gentoo and installing Arch to see whether the bug was Gentoo-specific, and it wasn't. I also reinstalled Gentoo twice while testing (with different USE flags) and wasn't able to stop it from happening.

Already tried:
- switching LLVM versions
- switching to Arch
- changing compiler flags
- toggling the debug enable flag on Mesa
- downgrading glibc a bit
- downgrading the X server
- forcing the game to use Wayland directly through SDL (in the case of Artifact)


The games I can reproduce it with are mainly Source 2 ones, but I can also reproduce it with Skyrim (the DX11 version through DXVK on Proton).

I'm marking it as minor severity because no one else I asked who has the same GPU hardware can reproduce the issue.


System info:
https://gist.github.com/Uramekus/03308e0cdb776374d7cfa9ceb125bbe7

RenderDoc capture (quite old at this point; I might make a new one tomorrow):

https://drive.google.com/file/d/1XZ8XMiA-j2eeZJU0vpfUeq85iQxsE67b/view?usp=sharing

Comment 1 Samuel Pitoiset 2019-05-06 07:33:47 UTC

What LLVM/Mesa versions are you using?
Can you attach the output of glxinfo (or vulkaninfo) please?

Comment 2 Samuel Pitoiset 2019-05-06 08:55:19 UTC

Please ignore my previous comment, you posted it already. :-)

Comment 3 Samuel Pitoiset 2019-05-06 09:01:40 UTC

Created attachment 144172 [details]
screenshot

Attached screenshot with:

OpenGL renderer string: AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.27.0, 4.20.0-rc3-58450-g9698024e8a19, LLVM 8.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.1.0-devel (git-b98955e128)

What's the problem actually? It looks good to me.
(Also tried with mesa-git/llvm-git, same output)

Comment 4 Lucas Francesco 2019-05-06 20:31:26 UTC

Created attachment 144180 [details]
screenshot taken with scrot

The strange thing for me is that the RenderDoc capture looks 100% fine on Ubuntu, or on a friend's computer that has similar hardware and runs Arch, but on my PC, on both Arch and Gentoo, RenderDoc AND the game itself produce that blocky, 100% black output.

Here's a screenshot I've taken just now of how it looks on my side. I'm now on the latest Mesa commit and the latest LLVM commit.

I'm trying to make another RenderDoc capture, but I'm not able to compile RenderDoc right now due to a random DNS issue.

Comment 5 Samuel Pitoiset 2019-05-07 07:42:23 UTC

This is indeed weird.

Comment 6 tivoboma 2019-06-19 21:06:43 UTC

I experienced a visually similar issue in Witcher 3 after updating my Gentoo system with an RX 570. Notable changes:
- installed Mesa 19.1 (from 19.0.x, not sure which exact version)
- updated Wine to 4.10 (from 4.6, IIRC)
- migrated the Gentoo profile from 17.0 to 17.1 (including all recommended rebuilds)

Switching back to mesa 19.0.6 does not fix the issue, but setting the RADV_DEBUG environment variable to "nohiz" does. Maybe that helps to identify the issue?
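
For anyone who wants to try the same workaround, here is a minimal sketch (not part of the original report) of launching a program with `nohiz` added to `RADV_DEBUG`. The helper names `radv_debug_env` and `run_with_nohiz` are made up for illustration, and the sketch assumes `RADV_DEBUG` is read as a comma-separated list of flags, so that an existing value is extended rather than overwritten.

```python
import os
import subprocess


def radv_debug_env(flags, base_env=None):
    """Return a copy of the environment with `flags` appended to RADV_DEBUG.

    Hypothetical helper: existing RADV_DEBUG flags are kept, duplicates are skipped.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Split the current value on commas and drop empty entries.
    current = [f for f in env.get("RADV_DEBUG", "").split(",") if f]
    for flag in flags:
        if flag not in current:
            current.append(flag)
    env["RADV_DEBUG"] = ",".join(current)
    return env


def run_with_nohiz(command):
    """Run `command` (a list of arguments) with nohiz set and return its exit code."""
    return subprocess.run(command, env=radv_debug_env(["nohiz"])).returncode
```

Calling `run_with_nohiz(["vulkaninfo"])` would run that tool with the flag set. For a Steam game, setting `RADV_DEBUG=nohiz %command%` in the launch options should have the same effect.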

Comment 7 Samuel Pitoiset 2019-06-21 07:32:25 UTC

Can you record a renderdoc capture of the problem please?

Comment 8 tivoboma 2019-07-28 17:27:34 UTC

Created attachment 144898 [details]
Some notable parts of a renderdoc capture

I tried to get a renderdoc capture, but the results are way too large for my internet connection to upload. I thus took a capture with and without nohiz, and compared the two for obvious differences. I still have the captures, so if there is anything specific I should look at I can do that.

The things that stood out for me the most (refer to the attached zip for the images I refer to):


1. DS=Store / DS=Load inconsistencies
In the first pass yielding visually different results, there are a few instances of the following sequence:
> vkCmdEndRenderPass(DS=Store)      | a
> ...                               | b
> vkCmdEndRenderPass(DS=Load)       | c
> vkCmdDrawIndexed                  | d
In the good render (nohiz set), the DS keeps its content throughout a-c, and is updated in d. In the bad render (nohiz not set), the DS gets weird fragments in b, which usually go away in d – but in at least one case they did *not* go away, causing the resulting texture to become visually corrupted (many pixels become white, see 0_depth_attachment_after_load_*).

2. The final depth buffer is obviously wrong
The blocks visible on the final output are visible as white (near) in the final depth pass. This seems to block the skybox/background from being added later on (see 1_final_depth_*).

3. It seems that the good and bad render use slightly different render paths
While I could map most parts of the captures onto each other, there were some additional render passes here and there that did not have clear equivalents in the other capture – and are not obviously related to the different camera position, etc.

4. The captures seem to be very hardware-dependent
For some reason, the nohiz capture only works if I set nohiz, otherwise renderdoc claims that I use different hardware and I thus cannot view the capture. Not sure if that is intentional, or if that is some clue that something is wrong.
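
The check described in point 1 (the DS should keep its content between steps a and c) can be sketched roughly as below. This is only an illustration under assumptions: the depth attachment is assumed to have been exported at both steps as 2D lists of floats (the export itself is not shown), `differing_tiles` is a made-up helper, and the 8x8 tile size is a guess for grouping corrupted pixels into blocks, not something confirmed in this report.

```python
def differing_tiles(depth_a, depth_c, tile=8, tolerance=1e-6):
    """Return sorted (tile_x, tile_y) indices of tiles where the buffers differ.

    depth_a / depth_c: depth values at step a (after the store) and step c
    (after the load), as equally sized 2D lists of floats.
    """
    same_shape = len(depth_a) == len(depth_c) and all(
        len(row_a) == len(row_c) for row_a, row_c in zip(depth_a, depth_c)
    )
    if not same_shape:
        raise ValueError("depth buffers must have the same dimensions")

    tiles = set()
    for y, (row_a, row_c) in enumerate(zip(depth_a, depth_c)):
        for x, (d_a, d_c) in enumerate(zip(row_a, row_c)):
            # Any pixel that changed between store and load marks its tile.
            if abs(d_a - d_c) > tolerance:
                tiles.add((x // tile, y // tile))
    return sorted(tiles)
```

An empty result would match the good render (nohiz set), where nothing changes between a and c. A non-empty result lists the blocks that picked up fragments in between.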


The final render results are 2_final_result_*. Note that even without nohiz, some frames render properly – suggesting that the issue could be some kind of data race, maybe between DS=Store/DS=Load?

Hope these comments help at least a little, sorry that I'm unable to provide the whole captures.

Comment 9 GitLab Migration User 2019-09-18 19:56:17 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/860.