Bug 104602

Summary: [apitrace] Graphical artifacts in Civilization VI on RX Vega
Product: Mesa
Reporter: Zach Tibbitts <zachtib>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: gediminas, jason, matombo, michael.mansell, stevenvandenbrandenstift, t_arceri, zachtib
Version: 17.3
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:
Bug Blocks: 77449
Attachments: image showing the bad triangles
             Hack around issue
             Renderdoc capture

Description Zach Tibbitts 2018-01-12 15:58:09 UTC
System is Arch Linux running kernel 4.15-rc7 with Mesa 17.3.1

When playing Civilization VI and zooming the camera out to view more of the map, I see a large number of black, flickering, triangle-shaped artifacts across the screen. This doesn't seem to crash the game (I can still play for an extended period), but it is distracting.

I've had this issue since switching to a Vega card from Polaris, running first the amd-staging branch of the kernel, then the 4.15 RCs, and from mesa-git through 17.3 stable. The game updated today and the issue is still present.

As far as I've noticed, no other game experiences the same bug, so I can't be certain if the fault lies in the driver or the game itself. If needed, I can attempt to capture some screenshots or video of the bug.
Comment 1 Jason Playne 2018-01-20 13:04:17 UTC
I am seeing the same with my Vega 64 on KDE Neon with the oibaf PPA.

Packages are marked as 17.4~git1801200730.436ed6~oibaf~x

kernel: 4.15.0-041500rc5-generic

You can easily reproduce this by running the graphics benchmark.

I tried to capture an API trace but the game slowed down so much that the benchmark did not get far enough along to show it :( 

(it is entirely possible that I am holding it wrong too)
Comment 2 Jason Playne 2018-01-20 13:10:01 UTC
Created attachment 136866 [details]
image showing the bad triangles

These triangles are not static; they flicker around the screen a fair bit.
Comment 3 Jason Playne 2018-01-22 06:33:49 UTC
So I was able to get an API trace of the problem!

(I am rather happy to now be holding apitrace correctly!)

It weighs in at 1.5GiB compressed, but it shows all the badly rendered triangles and how zooming in stops them.

It is available here: https://jasonplayne.com/share/Civ6.trace.bz2

I hope this is helpful
Comment 4 matombo 2018-03-18 22:44:44 UTC
*** Bug 105353 has been marked as a duplicate of this bug. ***
Comment 5 matombo 2018-03-18 22:47:41 UTC
I have the same issue.
I captured a video showing the flickering triangles, which I had posted in the duplicate bug 105353:
https://bugs.freedesktop.org/attachment.cgi?id=137806
Comment 6 Jason Playne 2018-04-09 12:15:05 UTC
Can confirm that this problem still persists on kernel 4.16.1 and Mesa 18.
Comment 7 Zach Tibbitts 2018-04-09 13:26:41 UTC
This issue is also persisting with the latest update to Civ VI, including the Rise and Fall expansion.
Comment 8 peetipablo 2018-09-16 16:11:20 UTC
Just want to comment that this issue is still occurring on Mesa 18.2, Arch Linux, kernel 4.18.6.
Comment 9 Timothy Arceri 2018-09-21 03:28:36 UTC
I'm not sure why yet, but both the black triangles and the incorrect rendering behind the Chinese emperor on the loading screen go away when I run the trace on the NIR backend.

Until we figure out what is going on here you can try running the game with the following environment variable:

R600_DEBUG=nir
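
A minimal sketch of setting that variable for a single process from a shell. The `run_with_nir` helper name is hypothetical, and `printenv` only stands in for the game binary to show that the child process sees the variable while the calling shell is left unchanged.

```shell
#!/bin/sh
# Hypothetical helper: run any command with R600_DEBUG=nir in its environment.
# `env` sets the variable for the child process only, not for this shell.
run_with_nir() {
    env R600_DEBUG=nir "$@"
}

# Stand-in for the game binary: print the value the child process sees.
run_with_nir printenv R600_DEBUG   # prints: nir
```

Replacing `printenv R600_DEBUG` with the actual game command applies the same setting to the game.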
Comment 10 Jason Playne 2018-09-21 10:36:38 UTC
(In reply to Timothy Arceri from comment #9)
> I'm not sure why yet but both the black triangles and the incorrect
> rendering behind the Chinese emperor on the loading screen go away when I
> run the trace on the NIR backend.
> 
> Until we figure out what is going on here you can try running the game with
> the following environment variable:
> 
> R600_DEBUG=nir

Can confirm! The initial red+black triangles in the game seem to have disappeared using the nir backend.

(I may be a little excited!)
Comment 11 Jason Playne 2018-09-21 12:25:39 UTC
(In reply to Jason Playne from comment #10)
> (In reply to Timothy Arceri from comment #9)
> > I'm not sure why yet but both the black triangles and the incorrect
> > rendering behind the Chinese emperor on the loading screen go away when I
> > run the trace on the NIR backend.
> > 
> > Until we figure out what is going on here you can try running the game with
> > the following environment variable:
> > 
> > R600_DEBUG=nir
> 
> Can confirm! The initial red+black triangles in the game seem to have
> disappeared using the nir backend.
> 
> (I may be a little excited!)

After a couple of hours of game time, I have not seen the triangles.

nir solves the problem

Thanks Timothy!
Comment 12 michael.mansell 2018-09-22 01:18:11 UTC
This fixes the issue for me as well. It's great to finally have a workaround!
Comment 13 Juan A. Suarez 2018-10-05 10:50:43 UTC
As this is fixed, and published in mesa 18.2.2, I'm closing it.
Comment 14 Timothy Arceri 2018-10-05 10:57:30 UTC
(In reply to Juan A. Suarez from comment #13)
> As this is fixed, and published in mesa 18.2.2, I'm closing it.

This isn't fixed yet :)

Another bug was found using the trace from this bug. As per the commit message, that fix does not address the primary issue of black triangles from this bug.
Comment 15 Timothy Arceri 2018-11-01 00:30:59 UTC
Created attachment 142313 [details] [review]
Hack around issue

I've found the source of the problem. It seems that the TGSI indirect indexing optimisation is causing issues on Vega for some reason. I've attached a hack which disables it, resulting in correct rendering.
Comment 16 Timothy Arceri 2018-11-01 00:37:59 UTC
Created attachment 142314 [details]
Renderdoc capture

Also attaching a renderdoc capture of the issue.
Comment 17 Timothy Arceri 2018-11-01 06:39:55 UTC
After talking this over with Marek, here is a summary of the problem.

LLVM's VGPR indexing code on gfx9+ is broken for immediate arrays. Usually this is not a problem, as GLSL IR in Mesa will lower these to uniforms via lower_const_arrays_to_uniforms(). However, this does not work for the shaders in Civ6 because these arrays are not actually defined as constant arrays. For example, the original shader looks like this:

	vec4 x0[3];
	vec4 x1[6];
	vec4 x2[6];
	vec4 x3[6];

	x0[0].xy = vec2(0.031250, 0.500000);
	x0[1].xy = vec2(0.968750, 0.031250);
	x0[2].xy = vec2(0.968750, 0.968750);
	x1[0].xy = vec2(1.000000, 1.000000);
	x1[1].xy = vec2(0.000000, 1.000000);
	x1[2].xy = vec2(-1.000000, 0.000000);
	x1[3].xy = vec2(-1.000000, -1.000000);
	x1[4].xy = vec2(0.000000, -1.000000);
	x1[5].xy = vec2(1.000000, 0.000000);
	x2[0].xy = vec2(1.000000, -1.000000);
	x2[1].xy = vec2(2.000000, 1.000000);
	x2[2].xy = vec2(1.000000, 2.000000);
	x2[3].xy = vec2(-1.000000, 1.000000);
	x2[4].xy = vec2(-2.000000, -1.000000);
	x2[5].xy = vec2(-1.000000, -2.000000);

Without SSA, I don't see any way for GLSL IR to easily recognise this as a constant array. Unfortunately, by the time LLVM is done, the array is recognised as constant and is exposed to the buggy indexing support.
Comment 18 Sergio Marcelo 2018-12-01 11:26:15 UTC
Video of the problem happening:
https://www.youtube.com/watch?v=E4oy8tqaYs0

My setup: Ubuntu 18.10, AMD Ryzen 2700X, RX Vega 56

This is the workaround that worked for me:

* In addition to adding "R600_DEBUG=nir" to the game's launch options (in the Steam client), also start "Steam Launcher" from a terminal with the "R600_DEBUG=nir" environment variable set. With both in place, it works.

When a bugfix is published, let me know and I will test it.
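
A rough sketch of why starting the launcher with the variable set can help: child processes inherit the environment of whatever started them. The nested `sh -c` calls below are only stand-ins for the Steam client and the game, and the `launcher_then_game` name is hypothetical. Whether Steam passes its environment through to the game unchanged is an assumption here.

```shell
#!/bin/sh
# The outer `sh` plays the launcher started from a terminal with the
# variable set; the inner `sh` plays the game that the launcher spawns.
launcher_then_game() {
    env R600_DEBUG=nir sh -c 'sh -c "printenv R600_DEBUG"'
}

# The innermost process still sees the value set at the top.
launcher_then_game   # prints: nir
```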
Comment 19 oliver.triebel 2019-03-12 10:44:49 UTC
I can also confirm success on a Ryzen 2200G system.

The Steam launch option `R600_DEBUG=nir %command%`
fixed the texture flickering/artifacts in Civ6 (which appeared on the strategic map).


olly@ryzen-pc1:~$ uname -a
Linux ryzen-pc1 5.0.0-050000-generic #201903032031 SMP Mon Mar 4 01:33:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
olly@ryzen-pc1:~$ glxinfo | grep "OpenGL version"
OpenGL version string: 4.5 (Compatibility Profile) Mesa 18.3.3
olly@ryzen-pc1:~$ lspci -nnk | grep -i VGA -A2 
38:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] [1002:15dd] (rev c8)
    Subsystem: Micro-Star International Co., Ltd. [MSI] Vega [Radeon Vega 8 Mobile] [1462:7a39]
    Kernel driver in use: amdgpu
olly@ryzen-pc1:~$ lsb_release -a
No LSB modules are available.
Description:    Ubuntu 18.04.2 LTS
Comment 20 Connor Abbott 2019-08-01 14:09:56 UTC
I wanted to make sure that improving the NIR path to reach parity with TGSI in local variable handling wouldn't break things, so I investigated this a bit more. It seems this is triggered by the fact that on Vega the TGSI path always uses scratch, even for smaller local arrays. This bloats the scratch space used by the VS in question. There are three back-to-back draw calls with this VS (used to build up the map), each using scratch, and it seems that radeonsi doesn't properly wait for each call to be done before starting the next and reuses the same scratch buffer, resulting in the threads from one draw call overwriting the scratch of the previous call. Hacking si_update_spi_tmpring_size() to always allocate a new scratch buffer "fixes" the black triangles.
Comment 21 Jason Playne 2019-08-01 14:13:19 UTC
(In reply to Connor Abbott from comment #20)
> I wanted to make sure that improving the NIR path to reach parity with TGSI
> in local variable handling wouldn't break things, so I investigated this a
> bit more. It seems this is triggered by the fact that on Vega the TGSI path
> always uses scratch, even for smaller local arrays. This bloats the scratch
> space used by the VS in question. There are three back-to-back draw calls
> with this VS (used to build up the map), each using scratch, and it seems
> that radeonsi doesn't properly wait for each call to be done before starting
> the next and reuses the same scratch buffer, resulting in the threads from
> one draw call overwriting the scratch of the previous call. Hacking
> si_update_spi_tmpring_size() to always allocate a new scratch buffer "fixes"
> the black triangles.

Thanks heaps for looking into the issue, Connor. The explanation of what was happening makes it sound simple, but I am sure the debugging effort was far greater!

<3
Comment 22 Marek Olšák 2019-08-16 18:17:02 UTC
Connor, the hardware manages the scratch buffer alloc/dealloc. You don't have to allocate more than one.

The problem with Civ VI is that VGPR indexing has never been properly implemented for gfx9 in LLVM.
Comment 23 Marek Olšák 2019-08-19 19:28:15 UTC
Connor, there is indeed an issue with how we set SPI_TMPRING_SIZE, and the same applies to compute.
Comment 24 Timothy Arceri 2019-08-19 23:06:04 UTC
(In reply to Marek Olšák from comment #23)
> Connor, there is indeed an issue with how we set SPI_TMPRING_SIZE and same
> for compute.

I wonder if this is the issue reported in bug #108194
Comment 26 gurchetansingh 2019-08-23 03:50:39 UTC
A similar issue occurs when running Civ6 on Virgl. Is there any way to disable TGSI indirect indexing for testing purposes?
