Bug 101739 - An issue with alpha-to-coverage handling is causing Arma 3 64-bit Linux port to render trees incorrectly
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: Other
OS: All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Blocks: 77449
Reported: 2017-07-10 14:32 UTC by Krystian Gałaj
Modified: 2018-10-04 05:03 UTC
CC List: 4 users



Description Krystian Gałaj 2017-07-10 14:32:12 UTC
The game sometimes renders trees incorrectly. It has multiple tree mesh models, and when a tree gets close, the game switches between LODs/models by rendering both models overlapping on a grid of pixels, like a checkerboard: pixels of the old LOD in its black fields, the new one in the white fields.

As long as it does that using single-sampled buffers as render targets, all is fine. However, in higher quality modes the game switches to multisampled buffers. When using multisampled buffers, if ATOC is turned on in settings, the game uses the alpha-to-coverage technique without polygon sorting to make sure that the grass is rendered correctly. At the same time it uses depth test and depth write to fill up the depth buffer. In the same execution of the fragment shader it sets the output color and alpha value and the depth value, and it expects the alpha value to cause the color value to end up in only some samples of the multisampled texture, and - the important bit - the depth value to end up in only the corresponding samples of the depth buffer.
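To illustrate, the relevant state setup looks roughly like this (a simplified sketch, not our actual engine code):

/* With GL_SAMPLE_ALPHA_TO_COVERAGE enabled, the fragment's alpha is
 * converted into a sample coverage mask; both the color value and the
 * depth value are expected to land only in the covered samples. */
glEnable(GL_MULTISAMPLE);
glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);  /* depth writes enabled, no polygon sorting */
/* ... draw both overlapping LOD models of the tree ... */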

This technique works on DX11 on Windows, on all drivers on Mac OS X, and on NVIDIA drivers on Linux, but with the Mesa radeonsi driver on Linux the rendering goes bad, leaving most of the tree pixels white until the LOD transition ends.

The white is visible on the screen because it is the initial color of the multisampled render target, and for some reason this color is allowed to leak through, which means that in some cases the depth buffer value is set when the corresponding color buffer value isn't.

Both the depth and color buffers have the same number of samples (8) and were created with fixed sample locations (fixedsamplelocations = GL_TRUE) in the OpenGL call.
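For reference, the creation looks roughly like this (a simplified sketch with illustrative variable names):

GLuint color_tex, depth_tex;
glGenTextures(1, &color_tex);
glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, color_tex);
/* the last argument is the fixed sample locations flag mentioned above */
glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, 8, GL_RGBA8,
                        width, height, GL_TRUE);
glGenTextures(1, &depth_tex);
glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, depth_tex);
glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, 8, GL_DEPTH_COMPONENT24,
                        width, height, GL_TRUE);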

It looks as if, with the Mesa radeonsi driver, the depth buffer values are:
 - correctly not written when the depth test fails for the fragment shader,
 - correctly written to all samples of the depth buffer when alpha coverage directs the draw to fill them all, but
 - INCORRECTLY written not only to the samples to which alpha coverage directed the fragment's output color, but either to all samples of the depth buffer, or to samples of the depth buffer that don't correspond to the covered samples in the color attachment.
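For clarity, the per-sample semantics we expect are roughly the following (illustrative pseudocode, not driver code; helper names like depth_test_passes are hypothetical):

unsigned coverage = rasterizer_mask & alpha_to_coverage_mask(alpha);
for (unsigned s = 0; s < num_samples; s++) {
        if ((coverage & (1u << s)) && depth_test_passes(s, depth)) {
                color_buffer[s] = color;  /* covered samples get the color... */
                depth_buffer[s] = depth;  /* ...and ONLY those get the depth  */
        }
}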

We encountered the same issue in 2015 in the fglrx drivers, contacted the AMD team about it, and received confirmation that it was a bug and that it was fixed. Unfortunately, shortly after that the fglrx drivers went out of use.

The issue can easily be reproduced in Arma 3 by:
 - launching the Arma 3 game,
 - switching to High quality or higher to turn on multisampled buffers,
 - enabling ATOC (alpha to coverage) in Video settings, or making sure it's enabled,
 - launching the first level of the Drawdown 2035 campaign, i.e. starting a new campaign and bypassing the optional tutorial; as the main character walks into the base, the bushes flicker white from time to time as they get closer.

The issue is present in Mesa 17.2.0-devel from the padoka PPA, on at least a Radeon R7 260X/360.
Comment 1 Roland Scheidegger 2017-07-16 00:25:56 UTC
My naive conclusion from the description would be that the hw is doing early-Z optimizations when it shouldn't (so the depth updates happen before the final sample mask, as modified by the alpha value, is known).
The driver doesn't directly set whether early Z is enabled, since it just sets EARLY_Z_THEN_LATE_Z most of the time, and the hw should figure out whether early Z is possible (taking all state into account) and otherwise use late Z.
If that's the case, then overriding the Z_ORDER to LATE_Z in the db_shader_control value might be necessary here.
But that's just a guess, I could be completely wrong here - I've got only a very rough idea of the radeonsi hw and driver...
Comment 2 Jan 2017-11-13 21:28:36 UTC
I also have this problem. Is there a way to force/override Z_ORDER to LATE_Z, ideally per application, so that I can try whether this has any effect?
If not: what other way is there to test it? I am willing to e.g. patch, compile and test some code if somebody tells me what to do - no promise when I'll find the time for that, though.
Comment 3 Roland Scheidegger 2017-11-26 00:59:50 UTC
(In reply to Jan from comment #2)
> I also have this problem. Is there a way to force/override Z_ORDER to
> LATE_Z, ideally per application, so that I can try whether this has any
> effect?
> If not: what other way is there to test it? I am willing to e.g. patch,
> compile and test some code if somebody tells me what to do - no promise
> when I'll find the time for that, though.

You can't override that; you'd need a Mesa patch looking something like this:
diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index fcf4928e65..13e44dac16 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1417,6 +1417,11 @@ static void si_emit_db_render_state(struct si_context *sctx, struct r600_atom *s
                db_shader_control |= S_02880C_Z_ORDER(V_02880C_LATE_Z);
        }
 
+       if (sctx->queued.named.blend->alpha_to_coverage) {
+               db_shader_control &= C_02880C_Z_ORDER;
+               db_shader_control |= S_02880C_Z_ORDER(V_02880C_LATE_Z);
+       }
+

Albeit you'd probably need to add a blend state dependency like this too:
@@ -658,6 +658,10 @@ static void si_bind_blend_state(struct pipe_context *ctx, void *state)
             old_blend->dual_src_blend != blend->dual_src_blend)
                si_mark_atom_dirty(sctx, &sctx->cb_render_state);
 
+       if (!old_blend ||
+            old_blend->alpha_to_coverage != blend->alpha_to_coverage)
+               si_mark_atom_dirty(sctx, &sctx->db_render_state);
+
        si_pm4_bind_state(sctx, blend, state);
 
        if (!old_blend ||

But as said, I really don't have much knowledge of the driver.
Comment 4 Patryk Nadrowski 2017-12-30 13:45:29 UTC
You can fix that issue by adding this entry to the drirc file:
-------
<application name="ARMA 3" executable="arma3.x86_64">
<option name="glsl_correct_derivatives_after_discard" value="true"/>
</application>
-------
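(For a local test, this can go e.g. into your ~/.drirc; Mesa typically also reads the system-wide /etc/drirc.)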
I hope that it will be permanently added to Mesa's official drirc file.
Comment 5 ysblokje 2017-12-30 14:07:34 UTC
(In reply to nadro-linux from comment #4)
> You can fix that issue by adding this entry to the drirc file:
> -------
> <application name="ARMA 3" executable="arma3.x86_64">
> <option name="glsl_correct_derivatives_after_discard" value="true"/>
> </application>
> -------
> I hope that it will be permanently added to Mesa's official drirc file.

I just tried this; it has no effect for me. At least not on Mesa 17.3.1.
Comment 6 ysblokje 2017-12-30 14:21:13 UTC
Addendum: using the glsl_correct_derivatives_after_discard=true command-line option does work.

But a big but: the FPS took a nosedive.
Comment 7 tom34 2018-03-16 14:07:29 UTC
I'm using Mesa 3D (17.3.6) from the stable padoka repo and I see white bushes in game; here is an example:

(Headphone warning, mic volume too high)
https://www.twitch.tv/videos/239136303?t=01m10s
Comment 8 tom34 2018-03-16 14:20:29 UTC
I see that Bohemia mentioned this bug here:
https://community.bistudio.com/wiki/Arma_3_Experimental_Ports#Known_Issues


"AMD Mesa drivers can cause graphical glitches, such as white blinking vegetation LODs."

"ATOC might cause rendering issues with AMD cards using MESA drivers."
Comment 9 Jan Havran 2018-03-17 20:37:44 UTC
I can confirm this bug (other users are facing it as well, see [1]).

My spec:
Distribution: Antergos 64b
Linux: 4.15.9
Mesa 17.3.6
Arma 1.80

Processor: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz
GPU: AMD Radeon R7 M265

[1] https://www.gamingonlinux.com/articles/the-linux-beta-of-arma-3-has-been-updated-to-180-compatible-with-windows-again-for-a-time.11349/page=1#r115868
Comment 10 Gregor Münch 2018-04-10 18:04:49 UTC
Some comments from a VP dev: https://www.gamingonlinux.com/articles/the-linux-beta-of-arma-3-has-been-updated-to-180-compatible-with-windows-again-for-a-time.11349/comment_id=118838

It seems it's not clear whether the bug is on Mesa's or VP's side. Maybe some Mesa dev could comment.
Comment 11 Marek Olšák 2018-04-10 20:52:17 UTC
If "glsl_correct_derivatives_after_discard=true" fixes it, it's not an alpha-to-coverage issue. It's a problem with the use of discard in GLSL.

There is no difference in alpha-to-coverage behavior between DX and GL. Make sure you have this fix:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=f222cfww3c6d6fc5d9dee3742d20aa77cfff9c39f8

If there is a difference between DX and GL, the GL specification can be corrected or a new GL extension can be added for the DX behavior.
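For illustration, the problematic pattern looks roughly like this (a hypothetical fragment shader, not the game's actual code, shown as the C string it would be compiled from):

/* texture() uses implicit derivatives. If another invocation in the
 * same 2x2 quad has already executed discard, those derivatives can be
 * wrong; glsl_correct_derivatives_after_discard defers the discard so
 * derivatives behave as if it hadn't happened. */
static const char *frag_src =
    "#version 330 core\n"
    "uniform sampler2D leaf_tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main() {\n"
    "    vec4 c = texture(leaf_tex, uv);           // before discard: fine\n"
    "    if (c.a < 0.5)\n"
    "        discard;\n"
    "    color = c * texture(leaf_tex, uv * 4.0);  // after discard\n"
    "}\n";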
Comment 12 Ilia Mirkin 2018-04-10 21:05:19 UTC
Probably meant this change:

https://cgit.freedesktop.org/mesa/mesa/commit/?id=f222cf3c6d6fc5d9dee3742d20aa77cfff9c39f8
Comment 13 Marek Olšák 2018-04-10 21:12:10 UTC
Yes. Thanks.
Comment 14 Krystian Gałaj 2018-04-19 07:56:09 UTC
(In reply to Gregor Münch from comment #10)
> Some comments from a VP dev:
> https://www.gamingonlinux.com/articles/the-linux-beta-of-arma-3-has-been-
> updated-to-180-compatible-with-windows-again-for-a-time.11349/
> comment_id=118838
> 
> It seems it's not clear whether the bug is on Mesa's or VP's side. Maybe
> some Mesa dev could comment.

I am not sure what we could do on the VP side to work around the bug. It happens in a single execution of the fragment shader on a multisampled color buffer and depth buffer. The same execution writes a color value and is supposed to write a depth value into the corresponding sample of the depth buffer. I don't know of any additional keywords in GLSL that we could specify to make sure this is the case. If anyone knows of something we're specifying wrong, please advise.

As for the rendering techniques used, we are only converting the rendering technique used by the original Arma 3 developer team from the Direct3D API to OpenGL. So one way of working around the problem would be to ask them to do LOD switching in another way in a future release - and then we could port that new version. But since it's not happening on the same graphics cards on Windows or Mac, only on Linux, it isn't likely this rework would be given high priority. And while we are good at API knowledge and at conversion between APIs, inventing a replacement for an existing technique in a game that isn't ours requires a slightly different approach and, above all, good knowledge of the entire complicated rendering engine used in the game, so as not to break anything.

I don't think that working around the problem is a good thing to mention in a bug ticket... this might be happening in other games, maybe not so high-profile, so it would make sense to fix it in the driver. It can be done, since it's working on the same cards using Windows drivers...
Comment 15 Marek Olšák 2018-04-21 04:58:28 UTC
We can add "glsl_correct_derivatives_after_discard=true" to Mesa's drirc and call it a day.
Comment 16 Gregor Münch 2018-08-04 19:24:21 UTC
Tested today and the bug is still there with current git (also with NIR). The workaround also still works, and at least on my Tahiti card I actually experience no performance drop.
Comment 17 Marek Olšák 2018-10-04 05:03:53 UTC
I pushed the workaround as commit 8e0b4cb8a1fcb1572be8eca16a806520aac08a61. Closing.

