Bug 103544 - Graphical glitches r600 in game this war of mine linux native
Summary: Graphical glitches r600 in game this war of mine linux native
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600 (show other bugs)
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-02 12:39 UTC by Vitalii
Modified: 2017-11-15 02:22 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
black artifacts (1.16 MB, image/png)
2017-11-02 12:39 UTC, Vitalii
Details
enable simple_float logic for blending (1.01 KB, patch)
2017-11-04 14:49 UTC, Ilia Mirkin
Details | Splinter Review

Description Vitalii 2017-11-02 12:39:04 UTC
Created attachment 135209 [details]
black artifacts

the problem with artifacts is observed in mesa 17.2 mesa 17.3. When the version is rolled back to 17.0.5 the problem disappears and the game works without artifacts.
I enclose the screenshots:
Comment 1 Emil Velikov 2017-11-02 13:27:11 UTC
Vitalii can you bisect Mesa to the commit that broke the game? There aren't many developer working on r600 - so this would be greatly beneficial.
Comment 2 Vitalii 2017-11-02 13:36:48 UTC
tell me how and what to do and I'll try
Comment 3 Vitalii 2017-11-02 14:01:50 UTC
(In reply to Emil Velikov from comment #1)
> Vitalii can you bisect Mesa to the commit that broke the game? There aren't
> many developer working on r600 - so this would be greatly beneficial.

tell me how and what to do and I'll try
Comment 4 Roland Scheidegger 2017-11-03 15:48:48 UTC
(In reply to Vitalii from comment #3)
> (In reply to Emil Velikov from comment #1)
> > Vitalii can you bisect Mesa to the commit that broke the game? There aren't
> > many developer working on r600 - so this would be greatly beneficial.
> 
> tell me how and what to do and I'll try

git bisect is easy, albeit building 32bit mesa on a 64bit distro might be somewhere from challenging to near impossible depending on the distro...
You just do
git bisect start (probably on mesa master branch, but 17.2 branch should work too)
git bisect good <sha id of a commit where things are working)
git bisect bad <sha id of a commit where things are broken)
build mesa, test (make sure testing really uses the newly compiled driver...)
git bisect good (or bad), build, test, rinse and repeat...
Comment 5 Roland Scheidegger 2017-11-04 02:48:57 UTC
I've actually got that game myself here.
So I did the bisect and the winner is:
ce7a045feeef8cad155f1c9aa07f166e146e3d00 is the first bad commit
commit ce7a045feeef8cad155f1c9aa07f166e146e3d00
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Mon Jan 23 20:53:50 2017 -0500

    r600g: use ieee variants of multiplication instructions
    
    This matches the behavior of most other drivers, including nouveau,
    radeonsi, and i965.
    
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Looks like some numerical issue then, albeit I don't know if the game is at fault here.
Comment 6 Ilia Mirkin 2017-11-04 03:21:39 UTC
(In reply to Roland Scheidegger from comment #5)
> I've actually got that game myself here.
> So I did the bisect and the winner is:
> ce7a045feeef8cad155f1c9aa07f166e146e3d00 is the first bad commit
> commit ce7a045feeef8cad155f1c9aa07f166e146e3d00
> Author: Ilia Mirkin <imirkin@alum.mit.edu>
> Date:   Mon Jan 23 20:53:50 2017 -0500
> 
>     r600g: use ieee variants of multiplication instructions
>     
>     This matches the behavior of most other drivers, including nouveau,
>     radeonsi, and i965.
>     
>     Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
>     Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
> 
> Looks like some numerical issue then, albeit I don't know if the game is at
> fault here.

Apologies for the trouble.

The main difference between IEEE and non-IEEE is whether 0 * infinity = 0 or NaN. IEEE makes it mean NaN. DX9 behavior is 0. I added a flag to be used by st/nine to enable the DX9 behavior optionally, but leave the IEEE behavior for GLSL. (There was some additional desire to expose that in a GL ext for WINE to use, but it got shot down pretty quickly.)

Perhaps there are other changes from using the IEEE instruction variants, e.g. denorms, which would be undesirable. I was never too familiar with the R600 ISA.
Comment 7 Roland Scheidegger 2017-11-04 06:00:07 UTC
(In reply to Ilia Mirkin from comment #6)
> The main difference between IEEE and non-IEEE is whether 0 * infinity = 0 or
> NaN. IEEE makes it mean NaN. DX9 behavior is 0. I added a flag to be used by
> st/nine to enable the DX9 behavior optionally, but leave the IEEE behavior
> for GLSL. (There was some additional desire to expose that in a GL ext for
> WINE to use, but it got shot down pretty quickly.)
> 
> Perhaps there are other changes from using the IEEE instruction variants,
> e.g. denorms, which would be undesirable. I was never too familiar with the
> R600 ISA.

I don't think these chips can do denorms at all.
I quickly looked at some trace, and indeed it looks like NaNs popping up in some RT (which has a rgba16f format), and in that case it will then show as black in the final output later.
I could not figure out what fragment shader is responsible for it, the NaNs are always surrounded by pixels which are all black (hence making them indistinguishable in qapitrace visually), plus that texture which gets the NaNs is also blitted to from other textures via glBlitFramebuffer, which also already has NaNs and so on, and I didn't invest all that much time...
I guess though the question is why other mesa drivers render it correctly and how they avoid the NaNs if they also use ieee conformant behavior (if they actually render it correctly...).
Comment 8 Ilia Mirkin 2017-11-04 14:34:13 UTC
(In reply to Roland Scheidegger from comment #7)
> I guess though the question is why other mesa drivers render it correctly
> and how they avoid the NaNs if they also use ieee conformant behavior (if
> they actually render it correctly...).

If you can get me a trace, I can have a look to see what happens with nouveau.

There's additional implications from having NaN's in RT outputs, i.e. what happens in blending functions. I believe that nouveau configures the GPU for the "zero wins" style of multiplication in the blending unit. And separately, same thing for color clamps (I believe NVIDIA hw maps NaN -> 0 unconditionally for that case).
Comment 9 Ilia Mirkin 2017-11-04 14:49:29 UTC
Created attachment 135237 [details] [review]
enable simple_float logic for blending

Entirely untested (not even compiled) patch to force blending to be 0*NaN=0. Looks like radeonsi sets this bit, so it's not completely crazy.
Comment 10 Ilia Mirkin 2017-11-06 04:52:27 UTC
(In reply to Ilia Mirkin from comment #9)
> Created attachment 135237 [details] [review] [review]
> enable simple_float logic for blending
> 
> Entirely untested (not even compiled) patch to force blending to be 0*NaN=0.
> Looks like radeonsi sets this bit, so it's not completely crazy.

Note that it looks like SIMPLE_FLOAT merely makes an optimization which allows a GL_ZERO factor to cause a 0 to result... sometimes. But it could be enough here. If an infinity or NaN makes it into a dest buffer, without this it will ruin any future blending done, even if there's an attempt to override the destination value.

Patch is available at https://patchwork.freedesktop.org/patch/186599/ (although I'll have to rework the description)
Comment 11 Roland Scheidegger 2017-11-06 20:33:50 UTC
(In reply to Ilia Mirkin from comment #10)
> Patch is available at https://patchwork.freedesktop.org/patch/186599/
> (although I'll have to rework the description)

Doesn't help, everything looks the same.
FWIW I (accidentally...) also tested without float rt support (got really confused first with the results...), and there's still corruption when the game uses ordinary (gl_rgba) rt. I'm not sure the rendering is otherwise identical, but at least SOME corruption disappears (in particular, the black boxes), whereas other corruption remains exactly the same (some vertical stripes sometimes, and the fires still are black in the interior).
Comment 12 Roland Scheidegger 2017-11-06 21:14:55 UTC
Here's a apitrace for this (1GB, of course the corruption is only seen towards the end...), should be available a week (?):
https://we.tl/EanuxRG7Yf
Comment 13 Ilia Mirkin 2017-11-07 19:20:10 UTC
(In reply to Roland Scheidegger from comment #12)
> Here's a apitrace for this (1GB, of course the corruption is only seen
> towards the end...), should be available a week (?):
> https://we.tl/EanuxRG7Yf

Well, this all replays fine with nouveau, even if I remove the blending config which makes 0*nan=0. At least I couldn't see any artifacts.

So this all points to ... sadness. Can you check if you still see artifacts with nosb?
Comment 14 Roland Scheidegger 2017-11-08 00:25:25 UTC
(In reply to Ilia Mirkin from comment #13)
> (In reply to Roland Scheidegger from comment #12)
> > Here's a apitrace for this (1GB, of course the corruption is only seen
> > towards the end...), should be available a week (?):
> > https://we.tl/EanuxRG7Yf
> 
> Well, this all replays fine with nouveau, even if I remove the blending
> config which makes 0*nan=0. At least I couldn't see any artifacts.
> 
> So this all points to ... sadness. Can you check if you still see artifacts
> with nosb?

Tried that, doesn't help.

I noticed some things looking questionable in the driver:
- the driver doesn't use the ieee variants consistently (e.g. muls from lerps).
There's also some inconsistencies - for r600, the comments say for non-gl usage the ieee variant should be used for rcp, however eg/cayman will use that anyway, and I don't think handling it differently between these drivers is done on purpose.
- the driver uses min/max instead of min_dx10/max_dx10 (though at least the EG ISA docs are wrong/inaccurate what they exactly do). These would kill off NaNs (albeit I am not entirely sure if the non-dx10 version doesn't kill off NaNs neither, given the docs aren't accurate in the first place...). I think this probably should be changed (as far as I can tell, radeonsi also uses effectively the dx10 versions, since llvm.min/maxnum is specified as selecting non-nan operands, though I don't know if the backend honors it really). This actually does help things, it removes the black holes in the fires, and the "vertical blue stripes" (albeit the latter wasn't really in that trace). But otherwise there's still lots of black boxes around.
Comment 15 Roland Scheidegger 2017-11-08 00:36:35 UTC
(In reply to Roland Scheidegger from comment #14)
> - the driver doesn't use the ieee variants consistently (e.g. muls from
> lerps).
> There's also some inconsistencies - for r600, the comments say for non-gl
> usage the ieee variant should be used for rcp, however eg/cayman will use
> that anyway, and I don't think handling it differently between these drivers
> is done on purpose.

Actually that seems to be the problem - on r600, rcp/rsq_clamped are used (btw does someone know why rsq has its own code expansion on r600, but not eg/cayman)?
But eg/cayman use rcp/rsq_ieee. It looks like if I use rcp_clamped instead that is enough to fix everything (regardless what rsq/min/max are using).
Comment 16 Roland Scheidegger 2017-11-15 02:22:49 UTC
This should now be fixed by 3835009796166968750ff46cf209f6d4208cda86 (and the preceding commit).

(Albeit I'm wondering now if I should have used stable tag for these two...).

Closing this, albeit I suppose the simple float blending could still be done.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.