Bug 109404 - [ANV] The Witcher 3 shadows flickering
Summary: [ANV] The Witcher 3 shadows flickering
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Matt Turner
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2019-01-21 12:11 UTC by nagrigoriadis
Modified: 2019-02-15 20:19 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
vulkaninfo (111.78 KB, text/plain)
2019-01-21 12:11 UTC, nagrigoriadis
Details
Even with patch applied, glitch still happens, but less so. (921.52 KB, image/png)
2019-01-21 19:00 UTC, nagrigoriadis
Details
Witcher 3 save (939.87 KB, application/zip)
2019-01-22 10:01 UTC, nagrigoriadis
Details
Shader in question, correct and incorrect NIRs (8.11 KB, application/gzip)
2019-01-22 17:29 UTC, Danylo
Details
Patch to comment out line as per Danylos suggestion (588 bytes, patch)
2019-01-24 06:22 UTC, nagrigoriadis
Details | Splinter Review

Description nagrigoriadis 2019-01-21 12:11:45 UTC
Created attachment 143177 [details]
vulkaninfo

When playing The Witcher 3 (from Steam) on a Kabylake Core i7-8650U on a Dell notebook with a UHD620 I get a weird texture flickering when items are in shadow.
Generally the game is too dark,
I did a video recording https://drive.google.com/open?id=1MohQ4EGNQ94PDaS6a9wbjy5yIdYirE5G to highlight the issue.

This is using DXVK 0.94 bundled with Proton 3.16-6.
Software information

The Witcher 3 from Steam, 1280x720 all lowest except for texture resolution which is medium.

System information:
    GPU: Intel UHD620
    Driver: AMV from mesa git (also happened on MESA 18.3.0)
    Wine version: Wine 3.16 bundles with Proton 3.16-6
    DXVK version: DXVK 0.94 bundled with Proton 3.16-6
    OS: Gentoo Linux, 4.20.1 (and the same on 4.19.8), KDE5 desktop with compositing disabled.

Initially reported here: https://github.com/doitsujin/dxvk/issues/876
Comment 1 Danylo 2019-01-21 16:19:13 UTC
Hi, thanks for the report!

Bisected to:

commit 3561108de089f383bb7733eaf49e3d517994b51c
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date:   Wed Nov 7 14:29:18 2018 +1100

    anv/i965: make use of nir_link_constant_varyings()

    shader-db results for SLK:

    total instructions in shared programs: 13106498 -> 13091573 (-0.11%)
    instructions in affected programs: 1186244 -> 1171319 (-1.26%)
    helped: 6186
    HURT: 0

    total cycles in shared programs: 332062633 -> 331961653 (-0.03%)
    cycles in affected programs: 8537165 -> 8436185 (-1.18%)
    helped: 5371
    HURT: 862

    LOST:   6
    GAINED: 14

    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Which is just enabling of the feature from:

commit d40dd05553e548b648c77394424b59cc0abca199
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date:   Fri Nov 9 09:24:11 2018 +1100

    nir: add new linking opt nir_link_constant_varyings()

    This pass moves constant outputs to the consuming shader stage
    where possible.

    Reviewed-by: Eric Anholt <eric@anholt.net>

In RenderDoc the issue is manifested as "inverted" normal map of the scene.

I tested at the very beginning of the game and it looks like the issue you have encountered. However it does not happen on 18.3.0.

Will look into it tomorrow if it won't be fixed until then.
Comment 2 nagrigoriadis 2019-01-21 18:01:32 UTC
Thanks for the quick response Danylo.

I re-build an un-patched Mesa 18.3.2 just now, and re-tested (no other change), and you are right. The issue is vastly improved. There is a single flicker I can get on the same scene, but once per session only.

It might have manifested itself as me patching Mesa with the transform feedback patches? (Which don't apply themselves cleanly to Mesa git anymore, but that is a different issue)
Comment 3 nagrigoriadis 2019-01-21 18:36:33 UTC
I rebuilt mesa git and did a revert of commit 3561108de089f383bb7733eaf49e3d517994b51c and the issue also seems resolved.
Comment 4 nagrigoriadis 2019-01-21 19:00:53 UTC
Created attachment 143178 [details]
Even with patch applied, glitch still happens, but less so.

Even with the match to reverse the aforementioned reverted patch, I do still get some glitches such as the attached image.

It occurs less frequent, but most often with buildings or in caves.
Comment 5 nagrigoriadis 2019-01-22 06:44:22 UTC
After playing The Witcher 3 some more, I came to the conclusion that the constant folding in the identified patch really just makes the bug trigger more likely.

It still happens quite often, just less as often. Now that I know what to look for.

I'll donwngrade to 18.3.2 again to confirm wether this is really a regression in git, or a longer standing bug.
Comment 6 nagrigoriadis 2019-01-22 07:16:14 UTC
Confirmed to still be there with v18.3.2

It looks like some lighting normals get inverted and flicker between dark and light.

See this video: https://drive.google.com/file/d/1OxKno4pqjqipgOxYbDOlDC60ogqvk3Ft/view
That wooden house should not be so dark?

I'll try with 18.2.8 next, but can't do so immediately.
Comment 7 Danylo 2019-01-22 09:12:49 UTC
Could you attach your game save at the location where the issue is visible? Since without nir_link_constant_varyings I cannot reproduce it in the very beginning of the game.
Comment 8 nagrigoriadis 2019-01-22 10:01:45 UTC
Created attachment 143197 [details]
Witcher 3 save

Hi Danylo

I attached the save file just by the last video I did.
Comment 9 Danylo 2019-01-22 10:37:43 UTC
Thanks for the save, unfortunately I don't have expansions it is made with. I'll travel then with debug console.
Comment 10 Danylo 2019-01-22 11:15:59 UTC
Interesting, I have traveled around with and without the rain and cannot spot any issues. I have tested it on commit right before 3561108de089f383bb7733eaf49e3d517994b51c and on master without this commit changes.

Could you tell the exact location it happens, preferably nearby signpost?

All settings set to low (no manual changes in config files).

I have HD Graphics 620.
Comment 11 nagrigoriadis 2019-01-22 11:40:58 UTC
Its in the town of Crow's Perch in Velen.
Go to the Crow's Perch signpost, just go north over the long bridge into the village there. The save was at night (and raining), but I see artefacts all the way going up to the Manor house at the top.

All settings at low, except for texture resolution to medium. (but it happens for me in low). No manual changes in config filed. Using a UHD620 as well.

I just tested it with Mesa 18.2.8, and had the same issues (Only noticeable difference was that it ran much slower).

I then deleted both the mesa shader cache and the witcher3.dxvk-cache in case there was an issue from there, but the issue persists.

So I have had this happen with:
* Mesa 18.2.8
* Mesa 18.3.2
* Mesa git (worst)
* Mesa git + revert of 3561108de089f383bb7733eaf49e3d517994b51c

The unpatched Mesa git is by far the worst, but the issues manifest themselves as either the same, or very similar.

I wonder if the constant folding/lowering only just made a certain bug trigger much more often?

Or the issue could be due to something else altogether on my system.
Comment 12 Danylo 2019-01-22 12:12:41 UTC
I indeed see the issue at that place even without bisected commit.
Comment 13 Danylo 2019-01-22 17:28:08 UTC
> I wonder if the constant folding/lowering only just made a certain bug trigger much more often?

Yes, looks like it.

After staring at shader in RenderDoc I found that something wrong with 1-bit boolean optimizations.

Commenting out 

(('bcsel', a, -1, 0), ('ineg', ('b2i', 'a@1'))),

in src/compiler/nir/nir_opt_algebraic.py solves the issue, this optimization is part of optimizations for 1-bit booleans introduced in:

6bcd2af0
nir/algebraic: Add some optimizations for D3D-style Booleans 

I have some doubts that this exact optimization is a root of the issue but still it's near enough.

So in the shader there is next contruction:

 vec4 _893 = shader_in[4];
 _893.x = uintBitsToFloat(gl_FrontFacing ? 4294967295u : 0u);
 shader_in[4] = _893;

 ...

 _613.x = uintBitsToFloat((0u >= floatBitsToUint(shader_in[4].x)) ? 4294967295u : 0u);
    r1 = _613;
    if (floatBitsToUint(r1.x) != 0u)
    {
        // Something that happens when it should not
    }


When problematic optimization kicks in - it is reduced to:

	vec1 32 ssa_14 = intrinsic load_front_face () ()
	vec1 32 ssa_15 = b2i32 ssa_14
	vec1 32 ssa_16 = imov -ssa_15

        ...

        vec1 32 ssa_200 = uge32 ssa_1, ssa_16
	/* succs: block_1 block_2 */
	if ssa_200 {
		block block_1:

And without it:

        vec1 32 ssa_14 = intrinsic load_front_face () ()

        ...
        
        vec1 32 ssa_198 = inot ssa_14
	/* succs: block_1 block_2 */
	if ssa_198 {
		block block_1:


The first one is incorrect, I was unable to find how it happens today.
I'll attach shader and the resulted NIR.
Comment 14 Danylo 2019-01-22 17:29:07 UTC
Created attachment 143202 [details]
Shader in question, correct and incorrect NIRs
Comment 15 Mark Janes 2019-01-22 18:16:56 UTC
since the bisection was wrong, re-assigning...
Comment 16 nagrigoriadis 2019-01-24 06:22:38 UTC
Created attachment 143218 [details] [review]
Patch to comment out line as per Danylos suggestion

Thanks Danylo

I applied the attached patch (as per four findings), without reversing the original patch, to latest Mesa git.

And I have to say, the game renders perfectly now :-)

Thanks for your hard work!
Comment 17 nagrigoriadis 2019-01-24 06:40:43 UTC
Well, not perfectly. Now there is some polygons that flicker, but I'm certain it is unrelated to this issue.

It also feels slightly slower? (not scientific at all, as I am in a different area of the game, and that area could perform worse :P )
Comment 18 Danylo 2019-01-28 09:36:38 UTC
What have I found on Friday:

The NIR I have attached is not a problem - it's a correct one as Jason corrected me on IRC.

The issue seems to be even further down, in native code such comparison generated:

asr.le.f0.0(8)  null<1>D        -g0<0,1,0>W     15D             { align1 1Q };

...

(+f0.0) if(8)   JIP: 128        UIP: 128                        { align1 1Q };

The 15th bit of that register means:

  0: Front Facing 
  1: Back Facing

Then it is negated to have:

  0: Back Facing
  ~0: Front Facing

Then the *signed* .le (less or equal) with zero, as I see it - this is the wrong part, ~0 < 0 when comparison is signed however the initial intention was to have "false" in this comparison when triangle is front facing.

Manually changing null<1>D to null<1>UD fixes the issue.

How such comparison is created?

brw_fs_cmod_propagation.cpp, in opt_cmod_propagation_local, line 323 - CONDITION_LE modifier is propagated when condition operates on two REGISTER_TYPE_UD while scan_inst (opcode: asr) has dst: REGISTER_TYPE_D and src: REGISTER_TYPE_W.

From my point of view it's not a correct thing to do, but I'm not sure what to do here.
Comment 19 Matt Turner 2019-01-29 00:37:37 UTC
(In reply to Danylo from comment #18)
> What have I found on Friday:
> 
> The NIR I have attached is not a problem - it's a correct one as Jason
> corrected me on IRC.
> 
> The issue seems to be even further down, in native code such comparison
> generated:
> 
> asr.le.f0.0(8)  null<1>D        -g0<0,1,0>W     15D             { align1 1Q
> };
> 
> ...
> 
> (+f0.0) if(8)   JIP: 128        UIP: 128                        { align1 1Q
> };
> 
> The 15th bit of that register means:
> 
>   0: Front Facing 
>   1: Back Facing
> 
> Then it is negated to have:
> 
>   0: Back Facing
>   ~0: Front Facing
> 
> Then the *signed* .le (less or equal) with zero, as I see it - this is the
> wrong part, ~0 < 0 when comparison is signed however the initial intention
> was to have "false" in this comparison when triangle is front facing.
> 
> Manually changing null<1>D to null<1>UD fixes the issue.
> 
> How such comparison is created?
> 
> brw_fs_cmod_propagation.cpp, in opt_cmod_propagation_local, line 323 -
> CONDITION_LE modifier is propagated when condition operates on two
> REGISTER_TYPE_UD while scan_inst (opcode: asr) has dst: REGISTER_TYPE_D and
> src: REGISTER_TYPE_W.
> 
> From my point of view it's not a correct thing to do, but I'm not sure what
> to do here.

I'm just (In reply to Danylo from comment #18)
> What have I found on Friday:
> 
> The NIR I have attached is not a problem - it's a correct one as Jason
> corrected me on IRC.
> 
> The issue seems to be even further down, in native code such comparison
> generated:
> 
> asr.le.f0.0(8)  null<1>D        -g0<0,1,0>W     15D             { align1 1Q
> };
> 
> ...
> 
> (+f0.0) if(8)   JIP: 128        UIP: 128                        { align1 1Q
> };
> 
> The 15th bit of that register means:
> 
>   0: Front Facing 
>   1: Back Facing
> 
> Then it is negated to have:
> 
>   0: Back Facing
>   ~0: Front Facing
> 
> Then the *signed* .le (less or equal) with zero, as I see it - this is the
> wrong part, ~0 < 0 when comparison is signed however the initial intention
> was to have "false" in this comparison when triangle is front facing.
> 
> Manually changing null<1>D to null<1>UD fixes the issue.
> 
> How such comparison is created?
> 
> brw_fs_cmod_propagation.cpp, in opt_cmod_propagation_local, line 323 -
> CONDITION_LE modifier is propagated when condition operates on two
> REGISTER_TYPE_UD while scan_inst (opcode: asr) has dst: REGISTER_TYPE_D and
> src: REGISTER_TYPE_W.
> 
> From my point of view it's not a correct thing to do, but I'm not sure what
> to do here.

Thank you very much for the analysis (and attaching the shader). I'll take a closer look.
Comment 20 Danylo 2019-01-29 10:00:48 UTC
> Thank you very much for the analysis (and attaching the shader). I'll take a closer look.

Thanks and here is a link to RenderDoc trace https://drive.google.com/open?id=19kV_pv63sLrqpeKyx3-UnVzQIKW7d7gG , to see the issue and shader in the wild you should look at call №2428, the shader I have attached is from that call but made compilable.
Comment 21 Matt Turner 2019-02-12 00:07:31 UTC
Please test if 

   git://people.freedesktop.org/~mattst88/mesa bug-109404

fixes the misrendering.
Comment 22 nagrigoriadis 2019-02-12 05:20:36 UTC
Hi Matt

I built your branch and it does indeed seem to resolve the issue :-)

Thank you
Comment 23 Danylo 2019-02-12 09:32:04 UTC
Yes, it fixes the issue.
Comment 24 Ian Romanick 2019-02-14 19:15:41 UTC
MR submitted: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/256
Comment 25 Ian Romanick 2019-02-15 19:13:24 UTC
Fixed by :

commit 2dff9a66b629834bffad47e7a9025e0f1de5ffc3
Author: Matt Turner <mattst88@gmail.com>
Date:   Mon Feb 11 16:02:15 2019 -0800

    intel/compiler: Avoid propagating inequality cmods if types are different
    
    v2: Fix silly bug in logic.  s/||/&&/
    
    All but one of the affected shaders is in an Unreal4 demo.  The other is
    in Tomb Raider.  All of the cases that Ian investigated appear to be
    sequences like the following
    
        if (int(uint(some_float)) < 0) /* other relations too */
            ...
    
    At least in Tomb Raider, it's not obvious that this sequence came from
    the original shader.
    
    In some of the Unreal demos, the shader contains code like
    
        if (int(uint(textureLod(...))) > 0)
            ...
    
    which explicitly generates the offending sequence.
    
    All Gen6+ platforms had similar results (Skylake shown):
    total instructions in shared programs: 15437170 -> 15437187 (<.01%)
    instructions in affected programs: 4492 -> 4509 (0.38%)
    helped: 0
    HURT: 17
    HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel)   min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73%
    95% mean confidence interval for instructions value: 1.00 1.00
    95% mean confidence interval for instructions %-change: 0.57% 0.75%
    Instructions are HURT.
    
    total cycles in shared programs: 383007996 -> 383007992 (<.01%)
    cycles in affected programs: 20542 -> 20538 (-0.02%)
    helped: 6
    HURT: 7
    helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6
    helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36%
    HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
    HURT stats (rel)   min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27%
    95% mean confidence interval for cycles value: -3.30 2.69
    95% mean confidence interval for cycles %-change: -0.19% 0.19%
    Inconclusive result (value mean confidence interval includes 0).
    
    No changes on Iron Lake or GM45.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Tested-by: nagrigoriadis@gmail.com
    Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
Comment 26 nagrigoriadis 2019-02-15 20:19:14 UTC
Thanks. I built latest GIT and tested.
Seems all good :-)


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.