Bug 110395

Summary: Gen8 and lower. Shadows are flickering in SuperTuxKart
Product: Mesa
Reporter: Deve <deveee>
Component: Drivers/DRI/i965
Assignee: Danylo <danylo.piliaiev>
Status: RESOLVED WORKSFORME
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: asdfghrbljzmkd, danylo.piliaiev
Version: 18.3
Keywords: bisected
Hardware: All
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=110463
https://bugs.freedesktop.org/show_bug.cgi?id=111631
Bug Blocks: 111444    
Attachments: flickering on skl
no flickering occurs during gnome screen record

Description Deve 2019-04-10 21:10:06 UTC
In current git STK grass shadows are constantly flickering.

Here is the original bug report:
https://github.com/supertuxkart/stk-code/issues/3824

Basically it is related to the file:
https://github.com/supertuxkart/stk-code/blob/master/data/shaders/sp_grass_shadow.vert

I can "fix" this by swapping the order of the "layer" and "wind_direction" uniform variables. Maybe that is just a coincidence.

It can be reproduced on any track with grass or trees when shadows are enabled in the options, e.g. in Nessie's Pond.

I'm not sure which graphics cards are affected. Maybe <= gen8, because the original bug report is from Haswell and I was testing on Ivy Bridge.

Git bisect shows:

eca4a6548d07bbbb02a7768edb397bad7b72cfc2 is the first bad commit
commit eca4a6548d07bbbb02a7768edb397bad7b72cfc2
Author: Danylo Piliaiev <danylo.piliaiev@gmail.com>
Date:   Mon Jul 2 17:04:23 2018 +0300

    i965: Disable dual source blending when shader doesn't support it on gen8+
    
    Dual source blending behaviour is undefined when shader doesn't
    have second color output, dismissing fragment in such situation
    leads to a hang on gen8+ if depth test in enabled.
    
    Since blending cannot be gracefully fixed in such case and the result
    is undefined - blending is simply disabled.
    
    v2 (Kenneth Graunke):
     - Listen to BRW_NEW_FS_PROG_DATA in 3DSTATE_PS_BLEND
     - Also whack BLEND_STATE[] to keep the two in sync, since we're not
       sure exactly which copy of the redundant info the hardware will use.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107088
    Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

:040000 040000 e4887675a8acfdd017f7022f553a8bfeefc2b31b 8eb8649a23e09c70cb3088873249763e05f5452e M	src
Comment 1 Mark Janes 2019-04-11 01:09:35 UTC
Danylo, can you take a look and figure out if it is an app bug?
Comment 2 Denis 2019-04-11 12:22:39 UTC
Hello Deve, Mark.
I tested this issue and found that it was fixed in the commit below. The bisect was done between the 18.3.4 and 19.0.0-rc7 tags.
At first it drove me crazy, because I got different results even for the same Mesa version (e.g. 18.3.4 might show the issue on the first launch, but not after relaunching the application).
This was solved by adding MESA_GLSL_CACHE_DISABLE=1 before the binary name.
I think some shaders were simply cached, and that could lead to wrong results.

Latest master (if you run the app like this: MESA_GLSL_CACHE_DISABLE=1 ./supertuxkart) doesn't have this issue.
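
For scripted reproduction runs, the same setting can be applied from a small launcher. This is only a sketch: MESA_GLSL_CACHE_DISABLE is the variable mentioned above, while the helper names and the ./supertuxkart path are assumptions made for illustration.

```python
import os
import subprocess


def env_without_shader_cache(base_env=None):
    """Return a copy of the environment with Mesa's GLSL shader cache disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_GLSL_CACHE_DISABLE"] = "1"
    return env


def launch(binary="./supertuxkart"):
    """Run the game once with the cache disabled (the binary path is an assumption)."""
    return subprocess.run([binary], env=env_without_shader_cache(), check=False)
```

Copying the environment instead of modifying os.environ keeps the setting local to the launched process, so other runs in the same script are not affected.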



commit 5a9b7bce9cee5563e94e75c93fffe462405dfcb1 (HEAD, refs/bisect/good-5a9b7bce9cee5563e94e75c93fffe462405dfcb1)
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Mon Feb 4 22:57:06 2019 -0500

    nv50,nvc0: use condition for occlusion queries when already complete
    
    For the NO_WAIT variants, we would jump into the ALWAYS case for both
    nested and inverted occlusion queries. However if the query had
    previously completed, the application could reasonably expect that the
    render condition would follow that result.
    
    To resolve this, we remove the nesting distinction which unnecessarily
    created an imbalance between the regular and inverted cases (since
    there's no "zero" condition mode). We also use the proper comparison if
    we know that the query has completed (which could happen as a result of
    an earlier get_query_result call).
    
    Fixes KHR-GL45.conditional_render_inverted.functional
    
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Cc: 19.0 <mesa-stable@lists.freedesktop.org>
    (cherry picked from commit e00799d3dc0595dc3998dbf199ceec8b1eece966)
Comment 3 Denis 2019-04-11 12:29:04 UTC
I am sorry, the commit attached above is the last one *with* the issue.

The first commit that didn't have this issue is here:


commit d278b3c187d426fea5ded7e8d97022efc9e9d7e3 (HEAD, refs/bisect/bad)
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Tue Feb 5 03:05:33 2019 -0500

    nvc0: stick zero values for the compute invocation counts
    
    Not quite perfect, but at least we don't end up with random values in
    the query buffer.
    
    Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
    
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Cc: 19.0 <mesa-stable@lists.freedesktop.org>
    (cherry picked from commit 6adb9b38bfb1f6ee4c94596bf0744225aa8e967a)
Comment 4 Denis 2019-04-11 12:52:53 UTC
And sorry once more. Now it flickers even on 19.0.0-rc7, so it really looks completely random. I'll continue investigating.
Comment 5 Deve 2019-04-11 13:37:18 UTC
The original bug report says
"If you set all graphics settings to the lowest possible value except Improved Graphics and shadow (low or high), the shadows flash like crazy."

so maybe it is more visible on lower settings and only shadows enabled. For me it was always reproducible on highest settings.

It also behaves a bit "funny" for me. If I increase the wind direction to much higher values on the C++ side, it flickers much less. But using higher/different values in the shaders doesn't help. And if I move "uniform vec3 wind_direction;" to the first line of the shader, it works without issues.
Comment 6 Denis 2019-04-11 13:48:39 UTC
Right now I can reproduce the issue on max settings (6) in the Graphics tab.
And it is still random: one time it flickers, after restarting it doesn't. I tried to make an apitrace, but the trace contains nothing except the sky, so it is useless.

Are you sure that you fixed it with your "hack"? Could it be that the issue simply didn't reproduce?
Comment 7 Deve 2019-04-11 13:55:00 UTC
I just can't reproduce it after that modification. And it just gives me impression that it's related to uniform variables...
Comment 8 Danylo 2019-04-11 14:49:37 UTC
All of this tells me that the issue is not related to my commit, because the commit fixes an issue with dual source blending, which isn't even used in the game. And looking at the troubles Denis is going through, maybe the bisect landed on the wrong commit?

I'm not near a machine with the necessary GPU at the moment, so I can't say definitively what's wrong. However, Denis will probably provide more information later.
Comment 9 benau2006 2019-04-11 17:23:30 UTC
You may want to run

MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage" apitrace trace bin/supertuxkart

to avoid buffer storage usage in STK, because we used MAP_COHERENT_BIT.
Comment 10 Mark Janes 2019-04-11 17:40:15 UTC
I see all kinds of STK flickering with 0.9.3 from debian testing.  I see the same behavior on mesa master and 18.1.

This is on my Intel skylake system (HD520).

Since the behavior is consistent on our driver, it seems likely to me that a change to STK is causing this.

Is anyone able to test on an older STK?
Comment 11 QwertyChouskie 2019-04-11 19:15:51 UTC
On my system, with Ubuntu 18.04 and Mesa 18.3.3 from http://ppa.launchpad.net/ubuntu-x-swat/updates/ubuntu, I have this issue.  It started recently, so I assumed it was a certain change in the code, but that change was then reverted, and the issue is still present.  It is especially noticeable in Black Forest due to the large amount of trees.
Comment 12 Deve 2019-04-11 19:49:51 UTC
@Mark Janes I checked STK 0.9.2 and it behaves the same. Older versions would need some fixes in the code to make shadows work with Mesa. Actually, I was using a laptop with Intel HD4000 for a long time and I never saw this bug before.

And if it's random for others, then maybe the bisected commit is unrelated and it just exposed a bug that is somewhere else...
Comment 13 Denis 2019-04-12 11:21:45 UTC
@Deve - 
>I can "fix" this by changing order of the "layer" and "wind_direction" uniform variables. Maybe it's just a case.

Does it fix the commit you bisected (Danylo's)?
I ran the bisection one more time (and this time I think it is the most accurate one), but I did what you suggested (the first line in the shader is "wind_direction", the second is "layer").

As I wrote before, this issue is completely random. It reproduces in maybe 1 out of 5, or even 1 out of 10 launches; sometimes in 1 out of 2 launches.

So, 17.3.6 was stable (I did about 50 launches or even more), and 18.3.4 has the issue. The issue also existed in 19.0.0-rc7 (I simply had those versions compiled).

And the result of the bisection also led me to Danylo's commit.
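
Because the failure is intermittent, how many clean launches are needed before calling a commit "good" depends on the reproduction rate. A rough sketch, assuming each launch fails independently with a fixed probability (a simplification, since the real rate clearly varies); the function names are made up for illustration:

```python
import math


def miss_probability(repro_rate, launches):
    """Chance that a bad build shows no flicker in any of `launches` runs."""
    return (1.0 - repro_rate) ** launches


def launches_needed(repro_rate, max_miss=0.05):
    """Smallest number of clean launches so that a bad build slips through
    with probability at most `max_miss`."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - repro_rate))
```

Under that assumption, at a 1-in-10 rate about 50 clean launches leave roughly a 0.5% chance of a false "good", while 10 launches would leave about 35%.
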
>you may want MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage" apitrace trace bin/supertuxkart

Thanks a lot! It helped and I made an apitrace.
I will discuss my results with Danylo when he is available.

>This is on my Intel skylake system (HD520).
Very interesting. I made an apitrace and ran it on SKL (Fedora 29) about 10 times. I didn't see any flickering.
I also checked the app (and the apitrace) on my KBL with Mesa 19.0: also nothing.

Could you please check the apitrace I made?
https://drive.google.com/open?id=1hTRS3tgBuSOSNiFGoy7JaLVKkVeNmwb0

UPD: everything I did was done on:
Fedora 29
HSW CPU
Comment 14 Danylo 2019-04-15 13:51:15 UTC
Mark, could you recheck if it flickers for you on skylake since neither I nor Denis are able to reproduce it on gen8+ ?

Ok, my commit indeed leads to flickering for a very strange reason.

Commenting out the blend state update on a new fragment program mysteriously solves the issue (https://gitlab.freedesktop.org/mesa/mesa/commit/eca4a6548d07bbbb02a7768edb397bad7b72cfc2#a17c5c8577696d3d741e2bdb7e9ea61ca0356b89_3200_3219).

So something is not updated correctly. The closest thing I found in the PRM was in the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):

"When the BLEND_STATE pointer changes but not the CC_STATE pointer,
driver needs to force a CC_STATE pointer change to improve
blend performance in pixel backend."

Doing this fixed the issue, for some reason unknown to me. However, the same quote is present for gen8, but I couldn't reproduce the flickering there...
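
The workaround can be modelled as a dirty-flag rule: whenever the BLEND_STATE pointer is flagged for re-emission, flag the CC_STATE pointer as well. The sketch below is a toy model only; the flag names and the function are hypothetical and are not the i965 driver's actual identifiers or code.

```python
# Hypothetical dirty bits for this illustration, not the driver's real flags.
DIRTY_BLEND_STATE_POINTER = 1 << 0
DIRTY_CC_STATE_POINTER = 1 << 1


def pointers_to_emit(dirty):
    """Apply the PRM note: a BLEND_STATE pointer change forces a
    CC_STATE pointer change as well."""
    if dirty & DIRTY_BLEND_STATE_POINTER:
        dirty |= DIRTY_CC_STATE_POINTER
    return dirty
```

A CC_STATE-only change is left alone in this model; only the BLEND_STATE case pulls in the extra pointer update.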

Here is a patch - https://gitlab.freedesktop.org/mesa/mesa/merge_requests/660
Comment 15 Mark Janes 2019-04-16 18:26:32 UTC
Created attachment 143998 [details]
flickering on skl
Comment 16 Mark Janes 2019-04-16 18:27:19 UTC
Created attachment 143999 [details]
no flickering occurs during gnome screen record
Comment 17 Mark Janes 2019-04-16 18:29:22 UTC
I must be looking at a different bug.  When I run supertuxkart on skl, I see lots of flickering.  See attached video.  Curiously, I had to take a video of it with my phone, because the flickering *stops* when I enable gnome's screen recording (ctrl-alt-shift-r).

The non-flickering video is also attached for reference.

Danylo's patch has no effect on this behavior.
Comment 18 Deve 2019-04-16 20:00:07 UTC
I tested the patch https://gitlab.freedesktop.org/mesa/mesa/merge_requests/660 with both first bad commit and current git version on my intel HD4000 and it solves the issue for me.
Comment 19 Denis 2019-04-17 13:23:27 UTC
(In reply to Mark Janes from comment #17)
> I must be looking at a different bug.  When I run supertuxcart on skl, I see
> lots of flickering.  See attached video.  Curiously, I had to take a video
> of it with my phone, because the flickering *stops* when I enable gnome's
> screen recording (ctrl-alt-shift-r).
> 
> The non-flickering video is also attached for reference.
> 
> Danylo's patch has no affect on this behavior.

Hi Mark. I rechecked on my SKL again, and I couldn't reproduce this issue.
I have Fedora 29 there.

I moved this issue to a separate bug => https://bugs.freedesktop.org/show_bug.cgi?id=110463
Later I will set up Debian testing and try again.
Comment 20 Horst Schirmeier 2019-05-03 20:05:23 UTC
I'm seeing STK 1.0 flickering similar to the attached video on a Skylake system (i7-6700HQ, Intel HD Graphics 530) with Ubuntu 19.04 (Mesa 19.0.2-1ubuntu1) as well.  The proposed patch (https://gitlab.freedesktop.org/GL/mesa/commit/ef9b640df2cd2e56186dd94edd72d48e4cb611b4.diff) applied to Ubuntu's Mesa 19.0.2-1ubuntu1 packages does *not* fix the issue for me; I still get flickering at graphics detail levels 4 and 5, while level 3 is OK before and after the patch.
Comment 21 Danylo 2019-05-06 11:49:18 UTC
Thanks for mentioning detail levels!

The flickering on SKL+ seems to be caused by the same issue as in Vulkan:
  https://bugs.freedesktop.org/show_bug.cgi?id=109630 - vkQuake
  https://bugs.freedesktop.org/show_bug.cgi?id=109616 - Witcher 3
  https://bugs.freedesktop.org/show_bug.cgi?id=110295 - Dirt 4
Which I attempted to fix in https://gitlab.freedesktop.org/mesa/mesa/merge_requests/621

In short, something goes wrong with the binding tables of a 3D workload after a compute workload.

Why I think it is the same issue:
- STK uses compute shaders on these detail levels.
- Commenting out dispatch helps.
- Always uploading push constants helps.
Comment 22 Denis 2019-08-29 11:17:37 UTC
(In reply to Deve from comment #18)
> I tested the patch
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/660 with both first
> bad commit and current git version on my intel HD4000 and it solves the
> issue for me.

I'd like to post an update on this bug, because there was a lot of noise above.
The merge request mentioned above should fix the current issue on Gen8 and lower CPU models.
So we are waiting for the patch to be reviewed and merged (if it is OK).
Comment 23 QwertyChouskie 2019-09-06 01:24:01 UTC
Any updates on this, especially since a patch already exists?  Would love to get this in Mesa master, and preferably ported to 19.2 and 19.1 also.
Comment 24 Mark Janes 2019-09-09 20:47:54 UTC

*** This bug has been marked as a duplicate of bug 110463 ***
Comment 25 Mark Janes 2019-09-09 21:53:13 UTC
I just built latest mesa on a bdw system (gen8) and could not repro any flickering, using the repro case described in the STK bug.  Reopen if you can find a way to make it happen on master.
Comment 26 Denis 2019-09-10 08:06:17 UTC
I don't like such issues... I also tried to reproduce it, and couldn't. I couldn't reproduce it even using the bisected commit. I tried the apitrace and the latest game version (about 20 runs each, because the issue wasn't stable). Shadows look great now.
Comment 27 Denis 2019-09-18 08:56:23 UTC
Hi Mark, did you reproduce it again on BDW? We tried a lot, but without success (and, in fact, we never found out why it started working correctly even on the previously found "bad" commits).
Based on Danylo's findings and the proposed patch, I assumed that the same bug remained only on HSW, and created a new ticket for it: https://bugs.freedesktop.org/show_bug.cgi?id=111631
Possibly it should be added to the blockers instead of the current one.
Comment 28 Mark Janes 2019-09-18 16:19:40 UTC
Denis: you are right, I reopened this bug because of confusion around what platforms were still broken.  The HSW specific bug is the one that should block the release.

*** This bug has been marked as a duplicate of bug 110463 ***
Comment 29 QwertyChouskie 2019-09-19 01:48:09 UTC
Definitely not a dupe of https://bugs.freedesktop.org/show_bug.cgi?id=110463, even though that bug also got reported in these comments (but was then split into its own bug, as it should be).  The other bug is for texture flickering, this is for shadow flickering.  If shadows are confirmed fine, then it should be resolved as fixed, not dupe.
Comment 30 Denis 2019-09-19 06:48:07 UTC
I agree with @QwertyChouskie. That's not a duplicate, because in the Skylake case there was a patch that fixed it. The current issue was fixed "somehow", so let's not reopen it again :) and move our discussion to the HSW issue.
