Bug 106594 - [regression,apitrace,bisected] Prison Architect rendered unplayable by multicoloured flickering triangles and overlayed triangles when performing certain actions
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Mathias Fröhlich
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks: 77449
Reported: 2018-05-21 16:53 UTC by Kai
Modified: 2018-06-05 05:51 UTC (History)

Attachments
Compressed trace (29.79 MB, application/x-7z-compressed)
2018-05-21 16:53 UTC, Kai
Glitch while placing new staff on the map (2.08 MB, image/png)
2018-05-21 16:54 UTC, Kai
Game output to STDOUT and STDERR (10.97 KB, text/plain)
2018-05-21 16:54 UTC, Kai
git bisect log (1.74 KB, text/plain)
2018-05-30 20:35 UTC, Kai
Make sure that immediate mode draws have landed before array draws are executed. (14.55 KB, patch)
2018-05-31 08:52 UTC, Mathias Fröhlich

Description Kai 2018-05-21 16:53:31 UTC
Created attachment 139660
Compressed trace

I wanted to play Prison Architect again, but at launch and during gameplay, triangles in various colours of the rainbow sometimes flicker over the screen (see the first frame of the trace). Some actions trigger these glitches reliably, such as opening the staff menu and hovering the mouse over the map with a staff type selected (see the attached screenshot or, e.g., frame 599 in the trace).

This is a regression, because I was able to play Prison Architect with no trouble in the past. However, I cannot say whether this is a regression in PA or in Mesa/LLVM, since it has been a while since I last played PA and there have been updates to PA as well as to the graphics stack. In any case, I prepared a trace (which looks a bit odd on replay: lots of black with the game window stuck in the lower left corner), which I'll attach to this bug as a compressed file. I'll also attach the game's log; a couple of lines in it seem interesting, such as:
> ==OPENGL==> [location 'Bitmap::ConvertToTexture Before Texture Creation'] error code 0x502 (invalid operation)
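
For reference, 0x502 is the numeric value of GL_INVALID_OPERATION, one of the standard codes returned by glGetError(). The following is a minimal sketch of decoding such codes in a log line. The helper names gl_error_name and describe_gl_error are hypothetical, and the numeric values are copied in so that no OpenGL headers are needed; this is not how the game produces its log.

```c
#include <assert.h>
#include <stdio.h>

/* Map a glGetError() value to its symbolic name. The numeric values are
 * the standard OpenGL error codes, hard-coded to avoid GL headers. */
static const char *gl_error_name(unsigned int code)
{
    switch (code) {
    case 0x0000: return "GL_NO_ERROR";
    case 0x0500: return "GL_INVALID_ENUM";
    case 0x0501: return "GL_INVALID_VALUE";
    case 0x0502: return "GL_INVALID_OPERATION";
    case 0x0503: return "GL_STACK_OVERFLOW";
    case 0x0504: return "GL_STACK_UNDERFLOW";
    case 0x0505: return "GL_OUT_OF_MEMORY";
    case 0x0506: return "GL_INVALID_FRAMEBUFFER_OPERATION";
    default:     return "unknown";
    }
}

/* Hypothetical helper: write "0x<code> (<name>)" into buf and return the
 * value of snprintf(), i.e. the length the full text needs. */
static int describe_gl_error(unsigned int code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%x (%s)", code, gl_error_name(code));
}
```

Calling describe_gl_error(0x502, buf, sizeof buf) with a sufficiently large buffer fills it with the code followed by its symbolic name.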

I'm using PA's "experimental VBO" feature, but turning it off doesn't fix this bug, so it seems unrelated.

This bug affects at least the Steam and the Humble Bundle builds of version 13f.

The graphics stack I used (fully updated Debian testing as a base) for testing is:
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/6f558fb0f7
libdrm: 2.4.91-2
LLVM: SVN:trunk/r332816 (7.0 devel)
X.Org: 2:1.19.6-1
Linux: 4.16.10
Firmware (firmware-amd-graphics): 20170823-1
libclc: Git:master/a2118d58fc
DDX (xserver-xorg-video-amdgpu): 18.0.1-1

Let me know if you need anything else.
Comment 1 Kai 2018-05-21 16:54:13 UTC
Created attachment 139661
Glitch while placing new staff on the map
Comment 2 Kai 2018-05-21 16:54:47 UTC
Created attachment 139662
Game output to STDOUT and STDERR
Comment 3 Kai 2018-05-22 08:38:12 UTC
I was able to check for this behaviour on a system with an integrated Intel GPU (HD Graphics 530 (Skylake GT2); PCIID: 0x1912) on Mesa 18.0.3 and it didn't show these glitches.

The OpenGL error line ("==OPENGL==> [location 'Bitmap::ConvertToTexture Before Texture Creation'] error code 0x502 (invalid operation)") is shown with the Intel GPU as well.
Comment 4 Kai 2018-05-26 15:23:17 UTC
Still an issue with the following stack (fully updated Debian testing as a base):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/79fe00efb4
libdrm: 2.4.92-1
LLVM: SVN:trunk/r333339 (7.0 devel)
X.Org: 2:1.19.6-1
Linux: 4.16.11
Firmware (firmware-amd-graphics): 20170823-1
libclc: Git:master/a2118d58fc
DDX (xserver-xorg-video-amdgpu): 18.0.1-1

Forcing the OpenGL level back to 3.0 (the same OpenGL level the Intel GPU from comment #3 supported) doesn't help either.

However, I did find that when I downgrade to Debian's Mesa package 18.0.4-1 (built against LLVM 6.0, package version 1:6.0-3+b1), I can no longer reproduce this bug. Therefore the bug must have been introduced either in Mesa between 18.0.4 and the current Git HEAD or in LLVM between the 6.0 release and the current SVN HEAD.
Comment 5 Kai 2018-05-26 18:05:00 UTC
I just tested Debian's 18.1.0-1 package (also built against LLVM 6.0 (package version 1:6.0-3+b1)) and that version of Mesa doesn't produce the glitch either, thus narrowing the regression space to:
- Mesa: between 18.1 and current HEAD of Git
- LLVM: between 6.0 and current HEAD of SVN.

Further testing with a build of Debian's 18.1.0-1 package against LLVM 7 (SVN r333339) didn't exhibit the bug either; therefore the bug must have been introduced between Mesa 18.1.0 and the current Git HEAD of Mesa. I'll see if I can do a bisection.
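
The manual narrowing above is the idea that a bisection automates. As a rough sketch only, assuming a linear history in which every commit after the first bad one is also bad, the search can be written as a binary search over commit indices. The names first_bad_commit, is_bad_fn and toy_is_bad are hypothetical and are not part of git or Mesa; in a real bisection the predicate is a build plus a manual or scripted test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if the build at commit index `index`
 * shows the glitch. */
typedef int (*is_bad_fn)(size_t index, void *user);

/* Binary search for the first bad commit among indices 0..n-1, assuming
 * index 0 is known good, index n-1 is known bad, and badness never
 * disappears again once it has been introduced. */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *user)
{
    assert(n >= 2);
    size_t good = 0;
    size_t bad = n - 1;

    /* Invariant: `good` is a good commit, `bad` is a bad commit. */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, user))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Toy predicate for demonstration: every commit at or after the index
 * stored in *user is bad. */
static int toy_is_bad(size_t index, void *user)
{
    return index >= *(const size_t *)user;
}
```

Each iteration halves the remaining range, so the number of builds to test grows only logarithmically with the number of commits between the known good and known bad versions.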
Comment 6 Kai 2018-05-30 20:33:55 UTC
The bisection result is:
> 9c7be67968aaba224d518dee86dff736a4b599c6 is the first bad commit
> commit 9c7be67968aaba224d518dee86dff736a4b599c6
> Author: Mathias Fröhlich <mathias.froehlich@web.de>
> Date:   Sun May 13 09:18:57 2018 +0200
>
>     mesa: Remove FLUSH_VERTICES from VAO state changes.
>    
>     Pending draw calls on immediate mode or display list calls do
>     not depend on changes of the VAO state. So, remove calls to
>     FLUSH_VERTICES and flag _NEW_ARRAY as appropriate.
>    
>     Reviewed-by: Brian Paul <brianp@vmware.com>
>     Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
>
> :040000 040000 ad95067168b41b30d17d7ff05ecd47be4ca150e4 97ab8bde466f83da431193b045a664e540595d80 M      src

Reverting that commit generates several conflicts. I'd probably have to revert the whole series?

Since this touches core Mesa, the bug shouldn't be limited to radeonsi; I'll adjust the component accordingly.
Comment 7 Kai 2018-05-30 20:35:12 UTC
Created attachment 139872
git bisect log
Comment 8 Mathias Fröhlich 2018-05-31 08:52:43 UTC
Created attachment 139883
Make sure that immediate mode draws have landed before array draws are executed.

For some reason the trace did not replay locally. But I could look into the command sequence and I think I have spotted a use pattern that is broken currently and the problem most likely got exposed by the bisected patch. The real problem probably happened with an earlier change from my bigger series.

Can you check if the attached change helps for your problem please?

best
Mathias
Comment 9 Kai 2018-05-31 16:17:46 UTC
(In reply to Mathias Fröhlich from comment #8)
> For some reason the trace did not replay locally.

If you saw a lot of black, you might have hit what I described in comment #0: for me the replay is a large black window (several times the size required for my screen) with the game screen stuck in the lower left corner (visible if you let apitrace generate screenshots).
Otherwise the trace works for me.

> But I could look into the command sequence and I think I have spotted a use
> pattern that is broken currently and the problem most likely got exposed by
> the bisected patch. The real problem probably happened with an earlier change
> from my bigger series.
> 
> Can you check if the attached change helps for your problem please?

I can confirm that attachment 139883 resolves the bug for me. You can have my
  Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
for the patch from attachment 139883.

I used the following graphics stack (fully updated Debian testing as a base) for testing:
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/b9fb2c266a + attachment 139883
libdrm: 2.4.92-1
LLVM: SVN:trunk/r333339 (7.0 devel)
X.Org: 2:1.19.6-1
Linux: 4.16.13
Firmware (firmware-amd-graphics): 20170823-1
libclc: Git:master/a2118d58fc
DDX (xserver-xorg-video-amdgpu): 18.0.1-1

To ensure no old shader from cache interfered, I deleted ~/.cache/mesa_shader_cache before running the game (did that during bisection as well).

Let me know if you need anything else.
Comment 10 Kai 2018-06-05 05:51:29 UTC
Fixed by the following commit on master:
> commit 1ac4439d6278e6c5f9da5499bbc50362f0c6759b
> Author: Mathias Fröhlich <mathias.froehlich@web.de>
> Date:   Fri Jun 1 19:10:08 2018 +0200
> 
>     mesa: Make sure that imm draws are flushed before other draws execute.
>     
>     The recent patch
>     
>         mesa: Remove FLUSH_VERTICES from VAO state changes.
>     
>         Pending draw calls on immediate mode or display list calls do
>         not depend on changes of the VAO state. So, remove calls to
>         FLUSH_VERTICES and flag _NEW_ARRAY as appropriate.
>     
>     uncovered a problem that non immediate mode draw calls do only
>     flush outstanding immediate mode draws if FLUSH_UPDATE_CURRENT
>     is set in ctx->Driver.NeedFlush.
>     In that case, due to the sequence of _mesa_set_draw_vao commands
>     we could end up with the VAO from the FLUSH_VERTICES call set
>     into gl_context::Array._DrawVAO when the array draw is executed.
>     So the change pulls FLUSH_CURRENT out of _mesa_validate_* calls
>     into the array draw calls being validated.
>     The change introduces a new macro FLUSH_FOR_DRAW beside FLUSH_VERTICES
>     and FLUSH_CURRENT that flushes on changed current attributes as well
>     as on outstanding immediate mode draw calls. Use FLUSH_FOR_DRAW
>     in the non immediate mode draw code paths.
>     
>     Reviewed-by: Marek Olšák <marek.olsak@amd.com>
>     Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106594
>     Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
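
The flush condition described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions and not Mesa's code: every toy_ name is hypothetical, the TOY_FLUSH_STORED_VERTICES bit is an assumed stand-in for "immediate mode vertices are queued", and the model only tracks the order in which draws are submitted. It does not model the _DrawVAO mix-up the commit message mentions.

```c
#include <assert.h>
#include <string.h>

enum {
    TOY_FLUSH_STORED_VERTICES = 0x1, /* immediate mode vertices are queued */
    TOY_FLUSH_UPDATE_CURRENT  = 0x2  /* current attributes were changed */
};

struct toy_ctx {
    unsigned need_flush; /* stands in for ctx->Driver.NeedFlush */
    char log[64];        /* order in which draws reach the "driver" */
};

/* Append one entry to the submission log. */
static void toy_emit(struct toy_ctx *ctx, const char *what)
{
    assert(strlen(ctx->log) + strlen(what) < sizeof ctx->log);
    strcat(ctx->log, what);
}

/* Queue an immediate mode draw without submitting it yet. */
static void toy_queue_immediate(struct toy_ctx *ctx)
{
    ctx->need_flush |= TOY_FLUSH_STORED_VERTICES;
}

/* Submit whatever is pending and clear the flags. */
static void toy_flush(struct toy_ctx *ctx)
{
    if (ctx->need_flush & TOY_FLUSH_STORED_VERTICES)
        toy_emit(ctx, "imm ");
    ctx->need_flush = 0;
}

/* Modelled pre-fix behaviour: an array draw flushes only when the
 * "update current" bit is set. */
static void toy_array_draw_before_fix(struct toy_ctx *ctx)
{
    if (ctx->need_flush & TOY_FLUSH_UPDATE_CURRENT)
        toy_flush(ctx);
    toy_emit(ctx, "array ");
}

/* Modelled post-fix behaviour, in the spirit of FLUSH_FOR_DRAW: flush on
 * changed current attributes as well as on queued immediate mode draws. */
static void toy_array_draw_after_fix(struct toy_ctx *ctx)
{
    if (ctx->need_flush)
        toy_flush(ctx);
    toy_emit(ctx, "array ");
}
```

In this model, queueing an immediate mode draw and then issuing toy_array_draw_before_fix submits the array draw first, while toy_array_draw_after_fix submits the queued immediate mode draw before the array draw.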

