Bug 107926

Summary: [anv] Rise of the Tomb Raider always misrendering, segfault and gpu hang.
Product: Mesa
Component: Drivers/Vulkan/intel
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Darius Spitznagel <d.spitznagel>
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
CC: jason
Whiteboard:
i915 platform:
i915 features:
Attachments:
  misrendering on little rock left of Lara
  misrendering2 on little rock left of Lara
  corruption
  missing objects
  game settings1
  game settings2
  misrendering_haswell

Description Darius Spitznagel 2018-09-13 19:41:49 UTC
Hello devs,

This game (the native port from Feral) has rendering issues in all recent Mesa 18.x releases (17.x not tested).

18.0.5 > misrendering > no crash > no hang.
18.1.8 > misrendering > no crash > no hang.
18.2.0 > misrendering > sometimes segfault or gpu hang and then crash.
master (d4bf954fe61ec231be2bfa5e059f0fb7f6150bd1) > misrendering > sometimes segfault or gpu hang and then crash.

This is all reproducible. Simply run the game and play the internal benchmark.

The misrendering is best seen in the first location, mostly on the mountains/rocks.
The crash or GPU hang also occurs mostly in the first location of the benchmark.

Specs:
Iris Pro Graphics 6200
RAM 16GB
Xorg 1.20.1
Kernel 4.14.69
vulkan-loader 1.1.82
Debian Stretch 9.5
Comment 1 Sergii Romantsov 2018-09-20 13:27:00 UTC
Hello, Darius.
Could you please provide screenshots of the misrendering?
Comment 2 Darius Spitznagel 2018-09-20 16:08:29 UTC
Created attachment 141661 [details]
misrendering on little rock left of Lara
Comment 3 Darius Spitznagel 2018-09-20 16:09:56 UTC
Created attachment 141662 [details]
misrendering2 on little rock left of Lara
Comment 4 Darius Spitznagel 2018-09-20 16:17:32 UTC
These screenshots were taken with 18.0.5 because 18.2.0 crashed (segfault) so many times that I could not take any screenshot.

Environment while taking screenshots:
MESA_GLSL_CACHE_DISABLE=1
Xorg 1.20.1
Mate DE 1.20.3
DDX modesetting
mesa 18.0.5

Strangely, there were no segfaults in dmesg, only GPU hangs...

[Thu Sep 20 17:57:27 2018] [drm] GPU HANG: ecode 8:0:0x85d7dfff, in WinMain [13370], reason: No progress on rcs0, action: reset
[Thu Sep 20 17:57:27 2018] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[Thu Sep 20 17:57:43 2018] i915 0000:00:02.0: Resetting rcs0 after gpu hang

The segfaults (with Mesa 18.2.0) were reported by a small Feral window.
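
For anyone collecting similar reports, the GPU hang lines quoted above can be pulled out of a dmesg dump with a small script. This is only a sketch: the regular expression is modelled on the single "[drm] GPU HANG" line in this comment, and other kernel versions may format that line differently. The names `HANG_RE`, `find_gpu_hangs` and `SAMPLE` are made up for this example.

```python
import re

# Pattern modelled on the "[drm] GPU HANG" line quoted in this comment.
# Assumption: other kernels may word this line differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def find_gpu_hangs(log_text):
    """Return one dict of fields per GPU HANG line found in dmesg text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append(match.groupdict())
    return hangs


# The two lines below are copied from the dmesg excerpt above.
SAMPLE = (
    "[Thu Sep 20 17:57:27 2018] [drm] GPU HANG: ecode 8:0:0x85d7dfff, "
    "in WinMain [13370], reason: No progress on rcs0, action: reset\n"
    "[Thu Sep 20 17:57:27 2018] i915 0000:00:02.0: Resetting rcs0 after gpu hang\n"
)

if __name__ == "__main__":
    for hang in find_gpu_hangs(SAMPLE):
        print(hang)
```

The "Resetting rcs0" lines are deliberately ignored; only the line carrying the error code, process name and reason is extracted.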
Comment 5 Darius Spitznagel 2018-09-20 16:23:28 UTC
Before I forget...

Most of the misrendering was NOT visible in any screenshot. Luckily you can see it on the little rocks at the end of location 1 of the internal benchmark.

I also tried to compile LunarG vktrace (Debian does not ship it! - why???) but had no luck - so no vktrace file for you :(
Comment 6 Darius Spitznagel 2018-09-20 16:26:45 UTC
Would it help to use RenderDoc? - is vktrace obsolete?
Comment 7 Darius Spitznagel 2018-09-20 16:40:27 UTC
With current master it's getting worse.
Now I see black triangles in the first benchmark location before the segfault.
Comment 8 Sergii Romantsov 2018-09-21 12:29:32 UTC
It would probably be good to have both: RenderDoc and vktrace.
Please try vktrace from here: https://vulkan.lunarg.com/sdk/home#linux

And have you tried using another modesetting driver, or disabling it?
Comment 9 Darius Spitznagel 2018-09-21 16:40:19 UTC
(In reply to Sergii Romantsov from comment #8)
> It would probably be good to have both: RenderDoc and vktrace.
I will try RenderDoc. vktrace did not compile, as I already said in comment 5.

> Please try vktrace from here: https://vulkan.lunarg.com/sdk/home#linux
See above.

> And have you tried using another modesetting driver, or disabling it?
Yes, I tried the Intel DDX, which changed nothing.
Comment 10 Jason Ekstrand 2018-10-02 22:25:53 UTC
Patch on the list to fix the crash/hang:

https://patchwork.freedesktop.org/patch/254432/
Comment 11 Jason Ekstrand 2018-10-03 15:11:41 UTC
The crash is fixed by the following commit on master:

commit f5bab06428fc7ca6116cf0daf1c237eb86202e7a
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Oct 2 17:19:32 2018 -0500

    anv/batch_chain: Don't start a new BO just for BATCH_BUFFER_START
    
    Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
    normal.  If we are near enough to the end, this can cause us to start a
    new BO just for the MI_BATCH_BUFFER_START which messes up chaining.  We
    always reserve enough space at the end for an MI_BATCH_BUFFER_START so
    we can just increment cmd_buffer->batch.end prior to emitting the
    command.
    
    Fixes: a0b133286a3 "anv/batch_chain: Simplify secondary batch return..."
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
    Tested-by: Alex Smith <asmith@feralinteractive.com>
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
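
The idea in the commit message above can be illustrated with a toy model. This is not the anv code: the class, the sizes and the command names are hypothetical, and only the mechanism follows the commit text (space for one MI_BATCH_BUFFER_START is always reserved at the end of a buffer, so the limit can be raised before emitting the return jump).

```python
BO_SIZE = 16      # dwords per buffer object (hypothetical)
CHAIN_SIZE = 2    # dwords reserved for one BATCH_BUFFER_START (hypothetical)


class Batch:
    """Toy command batch backed by fixed-size buffer objects (BOs)."""

    def __init__(self):
        self.bos = [[]]                    # commands recorded in each BO
        self.used = 0                      # dwords used in the current BO
        self.end = BO_SIZE - CHAIN_SIZE    # usable limit; the tail stays reserved

    def emit(self, name, size):
        if self.used + size > self.end:
            # Out of room: chain to a fresh BO using the reserved tail.
            self.bos[-1].append("BATCH_BUFFER_START (chain)")
            self.bos.append([])
            self.used = 0
            self.end = BO_SIZE - CHAIN_SIZE
        self.bos[-1].append(name)
        self.used += size

    def emit_return_old(self):
        # Old behaviour: emit the return jump like any other command.
        self.emit("BATCH_BUFFER_START (return)", CHAIN_SIZE)

    def emit_return_new(self):
        # New behaviour: open up the reserved tail first, so the jump always fits.
        self.end += CHAIN_SIZE
        self.emit("BATCH_BUFFER_START (return)", CHAIN_SIZE)


def nearly_full_batch():
    """Build a batch whose current BO is filled exactly up to the usable limit."""
    batch = Batch()
    for i in range(7):                     # 7 * 2 = 14 dwords
        batch.emit(f"DRAW {i}", 2)
    return batch


if __name__ == "__main__":
    old = nearly_full_batch()
    old.emit_return_old()
    print("old:", len(old.bos), "BOs")     # the return jump lands alone in a new BO

    new = nearly_full_batch()
    new.emit_return_new()
    print("new:", len(new.bos), "BOs")     # the return jump fits in the reserved tail
```

In the old path the return jump ends up alone in a second buffer, which is the situation the commit message describes as messing up chaining.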
Comment 12 Jason Ekstrand 2018-10-04 17:49:05 UTC
The corruption is fixed by the following commit in master:

commit dd553bc67f8ab1513fd196b6ffb7c4a76723adfd (public/master)
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Oct 3 12:14:20 2018 -0500

    nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansions
    
    The ssa_for_alu_src helper will correctly handle swizzles and other
    source modifiers for you.  The expansions for unpack_half_2x16,
    pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards
    to swizzles.  The brokenness of unpack_half_2x16 was causing rendering
    errors in Rise of the Tomb Raider on Intel ever since c11833ab24dcba26
    which added an extra copy propagation to the optimization pipeline and
    caused us to start seeing swizzles where we hadn't seen any before.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
    Fixes: 9ce901058f3d "nir: Add lowering of nir_op_unpack_half_2x16."
    Fixes: 9b8786eba955 "nir: Add lowering support for packing opcodes."
    Tested-by: Alex Smith <asmith@feralinteractive.com>
    Tested-by: Józef Kucia <joseph.kucia@gmail.com>
    Reviewed-by: Matt Turner <mattst88@gmail.com>
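
To see why a swizzle matters for unpack_half_2x16, here is a small sketch. It is not NIR code, and the exact form of the broken expansion is an assumption: the model simply contrasts a lowering that always reads component 0 of the source vector with one that honours the swizzle. All function names are invented for the example.

```python
import struct


def half_bits_to_float(bits):
    """Interpret a 16-bit integer as an IEEE 754 half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def unpack_half_2x16(word):
    """Split a 32-bit word into two half floats: (low 16 bits, high 16 bits)."""
    return (half_bits_to_float(word & 0xFFFF), half_bits_to_float(word >> 16))


def lower_ignoring_swizzle(vec, swizzle):
    # Hypothetical broken lowering: the swizzle is dropped and component 0
    # of the source vector is always used.
    return unpack_half_2x16(vec[0])


def lower_applying_swizzle(vec, swizzle):
    # Select the component named by the swizzle before unpacking.
    return unpack_half_2x16(vec[swizzle])


# Component 0 packs (1.0, 2.0); component 1 packs (0.5, -2.0).
SOURCE = [0x40003C00, 0xC0003800]

if __name__ == "__main__":
    print(lower_ignoring_swizzle(SOURCE, 1))
    print(lower_applying_swizzle(SOURCE, 1))
```

As long as every source happens to use component 0, both versions agree, which fits the commit's remark that the breakage only surfaced once an extra copy propagation pass started producing swizzles.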
Comment 13 Darius Spitznagel 2018-10-05 12:05:21 UTC
Hello Jason,

the GPU hangs/crashes are indeed solved.

But NOT the misrendering/corruption.
Could it be that you have accidentally committed the wrong patch?
Comment 14 Jason Ekstrand 2018-10-05 12:46:17 UTC
Could you be more specific about what you think is being misrendered?  Perhaps open the screenshot up in an image editor and circle it in red and then describe why you think it's wrong.  I played for two hours yesterday and, apart from an occasional Z-buffer flicker, everything looked fine to me.
Comment 15 Darius Spitznagel 2018-10-05 13:56:01 UTC
In the first internal benchmark location (be aware that I did NOT play the game) - the snowy mountains - there are little triangles moving and blinking over the rocks.
Sometimes black boxes appear in the sky too.

My game preset is set to LOWEST if this helps.

I will take some screenshots when I'm at home. Hopefully the corruption will be more visible this time.
Comment 16 Jason Ekstrand 2018-10-05 14:17:54 UTC
Ok, the corruption mentioned in the above patch was causing everything to be black on anything other than lowest settings and causing lighting to be wrong on lowest with ambient occlusion enabled.  A little z-flickering is not nearly as bad of a bug. :-)
Comment 17 Darius Spitznagel 2018-10-05 16:39:42 UTC
Created attachment 141915 [details]
corruption
Comment 18 Darius Spitznagel 2018-10-05 16:44:51 UTC
I took more than 20 screenshots, but the corruption is visible in almost none of them.
It only shows at the end of the first location (red circle).

In reality, all the rocks in the red AND green circles are totally corrupted.

When I change the preset to low or medium it gets even worse.
Then I get black boxes and flashing dots in addition.
Comment 19 Jason Ekstrand 2018-10-05 17:30:28 UTC
Ok, I see it now.  Yeah, that's the same Z-flickering I saw while playing.  Not 100% sure what's going wrong there; probably a shader precision issue.
Comment 20 Jason Ekstrand 2018-10-05 18:05:36 UTC
Can you take a picture of the screen with the corruption?  I suspect something else may be going on that's deeper in the display system and doesn't show up in screenshots.
Comment 21 Darius Spitznagel 2018-10-06 12:43:26 UTC
OK, I've recorded a short MP4 video with my smartphone.

You can download it here...
https://www.goodbytez.de/mesa/20181006_140346.mp4
Comment 22 Darius Spitznagel 2018-10-28 13:49:28 UTC
Hello Jason,

any news on this?

I compiled the newest git master today - no improvement.

I took it for a spin and started a new game...
Graphical glitches everywhere. It really is no fun at all.
Comment 23 Marina Chernish 2018-10-29 11:44:48 UTC
I tried to reproduce it on Haswell but saw no glitches.

Used environment: Haswell: CPU: Intel Core i5-4300M; GPU: Intel® HD Graphics 4600
Ubuntu 16.04; kernel 4.15.0-36-generic;
Vulkan 1.1.80, mesa 18.3.0.
Comment 24 Darius Spitznagel 2018-10-29 22:32:10 UTC
Hello Marina,

luckily I have two Gigabyte BRIX boxes... one with Broadwell and the other with Haswell.

I took the HDD from the BDW one and hooked it into the HSW one.

You are 99% right. There are NO glitches like those on Broadwell in locations one and two. But missing objects (from the tree house) in location three (see screenshot).
The objects are also missing with Broadwell.

So now we have two systems that do not work correctly.
Comment 25 Darius Spitznagel 2018-10-29 22:32:49 UTC
Created attachment 142265 [details]
missing objects
Comment 26 Darius Spitznagel 2018-10-29 22:40:33 UTC
I think the important difference between HSW and BDW is that, as far as I know, HSW uses vectors and BDW uses scalars for GLSL.
Comment 27 Jason Ekstrand 2018-10-29 22:53:45 UTC
My previous work on this game has been on Skylake which, if you're using Broadwell, may explain why we were seeing such different results.  I pulled out my Broadwell machine and have it installing now.  I may be able to play with it some tomorrow.
Comment 28 Jason Ekstrand 2018-10-30 19:20:32 UTC
Can you provide your exact graphics settings?  Screenshot of the settings page would work well.  I tried again, this time on Broadwell, and I still can't reproduce it. :(  Also, just to be sure, are you running a 64-bit system?
Comment 29 Darius Spitznagel 2018-10-30 19:32:42 UTC
Yes, I'm on 64-bit Debian Stretch.

darius@pc1:~$ glxinfo | grep string
server glx vendor string: SGI
server glx version string: 1.4
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Iris Pro 6200 (Broadwell GT3e) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.3.0-devel (git-8676af12c8)
OpenGL core profile shading language version string: 4.50
OpenGL version string: 3.0 Mesa 18.3.0-devel (git-8676af12c8)
OpenGL shading language version string: 1.30
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 18.3.0-devel (git-8676af12c8)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
Comment 30 Darius Spitznagel 2018-10-30 19:33:22 UTC
Created attachment 142284 [details]
game settings1
Comment 31 Darius Spitznagel 2018-10-30 19:33:46 UTC
Created attachment 142285 [details]
game settings2
Comment 32 Jason Ekstrand 2018-10-30 19:57:38 UTC
Turning my settings all the way down, I can now see it.  It looks like one of the filters was masking something.  I'll try to take a look into why.
Comment 33 vadym 2018-11-07 16:17:28 UTC
Created attachment 142400 [details]
misrendering_haswell
Comment 34 vadym 2018-11-07 16:23:56 UTC
Hi Darius, Jason,

I noticed some new rendering issues on HSW. A video is attached. Are you able to reproduce it? I'm using the same graphics settings as you.

Mesa commit id: 792dde66f253acb46396ac8be1801d2d878d30bf

> But missing objects (from the tree house) in location three (see screenshot).
> The objects are also missing with Broadwell.

I cannot reproduce this on HSW.
Comment 35 vadym 2018-11-07 16:33:19 UTC
(In reply to vadym from comment #33)
> Created attachment 142400 [details]
> misrendering_haswell

Oh, looks like this is not an issue. So please ignore this message.
Comment 36 Jason Ekstrand 2018-11-09 21:34:23 UTC
Couple of notes:

 1) It appears to be an issue with texture coordinates getting messed up.  Not sure exactly how yet but that seems to be the actual problem.
 2) The bug goes away if INTEL_SCALAR_VS is set to 0 so it's some sort of miscompiled vertex shader.
Comment 37 Darius Spitznagel 2018-11-11 11:13:21 UTC
Hello Jason,

INTEL_SCALAR_VS=0 changed nothing for me with git-8676af12c8 (last one I've tested - see Comment 29).

But with git-552642066f from today and INTEL_SCALAR_VS=0 I can confirm that ALL rendering issues are gone (at least in the benchmark; in-game not tested).

Something between these master releases has improved the situation.
Comment 38 Darius Spitznagel 2018-11-11 12:15:01 UTC
Hello Jason,

with today's master git-552642066f the misrendering in locations one and two IS FIXED.
No need to use "INTEL_SCALAR_VS=0" on my Broadwell system.

The only problem that remains is that on the very FIRST run there are missing objects in location three (see screenshot "missing objects"). Setting INTEL_SCALAR_VS=0 for the first run doesn't change that.

But these objects BECOME VISIBLE on every SUBSEQUENT run of the benchmark.
Comment 39 Darius Spitznagel 2018-11-11 12:34:35 UTC
If it helps...

Even with "MESA_GLSL_CACHE_DISABLE=1" the missing objects from the tree house appear at every NEXT run in location three.

I wanted to know whether it has something to do with the disk shader cache, but it seems not.
Comment 40 Darius Spitznagel 2018-11-11 13:26:55 UTC
Right now I'm bisecting to find the first good/fixed commit for the misrendering in locations one and two.
Comment 41 Darius Spitznagel 2018-11-11 14:51:30 UTC
Strange...

My bisection did not come up with a good/fixed commit. And now the misrendering is BACK with current master which worked hours ago.

I'm investigating right now.
Comment 42 Jason Ekstrand 2018-11-11 15:01:45 UTC
The missing objects likely aren't a bug.  Their engine uploads stuff and fills the scene in on the fly.  The only reason the first two are complete from the start is that they are almost entirely landscape and don't have lots of individual objects.
Comment 43 Darius Spitznagel 2018-11-11 16:17:11 UTC
I have found out some nasty things.

Forget all of my comments 37-39.

The Feral launcher (splash screen) is doing stupid things like (GPU) analytics.

How to reproduce...
1. rm -rf .local/share/feral-interactive/Rise\ of\ the\ Tomb\ Raider/
2. start steam
3. start ROTR
4. start benchmark > misrendering
5. quit ROTR
6. change start options to "INTEL_SCALAR_VS=0 %command%".
7. start ROTR
8. start benchmark > misrendering - NOTHING changed.
9. exit steam

another try...
10. rm -rf .local/share/feral-interactive/Rise\ of\ the\ Tomb\ Raider/
11. start steam
12. start ROTR (start options are already set to "INTEL_SCALAR_VS=0 %command%" - see step 6)
13. start benchmark > NO misrendering.
14. quit ROTR
15. change start options to "INTEL_SCALAR_VS=1 %command%".
16. start ROTR
17. start benchmark > NO misrendering - INTEL_SCALAR_VS=1 has NO IMPACT.
18. quit benchmark

This is really annoying, Feral!
Comment 44 Jason Ekstrand 2018-11-11 17:11:17 UTC
I recommend doing a RenderDoc capture and debugging with that. If you capture the scene on the mountain and replay with `renderdoccmd replay -l 0 file.rdc` it will replay in a loop and you can see the corruption. The corruption may not occur on the first time it plays the frame but it will eventually.
Comment 45 GitLab Migration User 2019-09-18 19:49:42 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/840.
