Summary: | [anv] Rise of the Tomb Raider always misrendering, segfault and gpu hang. | ||
---|---|---|---|
Product: | Mesa | Reporter: | Darius Spitznagel <d.spitznagel> |
Component: | Drivers/Vulkan/intel | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED MOVED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | jason |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
misrendering on little rock left of Lara
misrendering2 on little rock left of Lara
corruption
missing objects
game settings1
game settings2
misrendering_haswell |
Description
Darius Spitznagel
2018-09-13 19:41:49 UTC
Hello, Darius. Could you please provide screenshots of the misrendering?

Created attachment 141661
misrendering on little rock left of Lara
Created attachment 141662
misrendering2 on little rock left of Lara
These screenshots were taken with 18.0.5 because 18.2.0 crashed (segfault) so many times that I could not take any screenshots.

Environment while taking screenshots:
MESA_GLSL_CACHE_DISABLE=1
Xorg 1.20.1
MATE DE 1.20.3
DDX modesetting
Mesa 18.0.5

Strangely, there were no segfaults in dmesg, but there were GPU hangs...
[Thu Sep 20 17:57:27 2018] [drm] GPU HANG: ecode 8:0:0x85d7dfff, in WinMain [13370], reason: No progress on rcs0, action: reset
[Thu Sep 20 17:57:27 2018] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[Thu Sep 20 17:57:43 2018] i915 0000:00:02.0: Resetting rcs0 after gpu hang

The segfaults (with Mesa 18.2.0) were reported by a small Feral window.

Before I forget... Most of the misrendering was NOT visible in any screenshot. Luckily, you can see it on the little rocks at the end of location 1 of the internal benchmark.

I also tried to compile LunarG vktrace (Debian does not ship it! Why???) but had no luck, so no vktrace file for you :( Would it help to use RenderDoc? Is vktrace obsolete?

With current master it's getting worse. Now I see black triangles in the first benchmark location before the segfault.

It would probably be nice to have both: RenderDoc and vktrace. Please try vktrace from here: https://vulkan.lunarg.com/sdk/home#linux
And have you tried using another DDX driver instead of modesetting, or disabling it?

(In reply to Sergii Romantsov from comment #8)
> Probably it will be nice to have both: renderdoc and vktrace.
Will try RenderDoc. vktrace did not compile, as already said in comment 5.
> Please, try vktrace from here: https://vulkan.lunarg.com/sdk/home#linux
See above.
> And have you tried to use another modesetting-drivers or disable it?
Yes, I tried the Intel DDX, which changed nothing.
Patch on the list to fix the crash/hang: https://patchwork.freedesktop.org/patch/254432/

The crash is fixed by the following commit on master:

commit f5bab06428fc7ca6116cf0daf1c237eb86202e7a
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Oct 2 17:19:32 2018 -0500

    anv/batch_chain: Don't start a new BO just for BATCH_BUFFER_START

    Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as normal. If we are near enough to the end, this can cause us to start a new BO just for the MI_BATCH_BUFFER_START which messes up chaining. We always reserve enough space at the end for an MI_BATCH_BUFFER_START so we can just increment cmd_buffer->batch.end prior to emitting the command.

    Fixes: a0b133286a3 "anv/batch_chain: Simplify secondary batch return..."
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
    Tested-by: Alex Smith <asmith@feralinteractive.com>
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

The corruption is fixed by the following commit in master:

commit dd553bc67f8ab1513fd196b6ffb7c4a76723adfd (public/master)
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Oct 3 12:14:20 2018 -0500

    nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansions

    The ssa_for_alu_src helper will correctly handle swizzles and other source modifiers for you. The expansions for unpack_half_2x16, pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards to swizzles. The brokenness of unpack_half_2x16 was causing rendering errors in Rise of the Tomb Raider on Intel ever since c11833ab24dcba26 which added an extra copy propagation to the optimization pipeline and caused us to start seeing swizzles where we hadn't seen any before.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
    Fixes: 9ce901058f3d "nir: Add lowering of nir_op_unpack_half_2x16."
    Fixes: 9b8786eba955 "nir: Add lowering support for packing opcodes."
    Tested-by: Alex Smith <asmith@feralinteractive.com>
    Tested-by: Józef Kucia <joseph.kucia@gmail.com>
    Reviewed-by: Matt Turner <mattst88@gmail.com>

Hello Jason, the GPU hangs/crashes are indeed solved, but NOT the misrendering/corruption. Could it be that you accidentally committed the wrong patch?

Could you be more specific about what you think is being misrendered? Perhaps open the screenshot in an image editor, circle it in red, and then describe why you think it's wrong. I played for two hours yesterday and, apart from an occasional Z-buffer flicker, everything looked fine to me.

In the first internal benchmark location, the snowy mountains (be aware, I did NOT play the game), there are little triangles moving and blinking over the rocks. Sometimes black boxes appear in the sky too. My game preset is set to LOWEST, if this helps. I will take some screenshots when I'm at home. Hopefully the corruption is more visible this time.

OK, the corruption mentioned in the above patch was causing everything to be black on anything other than the lowest settings, and was causing lighting to be wrong on lowest with ambient occlusion enabled. A little Z-flickering is not nearly as bad a bug. :-)

Created attachment 141915
corruption
I took more than 20 screenshots, but the corruption is not visible in any of them except at the end of the first location (red circle). In the "real world", all the rocks in the red AND green circles are totally corrupted. When I change the preset to low or medium, it gets even worse: then I also get black boxes and flashing dots.

OK, I see it now. Yeah, that's the same Z-flickering I saw while playing. Not 100% sure what's going wrong there; probably a shader precision issue. Can you take a picture of the screen with the corruption? I suspect something else may be going on that's deeper in the display system and doesn't show up in screenshots.

OK, I've made a short MP4 video with my smartphone. You can download it here: https://www.goodbytez.de/mesa/20181006_140346.mp4

Hello Jason, any news on this? I compiled the newest git master today: no improvement. I gave it a spin and started a new game... Graphical glitches everywhere. It's really no fun at all.

I tried to reproduce it on Haswell, but no glitches occurred. Environment used:
Haswell: CPU: Intel Core i5-4300M; GPU: Intel® HD Graphics 4600
Ubuntu 16.04; kernel 4.15.0-36-generic; Vulkan 1.1.80, Mesa 18.3.0.

Hello Marina, luckily I have two Gigabyte BRIX boxes, one with Broadwell and the other with Haswell. I took the HDD from the BDW one and hooked it into the HSW one. You are 99% right: there are NO glitches in locations one and two like there are on Broadwell. But there are missing objects (from the tree house) in location three (see screenshot). The objects are also missing on Broadwell. So now we have two systems that do not work correctly.

Created attachment 142265
missing objects
I think the important difference between HSW and BDW is that, as far as I know, HSW uses vectors and BDW uses scalars for GLSL.

My previous work on this game has been on Skylake, which, if you're using Broadwell, may explain why we were seeing such different results. I pulled out my Broadwell and have it installing now. I may be able to play with it some tomorrow. Can you provide your exact graphics settings? A screenshot of the settings page would work well.

I tried again, this time on Broadwell, and I still can't reproduce it. :( Also, just to be sure, are you running a 64-bit system?

Yes, I'm on 64-bit Debian Stretch.

darius@pc1:~$ glxinfo | grep string
server glx vendor string: SGI
server glx version string: 1.4
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Iris Pro 6200 (Broadwell GT3e)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.3.0-devel (git-8676af12c8)
OpenGL core profile shading language version string: 4.50
OpenGL version string: 3.0 Mesa 18.3.0-devel (git-8676af12c8)
OpenGL shading language version string: 1.30
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 18.3.0-devel (git-8676af12c8)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Created attachment 142284
game settings1
Created attachment 142285
game settings2
Turning my settings all the way down, I can now see it. It looks like one of the filters was masking something. I'll try to look into why.

Created attachment 142400
misrendering_haswell
Hi Darius, Jason,
I noticed some new rendering issues on HSW. A video is attached. Are you able to reproduce them? I'm using the same graphics settings as you.
Mesa commit ID: 792dde66f253acb46396ac8be1801d2d878d30bf
> But missing objects (from the tree house) in location three (see screenshot).
> The objects are also missing with Broadwell.
I cannot reproduce this on HSW.
(In reply to vadym from comment #33)
> Created attachment 142400
> misrendering_haswell

Oh, looks like this is not an issue, so please ignore this message.

A couple of notes:
1) It appears to be an issue with texture coordinates getting messed up. Not sure exactly how yet, but that seems to be the actual problem.
2) The bug goes away if INTEL_SCALAR_VS is set to 0, so it's some sort of miscompiled vertex shader.

Hello Jason, INTEL_SCALAR_VS=0 changed nothing for me with git-8676af12c8 (the last one I tested, see comment 29). But with git-552642066f from today and INTEL_SCALAR_VS=0, I can confirm that ALL rendering issues are gone (at least in the benchmark; in-game not tested). Something between these master revisions has improved the situation.

Hello Jason, with today's master git-552642066f the misrendering in locations one and two IS FIXED. No need to use "INTEL_SCALAR_VS=0" on my Broadwell system. The only remaining problem is that on the very FIRST run there are missing objects in location three (see screenshot "missing objects"). Setting INTEL_SCALAR_VS=0 for the first run doesn't change that. But these objects BECOME VISIBLE during every NEXT run of the benchmark. If it helps: even with "MESA_GLSL_CACHE_DISABLE=1", the missing objects from the tree house appear on every NEXT run in location three. I wanted to know whether it has something to do with the disk shader cache, but it seems not.

Right now I'm bisecting to find the first good/fixed commit for the misrendering in locations one and two.

Strange... My bisection did not come up with a good/fixed commit. And now the misrendering is BACK with current master, which worked hours ago. I'm investigating right now.

The missing objects are likely not a bug. Their engine uploads stuff and fills the scene in on the fly. The only reason the first locations are complete from the start is that they are almost entirely landscape and don't have lots of individual objects.

I have found out some nasty things.
Forget all my comments 37-39. The Feral launcher (splash screen) is doing stupid things like (GPU) analytics.

How to reproduce:
1. rm -rf .local/share/feral-interactive/Rise\ of\ the\ Tomb\ Raider/
2. start Steam
3. start ROTR
4. start benchmark > misrendering
5. quit ROTR
6. change launch options to "INTEL_SCALAR_VS=0 %command%"
7. start ROTR
8. start benchmark > misrendering, NOTHING changed
9. exit Steam

Another try:
10. rm -rf .local/share/feral-interactive/Rise\ of\ the\ Tomb\ Raider/
11. start Steam
12. start ROTR (launch options are already set to "INTEL_SCALAR_VS=0 %command%", see step 6)
13. start benchmark > NO misrendering
14. quit ROTR
15. change launch options to "INTEL_SCALAR_VS=1 %command%"
16. start ROTR
17. start benchmark > NO misrendering, INTEL_SCALAR_VS=1 has NO IMPACT
18. quit benchmark

This is really annoying, Feral!

I recommend doing a RenderDoc capture and debugging with that. If you capture the scene on the mountain and replay it with `renderdoccmd replay -l 0 file.rdc`, it will replay in a loop and you can see the corruption. The corruption may not occur the first time it plays the frame, but it will eventually.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/840.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.