Bug 103852 - Rendering errors when running dolphin-emu with Vulkan backend, radv (Super Smash Bros. Melee)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/radeon
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
Reported: 2017-11-22 22:55 UTC by Ben Clapp
Modified: 2018-05-15 19:50 UTC (History)

Attachments
text file containing output from glxinfo and vulkaninfo (150.73 KB, text/plain)
2017-11-22 22:55 UTC, Ben Clapp
Dump of optimized shaders in scene with incorrect rendering of vertex color. (28.93 MB, text/plain)
2018-04-04 14:22 UTC, Ben Clapp
Dump of unoptimized shaders in scene with incorrect rendering of vertex color. (56.98 MB, text/plain)
2018-04-04 14:25 UTC, Ben Clapp
vulkaninfo output when using mesa 18.0. (101.15 KB, text/plain)
2018-04-04 15:21 UTC, Ben Clapp

Description Ben Clapp 2017-11-22 22:55:41 UTC
Created attachment 135676 [details]
text file containing output from glxinfo and vulkaninfo

The version of dolphin-emu used for testing was version 5.0-5874 (this commit: https://github.com/dolphin-emu/dolphin/commit/01794126ade973a125161ca0ea9904197bccedc3 )

OS used is Debian 10 Buster (the current testing branch of debian).
I've attached the output of glxinfo and vulkaninfo, from which you can see I'm currently on Mesa 17.2.5.
The GPU used is a RX 580.

When playing Super Smash Bros. Melee (NTSC, version 1.02), a number of minor rendering issues/errors can be observed when using the Vulkan backend:
* The game's title screen does not render correctly.
* The background does not render correctly for some stages (Fountain of Dreams, Final Destination, etc...)
* The background for the trophy gallery does not render correctly
* The background for the small screen showing fighters clapping in the results screen seems to render the wrong color (if playing as P1 against a CPU, player 1 should render red, not light-blue)
* Turning on cropping (Options -> Graphics Settings -> Advanced -> Misc -> Crop) results in a black screen.

None of the aforementioned bugs occur when using the OpenGL backend, or when using the OpenGL or Vulkan backends using NVIDIA's closed-source drivers on a different computer w/GTX 960, so I suspect these are bugs in radv.

Below is a video I recorded showing the rendering errors and steps to reproduce the above issues:
https://youtu.be/mOhB-17b0rg

For comparison, here is the game running on the OpenGL backend, which does not have these rendering issues:
https://youtu.be/owA8TOa6LcQ

As an aside, you may notice a disproportionate number of dropped frames compared to the FPS indicator in dolphin-emu at certain points in the videos I recorded.
I highly suspect this to be due to the following bug in GNOME3's Mutter compositor, and not related to mesa/radv (it occurs even when not recording):
https://bugzilla.gnome.org/show_bug.cgi?id=745032
Comment 1 Ben Clapp 2017-12-14 00:23:03 UTC
I've done some testing with mesa 17.3.0 on my computer with the RX 580 (using the mesa 17.3.0-1 package available in debian unstable).
All of the previously mentioned bugs are still present on 17.3.0, so I'm updating the version number for this ticket to 17.3 as well.
Comment 2 Ben Clapp 2017-12-25 00:54:41 UTC
It's worth noting that since originally reporting this bug, I've switched from GNOME3 to using Cinnamon for my desktop environment.
While the GNOME3 bug I mentioned in the aside clearly went away after this change (the screen stopped freezing for 1-2s every time I opened a right-click menu, which simply should not be happening on a TR 1950X CPU), the issue with unusual frame drops to around 30FPS that are not reflected by a drop in the FPS indicator in dolphin persists nonetheless.

It seems like this is an issue related to frame presentation, but where exactly the issue lies is unclear.
That the issue occurs on both GNOME3 with Wayland and Cinnamon with X suggests the issue may be unrelated to the display server and/or compositor.
I don't observe this problem when using other (3D/non-3D) applications (in a game engine I wrote myself in OpenGL, I can get a smooth 120FPS, nor do I see this frame presentation issue when using citra-emu or other applications), so it seems this frame presentation issue is unrelated to radv, and thus it might be worth opening a separate ticket to further investigate.
The fact that the issue seems to occur suggests an issue with dolphin itself rather than a mesa/graphics driver issue, however the issue isn't present when using NVIDIA closed-source drivers, so it's hard to determine where the problem lies just looking at the symptoms.
Comment 3 Ben Clapp 2017-12-25 01:00:37 UTC
>The fact that the issue seems to occur **only when running dolphin**
Comment 4 Ben Clapp 2018-03-08 05:19:22 UTC
Bug(s) still present as of 17.3.6. (The issues related to frame drops/frame presentation no longer seem to be a problem at this point, but the crop setting still results in a black screen, and the incorrect colors, etc. persist.)
Comment 5 Sven Arvidsson 2018-03-11 16:49:23 UTC
Here's a trace of the intro screen made with vktrace, to make this bug easier to reproduce:
https://www.dropbox.com/s/930kl7agbg3jl6o/dolphin.vktrace.xz?dl=0

(2.7M compressed, 163M uncompressed) 

I hope this is useful given vktrace caveats about replaying on different setups. It replayed and rendered correctly on my Intel Ivy Bridge.

Trace was made with vktrace from git 78f1a8149a3a6c9e48b9bd5cff6debc5726d819e

For reference, running dolphin as an argument to vktrace didn't work for me, but using it in client/server mode and tracing with VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_vktrace worked nicely.

I will retest later on a more up to date version of Mesa/radv.

HTH,
Comment 6 Ben Clapp 2018-03-30 08:18:34 UTC
Bugs still present in 18.0.0.
In addition, there's now a new bug when using the radeonsi driver, where the game will sometimes freeze altogether (not a complete GPU hang, you can kill the dolphin-emu process and move on) during normal play that did not occur in 17.3.7.
(Since this is a radeonsi-only issue and not radv, might be worth opening a separate ticket for.)

There also seems to be a very obvious stuttering issue introduced after the update to 18.0.0 that doesn't seem present in other 3D applications.
This stuttering is a bit different from the "frame presentation issue" I described before in that the audio stutters at the same time as the frame drops, and the FPS actually does drop according to dolphin's FPS indicator.
Given this stuttering was not present with the exact same system/dolphin build/etc. before updating to mesa 18.0.0, I can only guess there's something weird going on with the driver here that is causing the stuttering issue.
The stuttering occurs whether you use dolphin's GL or Vulkan backends.

This gets off the topic of the originally reported bug (and perhaps is worth opening a separate ticket over), but more recently I've found I can consistently get GPU hangs when playing certain games other than Melee in dolphin-emu, so it just seems that dolphin is in general using a range of advanced GL/Vulkan features and mesa is tripping over a number of edge-cases that aren't used by most Linux applications.
Comment 7 Sven Arvidsson 2018-03-30 12:56:17 UTC
Absolutely file new bugs for each issue. It's much easier to close duplicate bugs than to track more than one problem in a report (should it turn out to be the same problem in the end).


The freeze issue sounds like a good candidate for git bisect. I might give it a try if I can reproduce and have the time.
Comment 8 Ben Clapp 2018-04-01 05:12:45 UTC
Today I spent a number of hours looking at the background rendering errors in RenderDoc.

The vertex shader outputs some vertices that have two vertex colors, colors_0 and colors_1. (Only colors_0 is relevant here)

The fragment shader does a bunch of fiddling around with colors_0, a lot of unnecessary conversions and re-assignments that effectively do nothing, and ultimately the colors_0 value is passed to rastemp and tevin_d.
Some more fiddling around and, in the case of the areas of the screen where there are rendering errors, the value of colors_0/rastemp/tevin_d ("tevin" means "TEV input", referring to the Gamecube/Wii's Texture EnVironment hardware) becomes the color value written to the framebuffer.

The problem is not in the vertex shader, nor is it in the fragment shader.
For some reason, the value of colors_0 coming out of the vertex shader is correct, but the value of colors_0 in the fragment shader is inverted! So blue will appear yellow, black will appear white, etc...
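
To make the symptom concrete, here is a minimal sketch of what "inverted" looks like numerically. It assumes 8-bit color channels and that the inversion is a plain per-channel complement (255 minus the value), which is consistent with the blue/yellow and black/white examples above. The helper name invert_rgb is made up for illustration; this only models the observed output, not whatever the driver is actually doing.

```python
# Illustration only: per-channel complement of an 8-bit RGB color.
# Assumption: "inverted" means 255 - value for each channel, which matches
# the blue -> yellow and black -> white examples described in this comment.

def invert_rgb(color):
    """Return the per-channel complement of an (r, g, b) tuple of 0..255 ints."""
    r, g, b = color
    return (255 - r, 255 - g, 255 - b)


BLUE = (0, 0, 255)
BLACK = (0, 0, 0)

print(invert_rgb(BLUE))   # (255, 255, 0), i.e. yellow
print(invert_rgb(BLACK))  # (255, 255, 255), i.e. white
```

Comparing the value RenderDoc shows for colors_0 at the vertex shader output against the complement of the value seen in the fragment shader is a quick way to check whether a given pixel fits this pattern.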

This seems to be a driver bug after all, and so I did try to spend some time looking into radv's code to try and see if I could figure out a fix.
The issue might lie in radv_pipeline.c, I would think it probably has something to do with the inter-stage varying colors_0 not getting filled or interpreted correctly.

I've done lots of OpenGL and Vulkan programming, but I have little experience with the driver side of things, so while it might be interesting to talk a bit with the radv devs and learn a thing or two, I'm not sure how much further I can go in terms of looking into this on my own without assistance.

Sven: I'll work on making some separate issues on another occasion for the radeonsi freeze for Melee and the system freezes/GPU hangs for other games.
Comment 9 Timothy Arceri 2018-04-01 11:39:18 UTC
(In reply to Ben Clapp from comment #8)
> Today I spent a number of hours looking at the background rendering errors
> in RenderDoc.
> 
> The vertex shader outputs some vertices that have two vertex colors,
> colors_0 and colors_1. (Only colors_0 is relevant here)
> 
> The fragment shader does a bunch of fiddling around with colors_0, a lot of
> unnecessary conversions and re-assignments that effectively do nothing, and
> ultimately the colors_0 value is passed to rastemp and tevin_d.
> Some more fiddling around and, in the case of the areas of the screen where
> there are rendering errors, the value of colors_0/rastemp/tevin_d ("tevin"
> means "TEV input", referring to the Gamecube/Wii's Texture EnVironment
> hardware) becomes the color value written to the framebuffer.
> 
> The problem is not in the vertex shader, nor is it in the fragment shader.
> For some reason, the value of colors_0 coming out of the vertex shader is
> correct, but the value of colors_0 in the fragment shader is inverted! So
> blue will appear yellow, black will appear white, etc...
> 
> This seems to be a driver bug after all, and so I did try to spend some time
> looking into radv's code to try and see if I could figure out a fix.
> The issue might lie in radv_pipeline.c, I would think it probably has
> something to do with the inter-stage varying colors_0 not getting filled or
> interpreted correctly.
> 
> I've done lots of OpenGL and Vulkan programming, but I have little
> experience with the driver side of things, so while it might be interesting
> to talk a bit with the radv devs and learn a thing or two, I'm not sure how
> much further I can go in terms of looking into this on my own without
> assistance.
> 

Do you think you could get a dump of the NIR and LLVM IR for the shaders in question and attach it here? You can use the following environment var to dump the shaders: RADV_DEBUG=shaders

You also might be able to catch the attention of some devs if you jump on the freenode #radeon IRC channel.
Comment 10 Ben Clapp 2018-04-04 13:49:39 UTC
Regarding the freeze when using the OpenGL backend with Mesa 18.0, it seems a different user has already reported that bug:
https://bugs.dolphin-emu.org/issues/10904
https://bugs.freedesktop.org/show_bug.cgi?id=105339

Apologies for the late response Timothy.

>Do you think you could get a dump of the NIR and LLVM IR for the shaders in question and attach it here? You can use the following environment var to dump the shaders: RADV_DEBUG=shaders
I'm struggling to properly dump the shaders because RADV now has an on-disk shader cache, and RADV_DEBUG=shaders seems to only print out shaders when they are actually compiled for the first time.
How can I clear and/or disable the shader cache?

>You also might be able to catch the attention of some devs if you jump on the freenode #radeon IRC channel.
I'm already lurking in there, but perhaps I'll actually say something over there sometime soon.
Comment 11 Timothy Arceri 2018-04-04 14:07:24 UTC
(In reply to Ben Clapp from comment #10)
> Regarding the freeze when using the OpenGL backend with Mesa 18.0, it seems
> a different user has already reported that bug:
> https://bugs.dolphin-emu.org/issues/10904
> https://bugs.freedesktop.org/show_bug.cgi?id=105339
> 
> Apologies for the late response Timothy.
> 
> >Do you think you could get a dump of the NIR and LLVM IR for the shaders in question and attach it here? You can use the following environment var to dump the shaders: RADV_DEBUG=shaders
> I'm struggling to properly dump the shaders because RADV now has an on-disk
> shader cache, and RADV_DEBUG=shaders seems to only print out shaders when
> they are actually compiled for the first time.
> How can I clear and/or disable the shader cache?

RADV_DEBUG=nocache or MESA_GLSL_CACHE_DISABLE=1 should do it. Also you can dump the unoptimised LLVM IR with RADV_DEBUG=preoptir which can be useful sometimes.
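
For anyone scripting this, a rough sketch of combining the flags mentioned in this thread is below. RADV_DEBUG takes a comma-separated list, so nocache and shaders can be set together. The helper name radv_debug_env, the ./dolphin-emu path, and the log file name are placeholders, and the commented-out launch assumes the dump is written to stderr, which may need adjusting.

```python
import os


def radv_debug_env(flags, base_env=None):
    """Return a copy of the environment with RADV_DEBUG set to the given flags.

    flags is a list such as ["nocache", "shaders"]; RADV_DEBUG expects them
    joined with commas. base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RADV_DEBUG"] = ",".join(flags)
    return env


# Bypass the shader cache so every shader is compiled (and dumped) again.
env = radv_debug_env(["nocache", "shaders"], base_env={})
print(env["RADV_DEBUG"])  # nocache,shaders

# Launching is left commented out; the binary path and log name are assumptions,
# as is the dump going to stderr.
# import subprocess
# with open("radv_shaders.log", "w") as log:
#     subprocess.run(["./dolphin-emu"],
#                    env=radv_debug_env(["nocache", "shaders"]),
#                    stderr=log)
```

Adding "preoptir" to the list should give the unoptimised LLVM IR as well.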
Comment 12 Ben Clapp 2018-04-04 14:22:47 UTC
Created attachment 138582 [details]
Dump of optimized shaders in scene with incorrect rendering of vertex color.
Comment 13 Ben Clapp 2018-04-04 14:25:42 UTC
Created attachment 138583 [details]
Dump of unoptimized shaders in scene with incorrect rendering of vertex color.

OK, here are your shader dumps attached to the ticket, both optimized and unoptimized.
There may be some unrelated shaders included in the dump due to the way dolphin/shader dumping works, but not sure there's much I can do about that.
Let me know if you need anything else.
Comment 14 Samuel Pitoiset 2018-04-04 15:13:52 UTC
Can you attach your vulkaninfo too?
Comment 15 Ben Clapp 2018-04-04 15:21:30 UTC
Created attachment 138586 [details]
vulkaninfo output when using mesa 18.0.

I already had attached my vulkaninfo, but that was back when I was using 17.2.x, so here's an updated version.
Comment 16 Samuel Pitoiset 2018-04-04 15:51:10 UTC
Thanks, are you still using the same dolphin? If not, can you report the version number, please?
Comment 17 Ben Clapp 2018-04-04 16:00:18 UTC
(In reply to Samuel Pitoiset from comment #16)
> Thanks, are you still using the same dolphin? If not, can you report the
> version number, please?

Currently using commit dea30e08b for dolphin (was latest commit in master branch about two days ago).
Comment 18 Ben Clapp 2018-05-07 02:10:44 UTC
Bug still present on 18.0.2.
I noticed flickering back and forth between a black screen and the game screen when resizing the window, so I made a short video demonstrating this:
https://www.youtube.com/watch?v=W2yuR0-z-EU
Comment 19 Ben Clapp 2018-05-08 13:59:42 UTC
Hello all, I have some insight and fixes for some of the issues described in this ticket:

First, regarding the "black screen when cropping is turned on" issue, this can be worked around with the following pull request:
https://github.com/dolphin-emu/dolphin/pull/6786

In theory, there shouldn't be anything wrong with negative Y in the viewport, and you can still see black screen flickering when adjusting the window size, but with this change to dolphin's code made, the screen will never remain black after a resize (only flicker for a moment).
So this issue is probably still worth investigating on the mesa side at some point.

Regarding the strange stuttering issues I was experiencing, this is a CPU-side issue that has nothing to do with mesa.
The TR 1950X is essentially two Ryzen chips glued together.
The TR 1950X has two memory controllers, and each memory controller is owned by one of the two Ryzen chips.
So, for example, I have two 16GB RAM cards plugged into the two memory controllers on my system, and when running "numactl -H", I can see that 16GB of RAM are assigned to each of the two NUMA nodes.
It seems that the memory allocator (or maybe the scheduler?) in Linux wasn't properly allocating memory (or maybe processes) to just one of the two physical chips/just one of the RAM cards, and this resulted in stuttering (perhaps due to needing to transfer some memory from one RAM card to the other for use by another process on the other Ryzen chip?)
The stuttering can be prevented by using numactl like this:
numactl --cpunodebind=0 --membind=0 ./dolphin-emu
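
If you launch dolphin from a script, the same command can be assembled programmatically. This is only a sketch: the helper name numactl_argv is made up, node 0 is simply the node used above, and it assumes numactl is installed and on PATH.

```python
def numactl_argv(program, node=0):
    """Build an argv that pins both CPU scheduling and memory to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", program]


argv = numactl_argv("./dolphin-emu")
print(" ".join(argv))  # numactl --cpunodebind=0 --membind=0 ./dolphin-emu

# To actually start the emulator (left commented out here):
# import subprocess
# subprocess.run(argv)
```

Binding to node 1 instead should work just as well, as long as the CPU and memory bindings point at the same node.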
Comment 20 Bas Nieuwenhuizen 2018-05-14 01:16:40 UTC
Does

https://patchwork.freedesktop.org/patch/222558/

fix the background rendering for you?
Comment 21 Ben Clapp 2018-05-15 19:50:23 UTC
Bas,
Your patch does fix the issue with incorrect colors :)
Thank you very much for your hard work.

The black screen issue will still be present on versions of dolphin before the workaround was applied, and even with the workaround, black-screen flickering can be seen when resizing the window.
Given this, I would recommend closing this bug ticket and, if it seems worth exploring on the driver side at some point, opening a separate ticket with importance of "low" or "lowest" for the minor issue of black-screen flickering when resizing dolphin's window.

I'll go ahead and mark this issue as RESOLVED/FIXED.
Bas, I'll leave it to you if you want to open a separate ticket for black-screen flickering when resizing the window.
Again, thank you very much for the bugfix!

