Bug 106187 - Vulkan apps run on secondary GPU on multi-GPU system
Summary: Vulkan apps run on secondary GPU on multi-GPU system
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/radeon
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-23 11:16 UTC by Kristoffer
Modified: 2019-09-18 19:51 UTC



Attachments
GPU selection in launcher for RotTR (1.18 MB, image/png)
2018-04-23 11:16 UTC, Kristoffer
vulkaninfo (192.95 KB, text/plain)
2018-04-23 11:51 UTC, Kristoffer

Description Kristoffer 2018-04-23 11:16:35 UTC
Created attachment 138997
GPU selection in launcher for RotTR

I have two Sapphire R9 Furys, and Vulkan apps default to running on my secondary, headless card, which performs worse than the primary card.

I was actually in contact with Feral Interactive support about this after seeing the issue in Rise of the Tomb Raider, but they claim it's a Vulkan loader or driver issue, so why don't I just paste the e-mail here as it describes the problem in more detail:

"Hi Feral!
This isn't really a support request. It's more of a suggestion for improving performance on multi-GPU systems.
I'm on Debian Sid with kernel 4.15 and Mesa 17.3.9 and noticed that Vulkan games, including Rise of the Tomb Raider, run on my secondary GPU.
By switching over to my primary GPU in the Feral launcher, I've increased my benchmark score from 60fps to 73fps (1080p high) and the game runs noticeably smoother. This is something I have to do every time I launch the game because the setting resets between play sessions.
It's a bit weird that the primary GPU is at the bottom of the list. I wonder if this is a RADV issue since it happens in all Vulkan applications, including non-Feral apps.


Debian Sid
Kernel 4.15
Mesa 17.3.9
Ryzen 1700X
Asus PRIME X370-PRO
32GiB DDR4-2400 cl14
Sapphire Nitro R9 Fury (two of them)"
Comment 1 Mike Lothian 2018-04-23 11:32:21 UTC
In a laptop situation the 2nd card is usually the more powerful one; at least this app lets you choose which card to use.

If you run vulkaninfo, which card is listed first?
Comment 2 Kristoffer 2018-04-23 11:51:39 UTC
Created attachment 138998
vulkaninfo
Comment 3 Kristoffer 2018-04-23 11:52:40 UTC
I can't really tell which shows up first when running vulkaninfo because they look identical.
Comment 4 Mike Lothian 2018-04-23 13:42:23 UTC
I wonder if that's a bug in the PRIME code. I wouldn't have expected to see that large a drop in FPS if the cards are identical. Out of interest, do you see differences at other resolutions?

As for selecting the first one automatically, I think the Launcher should be remembering the card you last used, and that would be up to Feral to fix.
Comment 5 Marc Di Luzio 2018-04-23 13:56:12 UTC
(In reply to Mike Lothian from comment #4)
> As for selecting the first one automatically, I think the Launcher should be
> remembering the card you last used, and that would be up to Feral to fix

Figured I'd drop in here.

So the underlying issue here is that the two cards returned by vkEnumeratePhysicalDevices are effectively identical, so we try and select the same card we had before, and since they both match it picks the first. We also end up defaulting to the first because we have to mostly assume the first one is the primary, given no other viable information on that matter.

It'd be beneficial, if the performance difference isn't resolvable, to somehow return the "primary" card first in the vkEnumeratePhysicalDevices list.

However, we'll track the failure to remember internally and see if we can deal with it for now based on the order we're given the duplicates.
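
To illustrate why remembering the card fails with duplicates, here is a minimal Python sketch. It only models the matching logic described above and is not Feral's code or a Vulkan binding: `DeviceInfo`, `pick_remembered`, and all field values are hypothetical placeholders for whatever properties an app reads from the devices returned by vkEnumeratePhysicalDevices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    # Hypothetical stand-in for the properties an app might compare;
    # the values used below are illustrative only.
    name: str
    vendor_id: int
    device_id: int


def pick_remembered(devices, remembered):
    """Return the index of the first device equal to the saved one, else 0."""
    for index, dev in enumerate(devices):
        if dev == remembered:
            return index
    return 0  # nothing matched: fall back to the first enumerated device


fury = DeviceInfo("Fury (illustrative)", 0x1002, 0x7300)
devices = [fury, fury]   # two identical cards, in enumeration order
saved = devices[1]       # the user picked the second entry last session
chosen = pick_remembered(devices, saved)
print(chosen)            # the first entry also matches, so this prints 0
```

Because both entries compare equal, the saved choice of the second card cannot be told apart from the first, and the first match wins.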
Comment 6 Kristoffer 2018-04-23 14:04:32 UTC
(In reply to Mike Lothian from comment #4)
> I wonder if that's a bug in the PRIME code, I wouldn't have expected to see
> a drop in FPS that much if the cards are identical. Out of interest, do you
> see differences at other resolutions?
> 
> As for selecting the first one automatically, I think the Launcher should be
> remembering the card you last used, and that would be up to Feral to fix

1440p high
"Primary" GPU: 55fps
"Secondary" GPU: 47fps
Comment 7 Kristoffer 2018-04-23 14:12:07 UTC
This probably goes without saying, but OpenGL and OpenCL apps do not have this issue. They run on the GPU or GPUs I expect them to.

For example, 'ethminer --opencl-device 0' will run on my primary while 'ethminer --opencl-device 1' will run on my secondary.

Also, if you think this is a Vulkan issue rather than a driver issue, I have already opened an issue on the Khronos GitHub. Feel free to join the discussion here:

https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/issues/2600
Comment 8 Alex Deucher 2018-04-23 14:36:28 UTC
There is extra overhead when using secondary GPUs in conjunction with prime when you want to see the results on the display.  The display is only attached to one GPU so when you render with the other GPU, the contents have to be copied to the GPU with the display in order to see the results.
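
As a rough sense of scale for that copy, the following Python sketch estimates the extra data moved per second. It assumes 4 bytes per pixel and exactly one full-frame copy per displayed frame; both are assumptions for illustration, not measurements from this system. The frame rates are the secondary-GPU figures reported earlier in this bug.

```python
def copy_traffic_mb_per_s(width, height, fps, bytes_per_pixel=4):
    """Data copied per second, in MB (10**6 bytes), for one copy per frame."""
    return width * height * bytes_per_pixel * fps / 1e6


# Secondary-GPU frame rates reported in this bug.
print(copy_traffic_mb_per_s(1920, 1080, 60))  # 1080p at 60 fps
print(copy_traffic_mb_per_s(2560, 1440, 47))  # 1440p at 47 fps
```

This is only the size of the copied frames. It says nothing about synchronization or other costs, so it should not be read as an explanation of the whole fps difference.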
Comment 9 Bas Nieuwenhuizen 2018-04-23 14:44:00 UTC
So I think there are several issues in this bug:

1) Running on a GPU that is not connected to the display is slower. Arguably we should be able to optimize this a bit, especially if the two cards are identical, but there is always going to be some extra bus traffic (and I'm not sure whether card B can scan out directly from memory in card A).

2) radv does not return the "primary" GPU first. The problem here is that there is not really a primary GPU. You can have multiple cards, with multiple X servers, and we only know which one we are going to use for display once the app gives us an X surface, which is after we have listed the devices. Of course the common case is that there is only one X server running. I'm wondering if we could reasonably do some out-of-band communication to figure out whether there is one X server and what GPU it uses, and then preferably list that one first?

However, in general this is not solvable, especially when the GPUs are not duplicates. For example, when there is an Intel iGPU + Radeon dGPU, the driver does not decide the ordering at all; the loader does.

Apps should be able to check this by checking if a specific GPU supports present from any of its queues, but then again we enabled PRIME by default, so radv always says yes ....

A solution might be disabling PRIME by default, but that breaks any game which just picks device 0, which is likely a lot of them ... And IMO slower performance is better than complete breakage, especially since those games often don't really provide a workaround.

3) As for remembering GPUs even in the duplicate case, the external memory extensions have a GPU identifier. I know it is technically not guaranteed to be stable, but for radv at least it should be stable across reboots (and mostly across driver upgrades), and it is probably the best you can get.
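
A minimal Python sketch of point 3 follows. It assumes the identifier meant here is the 16-byte deviceUUID reported in VkPhysicalDeviceIDProperties, and it only models the matching step: `DeviceInfo`, `pick_by_uuid`, and the UUID values are hypothetical, and no Vulkan call is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    name: str
    device_uuid: bytes  # models a 16-byte per-device identifier


def pick_by_uuid(devices, saved_uuid):
    """Return the index of the device with the saved UUID, or None."""
    for index, dev in enumerate(devices):
        if dev.device_uuid == saved_uuid:
            return index
    return None  # identifier not found, e.g. it changed or the card was removed


# Two cards with identical names but distinct (made-up) identifiers.
devices = [
    DeviceInfo("Fury", bytes.fromhex("00" * 15 + "01")),
    DeviceInfo("Fury", bytes.fromhex("00" * 15 + "02")),
]
saved = devices[1].device_uuid  # what the app stored last session
choice = pick_by_uuid(devices, saved)
print(choice)                   # prints 1
```

Returning None rather than index 0 lets the caller decide how to fall back when the identifier turns out not to be stable.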
Comment 10 Christian König 2018-05-03 10:00:10 UTC
Just some technical feedback.

(In reply to Bas Nieuwenhuizen from comment #9)
> 1) Running on a GPU that is not connected to the display is slower. Arguably
> we should be able to optimize this a bit, especially if the two cards are
> identical, but there is always some extra bus traffic that is going to
> happen (and not sure if card B can directly scanout from memory in card A).

At least with dedicated AMD GPUs that isn't possible, but for APUs we recently enabled scanout from system memory.
Comment 11 sherwin 2018-08-06 14:53:16 UTC
I have the same problem using DXVK, and there is a similar bug report on vkmark:
https://github.com/vkmark/vkmark/issues/10

It seems that something in the RADV driver causes GPUs to enumerate in the reverse of the order in which they appear on the PCIe bus.

The simplest fix that would apply to most applications would be for the driver to present devices in the same order in which they enumerate on the PCIe bus. That order should respect slot number and be fairly straightforward for the user to manage (even if it requires a small hardware modification).

It would be even better if the device name presented could include a unique number (preferably the PCIe slot number, or otherwise a unique number generated at enumeration), so that programs that present the user with a selection could make the choice clearer.
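
The two suggestions above could look roughly like this Python sketch. It is an illustration only, not radv or loader code: `pci_key` and `order_and_label` are hypothetical helpers, the addresses are made up, and a PCI bus address in `domain:bus:device.function` form is used as a stand-in for the slot number.

```python
def pci_key(address):
    """Split 'dddd:bb:dd.f' (hex fields) into a tuple of ints for sorting."""
    domain, bus, rest = address.split(":")
    device, function = rest.split(".")
    return tuple(int(part, 16) for part in (domain, bus, device, function))


def order_and_label(devices):
    """Sort (name, address) pairs by bus address and append it to each name."""
    ordered = sorted(devices, key=lambda d: pci_key(d[1]))
    return [f"{name} [{addr}]" for name, addr in ordered]


# Devices as they might arrive in reverse bus order (addresses are made up).
devices = [("Fury", "0000:0b:00.0"), ("Fury", "0000:0a:00.0")]
for label in order_and_label(devices):
    print(label)
```

Sorting on parsed integers rather than on the raw strings keeps the order correct even if the fields are not zero-padded consistently.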
Comment 12 Ilia Mirkin 2018-08-06 14:59:13 UTC
I believe that's the loader (in src/loader) - nothing radv specific. It's annoying for GL purposes too, with DRI_PRIME=1 picking the *last* GPU instead of the second.

It used to work this way, but then the logic got changed. I'd be in favor of fixing the reverse-sort.
Comment 13 Thomas Crider 2018-09-16 15:14:57 UTC
Hi, I've run into this issue with DOOM 2016 recently.

I have a Ryzen 2400G and an RX 580 in this system, both Vulkan capable.

For whatever reason, DOOM renders on the Vega in my 2400G and outputs to whatever display I have hooked up, which is why I was only getting 20-30 fps on roughly medium settings with my RX 580 set as my primary GPU in the BIOS.

The only way I was able to get around this was to go into my BIOS and completely disable the integrated graphics.

Once I disabled the 2400G's iGPU, my framerates returned to normal with my RX 580.

Tested on both llvm-svn and mesa-git as well as LLVM 6 + Mesa 18.2.
Comment 14 GitLab Migration User 2019-09-18 19:51:12 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/849.

