Bug 98626 - X server crashes on DP hotplug: radeon_mode_hotplug() in xf86-video-ati does not enumerate second GPU
Summary: X server crashes on DP hotplug: radeon_mode_hotplug() in xf86-video-ati does not enumerate second GPU
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Priority: medium
Severity: major
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-07 14:06 UTC by Max Staudt
Modified: 2017-01-16 14:07 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Draft solution, taking GPUScreens into account (1.70 KB, patch)
2016-11-07 14:06 UTC, Max Staudt
Refactor radeon_mode_hotplug (2.59 KB, patch)
2016-11-08 07:47 UTC, Michel Dänzer
Fix radeon_mode_hotplug for GPU screens (2.74 KB, patch)
2016-11-08 07:48 UTC, Michel Dänzer

Description Max Staudt 2016-11-07 14:06:17 UTC
Created attachment 127814
Draft solution, taking GPUScreens into account

I have a system with a Kabini (primary) and a Caicos (secondary) GPU.

When I boot into GNOME 3, with one DP display connected to the Kabini and a second to the Caicos GPU, everything works fine in the automatic PRIME setup. However, hotplugging is buggy: xf86-video-ati does not correctly re-detect the displays, and X crashes.

I bisected this to commit c801f9f10a5d, which changed radeon_mode_hotplug() to iterate over all X screens.

With current master, X no longer crashes; instead, the screen is reconfigured to be one or three displays wide. It's the same underlying bug, though.


The problem with this commit is that it no longer enumerates the connectors on the second GPU, which does not directly back a visible X screen. That GPU is therefore managed by the X server as a "GPU screen", which is listed not in xf86Screens but in xf86GPUScreens.

A possible solution is the attached patch, but it does not work because the X server does not export xf86NumGPUScreens and xf86GPUScreens.
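
Roughly, the approach of the draft patch looks like this (a simplified sketch, not the literal diff; radeon_mode_hotplug_one() is a made-up name for the per-screen rescan):

  /* Sketch only: on hotplug, rescan connectors for GPU screens as well as
   * protocol screens, since a PRIME secondary only shows up in
   * xf86GPUScreens. */
  static void
  radeon_mode_hotplug_all(ScrnInfoPtr scrn)
  {
      int i;

      /* protocol screens (:0.0, :0.1, ...) */
      for (i = 0; i < xf86NumScreens; i++)
          radeon_mode_hotplug_one(scrn, xf86Screens[i]);

      /* GPU screens backing PRIME secondaries -- these two symbols are
       * exactly what the server doesn't export at the moment */
      for (i = 0; i < xf86NumGPUScreens; i++)
          radeon_mode_hotplug_one(scrn, xf86GPUScreens[i]);
  }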


What's the best solution here? Should we export xf86NumGPUScreens and xf86GPUScreens so the Radeon DDX can use them?
Comment 1 Max Staudt 2016-11-07 14:56:30 UTC
c801f9f10a5d was created in response to fdo#93415.
Comment 2 Michel Dänzer 2016-11-08 07:47:16 UTC
Created attachment 127829
Refactor radeon_mode_hotplug
Comment 3 Michel Dänzer 2016-11-08 07:48:08 UTC
Created attachment 127830
Fix radeon_mode_hotplug for GPU screens

Do these two patches fix the problem?
Comment 4 Max Staudt 2016-11-08 12:35:38 UTC
Yes, these patches fix the crash.

However, as far as I can see, they do not allow for an arbitrary number of Radeon cards in arbitrary primary/secondary configurations. So it's not a final fix...

As far as I can see, instead of going through all screens, we could walk only the outputs attached to 'scrn', or maybe restore the behavior from before c801f9f10a5d and not walk over screens at all.

However, that would necessitate a different way of enumerating outputs. Is it just a matter of giving every output a unique name? Maybe, instead of keeping globally incrementing counters per connector type, we could name the connectors "radeon-0-DisplayPort-1" or similar? Then we wouldn't have to count them globally.
Also, having the driver name in the connector name prevents clashes with other drivers that use a similar naming scheme.
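
For example (purely illustrative, the variable names are made up):

  char name[32];
  /* gpu_index: per-driver index of this GPU; type_name/type_id: the
   * per-connector-type name and counter the driver already maintains */
  snprintf(name, sizeof(name), "radeon-%d-%s-%d",
           gpu_index, type_name, type_id);
  /* -> e.g. "radeon-0-DisplayPort-1" */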
Comment 5 Michel Dänzer 2016-11-09 00:58:14 UTC
You're misunderstanding.

The purpose of this code is to find all connectors of a single GPU. There can only be one or two screens associated with any given GPU controlled by this driver. My patches make use of the entity shared between two screens of a single GPU to find the "other" screen, instead of iterating over all screens. It should work correctly in all cases.
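
In rough outline (simplified, not the exact patch code; see the attachments for the real thing):

  /* Simplified outline: the entity shared by the (at most two) screens of
   * one GPU already knows both ScrnInfoPtrs, so walk those instead of all
   * X screens.  Field names approximate. */
  RADEONEntPtr pRADEONEnt = RADEONEntPriv(scrn);
  ScrnInfoPtr screens[2] = { pRADEONEnt->primary_scrn,
                             pRADEONEnt->secondary_scrn };
  int i;

  for (i = 0; i < 2; i++) {
      if (!screens[i])
          continue;

      /* ... rescan this screen's outputs/connectors ... */
  }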
Comment 6 Max Staudt 2016-11-09 15:45:44 UTC
(In reply to Michel Dänzer from comment #5)
> The purpose of this code is to find all connectors of a single GPU.

Right, I mixed that up.


> There can only be one or two screens associated with any given GPU
> controlled by this driver.

Okay, just so I understand this: I can have, say, 6 physical displays connected to one Radeon card (GPU), but I can distribute them between no more than two X screens (as in :0.0 and :0.1)?


I looked at the code again, and now realized that you're taking the ScrnInfoPtr (primary_scrn) for the inner loop from pRADEONEnt, which in turn is a child of scrn.
That's basically doing what I suggested (looping over scrn), so now I understand how this fixes my bug.

Thank you for the quick fix!
Comment 7 Michel Dänzer 2016-11-10 03:38:23 UTC
(In reply to Max Staudt from comment #6)
> Okay, just so I understand this: I can have, say, 6 physical displays
> connected to one Radeon card (GPU), but I can distribute them between no
> more than two X screens (as in :0.0 and :0.1)?

That's correct.


Fixed in Git master:

commit 9760ef33cba5795eddeda4d5c2fcbe2dcce21689
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Tue Nov 8 13:02:43 2016 +0900

    Use pRADEONEnt to find both screens of a GPU in radeon_mode_hotplug

