A Dell XPS13 connected via Thunderbolt to a Dell TB16 docking station, as described in bug #103645, shows a regression when running a 4.16 kernel. The symptom is that the monitor connected to the DP output of the TB16 stays black, in power-save mode, after the docking station is connected. A keyboard connected via the docking station still works, so Thunderbolt itself is working fine. The patch from bug #104425 was tried, but it made no difference to the DP-connected monitor.
Kernel 4.15.x works just fine (or as fine as it can, taking into account bug #103645) with the DP monitor connected.
Are there any further logs you can provide here?
Ping. Or are all the relevant ones in the other bug?
Did you try to unplug and plug it in again? I think I'm observing something similar (a black screen on both the integrated and external displays) with my Dell docking station as well; however, it happens only on the login screen and can be cured by replugging the USB-C cable. I just want to check whether this is the same issue. If it is, I can debug it with my own laptop.
I have plugged the USB-C Thunderbolt cable in and out a few times, and at least the DP-connected monitor comes up all black/in power save, and usually the HDMI one as well.
I could reproduce the problem with your docking station, and it is slightly different from what I had with mine. For me the USB driver stack is definitely involved, because I actually get some errors from usbhid, and the external keyboard/mouse occasionally do not work.
I think I have fixed the issues mentioned above, at least on my machine.
The issue here actually consists of two issues. The first one is related to Thunderbolt docking station initialization: it seems that it doesn't work properly if Thunderbolt is built as a separate module (the mouse and keyboard, and also the display, periodically disconnected). After I started building Thunderbolt into the kernel, it got significantly better.
The second one, i.e. the black screen, seems to be caused by wrong watermark calculations. After adding drm.debug I could see multiple error messages from the skl_compute_wm function saying "requested configuration exceeds maximum wm level". To check this hypothesis, I implemented a small hack: if the required res_blocks turns out to be >= ddb_allocation, res_blocks is simply clamped to the available ddb_allocation.
After those changes I don't get a black screen anymore, even after connecting/disconnecting the Thunderbolt USB-C cable multiple times.
However some USB peripherals might still stop working after that, which I believe is a USB hub related problem.
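For reference, the hack described above amounts to roughly the following logic. This is a simplified Python model, not the actual i915 code: `res_blocks` and `ddb_allocation` are the names used in the comment, while `clamp_res_blocks` is a hypothetical helper and the exact clamped value (`ddb_allocation - 1` here) is an assumption.

```python
from typing import Tuple


def clamp_res_blocks(res_blocks: int, ddb_allocation: int) -> Tuple[int, bool]:
    """Model of the debugging hack: never let res_blocks reach ddb_allocation.

    Returns (blocks_to_use, was_clamped). This only hides the
    "requested configuration exceeds maximum wm level" condition;
    it does not make the requested watermark fit.
    """
    if ddb_allocation <= 0:
        raise ValueError("ddb_allocation must be positive")
    if res_blocks >= ddb_allocation:
        # Assumed clamp value, for illustration only.
        return ddb_allocation - 1, True
    return res_blocks, False
```

A call such as `clamp_res_blocks(300, 200)` reports that clamping happened, which is a hint that the configuration does not really fit and that this is a diagnostic aid rather than a fix.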
Minor update: it looks like the watermarks exceed ddb_allocation when the framebuffer is in Y_Tiled mode. First, drm_framebuffer_init complains with "no Y Tiling for legacy addfb"; nevertheless, Y Tiling still seems to be used when computing the watermarks, since wp->y_tiled is set to true. For the 3840x2560 resolution used with eDP, this pushes the watermarks beyond ddb_allocation, which leads to the black screen.
I could fix it either by forcing X Tiling instead of Y Tiling, or just by assigning res_blocks the current maximum ddb_allocation value.
I still have to work out which way is correct here.
After discussing this with Ville, it looks to me like there are two issues. There might be a problem with the watermark calculation algorithm (I will check against bspec for a mistake). There is also an issue with drm_mode_addfb2: userspace attempts to use Y Tiling for the buffer object without using explicit FB modifiers, which leads to an error. It seems to fall back to X Tiling, which doesn't exceed the watermarks, only once we replug the display.
If there are any logs available, I can provide a more accurate analysis. My analysis so far is:
On Gen9 we have enabled a few display WAs (workarounds) that almost double the WM requirement for Y tiling. This may result in the WM requirement being more than the available DDB.
Assigning (res_blocks = ddb_allocation - 1) is not the right solution, as it violates the HW watermark requirement (it may work in some scenarios, but it is not really a correct fix).
If failing during sanitize_watermarks makes userspace fall back to X tiling, then that is the right solution.
After discussion it was decided that I need to write a summary in order to escalate this issue further. The problem is that on some architectures Y Tiled mode needs more DDB resources than are available, especially when used with multiple 4K displays.
Currently userspace has no way or procedure to determine and fall back to a proper, or at least working, display mode, so in some cases drm_mode_setcrtc fails, as with this bug, and we get a black screen.
It was proposed that we might need to fix this in Mesa, so that it can determine whether the display mode fits the WM requirements before it attempts a modeset. One heuristic would be to probe whether ddb_allocation can fit at least twice the required buffer (the most common scenario) to decide whether Y Tiled mode is usable, and otherwise switch to X Tiled mode. So fixing this might require changes in both user and kernel space. In kernel space, an ioctl to query the minimum required ddb_allocation might then be needed.
There is also a proposal to develop a corresponding stress test case in IGT, in order to be able to detect similar situations.
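The heuristic proposed above could be sketched as follows. This is only an illustration in Python: no ioctl for querying these values exists yet, so `required_blocks` and `ddb_allocation` are assumed inputs, and `choose_tiling` and its `headroom` parameter are hypothetical names.

```python
def choose_tiling(required_blocks: int, ddb_allocation: int, headroom: int = 2) -> str:
    """Pick a tiling mode from the available DDB allocation.

    Y tiling is considered usable only if the allocation can hold at
    least `headroom` times the required buffer (twice, by default);
    otherwise fall back to X tiling.
    """
    if required_blocks < 0 or ddb_allocation < 0:
        raise ValueError("block counts must be non-negative")
    if ddb_allocation >= headroom * required_blocks:
        return "Y_TILED"
    return "X_TILED"
```

Whether a factor of two is the right margin is exactly the kind of question the proposed IGT stress test would help answer.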
Yes, please fix this properly even if it takes more time. Meanwhile, does upgrading to a newer kernel version help with the problem in any way? I'm running upstream Linus kernels, so upgrading a kernel is not a problem. With Mesa and other userland I'm more stuck with what is available from distros, e.g. Debian testing.
I've discovered one more "funny" issue, which Patrik was probably also facing:
every second time Thunderbolt is disconnected and then connected back, the external display doesn't work.
I've checked the logs and added my own traces. It seems that the kernel (4.18-rc7) sends the hotplug event as it should; however, every second time we don't get drm_mode_setcrtc for PIPE_B from userspace. To me it looks like a userspace issue, as the kernel seems to react properly.
I've attached the corresponding logs (not_working_pipe_b for the non-working case and working_pipe_b for the working case).
The behaviour mostly alternates on every second attempt, which indicates some logic problem. The only difference is that we simply don't get drm_mode_setcrtc for pipe B, even though the hotplug event was sent through sysfs, which was verified by additional traces (see the logs).
Created attachment 140970
Case for non-working external display
Created attachment 140971
Case for working external display
It has been confirmed that the corresponding uevent is delivered to userspace, by observing the traces in the DDX: sna_handle_events gets the uevent, which is then analyzed by sna_mode_discover, which in turn sends the RandR notifications (RRTellChanged, RRCrtcNotify) to the X server clients. In the non-buggy case, we get a ProcRRSetScreenConfig message from the desktop clients in response, which then triggers a modeset (drm_mode_setcrtc).
However, for some reason this doesn't happen when the bug occurs, even though the DRM connector state is the same as in the non-buggy case.
After discussion with Martin, we decided that this bug can be closed as non-fixable, at least from the kernel side, and that a corresponding bug should be filed for gnome-desktop/Ubuntu.
(In reply to Stanislav Lisovskiy from comment #15)
> After discussion with Martin, made a decision that this bug can be closed as
> non-fixable at least from kernel side and correspondent bug filed for
Sounds like a plan. Please add a link to the bug for gnome-desktop/Ubuntu/etc. here, and remember to add all the proper adjectives when describing it, so that it gets fixed sooner rather than later. It would also be great if someone could keep an eye on that bug and report back here in which version of which desktop component it got fixed, so that updated distro versions are easier to track.
The only thing I can recommend so far is one simple workaround: once this happens, you can execute xrandr --output <output name> --crtc <some crtc number>. The output name can be found by executing xrandr without arguments. As for the crtc id, I just tried different ones until the secondary screen started working properly. This way you can manually restore the secondary screen to a proper state (the command basically just sends the ProcRRSetScreenConfig request, which is what we are missing from the desktop manager).
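The trial-and-error part of this workaround can be scripted. The following Python sketch only wraps the xrandr invocation given above; the helper names and the choice of CRTC numbers to try are assumptions, and a zero exit status from xrandr does not guarantee that the screen actually lights up, so a visual check is still needed.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_xrandr_commands(output: str, crtcs: Sequence[int]) -> List[List[str]]:
    """Build one `xrandr --output <name> --crtc <n>` command per CRTC number."""
    return [["xrandr", "--output", output, "--crtc", str(c)] for c in crtcs]


def try_crtcs(output: str, crtcs: Sequence[int],
              run: Callable = subprocess.run) -> Optional[int]:
    """Run the commands in order; return the first CRTC number xrandr accepts.

    Returns None if xrandr rejected every CRTC number.
    """
    for crtc, cmd in zip(crtcs, build_xrandr_commands(output, crtcs)):
        if run(cmd).returncode == 0:
            return crtc
    return None
```

For example, `try_crtcs("DP-1-2", [0, 1, 2])` would try the first three CRTCs for an output named DP-1-2; substitute the name that xrandr reports on your system.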
Patrik, can you try Stanislav's instructions above?
I can try it as soon as I get my dock back from Stanislav. Meanwhile, what is the bug filed for userland, so that those interested can follow it?
To be honest, I haven't filed a bug yet. I think I still need some time to figure out where exactly it should be filed.
So, I think I had a user here who was running into this bug (on F29). Specifically, what happens for him is the following:
* Boot with dock plugged
-> External monitor is on connector DP-1-2
* Unplug dock
* Plug dock back in
-> External monitor is on connector DP-5 (according to sysfs)
After this xrandr still reports the display on connector DP-1-2 though. The first time the display could not be configured at this point, the second time it worked despite the inconsistency.
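To compare what the kernel reports with what xrandr shows, the connector status can be read from sysfs. The Python sketch below assumes the usual `/sys/class/drm/card<N>-<connector>/status` layout; the function names are hypothetical, and comparing the result with xrandr output is left to the reader, since the X driver may name outputs differently from the kernel.

```python
from pathlib import Path
from typing import Dict, List


def read_connector_status(drm_root: str = "/sys/class/drm") -> Dict[str, str]:
    """Map connector name (e.g. "DP-5") to the content of its status file."""
    statuses: Dict[str, str] = {}
    for entry in sorted(Path(drm_root).glob("card*-*")):
        status_file = entry / "status"
        if status_file.is_file():
            # "card0-DP-5" -> "DP-5"
            name = entry.name.split("-", 1)[1]
            statuses[name] = status_file.read_text().strip()
    return statuses


def connected(statuses: Dict[str, str]) -> List[str]:
    """Return the names of connectors whose status is "connected"."""
    return sorted(n for n, s in statuses.items() if s == "connected")


if __name__ == "__main__":
    for connector in connected(read_connector_status()):
        print(connector)
```

Running this before unplugging and again after replugging the dock shows whether the kernel-side connector name has changed.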
I've located the problem. It was in the xserver/DDX code, which at some point thinks the crtc hasn't changed while it actually has. I've made a rough patch for the xserver and it works for me, but I haven't sent it yet, because it basically just removes part of the crtc checking code that prevents the drm_mode_setcrtc ioctl from being issued.
As a temporary workaround you can try xrandr --output (name of your non-working output) --crtc (some number here). This forces a drm_mode_setcrtc call, and the screen comes back.
I can also post my temporary fix here; however, it requires rebuilding the xserver from scratch and then installing it.
Created attachment 142488
Attaching a temporary fix for the xserver here, in case somebody wants to get this fixed before I figure out the correct way to fix it and send the patch upstream.
So that fix relates to X. Is there something similar to be done for Wayland? I should perhaps re-test all this, but right now I have only one DP monitor connected to the Dell dock.
I think I've identified the real reason for this bug: the kernel dynamically allocates a new connector id for DP MST devices each time they are plugged/unplugged, adding and removing the corresponding connectors. That seems to confuse userspace into thinking that the connector is still in a connected state, thus leading to a lost modeset.
In order to fix that, we must either return only active DP MST connectors or check connector states more carefully on the userspace side.
I've implemented both fixes; however, I'm not sure which one is correct, and some things still need to be understood.
It would be helpful if somebody could try them and report whether they fix the problem. The userspace fix is implemented for the Intel DDX; I guess something similar might be needed for XWayland or modesetting.
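As a rough illustration of what checking connector states more carefully could mean, the toy Python model below compares connector snapshots taken before and after a hotplug event by connector id rather than by name. It is not taken from either attached patch; the data layout, the function name, and the ids in the usage note are invented for the example.

```python
from typing import Dict, List, Tuple

# connector id -> (connector name, status)
Snapshot = Dict[int, Tuple[str, str]]


def outputs_needing_modeset(before: Snapshot, after: Snapshot) -> List[int]:
    """Return ids that are connected now but were not connected under that id before.

    A replugged DP MST display that got a fresh connector id shows up here,
    even if a connector with the same name was already connected earlier.
    """
    needs = []
    for cid, (_name, status) in after.items():
        previously = before.get(cid, ("", "disconnected"))[1]
        if status == "connected" and previously != "connected":
            needs.append(cid)
    return sorted(needs)
```

With `before = {60: ("eDP-1", "connected"), 77: ("DP-1-2", "connected")}` and `after = {60: ("eDP-1", "connected"), 91: ("DP-1-2", "connected")}`, the function returns `[91]`, whereas a comparison by name alone would see no change and skip the modeset.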
Created attachment 142519
Userspace fix for Intel DDX(xf86-video-intel)
Created attachment 142520
Kernel fix. It should work without userspace changes; however, I'm still not sure whether it is correct.
The patch sounds promising to me. I can have a look if I can reproduce the issue and make a test build to check whether it fixes the problem.
AFAIK, mutter (i.e. the GNOME wayland compositor) is not affected by this issue.
So, I did make a build for F29 with the patch:
Unfortunately, neither the user I had nor I am able to reproduce the issue properly, i.e. the monitor appears to come back correctly at least most of the time.
(In reply to Benjamin Berg from comment #29)
> So, I did make a build for F29 with the patch:
> Unfortunately, the user I had and also myself are unable to reproduce the
> issue properly. i.e. the monitor appears to come back correctly at least
> most of the times.
Do you mean that it comes back properly without the patch?
For faster reproduction, consider plugging it back in immediately after unplugging - for me it reproduces almost every second time with recent drm-tip.
(In reply to Stanislav Lisovskiy from comment #30)
> Do you mean, that it comes back properly without the patch?
Yeah. I had a user on F29 where the display on the dock could not be configured anymore (X11, Cinnamon, T480s, Thunderbolt dock, after being away for an hour). However, this happened exactly once. Since then neither the user nor I have been able to reproduce the issue using different setups.
*** Bug 109059 has been marked as a duplicate of this bug. ***
(In reply to Benjamin Berg from comment #31)
> (In reply to Stanislav Lisovskiy from comment #30)
> > Do you mean, that it comes back properly without the patch?
> Yeah. I had a user on F29 where the display on the dock could not be
> configured anymore (X11, Cinnamon, T480s, Thunderbolt dock, after being away
> for an hour). However, this only happened exactly once. Since then neither
> the user nor myself have been able to reproduce the issue using different
Do you still have this issue with the latest drm-tip?
(In reply to Lakshmi from comment #34)
> (In reply to Benjamin Berg from comment #31)
> > (In reply to Stanislav Lisovskiy from comment #30)
> > > Do you mean, that it comes back properly without the patch?
> > Yeah. I had a user on F29 where the display on the dock could not be
> > configured anymore (X11, Cinnamon, T480s, Thunderbolt dock, after being away
> > for an hour). However, this only happened exactly once. Since then neither
> > the user nor myself have been able to reproduce the issue using different
> > setups.
> Do you still have this issue with latest drmtip?
I checked it. I also checked again, and it seems that the issue is mostly on the userspace side. When this happens, the kernel detects and reports everything correctly. I checked that we get the corresponding uevent, and the GET_CONNECTOR ioctls also show the correct output statuses. I traced the issue up to the point where the Intel DDX driver sends an update to the desktop manager, which however doesn't trigger a modeset. I think it most likely gets confused because the kernel allocates a new connector id each time a DP MST device is connected or disconnected; however, as I understand it, this is how it is currently supposed to work.
Patrik, the outcome of the investigation is that this issue is related to the GNOME desktop manager. Please report this issue here: https://gitlab.gnome.org/GNOME/mutter/issues
Closing this issue.