Bug 106223

Summary: [KBL-R] DP link training failures lead to downgraded link parameters and resolution
Product: DRI
Reporter: Ricardo Ribalda <ricardo.ribalda>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: low
CC: intel-gfx-bugs, jani.nikula, manasi.d.navare, shashank.sharma, ville.syrjala
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard: Triaged, ReadyForDev
i915 platform: KBL
i915 features: display/DP
Attachments:
Description | Flags
Screen detected as HD | none
Screen detected as 1440p | none
Workaround for this issue | none
dmesg for drm tip | none
Screen detected as 1440p using drm tip and drm.debug=6 | none
Screen detected as HD using drm tip and drm.debug=6 | none
dmesg logs for the issue where "Link Training failed at link rate = 540000" | none
dmesg logs for the issue where "Link Training failed at link rate = 540000" | none
dmesg logs for the issue where "Link Training failed at link rate = 540000" | none

Description Ricardo Ribalda 2018-04-24 21:00:48 UTC
Created attachment 139073 [details]
Screen detected as HD

Tested on Debian kernels 4.15 and 4.16 (the latter from experimental).

I have a secondary monitor connected via a USB-C to HDMI adapter (detected as DP by xrandr). The monitor supports resolutions up to 2560x1440.

Most of the time the resolution is detected correctly when the system boots, but if I suspend the machine, replug the screen, or switch to the text console, the resolution is "downgraded" to Full HD.

It is very annoying to have to reconnect the cable up to 5 times to get the expected resolution when this happens.

I have added the parameter drm.debug=0x06, and I can see that in the Full HD case only 2 lanes are detected instead of 4.

The adapter is brand new (Xiaomi) and the cable should be of good quality (Ethernet capable and tested on another platform).

Attached are traces for both the Full HD and the 1440p cases.
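Why losing lanes can cost the 1440p mode comes down to bandwidth. The sketch below uses the rule of thumb that, with 8b/10b coding, a lane at link rate R (i915 style units, e.g. 270000 for HBR, 540000 for HBR2) carries R kB/s of payload, and compares that with pixel clock times bits per pixel. The pixel clocks used in the checks (241500 kHz for 2560x1440@60 with reduced blanking, 148500 kHz for 1920x1080@60) and 24 bpp are typical values assumed for illustration, not taken from the attached logs; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Payload capacity of a DP link in kB/s. With 8b/10b coding, a lane whose
 * symbol clock is link_rate_khz (e.g. 270000 for HBR, 540000 for HBR2)
 * carries one data byte per symbol, i.e. link_rate_khz kB/s.
 */
static long max_data_rate_kBps(long link_rate_khz, int lanes)
{
	assert(lanes == 1 || lanes == 2 || lanes == 4);
	return link_rate_khz * lanes;
}

/* Payload a mode needs: pixel clock (kHz) x bits per pixel, rounded up to bytes. */
static long required_rate_kBps(long pixel_clock_khz, int bpp)
{
	return (pixel_clock_khz * bpp + 7) / 8;
}

/* True if the mode fits on the given link configuration. */
static bool mode_fits(long pixel_clock_khz, int bpp, long link_rate_khz, int lanes)
{
	return required_rate_kBps(pixel_clock_khz, bpp) <=
	       max_data_rate_kBps(link_rate_khz, lanes);
}
```

With these assumed timings, 1440p (724500 kB/s) still fits on 2 lanes at 540000 but not on 2 lanes at 270000 (540000 kB/s), while 1080p (445500 kB/s) does fit there. A fallback that ends up at 2 lanes and a lower rate is therefore consistent with the Full HD symptom.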
Comment 1 Ricardo Ribalda 2018-04-24 21:01:11 UTC
Created attachment 139074 [details]
Screen detected as 1440p
Comment 2 Ricardo Ribalda 2018-04-24 21:29:02 UTC
Just tested on drm-tip with similar results
Comment 3 Ricardo Ribalda 2018-04-24 21:57:38 UTC
However, the attached patch fixes the bug for me. So far I have only seen 1 retry, so we could tune it down to tries < 2.
Comment 4 Ricardo Ribalda 2018-04-24 21:58:00 UTC
Created attachment 139077 [details] [review]
Workaround for this issue
Comment 5 Ricardo Ribalda 2018-04-24 22:14:32 UTC
Created attachment 139078 [details]
dmesg for drm tip
Comment 6 Ricardo Ribalda 2018-04-24 22:19:15 UTC
Created attachment 139079 [details]
Screen detected as 1440p using drm tip and drm.debug=6
Comment 7 Ricardo Ribalda 2018-04-24 22:19:38 UTC
Created attachment 139080 [details]
Screen detected as HD using drm tip and drm.debug=6
Comment 8 Jani Saarinen 2018-04-25 05:48:12 UTC
Document CPU: CPU0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (family: 0x6, model: 0x8e, stepping: 0xa) => KBL-R.
Comment 9 Jani Saarinen 2018-04-25 05:49:03 UTC
Ville, Jani, any advice here?
Comment 10 Jani Nikula 2018-04-25 06:18:17 UTC
For some reason the link training sometimes fails at the higher link bandwidth, and, as expected, the link is downgraded to ensure we get something other than a black screen. While I'm sure the degraded resolution is very annoying, it's surely more desirable than a black screen.

Needs further analysis of the link training failure.
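The downgrade described here is a fallback policy: after a failed training, the driver retries with more conservative link parameters instead of leaving the screen black. The following is a sketch of such a policy, loosely modeled on what i915 did at the time as we understand it (step the link rate down first; at the lowest rate, halve the lane count and restart from the highest rate). The rate table, struct and function names are illustrative assumptions, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Link rates (kHz) that both ends are assumed to support, sorted ascending. */
static const int common_rates[] = { 162000, 270000, 540000 };
#define NUM_COMMON_RATES (sizeof(common_rates) / sizeof(common_rates[0]))

struct link_cfg {
	int rate;	/* link rate in kHz, one of common_rates */
	int lanes;	/* 1, 2 or 4 */
};

/* Index of rate in common_rates, or -1 if it is not listed. */
static int rate_index(int rate)
{
	for (size_t i = 0; i < NUM_COMMON_RATES; i++)
		if (common_rates[i] == rate)
			return (int)i;
	return -1;
}

/*
 * Compute the next, more conservative configuration after a training
 * failure at *cfg. Returns 0 and updates *cfg, or -1 when there is
 * nothing left to fall back to.
 */
static int fallback_link_cfg(struct link_cfg *cfg)
{
	int idx = rate_index(cfg->rate);

	assert(idx >= 0);
	if (idx > 0) {
		/* Step down the link rate, keeping the lane count. */
		cfg->rate = common_rates[idx - 1];
	} else if (cfg->lanes > 1) {
		/* Already at the lowest rate: halve the lanes, restart at the top rate. */
		cfg->lanes >>= 1;
		cfg->rate = common_rates[NUM_COMMON_RATES - 1];
	} else {
		return -1;
	}
	return 0;
}
```

Starting from 540000 x 4, this walks 270000 x 4, 162000 x 4, 540000 x 2, and so on; a single transient failure is enough to leave the link on the next step down until it is retrained.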

Side note, you seem to be missing the DMC firmware, available from the linux-firmware repository. Probably unrelated to the problem at hand, but, well, you never know.
Comment 11 Ricardo Ribalda 2018-04-25 06:56:37 UTC
Hi Jani

Thanks for your comment. I believe that with the proposed workaround there won't be a black screen situation; in the worst case, when the link is bad, link training will just be 5 times slower.

We could even reduce that to two tries. I have never seen my system fail training twice in a row.
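The workaround retries the whole clock recovery plus channel equalization sequence before giving up, as in the loop shown in comment 22. Below is a user-space model of that loop with hypothetical callbacks (train_phase_fn, struct flaky_link and the function names are ours), which makes the worst-case cost explicit: at most max_tries full attempts before the normal fallback takes over.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback standing in for one training phase (CR or EQ). */
typedef bool (*train_phase_fn)(void *ctx);

/*
 * Run the full CR + EQ sequence up to max_tries times. Returns the number
 * of attempts used (1..max_tries) on success, or 0 if every attempt failed
 * and the caller should fall back to lower link parameters.
 */
static int train_with_retries(train_phase_fn clock_recovery,
			      train_phase_fn channel_eq,
			      void *ctx, int max_tries)
{
	assert(max_tries > 0);
	for (int tries = 0; tries < max_tries; tries++) {
		if (!clock_recovery(ctx))
			continue;	/* CR failed: start a new attempt */
		if (channel_eq(ctx))
			return tries + 1;
		/* EQ failed: loop around and redo CR as well */
	}
	return 0;
}

/* Toy link model for experimentation: each phase fails a fixed number of times. */
struct flaky_link {
	int cr_failures;	/* remaining clock recovery failures */
	int eq_failures;	/* remaining channel equalization failures */
};

static bool flaky_cr(void *ctx)
{
	struct flaky_link *link = ctx;

	if (link->cr_failures > 0) {
		link->cr_failures--;
		return false;
	}
	return true;
}

static bool flaky_eq(void *ctx)
{
	struct flaky_link *link = ctx;

	if (link->eq_failures > 0) {
		link->eq_failures--;
		return false;
	}
	return true;
}
```

With max_tries = 1 this degenerates to a single attempt per configuration, so the retry count is the only knob separating the workaround from the original behaviour.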
Comment 12 Jani Nikula 2018-04-25 11:56:10 UTC
What proposed workaround?
Comment 13 Ricardo Ribalda 2018-04-25 12:24:28 UTC
(In reply to Jani Nikula from comment #12)
> What proposed workaround?

This patch https://bugs.freedesktop.org/attachment.cgi?id=139077
Comment 14 Jani Saarinen 2018-04-25 14:49:50 UTC
Ville, Jani, how does the patch mentioned above look?
Comment 15 Ricardo Ribalda 2018-04-25 18:25:13 UTC
I just hit two cases that needed 2 and 3 failed attempts, respectively, before a successful sync. So 5 tries seems to be a reasonable guess.

BTW, in both cases the amount of black screen time was within reason.
Comment 16 Clinton Taylor 2018-04-25 21:05:47 UTC
Ricardo,
   Do you have the model number of the Xiaomi dongle? We need to find the vendor of the LSPCON in the dongle.
Comment 17 Clinton Taylor 2018-04-25 21:11:22 UTC
Nevermind. It looks like they only make one product that is a USB-C hub with HDMI output. 

Xiaomi USB Type-C to HDMI Multifunction Adapter


Is this the device you are using?
Comment 18 Ricardo Ribalda 2018-04-25 21:18:58 UTC
This is the device: 
http://item.mi.com/1163000011.html

On the device it can be read, among other Chinese characters:

USB-C HDMI
Model: ZJQ01TM


https://photos.app.goo.gl/biCnqgD1RJCRjrrR2
Comment 19 Ricardo Ribalda 2018-04-25 21:22:40 UTC
One of the reasons to use this adapter was to be able to charge the notebook while using the adapter.

It seems that this is a "proprietary" feature and other adapters might not support it. But I am not an expert on cables :P.
Comment 20 Manasi 2018-04-25 21:40:30 UTC
Retrying the clock recovery phase 5 times without giving up is what the hack used to do before it was removed to pass DP compliance.
But it looks like there are more non-compliant panels than compliant ones, and maybe we need that hack permanently so that the driver doesn't fall back to lower link parameters too quickly.
The same workaround could probably be tried for:
https://bugs.freedesktop.org/show_bug.cgi?id=105338

Manasi
Comment 21 Clinton Taylor 2018-04-26 17:37:01 UTC
This issue could also manifest if link training is being attempted before the USB-C Hub is ready to accept a signal. A small delay before attempting Link Training with a USB-C LSPCON may work.

Ricardo,
   Could you try adding a short delay, msleep(200), to the start of intel_dp_start_link_train() and see if your retries drop to 0?
Comment 22 Ricardo Ribalda 2018-04-26 17:51:00 UTC
(In reply to Clinton Taylor from comment #21)
> This issue could also manifest if link training is being attempted before
> the USB-C Hub is ready to accept a signal. A small delay before attempting
> Link Training with a USB-C LSPCON may work.
> 
> Ricardo,
>    Could you try adding a short delay msleep (200) to the start of
> intel_dp_start_link_train() and see if your retries drops to 0?

That did not do the trick :(. It also does not explain why I have a couple of times needed more than one retry.

@@ -317,10 +317,18 @@ void
 intel_dp_start_link_train(struct intel_dp *intel_dp)
 {
        struct intel_connector *intel_connector = intel_dp->attached_connector;
+       int tries;
 
-       if (!intel_dp_link_training_clock_recovery(intel_dp))
-               goto failure_handling;
-       if (!intel_dp_link_training_channel_equalization(intel_dp))
+       msleep(200);
+
+       for (tries = 0; tries < 5; tries++) {
+               if (!intel_dp_link_training_clock_recovery(intel_dp))
+                       continue;
+               if (intel_dp_link_training_channel_equalization(intel_dp))
+                       break;
+       }
+
+       if (tries == 5)
                goto failure_handling;
 
        DRM_DEBUG_KMS("[CONNECTOR:%d:%s] Link Training Passed at Link Rate = %d, Lane count = %d",
Comment 23 Clinton Taylor 2018-04-26 19:34:39 UTC
The msleep(200); should be before the first clock recovery call, right after int tries;

I'm trying to give the LSPCON and microcontroller in the USB-C dongle extra time to complete their initialization before we try to use them. Clock recovery is a very basic function and should not be causing this many issues. With equalization I can see compatibility issues, but not with clock recovery.
Comment 24 Ricardo Ribalda 2018-04-26 19:44:05 UTC
(In reply to Clinton Taylor from comment #23)
> The msleep(200); should be before the first clock recovery call. Right after
> the int tries;
> 
> I'm trying to give extra time for the LSPCON and micro controller in the
> USB-C dongle to complete its initialization before we try and use it. Clock
> Recovery is a very basic function and should not be causing this much issue.
> Equalization I can see compatibility issue, but not clock recovery.

I believe I placed the msleep where you are suggesting. This is the code that I tested:

intel_dp_start_link_train(struct intel_dp *intel_dp)
{
	struct intel_connector *intel_connector = intel_dp->attached_connector;
	int tries;

	msleep(200);

	for (tries = 0; tries < 5; tries++) {
		if (!intel_dp_link_training_clock_recovery(intel_dp))
			continue;
		if (intel_dp_link_training_channel_equalization(intel_dp))
			break;
	}

...
Comment 25 Ricardo Ribalda 2018-04-26 19:44:58 UTC
Also, I never disconnect the USB-C dongle; I only reconnect the HDMI cable, or just switch to the text console and then back to X.
Comment 26 Clinton Taylor 2018-04-26 21:18:01 UTC
(In reply to Ricardo Ribalda from comment #25)
> Also, I am never disconecting the usb-C dongle, I am reconnecting the hdmi
> cable, or just going to the text console and then back to X

The USB-C device should be completely ready to go before HPD is asserted to the SOC. This is just a test to try and simplify a possible quirk for this particular device.

The location of the msleep() in comment 24 is now correct for this test. Thanks for helping with the debug.
Comment 27 Jani Saarinen 2018-04-27 06:14:37 UTC
+Shashank.
Comment 28 shashank.sharma@intel.com 2018-04-27 08:00:22 UTC
I don't think we have HDMI over USB-C (HDMI alt mode) yet, definitely not on KBL. This issue is not related to LSPCON.

- Shashank
Comment 29 Jani Nikula 2018-04-27 08:16:27 UTC
(In reply to shashank.sharma@intel.com from comment #28)
> I dont think we have HDMI over USB-C (HDMI alt mode) yet, definitely not on
> KBL. This issue is not related to LSPCON.

I think Clinton isn't talking about on-board LSPCON, but rather LSPCON embedded in a dongle or a cable, converting the USB Type-C DP Alt Mode to HDMI.

One of the questions is, do we need to treat these things as some special snowflakes?
Comment 30 Jani Nikula 2018-04-27 08:17:37 UTC
(In reply to Ricardo Ribalda from comment #19)
> One of the reasons to use this adapter was to be able to charge the notebook
> while using the adapter.

Does charging vs. not charging make a difference for link training?
Comment 31 Jani Nikula 2018-04-27 08:20:50 UTC
(In reply to Ricardo Ribalda from comment #18)
> This is the device: 
> http://item.mi.com/1163000011.html

And do you have other devices connected to the adapter? Do they make a difference?

IIRC the DP alt mode spec allows for 2 or 4 lane configurations, dynamically allocating 2 lanes to other USB needs.
Comment 32 Ricardo Ribalda 2018-04-27 08:23:47 UTC
(In reply to Jani Nikula from comment #31)
> (In reply to Ricardo Ribalda from comment #18)
> > This is the device: 
> > http://item.mi.com/1163000011.html
> 
> And do you have other devices connected to the adapter? Do they make a
> difference?
> 
> IIRC the DP alt mode spec allows for 2 or 4 lane configurations, dynamically
> allocating 2 lanes to other USB needs.

I have experienced wrong sync with all the combinations :(

- With and without USB devices
- With and without charging
Comment 33 shashank.sharma@intel.com 2018-04-27 08:35:18 UTC
(In reply to Jani Nikula from comment #29)
> (In reply to shashank.sharma@intel.com from comment #28)
> > I dont think we have HDMI over USB-C (HDMI alt mode) yet, definitely not on
> > KBL. This issue is not related to LSPCON.
> 
> I think Clinton isn't talking about on-board LSPCON, but rather LSPCON
> embedded in a dongle or a cable, converting the USB Type-C DP Alt Mode to
> HDMI.
> 
> One of the questions is, do we need to treat these things as some special
> snowflakes?

This depends on the cable/HW specs. MCA/Parade LSPCONs are motherboard-down configurations and need proper probing and enabling; I hope this device knows what it's doing :-).

- Shashank
Comment 34 Jani Nikula 2018-04-27 08:50:23 UTC
From the logs:

[   51.556954] [drm:drm_dp_read_desc [drm_kms_helper]] DP branch: OUI 00-1c-f8 dev-ID 176GB0 HW-rev 1.0 SW-rev 7.38 quirks 0x0000

That's Parade OUI I think.
Comment 35 Jani Nikula 2018-04-27 08:54:21 UTC
(In reply to Jani Nikula from comment #34)
> From the logs:
> 
> [   51.556954] [drm:drm_dp_read_desc [drm_kms_helper]] DP branch: OUI
> 00-1c-f8 dev-ID 176GB0 HW-rev 1.0 SW-rev 7.38 quirks 0x0000
> 
> That's Parade OUI I think.

And dev-ID smells like https://www.paradetech.com/products/ps176/
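For reference, the fields in that debug line come from the branch descriptor at DPCD address 0x500: a 3-byte IEEE OUI, a 6-character device ID, a hardware revision byte (major and minor in the high and low nibbles) and software major/minor revision bytes. The standalone sketch below decodes those 12 bytes and matches them against the values in the log; the struct and function names are our own and are not the DRM helper API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Decoded branch identification block (DPCD 0x500..0x50B). */
struct dpcd_ident {
	uint8_t oui[3];		/* IEEE OUI, e.g. 00-1c-f8 */
	char device_id[6];	/* ASCII, not necessarily NUL-terminated */
	uint8_t hw_rev;		/* major << 4 | minor */
	uint8_t sw_major_rev;
	uint8_t sw_minor_rev;
};

static void parse_ident(const uint8_t raw[12], struct dpcd_ident *id)
{
	assert(raw && id);
	memcpy(id->oui, raw, 3);
	memcpy(id->device_id, raw + 3, 6);
	id->hw_rev = raw[9];
	id->sw_major_rev = raw[10];
	id->sw_minor_rev = raw[11];
}

/* Match OUI and device ID; shorter IDs must be NUL-padded in the raw field. */
static bool ident_matches(const struct dpcd_ident *id, const uint8_t oui[3],
			  const char *dev_id)
{
	size_t n = strlen(dev_id);

	if (memcmp(id->oui, oui, 3) != 0 || n > sizeof(id->device_id))
		return false;
	if (memcmp(id->device_id, dev_id, n) != 0)
		return false;
	for (size_t i = n; i < sizeof(id->device_id); i++)
		if (id->device_id[i] != '\0')
			return false;
	return true;
}
```

Feeding it the bytes implied by the log (OUI 00 1c f8, "176GB0", HW-rev 0x10, SW-rev 7 and 38) yields HW-rev 1.0 and SW-rev 7.38, and a match on the Parade OUI with device ID "176GB0".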
Comment 36 shashank.sharma@intel.com 2018-04-27 13:15:04 UTC
(In reply to Jani Nikula from comment #34)
> From the logs:
> 
> [   51.556954] [drm:drm_dp_read_desc [drm_kms_helper]] DP branch: OUI
> 00-1c-f8 dev-ID 176GB0 HW-rev 1.0 SW-rev 7.38 quirks 0x0000
> 
> That's Parade OUI I think.

Yes, this is the Parade OUI, but the Parade LSPCON device is the PS175.
From the comment here, this looks like a PS176, so it is probably an external device, maybe built into a cable.

- Shashank
Comment 37 Manasi 2018-05-02 03:19:51 UTC
So then do we need a quirk specific to this OUI?
Comment 38 Ricardo Ribalda 2018-05-16 04:07:14 UTC
Quick update: a couple of times I have seen it need more than 5 retries :S. I have updated my patch to 10 retries.


Cheers
Comment 39 Ricardo Ribalda 2018-08-11 18:30:10 UTC
Since there has been no proper solution for three months, can we just apply the proposed patch?
If it breaks a standard, we could add a kernel parameter to enable it.

Cheers!
Comment 40 Lakshmi 2018-08-24 08:41:13 UTC
(In reply to Ricardo Ribalda from comment #39)
> Since there has been no proper solution for three months. Can we just apply
> the proposed patch? 
Have you tried the latest drm-tip? It might help in this case.

We will keep this open for now and consider a fix if we see similar issues in the future.
Comment 41 Lakshmi 2018-09-10 06:27:25 UTC
Changing the priority to low. If we see a similar issue again, we can reconsider the priority of this bug.
Comment 42 Ricardo Ribalda 2018-09-10 06:47:38 UTC
I have tried a different screen (ASUS PB278) and a different notebook (ThinkPad T420), and I get the same results.

A simple google search shows other people with similar problem:

https://askubuntu.com/questions/581574/ubuntu-14-low-screen-resolution-on-intel-hd-display

I believe that people simply do not know where to ask for help and just use their screens at lower resolutions. And not many people know how to patch and rebuild their kernels (or want to spend over an hour compiling one every month).

I can make a patch that enables multiple tries in intel_dp_start_link_train via debugfs; that API does not need to be maintained.
I will post to the forums asking whether enabling that option fixes people's issue, and then you will have more information for setting the bug priority.
Comment 43 Manasi 2019-09-16 22:46:53 UTC
Ricardo, does the patch with 5 retries of link training (clock recovery plus channel EQ) still fix this issue for you?
If so, the patch will have to be modified to add a quirk for this specific dongle OUI, so that this retry loop, which goes beyond the DP spec, is only used for this dongle.

This dongle seems to not follow the DP spec and needs more retries for clock recovery.

Manasi
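One way to picture the quirk proposed here: look up the branch device's OUI and device ID in a table and only raise the number of training attempts on a match, keeping a single attempt otherwise. This is a standalone sketch with a hypothetical table, macro and function; as far as we know the DRM DP helpers keep their own DPCD quirk list (queried via drm_dp_has_quirk()), which would be the natural home for a real fix, and this code does not reproduce that API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical quirk entry: branch devices that need extra training attempts. */
struct lt_retry_quirk {
	uint8_t oui[3];
	char device_id[6];	/* raw 6-byte DPCD device ID, no terminator */
	int max_tries;
};

static const struct lt_retry_quirk lt_retry_quirks[] = {
	/* Parade PS176 ("176GB0") seen in this report; retry count is an assumption. */
	{ { 0x00, 0x1c, 0xf8 }, { '1', '7', '6', 'G', 'B', '0' }, 5 },
};

/* Default: one training attempt per link configuration. */
#define LT_DEFAULT_TRIES 1

static int link_train_max_tries(const uint8_t oui[3], const char device_id[6])
{
	assert(oui && device_id);
	for (size_t i = 0; i < sizeof(lt_retry_quirks) / sizeof(lt_retry_quirks[0]); i++) {
		const struct lt_retry_quirk *q = &lt_retry_quirks[i];

		if (memcmp(q->oui, oui, 3) == 0 &&
		    memcmp(q->device_id, device_id, sizeof(q->device_id)) == 0)
			return q->max_tries;
	}
	return LT_DEFAULT_TRIES;
}
```

The value returned would feed the retry loop's upper bound, so compliant sinks keep the one-attempt behaviour and only the listed dongle gets extra attempts before fallback.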
Comment 44 Sushma 2019-09-16 23:04:45 UTC
Created attachment 145386 [details]
dmesg logs for the issue where "Link Training failed at link rate = 540000"
Comment 45 Sushma 2019-09-16 23:23:46 UTC
I am able to reproduce the issue on a WHL device: when two 4K external monitors are connected, one of them is sometimes trained as 2K. I connect the monitors using a USB-C to DP cable and an HDMI to HDMI cable. Here are the error messages that I am seeing:
Line 2230: [  841.162692] [drm:intel_dp_start_link_train] Channel equalization failed 5 times
Line 2231: [  841.162846] [drm:intel_dp_start_link_train] [CONNECTOR:90:DP-2] Link Training failed at link rate = 540000, lane count = 4
I was also able to reproduce the issue on drm-tip, but it is very hard to reproduce, roughly one in 50 attempts.
I tried the patch from the comments above and could not reproduce the bug. However, the monitors take longer to train.
The log is attached in my previous comment for more details.
Comment 46 Sushma 2019-09-16 23:24:38 UTC
Comment 47 Ricardo Ribalda 2019-09-17 11:28:13 UTC
The patch still works for me. 

Agreed on the quirk. Do you have an example of another OUI with a quirk that I can use as a reference?
Comment 48 Sushma 2019-09-17 21:01:15 UTC
Created attachment 145405 [details]
dmesg logs for the issue where "Link Training failed at link rate = 540000"

Adding a full log for the issue when 4K monitor is being trained as 2K.
Comment 49 Sushma 2019-09-17 21:15:24 UTC
When I connect Dell monitor using USB-C to DP cable and Samsung monitor using HDMI cable to DUT, the issue is not reproducible. However, when I have Dell connected to DUT using HDMI and Samsung connected to DUT using USB-C to DP, Samsung monitor resolution changes to 2K instead of 4K. 
Another data point, using LG-27UD88-W monitor with USB-C to DP cable and Dell-P2715Q with HDMI cable, the issue is not reproducible.
Comment 50 Sushma 2019-09-17 23:13:52 UTC
Created attachment 145407 [details]
dmesg logs for the issue where "Link Training failed at link rate = 540000"
Comment 51 Jani Saarinen 2019-11-26 15:53:09 UTC
You are the reporter of this issue, which currently has low priority. Do you still see the issue? If so, please specify clearly what the impact is for you.
Comment 52 Ricardo Ribalda 2019-11-26 15:56:15 UTC
Next week I can try again with the same hardware.
Comment 53 Ricardo Ribalda 2019-11-27 09:31:37 UTC
(In reply to Jani Saarinen from comment #51)
> You are reporter of the issue currently having low priority. Do you still
> see issue. If so, please spesify clearly what is impact to you.

I still see the issue when I use the USB-C adapter to HDMI. I need all the resolution (2560x1440), so I have ended up buying a USB-C adapter to mDP, which seems more stable.

I think that a version of the proposed patches should be merged
Comment 54 Martin Peres 2019-11-29 17:46:26 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/111.
