Bug 90963

Summary: [DP] i915:No display with Display port [drm:intel_dp_start_link_train] *ERROR* too many full retries, give up
Product: DRI Reporter: Eong Chen <eong.chen>
Component: DRM/Intel Assignee: Dhinakaran Pandiyan <dhinakaran.pandiyan>
Status: CLOSED NOTOURBUG QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: medium CC: airlied, conselvan2, dhinakaran.pandiyan, ethan.hsieh, fabio.coatti, freedesktop, intel-gfx-bugs, lachlan.00, leho, mail, mail, mika.kahola, nmcveity, pepe_commerz, sassmann, tarkasteve, tjaalton, viktor, ville.syrjala
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard: ReadyForDev
i915 platform: HSW, SKL i915 features: display/DP MST
Bug Depends on:    
Bug Blocks: 92599    
Attachments:
Description | Flags
dmesg withe debug enabled, plug in dp after boot. | none
full dmesg with debug | none
Clear DP train set valid flag | none
Log link status during link training | none
full dmesg with Ander Conselvan's patch | none
Fallback to lower link rate on MST link training | none
full dmesg with <Fallback to lower link rate on MST link training> patch | none
Fallback to lower link rate on MST link training failure | none
dmesg log. (Find the right rate 270000 and set it in the code.) | none
dmesg with i915.enable_ips=0 | none
dmesg with dp-cleanup patch | none
dmesg with dp-cleanup kernel | none
connecting display to keyboard-unit of HP Pro x2 612 G1 | none
connecting display to docking-station, hung up on graphical login | none
dmesg-4.9-rc8-blankscreen.txt | none
dmesg-4.11.0-rc1+.txt | none
dmesg-4.12-rc4-drmtip.txt | none
Update connector status and keep aux powered up | none
dmesg-4.13-rc3 | none
toggle-power-state | none
dmesg-4.13-drmdebug-29853-plus-diff.txt | none
dmesg for crash on resume with DP via USB-C dock | none
dmesg-4.14-rc3 | none

Description Eong Chen 2015-06-13 03:13:05 UTC
Created attachment 116468 [details]
dmesg withe debug enabled, plug in dp after boot.

dmesg - kernel 4.0.5

* Laptop model: DELL E7440
* Linux distribution: Gentoo x86_64
* Graphics: VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
* Arch: x86_64
* kernel: tested with all 4.0.x kernels, the problem appears in all versions.
* xorg-server: 1.16.3-r1
* xf86-video-intel: 2.99.917
* mesa: 10.4.2
* libdrm: 2.4.59

I use DP to connect to an external DELL U3415W (3440 x 1440 at 60Hz). No matter whether I plug in the DP cable before or after boot, the monitor shows no picture and reports that it is entering power save mode. It used to work with 3.14 kernels.
It works with my office's DELL U2713HM (2560x1440).
I tried the same cable and the same monitor with a MacBook, and everything works.
I tried merging some fixes from git, but none of them solved it.
Comment 1 Jani Nikula 2015-06-15 06:30:06 UTC
Please provide dmesg with drm.debug=14 module parameter set all the way from early boot to the problem. Add log_buf_len=4M or similar if needed.
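For readers wondering what drm.debug=14 selects: the value is a bitmask of DRM debug categories. The sketch below decodes it, assuming the category bits used by kernels of that era (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08). The helper name decode_drm_debug is made up for illustration and is not a kernel or tool API.

```python
# Bit values assumed from the kernel's DRM_UT_* debug categories (4.x era).
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


print(decode_drm_debug(14))  # ['DRIVER', 'KMS', 'PRIME']
```

So 14 (0x0e) asks for driver, modesetting and PRIME messages while leaving out the very chatty core category.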
Comment 2 Eong Chen 2015-06-15 14:23:17 UTC
Created attachment 116512 [details]
full dmesg with debug
Comment 3 Eong Chen 2015-06-19 03:04:53 UTC
I can help with testing if there is a kernel patch.
Comment 4 Eong Chen 2015-06-23 08:42:51 UTC
(In reply to Jani Nikula from comment #1)
> Please provide dmesg with drm.debug=14 module parameter set all the way from
> early boot to the problem. Add log_buf_len=4M or similar if needed.

Is there any progress on this? I can help with debugging and testing if you want.
I always build the kernel myself. I hate that I have to use the laptop screen when I'm working at home.
Comment 5 Viktor Ekmark 2015-08-14 16:18:49 UTC
I'm experiencing the same problem with a Dell XPS 13 (2015) and Dell U3415W.

Works in kernel 3.16 but not in later kernels.
Comment 6 Mika Kahola 2015-08-20 11:06:54 UTC
Created attachment 117807 [details] [review]
Clear DP train set valid flag

Please give this patch a go with the latest drm-intel-nightly kernel and check whether it helps.
Comment 7 Eong Chen 2015-09-27 16:27:42 UTC
(In reply to Mika Kahola from comment #6)
> Created attachment 117807 [details] [review] [review]
> Clear DP train set valid flag
> 
> Please give it a go for this patch with the latest drm-intel-nightly kernel
> and check if this would be helpful.

A quick test, it doesn't work on the 4.2.1 kernel.
Comment 8 info 2015-10-01 09:37:38 UTC
I had the same problems on this system:
Model: Dell E5440
GPU: Intel Corporation Haswell-ULT Integrated Graphics Controller [8086:0a16] (rev 09)
OS: Arch Linux 64bit
Kernel: 4.2.1

The machine is connected to an Iiyama XB2779QS-B1 screen using the Display Port on a Dell docking station.

Downgrading the linux and linux-headers packages to 4.1.6 solved the problem for me.

Hope that helps anyone.
Comment 9 Ander Conselvan de Oliveira 2015-10-01 14:49:48 UTC
Created attachment 118562 [details] [review]
Log link status during link training

Could you run a kernel with the attached patch and attach a new full dmesg? It should provide more information about what's going on.
Comment 10 Ander Conselvan de Oliveira 2015-10-01 14:51:26 UTC
(In reply to info from comment #8)
> Kernel: 4.2.1

... 

> Downgrading the linux and linux-headers packages to 4.1.6 solved the problem
> for me.

A bisect between those two kernel versions would be really helpful.
Comment 11 Eong Chen 2015-10-19 15:25:18 UTC
Created attachment 118986 [details]
full dmesg with Ander Conselvan's patch

Tried the patch on 4.2.3.
Comment 12 Ander Conselvan de Oliveira 2015-10-20 12:36:54 UTC
(In reply to Eong Chen from comment #11)
> Created attachment 118986 [details]
> full dmesg with Ander Conselvan's patch
> 
> Tried the patch on 4.2.3.

Unfortunately that didn't help. It seems we reach the maximum voltage level and the monitor still doesn't reach clock recovery. 

I see in the log that DP MST is being used and according to Viktor this started failing with 3.17, which introduced MST support, so this is very likely related to that.

It seems the race between hpd_pulse and modeset that Ville tried to address [1] is triggered, except it doesn't look like there are two concurrent calls to intel_dp_start_link_train(). 

    [1] https://patchwork.freedesktop.org/patch/58114/
Comment 13 Eong Chen 2015-10-22 03:29:53 UTC
(In reply to Ander Conselvan de Oliveira from comment #12)
> (In reply to Eong Chen from comment #11)
> > Created attachment 118986 [details]
> > full dmesg with Ander Conselvan's patch
> > 
> > Tried the patch on 4.2.3.
> 
> Unfortunately that didn't help. It seems we reach the maximum voltage level
> and the monitor still doesn't reach clock recovery. 
> 
> I see in the log that DP MST is being used and according to Viktor this
> started failing with 3.17, which introduce MST support, so this is very
> likely related to that.
> 
> It seems the race between hpd_pulse and modeset that Ville tried to address
> [1] is triggered, except it doesn't look like there is two concurrent calls
> to intel_dp_start_link_train(). 
> 
>     [1] https://patchwork.freedesktop.org/patch/58114/

I tried this patch on 4.2.3, but it leads to crashes with external monitor.
Comment 14 Ander Conselvan de Oliveira 2015-10-22 08:25:43 UTC
Created attachment 119060 [details] [review]
Fallback to lower link rate on MST link training

Could you run a kernel with this patch and attach dmesg with debug from boot?

One difference between the link training in SST vs MST mode is that MST uses the highest rate, while in SST we try the lowest rate that supports the mode. The DP standard requires the source device to fall back to a lower rate if link training fails, but our code doesn't do that.
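The difference between the two policies can be sketched with a little arithmetic. This is an illustration only, not the i915 code: it assumes a fixed lane count, the three standard DP link rates, 8 payload bits per 10-bit link symbol, and a pixel clock of 319750 kHz at 24 bpp for 3440x1440@60 (an assumed figure; the thread does not state the pixel clock). All function names are hypothetical.

```python
# Link rates in kHz of link symbol clock, the unit used in the logs (e.g. 270000).
LINK_RATES_KHZ = [162000, 270000, 540000]


def link_capacity_kbps(rate_khz, lanes):
    """Payload capacity of the link: 8 data bits per symbol, per lane."""
    return rate_khz * 8 * lanes


def mode_required_kbps(pixel_clock_khz, bpp):
    """Bandwidth the mode needs: pixels per second times bits per pixel."""
    return pixel_clock_khz * bpp


def pick_rate_sst(pixel_clock_khz, bpp, lanes):
    """SST-style policy: lowest rate whose capacity covers the mode, else None."""
    need = mode_required_kbps(pixel_clock_khz, bpp)
    for rate in LINK_RATES_KHZ:
        if link_capacity_kbps(rate, lanes) >= need:
            return rate
    return None


def pick_rate_mst():
    """MST-style policy: always start from the highest rate."""
    return max(LINK_RATES_KHZ)


# ASSUMPTION: 319750 kHz pixel clock, 24 bpp, 4 lanes.
print(pick_rate_sst(319750, 24, 4), pick_rate_mst())  # 270000 540000
```

Under these assumptions the mode fits at 270000 with four lanes, so SST would train at a lower rate than MST for the same monitor.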
Comment 15 Eong Chen 2015-10-23 06:22:43 UTC
Created attachment 119126 [details]
full dmesg with <Fallback to lower link rate on MST link training> patch

It shows some crash information. And it doesn't work with my monitor. Sorry I didn't read the patch this time. :)
Comment 16 Eong Chen 2015-10-23 06:23:17 UTC
(In reply to Ander Conselvan de Oliveira from comment #14)
> Created attachment 119060 [details] [review] [review]
> Fallback to lower link rate on MST link training
> 
> Could you run a kernel with this patch and attach dmesg with debug from boot?
> 
> One difference between the link training in SST vs MST mode is that MST uses
> the highest rate, while in SST we try the lowest rate that supports the
> mode. The DP standard requires the source device to fallback to a lower rate
> if link training fails, but our code doesn't do that.

Tested it, please check the latest attached file.
Comment 17 Ander Conselvan de Oliveira 2015-10-26 09:44:54 UTC
Created attachment 119195 [details] [review]
Fallback to lower link rate on MST link training failure

(In reply to Eong Chen from comment #15)
> Created attachment 119126 [details]
> full dmesg with <Fallback to lower link rate on MST link training> patch
> 
> It shows some crash information. And it doesn't work with my monitor. Sorry
> I didn't read the patch this time. :)

I didn't expect this to fix the problem completely, but to give a clue if the problem is related to our non-compliant link training implementation. Unfortunately, there was an error in that patch. Here's a corrected version.

If this produces a line like the following in the log, you can try the patch below, replacing the number with the one you see in the log, and check if things work.

[drm:intel_mst_pre_enable_dp] clock recovery succeeded with rate 270000


diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index eba5bf9..63fe050 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -57,6 +57,7 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
        lane_count = drm_dp_max_lane_count(intel_dp->dpcd);
 
        rate = intel_dp_max_link_rate(intel_dp);
+       rate = 270000;
 
        if (intel_dp->num_sink_rates) {
                intel_dp->link_bw = 0;
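The fallback idea behind the attached patch can be modelled in a few lines. This is a simplified sketch, not the patch itself: try_train is a hypothetical callback standing in for the hardware training sequence, and a spec-compliant source would also consider reducing the lane count, which is left out here.

```python
def train_with_fallback(rates_khz, try_train):
    """Try link training from the highest rate down.

    try_train is a hypothetical callback: it takes a rate in kHz and returns
    True if training succeeded. Returns the first rate that trains, or None.
    """
    for rate in sorted(rates_khz, reverse=True):
        if try_train(rate):
            return rate
    return None


# Simulated sink that only achieves clock recovery at 270000 or below.
def fake_sink(rate_khz):
    return rate_khz <= 270000


print(train_with_fallback([162000, 270000, 540000], fake_sink))  # 270000
```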
Comment 18 Eong Chen 2015-10-26 12:22:29 UTC
Thank you for the explanation.
I tried it and it shows me the right rate should be 270000, but I still can't use the monitor.
I'll upload another dmesg file.
Comment 19 Eong Chen 2015-10-26 12:24:38 UTC
Created attachment 119199 [details]
dmesg log. (Find the right rate 270000 and set it in the code.)

dmesg file with rate 270000. Both my laptop and external monitors show black screen.
Comment 20 Ander Conselvan de Oliveira 2015-10-26 12:39:48 UTC
(In reply to Eong Chen from comment #19)
> Created attachment 119199 [details]
> dmesg log. (Find the right rate 270000 and set it in the code.)
> 
> dmesg file with rate 270000. Both my laptop and external monitors show black
> screen.

Just to confirm, with the attached patch but without the 'rate = 270000' line, did you see first a failure to train at 540000 and then success with 270000?
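One way to answer this from a saved dmesg is to list the rates reported by the debug line quoted in comment 17. A minimal sketch, assuming the log contains lines of exactly that form; the function name succeeded_rates is made up here, and no failure-line format is assumed because the thread does not show one.

```python
import re

# Matches the success line quoted in comment 17.
RATE_RE = re.compile(r"clock recovery succeeded with rate (\d+)")


def succeeded_rates(dmesg_text):
    """Return every rate (kHz) for which clock recovery was reported, in log order."""
    return [int(m.group(1)) for m in RATE_RE.finditer(dmesg_text)]


sample = """\
[drm:intel_mst_pre_enable_dp] clock recovery succeeded with rate 270000
[drm:intel_mst_pre_enable_dp] clock recovery succeeded with rate 270000
"""
print(succeeded_rates(sample))  # [270000, 270000]
```

If 540000 never shows up in the list, the higher rate either failed or was never attempted, which is the distinction being asked about.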
Comment 21 Eong Chen 2015-10-26 14:58:47 UTC
(In reply to Ander Conselvan de Oliveira from comment #20)
> (In reply to Eong Chen from comment #19)
> > Created attachment 119199 [details]
> > dmesg log. (Find the right rate 270000 and set it in the code.)
> > 
> > dmesg file with rate 270000. Both my laptop and external monitors show black
> > screen.
> 
> Just to confirm, with the attached patch but without the 'rate = 270000'
> line, did you see first a failure to train at 540000 and then success with
> 270000?

No, I only see 270000, three or maybe four times.
Comment 22 Ander Conselvan de Oliveira 2015-10-27 09:38:22 UTC
(In reply to Eong Chen from comment #19)
> Created attachment 119199 [details]
> dmesg log. (Find the right rate 270000 and set it in the code.)
> 
> dmesg file with rate 270000. Both my laptop and external monitors show black
> screen.

Can you run the same test again but with i915.enable_ips=0 in the kernel command line?
Comment 23 Alex GERARD 2015-10-30 09:49:00 UTC
I have the same problem using:
Software :
- Linux 4.2.0 (From updated Debian stretch/testing) with i915.preliminary_support=1
- xf86-video-intel: 2.99.917 (or the Git snapshot from 2015.10.19 in Debian experimental)
- Dell U3415W (again !) via DP 1.2 (no MST *usage*)

Hardware :
- Asrock H170 Pro4
- Intel *Skylake* i7 6700

The whole system freezes after giving the exact same error as reported here.

I have tried Linux 4.3-rc7 too, without success (this time I did not manage to capture the entire kernel debug output via netconsole before the freeze).

Hope it will help.
Comment 24 Ander Conselvan de Oliveira 2015-10-30 15:28:59 UTC
(In reply to Alex GERARD from comment #23)
> I have the same problem using:
> Software :
> - Linux 4.2.0 (From updated Debian stretch/testing) with
> i915.preliminary_support=1
> - xf86-video-intel: 2.99.917 OR Git from 2015.10.19 from debian experimental)
> - Dell U3415W (again !) via DP 1.2 (no MST *usage*)
> 
> Hardware :
> - Asrock H170 Pro4
> - Intel *Skylake* i7 6700
> 
> The whole system freezes after giving the exact same error as reported here.
> 
> I have tried Linux 4.3rc7 too without success (I did not managed to catch
> the entire kernel output debug via NetConsole this time before freezing).
> 
> Hope it will help.

The freeze you have is probably unrelated to the link training failure, so please open a separate bug report. The following patch might help:

http://patchwork.freedesktop.org/patch/57419/
Comment 25 Ander Conselvan de Oliveira 2015-10-30 15:31:35 UTC
(In reply to Ander Conselvan de Oliveira from comment #22)
> (In reply to Eong Chen from comment #19)
> > Created attachment 119199 [details]
> > dmesg log. (Find the right rate 270000 and set it in the code.)
> > 
> > dmesg file with rate 270000. Both my laptop and external monitors show black
> > screen.
> 
> Can you run the same test again but with i915.enable_ips=0 in the kernel
> command line?

Also, the following branch has some changes to make link training closer to spec compliant. Could you please give it a try?

https://github.com/anderco/linux/tree/dp-cleanup
Comment 26 Eong Chen 2015-11-09 09:29:44 UTC
Created attachment 119498 [details]
dmesg with i915.enable_ips=0
Comment 27 Eong Chen 2015-11-09 09:30:10 UTC
Created attachment 119499 [details]
dmesg with dp-cleanup patch
Comment 28 Ander Conselvan de Oliveira 2015-11-09 11:01:35 UTC
(In reply to Eong Chen from comment #26)
> Created attachment 119498 [details]
> dmesg with i915.enable_ips=0

So this still gives you a black screen?
Comment 29 Ander Conselvan de Oliveira 2015-11-09 11:02:12 UTC
(In reply to Eong Chen from comment #27)
> Created attachment 119499 [details]
> dmesg with dp-cleanup patch

I'm not sure which patch you tried, but the intent was to test the whole branch.
Comment 30 Eong Chen 2015-11-10 02:53:35 UTC
(In reply to Ander Conselvan de Oliveira from comment #28)
> (In reply to Eong Chen from comment #26)
> > Created attachment 119498 [details]
> > dmesg with i915.enable_ips=0
> 
> So this still gives you a black screen?

Yes, still black screen.
Comment 31 Eong Chen 2015-11-10 02:53:57 UTC
(In reply to Ander Conselvan de Oliveira from comment #29)
> (In reply to Eong Chen from comment #27)
> > Created attachment 119499 [details]
> > dmesg with dp-cleanup patch
> 
> I'm not sure which patch you tried, but the intent was to test the whole
> branch.

Sorry, my mistake. It's the whole branch.
Comment 32 Ander Conselvan de Oliveira 2015-11-10 09:34:03 UTC
(In reply to Eong Chen from comment #31)
> (In reply to Ander Conselvan de Oliveira from comment #29)
> > (In reply to Eong Chen from comment #27)
> > > Created attachment 119499 [details]
> > > dmesg with dp-cleanup patch
> > 
> > I'm not sure which patch you tried, but the intent was to test the whole
> > branch.
> 
> Sorry, my mistake. It's the whole branch.

I'm afraid you uploaded the wrong log or compiled the wrong branch. That branch should produce a kernel with version 4.3.0-rc7+, but the log says 4.2.0-rc7+.
Comment 33 Eong Chen 2015-11-11 16:53:14 UTC
Created attachment 119563 [details]
dmesg with dp-cleanup kernel

dmesg with the right dp-cleanup kernel.
Still black screen.
Comment 34 Eong Chen 2015-11-15 02:14:20 UTC
(In reply to Ander Conselvan de Oliveira from comment #32)
> (In reply to Eong Chen from comment #31)
> > (In reply to Ander Conselvan de Oliveira from comment #29)
> > > (In reply to Eong Chen from comment #27)
> > > > Created attachment 119499 [details]
> > > > dmesg with dp-cleanup patch
> > > 
> > > I'm not sure which patch you tried, but the intent was to test the whole
> > > branch.
> > 
> > Sorry, my mistake. It's the whole branch.
> 
> I'm afraid you uploaded the wrong log or compiled the wrong branch. That
> branch should produce a kernel with version 4.3.0-rc7+, but the log says
> 4.2.0-rc7+.

I had used the wrong branch, and have now uploaded a new dmesg file from the right branch.
Comment 35 Jim Bride 2015-11-16 17:06:11 UTC
This is the same failure mode that I've been fighting in the course of looking at https://bugs.freedesktop.org/show_bug.cgi?id=91791 up to and including things working at 270000.
Comment 36 Eong Chen 2015-11-19 15:32:01 UTC
(In reply to Jim Bride from comment #35)
> This is the same failure mode that I've been fighting in the course of
> looking at https://bugs.freedesktop.org/show_bug.cgi?id=91791 up to and
> including things working at 270000.

Should I try the patch in that case? Which kernel version should I use if I want to test it?
Comment 37 Joonas Lahtinen 2015-11-26 13:28:07 UTC
A bug cannot be marked as a duplicate of multiple bugs, so I added this bug as blocking the module reloading bug. That test logs lots of errors because link training spits out error messages before and after the module reload. So this is also confirmed on SKL hardware.
Comment 38 Joonas Lahtinen 2015-11-26 13:29:45 UTC
(In reply to Eong Chen from comment #33)
> Created attachment 119563 [details]
> dmesg with dp-cleanup kernel
> 
> dmesg with the right dp-cleanup kernel.
> Still black screen.

Please, do not compress the plain text attachments.
Comment 39 Eong Chen 2015-11-26 15:52:39 UTC
(In reply to Joonas Lahtinen from comment #38)
> (In reply to Eong Chen from comment #33)
> > Created attachment 119563 [details]
> > dmesg with dp-cleanup kernel
> > 
> > dmesg with the right dp-cleanup kernel.
> > Still black screen.
> 
> Please, do not compress the plain text attachments.

It's too big and I can't upload it if I don't compress it.
Comment 40 Viktor Ekmark 2015-12-05 23:36:47 UTC
I'm not sure if this is known here but there is a workaround for U3415W by disabling DP 1.2 in the monitor:
http://en.community.dell.com/support-forums/peripherals/f/3529/p/19658372/20837315#20837315
Comment 41 Eong Chen 2015-12-06 08:20:27 UTC
(In reply to Viktor Ekmark from comment #40)
> I'm not sure if this is known here but there is a workaround for U3415W by
> disabling DP 1.2 in the monitor:
> http://en.community.dell.com/support-forums/peripherals/f/3529/p/19658372/
> 20837315#20837315

It works fine! So the problem is with DP 1.2?
Comment 42 Jani Nikula 2016-01-13 13:10:44 UTC
(In reply to Eong Chen from comment #41)
> (In reply to Viktor Ekmark from comment #40)
> > I'm not sure if this is known here but there is a workaround for U3415W by
> > disabling DP 1.2 in the monitor:
> > http://en.community.dell.com/support-forums/peripherals/f/3529/p/19658372/
> > 20837315#20837315
> 
> It works fine! So the problem is on the DP1.2?

The link didn't open for me, but disabling DP1.2 disables DP MST, which may be where the problem lies.
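Why the monitor setting matters can be sketched as follows: a source only considers MST when the sink's DPCD reports revision 1.2 or later and sets the MST capability bit. The register addresses and bit below follow the kernel's DP helper definitions (DP_DPCD_REV at 0x000, DP_MSTM_CAP at 0x021, DP_MST_CAP as bit 0). That the U3415W reports revision 0x11 in its DP 1.1 mode is an assumption, and sink_can_do_mst is an illustrative name, not a driver function.

```python
DP_DPCD_REV = 0x000  # DPCD revision register
DP_MSTM_CAP = 0x021  # MST capability register
DP_MST_CAP = 1 << 0  # bit 0: sink supports MST


def sink_can_do_mst(dpcd):
    """dpcd: mapping of DPCD address -> byte value read from the sink."""
    if dpcd.get(DP_DPCD_REV, 0) < 0x12:  # MST requires DP 1.2 or later
        return False
    return bool(dpcd.get(DP_MSTM_CAP, 0) & DP_MST_CAP)


# ASSUMPTION: the monitor reports 0x11 when set to DP 1.1 and 0x12 when set to DP 1.2.
print(sink_can_do_mst({DP_DPCD_REV: 0x11, DP_MSTM_CAP: 0x01}))  # False
print(sink_can_do_mst({DP_DPCD_REV: 0x12, DP_MSTM_CAP: 0x01}))  # True
```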
Comment 43 Jani Nikula 2016-01-13 13:11:33 UTC
Likely dupe at https://bugzilla.kernel.org/show_bug.cgi?id=92371, closing that one, let's track everything at freedesktop.org.
Comment 44 Henning 2016-01-26 22:12:04 UTC
Same error here on Hewlett-Packard HP Pro x2 612 G1 Tablet with Linux version 4.5.0-0.rc1.git0.1.fc24.x86_64.
Please see attachment 1 for connecting the display directly to the keyboard-unit of the tablet, and attachment 2 for connecting it to the docking-station.
Comment 45 Henning 2016-01-26 22:13:28 UTC
Created attachment 121316 [details]
connecting display to keyboard-unit of HP Pro x2 612 G1
Comment 46 Henning 2016-01-26 22:14:39 UTC
Created attachment 121317 [details]
connecting display to docking-station, hung up on graphical login
Comment 47 Steve Harms 2016-01-28 19:37:26 UTC
I can confirm this issue using all kernels 3.16 and newer, with a Lenovo W541 and Docking station over DisplayPort.

This issue still occurs even if I manually set my monitor to DP 1.1 (HP 34C).  
Using kernel 3.13 solves this.
Comment 48 Steve Harms 2016-01-30 16:00:52 UTC
I confirmed with the Dell XPS 13 9333 and the Lenovo W541 using a Mini-DP -> DP adapter that this display does work using 3440x1440@60hz with xorg-video-intel.

The issue appears to be isolated to DisplayPort docking stations (when I dock my Lenovo W541 this issue arises - the display can be driven at 2560x1440 however)
Comment 49 Ander Conselvan de Oliveira 2016-02-03 13:42:21 UTC
(In reply to Henning from comment #44)
> Same error here on Hewlett-Packard HP Pro x2 612 G1 Tablet with Linux
> version 4.5.0-0.rc1.git0.1.fc24.x86_64.
> Please see attachment 1 [details] [review] [review] for connecting the display
> directly to the keyboard-unit of the tablet, attachment 2 [details] [review] [review]
> for connenting it to the docking-station.

Your log doesn't have link training failures or present the message in the bug description, so it is not the same error. Please open a separate bug report.
Comment 50 Ander Conselvan de Oliveira 2016-02-03 13:49:58 UTC
(In reply to Steve Harms from comment #47)
> I can confirm this issue using all kernels 3.16 and newer, with a Lenovo
> W541 and Docking station over DisplayPort.
> 
> This issue still occurs even if I manually set my monitor to DP 1.1 (HP
> 34C).  
> Using kernel 3.13 solves this.

Kernel 3.16 didn't have MST support, so that would seem unrelated.

(In reply to Steve Harms from comment #48)
> I confirmed with the Dell XPS 13 9333 and the Lenovo W541 using a Mini-DP ->
> DP adapter that this display does work using 3440x1440@60hz with
> xorg-video-intel.
> 
> The issue appears to be isolated to DisplayPort docking stations (when I
> dock my Lenovo W541 this issue arises - the display can be driven at
> 2560x1440 however)

Please open a new bug report and attach dmesg all the way from boot with drm.debug=14 in the kernel command line.
Comment 51 Lachlan 2016-02-04 00:19:33 UTC
This is definitely not 3.16 related.

I've had to keep the jessie 3.16 kernel installed while on stretch because this was the last kernel that worked with my dock/displayport.
Comment 52 sassmann 2016-02-25 10:40:06 UTC
Seeing the same issue with my Lenovo w541 and Dell u3415w. With DP 1.2 disabled I'm now able to run the monitor at 3440x1440 with kernel 4.5-rc4. With kernel 4.4 the monitor still goes to powersave.

The weird part is that during boot up the screen still goes to powersave and switches back to the notebook's internal display. The u3415w only turns back on when logging into GNOME.

Checking xrandr once the system has booted shows:

eDP1 connected (normal left inverted right x axis y axis)
   1920x1080     60.00 +
DP2 disconnected (normal left inverted right x axis y axis)
DP2-1 connected primary 3440x1440+0+0 (normal left inverted right x axis y axis) 798mm x 335mm
   3440x1440     59.97*+  49.99

I then tried to force the external screen via kernel cmdline parameters
video=eDP1:d video=DP2-1:D without success.
Comment 53 Marco Trevisan (Treviño) 2016-04-20 22:58:39 UTC
I'm getting something similar on my T460p (Skylake) when docking/undocking it a few times:

[78290.365783] thinkpad_acpi: docked into hotplug port replicator
[78291.075896] [drm:intel_dp_link_training_clock_recovery [i915_bpo]] ERROR too many voltage retries, give up
[78291.092347] [drm:intel_wait_ddi_buf_idle [i915_bpo]] ERROR Timeout waiting for DDI BUF D idle bit
Comment 54 Fabio Coatti 2016-05-05 13:23:20 UTC
Hi all, 
I'm getting more or less the same issues.
environment:
linux 4.5.3
xf86-video-intel 2.99.917_p20160423 (and several earlier versions)
Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz
mesa 11.2.1

(gentoo based, vanilla kernel)

External monitor: Dell U2415

short description:
I'm using KDE in a multi-monitor setup; if I switch to the console with Alt+F2, I start to see messages like this:

[gio mag  5 15:05:29 2016] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
[gio mag  5 15:05:29 2016] [drm:ironlake_irq_handler] *ERROR* CPU pipe A FIFO underrun
[gio mag  5 15:05:29 2016] [drm:intel_set_pch_fifo_underrun_reporting] *ERROR* uncleared pch fifo underrun on pch transcoder A
[gio mag  5 15:05:29 2016] [drm:cpt_irq_handler] *ERROR* PCH transcoder A FIFO underrun
[gio mag  5 15:05:29 2016] [drm:intel_check_cpu_fifo_underruns] *ERROR* fifo underrun on pipe B
[gio mag  5 15:05:29 2016] [drm:intel_check_pch_fifo_underruns] *ERROR* pch fifo underrun on pch transcoder B
[gio mag  5 15:05:29 2016] [drm:intel_check_cpu_fifo_underruns] *ERROR* fifo underrun on pipe B



After that, switching back to graphics mode causes the external monitor to enter power save mode; unplugging the monitor causes this message:
[gio mag  5 15:06:15 2016] [drm:intel_set_pch_fifo_underrun_reporting] *ERROR* uncleared pch fifo underrun on pch transcoder B

Reconnecting it does not help, while switching it off causes both screens to go blank; switching it back on brings the laptop screen back to life, but the external one stays in powersave and dmesg fills up with a lot of the following messages:

[gio mag  5 15:20:54 2016] [drm:intel_dp_start_link_train] *ERROR* failed to train DP, aborting
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_start_link_train] *ERROR* failed to train DP, aborting


The only way to recover is to turn off the laptop and start it again.
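Logs like the one above are easier to compare across kernels when the repeated errors are tallied. A minimal sketch, assuming the bracketed [drm:function] *ERROR* message layout seen in this thread, with or without a trailing module name such as [i915]; the helper name tally_drm_errors is made up for illustration.

```python
import re
from collections import Counter

# "[drm:func] *ERROR* msg" or "[drm:func [module]] *ERROR* msg"
ERROR_RE = re.compile(r"\[drm:(\w+)(?: \[\w+\])?\] \*ERROR\* (.+)")


def tally_drm_errors(dmesg_text):
    """Count DRM error messages, keyed by (function, message)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1
    return counts


sample = """\
[gio mag  5 15:20:54 2016] [drm:intel_dp_start_link_train] *ERROR* failed to train DP, aborting
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
[gio mag  5 15:20:54 2016] [drm:intel_dp_link_training_clock_recovery] *ERROR* too many full retries, give up
"""
for (func, msg), n in tally_drm_errors(sample).most_common():
    print(n, func, msg)
```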
Comment 55 Jani Saarinen 2016-12-09 11:19:36 UTC
Is this issue still seen with latest kernel?
Comment 56 sassmann 2016-12-09 12:23:54 UTC
Created attachment 128390 [details]
dmesg-4.9-rc8-blankscreen.txt

Just tested with 4.9-rc8. I've switched the Dell U3415W to DP1.2 and after booting the monitor goes into standby mode. Attaching log with debug enabled.
Comment 57 Jani Saarinen 2017-03-08 08:19:02 UTC
Now with latest work done on atomic and watermark side, can you still see the issue with drm-tip?
Comment 58 sassmann 2017-03-08 09:19:33 UTC
Created attachment 130118 [details]
dmesg-4.11.0-rc1+.txt

Dell U3415W still goes to standby with DP1.2 enabled on latest drm-tip.
See dmesg.
Comment 59 pepe_commerz 2017-05-11 08:12:39 UTC
Same or related issue on Lenovo T460p with Fedora 25. Cannot reliably reproduce but happens every couple days in interaction with external DP, docking station, and/or s2ram.

Concrete example: after waking from s2ram I plug in mini-DP, the screen switches, but I get a small black rectangle in the middle of the external screen. The mouse cursor moves but button and key presses have no effect. I can typically switch to the console and restart lightdm to fix the issue. dmesg|grep drm\|i915 looks like this:

[    1.716275] [drm] Initialized
[    1.965209] i915 0000:00:02.0: enabling device (0006 -> 0007)
[    1.965771] [drm] Memory usable by graphics device = 4096M
[    1.965773] fb: switching to inteldrmfb from EFI VGA
[    1.965856] [drm] Replacing VGA console driver
[    1.971139] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.971140] [drm] Driver supports precise vblank timestamp query.
[    1.979746] [drm] Finished loading i915/skl_dmc_ver1_26.bin (v1.26)
[    1.981505] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
[    1.998080] [drm] GuC firmware load skipped
[    2.000761] [drm] Initialized i915 1.6.0 20161121 for 0000:00:02.0 on minor 0
[    2.069638] fbcon: inteldrmfb (fb0) is primary device
[    3.304450] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    3.793494] [drm] RC6 on
[    4.309945] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[50844.645191] [drm] GuC firmware load skipped
[50847.128957] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[50847.402344] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
[50847.670600] [drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi
[50847.966196] [drm] RC6 on

In other cases I get a complete freeze with kernel talking about NMI and 30sec reboot countdown. These don't seem to be recorded in the systemd journal.

System info:
* Platform: Lenovo T460p, Fedora 25
* Graphics: Intel Corporation HD Graphics 530 (rev 06)
* kernel: 4.10.13-200.fc25.x86_64
* libdrm: 2.4.79
* xorg-server 1.19.3
* xorg-x11-drv-intel 2.99.917


Tell me if I can do anything to help debugging this.
Comment 60 Dhinakaran Pandiyan 2017-06-13 23:50:58 UTC
The number of different reports here has made this bug completely unwieldy, and I am not really sure which one I should be looking at. The original report was for a specific monitor (Dell U3415W) and DP-MST.

I request new bugs to be filed for other issues. Please try the latest drm-tip and attach dmesg with drm.debug=14 module parameter.
Comment 61 Dhinakaran Pandiyan 2017-06-14 00:41:39 UTC
(In reply to sassmann from comment #58)
> Created attachment 130118 [details]
> dmesg-4.11.0-rc1+.txt
> 
> Dell U3415W still goes to standby with DP1.2 enabled on latest drm-tip.
> See dmesg.

[   26.338082] [drm:drm_dp_mst_wait_tx_reply [drm_kms_helper]] timedout msg send ffff8ad1be7ac400 2 1


This looks suspicious.
Comment 62 Dhinakaran Pandiyan 2017-06-14 00:52:08 UTC
(In reply to sassmann from comment #56)
> Created attachment 128390 [details]
> dmesg-4.9-rc8-blankscreen.txt
> 
> Just tested with 4.9-rc8. I've switched the Dell U3415W to DP1.2 and after
> booting the monitor goes into standby mode. Attaching log with debug enabled.

As has been pointed out somewhere in this bug, the U3415W does seem to have issues with the DP1.2 setting.

http://www.dell.com/support/article/us/en/04/SLN295763/how-to-use-and-troubleshoot-the-dell-u3415w-ultrasharp-34-curved-monitor?lang=EN

"Blanking video on a single display w/DisplayPort Connection

Some users have noted that using a single U3415W curved display may not display video out of the box. It has been determined that these systems will display video when the DisplayPort input of the U3415W is set to DisplayPort 1.1."

One experiment you can try is to connect another monitor to the DP-Out port on U3415w and enable DP 1.2.
Comment 63 sassmann 2017-06-14 07:14:15 UTC
Hi Dhinakaran,
thanks for looking into this. Let me assure you that DP1.2 works flawlessly on the U3415W when connecting a Windows machine. So my guess is the i915 takes a wrong turn somewhere with this setting. Switched to DP1.1 for now as a workaround.
Comment 64 Dhinakaran Pandiyan 2017-06-14 17:10:26 UTC
(In reply to sassmann from comment #63)
> Hi Dhinakaran,
> thanks for looking into this. Let me assure you that DP1.2 works flawlessly
> on the U3415W when connecting a Windows machine. So my guess is the i915
> takes a wrong turn somewhere with this setting. Switched to DP1.1 for now as
> a workaround.


Interesting. What's your display setup like? Do you have a dock in between the U3415W and the laptop? 

You mention in one of your earlier comments "u3415w only turns back on when logging into GNOME." Is that true with the latest drm-tip too? i.e., the U3415W is in power save mode starting from boot to logging into the desktop environment? And it works fine after logging in?


-DK
Comment 65 sassmann 2017-06-15 08:45:53 UTC
My setup is the following:
Lenovo w541 docked in ThinkPad Dock with lid closed (permanently).
U3415W display connected via DP as the only display.

Following is the current behaviour with latest drm-tip 4.12-rc4.
- Display turns on during BIOS init (grub is fine)
- When booting the kernel a modeswitch happens, this seems to be fine now as well. Display comes back to display boot messages in a different resolution.
- Another switch happens when gdm is starting. This time the display goes into standby mode. Then I can switch to VT2 and display comes back to life showing the login prompt. If I then switch back to VT1 where gdm is running, magically the gdm login screen shows up and I can login.
- However, locking the screen then causes the display to go back into standby, and there's probably a 50% chance that the display never wakes up again. At this point switching to VT2 no longer brings the display back either.

Attaching another dmesg from exactly that scenario.
Comment 66 sassmann 2017-06-15 08:46:44 UTC
Created attachment 131971 [details]
dmesg-4.12-rc4-drmtip.txt
Comment 67 Dhinakaran Pandiyan 2017-06-26 19:51:08 UTC
Thanks for the logs. We aren't handling two level (dock + monitor in DP1.2 MST mode) branch devices properly. There's a problem with MST hotplug handling too.
Comment 68 Ricardo Madrigal 2017-06-30 20:54:34 UTC
Hello

I just tried to reproduce the problem with the following configuration:

KBL NUC, using MST mini-DP to DP with external monitor DP-DP (Acer) 3840 x 2160.

Attaching the configuration I used for testing.

======================================
        Graphic stack
======================================
 
======================================
             Software
======================================
kernel version              : 4.12.0-rc3-drm-tip-ww22-commit-187376e+
architecture                : x86_64
os version                  : Ubuntu 17.04
os codename                 : zesty
kernel driver               : i915
bios revision               : 4.6
bios release date           : 03/02/2017
 
======================================
        Graphic drivers
======================================
mesa                        : 17.0.3
modesetting                 : modesetting_drv.so
xorg-xserver                : 1.19.3
libdrm                      : 2.4.81
cairo                       : 1.14.8
xserver                     : X.Org X Server 1.19.99.1
intel-gpu-tools (tag)       : intel-gpu-tools-1.18-211-g00ce341b
intel-gpu-tools (commit)    : 00ce341b
 
======================================
             Hardware
======================================
platform                   : HSW-Nuc
motherboard id             : D54250WYK
form factor                : Desktop
cpu family                 : Core i5
cpu family id              : 6
cpu information            : Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
gpu card                   : Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
memory ram                 : 3.79 GB
max memory ram             : 16 GB
display resolution         : 1600x900
cpu thread                 : 4
cpu core                   : 2
cpu model                  : 69
cpu stepping               : 1
socket                     : Socket LGA1150
signature                  : Type 0, Family 6, Model 69, Stepping 1
hard drive                 : 223GiB (240GB)
current cd clock frequency : 450000 kHz
maximum cd clock frequency : 450000 kHz
displays connected         : DP-1
======================================================

And yes this is still happening.
Comment 69 rtvernam 2017-07-31 12:42:35 UTC
I would be thrilled to see this resolved and will test any patches that become available.

Is this the best place to monitor for patches?

I am on Dell Precision 7510 (i7-6820HQ) w/ a dell docking station containing two DP ports.
Monitors are old HPZR2440w, each plugged directly into DP on docking station.

I have the i915 turned off in the BIOS and have been using the built-in radeon card without any issues (related to dual screens on the dock DP).
When I turn the i915 on in the BIOS I get only one screen; the other screen says no input signal.
Comment 70 Dhinakaran Pandiyan 2017-08-02 02:41:12 UTC
Created attachment 133188 [details] [review]
Update connector status and keep aux powered up
Comment 72 Dhinakaran Pandiyan 2017-08-02 02:43:14 UTC
(In reply to sassmann from comment #65)
> My setup is the following:
> Lenovo w541 docked in ThinkPad Dock with lid closed (permanently).
> U3415W display connected via DP as the only display.
> 
> Following is the current behaviour with latest drm-tip 4.12-rc4.
> - Display turns on during BIOS init (grub is fine)
> - When booting the kernel a modeswitch happens, this seems to be fine now as
> well. Display comes back to display boot messages in a different resolution.
> - Another switch happens when gdm is starting. This time the display goes
> into standby mode. Then I can switch to VT2 and display comes back to life
> showing the login prompt. If I then switch back to VT1 where gdm is running,
> magically the gdm login screen shows up and I can login.
> - However then locking the screen causes the display to go back into standby
> and there's probably a 50% chance of the display never to wake up again. At
> this point switching to VT2 also no longer brings the display back.
> 
> Attaching another dmesg from exactly that scenario.


Can you please try the patch "Update connector status and keep aux powered up" that I've attached?
Comment 73 sassmann 2017-08-02 08:27:51 UTC
(In reply to Dhinakaran Pandiyan from comment #72)
> Can you please try the patch -"Update connector status and keep aux powered
> up" I've attached?

Applied patch on drm-tip 4.13.0-rc3+.
With your patch things look a lot better now. The monitor no longer turns off when gdm starts. Switching VTs looks good as well. Locking the screen turns off the monitor, and it is correctly restored at wakeup.

The only thing still broken seems to be suspend/resume. On resume the display still does not wake up and stays in standby mode.
I'll keep my monitor in DP1.2 mode now for further testing.
Comment 74 sassmann 2017-08-02 08:29:42 UTC
Created attachment 133191 [details]
dmesg-4.13-rc3

dmesg taken after suspend/resume. Monitor did not wake up on resume.
Comment 75 Dhinakaran Pandiyan 2017-08-15 00:43:05 UTC
(In reply to sassmann from comment #74)
> Created attachment 133191 [details]
> dmesg-4.13-rc3
> 
> dmesg taken after suspend/resume. Monitor did not wake up on resume.

Thanks for the logs. Does the suspend/resume issue go away if you apply the patch and don't force the mode on the kernel command line?
Comment 76 sassmann 2017-08-17 06:44:32 UTC
Unfortunately, even after removing the force parameters, the monitor doesn't come back to life after resume.
Comment 77 sassmann 2017-09-04 07:56:12 UTC
Dhinakaran,
is your patch going into 4.14?
Comment 78 Dhinakaran Pandiyan 2017-09-06 01:43:12 UTC
Unfortunately, one of the two patches never got a Reviewed-By. Anyway, I posted a different solution today: https://patchwork.freedesktop.org/series/29853/. I'd appreciate it if you could check whether this solves any of the issues you are seeing.
Comment 79 sassmann 2017-09-06 07:08:37 UTC
I've applied both patches and the results are mixed.
First off, switching between monitor inputs still works: the screen comes back to life after switching the input back. However, during boot-up, when the kernel messages scroll by and encryption passwords are queried, the monitor has a 50% chance of going into standby mode, which didn't happen with your old patch.
Comment 80 Dhinakaran Pandiyan 2017-09-06 16:52:10 UTC
Interesting, I wonder if this is a timing issue. Can you please provide dmesg with drm.debug=14 appended to the kernel command line? Are the failures correlated with how you boot, e.g. rebooting from a terminal versus pressing the power button on the laptop?
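
For reference, drm.debug is a bitmask of debug categories, and the value 14 can be decoded with a small Python sketch like the one below. The bit-to-name table is written from memory of the DRM core headers and should be treated as an assumption; check the definitions in your kernel tree before relying on it. The helper name `decode_drm_debug` is hypothetical.

```python
# Assumed drm.debug category bits (verify against your kernel's DRM headers).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # 14 == 0x0e == 0x02 | 0x04 | 0x08
    print(decode_drm_debug(14))  # ['DRIVER', 'KMS', 'PRIME']
```

Under that assumed table, drm.debug=14 turns on driver, modesetting and PRIME messages while leaving out the much noisier core messages.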

Regarding the patches I sent, I think we should get them merged if the patches fix some of your MST problems without regressing any behavior. A Tested-By reply to the mailing list will surely help.
Comment 81 Dhinakaran Pandiyan 2017-09-06 18:02:38 UTC
Created attachment 134022 [details] [review]
toggle-power-state
Comment 82 Dhinakaran Pandiyan 2017-09-06 18:05:02 UTC
(In reply to Dhinakaran Pandiyan from comment #81)
> Created attachment 134022 [details] [review]
> toggle-power-state
I am seeing the monitors consistently come up with https://patchwork.freedesktop.org/series/29853/ + this diff.
Comment 83 sassmann 2017-09-07 07:42:10 UTC
Created attachment 134035 [details]
dmesg-4.13-drmdebug-29853-plus-diff.txt

I've reproduced with the latest patchset plus the diff from your last post.
Comment 84 Dhinakaran Pandiyan 2017-09-18 22:15:29 UTC
I can't point out any issues in that dmesg; I'll have to try to get hold of the monitor you are using to debug this :/ The Dell P2715Q I have has been working consistently well with the patches I sent.
Comment 85 pepe_commerz 2017-09-26 00:09:45 UTC
Created attachment 134486 [details]
dmesg for crash on resume with DP via USB-C dock

Got a different Laptop now which docks via USB-C (Latitude 7280, Fedora 26). Suspend/resume and dock/undock are a lot more stable now than with the previously reported Thinkpad, which crashed almost every time. Sometimes though I still get a crashed X or worse.. :-/

I have dmesg output for one of those crashes attached that did not completely disable my screen. Thought it might provide a different view on things..
Comment 86 Jani Nikula 2017-10-05 08:24:38 UTC
A patch referencing this bug was committed as

commit 5ea2355a100a3c6304901d058aee06d3a6be69bc
Author: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Date:   Tue Oct 3 17:22:11 2017 +0300

    drm/i915/mst: Use MST sideband message transactions for dpms control

Does this fix the issue?
Comment 87 sassmann 2017-10-07 07:09:38 UTC
Created attachment 134721 [details]
dmesg-4.14-rc3

With today's drm-tip d080ef02abfef0511c0e91b22b7367259de574d9 we're back to the point where, no matter what I do, the monitor goes into standby after the first modeswitch and never wakes up again.
Comment 88 Ville Syrjala 2017-10-26 18:02:43 UTC
Not sure if it's related to this bug in particular, but what I've observed with all Dell MST monitors I've tested is that if you reboot the machine while the MST display is active it goes into some kind of bad state, and sometimes you even have to physically remove the power cable from it to get it back to normal.

To fix that I think we should gracefully shut down all displays on reboot. I've posted a patch series to that end long ago. Someone could try to revive it and see if it helps with any of the remaining MST issues. I've pushed an outdated branch here: git://github.com/vsyrjala/linux.git panel_reboot_notifier_2
Comment 89 sassmann 2017-10-27 05:22:45 UTC
Sorry, but I can't help out with this anymore. Replaced the DELL U3415W with a LG monitor and all troubles went away.
