Bug 103643

Summary: [Lenovo ThinkPad T450s] System hang with MST dock
Product: DRI Reporter: Alexander Kops <alexkops>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: anshuman.gupta, intel-gfx-bugs, johan.freedesktop, tomi, ville.syrjala
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
URL: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1727662
Whiteboard: Triaged, ReadyForDev
i915 platform: BXT i915 features: display/DP MST
Attachments:
Description Flags
kern.log - Crash seemed to happen at 13:31:38
none
kern.log with running drm-tip kernel from today - Computer shut down at 15:20:39
none
kern.log with running drm-tip kernel from today - Computer shut down at 15:20:39
none
kern.log with running drm-tip kernel from today - Computer froze at 16:17:45
none
dmesg
none
Video showing the screen
none
New dmesg 2019-06-19 running drm-tip
none

Description Alexander Kops 2017-11-09 11:54:00 UTC
I reported this issue in the Ubuntu bug tracker:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1727662
I was advised to cross post it here. 

It can be reproduced in the latest Mainline kernel 4.14-rc8
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc8/
but doesn't seem to appear (after 1.5 days of testing) with the current drm-tip build
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/

Copying original bug description here:
"After I dock my laptop into a Lenovo ThinkPad Ultra Dock Type 40A2 20V, lock it by clicking the lock icon, and wait ~30 minutes, one of the following three things happens ~50% of the time when I come back to unlock it:
* Most of the time the computer is found shut down.
* Sometimes the notebook screen flickers weirdly and one of the two external monitors shows pixelation.
* The notebook screen shows the lock screen but is completely frozen. Occasionally unlocking works.

I have two external monitors attached to the docking station.

When I dock my laptop, the battery indicator does recognize that it's charging.

I've not seen this issue when:
1) With the laptop in the dock, while actively using the laptop.
2) With the laptop out of the dock and no external monitors present.

This bug started after upgrading from Ubuntu 17.04 to Ubuntu 17.10.

Using or not using the following kernel parameter didn't change anything:
i915.enable_rc6=0

PC temperatures are normal as per lm-sensors."
Comment 1 Elizabeth 2017-11-09 21:02:25 UTC
Hello Alexander, 
Could you share a dmesg and/or kern.log with debug information from boot until the problem occurs? Please add drm.debug=0x1e log_buf_len=2M to the kernel command line in grub.
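
For anyone unsure how to add these parameters, here is a minimal sketch. It assumes an Ubuntu-style setup where the kernel command line comes from GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and is regenerated with update-grub; the helper name add_kernel_params is made up for illustration. The demo below only touches a scratch file.

```shell
# Append parameters to GRUB_CMDLINE_LINUX_DEFAULT in a grub defaults file.
# Usage: add_kernel_params FILE "param1 param2"
# (add_kernel_params is a hypothetical helper, not part of grub.)
add_kernel_params() {
    file=$1
    params=$2
    # Insert the parameters just before the closing quote of the variable.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${params}\"|" "$file"
}

# Try it on a scratch file with made-up contents, not on the live file.
scratch=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$scratch"
add_kernel_params "$scratch" "drm.debug=0x1e log_buf_len=2M"
cat "$scratch"

# On the real system (as root), assuming the Ubuntu layout:
#   add_kernel_params /etc/default/grub "drm.debug=0x1e log_buf_len=2M"
#   update-grub
#   reboot
#   cat /proc/cmdline    # confirm the parameters are active
```

Checking /proc/cmdline after the reboot is the quickest way to confirm that the parameters actually took effect before trying to reproduce the hang.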
Comment 2 Alexander Kops 2017-11-09 21:18:32 UTC
I'm currently running my computer with the current Mainline kernel (4.14-rc8) and these settings and will post the dmesg as soon as I'm able to reproduce it.
Comment 3 Alexander Kops 2017-11-10 12:57:04 UTC
Created attachment 135371 [details]
kern.log - Crash seemed to happen at 13:31:38

I added a compressed kern.log (the computer shut down at Nov 10 13:31:38).
I reproduced the bug running the current Mainline kernel 4.14.0-041400rc8-generic.
Comment 4 Alexander Kops 2017-11-13 14:34:17 UTC
Created attachment 135436 [details]
kern.log with running drm-tip kernel from today - Computer shut down at 15:20:39

Today I was able to reproduce it with the drm-tip kernel found here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/

I attach the kern.log, the computer shut itself down at 15:20:39
This time no messages about a fifo underrun are in the log.
Comment 5 Elizabeth 2017-11-13 16:13:20 UTC
(In reply to Alexander Kops from comment #4)
> Created attachment 135436 [details]
> kern.log with running drm-tip kernel from today - Computer shut down at
> 15:20:39
> 
> Today I was able to reproduce it with the drm-tip kernel found here:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/
> 
> I attach the kern.log, the computer shut itself down at 15:20:39
> This time no messages about a fifo underrun are in the log.
Hello Alexander, it seems that the log still contains all the information that you already shared in the first attachment. Could you reproduce with a clean kern.log? Since the machine shuts down, I guess a dmesg can't be obtained.

To clean kern.log:
# rm /var/log/kern.log
# reboot
The kern.log will regenerate after boot.
Comment 6 Elizabeth 2017-11-13 16:16:35 UTC
Also, I noticed you marked this bug as a regression. Could you please share the last known good kernel commit and the first known bad commit?
Comment 7 Alexander Kops 2017-11-13 16:46:18 UTC
Created attachment 135438 [details]
kern.log with running drm-tip kernel from today - Computer shut down at 15:20:39

Oops, looks like I re-uploaded the kern.log from last time. This one is the correct one from today.

Also the regression tag was added by "Christopher M. Penalver" from the Ubuntu bug tracker. So I can't point to specific kernel commits.
I just noticed that it started appearing after using the Kernel shipping with Ubuntu 17.10 and it wasn't happening with the kernel from 17.04
Comment 8 Elizabeth 2017-11-13 19:35:47 UTC
(In reply to Alexander Kops from comment #7)
> ...
> Also the regression tag was added by "Christopher M. Penalver" from the
> Ubuntu bug tracker. So I can't point to specific kernel commits.
> I just noticed that it started appearing after using the Kernel shipping
> with Ubuntu 17.10 and it wasn't happening with the kernel from 17.04
That would be 4.10 and 4.13, I guess...
Comment 9 Ville Syrjala 2017-11-14 14:46:04 UTC
(In reply to Alexander Kops from comment #7)
> Created attachment 135438 [details]
> kern.log with running drm-tip kernel from today - Computer shut down at
> 15:20:39
> 
> Oops, looks like I re-uploaded the kern.log from last time. This one is the
> correct one from today.

The logs contain multiple boots with multiple different kernels, so it's hard to say what's what. But this log doesn't seem to have any FIFO underruns. So am I to assume this is now fixed?
Comment 10 Alexander Kops 2017-11-14 16:01:51 UTC
> But this log doesn't seem to have any FIFO underruns. So am I to assume this is now fixed?

Well, it is fixed in a sense that these FIFO underruns don't appear anymore with the drm-tip kernel. But the behaviour, that the computer will just turn itself off a lot of times after enabling lock screen is still there.
Comment 11 Alexander Kops 2017-11-14 16:19:29 UTC
Created attachment 135451 [details]
kern.log with running drm-tip kernel from today - Computer froze at 16:17:45

I'm attaching the current kern.log. This time the situation was a bit different: the notebook was not turned off and the power light was still on, but all three screens were black and it didn't react to anything, so I had to hard-reboot it.

Maybe you can see something in the logs that would lead to a follow up bug report?

The last things I see in the log before the crash are a bunch of

[drm:drm_mode_addfb2 [drm]] [FB:87]

lines.
Comment 12 Ville Syrjala 2017-11-14 17:01:10 UTC
(In reply to Alexander Kops from comment #11)
> Created attachment 135451 [details]
> kern.log with running drm-tip kernel from today - Computer froze at 16:17:45
> 
> I'll attach this current kern.log. This time the situation was a bit
> different, I didn't find the notebook turned off, but the power light was
> still on, but all three screens were black and it didn't react to anything.
> So I had to hard reboot it.
> 
> Maybe you can see something in the logs that would lead to a follow up bug
> report?
> 
> The last thing I see in the log before the crash are a bunch of 
> 
> [drm:drm_mode_addfb2 [drm]] [FB:87]
> 
> lines.

Nothing interesting there, unfortunately. So I guess we're dealing with some kind of hard system hang, and it doesn't manage to write anything useful to the logs. So it's not even clear whether this has anything to do with i915 or is caused by something totally different. Maybe try netconsole/serial console if the machine has an Ethernet/serial port. Or you may want to look into pstore to see if that might catch something when the machine dies.
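
A rough sketch of the netconsole and pstore suggestions, in case it helps. The IP addresses, ports, interface name and MAC address are placeholders, and netconsole_param is a made-up helper; the parameter layout (src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac) is the one described in the kernel's netconsole documentation. The script only prints the modprobe command, it does not run it.

```shell
# Build a netconsole= parameter: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
# (netconsole_param is a hypothetical helper for illustration.)
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values: laptop 192.168.1.10 on eth0, log receiver 192.168.1.20.
param=$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)

# Print the command to run as root on the laptop.
echo "modprobe netconsole netconsole=$param"

# After a crash and reboot, pstore records (if the platform provides a backend)
# would show up under /sys/fs/pstore.
if [ -d /sys/fs/pstore ]; then
    ls -l /sys/fs/pstore 2>/dev/null || echo "pstore not readable (try as root)"
fi
```

The receiving machine needs something listening for UDP on the target port (a netcat listener, for example; the exact flags differ between netcat flavours).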

Maybe also enable various debug features in the kernel config:
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_PROVE_LOCKING=y
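
To see which of these are already enabled in the running kernel, something like the following should work. It is only a sketch: it assumes the distribution installs the kernel config as /boot/config-$(uname -r), and missing_debug_options is a made-up helper name.

```shell
# Print every suggested debug option that is not set to "y" in a kernel config file.
# (missing_debug_options is a hypothetical helper for illustration.)
missing_debug_options() {
    config=$1
    for opt in CONFIG_LOCKUP_DETECTOR CONFIG_SOFTLOCKUP_DETECTOR \
               CONFIG_HARDLOCKUP_DETECTOR CONFIG_DETECT_HUNG_TASK \
               CONFIG_PROVE_LOCKING; do
        grep -q "^${opt}=y\$" "$config" || echo "$opt"
    done
}

# Assumed location of the running kernel's config; skipped if it is not there.
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    missing_debug_options "$config"
fi
```

Any option the script prints would need a kernel rebuilt with that option turned on.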

PS. Your logs are huuuuge. Might want to trim away the unrelated boots from the logs.
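
One possible way to trim a log down to the last boot, as a sketch only. It assumes every boot in kern.log starts with the kernel's "Linux version" banner line; last_boot is a made-up helper name.

```shell
# Print only the lines belonging to the most recent boot in a kern.log-style file.
# Assumes each boot begins with a line containing "Linux version".
# (last_boot is a hypothetical helper for illustration.)
last_boot() {
    log=$1
    # Line number of the last boot banner, or 0 if there is none.
    start=$(awk '/Linux version/ { n = NR } END { print n + 0 }' "$log")
    if [ "$start" -gt 0 ]; then
        tail -n "+$start" "$log"
    else
        cat "$log"
    fi
}

# Example use before attaching a log:
#   last_boot /var/log/kern.log | gzip > kern-last-boot.log.gz
```

This leaves the original kern.log untouched, so earlier boots are still available if someone asks for them later.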
Comment 13 Maarten Lankhorst 2017-11-15 09:21:23 UTC
IIRC, when Ubuntu puts the system into slumber, it first calls a bunch of addfb's for the fade-to-black animation. That doesn't mean it's the cause of the issue, though it could very well be related to DPMS off.
Comment 14 Angelo Lisco 2018-01-23 19:32:08 UTC
I can confirm that this weird issue also happens without a docking station.
It happens both with an external screen and without it.

system-manufacturer: LENOVO
system-version: ThinkPad T450s
bios-version: JBET66WW (1.30 )
bios-release-date: 09/13/2017
Comment 15 Jani Nikula 2018-01-24 10:23:25 UTC
(In reply to Angelo Lisco from comment #14)
> I can confirm that this weird issue also happens without a docking station.
> It happens both with an external screen and without it.

Alexander Kops, as the original reporter, can you confirm the same without a docking station or external screen?
Comment 16 Alexander Kops 2018-01-24 10:26:55 UTC
I tried to reproduce it without the docking station once, but wasn't able to. However, I don't usually use the notebook without the docking station for long periods, so no thorough testing happened.
Comment 17 Jani Saarinen 2018-03-29 07:10:37 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that too, to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 18 Jani Saarinen 2018-04-20 14:51:26 UTC
Closing, please re-open if still occurs.
Comment 19 Johan Thorén 2019-03-02 12:31:46 UTC
This error still occurs on kernel 4.20.

My model is T470s and the behavior is consistent: It only happens when connected to an external display through the dock, not when using it disconnected from the dock. Can (most of the time) be triggered by changing display settings through xrandr.

Dock model is SD20F82750.

Dmesg error message:
[drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun

GPU: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
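
For comparing logs between kernels, a small sketch that counts these underrun messages per pipe in a saved dmesg or kern.log. It matches the wording of the error line quoted above; count_underruns is a made-up helper name.

```shell
# Count "FIFO underrun" messages per pipe in a saved log file.
# Output: one line per pipe, "<pipe letter> <count>", sorted by pipe.
# (count_underruns is a hypothetical helper for illustration.)
count_underruns() {
    grep -o 'pipe [A-Z] FIFO underrun' "$1" |
        awk '{ count[$2]++ } END { for (p in count) print p, count[p] }' |
        sort
}

# Example:
#   dmesg > /tmp/dmesg.txt
#   count_underruns /tmp/dmesg.txt
```

An empty result only means the message was not logged. As seen earlier in this bug, the hang can still happen without any underrun in the log.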
Comment 20 Lakshmi 2019-03-06 11:48:44 UTC
(In reply to Johan Thorén from comment #19)
> This error still occurs on kernel 4.20.
> 
> My model is T470s and the behavior is consistent: It only happens when
> connected to an external display through the dock, not when using it
> disconnected from the dock. Can (most of the time) be triggered by changing
> display settings through xrandr.
> 
> Dock model is SD20F82750.
> 
> Dmesg error message:
> [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO
> underrun
> 
> GPU: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)

The original bug was reported on Broadwell, so your issue could be different from the one originally reported here.

Can you please attach the full dmesg log from boot with kernel parameters drm.debug=0x1e log_buf_len=4M?

What is the impact of this issue other than the error in the log? Can you elaborate the issue?
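
As background on the drm.debug value, the sketch below decodes the bitmask into debug categories. The bit-to-name table is my reading of the DRM debug categories in the kernel headers, so treat it as an assumption and check include/drm/drm_print.h for your kernel version. decode_drm_debug is a made-up helper name.

```shell
# Decode a drm.debug bitmask into category names.
# The bit assignments below are an assumption based on the kernel's DRM debug
# categories and may differ between kernel versions.
# (decode_drm_debug is a hypothetical helper for illustration.)
decode_drm_debug() {
    mask=$(($1))
    names=""
    for entry in 0x01:CORE 0x02:DRIVER 0x04:KMS 0x08:PRIME 0x10:ATOMIC 0x20:VBL; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(($mask & $bit)) -ne 0 ]; then
            names="${names:+$names }$name"
        fi
    done
    echo "$names"
}

decode_drm_debug 0x1e    # prints: DRIVER KMS PRIME ATOMIC
```

Under that table, 0x1e enables driver, KMS, PRIME and atomic messages while leaving out the core category.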
Comment 21 Johan Thorén 2019-03-10 09:18:55 UTC
Created attachment 143607 [details]
dmesg

Here is my dmesg output with the requested parameters. I'm now running the 5.0.0 kernel with the same behavior. The trigger is sometimes an xrandr change, but almost always coming back from suspend. A reboot is necessary.
Comment 22 Johan Thorén 2019-03-10 09:28:55 UTC
Created attachment 143608 [details]
Video showing the screen
Comment 23 Johan Thorén 2019-03-23 20:20:03 UTC
Would appreciate feedback on the data given, if more is needed or if a separate bug report should be filed. Thanks.
Comment 24 James Ausmus 2019-06-18 14:44:10 UTC
Johan - can you re-run on drm-tip, and see if the issue persists? If it does, please provide the dmesg log output
Comment 25 Johan Thorén 2019-06-19 14:53:01 UTC
Created attachment 144595 [details]
New dmesg 2019-06-19 running drm-tip

James, the issue persists running drm-tip as of today.
Comment 26 Anshuman Gupta 2019-07-18 10:22:22 UTC
Hi Johan,

I need a few inputs.

1. Is this issue seen when you connect an external display directly to the laptop, without the dock?

2. Have you seen the display tear issue on the embedded panel of the laptop?

3. As I see from the dmesg logs, your external display resolution is 1920x1080. Have you observed the issue with other monitors or with other resolution modes?

4. Could you please let me know if you see this issue after running the command below?

echo "2 0 0 0 0 0 0 0" > /sys/kernel/debug/dri/0/i915_pri_wm_latency

Thanks ,
Anshuman
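
A small wrapper for trying the command from point 4, as a sketch only. It shows the file contents before and after the write so the change is visible in a terminal log. The debugfs path is the one given above; set_pri_wm_latency is a made-up helper name, and on a real system it needs root and a mounted debugfs.

```shell
# Write a latency list to a watermark latency file and show it before and after.
# (set_pri_wm_latency is a hypothetical helper for illustration.)
set_pri_wm_latency() {
    file=$1
    values=$2
    echo "before:"
    cat "$file"
    echo "$values" > "$file"
    echo "after:"
    cat "$file"
}

# As root, with debugfs mounted:
#   set_pri_wm_latency /sys/kernel/debug/dri/0/i915_pri_wm_latency "2 0 0 0 0 0 0 0"
```

Values written through debugfs do not survive a reboot, so the command has to be reissued after each boot before testing.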
Comment 27 Johan Thorén 2019-07-19 09:57:58 UTC
Hi Anshuman,

1. Is this issue seen when you connect an external display directly to the laptop, without the dock?

It happens more frequently (it seems) when connected through a dock, but it also happens when using a cable directly to the laptop.

2. Have you seen the display tear issue on the embedded panel of the laptop?

It never happens on the embedded panel, only on external displays.

3. As I see from the dmesg logs, your external display resolution is 1920x1080. Have you observed the issue with other monitors or with other resolution modes?

Actually, the resolution of the screen is 1920x1200. I have, however, observed the same behavior on a screen that has 1920x1080 resolution, as well as on an older VGA screen with a lower resolution that I can't recall right now.

4. Could you please let me know if you see this issue after running the command below?

echo "2 0 0 0 0 0 0 0" > /sys/kernel/debug/dri/0/i915_pri_wm_latency

After running this command I did not notice the problem. Was this command supposed to fix the problem or provoke it? If it's meant to fix it, I will need to test more extensively since I only had the opportunity to test for maybe 20 minutes.

Let me know if anything is needed. Thanks for taking the time!
Comment 28 Johan Thorén 2019-07-25 17:18:19 UTC
I've done some additional testing, and after issuing 'echo "2 0 0 0 0 0 0 0" > /sys/kernel/debug/dri/0/i915_pri_wm_latency' I ran for over 2 hours without any freeze, and I tried to provoke the error by switching resolution and screen layout several times.
Comment 29 Johan Thorén 2019-07-29 18:53:17 UTC
I've now verified that the error still occurs with or without that command issued, especially when coming back from suspend.
Comment 30 Lakshmi 2019-08-27 13:04:23 UTC
(In reply to Johan Thorén from comment #27)
> Hi Anshuman,
> 
> 1. Is this issue is seen when u connect a external display directly to
> laptop without dock?
> 
> It happens more frequently (it seems) when connected through a dock, but it
> also happens when using a cable directly to the laptop.
> 
> 2. Have u screen the display tear issue on embedder panel of laptop.
> 
> It never happens on the embedded panel, only on external displays.
> 
> 3. As i see from dmesg logs your external display resolution is 1920x1080,
> have u observed the issue with other monitors or with other resolution modes.
> 
> Actually, the resolution of the screen is 1920x1200. I have, however,
> observed the same behavior on a screen that has 1920x1080 resolution, as
> well as an older VGA screen with a lower resolution that I don't know as we
> speak.
> 
> 4. could you please let me know if you see this issue after running below
> command
> 
> echo "2 0 0 0 0 0 0 0" > /sys/kernel/debug/dri/0/i915_pri_wm_latency
> 
> After running this command I did not notice the problem. Was this command
> supposed to fix the problem or provoke it? If it's meant to fix it, I will
> need to test more extensively since I only had the opportunity to test for
> maybe 20 minutes.
> 
> Let me know if anything is needed. Thanks for taking the time!

@Anshuman, any further suggestions?
Comment 31 Anshuman Gupta 2019-09-02 16:23:30 UTC
(In reply to Johan Thorén from comment #29)
> I've now verified that the error still occurs with or without that command
> issued, especially when coming back from suspend.

Hmm, I was expecting this command to improve the issue. If it had, we could suspect a watermark issue.
Comment 32 Tomas Janousek 2019-10-08 17:56:13 UTC
I was experiencing the same issue with a ThinkPad 25 (which is almost the same thing as T470) and I implemented these precautions as a workaround:

- disable DPMS when docked and external monitors enabled
- never switch VTs with external monitors enabled
- always disable external monitors before suspending or undocking

Posting here just in case someone is still suffering from this and hasn't figured out a workaround yet.
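
A possible way to script the first and third precautions, as a dry-run sketch. It parses `xrandr --query` output from stdin and only prints the commands instead of running them. It assumes the built-in panel is named eDP-* or LVDS-* (output names vary by driver); external_outputs and print_precautions are made-up helper names.

```shell
# List connected outputs other than the built-in panel.
# Reads `xrandr --query` text on stdin; assumes the panel is named eDP* or LVDS*.
# (Both helpers are hypothetical, for illustration only.)
external_outputs() {
    awk '$2 == "connected" && $1 !~ /^(eDP|LVDS)/ { print $1 }'
}

# Print (do not run) the commands that disable DPMS and turn off external outputs.
print_precautions() {
    echo "xset -dpms"
    external_outputs | while read -r out; do
        echo "xrandr --output $out --off"
    done
}

# Inside an X session:
#   xrandr --query | print_precautions
```

Piping the printed commands into `sh` would run them, which could be hooked into whatever runs before suspend or undocking on a given setup.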
Comment 33 Jani Saarinen 2019-11-17 08:50:21 UTC
Some MST fixes recently landed in drm-tip. Are you able to test with the latest drm-tip and report back on the behaviour?
Comment 34 Johan Thorén 2019-11-17 13:13:49 UTC
I've just built the drm-tip kernel. Will test this for a few days and report back.
Comment 35 Jani Saarinen 2019-11-18 07:06:43 UTC
Thanks.
Comment 36 Johan Thorén 2019-11-23 10:13:30 UTC
I'm happy to report that I've not had a single problem since I installed 5.4.0-rc7-drm-tip-git-g3ff71899c56c.

I'm marking this as resolved, with thanks!
Comment 37 Jani Saarinen 2019-11-23 11:18:51 UTC
Excellent to hear, thank you.
