Bug 103029 - Reoccurring CPU pipe [A|B] FIFO underrun followed by corrupt external monitor
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-28 16:32 UTC by Martin Weinberg
Modified: 2018-04-25 08:11 UTC

See Also:
i915 platform: HSW
i915 features: display/atomic


Attachments
Output of lspci (22.19 KB, text/plain)
2017-09-28 16:32 UTC, Martin Weinberg
Relevant output using drm.debug=0xe (123.95 KB, text/plain)
2017-09-28 16:34 UTC, Martin Weinberg

Description Martin Weinberg 2017-09-28 16:32:46 UTC
Created attachment 134548
Output of lspci

Since kernel 4.13, I've been experiencing problems with my Lenovo T440s while on the Ultradock using HDMI.  No problems on earlier kernels.  This has occurred on 4.13.0 through 4.13.3 (*current*).

A mostly successful workaround is to suspend and resume off the dock.

T440s is Haswell; output of lspci attached. I'm using Ubuntu 17.04, uname -a is:

Linux magpie 4.13.3-041303-generic #201709200606 SMP Wed Sep 20 10:12:46 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


Specifically, the log contains entries like: *ERROR* fifo underrun on pipe B

Some fraction of the time, but not every time, this occurs, the external screen becomes corrupted.  It acts as if it's lost sync.

There are no additional kernel messages.  To help, I rebooted with drm.debug=0xe to capture an "event".  I'm attaching part of the log with drm messages below.
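For anyone who wants to count how often these entries appear in a saved log, here is a minimal Python sketch. It assumes the messages look like either of the two wordings used in this report ("CPU pipe B FIFO underrun" or "fifo underrun on pipe B"); the exact text can differ between kernel versions, and the sample lines are made up for illustration rather than taken from the attachment.

```python
import re
from collections import Counter

# Matches both wordings seen in this report. The exact message text is an
# assumption and may vary between kernel versions.
UNDERRUN_RE = re.compile(
    r"pipe\s+([A-Z])\s+fifo\s+underrun|fifo\s+underrun\s+on\s+pipe\s+([A-Z])",
    re.IGNORECASE,
)


def count_underruns(lines):
    """Return a Counter mapping pipe letter -> number of underrun messages."""
    counts = Counter()
    for line in lines:
        m = UNDERRUN_RE.search(line)
        if m:
            # Exactly one of the two groups is set, depending on the wording.
            pipe = (m.group(1) or m.group(2)).upper()
            counts[pipe] += 1
    return counts


# Hypothetical sample lines for illustration, not copied from the attached log.
SAMPLE = [
    "[  100.0] [drm] *ERROR* CPU pipe B FIFO underrun",
    "[  101.0] [drm] *ERROR* fifo underrun on pipe B",
    "[  102.0] usb 1-1: new high-speed USB device",
    "[  103.0] [drm] *ERROR* CPU pipe A FIFO underrun",
]

for pipe, n in sorted(count_underruns(SAMPLE).items()):
    print(f"pipe {pipe}: {n} underrun message(s)")
```

To run it against a real log, pass the lines of a saved dmesg or journal file to count_underruns() instead of SAMPLE.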
Comment 1 Martin Weinberg 2017-09-28 16:34:53 UTC
Created attachment 134549
Relevant output using drm.debug=0xe
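For reference, drm.debug is a bitmask, and the short sketch below decodes which message categories 0xe enables. The bit names follow the drm module's "debug" parameter description as I understand it; only the low bits are listed, and the table is an assumption to be checked against the running kernel (for example with modinfo drm).

```python
# Bit meanings for the drm module's "debug" parameter. Only the low bits are
# listed; newer kernels define additional categories. Treat this table as an
# assumption and verify it against your kernel's documentation.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by the given mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0xE))  # ['DRIVER', 'KMS', 'PRIME']
```

So 0xe enables driver, modesetting (KMS), and PRIME messages but leaves the very verbose core messages off.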
Comment 2 Jani Saarinen 2018-02-05 12:01:10 UTC
I am really sorry about this delay. Have you been able to reproduce this with the latest kernels, or preferably with drm-tip?
Comment 3 Maarten Lankhorst 2018-02-05 16:09:49 UTC
Weird, I don't see anything that could cause it.

If v4.12 works, are you willing to do a bisection?

~Maarten
Comment 4 Martin Weinberg 2018-02-05 16:24:10 UTC
I am still seeing the error messages logged as of 4.14.16.  The weird "loss of sync" corruption on the external monitor does seem to depend on the kernel, and I have not seen it as of 4.14.16.  My workaround has been to undock and redock.  That seems to clear the graphics issue without a reboot.

In case I wasn't clear, this never happens on the internal LCD.  This is a Lenovo T440s on a Lenovo Ultradock with an HDMI 1080p monitor.  I really like the dock concept, but the MST hub seems to not play nice under Linux.  It took years for the HDMI audio pass-through to be supported.

Kernel: I am trying 4.15.1 now.  I just keep trying the new kernels and hope for the best.

I am willing to do bisection, if you folks are willing to give me some explicit instructions, but I'm very busy for the next few weeks, so I won't be able to get to that right away.

I am also willing to try the drm-tip (much easier than bisection in the short time frame).
Comment 5 Maarten Lankhorst 2018-02-05 16:47:02 UTC
drm-tip would be useful to know. :)
Comment 6 Martin Weinberg 2018-02-05 17:05:42 UTC
Will do.  

I just started running 4.15.1 today.  

I usually see the "events" once or twice a day.  I usually experience the problem after the monitor wakes up from standby, although not always.  Not sure I mentioned that previously, now that I think about it.  I'm not sure that this is significant in any way.

My plan is to try 4.15.1 for a day or two before trying the drm-tip.  Or would you suggest the other way around?
Comment 7 Maarten Lankhorst 2018-02-06 07:29:14 UTC
drm-tip will tell us the bug is fixed at least. :)
Comment 8 Martin Weinberg 2018-02-07 16:17:44 UTC
I can report that 4.15.1 still has the bug.

I was hopeful---it worked without a hitch for two days---but presented the "event" this morning with the same error messages and corrupted video.

I'm now running drm-tip, downloaded this morning.
Comment 9 maximeqc 2018-02-09 01:03:19 UTC
I think I have the exact same bug. For me it's worse, because my external monitor is always corrupted as soon as my laptop screen is closed. I also see the "FIFO underrun" message.

https://bugs.freedesktop.org/show_bug.cgi?id=105016
Comment 10 maximeqc 2018-02-09 17:55:29 UTC
Hi Martin,
I found a workaround for my problem and it's to remove the laptop battery. After this, no more external monitor issue and no more "FIFO underrun" message. You could try!
Comment 11 Martin Weinberg 2018-02-09 18:34:24 UTC
Really!  I wonder why that would be?  Anyway, I will try it.

I can also report that, using the drm-tip kernel, I haven't seen the problem for 2 days.  So you might want to give that a shot.  Two people testing are better than one.
Comment 12 Martin Weinberg 2018-02-09 23:46:05 UTC
Well, I just experienced the bug on the drm-tip kernel build.  So the bug is NOT fixed there, I'm sorry to have to report.

I will try the battery suggestion.  However, the T440s has two batteries, one internal (unremovable) and one external (removable).
Comment 13 maximeqc 2018-02-10 17:02:15 UTC
In the end, removing the battery didn't solve the problem completely. That's very weird, because I can boot my computer 10 times in a row without any problem, and then the next 10 times my VGA screen is corrupted! But it's not a hardware failure, because this bug has only happened since kernel 4.13, and only when my laptop screen is closed. There is something very strange about the way the i915 module handles a second monitor since 4.13.
Comment 14 Martin Weinberg 2018-02-10 17:21:27 UTC
Thanks for the message.

I can confirm that the bug happens erratically for me as well.  I see it when the monitor comes out of standby.  I have never seen the problem on boot or on resume from suspend.

As I've mentioned above, it takes 2 to 3 days with the most recent kernels for the bug to occur.  My sense is that this was happening more regularly with 4.13 but I have no controlled, quantified data on that.

If I could induce the bug reliably, it would be possible to bisect the kernel from 4.12 to 4.13 to pinpoint the issue.  But this is almost prohibitively time consuming if I have to wait a day or two between each build.
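To put a rough number on that cost, here is a back-of-the-envelope sketch. A bisection needs about log2(N) builds for N commits; the commit count used below is an assumed figure for illustration only (the real one can be obtained with git rev-list --count v4.12..v4.13), and the wait per build is the "day or two" mentioned above.

```python
import math


def bisect_steps(num_commits):
    """Approximate number of builds a binary search over num_commits needs."""
    return math.ceil(math.log2(num_commits)) if num_commits > 1 else 0


# Assumed commit count between v4.12 and v4.13, for illustration only.
ASSUMED_COMMITS = 14000

steps = bisect_steps(ASSUMED_COMMITS)
for days_per_test in (1, 2):
    total = steps * days_per_test
    print(f"{steps} steps x {days_per_test} day(s) per test = {total} days")
```

Under these assumptions the bisection takes on the order of two to four weeks. With an intermittent bug, a build that merely failed to show the problem during its test window could also be wrongly marked good, which would send the bisection down the wrong path.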

I can live with the problem but it sure is annoying.
Comment 15 Jani Saarinen 2018-03-29 07:11:46 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you no longer see the issue there, please comment on the bug.
Comment 16 Martin Weinberg 2018-04-05 21:27:45 UTC
I am currently testing the drm-tip version dated 03/31/18.  The good news is that I have not seen the bug after running for 4 days.  This is the longest run time without this bug occurring to date, although there have been several reboots during that time, for different causes.  I am not sure whether this invalidates the test.

I am still investigating.
Comment 17 Martin Weinberg 2018-04-05 21:32:26 UTC
BTW, the bug _does_ occur with the kernel about to be shipped with Ubuntu 18.04 LTS, sadly.
Comment 18 Jani Saarinen 2018-04-25 08:11:39 UTC
Closing; please re-open if the issue still exists.

