Bug 95736 - [SNB bisected] *ERROR* uncleared fifo underrun on pipe A
Summary: [SNB bisected] *ERROR* uncleared fifo underrun on pipe A
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) All
Importance: high critical
Assignee: Matt Roper
QA Contact: Elio
URL:
Whiteboard: ReadyForDev
Keywords: bisected
Duplicates: 100219 105060
Depends on:
Blocks:
 
Reported: 2016-05-24 13:22 UTC by Chris Bainbridge
Modified: 2018-04-22 15:51 UTC
CC List: 10 users

See Also:
i915 platform: SNB
i915 features: display/watermark


Attachments
dmesg-drm (169.71 KB, text/plain)
2016-06-29 14:49 UTC, Chris Bainbridge
a kernel configuration which is affected by the bug (112.92 KB, text/x-mpsub)
2017-01-30 21:38 UTC, Fabian Henze
dmesg on sandy bridge with drm.debug=0x1f (97.31 KB, text/plain)
2017-02-23 15:55 UTC, Fabian Henze
kernel 4.11-rc4: dmesg on sandy bridge with drm.debug=0x1f (96.72 KB, text/plain)
2017-04-02 16:21 UTC, Fabian Henze
drm-tip: dmesg on sandy bridge with drm.debug=0x1f (95.72 KB, text/plain)
2017-04-19 21:57 UTC, Fabian Henze
dmesg on ivy bridge with drm.debug=0x1f drm-intel-fixes-2017-08-09-1-801-gd063a48456d2 (842.55 KB, text/plain)
2017-08-13 11:52 UTC, Chris Bainbridge

Description Chris Bainbridge 2016-05-24 13:22:56 UTC
Looks similar to bug #93802 but now occurs on pipe A every boot:

[    1.446499] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
[    1.446501] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun
[    1.554896] [drm:intel_check_pch_fifo_underruns] *ERROR* pch fifo underrun on pch transcoder A

Bisect result:

ed4a6a7ca853253f9b86f3005d76345482a71283 is the first bad commit
commit ed4a6a7ca853253f9b86f3005d76345482a71283
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Tue Feb 23 17:20:13 2016 -0800

    drm/i915: Add two-stage ILK-style watermark programming (v11)
Comment 1 Sedat Dilek 2016-06-27 06:30:06 UTC
I reported this issue separately, and I still see it on Linux v4.7-rc5 on my Sandy Bridge GPU.

[ dmesg ]
...
[   18.524752] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[   18.527260] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[   18.527294] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun

$ grep -i sandy /var/log/Xorg.0.log
[    25.674] (II) intel(0): SNA initialized with Sandybridge (gen6, gt2) backend

- Sedat Dilek -
Comment 2 Matt Roper 2016-06-27 22:05:37 UTC
Do these messages only appear one time when the driver first loads, or do they continue to get generated during regular system operation (e.g., moving mouse, changing modes, etc.)?

If you only see them when the driver is loaded, then there's probably something not quite right during our initial watermark sanitization step where we inherit the BIOS configuration and try to clean it up.  If they continue to get generated later while the system is in use, that will indicate that we're still doing something wrong at runtime, which is more serious.

Is there any corruption/flickering that goes along with these messages?

If you could provide a dmesg log of a boot with debugging turned on (drm.debug=0xf), that might help us debug where the problem is.
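
One way to answer the first question from a saved log is to look at the timestamps of the underrun messages. The following is a minimal Python sketch, assuming the usual "[seconds] message" dmesg prefix. The 30-second "driver load" window and the helper names are arbitrary choices for illustration, not anything defined by the driver; the sample lines are the ones from the description.

```python
import re

# Matches kernel log lines such as
# "[    1.446501] [drm:...] *ERROR* CPU pipe A FIFO underrun"
UNDERRUN_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s.*\*ERROR\*.*fifo underrun", re.IGNORECASE
)


def underrun_times(dmesg_text):
    """Return the timestamps (seconds since boot) of all underrun errors."""
    times = []
    for line in dmesg_text.splitlines():
        match = UNDERRUN_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def only_at_driver_load(times, boot_window_s=30.0):
    """True if there are underruns and all fall inside the assumed window."""
    return bool(times) and all(t <= boot_window_s for t in times)


sample = """\
[    1.446499] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
[    1.446501] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun
[    1.554896] [drm:intel_check_pch_fifo_underruns] *ERROR* pch fifo underrun on pch transcoder A
"""

times = underrun_times(sample)
print(times, only_at_driver_load(times))
```

Underruns that keep showing up with much later timestamps (for example after a mode change) would point at the runtime case rather than the initial sanitization step.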
Comment 3 Chris Bainbridge 2016-06-29 14:44:35 UTC
The messages appear once at boot.

(The similar messages mentioned in bug #93802 appear every time Xorg is quit or I switch to a Linux virtual terminal.)

There is no obvious corruption or flickering.
Comment 4 Chris Bainbridge 2016-06-29 14:49:54 UTC
Created attachment 124779
dmesg-drm

Boot log attached.
Comment 5 Jani Saarinen 2016-12-09 08:02:04 UTC
Is this issue still valid?
Comment 6 Chris Bainbridge 2016-12-12 23:57:00 UTC
(In reply to Jani Saarinen from comment #5)
> Is this issue still valid?

Yes, the messages still appear at boot on 4.9.0.
Comment 7 Fabian Henze 2017-01-07 18:26:52 UTC
This bug is still present on 4.10-rc2 on my ThinkPad W520 (Sandy Bridge).
Comment 8 Dorota Czaplejewicz 2017-01-30 18:24:56 UTC
I tried to reproduce this on an IVB-3770 in two scenarios:
- boot with no display manager
- start the display manager after boot

Neither produced any underruns in dmesg with drm.debug=0x1f.

Does it happen if X/display manager is not enabled on boot?

Does this issue require special kernel config?
Comment 9 Fabian Henze 2017-01-30 21:38:12 UTC
Created attachment 129242
a kernel configuration which is affected by the bug
Comment 10 Fabian Henze 2017-01-30 21:41:59 UTC
(In reply to Dorota Czaplejewicz from comment #8)
> Does it happen if X/display manager is not enabled on boot?
> Does this issue require special kernel config?

It also happens when just my initrd with the normal KMS console is booted.

I don't know if a special kernel configuration is required, but I attached mine just in case.
Comment 11 Tomeu Vizoso 2017-02-09 12:25:14 UTC
(In reply to Fabian Henze from comment #10)
> (In reply to Dorota Czaplejewicz from comment #8)
> > Does it happen if X/display manager is not enabled on boot?
> > Does this issue require special kernel config?
> 
> It also happens when just my initrd with the normal KMS console is booted.
> 
> I don't know if a special kernel configuration is required, but I attached
> mine just in case.

Chris, Fabian, Sedat, would you be able to find out what are the programmed latencies at boot (the ones inherited from the BIOS), and if a tweak similar to the one below helps avoid it?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e95a2f7509f5219177d6821a0a8754f93892ca56
Comment 12 Fabian Henze 2017-02-15 15:30:38 UTC
(In reply to Tomeu Vizoso from comment #11)
> Chris, Fabian, Sedat, would you be able to find out what are the programmed
> latencies at boot (the ones inherited from the BIOS),

If you give me some hints how to do that, I would.

> and if a tweak similar to the one below helps avoid it?
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=e95a2f7509f5219177d6821a0a8754f93892ca56

Since I am using a sandy bridge notebook (1080p resolution if that matters), this tweak should already be in place, right?
Comment 13 Tomeu Vizoso 2017-02-21 14:49:34 UTC
(In reply to Fabian Henze from comment #12)
> (In reply to Tomeu Vizoso from comment #11)
> > Chris, Fabian, Sedat, would you be able to find out what are the programmed
> > latencies at boot (the ones inherited from the BIOS),
> 
> If you give me some hints how to do that, I would.

It should be already printed, at least in recent kernels. intel_print_wm_latency will tell. Could you please attach your dmesg with drm.debug=0x1f?

> > and if a tweak similar to the one below helps avoid it?
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> > ?id=e95a2f7509f5219177d6821a0a8754f93892ca56
> 
> Since I am using a sandy bridge notebook (1080p resolution if that matters),
> this tweak should already be in place, right?

You are right, could you please find out what's the smallest value that, when passed to ilk_increase_wm_latency(), will avoid the messages?
Comment 14 Tomeu Vizoso 2017-02-23 07:40:55 UTC
A fix has come for another similar bug that could also solve these issues:

https://patchwork.freedesktop.org/series/19953/

Could someone test it and see if it also fixes the issue described in this ticket? Thanks!
Comment 15 Fabian Henze 2017-02-23 15:55:22 UTC
Created attachment 129876
dmesg on sandy bridge with drm.debug=0x1f
Comment 16 Fabian Henze 2017-02-23 15:57:31 UTC
(In reply to Tomeu Vizoso from comment #14)
> A fix has come for another similar bug that could also solve these issues:
> https://patchwork.freedesktop.org/series/19953/

Unfortunately it didn't fix this bug, but I attached my dmesg output with drm.debug=0x1f. This was on a 4.10 kernel WITH the mentioned patch applied.
Comment 17 Fabian Henze 2017-02-23 16:11:35 UTC
(In reply to Tomeu Vizoso from comment #13)
> You are right, could you please find out what's the smallest value that when
> passed to ilk_increase_wm_latency() will avoid the messages?
I was not sure how many and which values I should try, so I set it to 10 and the message didn't go away. Then I tried removing ilk_increase_wm_latency() altogether, which didn't help either. Anything else I should try?
Comment 18 Tomeu Vizoso 2017-02-24 07:29:47 UTC
(In reply to Fabian Henze from comment #17)
> (In reply to Tomeu Vizoso from comment #13)
> > You are right, could you please find out what's the smallest value that when
> > passed to ilk_increase_wm_latency() will avoid the messages?
> I was not sure how many and which values I should try, so I set it to 10 and
> the message didn't go away. Then I tried removing ilk_increase_wm_latency()
> altogether, which didn't help either. Anything else I should try?

Hi, I think 10 would be a bit too little. I would expect the value to be between 12 and 20, but maybe more could be needed.
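
To make the effect of different minimum values concrete, here is a small Python model of a latency floor of this kind. It is only a sketch: the rule used here (raise level 0 to the minimum, raise the higher levels to ceil(min / 5) because they are assumed to use a 5x coarser unit, and do nothing if level 0 already meets the minimum) is an assumption about how ilk_increase_wm_latency() behaves, and the starting latencies are illustrative values, not ones read from a real machine.

```python
import math


def increase_wm_latency(wm, minimum):
    """Return a copy of wm with a floor applied (assumed behaviour).

    wm[0] is raised to `minimum`; wm[1:] are raised to ceil(minimum / 5).
    If wm[0] already meets the minimum, the values are returned unchanged.
    """
    if wm[0] >= minimum:
        return list(wm)
    higher_floor = math.ceil(minimum / 5)
    raised = [max(wm[0], minimum)]
    raised.extend(max(level, higher_floor) for level in wm[1:])
    return raised


bios_latency = [7, 3, 4, 22]  # illustrative starting values only

for minimum in (10, 12, 20):
    print(minimum, increase_wm_latency(bios_latency, minimum))
```

Under these assumptions, a minimum of 10 leaves level 0 below the 12 that the existing tweak is said to apply, which is why larger values are the interesting ones to test.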
Comment 19 Fabian Henze 2017-02-25 23:39:04 UTC
(In reply to Tomeu Vizoso from comment #18)
> Hi, I think 10 would be a bit too little. I would expect the value to be
> between 12 and 20, but maybe more could be needed.
 
I tried a few values between 13 and 100 (13, 20, 50 and 100 iirc) and all had the fifo underrun. What now ...?
Comment 20 Tomeu Vizoso 2017-02-28 08:05:31 UTC
(In reply to Fabian Henze from comment #19)
> (In reply to Tomeu Vizoso from comment #18)
> > Hi, I think 10 would be a bit too little. I would expect the value to be
> > between 12 and 20, but maybe more could be needed.
>  
> I tried a few values between 13 and 100 (13, 20, 50 and 100 iirc) and all
> had the fifo underrun. What now ...?

So, your last log showed this:

[    1.034165] [drm:gen6_check_mch_setup] Wrong MCH_SSKPD value: 0x16040307 This can cause underruns.

This usually means that the BIOS programmed too low a priority (0x7 instead of 0xc) for memory requests from the display, but that register is locked by the time the kernel starts, so we cannot fix it.

I see your machine currently has firmware 1.42 and there's a 1.43 available from Lenovo's support site, but the BIOS changelog doesn't mention such a fix. It may be worth upgrading the BIOS to 1.43 anyway.

The kernel sources mention that when the BIOS has such a bug and cannot be updated to a fixed version, the issue can only be worked around by increasing the latencies. It may still be worth checking that the watermarks are programmed to the right values.

But you report that fixing up the latencies above 12 didn't help, so we might not be able to do anything about it other than hiding the message.
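
For reference, the check behind that log line can be reproduced by hand. The sketch below decodes the value from the log in Python. The 6-bit mask and the expected value 0xc for the lowest field match the "0x7 instead of 0xc" reading above; treating the other three bytes as further 6-bit fields is an assumption about the register layout, and the constant and function names are made up for this example.

```python
WM0_MASK = 0x3F      # assumed width of the lowest field
WM0_EXPECTED = 0xC   # value the driver is said to expect


def decode_sskpd(value):
    """Split the register into four 6-bit fields, lowest byte first."""
    return [(value >> shift) & 0x3F for shift in (0, 8, 16, 24)]


value = 0x16040307  # from "Wrong MCH_SSKPD value: 0x16040307"
wm0 = value & WM0_MASK

print(f"lowest field: {wm0:#x}, expected {WM0_EXPECTED:#x}")
print("all fields:", [hex(field) for field in decode_sskpd(value)])
```

The lowest field comes out as 0x7, which is the mismatch the debug message complains about.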
Comment 21 Tomeu Vizoso 2017-03-14 12:08:49 UTC
(In reply to Chris Bainbridge from comment #6)
> (In reply to Jani Saarinen from comment #5)
> > Is this issue still valid?
> 
> Yes, the messages still appear at boot on 4.9.0.

Hi, could you please try again with v4.11-rc1 while booting with ignore_loglevel drm.debug=0x1f and attach the whole dmesg? Thanks!
Comment 22 Jari Tahvanainen 2017-03-28 07:27:07 UTC
Fabian - can you please help out Tomeu? See comment 21, and use the newest kernel if possible.
Comment 23 Fabian Henze 2017-04-02 16:21:12 UTC
Created attachment 130647
kernel 4.11-rc4: dmesg on sandy bridge with drm.debug=0x1f
Comment 24 Fabian Henze 2017-04-02 16:22:04 UTC
(In reply to Tomeu Vizoso from comment #21)
> Hi, could you please try again with v4.11-rc1 while booting with
> ignore_loglevel drm.debug=0x1f and attach the whole dmesg? Thanks!

Sorry for the long delay. I attached the logfile and will try to respond faster next time ;-)
Comment 25 Ville Syrjala 2017-04-12 18:38:27 UTC
Can you re-test with the latest development kernel?
git://anongit.freedesktop.org/drm-tip drm-tip

I'm thinking that commit a5509abda48e ("drm/i915: Fix legacy cursor vs. watermarks for ILK-BDW") in particular may be helpful.
Comment 26 Fabian Henze 2017-04-19 21:57:12 UTC
Created attachment 130929
drm-tip: dmesg on sandy bridge with drm.debug=0x1f

Still present in drm-tip 1b757084743a.
Comment 27 Ricardo 2017-05-09 16:30:03 UTC
Adding the ReadyForDev tag to the "Whiteboard" field.
The bug is still active.
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 28 Jani Saarinen 2017-06-08 07:00:36 UTC
Reporters, can you verify again whether the latest drm-tip helps?
Comment 29 Jani Saarinen 2017-07-31 13:07:33 UTC
Reporters, ping. Closing soon if there is no response.
Is this still valid on the latest drm-tip?
Comment 30 Chris Bainbridge 2017-08-13 11:52:56 UTC
Created attachment 133471
dmesg on ivy bridge with drm.debug=0x1f drm-intel-fixes-2017-08-09-1-801-gd063a48456d2

Bug still exists in latest drm nightly (drm-intel-fixes-2017-08-09-1-801-gd063a48456d2) on IVB Macbook.
Comment 31 François Guerraz 2017-09-15 11:13:15 UTC
It was reported that some FIFO underrun problems can be caused by enabling CONFIG_INTEL_IOMMU_DEFAULT_ON=y (see https://bugs.archlinux.org/task/55629#comment161154).
I notice that Fabian's configuration has this flag set...
Could that really be the problem?
Comment 32 Fabian Henze 2017-09-16 12:20:57 UTC
(In reply to François Guerraz from comment #31)
> It was reported that some FIFO underrun problems can be caused by enabling
> CONFIG_INTEL_IOMMU_DEFAULT_ON=y (see
> https://bugs.archlinux.org/task/55629#comment161154).
> I notice that Fabian's configuration has this flag set...
> Could that really be the problem?

Setting intel_iommu=igfx_off indeed fixed it.
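
For anyone who wants to confirm that the workaround is actually active on a running system, a minimal Python sketch follows. It only inspects the kernel command line; the helper names are made up for this example, and treating intel_iommu=off as equivalent for the graphics device is an assumption.

```python
def intel_iommu_options(cmdline):
    """Return the comma-separated options of every intel_iommu= parameter."""
    options = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "intel_iommu" and sep:
            options.extend(value.split(","))
    return options


def igfx_iommu_disabled(cmdline):
    """True if the command line turns the IOMMU off for the graphics device."""
    options = intel_iommu_options(cmdline)
    return "igfx_off" in options or "off" in options


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""  # not on Linux, or /proc is unavailable
    print("graphics IOMMU disabled:", igfx_iommu_disabled(current))
```

Note that this does not tell you whether CONFIG_INTEL_IOMMU_DEFAULT_ON is set in the kernel configuration; it only shows whether the boot parameter is present.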
Comment 33 Elizabeth 2017-10-06 20:04:36 UTC
*** Bug 100219 has been marked as a duplicate of this bug. ***
Comment 34 Elizabeth 2018-02-12 18:50:53 UTC
*** Bug 105060 has been marked as a duplicate of this bug. ***
Comment 35 Jani Saarinen 2018-03-29 07:10:47 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), please do so to see whether the issue is still present there; if you can no longer see it, please comment on the bug.
Comment 36 Jani Saarinen 2018-04-22 15:51:13 UTC
Closing. Please reopen if this still occurs.

