97965 – [BSW] External HDMI monitor suddenly shows solid color when playing Youtube video at 1080p [fifo underrun]


Status:	CLOSED INVALID

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	ykku16dev2
QA Contact:	maria guadalupe

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2016-09-28 19:49 UTC by ykku16dev2
Modified:	2017-08-11 20:30 UTC
CC List:	3 users

See Also:
i915 platform:	BSW/CHT
i915 features:	display/HDMI

Attachments
dmesg captured with drm.debug=0x1e log_buf_len=1M (1010.85 KB, text/plain) 2016-09-28 19:49 UTC, ykku16dev2
Contains all of dmesg starting from boot. (3.01 MB, text/plain) 2016-09-28 19:52 UTC, ykku16dev2
Workaround using PM QoS (2.26 KB, patch) 2016-09-29 19:51 UTC, ykku16dev2
dmesg captured on BSW RVP (1000.15 KB, text/plain) 2016-09-30 17:20 UTC, ykku16dev2
Contains all of dmesg captured on BSW RVP (7.31 MB, text/plain) 2016-09-30 17:21 UTC, ykku16dev2
Output of intel_reg dump --all captured on BSW RVP (95.95 KB, text/plain) 2016-09-30 17:22 UTC, ykku16dev2
Output of /proc/cpuinfo for BSW RVP (3.66 KB, text/plain) 2016-09-30 17:36 UTC, ykku16dev2
Output of dmidecode for BSW RVP (14.78 KB, text/plain) 2016-09-30 17:37 UTC, ykku16dev2
[PATCH] drm/i915: Set DDL to 0 (1.21 KB, patch) 2016-10-04 14:44 UTC, Ville Syrjala
Output of intel_watermark captured on BSW RVP after problem occurred (1.42 KB, text/plain) 2016-10-04 22:26 UTC, ykku16dev2
Output of intel_watermark on BSW RVP with DDL set to 0 patch (1.42 KB, text/plain) 2016-10-04 23:25 UTC, ykku16dev2
Experimental patch for setting DDL to 0 and WM latency to 45us (1.69 KB, patch) 2016-10-05 02:27 UTC, ykku16dev2
Output of intel_watermark with DDL=0 and WM latency=45us (1.42 KB, text/plain) 2016-10-05 02:28 UTC, ykku16dev2
Experimental patch for DDL=0, latency=45us, and cursor WM=63 (2.26 KB, patch) 2016-10-06 04:20 UTC, ykku16dev2
Output of intel_watermark for DDL=0, latency=45us, and cursor WM=63 (1.42 KB, text/plain) 2016-10-06 04:21 UTC, ykku16dev2
Output of intel_watermark in dock mode (1.42 KB, text/plain) 2016-10-07 16:24 UTC, ykku16dev2
Output of intel_watermark with PM2 limit (1.42 KB, text/plain) 2016-10-07 20:29 UTC, ykku16dev2
Output of intel_watermark with trickle feed (1.42 KB, text/plain) 2016-10-07 21:36 UTC, ykku16dev2
Output of intel_watermark using pipe B (1.42 KB, text/plain) 2016-10-10 01:13 UTC, ykku16dev2
Output of intel_watermark using pipe B/port B to drive HDMI monitor (1.42 KB, text/plain) 2016-10-27 06:35 UTC, ykku16dev2
Output of intel_reg dump using pipe B/port B to drive HDMI monitor (96.16 KB, text/plain) 2016-10-27 06:36 UTC, ykku16dev2
Output of intel_watermark using only port B/pipe B on BSW RVP (1.42 KB, text/plain) 2016-10-31 07:51 UTC, ykku16dev2
Output of intel_reg dump --all using only port B/pipe B on BSW RVP (96.16 KB, text/plain) 2016-10-31 07:52 UTC, ykku16dev2

Description ykku16dev2 2016-09-28 19:49:50 UTC

Created attachment 126837 [details]
dmesg captured with drm.debug=0x1e log_buf_len=1M

Steps to reproduce the issue:
=============================
1.  Boot BSW-based machine
2.  Plug in external HDMI monitor at 1920x1080 resolution in extended mode
3.  Play Youtube video at 1080p and at full screen on the HDMI monitor (I did not route audio to the HDMI monitor.)

The problem usually happens within 30 to 40 minutes.  Video playback will be fine most of the time.  However, you will see intermittent flicker or tearing.  The HDMI monitor will suddenly display a solid color (usually dark purple).  The only way I know to recover from this is to reboot the machine.

Reproducibility:
================
The rate of occurrence is roughly 80%.

Information about my system:
============================
system architecture: x86_64
kernel version: 4.8.0-rc8-02558-g6fa2afd-dirty (built from drm-intel-nightly on 9/28/2016)

Comment 1 ykku16dev2 2016-09-28 19:52:08 UTC

Created attachment 126838 [details]
Contains all of dmesg starting from boot.

Comment 2 ykku16dev2 2016-09-28 20:11:23 UTC

I can make the problem go away if I do one of the following:
1.  Limit core C-state to C2.
2.  Limit package C-state to C2.

Comment 3 ykku16dev2 2016-09-29 19:51:45 UTC

Created attachment 126889 [details] [review]
Workaround using PM QoS

Would it be reasonable to use this patch as a workaround?

For HDMI use cases, this patch reduces battery life by about an hour, based on experiments on a specific device.  At the same time, for these tethered use cases, would it be reasonable to assume that the user would be more willing to plug the device into an outlet?
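
For anyone who wants to try a similar latency limit without rebuilding the kernel, a PM QoS CPU latency request can also be made from user space by writing a 32-bit value to /dev/cpu_dma_latency and keeping the file open. The following is only a minimal sketch of that user-space route, not the contents of the attached patch. The helper name and the path parameter are made up for illustration, and the latency value that would exclude states deeper than C2 is platform-specific and is not given in this report.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Ask the kernel's PM QoS layer to keep CPU wakeup latency at or below
 * max_latency_us. The request stays active only while the returned file
 * descriptor remains open. Returns the descriptor, or -1 on failure.
 *
 * path is normally "/dev/cpu_dma_latency"; it is a parameter (an
 * assumption of this sketch) so the helper can be exercised without
 * touching the real device node.
 */
int hold_cpu_latency_limit(const char *path, int32_t max_latency_us)
{
	int fd;

	assert(path);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* The device expects the limit as a raw 32-bit integer. */
	if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
	    (ssize_t)sizeof(max_latency_us)) {
		close(fd);
		return -1;
	}

	return fd; /* closing the descriptor drops the request */
}
```

A caller would keep the returned descriptor open for as long as the HDMI output is active and close it afterwards. The per-state exit latencies listed under /sys/devices/system/cpu/cpu0/cpuidle/state*/latency are one place to look when choosing the limit.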

Comment 4 ykku16dev2 2016-09-30 17:20:33 UTC

Created attachment 126911 [details]
dmesg captured on BSW RVP

Comment 5 ykku16dev2 2016-09-30 17:21:26 UTC

Created attachment 126912 [details]
Contains all of dmesg captured on BSW RVP

Comment 6 ykku16dev2 2016-09-30 17:22:17 UTC

Created attachment 126913 [details]
Output of intel_reg dump --all captured on BSW RVP

Comment 7 ykku16dev2 2016-09-30 17:36:24 UTC

Created attachment 126914 [details]
Output of /proc/cpuinfo for BSW RVP

Comment 8 ykku16dev2 2016-09-30 17:37:08 UTC

Created attachment 126915 [details]
Output of dmidecode for BSW RVP

Comment 9 ykku16dev2 2016-09-30 17:40:56 UTC

I was able to reproduce the problem on a BSW RVP using drm-intel-nightly kernel built on 9/28/2016.  The dmesg, /var/log/messages, and "intel_reg dump --all" logs are here:
https://bugs.freedesktop.org/attachment.cgi?id=126911
https://bugs.freedesktop.org/attachment.cgi?id=126912
https://bugs.freedesktop.org/attachment.cgi?id=126913

The output of /proc/cpuinfo is here:
https://bugs.freedesktop.org/attachment.cgi?id=126914

The output of dmidecode is here:
https://bugs.freedesktop.org/attachment.cgi?id=126915

The output of "uname -a" is:
Linux localhost 4.8.0-rc8-02558-g6fa2afd-dirty #4 SMP PREEMPT Fri Sep 30 08:59:04 PDT 2016 x86_64 Genuine Intel(R) CPU @ 1.52GHz GenuineIntel GNU/Linux

Comment 10 Ville Syrjala 2016-10-04 14:37:17 UTC

Hmm. So it seems to die while in steady state, or at least no plane changes visible in the logs around the underruns. Hard to be 100% sure though as I'm not sure we log every plane change, and also underruns don't actually generate an interrupt, so we might detect them after the fact.

If we assume that no planes really change when the underrun happens, then I guess we're just too late in fetching the data. It's a little troubling that we are hitting that even with PM2 watermarks, so PM5/DDR DVFS (which are the usual latency-causing suspects) supposedly aren't even enabled.

Can you grab the output of intel_watermark from intel-gpu-tools, just after hitting the underrun? I want to double check that we are in fact programming the watermarks correctly.

Comment 11 Ville Syrjala 2016-10-04 14:44:55 UTC

Created attachment 126993 [details] [review]
[PATCH] drm/i915: Set DDL to 0

I guess we could try to tighten the display engine's deadlines a bit. This should force the deadline to be 0 always, so in theory the system agent shouldn't be allowed to delay things at all.

If that doesn't help, you could also try to play around with the memory latency values. Based on your logs I think you'd have to go up to 15 usec to make any change to the watermarks. So something like this:
echo '15 0 0' > /sys/kernel/debug/dri/0/i915_pri_wm_latency
and then force a modeset on all pipes (eg. xset dpms force off; xset dpms force on).
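
To illustrate why a latency below roughly 15 usec would not move the watermarks, here is a rough sketch of a line-based watermark calculation, modeled on my understanding of how the driver computes VLV/CHV watermarks at this point in time. The function name is made up, and the mode timings used in the note below (148.5 MHz pixel clock, htotal of 2200, 4 bytes per pixel) are assumed standard 1080p60 values, not numbers taken from the attached logs.

```c
#include <assert.h>

/*
 * Line-based watermark estimate, in 64-byte cachelines.
 *   pixel_rate_khz : pixel clock in kHz
 *   htotal         : total pixels per scanline, including blanking
 *   width          : visible plane width in pixels
 *   cpp            : bytes per pixel
 *   latency_us     : assumed memory latency in microseconds
 */
unsigned int wm_lines_sketch(unsigned int pixel_rate_khz, unsigned int htotal,
			     unsigned int width, unsigned int cpp,
			     unsigned int latency_us)
{
	unsigned int lines, bytes;

	assert(htotal != 0);

	/* Whole scanlines that elapse during the latency window. */
	lines = (latency_us * pixel_rate_khz) / (htotal * 1000);

	/* Buffer that many lines plus one, then round up to cachelines. */
	bytes = (lines + 1) * width * cpp;
	return (bytes + 63) / 64;
}
```

With the assumed timings, one scanline takes about 14.8 usec, so any latency from 0 to 14 usec rounds down to zero whole lines and gives 120 cachelines, while 15 usec crosses into the next line and gives 240.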

Comment 12 ykku16dev2 2016-10-04 22:26:36 UTC

Created attachment 127007 [details]
Output of intel_watermark captured on BSW RVP after problem occurred

(In reply to comment #10)

Yes, the problem occurred while in steady state, at least from the user's perspective.  I started the video playback, selected 1080p resolution, selected full screen mode, and then just let it sit there.  I did not touch the keyboard or the mouse while playback continued.

As you requested, here's the output of intel_watermark captured on the BSW RVP after the problem occurred.  At the beginning of this particular test, I had moved the cursor to the primary display and left it there untouched for the duration of the test.

Comment 13 ykku16dev2 2016-10-04 23:25:36 UTC

Created attachment 127010 [details]
Output of intel_watermark on BSW RVP with DDL set to 0 patch

(In reply to comment #11, part 1)

I applied your patch 0001-drm-i915-Set-DDL-to-0.patch to drm-intel-nightly (sync'ed 9/28/2016) and re-tested.  The patch did not help.  I attached the output of intel_watermark to show that the patch took effect.  The output was captured after the problem occurred.

Comment 14 ykku16dev2 2016-10-05 02:27:24 UTC

Created attachment 127012 [details] [review]
Experimental patch for setting DDL to 0 and WM latency to 45us

Comment 15 ykku16dev2 2016-10-05 02:28:50 UTC

Created attachment 127013 [details]
Output of intel_watermark with DDL=0 and WM latency=45us

Comment 16 ykku16dev2 2016-10-05 02:51:18 UTC

(In reply to comment #11, part 2)

I experimented with latencies by changing the values defined inside vlv_setup_wm_latency().  Here's my patch (which also includes your DDL=0 change):
https://bugs.freedesktop.org/attachment.cgi?id=127012

In this patch, I bumped up all of the values to 45us so that the watermarks are very close to the maximum.  In theory, we should be pretty busy trying to keep the FIFO close to being full.  Yet, I was still able to reproduce the problem.

The output of intel_watermark for this experiment is here:
https://bugs.freedesktop.org/attachment.cgi?id=127013

Comment 17 Ville Syrjala 2016-10-05 07:05:51 UTC

Hmm. Even though the cursor probably wasn't a factor, let's try and make sure
by setting a tight watermark for it even when it's disabled:

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 7f1748a1e614..92611d6aaea3 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -965,6 +965,9 @@ static uint16_t vlv_compute_wm_level(struct intel_plane *plane,
        if (dev_priv->wm.pri_latency[level] == 0)
                return USHRT_MAX;
 
+       if (plane->base.type == DRM_PLANE_TYPE_CURSOR)
+               return 63;
+
        if (!state->base.visible)
                return 0;
 

Another interesting thing we could check is whether the problem happens when the primary display is using pipe B as opposed to pipe A. The hardware has some warts where the pipes have some invalid linkages between them, so by using another pipe for the primary display we might be able to avoid some of them.
So something like 'xrandr --output eDP1 --crtc 1' should do it.

Comment 18 ykku16dev2 2016-10-06 04:20:03 UTC

Created attachment 127043 [details] [review]
Experimental patch for DDL=0, latency=45us, and cursor WM=63

Comment 19 ykku16dev2 2016-10-06 04:21:49 UTC

Created attachment 127044 [details]
Output of intel_watermark for DDL=0, latency=45us, and cursor WM=63

Comment 20 ykku16dev2 2016-10-06 04:32:44 UTC

(In reply to comment #17, part 1)

I tested the cursor WM change that you suggested, but the problem is still reproducible.  The patch that I tested is here:
https://bugs.freedesktop.org/attachment.cgi?id=127043

Note that this patch contains one additional change needed for your cursor WM change to take effect.  This patch also contains the DDL=0 and latency=45us changes from the previous experiment.

The output of intel_watermark for this experiment is here:
https://bugs.freedesktop.org/attachment.cgi?id=127044

Comment 21 ykku16dev2 2016-10-06 05:33:18 UTC

(In reply to comment #17, part 2)

Regarding the pipe B experiment, could you possibly suggest a change in kernel-space that will make the switch from pipe A to pipe B?  My filesystem currently doesn't have xrandr, and I'm having trouble building it.

Comment 22 Ville Syrjala 2016-10-06 07:50:08 UTC

(In reply to Yu Kang Ku from comment #21)
> (In reply to comment #17, part 2)
> 
> Regarding the pipe B experiment, could you possibly suggest a change in
> kernel-space that will make the switch from pipe A to pipe B?  My filesystem
> currently doesn't have xrandr, and I'm having trouble building it.

I don't think there's a trivial way to do it in the kernel, since it would mean we'd have to somehow swap the crtcs without affecting their indexes on the list of crtcs (since I think a bunch of stuff may assume that index==pipe), or we'd have to find out all the places that have such assumptions and deal with them.

Comment 23 ykku16dev2 2016-10-07 16:24:40 UTC

Created attachment 127107 [details]
Output of intel_watermark in dock mode

Comment 24 ykku16dev2 2016-10-07 16:43:24 UTC

For the pipe B experiment, I'm running into some hurdles that will take some more time to fix.

Meanwhile, I did the following experiment which might indicate that the problem is isolated to pipe C.  I closed the lid so that the primary display would go away while the external HDMI display would continue with the video playback.  This is basically dock mode.  Yet, the problem is still reproducible.  The output of intel_watermark is here:
https://bugs.freedesktop.org/attachment.cgi?id=127107

This experiment was done using drm-intel-nightly with all of the suggestions you have recommended so far (DDL=0, latency=45us, and cursor WM=63).  However, the experiment was done on a production device on which dock mode was functional.  I was not able to "close the lid" on the BSW RVP.  I found the switch on the BSW RVP for "closing the lid", but dock mode wasn't working for some reason.

What do you think of this dock-mode experiment?  Do you think it's still worthwhile to continue with the pipe-B experiment?

Comment 25 Ville Syrjala 2016-10-07 16:56:31 UTC

(In reply to Yu Kang Ku from comment #24)
> For the pipe B experiment, I'm running into some hurdles that will take some
> more time to fix.
> 
> Meanwhile, I did the following experiment which might indicate that the
> problem is isolated to pipe C.  I closed the lid so that the primary display
> would go away while the external HDMI display would continue with the video
> playback.  This is basically dock mode.  Yet, the problem is still
> reproducible.  The output of intel_watermark is here:
> https://bugs.freedesktop.org/attachment.cgi?id=127107

That's a bit too different, since it has gone and enabled PM5 and DDR DVFS. So you should limit it to PM2 to make sure we're doing an apples-to-apples comparison.
Eg.:
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -946,7 +946,7 @@ static void vlv_setup_wm_latency(struct drm_device *dev)
 
        dev_priv->wm.max_level = VLV_WM_LEVEL_PM2;
 
-       if (IS_CHERRYVIEW(dev_priv)) {
+       if (0 && IS_CHERRYVIEW(dev_priv)) {
                dev_priv->wm.pri_latency[VLV_WM_LEVEL_PM5] = 12;
                dev_priv->wm.pri_latency[VLV_WM_LEVEL_DDR_DVFS] = 33;


Another thing you could try is enable trickle feed. That shouldn't change much though as it's essentially just always forcing the watermarks to 8 or something like that. But worth a shot anyway I suppose:

--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -1070,7 +1070,7 @@ static void vlv_init_display_clock_gating(struct drm_i915_private *dev_priv)
        /*
         * Disable trickle feed and enable pnd deadline calculation
         */
-       I915_WRITE(MI_ARB_VLV, MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE);
+       I915_WRITE(MI_ARB_VLV, 0);
        I915_WRITE(CBR1_VLV, 0);
 
        WARN_ON(dev_priv->rawclk_freq == 0);

Comment 26 ykku16dev2 2016-10-07 20:29:55 UTC

Created attachment 127126 [details]
Output of intel_watermark with PM2 limit

(In reply to Ville Syrjala from comment #25)
> That's a bit too different since it's gone and enabled PM5 and DDR DVFS.
> So you should limit it to PM2 to make sure we're doing apples to apples
> comparison.
> Eg.:
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -946,7 +946,7 @@ static void vlv_setup_wm_latency(struct drm_device *dev)
>  
>         dev_priv->wm.max_level = VLV_WM_LEVEL_PM2;
>  
> -       if (IS_CHERRYVIEW(dev_priv)) {
> +       if (0 && IS_CHERRYVIEW(dev_priv)) {
>                 dev_priv->wm.pri_latency[VLV_WM_LEVEL_PM5] = 12;
>                 dev_priv->wm.pri_latency[VLV_WM_LEVEL_DDR_DVFS] = 33;
> 
> 

I applied this change and repeated the dock-mode experiment.  The problem is still reproducible.  Attached is the output of intel_watermark showing that your change took effect.

Comment 27 ykku16dev2 2016-10-07 21:36:22 UTC

Created attachment 127130 [details]
Output of intel_watermark with trickle feed

(In reply to Ville Syrjala from comment #25, part 2)

> Another thing you could try is enable trickle feed. That shouldn't change
> much though as it's essentially just always forcing the watermarks to 8 or
> something like that. But worth a shot anyway I suppose:
> 
> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> @@ -1070,7 +1070,7 @@ static void vlv_init_display_clock_gating(struct drm_i915_private *dev_priv)
>         /*
>          * Disable trickle feed and enable pnd deadline calculation
>          */
> -       I915_WRITE(MI_ARB_VLV, MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE);
> +       I915_WRITE(MI_ARB_VLV, 0);
>         I915_WRITE(CBR1_VLV, 0);
>  
>         WARN_ON(dev_priv->rawclk_freq == 0);

I repeated the dock-mode experiment, adding this trickle-feed change along with the PM2-limit change.  The trickle-feed change did not help.  The problem is still reproducible.  Attached is the output of intel_watermark showing that the trickle change took effect.

Comment 28 ykku16dev2 2016-10-10 01:13:35 UTC

Created attachment 127159 [details]
Output of intel_watermark using pipe B

Comment 29 ykku16dev2 2016-10-10 01:33:50 UTC

(In reply to Yu Kang Ku from comment #24)
> For the pipe B experiment, I'm running into some hurdles that will take some
> more time to fix.

I was finally able to proceed with the pipe B experiment.  Unfortunately, the problem is still reproducible.  The output of intel_watermark is here: https://bugs.freedesktop.org/attachment.cgi?id=127159.

This experiment was done on the BSW RVP.  The build that I tested included the changes for DDL=0, latency=45us, and cursor WM=63.

Comment 30 ykku16dev2 2016-10-14 16:31:59 UTC

In intel_display.c, there is one difference between valleyview_set_cdclk() and cherryview_set_cdclk() that seems rather interesting.  The following snippet is in valleyview_set_cdclk() but NOT in cherryview_set_cdclk():

        /* adjust self-refresh exit latency value */
        val = vlv_bunit_read(dev_priv, BUNIT_REG_BISOC);
        val &= ~0x7f;

        /*
         * For high bandwidth configs, we set a higher latency in the bunit
         * so that the core display fetch happens in time to avoid underruns.
         */
        if (cdclk == 400000)
                val |= 4500 / 250; /* 4.5 usec */
        else
                val |= 3000 / 250; /* 3.0 usec */
        vlv_bunit_write(dev_priv, BUNIT_REG_BISOC, val);


Is this snippet applicable to BSW?

Comment 31 Ville Syrjala 2016-10-14 16:56:09 UTC

(In reply to Yu Kang Ku from comment #30)
> In intel_display.c, there is one difference between valleyview_set_cdclk()
> and cherryview_set_cdclk() that seems rather interesting.  The following
> snippet is in valleyview_set_cdclk() but NOT in cherryview_set_cdclk():
> 
>         /* adjust self-refresh exit latency value */
>         val = vlv_bunit_read(dev_priv, BUNIT_REG_BISOC);
>         val &= ~0x7f;
> 
>         /*
>          * For high bandwidth configs, we set a higher latency in the bunit
>          * so that the core display fetch happens in time to avoid underruns.
>          */
>         if (cdclk == 400000)
>                 val |= 4500 / 250; /* 4.5 usec */
>         else
>                 val |= 3000 / 250; /* 3.0 usec */
>         vlv_bunit_write(dev_priv, BUNIT_REG_BISOC, val);
> 
> 
> Is this snippet applicable to BSW?

The register does exist there as well. It might be something to play around with. You can do so with the intel_reg tool easily, eg. "intel_reg read bunit:0x11",
"intel_reg write bunit:0x11 <value>".

On my BSW the value is  bunit:0x00000011 (0x03:0x00000011): 0x05804816
I didn't check whether the Punit or someone else adjusts that value at runtime.

This is what the register contains
EXIT_SELF_REFRESH_LATENCY       6:0
Exit Self Refresh Latency: Required latency to Exit Self Refresh in 250ns increments. Default is 3uSec. PnP: This depends on SR exit+ data return for an urgent ISOC request when dram is in SR. But SR exit latency depends on PM opcode, setting it to h'24 i.e. 9us exit latency

RESERVED_2                      7:7

SCHEDULER_LATENCY               11:8
Scheduler Latency: Request latency that is considered as a Hi-Priority Request for ISOC requests. Value programmed has 250ns resolution. Default is 2uSec.

ENTER_SELF_REFRESH_DLY          17:12
Enter Self Refresh Delay: Number of 250ns pulses the Bunit waits before entering Self Refresh. Note: should not be set to less than 2 when dynamic SR is enabled, otherwise the system may become unresponsive.

SR_EXIT_SYNC_EN                 18:18
Setting this bit prevents the Bunit from entering SR if BRAM is full and high-priority requests are blocked at badmit.

RESERVED_1                      21:19

ENTER_SELF_REFRESH_THRSH        31:22
Enter Self Refresh Threshold: Required request latency to enter self refresh. If the Bunit receives ISOC requests that have a required latency less than this value the Bunit will keep the Dunit out of Self Refresh.

Actually bits 31:22 look fairly interesting here. But perhaps start experimenting with bits 6:0 a bit instead, since that's what we do on VLV as well.
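
As a worked example of the layout above, the following sketch splits a raw register value into the listed fields. The struct and function names are invented for illustration. Only the bit positions and the 250 ns unit come from the description above. The unit of bits 31:22 is not stated there, so that field is left as a raw number.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi:lo (inclusive) from a 32-bit register value. */
static uint32_t reg_field(uint32_t val, unsigned int hi, unsigned int lo)
{
	assert(lo <= hi && hi < 32);
	return (val >> lo) & (uint32_t)((UINT64_C(1) << (hi - lo + 1)) - 1);
}

/* Hypothetical container for the decoded fields. */
struct bisoc_fields {
	uint32_t exit_sr_latency_ns;	/* bits 6:0, 250 ns units */
	uint32_t scheduler_latency_ns;	/* bits 11:8, 250 ns units */
	uint32_t enter_sr_delay_ns;	/* bits 17:12, 250 ns pulses */
	uint32_t sr_exit_sync_en;	/* bit 18 */
	uint32_t enter_sr_thresh;	/* bits 31:22, raw (unit not stated) */
};

struct bisoc_fields bisoc_decode(uint32_t val)
{
	struct bisoc_fields f = {
		.exit_sr_latency_ns   = reg_field(val, 6, 0) * 250,
		.scheduler_latency_ns = reg_field(val, 11, 8) * 250,
		.enter_sr_delay_ns    = reg_field(val, 17, 12) * 250,
		.sr_exit_sync_en      = reg_field(val, 18, 18),
		.enter_sr_thresh      = reg_field(val, 31, 22),
	};

	return f;
}
```

Applied to the value quoted above, 0x05804816, this gives an exit self-refresh latency of 5500 ns (0x16 units), a scheduler latency of 2000 ns (matching the documented 2 usec default), an enter self-refresh delay of 1000 ns, SR_EXIT_SYNC_EN clear, and a raw threshold of 22.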

Comment 32 ykku16dev2 2016-10-15 06:43:52 UTC

(In reply to Ville Syrjala from comment #31)
> Actually bits 31:22 look fairly interesting here. But perhaps start
> experimenting with bits 6:0 a bit instead, since that's what we do on VLV as
> well.

I experimented with bits 6:0, but all of my experiments failed.  The following are the 5 different values I tried:
1.  val |= 3000 / 250; /* This is what VLV is using */
2.  val |= 0x16;       /* This is what you see on your BSW */
3.  val &= ~0x7f;      /* This is the minimum value just for sanity */
4.  val |= 0x1;        /* Another sanity check */
5.  val |= 0x7f;       /* This is the maximum value */

So, I've covered the normal values, as well as the extreme ones.  After every failure, I used intel_reg to verify that the B-unit register contained the correct value.
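
For reference, the read-modify-write pattern behind each of these experiments can be sketched as below: clear bits 6:0 first, then OR in the new value, as the VLV snippet from comment 30 does. The macro and function names are hypothetical, and the conversion assumes the 250 ns unit quoted in comment 31.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 6:0 of the Bunit register; the macro name is hypothetical. */
#define BISOC_EXIT_SR_LATENCY_MASK 0x7fu

/* Replace bits 6:0 with 'units' (250 ns each), leaving other bits alone. */
uint32_t bisoc_set_exit_sr_latency(uint32_t val, uint32_t units)
{
	assert(units <= BISOC_EXIT_SR_LATENCY_MASK);

	val &= ~BISOC_EXIT_SR_LATENCY_MASK;
	val |= units;
	return val;
}

/* Convert the programmed field back to nanoseconds. */
uint32_t bisoc_exit_sr_latency_ns(uint32_t val)
{
	return (val & BISOC_EXIT_SR_LATENCY_MASK) * 250;
}
```

Under that unit, the five values tried correspond to 3000 ns, 5500 ns, 0 ns, 250 ns, and 31750 ns respectively.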

I then came across a BIOS setting on the BSW RVP that can disable dynamic self refresh altogether.  So, I disabled it, but I can still reproduce the problem.  Since it looks like self-refresh isn't a factor here, I did not bother with bits 31:22.

Comment 33 Ville Syrjala 2016-10-26 17:19:45 UTC

So what might be interesting is checking if we can reproduce the issue when driving HDMI out of port B with pipe A or B. Thus far you've just driven HDMI with port D/pipe C, no?

Assuming a standard RVP setup you would just need any old DP++ DP->HDMI dongle/cable and hook it up to the DP port instead of using the HDMI port on board.

Comment 34 ykku16dev2 2016-10-27 06:35:28 UTC

Created attachment 127557 [details]
Output of intel_watermark using pipe B/port B to drive HDMI monitor

Comment 35 ykku16dev2 2016-10-27 06:36:56 UTC

Created attachment 127558 [details]
Output of intel_reg dump using pipe B/port B to drive HDMI monitor

Comment 36 ykku16dev2 2016-10-27 06:55:53 UTC

(In reply to Ville Syrjala from comment #33)
> So what might be interesting is checking if we can reproduce the issue when
> driving HDMI out of port B with pipe A or B. Thus far you've just driven
> HDMI with port D/pipe C, no?

That's correct.  So far, we've been using port D/pipe C exclusively.

> Assuming a standard RVP setup you would just need any old DP++ DP->HDMI
> dongle/cable and hook it up to the DP port instead of using the HDMI port on
> board.

I ran the new experiment exactly as you described, using a DP->HDMI cable.  Unfortunately, the problem is still reproducible.  The output of intel_watermark is here: https://bugs.freedesktop.org/attachment.cgi?id=127557.  The output of "intel_reg dump --all" is here: https://bugs.freedesktop.org/attachment.cgi?id=127558.

These captured outputs will show that the new experiment is using port B/pipe B.

Just for the record, the experiment was done on a BSW RVP using drm-intel-nightly without any of the changes that you've previously recommended.

Comment 37 Ville Syrjala 2016-10-27 10:17:43 UTC

(In reply to Yu Kang Ku from comment #36)
> (In reply to Ville Syrjala from comment #33)
> > So what might be interesting is checking if we can reproduce the issue when
> > driving HDMI out of port B with pipe A or B. Thus far you've just driven
> > HDMI with port D/pipe C, no?
> 
> That's correct.  So far, we've been using port D/pipe C exclusively.
> 
> > Assuming a standard RVP setup you would just need any old DP++ DP->HDMI
> > dongle/cable and hook it up to the DP port instead of using the HDMI port on
> > board.
> 
> I ran the new experiment exactly as you described, using a DP->HDMI cable. 
> Unfortunately, the problem is still reproducible.  The output of
> intel_watermark is here:
> https://bugs.freedesktop.org/attachment.cgi?id=127557.  The output of
> "intel_reg dump --all" is here:
> https://bugs.freedesktop.org/attachment.cgi?id=127558.
> 
> These captured outputs will show that the new experiment is using port
> B/pipe B.
> 
> Just for the record, the experiment was done on a BSW RVP using
> drm-intel-nightly without any of the changes that you've previously
> recommended.

Can we repeat the failure with, say, only pipe B active + DDR DVFS disabled + PM5 disabled + cxsr/maxfifo disabled + trickle feed enabled?

Comment 38 ykku16dev2 2016-10-27 16:46:29 UTC

(In reply to Ville Syrjala from comment #37)
> Can we repeat the failure with, say, only pipe B active + DDR DVFS disabled
> + PM5 disabled + cxsr/maxfifo disabled + trickle feed enabled?

Yes, I will work on it.  However, I have a problem with the BSW RVP in terms of "closing the lid" to disable pipe A.  The switch on the BSW RVP doesn't work for me.  Do you happen to know what needs to be done, either hardware or software, to make it work?  Or do you have any recommendations for disabling pipe A using just software?  Thanks.

Comment 39 Ville Syrjala 2016-10-27 16:50:10 UTC

(In reply to Yu Kang Ku from comment #38)
> (In reply to Ville Syrjala from comment #37)
> > Can we repeat the failure with, say, only pipe B active + DDR DVFS disabled
> > + PM5 disabled + cxsr/maxfifo disabled + trickle feed enabled?
> 
> Yes, I will work on it.  However, I have a problem with the BSW RVP in terms
> of "closing the lid" to disable pipe A.  The switch on the BSW RVP doesn't
> work for me.  Do you happen to know what needs to be done, either hardware
> or software, to make it work?  Or do you have any recommendations for
> disabling pipe A using just software?  Thanks.

I did recommend xrandr already. That's about the only easy way to do it, assuming you're running X in the first place.

Comment 40 ykku16dev2 2016-10-31 07:51:42 UTC

Created attachment 127636 [details]
Output of intel_watermark using only port B/pipe B on BSW RVP

Comment 41 ykku16dev2 2016-10-31 07:52:49 UTC

Created attachment 127637 [details]
Output of intel_reg dump --all using only port B/pipe B on BSW RVP

Comment 42 ykku16dev2 2016-10-31 08:09:51 UTC

(In reply to Ville Syrjala from comment #39)
> (In reply to Yu Kang Ku from comment #38)
> > (In reply to Ville Syrjala from comment #37)
> > > Can we repeat the failure with, say, only pipe B active + DDR DVFS disabled
> > > + PM5 disabled + cxsr/maxfifo disabled + trickle feed enabled?

I was finally able to simulate "closing the lid" on the BSW RVP and carry out the experiment you requested.  Unfortunately, the problem is still reproducible.

The output of intel_watermark is here:
https://bugs.freedesktop.org/attachment.cgi?id=127636

The output of intel_reg dump --all is here:
https://bugs.freedesktop.org/attachment.cgi?id=127637

These outputs show that the conditions you specified have been met.

The following are the code changes I made to meet the conditions that you specified:

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 5d39ad2..35766b7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -946,7 +946,7 @@ static void vlv_setup_wm_latency(struct drm_device *dev)
 
        dev_priv->wm.max_level = VLV_WM_LEVEL_PM2;
 
-       if (IS_CHERRYVIEW(dev_priv)) {
+       if (0 && IS_CHERRYVIEW(dev_priv)) {
                dev_priv->wm.pri_latency[VLV_WM_LEVEL_PM5] = 12;
                dev_priv->wm.pri_latency[VLV_WM_LEVEL_DDR_DVFS] = 33;
 
@@ -1288,7 +1288,7 @@ static void vlv_merge_wm(struct drm_device *dev,
        int num_active_crtcs = 0;
 
        wm->level = to_i915(dev)->wm.max_level;
-       wm->cxsr = true;
+       wm->cxsr = false;
 
        for_each_intel_crtc(dev, crtc) {
                const struct vlv_wm_state *wm_state = &crtc->wm_state;

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 6c11168..10d6eac 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -1070,7 +1070,7 @@ static void vlv_init_display_clock_gating(struct drm_i915_private *dev_priv)
        /*
         * Disable trickle feed and enable pnd deadline calculation
         */
-       I915_WRITE(MI_ARB_VLV, MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE);
+       I915_WRITE(MI_ARB_VLV, 0);
        I915_WRITE(CBR1_VLV, 0);
 
        WARN_ON(dev_priv->rawclk_freq == 0);

Comment 43 Jani Saarinen 2016-12-09 10:12:06 UTC

Can you test with the latest kernel to see whether the issue still exists?

Comment 44 Jani Saarinen 2016-12-13 07:53:01 UTC

ping

Comment 45 ykku16dev2 2016-12-13 16:39:48 UTC

Yes, I will work on it.

Comment 46 ykku16dev2 2016-12-16 19:45:57 UTC

The problem is still reproducible with the latest drm-intel-nightly, sync'ed 12/15/2016.  The specific commit that I built is 639f10d1159e87cac2f85769dcd081520b904f56.

Comment 47 Ville Syrjala 2017-01-17 12:07:19 UTC

While somewhat unlikely given all our previous findings, we should still check to see if the atomic watermark work would help here.

It's not yet merged to drm-tip, but it's available here:
git://github.com/vsyrjala/linux.git vlv_atomic_wm_4

Comment 48 ykku16dev2 2017-01-19 00:54:31 UTC

(In reply to Ville Syrjala from comment #47)
> While somewhat unlikely given all our previous findings, we should still
> check to see if the atomic watermark work would help here.
> 
> It's not yet merged to drm-tip, but it's available here:
> git://github.com/vsyrjala/linux.git vlv_atomic_wm_4

The problem is still reproducible.  The kernel I built is 4.9.0-01730-g67e15a7-dirty.  The commit ID portion matches the tip of vlv_atomic_wm_4.

Comment 49 dog 2017-02-08 14:01:17 UTC

Are you using a 3rd party DP->HDMI dongle for the failing setup?  Also, is the failing system using LSPCON for the HDMI connection?  I see "BSW-based machine" but no details such as this.

Comment 50 ykku16dev2 2017-02-08 18:47:39 UTC

(In reply to dog from comment #49)
> Are you using a 3rd party DP->HDMI dongle for the failing setup?  Also, is
> the failing system using LSPCON for the HDMI connection?  I see "BSW-based
> machine" but no details such as this.

No, I'm not using a dongle.  The HDMI connection is driven by BSW's pipe C.  The BSW RVP is one such machine on which the problem is reproducible (see comment #9).

Comment 51 ykku16dev2 2017-02-22 04:15:07 UTC

@Ville, I would like to try a test in which memory read accesses made by the display controller are given higher priority than memory accesses made by other components of the GPU.

At the same time, I would also like to prevent these other components of the GPU from making burst accesses to memory.

Can you suggest register settings that will achieve these 2 things?

Comment 52 Ville Syrjala 2017-02-22 14:27:55 UTC

There are a lot of knobs in the Bunit that might be of interest. I did a quick trawl through the registers and came up with the following list:

BARBCTL0 0x3
BARBCTL2 0x4
BARBCTL3 0x5
BARBCTL4 0x6
 for each agent
 [5:0] arbiter weight
 [7:6] reserved

COSCAT 0x12
 for each agent 
 [1:0] COS category (0x0=normal, 0x1=isoc, others=reserved)

BALIMIT0 0xb
BALIMIT1 0xc
BALIMIT2 0xd
BALIMIT3 0xe
 for each agent
 [5:0] limit of some sort
 [7:6] reserved

BFLWT 0x14
 [5:0] read weights
 [13:8] write weights

BISOCWT 0x16
 [5:0] non-isoc weights
 [13:8] isoc weights
 [31] enable isoc weights
 
There are 16 agents possible it seems (agent0 to agent15). I get the
impression that display is agent3, but there was one note somewhere
about display possibly being agent2. But by default agent2 is not isoc whereas agent3 is, so that makes me think agent3 is more likely.
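
For illustration only, the per-agent fields listed above could be picked apart roughly as in the sketch below. It assumes four agents per 32-bit register, with agent N in byte (N % 4) of register (N / 4), least significant byte first, and two COSCAT bits per agent. None of that ordering is confirmed by the list above, and all helper names are made up for this example (they are not i915 functions).

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMED layout (not confirmed by the register list above): 16 agents,
 * four per 32-bit register, agent N in byte (N % 4) of register (N / 4),
 * least significant byte first. All names here are hypothetical.
 */
#define NUM_AGENTS        16
#define AGENTS_PER_REG    4
#define AGENT_FIELD_MASK  0x3fu   /* bits [5:0]; bits [7:6] are reserved */

/* Index (0..3) of the register in a bank that holds this agent's field. */
static unsigned int agent_reg_index(unsigned int agent)
{
	assert(agent < NUM_AGENTS);
	return agent / AGENTS_PER_REG;
}

/* Bit offset of this agent's 8-bit slot inside its register. */
static unsigned int agent_shift(unsigned int agent)
{
	assert(agent < NUM_AGENTS);
	return (agent % AGENTS_PER_REG) * 8;
}

/* Extract the 6-bit weight/limit field for an agent from a register value. */
static uint32_t agent_field_get(uint32_t reg_val, unsigned int agent)
{
	return (reg_val >> agent_shift(agent)) & AGENT_FIELD_MASK;
}

/* Return reg_val with the agent's 6-bit field replaced; reserved bits kept. */
static uint32_t agent_field_set(uint32_t reg_val, unsigned int agent,
				uint32_t field)
{
	unsigned int shift = agent_shift(agent);

	reg_val &= ~(AGENT_FIELD_MASK << shift);
	reg_val |= (field & AGENT_FIELD_MASK) << shift;
	return reg_val;
}

/* COSCAT: ASSUMED 2 bits per agent, agent N at bits [2N+1:2N]. */
static uint32_t coscat_get(uint32_t coscat, unsigned int agent)
{
	assert(agent < NUM_AGENTS);
	return (coscat >> (agent * 2)) & 0x3u;
}
```

With these helpers, a read-modify-write of a single agent's weight or limit would go through agent_reg_index() to pick the register and agent_field_set() to build the new value, leaving the other agents' fields untouched.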

Comment 53 ykku16dev2 2017-02-26 00:34:40 UTC

(In reply to Ville Syrjala from comment #52)
> BALIMIT0 0xb
> BALIMIT1 0xc
> BALIMIT2 0xd
> BALIMIT3 0xe
>  for each agent
>  [5:0] limit of some sort
>  [7:6] reserved

I was able to make the problem go away by configuring the following registers:
BALIMIT0 = 0x1f1f1f1f
BALIMIT1 = 0x1f1f1f1f
BALIMIT2 = 0x1f1f1f1f
BALIMIT3 = 0x1f1f1f1f

The original values for these registers are:
BALIMIT0 = 0x1f3f3f3f
BALIMIT1 = 0x3f3f3f3f
BALIMIT2 = 0x3f3f3f3f
BALIMIT3 = 0x3f3f3f3f


> 
> There are 16 agents possible it seems (agent0 to agent15). I get the
> impression that display is agent3, but there was one note somewhere
> about display possibly being agent2. But by default agent2 is not isoc
> whereas agent3 is, so that makes me think agent3 is more likely.

Based on the original values of the above registers, I think you're right about the display being agent3.  The register descriptions mention specifically that the display needs to use a lower value.

Overall, this experiment seems to suggest that our problem is related to bandwidth rather than C-state latency.  What do you think?
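
To make the per-agent effect of this change easier to see, here is a small decoding sketch. It assumes agent N sits in byte (N % 4) of BALIMIT(N / 4), least significant byte first, which is a guess that happens to agree with agent3 being the only agent with a lower original limit. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REGS        4
#define AGENTS_PER_REG  4
#define NUM_AGENTS      (NUM_REGS * AGENTS_PER_REG)

/* BALIMIT0..3 values quoted in this comment. */
static const uint32_t balimit_orig[NUM_REGS] = {
	0x1f3f3f3f, 0x3f3f3f3f, 0x3f3f3f3f, 0x3f3f3f3f,
};
static const uint32_t balimit_test[NUM_REGS] = {
	0x1f1f1f1f, 0x1f1f1f1f, 0x1f1f1f1f, 0x1f1f1f1f,
};

/* ASSUMPTION: agent N is byte (N % 4) of BALIMIT(N / 4), LSB first. */
static uint32_t balimit_for_agent(const uint32_t regs[NUM_REGS],
				  unsigned int agent)
{
	assert(agent < NUM_AGENTS);
	return (regs[agent / AGENTS_PER_REG] >>
		((agent % AGENTS_PER_REG) * 8)) & 0x3f;
}

/* Number of agents whose limit differs between two register banks. */
static unsigned int count_changed(const uint32_t a[NUM_REGS],
				  const uint32_t b[NUM_REGS])
{
	unsigned int agent, changed = 0;

	for (agent = 0; agent < NUM_AGENTS; agent++)
		if (balimit_for_agent(a, agent) != balimit_for_agent(b, agent))
			changed++;
	return changed;
}

/* Print one line per agent: original limit and limit used in the test. */
static void dump_balimit(void)
{
	unsigned int agent;

	for (agent = 0; agent < NUM_AGENTS; agent++)
		printf("agent%-2u  orig=0x%02x  test=0x%02x\n", agent,
		       (unsigned int)balimit_for_agent(balimit_orig, agent),
		       (unsigned int)balimit_for_agent(balimit_test, agent));
}
```

Under that assumed byte order, the test lowers the limit of every agent except agent3 from 0x3f to 0x1f, while agent3 stays at 0x1f. Calling dump_balimit() prints the full per-agent table.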

Comment 54 Ricardo 2017-05-09 16:31:19 UTC

Adding the ReadyForDev tag to the "Whiteboard" field.
The bug is still active:
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included

Comment 55 Jani Saarinen 2017-06-06 09:34:26 UTC

Yu Kang, please propose how to get this moving. What info is still needed?

Comment 56 Jani Saarinen 2017-06-08 06:43:14 UTC

Hi, is this really a blocker if it is not moving? Should we decrease the priority?

Comment 57 Ricardo 2017-07-07 18:41:48 UTC

(In reply to Jani Saarinen from comment #56)
> Hi, is this really a blocker if it is not moving? Should we decrease the
> priority?

If we do not receive further information from the submitter, the bug will be closed as invalid. Please provide the requested information and change the status to REOPENED.

Otherwise Resolve

Comment 58 Ricardo 2017-07-07 18:42:27 UTC

Also adjusting priority until new information is provided

Comment 59 Elizabeth 2017-08-11 20:30:26 UTC

(In reply to Ricardo from comment #57)
> (In reply to Jani Saarinen from comment #56)
> > Hi, is this really a blocker if it is not moving? Should we decrease the
> > priority?
> 
> If we do not receive further information from the submitter, the bug will be
> closed as invalid. Please provide the requested information and change the
> status to REOPENED.
> 
> Otherwise Resolve
Good afternoon, I'm closing this bug due to the absence of an answer from the submitter. If the problem still exists with the latest kernel versions, please file a new bug with HW and SW information, fresh logs, and a reference to this bug. Thank you.
