Bug 92084

Summary: [BDW][drm:gen8_irq_handler [i915]] *ERROR* The master control interrupt lied (SDE)!
Product: DRI                 Reporter: wendy.wang
Component: DRM/Intel         Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED         QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium             CC: benoitg, c.affolter, cs_gon, darkbasic, intel-gfx-bugs, jairo.daniel.miramontes.caton, jwboyer, morphuspam, nbowler, nmcveity, n.schnelle, sickvolo
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
  Description                              Flags
  dmesg info with drm.debug=0xe            none
  dmesg (without drm.debug)                none
  dmesg reproducing the problem on bdw     none

Description wendy.wang 2015-09-23 07:32:07 UTC
Test Environment:

BDW-U
4.3.0-rc2_drm-intel-nightly_be4c33_20150923+

Reproduce steps:
1. Boot up the system and grep dmesg for the error message.
2. The following ERROR message appears:
[    1.260036] [drm:gen8_irq_handler [i915]] *ERROR* The master control interrupt lied (SDE)!

Dmesg log attached.
Comment 1 Jani Nikula 2015-09-23 08:25:07 UTC
(In reply to wendy.wang from comment #0)
> Dmesg log attached.

There's no attachment.
Comment 2 ye.tian 2015-09-23 08:28:58 UTC
Created attachment 118408 [details]
dmesg info with drm.debug=0xe
Comment 3 ye.tian 2015-09-23 08:42:38 UTC
This bug also exists on the latest drm-intel-fixes and drm-intel-next-queued branch.
Comment 4 Jani Nikula 2015-09-23 13:54:12 UTC
Hmm, I'm wondering if we're handling this correctly, and if that could cause the issue:

"For each bit, the IIR can store a second pending interrupt if two or more of the same interrupt conditions occur before the first condition is cleared. Upon clearing the interrupt, the IIR bit will momentarily go low, then return high to indicate there is another interrupt pending. Only the rising edge of the PCH Display interrupt will cause the North Display IIR (DEIIR) PCH Display Interrupt event bit to be set, so all PCH Display Interrupts, including back to back interrupts, must be cleared here before a new PCH Display Interrupt can cause the DEIIR to be set."
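The quoted passage describes an edge-triggered cascade that is easy to misread. Below is a toy C model of it, assuming the text means what it says: each SDEIIR bit can hold one extra pending occurrence, the PCH line is the OR of all SDEIIR bits, and DEIIR only latches on a low-to-high transition of that line. All names (`pch_model`, `pch_event`, `sdeiir_write`, `race`) are hypothetical; this is neither i915 code nor a verified description of the hardware. The `race()` scenario acks only the value it read once, while a second bit lands between that read and the write-back.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the quoted behaviour; every name here is hypothetical. */
struct pch_model {
	uint32_t sdeiir;	/* south (PCH) display IIR */
	uint32_t queued;	/* second pending occurrence, one per bit */
	int deiir_pch;		/* north DEIIR "PCH Display Interrupt" bit */
};

/* The PCH interrupt line is high while any SDEIIR bit is set. */
static int pch_line(const struct pch_model *m)
{
	return m->sdeiir != 0;
}

/* Interrupt condition 'bit' occurs (exactly one bit expected). */
static void pch_event(struct pch_model *m, uint32_t bit)
{
	int was_high = pch_line(m);

	assert(bit != 0 && (bit & (bit - 1)) == 0);
	if (m->sdeiir & bit)
		m->queued |= bit;	/* store the second pending interrupt */
	else
		m->sdeiir |= bit;
	if (!was_high && pch_line(m))
		m->deiir_pch = 1;	/* only a rising edge sets DEIIR */
}

/* Write-1-to-clear SDEIIR; a queued occurrence makes its bit return high. */
static void sdeiir_write(struct pch_model *m, uint32_t val)
{
	int low_level, high_level;

	m->sdeiir &= ~val;
	low_level = pch_line(m);	/* level while the cleared bits are low */
	m->sdeiir |= m->queued & val;
	m->queued &= ~val;
	high_level = pch_line(m);
	if (!low_level && high_level)
		m->deiir_pch = 1;
}

/* Event A is pending; event B lands after the handler read SDEIIR but
 * before it wrote the value back. If 'drain' is set, the handler keeps
 * clearing until SDEIIR reads back as zero. */
static struct pch_model race(int drain)
{
	struct pch_model m = { 0, 0, 0 };
	uint32_t iir;

	pch_event(&m, 1u << 0);		/* A: rising edge sets DEIIR */
	m.deiir_pch = 0;		/* handler acks the north bit */
	iir = m.sdeiir;			/* handler reads SDEIIR == A */
	pch_event(&m, 1u << 1);		/* B: line already high, no edge */
	sdeiir_write(&m, iir);		/* acks A only; B keeps the line high */
	while (drain && (iir = m.sdeiir) != 0)
		sdeiir_write(&m, iir);
	return m;
}
```

With `drain == 0`, bit B stays in SDEIIR and holds the line high, so even a further event never sets DEIIR again; with `drain != 0` the handler clears until SDEIIR reads zero, and the next event produces a fresh rising edge.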
Comment 5 wendy.wang 2015-09-24 07:35:04 UTC
Good commit:
commit b42fa27abff5970649ff07b0ce1691f6464097f3
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 8 21:48:11 2015 +0200

    drm-intel-nightly: 2015y-07m-08d-19h-47m-26s UTC integration manifest


Bad commit:
commit 5b4a647fe39cf42753761b7d4ee20d695eec589c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jul 9 21:57:35 2015 +0200

    drm-intel-nightly: 2015y-07m-09d-19h-56m-44s UTC integration manifest
Comment 6 Mika Kuoppala 2015-09-25 15:08:53 UTC
On my BDW, the messages started to appear after this commit:

commit aaf5ec2e51ab1d9c5e962b4728a1107ed3ff7a3e
Author: Sonika Jindal <sonika.jindal@intel.com>
Date:   Wed Jul 8 17:07:47 2015 +0530

    drm/i915: Handle HPD when it has actually occurred


But when I just tried with latest nightly, I couldn't reproduce any dmesg errors.
Comment 7 darkbasic 2015-09-28 09:34:38 UTC
Created attachment 118478 [details]
dmesg (without drm.debug)

I was going to report the very same issue for my XPS 13 2015 9343 (Broadwell) when I saw this.
I attached dmesg (without drm.debug). Kernel is 4.3.0-rc3-mainline.
Comment 8 Jani Nikula 2015-10-01 13:57:28 UTC
(In reply to Mika Kuoppala from comment #6)
> But when I just tried with latest nightly, I couldn't reproduce any
> dmesg errors.

I still see this on BDW.
Comment 9 Jani Nikula 2015-10-01 14:13:11 UTC
Created attachment 118559 [details]
dmesg reproducing the problem on bdw

Curiously, the errors appear right next to DP AUX traffic.
Comment 10 Ville Syrjala 2015-10-01 15:38:40 UTC
Random idea:

--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2345,6 +2345,7 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
                u32 pch_iir = I915_READ(SDEIIR);
                if (pch_iir) {
                        I915_WRITE(SDEIIR, pch_iir);
+                       POSTING_READ(SDEIIR);
                        ret = IRQ_HANDLED;
Comment 11 Jani Nikula 2015-10-02 10:17:36 UTC
(In reply to Ville Syrjala from comment #10)
> +                       POSTING_READ(SDEIIR);

Does not help.
Comment 12 Jani Nikula 2015-10-02 11:41:48 UTC
However this helps. We're missing something.

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 76bd40e13391..0d524034abd7 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1827,6 +1827,9 @@ static void ibx_hpd_irq_handler(struct drm_device *dev, u32 hotplug_trigger,
        dig_hotplug_reg = I915_READ(PCH_PORT_HOTPLUG);
        I915_WRITE(PCH_PORT_HOTPLUG, dig_hotplug_reg);
 
+       if (!hotplug_trigger)
+               return;
+
        intel_get_hpd_pins(&pin_mask, &long_mask, hotplug_trigger,
                           dig_hotplug_reg, hpd,
                           pch_port_hotplug_long_detect);
@@ -1934,8 +1937,7 @@ static void cpt_irq_handler(struct drm_device *dev, u32 pch_iir)
        int pipe;
        u32 hotplug_trigger = pch_iir & SDE_HOTPLUG_MASK_CPT;
 
-       if (hotplug_trigger)
-               ibx_hpd_irq_handler(dev, hotplug_trigger, hpd_cpt);
+       ibx_hpd_irq_handler(dev, hotplug_trigger, hpd_cpt);
 
        if (pch_iir & SDE_AUDIO_POWER_MASK_CPT) {
                int port = ffs((pch_iir & SDE_AUDIO_POWER_MASK_CPT) >>
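What the patch above changes can be illustrated with a toy register model. It assumes the PCH_PORT_HOTPLUG status bits are write-1-to-clear (which is what the read-then-write-back idiom suggests) and uses a made-up bit layout; `HYP_STATUS_MASK`, `struct hotplug_reg`, `hotplug_write()` and `hpd_handler()` are hypothetical stand-ins, not driver code. The handler reads and writes the register unconditionally and only then checks `hotplug_trigger`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits in HYP_STATUS_MASK are assumed to be
 * write-1-to-clear status bits; all other bits are plain control bits. */
#define HYP_STATUS_MASK 0x000000f0u

struct hotplug_reg {
	uint32_t value;		/* stand-in for PCH_PORT_HOTPLUG */
	unsigned int writes;	/* number of writes to the register */
};

static void hotplug_write(struct hotplug_reg *r, uint32_t val)
{
	/* Writing 1 to a status bit clears it; writing 0 leaves it alone. */
	uint32_t still_set = r->value & HYP_STATUS_MASK & ~val;

	r->value = still_set | (val & ~HYP_STATUS_MASK);
	r->writes++;
}

/* Shape of ibx_hpd_irq_handler() with the comment 12 patch applied:
 * read and write back unconditionally, then return early if no trigger. */
static int hpd_handler(struct hotplug_reg *r, uint32_t hotplug_trigger)
{
	uint32_t dig_hotplug_reg = r->value;

	hotplug_write(r, dig_hotplug_reg);
	if (!hotplug_trigger)
		return 0;
	/* pin decoding (intel_get_hpd_pins() in the driver) would go here */
	return 1;
}
```

Under that assumption, the unconditional write-back also acks status bits latched without a matching SDEIIR trigger. Comment 20 later reports that the status bits were clear when the messages appeared, so it seems to be the write itself, not the acking, that matters.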
Comment 13 Ville Syrjala 2015-10-02 16:27:43 UTC
(In reply to Jani Nikula from comment #12)
> However this helps. We're missing something.
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 76bd40e13391..0d524034abd7 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1827,6 +1827,9 @@ static void ibx_hpd_irq_handler(struct drm_device *dev, u32 hotplug_trigger,
>         dig_hotplug_reg = I915_READ(PCH_PORT_HOTPLUG);
>         I915_WRITE(PCH_PORT_HOTPLUG, dig_hotplug_reg);

Is the read alone enough, or do you need the write too?

Comment 14 Sonika 2015-10-05 05:33:19 UTC
(In reply to Jani Nikula from comment #12)
> However this helps. We're missing something.
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 76bd40e13391..0d524034abd7 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1827,6 +1827,9 @@ static void ibx_hpd_irq_handler(struct drm_device *dev, u32 hotplug_trigger,
>         dig_hotplug_reg = I915_READ(PCH_PORT_HOTPLUG);
>         I915_WRITE(PCH_PORT_HOTPLUG, dig_hotplug_reg);
>  
> +       if (!hotplug_trigger)
> +               return;
> +
Oh, but the whole point of the patch (drm/i915: Handle HPD when it has actually occurred) was to avoid writing to the PCH_PORT_HOTPLUG register when no HPD had occurred.
With it, HPD is stable on SKL for me, and it is in line with how the other interrupts are handled.

Do these "[drm:gen8_irq_handler [i915]] *ERROR* The master control interrupt lied (SDE)!" messages have any effect on HPD?

Comment 15 Jani Nikula 2015-10-05 11:26:42 UTC
(In reply to Ville Syrjala from comment #13)
> (In reply to Jani Nikula from comment #12)
> > However this helps. We're missing something.
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 76bd40e13391..0d524034abd7 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1827,6 +1827,9 @@ static void ibx_hpd_irq_handler(struct drm_device *dev, u32 hotplug_trigger,
> >         dig_hotplug_reg = I915_READ(PCH_PORT_HOTPLUG);
> >         I915_WRITE(PCH_PORT_HOTPLUG, dig_hotplug_reg);
> 
> Is the read alone enough, or do you need the write too?

Moving the write below the !hotplug_trigger check brings the problem back, i.e. the write is also needed.

Comment 16 Ville Syrjala 2015-10-05 12:04:57 UTC
(In reply to Jani Nikula from comment #15)
> (In reply to Ville Syrjala from comment #13)
> > (In reply to Jani Nikula from comment #12)
> > > However this helps. We're missing something.
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > index 76bd40e13391..0d524034abd7 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -1827,6 +1827,9 @@ static void ibx_hpd_irq_handler(struct drm_device *dev, u32 hotplug_trigger,
> > >         dig_hotplug_reg = I915_READ(PCH_PORT_HOTPLUG);
> > >         I915_WRITE(PCH_PORT_HOTPLUG, dig_hotplug_reg);
> > 
> > Is the read alone enough, or do you need the write too?
> 
> Moving the write below the !hotplug_trigger check brings the problem back,
> i.e. the write is also needed.

Are the status bits actually showing long/short pulses when this happens?

Maybe we can just do something like this:

dig_hotplug_reg = I915_READ(PCH_PORT_HOTPLUG);
if (!hotplug_trigger)
    dig_hotplug_reg &= ~(*_HOTPLUG_STATUS_MASK);
I915_WRITE(PCH_PORT_HOTPLUG, dig_hotplug_reg);
if (!hotplug_trigger)
    return;
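Ville's suggestion can be sketched as a self-contained toy model: the register is always written, but the status bits are masked out (and so left unacked, assuming write-1-to-clear semantics) when SDEIIR reported no hotplug trigger. `HYP_STATUS_MASK` is a hypothetical placeholder for the `*_HOTPLUG_STATUS_MASK` bits, and the register layout, `struct hotplug_reg`, `hotplug_write()` and `hpd_handler_masked()` are likewise made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder for the *_HOTPLUG_STATUS_MASK bits,
 * assumed to be write-1-to-clear. */
#define HYP_STATUS_MASK 0x000000f0u

struct hotplug_reg {
	uint32_t value;		/* stand-in for PCH_PORT_HOTPLUG */
	unsigned int writes;	/* number of writes to the register */
};

static void hotplug_write(struct hotplug_reg *r, uint32_t val)
{
	/* Writing 1 to a status bit clears it; writing 0 leaves it alone. */
	uint32_t still_set = r->value & HYP_STATUS_MASK & ~val;

	r->value = still_set | (val & ~HYP_STATUS_MASK);
	r->writes++;
}

/* Always write the register, but only ack status bits when SDEIIR
 * actually reported a hotplug trigger. */
static int hpd_handler_masked(struct hotplug_reg *r, uint32_t hotplug_trigger)
{
	uint32_t dig_hotplug_reg = r->value;

	if (!hotplug_trigger)
		dig_hotplug_reg &= ~HYP_STATUS_MASK;	/* leave status untouched */
	hotplug_write(r, dig_hotplug_reg);
	if (!hotplug_trigger)
		return 0;
	return 1;
}
```

The write counter shows the register is touched on every call, while the status bits are only cleared when a trigger was seen.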
Comment 18 Jani Nikula 2015-10-14 07:12:15 UTC
*** Bug 92454 has been marked as a duplicate of this bug. ***
Comment 19 Daniel Vetter 2015-10-23 13:27:48 UTC
commit 97e5ed1111dcc5300a0f59a55248cd243937a8ab
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Oct 23 10:56:12 2015 +0200

    drm/i915: shut up gen8+ SDE irq dmesg noise
Comment 20 Jani Nikula 2015-10-30 14:38:06 UTC
This also gets rid of the messages, and does *not* print the ### debug msg, i.e. the status bits are clear.


diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6e0a5683bbdc..7716181473dc 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1825,7 +1825,21 @@ static void ibx_hpd_irq_handler(struct drm_device *dev, u32 hotplug_trigger,
 	u32 dig_hotplug_reg, pin_mask = 0, long_mask = 0;
 
 	dig_hotplug_reg = I915_READ(PCH_PORT_HOTPLUG);
+	if (!hotplug_trigger) {
+		u32 mask = PORTA_HOTPLUG_STATUS_MASK |
+			PORTD_HOTPLUG_STATUS_MASK |
+			PORTC_HOTPLUG_STATUS_MASK |
+			PORTB_HOTPLUG_STATUS_MASK;
+
+		if (dig_hotplug_reg & mask)
+			DRM_DEBUG_KMS("### %08x\n", dig_hotplug_reg & mask);
+
+		dig_hotplug_reg &= ~mask;
+	}
+
 	I915_WRITE(PCH_PORT_HOTPLUG, dig_hotplug_reg);
+	if (!hotplug_trigger)
+		return;
 
 	intel_get_hpd_pins(&pin_mask, &long_mask, hotplug_trigger,
 			   dig_hotplug_reg, hpd,
@@ -1840,8 +1854,7 @@ static void ibx_irq_handler(struct drm_device *dev, u32 pch_iir)
 	int pipe;
 	u32 hotplug_trigger = pch_iir & SDE_HOTPLUG_MASK;
 
-	if (hotplug_trigger)
-		ibx_hpd_irq_handler(dev, hotplug_trigger, hpd_ibx);
+	ibx_hpd_irq_handler(dev, hotplug_trigger, hpd_ibx);
 
 	if (pch_iir & SDE_AUDIO_POWER_MASK) {
 		int port = ffs((pch_iir & SDE_AUDIO_POWER_MASK) >>
@@ -1934,8 +1947,7 @@ static void cpt_irq_handler(struct drm_device *dev, u32 pch_iir)
 	int pipe;
 	u32 hotplug_trigger = pch_iir & SDE_HOTPLUG_MASK_CPT;
 
-	if (hotplug_trigger)
-		ibx_hpd_irq_handler(dev, hotplug_trigger, hpd_cpt);
+	ibx_hpd_irq_handler(dev, hotplug_trigger, hpd_cpt);
 
 	if (pch_iir & SDE_AUDIO_POWER_MASK_CPT) {
 		int port = ffs((pch_iir & SDE_AUDIO_POWER_MASK_CPT) >>
Comment 21 Jani Nikula 2015-11-26 14:29:15 UTC
commit 6a39d7c986be4fd18eb019e9cdbf774ec36c9f77
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Wed Nov 25 16:47:22 2015 +0200

    drm/i915: fix the SDE irq dmesg warnings properly
Comment 22 Chris Wilson 2015-12-15 15:32:44 UTC
Still present as of

commit 0035ecf934fae0492c2d90390f88b8c79e806ffa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Dec 14 10:41:10 2015 +0100

    drm-intel-nightly: 2015y-12m-14d-09h-40m-37s UTC integration manifest

Will look into the debug printk next time I have the box connected to a display.
Comment 23 Joakim Koed 2015-12-16 18:57:54 UTC
Hello.
 
Just wanted to let you know (if you were unaware) this bug does also make the system freeze randomly. I have tried to figure out what triggers it, but it seems just random. If anything it happens after 2-3 hours of activity and then 2-5 mins idle?

On kernel 3.13 it works.

Please feel free to ask me to test fixes if you need. I can compile myself (with a patch) or I can test a kernel you compile.

Hope it's okay to write here as a non-dev.
Comment 24 Jani Nikula 2016-01-07 09:54:26 UTC
(In reply to Joakim Koed from comment #23)
> Just wanted to let you know (if you were unaware) this bug does also make
> the system freeze randomly. I have tried to figure out what triggers it, but
> it seems just random. If anything it happens after 2-3 hours of activity and
> then 2-5 mins idle?

I don't think what you're seeing has anything to do with this bug. Please file a new bug report for the symptoms you're seeing.

> Hope it's okay to write here as a non-dev.

Absolutely.
Comment 25 Jani Nikula 2016-01-07 09:54:51 UTC
commit 2dfb0b816d224379efc534764388745c474abeb4
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Thu Jan 7 10:29:10 2016 +0200

    drm/i915: shut up gen8+ SDE irq dmesg noise, again
Comment 26 darkbasic 2016-01-07 14:03:33 UTC
When did you shut up the messages for the first time? I always saw them in every RC, including latest 4.4-rc8.
Comment 27 Jani Nikula 2016-01-07 15:02:04 UTC
(In reply to darkbasic from comment #26)
> When did you shut up the messages for the first time? I always saw them in
> every RC, including latest 4.4-rc8.

drm-next, not Linus' upstream.
Comment 28 Yuri 2016-02-25 14:56:52 UTC
Please see this bug report:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1520040

This bug makes my Dell Latitude laptop shut down (power off). In that bug report I provided a lot of information about my system, logs, and tests I did. Currently, only kernel "3.13.0-36-generic" running module "/lib/modules/3.13.0-36-generic/kernel/ubuntu/i915/i915_bdw.ko" doesn't have this bug.

Hence, this bug is far from "solved" and shutting up dmesg definitely won't solve it.
Comment 29 Chris Wilson 2016-02-25 15:15:58 UTC
(In reply to Yuri from comment #28)
> Please see this bug report:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1520040
> 
> This bug makes my Dell Latitude laptop shut down (power off). In that bug
> report I provided a lot of information about my system, logs and tests I
> did. Currently, only kernel "3.13.0-36-generic" running module
> "/lib/modules/3.13.0-36-generic/kernel/ubuntu/i915/i915_bdw.ko" doesn't have
> this bug.
> 
> Hence, this bug is far from "solved" and shutting up dmesg definitely won't
> solve it.

That has nothing to do with these error messages.
