Bug 82551

Summary: [HSW/BDW/BSW/SKL mobile] HDMI hot plug sporadically causes ERROR
Product: DRI
Reporter: liulei <lei.a.liu>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: christophe.prigent, intel-gfx-bugs, jeff.zheng
Version: unspecified
Hardware: Other
OS: Linux (All)
Whiteboard:
i915 platform: BDW, BSW/CHT, HSW, SKL
i915 features: display/Other
Attachments:
Description              Flags
dmesg                    none
Delay the hotplug event  none
dmesg-with-patch         none

Description liulei 2014-08-13 09:17:34 UTC
Created attachment 104550 [details]
dmesg

==System Environment==
--------------------------
I can't find a good commit.

Non-working platforms: HSW
==kernel==
--------------------------
-nightly: cf1dde8b87834496aabd0a534b1c0695a3572e8d (failed)
    drm-intel-nightly: 2014y-08m-11d-23h-30m-32s integration manifest
-queued: 14bf993e83e1d6924f4bf4506120a15c4b255e58 (failed)
    drm/i915/bdw: Always use MMIO flips with Execlists
-fixes: be71eabebaf9f142612d34d42292b454e984dcb5 (failed)
    Revert "drm/i915: Enable semaphores on BDW"

==Bug detailed description==
-----------------------------
HDMI hot plug sporadically causes an EDID checksum ERROR. The reproduction rate is high only with an ASUS (PB238) monitor. Some cases I tried:
1. With another monitor: plug in and unplug HDMI ten times; the ERROR does not appear.
2. With the ASUS (PB238) monitor: plug in and unplug HDMI ten times; the ERROR appears at least 5 times.
3. With the ASUS (PB238) monitor: plug in and unplug HDMI until the ERROR appears. Then connect another monitor and plug in and unplug HDMI ten times; the ERROR appears at least 4 times.


Output ERROR:
[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 228
[   52.007690] Raw EDID:
[   52.007708]          00 ff ff ff ff ff ff 00 04 69 a2 23 6b 9f 00 00
[   52.007742]          31 17 01 03 80 33 1d 78 2a e2 95 a2 55 4f 9f 26
[   52.007775]          11 50 54 b7 ef 00 d1 c0 b3 00 95 00 81 80 81 40
[   52.007808]          81 c0 71 4f 01 01 02 3a 80 ff ff ff ff ff ff ff
[   52.007842]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   52.007891]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   52.007926]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   52.007963]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
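
For reference, the remainder in the message above can be checked by hand: an EDID block is 128 bytes, and all of its bytes must sum to 0 modulo 256. The following is a minimal user-space sketch of that check applied to the bytes from this dump. The function and array names are hypothetical and this is not the kernel's drm_edid_block_valid() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_LEN 128

/* Bytes copied from the "Raw EDID" dump in this report. */
static const uint8_t reported_block[EDID_BLOCK_LEN] = {
    0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x04, 0x69, 0xa2, 0x23, 0x6b, 0x9f, 0x00, 0x00,
    0x31, 0x17, 0x01, 0x03, 0x80, 0x33, 0x1d, 0x78, 0x2a, 0xe2, 0x95, 0xa2, 0x55, 0x4f, 0x9f, 0x26,
    0x11, 0x50, 0x54, 0xb7, 0xef, 0x00, 0xd1, 0xc0, 0xb3, 0x00, 0x95, 0x00, 0x81, 0x80, 0x81, 0x40,
    0x81, 0xc0, 0x71, 0x4f, 0x01, 0x01, 0x02, 0x3a, 0x80, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
};

/* Sum of all bytes in one EDID block, modulo 256 (hypothetical helper). */
static uint8_t edid_block_remainder(const uint8_t *block)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < EDID_BLOCK_LEN; i++)
        sum += block[i];

    return (uint8_t)(sum & 0xffu);
}

/* A block passes the checksum test only when the remainder is zero. */
static int edid_block_checksum_ok(const uint8_t *block)
{
    return edid_block_remainder(block) == 0;
}
```

For the bytes above, edid_block_remainder(reported_block) gives 228, matching the value in the ERROR line.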

==Reproduce steps==
---------------------------- 
1. Plug in and unplug the HDMI monitor.
Comment 1 Chris Wilson 2014-08-13 09:24:07 UTC
To clarify point 3 (since I think that is the most interesting):

After reproducing the EDID error with the bad ASUS monitor, you then get EDID errors plugging in a normally good monitor?

It sounds like the ASUS monitor is faulty, but that should not impact reusing the encoder for a second monitor (at least afaik).
Comment 2 liulei 2014-08-13 10:49:10 UTC
(In reply to comment #1)
> To clarify point 3 (since I think that is the most interesting):
> 
> After reproducing the EDID error with the bad ASUS monitor, you then get
> EDID errors plugging in a normally good monitor?
> 
> It sounds like the ASUS monitor is faulty, but that should not impact
> reusing the encoder for a second monitor (at least afaik).
I retested with the latest -nightly and found that this issue can also be reproduced with a normally good monitor connected. Sorry that my bug description was misleading, but I did hit the case as written in the bug description.
Comment 3 Chris Wilson 2014-08-13 11:26:00 UTC
Created attachment 104557 [details] [review]
Delay the hotplug event

Please try this patch. It should delay the detection to 50ms after the hotplug interrupt, which should be enough time for the HDMI monitor to settle.
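
As a rough illustration of what such a delay means in practice: delayed work is queued in timer ticks (jiffies), so a millisecond delay is rounded up to the tick granularity of the running kernel. The sketch below is a simplified user-space model of that conversion, not the kernel's msecs_to_jiffies() implementation. The helper name and the tick rates used with it are assumptions for illustration.

```c
#include <assert.h>

/*
 * Simplified model (hypothetical helper): convert a delay in milliseconds
 * to timer ticks for a tick rate of hz ticks per second, rounding up so
 * the wait is never shorter than requested.
 */
static unsigned long model_msecs_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999UL) / 1000UL;
}
```

Under this model a 50 ms delay is 50 ticks at 1000 Hz, 13 ticks (52 ms) at 250 Hz and 5 ticks at 100 Hz.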
Comment 4 liulei 2014-08-14 04:42:55 UTC
(In reply to comment #3)
> Created attachment 104557 [details] [review]
> Delay the hotplug event
> 
> Please try this patch. It should delay the detection to 50ms after the
> hotplug interrupt, which should be enough time for the HDMI monitor to
> settle.
Hi, I can't find the context this patch applies to, so I could not try it.
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1143,7 +1143,8 @@ static void i915_digport_work_func(struct work_struct *work)
 		dev_priv->hpd_event_bits |= old_bits;
 		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
 
-		schedule_work(&dev_priv->hotplug_work);
+		schedule_delayed_work(&dev_priv->hotplug_work,
+				      msecs_to_jiffies(50));
 	}
 }
Comment 5 liulei 2014-08-14 05:47:17 UTC
I would be very grateful if you could tell me which commit and branch your patch is based on. Then I can test it quickly.
Comment 7 liulei 2014-08-14 12:08:23 UTC
(In reply to comment #6)
> Refreshed against -nightly:
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/
> ?id=7d9258b3f29ca8dd814a4e1d1a5a375e55b02a7d

The issue still reproduces with this patch, although it takes more attempts to trigger it.
Comment 8 Chris Wilson 2014-08-15 06:20:12 UTC
Ok, try replacing msecs_to_jiffies(50) with msecs_to_jiffies(100) in that patch. Please could you attach a dmesg after doing so.
Comment 9 liulei 2014-08-18 04:26:24 UTC
(In reply to comment #8)
> Ok, try replacing msecs_to_jiffies(50) with msecs_to_jiffies(100) in that
> patch. Please could you attach a dmesg after doing so.
I can still reproduce this ERROR.
Comment 10 liulei 2014-08-18 04:27:58 UTC
Created attachment 104784 [details]
dmesg-with-patch
Comment 11 zhaodan 2014-09-11 07:30:00 UTC
The issue can also be reproduced on BDW with the same steps.
Test Environment:
Hardware:
CRB: SawTooth Peak
Platform: Broadwell-U
CPU: Broadwell U D0
Chipset PCH: Wildcat Point-LP B0
Audio card: ALC286S
Software:
Commit: 62de88e8e65811010deac5375f8f0d8b14dc4d94
Comment 12 jinliangx.wang 2015-03-03 08:48:24 UTC
Still exists on HSW.

Kernel: 4.0.0-rc1_drm-intel-testing-2015-02-27+
Commit: f4213123347e2c8027afe113de47ea7627f95cd8
Comment 13 liulei 2015-03-10 03:12:40 UTC
This issue exists on SKL, too.
Comment 14 Jesse Barnes 2015-03-11 19:13:12 UTC
Can you try this patch?  Just to see if gmbus is at fault here; we might need to reset it harder in pre_xfer somehow.

diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index b31088a..0c82044 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -554,6 +554,8 @@ int intel_setup_gmbus(struct drm_device *dev)
                if (IS_I830(dev))
                        bus->force_bit = 1;
 
+               bus->force_bit = 1;
+
                intel_gpio_setup(bus, port);
 
                ret = i2c_add_adapter(&bus->adapter);
Comment 15 liulei 2015-03-13 11:55:20 UTC
I plugged in and unplugged HDMI 50 times with your patch and did not see any error.
I also did 50 cycles on the latest -nightly. The result is the same: no error. I'm not sure whether this bug is machine-specific. I will do more tests on other machines.
Comment 16 ye.tian 2015-03-31 08:19:35 UTC
Tested with drm-intel-testing-2015-03-27 on BDW-H, this issue still exists.
Comment 17 Jesse Barnes 2015-03-31 18:16:38 UTC
Tian, were you testing with the patch on BDW-H?  Or without?
Comment 18 Jeff Zheng 2015-04-01 01:29:54 UTC
*** Bug 89601 has been marked as a duplicate of this bug. ***
Comment 19 ye.tian 2015-04-01 01:54:36 UTC
(In reply to Jesse Barnes from comment #17)
> Tian, were you testing with the patch on BDW-H?  Or without?

I tested drm-intel-testing-2015-03-27 without the patch.
Just now I tested with the patch on BDW-H; this issue still exists.
Comment 21 ye.tian 2015-04-27 08:10:27 UTC
Tested on BDW-Y with the testing kernel drm-intel-testing-2015-04-23.
This issue still exists.
Comment 22 xubin 2015-04-30 04:55:11 UTC
Tested on Hswu22 with the testing kernel drm-intel-testing-2015-04-23; the bug still exists.

Result:
Notes:[ 1827.089770] [drm:drm_edid_block_valid [drm]] *ERROR* EDID checksum is invalid, remainder is 32
Comment 23 ye.tian 2015-05-12 06:14:22 UTC
Tested on BDW-H with the testing kernel drm-intel-testing-2015-05-08.
This problem still exists.

output:
--------------
[  374.242159] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 130
[  374.242226] Raw EDID:
[  374.242241]          00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
[  374.242274]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  374.242307]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  374.242340]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  374.242372]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  374.242405]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  374.242438]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  374.242470]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
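
One way to inspect a dump like this is to find where the trailing run of 0xff bytes begins, which shows how much plausible EDID data was read before the fill. The sketch below is a user-space illustration with a hypothetical helper name. Treating the 0xff run as the point where the read stopped returning real data is an assumption, not something established in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_LEN 128

/*
 * Return the offset of the first byte of the trailing 0xff run in one
 * EDID block, or EDID_BLOCK_LEN if the block does not end in 0xff.
 * Hypothetical helper for illustration only.
 */
static size_t edid_trailing_ff_start(const uint8_t *block)
{
    size_t i = EDID_BLOCK_LEN;

    while (i > 0 && block[i - 1] == 0xff)
        i--;

    return i;
}
```

For the block dumped above (the 8-byte header 00 ff ff ff ff ff ff 00 followed only by ff), the helper returns 8: nothing after the fixed header pattern differs from the fill value.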
Comment 24 xubin 2015-05-13 08:30:39 UTC
Tested on SKLY03 with the testing kernel drm-intel-testing-2015-05-08.
This problem still exists.

Output:
[  876.486163] [drm:check_crtc_state [i915]] *ERROR* mismatch in ddi_pll_sel (expected 0x00000000, found 0x00000001)
[  876.486276] ------------[ cut here ]------------
[  876.486311] WARNING: CPU: 2 PID: 1004 at drivers/gpu/drm/i915/intel_display.c:12074 check_crtc_state+0xb67/0xbd1 [i915]()
[  876.486313] pipe state doesn't match!
[  876.486316] Modules linked in: dm_mod snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ppdev serio_raw pcspkr snd_timer i2c_i801 snd soundcore wmi battery parport_pc parport ac acpi_cpufreq i915 button video drm_kms_helper drm
[  876.486350] CPU: 2 PID: 1004 Comm: kworker/2:1 Tainted: G     U  W       4.1.0-rc2_drm-intel-testing-2015-05-08+ #2
[  876.486354] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.B082.B00.1504240146 04/24/2015
[  876.486389] Workqueue: events i915_hotplug_work_func [i915]
[  876.486417]  0000000000000000 0000000000000009 ffffffff817a66cc ffff88016864f8b8
[  876.486424]  ffffffff8103ebde ffff88016864f8b0 ffffffffa00ca3b0 ffff88007bb10000
[  876.486429]  ffff880163b78000 ffff880163628000 ffff88016864f940 ffff88007ba81c00
[  876.486435] Call Trace:
[  876.486446]  [<ffffffff817a66cc>] ? dump_stack+0x40/0x50
[  876.486454]  [<ffffffff8103ebde>] ? warn_slowpath_common+0x98/0xb0
[  876.486501]  [<ffffffffa00ca3b0>] ? check_crtc_state+0xb67/0xbd1 [i915]
[  876.486513]  [<ffffffff8103ec3b>] ? warn_slowpath_fmt+0x45/0x4a
[  876.486544]  [<ffffffffa00ca3b0>] ? check_crtc_state+0xb67/0xbd1 [i915]
[  876.486583]  [<ffffffffa00d959d>] ? intel_modeset_check_state+0x610/0x9e9 [i915]
[  876.486626]  [<ffffffffa00d9db8>] ? intel_crtc_set_config+0x3f8/0x531 [i915]
[  876.486651]  [<ffffffffa0018203>] ? drm_modeset_lock+0x4e/0xa3 [drm]
[  876.486673]  [<ffffffffa000c2be>] ? drm_mode_set_config_internal+0x4e/0xd2 [drm]
[  876.486683]  [<ffffffffa0058fc5>] ? restore_fbdev_mode+0xac/0xc3 [drm_kms_helper]
[  876.486692]  [<ffffffffa005a7c6>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x1e/0x54 [drm_kms_helper]
[  876.486701]  [<ffffffffa005a82a>] ? drm_fb_helper_set_par+0x2e/0x32 [drm_kms_helper]
[  876.486709]  [<ffffffffa005a7a2>] ? drm_fb_helper_hotplug_event+0xa1/0xa7 [drm_kms_helper]
[  876.486715]  [<ffffffff8104f985>] ? process_one_work+0x1b2/0x31d
[  876.486721]  [<ffffffff8105026f>] ? worker_thread+0x265/0x351
[  876.486726]  [<ffffffff8105000a>] ? cancel_delayed_work_sync+0xa/0xa
[  876.486732]  [<ffffffff81053ee1>] ? kthread+0xce/0xd6
[  876.486739]  [<ffffffff81053e13>] ? kthread_create_on_node+0x162/0x162
[  876.486745]  [<ffffffff817ac5d2>] ? ret_from_fork+0x42/0x70
[  876.486751]  [<ffffffff81053e13>] ? kthread_create_on_node+0x162/0x162
[  876.486756] ---[ end trace 8103221c8aae6bac ]---
Comment 26 Jani Nikula 2015-06-11 08:54:26 UTC
Does this patch help? http://patchwork.freedesktop.org/patch/51634
Comment 27 ye.tian 2015-06-12 05:42:18 UTC
(In reply to Jani Nikula from comment #26)
> Does this patch help? http://patchwork.freedesktop.org/patch/51634

Tested on the following kernel with the patch; this problem has gone away on HSW and BDW.
http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=topic/bug-90929
Comment 28 Jani Nikula 2016-01-18 13:01:27 UTC
Please try kernel v4.4.
Comment 29 Ricardo 2017-02-21 01:14:08 UTC
Looks like the suggested patch works and this bug can be closed.
Comment 30 yann 2017-02-21 08:23:34 UTC
(In reply to Ricardo from comment #29)
> Looks like the patch suggested work and this bug can be closed

I agree; this was a 2015 bug that was tested and verified successfully in 2015 with the proper code line.
So we can close it.
