Bug 90804 - System "forgets" about its DVI output
Status: CLOSED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-01 19:59 UTC by Stephen Kitt
Modified: 2017-02-21 16:06 UTC

See Also:
i915 platform: HSW
i915 features: display/Other


Attachments
Xorg.log (238.81 KB, text/plain)
2015-06-01 19:59 UTC, Stephen Kitt
kern.log (180.15 KB, application/x-gunzip)
2015-06-01 20:01 UTC, Stephen Kitt
Watermark when the screen is working (868 bytes, text/plain)
2015-06-01 20:01 UTC, Stephen Kitt
Watermark when the screen is gone (860 bytes, text/plain)
2015-06-01 20:02 UTC, Stephen Kitt
Register dump when the screen is working (12.89 KB, text/plain)
2015-06-01 20:02 UTC, Stephen Kitt
Register dump when the screen is gone (12.86 KB, text/plain)
2015-06-01 20:02 UTC, Stephen Kitt

Description Stephen Kitt 2015-06-01 19:59:56 UTC
Created attachment 116214
Xorg.log

Hi,

My system regularly forgets about the screen connected to its DVI output, with slightly varying symptoms. This used to happen irregularly, but since upgrading to 1.17.1 (as packaged in Debian, 2:1.17.1-2) it's systematic. Even with no power manager (I uninstalled xfce4-power-manager) and no screensaver, the screen ends up going into powersave mode and X never comes back. Sometimes switching to a text VT brings the screen back, and then restarting X works; but often switching to another VT doesn't change anything, nor does restarting X; only rebooting helps then.

I'm attaching Xorg.log, the kernel log, and working and failed watermarks and register dumps. The latter reveal that the DVI output is just gone as far as the system is concerned... I've tried disconnecting and reconnecting the DVI cable, and that doesn't help either.

I've noticed lots of traces like

Jun  1 21:03:54 heffalump kernel: [52160.398608] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 215
Jun  1 21:03:54 heffalump kernel: [52160.398611] Raw EDID:
Jun  1 21:03:54 heffalump kernel: [52160.398612]        00 ff ff ff ff ff ff 00 22 f0 f7 26 01 01 01 01
Jun  1 21:03:54 heffalump kernel: [52160.398613]        25 12 01 03 80 36 23 78 ee ce 50 a3 54 4c 99 26
Jun  1 21:03:54 heffalump kernel: [52160.398614]        0f 5f ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun  1 21:03:54 heffalump kernel: [52160.398615]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun  1 21:03:54 heffalump kernel: [52160.398616]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun  1 21:03:54 heffalump kernel: [52160.398617]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun  1 21:03:54 heffalump kernel: [52160.398618]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun  1 21:03:54 heffalump kernel: [52160.398619]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

with varying values; I don't know if that's relevant at all.
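
For reference, the "remainder" in these messages is consistent with the standard EDID rule that the 128 bytes of a block must sum to 0 modulo 256. A minimal Python sketch, assuming the log layout shown above (the helper names `parse_raw_edid` and `edid_remainder` are hypothetical, not kernel functions), recomputes it from the dumped bytes:

```python
import re

EDID_BLOCK_SIZE = 128  # one EDID block is 128 bytes
HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")


def parse_raw_edid(log_text):
    """Collect the hex bytes that follow a 'Raw EDID:' line in a kernel log excerpt.

    Handles a single dump; only the first 128 bytes are returned.
    """
    data = bytearray()
    in_dump = False
    for line in log_text.splitlines():
        if "Raw EDID:" in line:
            in_dump = True
            continue
        if not in_dump:
            continue
        # Drop the syslog prefix and the "[timestamp]" field, keep the payload.
        tokens = line.rsplit("]", 1)[-1].split()
        if tokens and all(HEX_BYTE.fullmatch(t) for t in tokens):
            data.extend(int(t, 16) for t in tokens)
    return bytes(data[:EDID_BLOCK_SIZE])


def edid_remainder(block):
    """Sum of all bytes modulo 256; a valid EDID block gives 0."""
    return sum(block) % 256
```

Fed the dump above, `edid_remainder(parse_raw_edid(log))` gives 215, matching the logged value. The block is intact up to byte 33 and reads as 0xff from there on.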

Regards,

Stephen
Comment 1 Stephen Kitt 2015-06-01 20:01:11 UTC
Created attachment 116215
kern.log
Comment 2 Stephen Kitt 2015-06-01 20:01:38 UTC
Created attachment 116216
Watermark when the screen is working
Comment 3 Stephen Kitt 2015-06-01 20:02:03 UTC
Created attachment 116217
Watermark when the screen is gone
Comment 4 Stephen Kitt 2015-06-01 20:02:22 UTC
Created attachment 116218
Register dump when the screen is working
Comment 5 Stephen Kitt 2015-06-01 20:02:39 UTC
Created attachment 116219
Register dump when the screen is gone
Comment 6 Stephen Kitt 2015-06-01 20:03:51 UTC
I forgot to mention, this is on a Haswell Xeon E3-1245v3, on a Supermicro X10SAE board, with a HP LP2475w connected to the motherboard's DVI output.
Comment 7 Ander Conselvan de Oliveira 2015-06-02 06:25:54 UTC
Your kernel log doesn't have debug information for drm. Please add drm.debug=0xe to your kernel command line, reproduce the problem again and attach the full dmesg here.
Comment 8 Chris Wilson 2015-06-02 09:14:16 UTC
It's enough, or rather we do not log anything else that is relevant. Frequent hotplug events are causing EDID reads, which occasionally fail and so render the output disconnected.

A workaround would be to save the valid EDID and then force the kernel to use that via the drm_kms_helper.edid_firmware= parameter. But before doing that, I would try replacing the DVI cable.
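
A minimal Python sketch of the "save the valid EDID" step, in case it helps. Everything specific here is an assumption for illustration: the sysfs path (including the `card0` index and the connector name), the firmware file name, and the helper names `is_valid_edid` and `save_edid`. It copies the EDID only if the header and per-block checksums look sane, so a corrupted read is never stored:

```python
from pathlib import Path

EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
EDID_BLOCK_SIZE = 128


def is_valid_edid(data):
    """True if data is one or more 128-byte blocks, starts with the
    EDID header, and every block sums to 0 modulo 256."""
    if len(data) == 0 or len(data) % EDID_BLOCK_SIZE != 0:
        return False
    if data[:8] != EDID_HEADER:
        return False
    return all(
        sum(data[i:i + EDID_BLOCK_SIZE]) % 256 == 0
        for i in range(0, len(data), EDID_BLOCK_SIZE)
    )


def save_edid(sysfs_edid, firmware_file):
    """Copy a validated EDID to firmware_file; return the number of bytes written."""
    data = Path(sysfs_edid).read_bytes()
    if not is_valid_edid(data):
        raise ValueError(f"{sysfs_edid}: EDID failed validation, not saving")
    target = Path(firmware_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return len(data)


# Example call (assumed paths, needs root to write under /lib/firmware):
# save_edid("/sys/class/drm/card0-HDMI-A-3/edid",
#           "/lib/firmware/edid/hp-lp2475w.edid")
```

With the file in place, the parameter takes the form `drm_kms_helper.edid_firmware=HDMI-A-3:edid/hp-lp2475w.edid`, where the connector name and file name are again only examples and must match the actual system.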
Comment 9 Stephen Kitt 2015-06-02 19:08:07 UTC
Huh, I didn't think of checking the hardware... I've ordered a replacement DVI cable.

Thanks for the suggestion! I'll let you know if it fixes things.
Comment 10 Stephen Kitt 2015-06-06 19:35:58 UTC
I've replaced the DVI cable, but it hasn't fixed things (I even tried with a third DVI cable).

I can trigger this reliably by switching inputs on the monitor; this has always logged invalid EDIDs, but only recently has it caused the system to forget it's there. I've noticed that whichever VT is selected when I switch inputs is the one that goes away; if it's a console VT though, all the console VTs go and I haven't found a way to get them back. An X VT won't come back either but I can restart X and that restores the display (most of the time).

Should I just go for the stored EDID approach? Or is there some change I could revert somewhere?
Comment 11 Chris Wilson 2015-06-07 06:56:50 UTC
The source will be the invalid EDID - we use it to confirm that you have a DVI connection. You can do a bisect to find which commit aggravated the issue for you. It might be that something changed in the comms protocol, e.g. switching to using GMBUS rather than GPIO, and caused more frequent failures, but if the monitor has always returned an invalid EDID at some point, then I'm not optimistic that it will be that. My guess is that the bisect would report that the offending commit is one that causes us to try less hard to recover a broken EDID.

But maybe the bisect would be a surprise and we find a genuine bug.

Whilst I would appreciate a bisect, it may take a few days to perform, longer depending on how long it takes for the invalid EDID to generate a disconnect. Overriding the EDID would take a couple of minutes to set up, so I can understand if you just did the workaround.
Comment 12 Stephen Kitt 2015-06-10 04:59:14 UTC
It looks like I'm going to have to bisect this anyway; the EDID workaround isn't enough... This morning the screen was gone again!

There's nothing in the kernel logs overnight, but when I touched the keyboard then started switching VTs to try to get the system back:

Jun 10 06:42:35 heffalump kernel: [30650.505896] platform HDMI-A-3: firmware: direct-loading firmware edid/hp-lp2475w.edid
Jun 10 06:42:35 heffalump kernel: [30650.505918] [drm] Got external EDID base block and 0 extensions from "edid/hp-lp2475w.edid" for connector "HDMI-A-3"

repeated 5 times, then

Jun 10 06:42:47 heffalump kernel: [30663.165540] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 3
Jun 10 06:42:47 heffalump kernel: [30663.165543] Raw EDID:
Jun 10 06:42:47 heffalump kernel: [30663.165544]        00 ff ff ff ff ff ff 00 22 f0 f7 26 01 01 01 01
Jun 10 06:42:47 heffalump kernel: [30663.165544]        25 12 01 03 80 36 23 78 ee ce 50 a3 54 4c 99 26
Jun 10 06:42:47 heffalump kernel: [30663.165545]        0f 50 54 a5 6b 80 81 40 a9 00 a9 40 b3 00 d1 00
Jun 10 06:42:47 heffalump kernel: [30663.165546]        01 01 01 01 01 01 28 3c ff ff ff ff ff ff ff ff
Jun 10 06:42:47 heffalump kernel: [30663.165546]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun 10 06:42:47 heffalump kernel: [30663.165547]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun 10 06:42:47 heffalump kernel: [30663.165547]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Jun 10 06:42:47 heffalump kernel: [30663.165548]        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

(that came from the screen, the stored EDID is correct); then 11 "direct-loading firmware" messages, followed by

Jun 10 06:42:54 heffalump kernel: [30669.471054] [drm] HPD interrupt storm detected on connector DP-2: switching from hotplug detection to polling

then another few dozen "direct-loading" messages with 2 invalid checksum messages, and again

Jun 10 06:45:48 heffalump kernel: [30844.198498] [drm] HPD interrupt storm detected on connector DP-2: switching from hotplug detection to polling

before I gave up and rebooted.
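
For readers unfamiliar with the "interrupt storm" message: the general idea is that too many hotplug interrupts within a short window make the driver stop trusting interrupts for that connector and fall back to periodic polling. The Python sketch below illustrates that idea only. It is not the i915 code, and the class name, threshold, and window length are assumptions chosen for illustration:

```python
class HpdStormDetector:
    """Toy model of hotplug storm detection for one connector."""

    def __init__(self, threshold=5, period=1.0):
        self.threshold = threshold  # interrupts tolerated per window (assumed value)
        self.period = period        # window length in seconds (assumed value)
        self.window_start = None
        self.count = 0
        self.polling = False

    def on_interrupt(self, now):
        """Record one hotplug interrupt at time `now` (seconds).

        Returns True once the connector has been switched to polling.
        """
        if self.polling:
            return True
        if self.window_start is None or now - self.window_start > self.period:
            # Start a fresh window with this interrupt as its first event.
            self.window_start = now
            self.count = 1
        else:
            self.count += 1
            if self.count > self.threshold:
                self.polling = True
        return self.polling
```

In this model, six interrupts within one second flip the connector to polling, while interrupts spaced further apart than the window never do.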
Comment 13 Ileana 2016-04-19 11:07:00 UTC
Is this still an issue? Have you been able to bisect?
Comment 14 Ricardo 2017-02-21 16:00:15 UTC
Closing this bug: there has been no response for several months. If the problem persists, please create a new bug and attach logs.

