Bug 65757 - [snb pch eDP regression] Screen sometimes remains black after mode change - old fast & narrow link is more stable
Status: CLOSED NOTOURBUG
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Todd Previte
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2013-06-14 15:16 UTC by Michal Srb
Modified: 2017-07-24 22:58 UTC

Attachments
drm.debug=0xe dmesg of boot and 4 modetest start/stops leading to black screen (104.31 KB, text/plain)
2013-06-14 15:16 UTC, Michal Srb
intel_reg_dump output (14.32 KB, text/plain)
2013-06-14 15:18 UTC, Michal Srb
intel_bios_dumper -> intel_bios_reader output (8.02 KB, text/plain)
2013-06-14 15:20 UTC, Michal Srb

Description Michal Srb 2013-06-14 15:16:44 UTC
Created attachment 80814
drm.debug=0xe dmesg of boot and 4 modetest start/stops leading to black screen

I have an Intel Sandy Bridge GT1 GPU (8086:0102) with a monitor connected over DisplayPort. Sometimes the monitor stays black after a mode change.

It happens no matter what caused the mode change. (Observed when the i915 module is loaded, on start/end of X, on a resolution change in X, and when using the modetest program from the console.) The screen lights up again after a few additional mode changes.

It did not happen before. I bisected it to this commit:
  2514bc510d0c3aadcc5204056bb440fa36845147
  drm/i915: prefer wide & slow to fast & narrow in DP configs

  The link used 1 lane before the patch; it now uses 2 lanes.

It still happens with the current drm-intel kernel. When I experimentally limited the lane count to 1, mode changes worked reliably again.
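
For illustration, the difference between the two selection orders named in the commit title can be sketched as below. This is a minimal sketch, not the i915 code: the struct and function names are made up, and the 71000 kHz pixel clock and 24 bpp used in the note after the code are hypothetical values, not taken from this panel. It assumes only the 1.62 and 2.7 Gbps link rates and 8b/10b coding, so each lane carries 8 payload bits per link symbol clock.

```c
#include <stddef.h>

/* One candidate DP link configuration (hypothetical type for this sketch). */
struct dp_link_cfg {
    int lanes;          /* 1, 2 or 4 */
    int link_clock_khz; /* 162000 = 1.62 Gbps, 270000 = 2.7 Gbps per lane */
};

/* Link symbol clocks considered, slowest first. */
static const int dp_link_clocks_khz[] = { 162000, 270000 };
static const size_t dp_num_link_clocks =
    sizeof(dp_link_clocks_khz) / sizeof(dp_link_clocks_khz[0]);

/* Payload capacity in kbit/s: with 8b/10b coding each lane carries
 * 8 data bits per link symbol clock. */
static long long dp_link_capacity_kbps(struct dp_link_cfg cfg)
{
    return (long long)cfg.link_clock_khz * cfg.lanes * 8;
}

/* Bandwidth needed by a mode, in kbit/s. */
static long long dp_mode_required_kbps(int pixel_clock_khz, int bpp)
{
    return (long long)pixel_clock_khz * bpp;
}

/* "Fast & narrow": try the fewest lanes first and raise the link rate
 * before adding lanes. Returns 0 and fills *out on success, -1 if no
 * configuration is large enough. */
static int dp_pick_fast_narrow(long long required_kbps, int max_lanes,
                               struct dp_link_cfg *out)
{
    for (int lanes = 1; lanes <= max_lanes; lanes <<= 1) {
        for (size_t i = 0; i < dp_num_link_clocks; i++) {
            struct dp_link_cfg cfg = { lanes, dp_link_clocks_khz[i] };
            if (dp_link_capacity_kbps(cfg) >= required_kbps) {
                *out = cfg;
                return 0;
            }
        }
    }
    return -1;
}

/* "Wide & slow": try the lowest link rate first and add lanes before
 * raising the rate. Same return convention as above. */
static int dp_pick_wide_slow(long long required_kbps, int max_lanes,
                             struct dp_link_cfg *out)
{
    for (size_t i = 0; i < dp_num_link_clocks; i++) {
        for (int lanes = 1; lanes <= max_lanes; lanes <<= 1) {
            struct dp_link_cfg cfg = { lanes, dp_link_clocks_khz[i] };
            if (dp_link_capacity_kbps(cfg) >= required_kbps) {
                *out = cfg;
                return 0;
            }
        }
    }
    return -1;
}
```

With the hypothetical 71000 kHz, 24 bpp mode and a 2-lane limit, dp_pick_fast_narrow() settles on 1 lane at 2.7 Gbps while dp_pick_wide_slow() settles on 2 lanes at 1.62 Gbps, which is the kind of 1-lane to 2-lane change described above.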

There is no difference in dmesg (with drm.debug=0xe) or in intel_reg_dumper output between a successful mode change and one leading to a blank screen. The link training passes on the first try in both cases.

There is one suspicious-looking message in dmesg:
  [drm] Wrong MCH_SSKPD value: 0x17050407
  [drm] This can cause pipe underruns and display issues.
  [drm] Please upgrade your BIOS to fix this.
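
As a rough aid to reading that message: to my recollection the driver only compares the low WM0 field of MCH_SSKPD against an expected value. The mask and value below are assumptions quoted from memory of the i915 register header of that era, and mch_sskpd_wm0_ok() is a made-up helper, so check them against your own tree before relying on this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, from memory of i915_reg.h; verify against your tree. */
#define MCH_SSKPD_WM0_MASK 0x3f
#define MCH_SSKPD_WM0_VAL  0xc

/* True if the WM0 field of an MCH_SSKPD register value matches the value
 * the driver is assumed to expect. */
static bool mch_sskpd_wm0_ok(uint32_t sskpd)
{
    return (sskpd & MCH_SSKPD_WM0_MASK) == MCH_SSKPD_WM0_VAL;
}
```

Under these assumptions the reported value 0x17050407 has a WM0 field of 0x07 rather than 0x0c, which would explain why the warning is printed on this machine.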

The attached dmesg.txt shows a boot to console (X is disabled) and 4 modetest start/stops; the mode change caused by the last modetest stop produced the black screen. It is annotated with lines starting with "Annotation:".
Comment 1 Michal Srb 2013-06-14 15:18:34 UTC
Created attachment 80815
intel_reg_dump output

This was taken with the screen working, but the output is the same when the screen is black. (The only difference is RC6_RESIDENCY_TIME.)
Comment 2 Michal Srb 2013-06-14 15:20:19 UTC
Created attachment 80816
intel_bios_dumper -> intel_bios_reader output
Comment 3 Egbert Eich 2013-06-18 15:03:08 UTC
Some further notes:
- The machine uses a Chrontel CH7511B as an (e)DP to LVDS converter, which
  supports 2 lanes.
- The VBIOS also seems to use 2 lanes. We have never seen a VBIOS mode set
  fail.
Comment 4 Daniel Vetter 2013-06-20 13:04:52 UTC
Quick thing to check on latest drm-intel-nightly, specifically

commit 8664281b64c457705db72fc60143d03827e75ca9
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Fri Apr 12 17:57:57 2013 -0300

    drm/i915: report Gen5+ CPU and PCH FIFO underruns

Do the fifo underrun reports fire up?

Also, as discussed in the meeting can you please test what happens when you set the panel to eDP? Without that we also set up the HDMI connector, and the i2c probing over HDMI is known to kill some eDP panels ...
Comment 5 Egbert Eich 2013-06-20 14:18:15 UTC
(In reply to comment #4)
> Quick thing to check on latest drm-intel-nightly, specifically
> 
> commit 8664281b64c457705db72fc60143d03827e75ca9
> Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Date:   Fri Apr 12 17:57:57 2013 -0300
> 
>     drm/i915: report Gen5+ CPU and PCH FIFO underruns

Will do that.

> 
> Do the fifo underrun reports fire up?
> 
> Also, as discussed in the meeting can you please test what happens when you
> set the panel to eDP? Without that we also set up the HDMI connector, and
> the i2c probing over HDMI is known to kill some eDP panels ...

When the panel is set to eDP it works: the bogus VBT eDP tables in the BIOS mark the panel as 1 lane and 18 bpp, so the required data rate is low enough that only a single lane is used.
I will, however, hack the driver to disable the HDMI connector setup and the I2C probing on the eDP connector and see whether this makes a difference.
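
A rough check of why the lower colour depth can end up on a single lane is sketched here. The helper dp_min_lanes() is made up for this example and the 71000 kHz pixel clock in the note after the code is a hypothetical value, not this panel's real timing. It assumes 8b/10b coding, i.e. 8 payload bits per lane per link symbol clock.

```c
/* Smallest power-of-two lane count whose payload capacity covers the mode
 * at the given link symbol clock, or -1 for a non-positive link clock.
 * The caller must still compare the result with the number of lanes the
 * hardware actually offers. */
static int dp_min_lanes(int pixel_clock_khz, int bpp, int link_clock_khz)
{
    long long required_kbps = (long long)pixel_clock_khz * bpp;
    long long per_lane_kbps = (long long)link_clock_khz * 8; /* 8b/10b */
    int lanes = 1;

    if (link_clock_khz <= 0)
        return -1;

    while ((long long)lanes * per_lane_kbps < required_kbps)
        lanes <<= 1;
    return lanes;
}
```

For the hypothetical 71000 kHz mode at the 1.62 Gbps link rate (162000 kHz symbol clock), 24 bpp needs 2 lanes while 18 bpp fits into 1 lane, so the link stays in the single-lane configuration that has been reliable.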
Comment 6 Egbert Eich 2013-06-21 12:23:29 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Quick thing to check on latest drm-intel-nightly, specifically
> > 
> > commit 8664281b64c457705db72fc60143d03827e75ca9
> > Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
> > Date:   Fri Apr 12 17:57:57 2013 -0300
> > 
> >     drm/i915: report Gen5+ CPU and PCH FIFO underruns
> 
> Will do that.
> 
> > 
> > Do the fifo underrun reports fire up?

I do not see anything there. The only message regarding underruns is for
pipe underruns as quoted by Michal in the original report.

> > 
> > Also, as discussed in the meeting can you please test what happens when you
> > set the panel to eDP? Without that we also set up the HDMI connector, and
> > the i2c probing over HDMI is known to kill some eDP panels ...
> 
> When setting the panel eDP it will work: reason is that the bogus VBT eDP
> tables in the BIOS will mark the panel 1 lane and 18bpp, thus the frequency
> low enough that only a single lane is used.
> I will however disable the HDMI connector setup and the I2C probing on the
> eDP connector hacking the driver and see if this makes a difference.

I disabled HDMI initialization for testing by returning early from intel_hdmi_init(), i.e.:

@@ -1073,6 +1073,8 @@ void intel_hdmi_init(struct drm_device *dev, int hdmi_reg, enum port port)
        struct drm_encoder *encoder;
        struct intel_connector *intel_connector;
 
+       return;
+
        intel_dig_port = kzalloc(sizeof(struct intel_digital_port), GFP_KERNEL);

Unfortunately, the problem still occurs.
Comment 7 Egbert Eich 2013-06-26 09:50:48 UTC
Thanks to Michal Srb, who took on this tedious investigation, we came to the following conclusions:

1. Of the eDP specific code paths we only need ironlake_edp_panel_on/off()
   and the call to intel_dp_init_panel_power_sequencer_registers() in
   intel_dp_init_connector() to make the panel work properly every time.

2. Some WARN() messages are triggered because the VDD enable/disable functions
   are not called. This is harmless, however, as the VDD does not seem to be
   controlled by the GPU (the panel works although EDP_FORCE_VDD is off all
   the time).

3. Only enabling ironlake_edp_panel_off() will result in a dark screen
   all the time. This allows the conclusion that the panel power is controlled
   by the GPU.

If none of the ironlake_edp_panel_on/off() functions are called (as in the DP case), the panel power remains on all the time (preset by the BIOS).
For some reason the panel needs to undergo a power sequence during a mode switch (i.e. when the DP lanes get retrained). Why this is only needed when two lanes are active is not really clear; it may have to do with a brief loss of the panel LVDS signaling, which may cause the panel to blank.

To me there are two possible conclusions:
a. the device should really be treated as DEVICE_TYPE_eDP in the eDP BDB of the VBT,
or
b. a fix is required in a device outside the control of the driver (i.e. the
   Chrontel CH7511B (e)DP -> LVDS converter).
Comment 8 Daniel Vetter 2013-06-26 12:44:55 UTC
Just to clarify: The magic to make this work is to run the edp panel power sequencing calls?

If that's the case I'd vote to quirk this to be an eDP port: In 3.11 we've refactored all the checks to clearly separate sink-related eDP stuff (backlight, panel power sequencing) from the differences in the eDP source ports. So we can freely adjust this without changing anything else.

For the quirk itself there's hopefully a bit in the DPCD telling us it's an eDP sink; otherwise I guess we'd need a new DMI quirk table.
Comment 9 Egbert Eich 2013-06-26 13:02:03 UTC
(In reply to comment #8)
> Just to clarify: The magic to make this work is to run the edp panel power
> sequencing calls?
> 
> If that's the case I'd vote to quirk this to be an eDP port: In 3.11 we've
> refactored all the checks to clearly separate sink-related eDP stuff
> (backlight, panel power sequencing) from the differences in the eDP source
> ports. So we can freely adjust this without changing anything else.
> 
> For the quirk itself there's hopefully a bit in the DPCD telling is it's an
> eDP sink, otherwise I guess we'd a dmi new quirk table.

If there is such a bit in the DPCD it would be generic and worthwhile adding to the driver.

However, the latest development in this story is that we have received word from the manufacturer that the system was retested with a new version of the Chrontel eDP -> LVDS converter firmware. So far the blank screen issue has not been seen again during those tests.
We are still waiting for a definitive report (we will also test ourselves). If this is true, the firmware update has succeeded in making the device connected to the GPU look like a regular DP device, as the BIOS setting suggests.
Comment 10 Egbert Eich 2013-07-15 12:02:47 UTC
Meanwhile Chrontel has updated the firmware for their (e)DP -> LVDS converter.

The new firmware does not exhibit the issue any more.
Here is the explanation given by Chrontel regarding the cause of the issue and the change made to the firmware:

Here is the information about the root cause and the firmware countermeasure
from the Chrontel vendor.
============================================================
Root cause:
The Chrontel CH7511B IC communicates with the OS via DP AUX commands for LCD
signal control.
The DP Tx drivers seem to differ between SLE11SP2 and SLE11SP3: they send
slightly different command sequences to the DP Rx and set slightly different
timeouts for the DP AUX command response.
In SLE11SP3 the DP AUX command response occasionally times out, and this
causes the black screen.

Countermeasure in the Chrontel IC firmware:
The CH7511B has an embedded MCU, so the issue can be fixed by revising the MCU
firmware.
The interrupt mode was changed and the DPCD register access code was
fine-tuned so that the IC responds to DP Tx AUX commands more quickly.
AUX response time of the previous firmware: around 280 us
AUX response time of the new firmware (countermeasure for the SP3 issue):
around 250 us
The Chrontel vendor also tested the new firmware under heavy traffic (i.e. the
worst case) and it worked well.
==============================================================
The description of the cause does not match the observations made while investigating this bug.
However, since the symptoms are gone with the latest Chrontel firmware and we are not getting further support from either the system manufacturer or Chrontel, we will close this ticket as NOTOURBUG.
Comment 11 Daniel Vetter 2013-07-15 20:29:08 UTC
Yeah, we've frobbed around the DP AUX code a bit in that timeframe; among the changes is that we now use interrupt-driven DP AUX transactions. It could very well be that the resulting small timing changes upset the Chrontel, but all our timeouts should be within spec limits.
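
To put the two response times quoted by Chrontel in perspective, here is a small sketch of the timing slack they leave. The 300 us reply deadline is an assumption (the figure as I remember it from the DisplayPort AUX channel rules, to be checked against the spec), the extra latency is a hypothetical disturbance, and both helpers are made up for this example.

```c
#include <stdbool.h>

/* Largest extra latency (in microseconds) a sink can absorb before its
 * AUX reply misses the deadline; a negative result means it is already late. */
static int aux_reply_slack_us(int reply_time_us, int deadline_us)
{
    return deadline_us - reply_time_us;
}

/* True if a reply that normally takes reply_time_us still arrives in time
 * when it is delayed by extra_latency_us. */
static bool aux_reply_in_time(int reply_time_us, int extra_latency_us,
                              int deadline_us)
{
    return reply_time_us + extra_latency_us <= deadline_us;
}
```

Under the assumed 300 us deadline the old firmware (around 280 us) leaves about 20 us of slack and the new one (around 250 us) about 50 us, so a hypothetical 30 us delay would be fatal only for the old firmware. This is plain arithmetic on the vendor's numbers, not a confirmed root cause.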

