Bug 47117

Summary: [ILK] Dell U3011 monitor turns to power saving mode on DisplayPort when resolution set to 2560 x 1600
Product: DRI Reporter: Mathieu Desnoyers <mathieu.desnoyers>
Component: DRM/IntelAssignee: Jesse Barnes <jbarnes>
Status: CLOSED FIXED QA Contact:
Severity: major    
Priority: medium CC: ben, chris, daniel, florian, jbarnes, mathieu.desnoyers
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg, xorg, xrandr output of working and failing scenarios
none
dmesg of working and non-working runs with drm.debug=0xe
none
intel_reg_dumper, ok run 1
none
intel_reg_dumper, fail run
none
intel_reg_dumper, ok run 2
none
intel_reg_dumper, fail run, boot with i915.lvds_use_ssc=0 cmdline arg
none
Complete dmesg output with drm.debug=0xe
none
properly clear SSC1
none
Complete dmesg output with drm.debug=0xe with "Properly clear ssc1" patch applied
none
intel_reg_dumper, fail run, with "properly clear SSC1" patch applied
none
intel_reg_dumper, ok, with 3.3 kernel and backported drm/i2c from 9613bebb223dea3179c265dc31e1bb41ae39f321 none

Description Mathieu Desnoyers 2012-03-08 15:09:01 UTC
Created attachment 58212 [details]
dmesg, xorg, xrandr output of working and failing scenarios

Dell U3011 monitor turns to power saving mode when resolution set to 2560 x 1600
(intermittent)

Reproduced with:
Linux 2.6.38-1-amd64 (Debian kernel)
Linux 3.1.0-1-amd64  (Debian kernel)
Linux 3.2.0-1-amd64  (Debian kernel)

The computer is a Lenovo X201 Tablet (Intel QM57 chipset), on a X200
docking station. The docking station uses a DisplayPort cable to connect
to the monitor.  I tried upgrading my Bios from 1.34/1.13 to 1.38/1.14
(latest), which did not fix the problem. I cannot use anything else
than DisplayPort in this case to connect the monitor at 2560 x 1600,
because this resolution requires either the bandwidth provided by a
dual-link DVI (which I cannot connect in my docking station), or
DisplayPort.

The laptop Intel HD Graphic along with the X200 displayport is
documented to support this resolution (ref.
http://www.intel.com/products/notebook/chipsets/qm57/qm57-overview.htm).
The screen works at 2560 x 1600 resolution, but it takes many
attempts to get it to work. I usually need to restart the screen and
reboot the laptop a few times, and shutting down the LVDS screen of the
laptop seems to help (sometimes). Interesting fact, if I bring the LVDS
screen back up after that, I had a few successful cases where both
screens worked fine together. I sometimes had to close the lid of my
laptop, which puts it to sleep, and open it to find that the two screens
end up working correctly.

I run this in gnome 3.2 (in "classic" mode), setup with a resolution of
2560x1600 for the Dell 30", and 1280x800 for my laptop. Both DP1 and
LVDS1 displays are activated (dual-screen).

In my dmesg (3.2 kernel), I sometimes get this message when I power
cycle my monitor (in the cases where it does not work):

[ 3486.644818] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 182
[ 3486.644824] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
[ 3486.644830] <3>00 ff ff ff ff ff ff 00 10 ac 65 40 34 34 30 31  ..........e@4401
[ 3486.644834] <3>15 01 04 b5 40 28 78 3a 8d 85 ad 4f 35 b1 25 0e  ....@(x:...O5.%.
[ 3486.644837] <3>50 54 a5 4b 00 71 4f 81 00 81 80 a9 40 d1 00 d1  PT.K.qO.....@...
[ 3486.644841] <3>40 01 01 01 01 e2 68 00 a0 a0 40 2e 60 30 20 36  @.....h...@.`0 6
[ 3486.644845] <3>00 81 91 21 00 00 1a 00 00 00 ff 00 50 48 35 4e  ...!........PH5N
[ 3486.644848] <3>59 31 43 39 30 34 34 4c 0a 00 00 00 fc 00 44 45  Y1C9044L......DE
[ 3486.644852] <3>4c 4c 20 55 33 30 31 31 0a 20 20 00 00 00 fd 00  LL U3011.  .....
[ 3486.644855] <3>31 56 1d 71 1c 00 0a 20 20 20 20 20 20 01 08 02  1V.q...      ...

I've also seen this other message (after a reboot):

[   52.638311] [drm:intel_dp_complete_link_train] *ERROR* failed to train DP, aborting

The i915 and drm drivers are used (detailed dmesg at the bottom of this
email), with kms.

Xorg version:

% dpkg -l xorg
ii  xorg           1:7.6+10       X.Org X Window System
% dpkg -l xserver-xorg-video-intel
ii  xserver-xorg-v 2:2.17.0-1     X.Org X server -- Intel i8xx, i9xx display d
% dpkg -l gnome
ii  gnome                   1:3.0+6                 The GNOME Desktop Environment, with extra components

More detailed info provided in attachment.
Comment 1 Mathieu Desnoyers 2012-03-08 15:12:39 UTC
Created attachment 58213 [details]
dmesg of working and non-working runs with drm.debug=0xe
Comment 2 Mathieu Desnoyers 2012-03-08 15:17:23 UTC
Another piece of information on this issue: I tried the monitor with an
Apple PowerBook running MacOS X, and the monitor works flawlessly
(monitor always brought up, and the same video works without glitch with
vlc).

I also simplified my routine that successfully bring the monitor into a
working state: running 5 to 20 times the following script sometimes ends
up doing the trick:

xrandr --output DP1 --off
sleep 1
xrandr --output DP1 --mode 2560x1600

But most of the time I must resign to put my laptop in suspend-to-ram:
when I bring it back from suspend-to-ram, the screen usually shows up
fine.

Please let me know if I can provide more information on the issue than
the detailed logs (success/fail) below. I would also like to know if
there is a knob somewhere in the Intel driver I could play with to
provide more time for the screen to send its EDID information. Based on
this info: http://www.intel.com/support/graphics/sb/CS-028366.htm, Intel
provide a modified Windows driver that might target this kind of issue,
and I assume they possibly let more time than the spec requires for the
screen to send EDID info.

Also, if there is specific timing information you need, let me know which functions you want me to trace, so I can use my LTTng kernel tracer for what it's made for. :)
Comment 3 Mathieu Desnoyers 2012-03-08 15:18:42 UTC
Reference to the original LKML thread I posted on this issue (for cross-reference): https://lkml.org/lkml/2012/1/21/149
Comment 4 Daniel Vetter 2012-03-18 09:50:35 UTC
Turning of audio might help, can you please try this:

xrandr --output DP1 --mode 2560x1600 --set audio off
Comment 5 Mathieu Desnoyers 2012-03-26 08:23:47 UTC
(In reply to comment #4)
> Turning of audio might help, can you please try this:
> 
> xrandr --output DP1 --mode 2560x1600 --set audio off

I tried it with "--set audio off", and I get the same success/failure rate as without.

Currently, the one way I achieve getting my DisplayPort to work at 2560x1600 each and every time is by putting my laptop in sleep mode and bringing it back up.

Thanks for the suggestion,

Mathieu
Comment 6 Daniel Vetter 2012-03-26 12:00:25 UTC
In that case, can you please install intel-gpu-tools and attach the output of intel_reg_dumper, once right after suspend (i.e. in the working case) and once when you set the xrandr configuration manually, and things are broken.
Comment 7 Mathieu Desnoyers 2012-03-26 13:06:32 UTC
Created attachment 59087 [details]
intel_reg_dumper, ok run 1

First reg dump with screen working correctly.
Comment 8 Mathieu Desnoyers 2012-03-26 13:07:53 UTC
Created attachment 59088 [details]
intel_reg_dumper, fail run

Dumping registers in state where screen is black, going to power saving mode (experienced failure).
Comment 9 Mathieu Desnoyers 2012-03-26 13:09:25 UTC
Created attachment 59089 [details]
intel_reg_dumper, ok run 2

A second reg dump (after doing to laptop sleep and going back) to have a second OK run, so the delta between the two OK runs can be discarded from diffs with the failed run.
Comment 10 Mathieu Desnoyers 2012-03-26 13:15:49 UTC
What I see as interesting parts of the diffs between register dumps:

OK runs:
PCH_DREF_CONTROL: 0x00001400 (cpu source disable, ssc_source enable, nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1 disable, ssc4 disable)

FAIL run:
PCH_DREF_CONTROL: 0x00001402 (cpu source disable, ssc_source enable, nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1 enable, ssc4 disable)

(ssc1 is enabled in the failed run)

Other possibly interesting different fields:

OK:
TRANS_HTOTAL_C: 0x00000000 (1 active, 1 total)
TRANSC_DP_LINK_N2: 0x00ffffff (val 0xffffff 16777215)

FAIL:
TRANS_HTOTAL_C: 0x20800000 (1 active, 8321 total)
TRANSC_DP_LINK_N2: 0x00000000 (val 0x0 0)

The PCH_PP_STATUS and PCH_PP_CONTROL fields do not seem to be that interesting, since each of the two OK runs have different values for those.
Comment 11 Daniel Vetter 2012-03-26 14:01:58 UTC
> --- Comment #10 from Mathieu Desnoyers <mathieu.desnoyers@efficios.com> 2012-03-26 13:15:49 PDT ---
> What I see as interesting parts of the diffs between register dumps:
>
> OK runs:
> PCH_DREF_CONTROL: 0x00001400 (cpu source disable, ssc_source enable,
> nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1
> disable, ssc4 disable)
>
> FAIL run:
> PCH_DREF_CONTROL: 0x00001402 (cpu source disable, ssc_source enable,
> nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1
> enable, ssc4 disable)
>
> (ssc1 is enabled in the failed run)

That's rather interesting. Can you boot with i915.lvds_use_ssc=0
appended to your kernel cmdline? That should disable the use of ssc.

If that helps, please attach the output of lspic -vmmn, so we can add
a corresponding quirk for your machine.
Comment 12 Mathieu Desnoyers 2012-03-26 21:07:17 UTC
Created attachment 59101 [details]
intel_reg_dumper, fail run, boot with i915.lvds_use_ssc=0 cmdline arg

Interesting diff between ok and fail-nossc:

<               PCH_DREF_CONTROL: 0x00001400 (cpu source disable, ssc_source enable, nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1 disable, ssc4 disable)
---
>               PCH_DREF_CONTROL: 0x00001402 (cpu source disable, ssc_source enable, nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1 enable, ssc4 disable)

(i915.lvds_use_ssc=0 seems to still have ssc1 enable, is it normal ?)

<                     PCH_DPLL_A: 0x88046004 (enable, sdvo high speed no, mode LVDS, p2 Div 14, FPA0 P1 3, FPA1 P1 3, refclk SSC, sdvo/hdmi mul 1)
---
>                     PCH_DPLL_A: 0x88040004 (enable, sdvo high speed no, mode LVDS, p2 Div 14, FPA0 P1 3, FPA1 P1 3, refclk default 120Mhz, sdvo/hdmi mul 1)
Comment 13 Chris Wilson 2012-03-27 02:39:59 UTC
I'm a bit puzzled as PCH_DREF_CONTROL is written once just initialisation, it should not vary at runtime, not even for DPMS or mode changes.
Comment 14 Daniel Vetter 2012-03-27 02:42:30 UTC
I'm decently confused, so I need more information. Can you please grab the _full_ dmesg (the one you've attached is cut of at the top) with drm.debug=0xe after one suspend cycle (i.e. boot into broken state, then fix it with suspend/resume). Please make sure that you don't loose any dmesg output at the beginning of the boot.
Comment 15 Daniel Vetter 2012-03-27 02:43:49 UTC
Correction re Chris' comment: We _do_ reinit the PCH_DREF_CONTROL on resume by calling ilk_init_pch_refclk from i915_drm_thaw. Which makes it even more puzzling that a resume cycle magically fixes this. Hopefully the full drm debug log will shed some light on this ...
Comment 16 Mathieu Desnoyers 2012-03-28 05:19:54 UTC
Created attachment 59156 [details]
Complete dmesg output with drm.debug=0xe

In this dmesg, you will find the complete execution including the following timestamps:

[   19.055042] eth0: no IPv6 routers present
-> this is the point where my login screen appears, at lower resolution, so I get something shown on my DP screen.

[   57.864385] [drm:intel_crtc_cursor_set], cursor off
-> this timestamp is where my DP screen became blank, following attempt to switch to 2560x1600 after login.

[  145.001230] [drm:intel_crtc_cursor_set], cursor off
-> this is after suspend/resume cycle of the laptop, where the DP screen shows up OK at 2560x1600.

Hoping this helps!

Thanks,

Mathieu
Comment 17 Daniel Vetter 2012-03-28 08:25:36 UTC
Created attachment 59162 [details] [review]
properly clear SSC1

Not much more clue yet, but there's a small thing I've noticed. Please test the attached patch.
Comment 18 Mathieu Desnoyers 2012-03-29 07:28:10 UTC
Created attachment 59226 [details]
Complete dmesg output with drm.debug=0xe with "Properly clear ssc1" patch applied

It turns out the patch to properly clear ssc1 does not help. Here is the full dmesg with the patch applied. If I can provide more detailed information about some code paths (by e.g. adding tracing instrumentation or adding printk), please let me know what functions sites/variables are of particular interest.

In this dmesg, you will find the complete execution including the following
timestamps:

[   18.283007] eth0: no IPv6 routers present
-> this is the point where my login screen appears, at lower resolution, so I
get something shown on my DP screen.

[   45.274438] [drm:intel_crtc_cursor_set], cursor off
-> this timestamp is where my DP screen became blank, following attempt to
switch to 2560x1600 after login.

[   67.982331] [drm:intel_crtc_cursor_set], cursor off
-> this is after suspend/resume cycle of the laptop, where the DP screen shows
up OK at 2560x1600.
Comment 19 Daniel Vetter 2012-03-29 08:45:50 UTC
Hm, my hopes vanished. Can you please again grab the output of intel_reg_dumper with the SSC1 clear patch attached from the bad state?
Comment 20 Mathieu Desnoyers 2012-03-30 08:20:59 UTC
Created attachment 59280 [details]
intel_reg_dumper, fail run, with "properly clear SSC1" patch applied

Interesting part of the diff between prior "failed" runs and this failed run with patch applied:

<               PCH_DREF_CONTROL: 0x00001402 (cpu source disable, ssc_source enable, nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1 enable, ssc4 disable)
---
>               PCH_DREF_CONTROL: 0x00001400 (cpu source disable, ssc_source enable, nonspread_source enable, superspread_source disable, ssc4_mode downspread, ssc1 disable, ssc4 disable)

so it looks like this patch does indeed clear the ssc1 enable flag, but it does not fix the underlying problem.
Comment 21 Mathieu Desnoyers 2012-03-30 08:36:11 UTC
(In reply to comment #18)
> Created attachment 59226 [details]
> Complete dmesg output with drm.debug=0xe with "Properly clear ssc1" patch
> applied
> 
> It turns out the patch to properly clear ssc1 does not help. Here is the full
> dmesg with the patch applied. If I can provide more detailed information about
> some code paths (by e.g. adding tracing instrumentation or adding printk),
> please let me know what functions sites/variables are of particular interest.
> 
> In this dmesg, you will find the complete execution including the following
> timestamps:
> 
> [   18.283007] eth0: no IPv6 routers present
> -> this is the point where my login screen appears, at lower resolution, so I
> get something shown on my DP screen.
> 
> [   45.274438] [drm:intel_crtc_cursor_set], cursor off
> -> this timestamp is where my DP screen became blank, following attempt to
> switch to 2560x1600 after login.
> 
> [   67.982331] [drm:intel_crtc_cursor_set], cursor off
> -> this is after suspend/resume cycle of the laptop, where the DP screen shows
> up OK at 2560x1600.

There are interesting activity differences between the last sequence that lead to the failed and OK states. The failed state seems to have some activity triggered by a "drm:intel_ironlake_crt_detect_hotplug", while the sequence that leads to the OK state (when coming back from suspend) does not have this:

Just before failed state:

[   39.334403] [drm:intel_crtc_cursor_set],
[   39.334408] [drm:intel_crtc_cursor_set], cursor off
[   39.334416] [drm:intel_crtc_cursor_set],
[   39.334419] [drm:intel_crtc_cursor_set], cursor off
[   39.336658] [drm:intel_crtc_cursor_set],
[   39.336669] [drm:intel_crtc_cursor_set],
[   40.497637] [drm:intel_ironlake_crt_detect_hotplug], ironlake hotplug adpa=0xf40000, result 0
[   40.497644] [drm:intel_crt_detect], CRT not detected via hotplug
[   40.497648] [drm:output_poll_execute], [CONNECTOR:12:VGA-1] status updated from 2 to 2
[   40.509056] [drm:output_poll_execute], [CONNECTOR:15:HDMI-A-1] status updated from 2 to 2
[   40.509372] [drm:intel_dp_detect], DPCD: 110a840101000100
[   40.510427] [drm:i2c_algo_dp_aux_xfer], dp_aux_xfer return 2
[   40.537548] [drm:i2c_algo_dp_aux_xfer], dp_aux_xfer return 2
[   40.546890] wlan0: authenticate with 00:10:18:90:20:db (try 1)
[   40.549653] wlan0: authenticated
[   40.550595] wlan0: associate with 00:10:18:90:20:db (try 1)
[   40.552925] wlan0: RX AssocResp from 00:10:18:90:20:db (capab=0x411 status=0 aid=3)
[   40.552931] wlan0: associated
[   40.552938] wlan0: moving STA 00:10:18:90:20:db to state 1
[   40.552942] wlan0: moving STA 00:10:18:90:20:db to state 2
[   40.560345] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   40.564755] [drm:i2c_algo_dp_aux_xfer], dp_aux_xfer return 2
[   40.564761] [drm:drm_detect_monitor_audio], Monitor has basic audio support
[   40.564765] [drm:output_poll_execute], [CONNECTOR:18:DP-1] status updated from 1 to 1
[   40.584823] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[   40.587655] wlan0: moving STA 00:10:18:90:20:db to state 3
[   45.274417] [drm:intel_crtc_cursor_set],
[   45.274424] [drm:intel_crtc_cursor_set], cursor off
[   45.274434] [drm:intel_crtc_cursor_set],
[   45.274438] [drm:intel_crtc_cursor_set], cursor off


Just before OK state:

[   62.548521] [drm:drm_mode_getconnector], [CONNECTOR:18:?]
[   62.548943] [drm:intel_crtc_cursor_set],
[   62.548950] [drm:intel_crtc_cursor_set],
[   63.107716] [drm:intel_crtc_cursor_set],
[   63.107721] [drm:intel_crtc_cursor_set], cursor off
[   63.107728] [drm:intel_crtc_cursor_set],
[   63.107731] [drm:intel_crtc_cursor_set], cursor off
[   63.157934] [drm:intel_crtc_cursor_set],
[   63.157942] [drm:intel_crtc_cursor_set],
[   63.203489] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx
[   66.461456] [drm:intel_crtc_cursor_set],
[   66.461462] [drm:intel_crtc_cursor_set], cursor off
[   66.461472] [drm:intel_crtc_cursor_set],
[   66.461475] [drm:intel_crtc_cursor_set], cursor off
[   66.538491] [drm:intel_crtc_cursor_set],
[   66.538502] [drm:intel_crtc_cursor_set],
[   67.982314] [drm:intel_crtc_cursor_set],
[   67.982320] [drm:intel_crtc_cursor_set], cursor off
[   67.982328] [drm:intel_crtc_cursor_set],
[   67.982331] [drm:intel_crtc_cursor_set], cursor off

I also used LTTng 2.0 to trace the kernel to see where those "cursor_set" calls were coming from, and it seems to be an ioctl from the Xorg server. So if the cursor_set ever needs to happen in a specific timing with respect to the screen, it may get delayed by Xorg preemption due to gnome (and others) initialization. (further thoughts on this hypothesis are welcome)

I also tried disabling wlan entirely from my kernel config, as it appears in the log just before the failed state (I thought it might be causing a timing-related issue). But it did not fix things.

I also tried to run Xorg at nice -20, and it did not help.

I also tried to disable the if(intel_crt_detect_hotplug(connector)) branch within intel_crt_detect (replacing it by an if (0)), but it did not help.

I also tried changing "DRM_OUTPUT_POLL_PERIOD (10*HZ)" to "(30*HZ)", but not effect.

Other ideas are welcome,

Thanks,

Mathieu
Comment 22 Mathieu Desnoyers 2012-03-30 14:08:36 UTC
First success! I tried with Linux master branch, currently at commit 9613bebb223dea3179c265dc31e1bb41ae39f321 and it works fine without any patch applied. Let me know if there are things I could do to spot the fix and help backporting it to older stable kernels.

From what I know, the fix came in between v3.3 and 9613bebb223dea3179c265dc31e1bb41ae39f321.

Thanks,

Mathieu
Comment 23 Daniel Vetter 2012-03-30 14:20:46 UTC
Well, let's just be happy then. If you could, attaching a intel_reg_dump with the fixed version would be great (for documentation). I'll close this as fixed now, please reopen if it blows up again. And thanks for reporting this issue and doing all the testing.
Comment 24 Mathieu Desnoyers 2012-03-30 15:41:41 UTC
Created attachment 59299 [details] [review]
intel_reg_dumper, ok, with 3.3 kernel and backported drm/i2c from 9613bebb223dea3179c265dc31e1bb41ae39f321

Diff between failed setup and this fixed kernel:

34c34
<                       DSPASURF: 0x047ab000
---
>                       DSPASURF: 0x03754000
47c47
<                       DSPBSURF: 0x047ab000
---
>                       DSPBSURF: 0x03754000
165c165
<                BLC_PWM_CPU_CTL: 0x000002a3
---
>                BLC_PWM_CPU_CTL: 0x00000ef4
Comment 25 Florian Mickler 2012-04-16 14:27:59 UTC
A patch referencing this bug report has been merged in Linux v3.4-rc2:

commit e77166b5a653728f312d07e60a80819d1c54fca4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Mar 30 22:14:05 2012 +0200

    drm/i915: properly clear SSC1 bit in the pch refclock init code
Comment 26 Jari Tahvanainen 2016-10-07 10:06:25 UTC
Closing - patch merged 2012.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.