Bug 44876

Summary: [snb regression bisected] Attaching VGA or HDMI cable causes LVDS screen corruption
Product: DRI
Reporter: daniel <sec>
Component: DRM/Intel
Assignee: Jani Nikula <jani.nikula>
Status: CLOSED FIXED
QA Contact:
Severity: blocker
Priority: medium
CC: ben, chris, daniel, florian, jbarnes
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description; flags are "none" for every entry):
Picture of screen corruption
without hdmi cable
with hdmi cable
lid open
lid closed and reopened
right after boot xdm start screen everything fine
dmesg after closing, 5 seconds wait, re-opening lid with corrupted screen afterwards
drm.debug connected hdmi cable screen corruption on lvds as soon as xdm starts
picture of HDMI cable attached screen corruption
HDMI cable run, waited for idle blank -> dmesg
HDMI run, dmesg after key press after idle blank --> screen is good

Description daniel 2012-01-17 20:04:14 UTC
Created attachment 55706 [details]
Picture of screen corruption

See attached image.

The same happens when the lid is closed and reopened. A workaround is to disable ACPI/power management in the kernel.

<may be related dmesg output>
[drm] Changing LVDS panel from (-hsync, -vsync) to (+hsync, -vsync)
</dmesg>
Comment 1 Chris Wilson 2012-01-18 02:34:36 UTC
I think a starting point is to compile intel_reg_dumper from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ (which may actually be available from your local distributor now!) and grab the registers before and after closing the lid.
Comment 2 daniel 2012-01-18 04:46:35 UTC
HDMI cable:

1. xdm/xorg started -> reg_dump via ssh
2. connected HDMI -> rebooted (screen corruption) -> reg_dump via ssh

LID:
1. xdm/xorg started -> reg_dump via ssh
2. lid closed and re-opened (screen corruption) -> reg_dump via ssh
(diff shows no differences)

Vanilla Kernel 3.2.1 and 3.1.9
Gentoo
Dell E6420
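
The before/after comparison described in this comment can be sketched as follows. This is a minimal Python sketch, assuming each dump line looks like `NAME: 0xVALUE (decoded text)`; that layout, the register names and the sample values are hypothetical placeholders and are not taken from the attached dumps.

```python
import re

# Assumed line layout: "<REGISTER_NAME>: 0x<hex value> (optional decoded text)".
# This is an assumption for illustration, not a description of the real tool output.
LINE_RE = re.compile(r"^\s*(\w+):\s*(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Return {register name: integer value} for every line that matches."""
    regs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def diff_dumps(before, after):
    """Return {name: (old, new)} for registers that differ between the dumps.

    A register present in only one dump is reported with None on the other side.
    """
    old_regs, new_regs = parse_dump(before), parse_dump(after)
    changed = {}
    for name in sorted(set(old_regs) | set(new_regs)):
        if old_regs.get(name) != new_regs.get(name):
            changed[name] = (old_regs.get(name), new_regs.get(name))
    return changed


def fmt(value):
    """Format a register value, or mark it as missing from that dump."""
    return "absent" if value is None else f"{value:#010x}"


# Hypothetical sample data, invented for this example.
BEFORE = """\
REG_A: 0x00000001 (enabled)
REG_B: 0x80000000 (pipe A)
"""
AFTER = """\
REG_A: 0x00000001 (enabled)
REG_B: 0xa0000000 (pipe B)
"""

for name, (old, new) in diff_dumps(BEFORE, AFTER).items():
    print(f"{name}: {fmt(old)} -> {fmt(new)}")
```

An empty result corresponds to the "diff shows no differences" outcome reported for the lid case.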
Comment 3 daniel 2012-01-18 04:47:13 UTC
Created attachment 55730 [details]
without hdmi cable
Comment 4 daniel 2012-01-18 04:47:44 UTC
Created attachment 55731 [details]
with hdmi cable
Comment 5 daniel 2012-01-18 04:48:18 UTC
Created attachment 55732 [details]
lid open
Comment 6 daniel 2012-01-18 04:48:47 UTC
Created attachment 55733 [details]
lid closed and reopened
Comment 7 daniel 2012-01-18 04:55:38 UTC
The screen corruption goes away as soon as the idle timer (the kernel's?) blanks the screen for power saving (after about 5 minutes of idle time?). After a key press the screen is normal until the lid is next closed.
Comment 8 Chris Wilson 2012-01-18 05:40:46 UTC
No change whatsoever when closing/opening the lid. Plugging the HDMI cable obviously brings up the second pipe, and switches the first pipe to a new resolution. 

But no clues there, I am afraid. Next up: append drm.debug=0xe to your kernel boot parameters and attach the debug logs. Hopefully one of the other guys has an inspired guess...
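
As a rough guide to what `drm.debug=0xe` selects: the value is a bitmask of debug categories. The Python sketch below decodes it. The category names and bit assignments reflect my understanding of the kernel's DRM debug flags and should be treated as an assumption; the 0x08 bit in particular may not be defined on the kernel versions discussed in this thread.

```python
# Assumed category bits for the drm.debug module parameter (illustrative only).
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(mask):
    """Return the names of the assumed debug categories enabled by `mask`."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xE))
```

Under these assumptions, 0xe leaves the noisy core messages off while enabling the driver and modesetting output that matters for a display bug.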
Comment 9 daniel 2012-01-18 13:54:19 UTC
drm.debug=0xe

LID:

1. dmesg right after boot
2. dmesg after lid close cycle (screen corruption after reopening the lid)
Comment 10 daniel 2012-01-18 13:54:58 UTC
Created attachment 55755 [details]
right after boot xdm start screen everything fine
Comment 11 daniel 2012-01-18 13:55:42 UTC
Created attachment 55756 [details]
dmesg after closing, 5 seconds wait, re-opening lid with corrupted screen afterwards
Comment 12 Daniel Vetter 2012-01-19 05:20:05 UTC
Hm, maybe we've got a nice dpms issue here. Can you try to run

$ xset dpms force off
$ xset dpms force on

when the screen corruption happens from a console within X? Also just to check: the "lid closed and reopened" reg dump, is that while the screen corruptions are still there or afterwards?
Comment 13 daniel 2012-01-19 08:28:27 UTC
1. LID

dpms force off/on doesn't change anything.

xdm -> logged in -> konsole prepared with xset dpms force off -> lid close -> screen corruption -> pressed enter blind and typed the xset dpms force on command blind -> nothing happened, the screen corruption was still there.

> when the screen corruption happens from a console within X? Also just to
> check: the "lid closed and reopened" reg dump, is that while the screen
> corruptions are still there or afterwards?

The reg dump was created while the screen corruption was still there.

2. HDMI cable

System off -> connected HDMI cable -> system boot -> during kernel startup same content on HDMI and LVDS -> xdm start -> HDMI OK (xdm login screen) -> LVDS screen corruption (the screen corruption looks the same for lid close and for HDMI)

Created dmesg.
Comment 14 daniel 2012-01-19 08:29:57 UTC
Created attachment 55790 [details]
drm.debug connected hdmi cable screen corruption on lvds as soon as xdm starts
Comment 15 daniel 2012-01-19 08:36:51 UTC
Created attachment 55791 [details]
picture of HDMI cable attached screen corruption
Comment 16 daniel 2012-01-19 08:38:25 UTC
Created attachment 55792 [details]
HDMI cable run, waited for idle blank -> dmesg
Comment 17 daniel 2012-01-19 08:39:06 UTC
Created attachment 55793 [details]
HDMI run, dmesg after key press after idle blank --> screen is good
Comment 18 daniel 2012-01-19 16:34:40 UTC
News:

The directory /sys/class/backlight/ contained two subdirectories, "intel_backlight" and "dell_backlight".
I disabled the dell_laptop option under x86 and now there is some different behavior.
There is still screen corruption, but when I close and reopen the lid the screen is now black, with some faint, shadowy characters from the boot process such as "[drm] initialized drm 1.1.0".
Comment 19 daniel 2012-01-19 16:42:34 UTC
Correction: the change does not come from the kernel config option. It comes from the kernel command-line parameter acpi=off.
Comment 20 daniel 2012-01-19 18:20:18 UTC
3.0.17 is _not_ affected --> this is a regression

cp .config 3.1.10 to 3.0.17 ; make oldconfig 
everything works. LID and HDMI --> no screen corruptions
Comment 21 Daniel Vetter 2012-01-20 01:24:16 UTC
On Fri, Jan 20, 2012 at 02:20:18AM +0000, bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=44876
> 
> --- Comment #20 from daniel <sec@dschroeder.info> 2012-01-19 18:20:18 PST ---
> 3.0.17 is _not_ affected --> this is a regression
> 
> cp .config 3.1.10 to 3.0.17 ; make oldconfig 
> everything works. LID and HDMI --> no screen corruptions

Hm, that's bad and I've got no idea - it might very well be something
going on in acpi, too. Can you please bisect this one with git? Knowing
the bad commit usually helps a _lot_.

Thanks, Daniel
Comment 22 daniel 2012-02-09 12:03:26 UTC
LKML thread with the same problem. Adding as reference:

http://lkml.org/lkml/2011/9/14/299
Comment 23 Chris Wilson 2012-04-14 06:18:10 UTC
Did 3.1 have FBC enabled? One of the first side-effects of FBC was LVDS corruption during hotplug.
Comment 24 Chris Wilson 2012-05-11 06:59:36 UTC
All the errors occurred whilst using FBC. Can you please try i915.i915_enable_fbc=0 or a more recent kernel?
Comment 25 daniel 2012-05-14 23:37:12 UTC
Tested this: i915.i915_enable_fbc=0 with kernel 3.3.6.
Tested how: booted -> xdm login screen -> closed the lid -> waited 10 seconds -> reopened -> screen corruption.

I am pretty sure that this only happens on this type/model of notebook (E6420). A colleague using stock Ubuntu on the same notebook has the same trouble. If it were more widespread, more people would complain. So possibly there is no fault in the Intel driver and it may be a vendor-specific firmware/BIOS problem...
Comment 26 Daniel Vetter 2012-05-19 15:51:11 UTC
You've mentioned in comment #20 that this is a regression. Can you please try to bisect this? I guess otherwise we're pretty much stuck.
Comment 27 daniel 2012-05-28 07:42:58 UTC
kernel 3.1-rc3 ==> bad
kernel 3.1-rc2 ==> good
Comment 28 daniel 2012-05-28 11:07:43 UTC
git bisect log
git bisect start
# bad: [fcb8ce5cfe30ca9ca5c9a79cdfe26d1993e65e0c] Linux 3.1-rc3
git bisect bad fcb8ce5cfe30ca9ca5c9a79cdfe26d1993e65e0c
# good: [93ee7a9340d64f20295aacc3fb6a22b759323280] Linux 3.1-rc2
git bisect good 93ee7a9340d64f20295aacc3fb6a22b759323280
# bad: [fbad8991ef9d41d1fad587dff23fa6deff01af83] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
git bisect bad fbad8991ef9d41d1fad587dff23fa6deff01af83
# bad: [870d3be1249b1397395ed3164987397993a16d91] Merge branch 'docs-move' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs
git bisect bad 870d3be1249b1397395ed3164987397993a16d91
# good: [cedf03bd9aa54d1d7a9065dddc9e76505f476b12] x86: fix mm/fault.c build
git bisect good cedf03bd9aa54d1d7a9065dddc9e76505f476b12
# good: [798c794df81e0a1af62c1d7e48b464f4096f3b9a] Docs: MSI-HOWTO: MSI -> MSIs
git bisect good 798c794df81e0a1af62c1d7e48b464f4096f3b9a
# bad: [c3613de92ebea302137d21d8938421c3f88d8741] drm/i915: Can't do accurate vblank timestamps with UMS
git bisect bad c3613de92ebea302137d21d8938421c3f88d8741
# bad: [4e6343898fe7eed6b3c0c3c809347bc88d5b4a1e] drm/i915: Remove unused 'reg' argument to dp_pipe_enabled
git bisect bad 4e6343898fe7eed6b3c0c3c809347bc88d5b4a1e
# good: [ed10fca9c351c83ab89a97f3515089e0d36bdccc] drm/i915: Leave LVDS registers unlocked
git bisect good ed10fca9c351c83ab89a97f3515089e0d36bdccc
# bad: [1519b9956eb4b4180fa3f47c73341463cdcfaa37] drm/i915: Fix PCH port pipe select in CPT disable paths
git bisect bad 1519b9956eb4b4180fa3f47c73341463cdcfaa37
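
For readers unfamiliar with the procedure behind this log: `git bisect` is essentially a binary search for the first bad commit between a known-good and a known-bad revision. A minimal Python sketch of that search follows, assuming a purely linear history and a hypothetical `is_bad` test; real `git bisect` also has to deal with merges, which this sketch ignores.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    commits: list ordered oldest to newest; commits[0] is known good and
             commits[-1] is known bad.
    is_bad:  callable standing in for "build, boot, close and reopen the lid".
    Returns (first bad commit, number of commits that had to be tested).
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid      # the regression is at mid or earlier
        else:
            good = mid     # the regression is after mid
    return commits[bad], tested


# Hypothetical linear history of 256 commits; "c200" introduces the bug.
history = [f"c{i}" for i in range(256)]
culprit, steps = first_bad(history, lambda commit: int(commit[1:]) >= 200)
print(culprit, steps)
```

The number of test builds grows with the logarithm of the commit range, which is why a full release-candidate window can be narrowed down in a handful of steps.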
Comment 29 Daniel Vetter 2012-05-29 00:43:30 UTC
Ok, so according to your bisect log this commit should be the culprit:

commit 13d83a672e9bbd52ae82c2f611dfd845a957e8b4
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Wed Aug 3 12:59:20 2011 -0700

    drm/i915: split out PCH refclk update code
Comment 30 Daniel Vetter 2012-05-29 00:45:39 UTC
Can you also please test the latest 3.4 release? The pch refclock code changed quite a bit lately, so double-checking whether you still have the same problem would be good (and any patches to fix things would be on top of the latest code anyway).
Comment 31 daniel 2012-05-29 02:18:09 UTC
tested it already with linux-3.4-rc6 ==> bad. Should I test it again with 3.4 final?
Comment 32 Daniel Vetter 2012-05-29 02:26:57 UTC
No, 3.4-rc6 should be good enough. Can you also please double-check the bisect result? I.e. whether 13d83a672e9bbd52 is really broken and the commit right before that really works (i.e. da64c6fc4aba6f02aa800db)? I'm asking because that commit only extracts a bit of code into a separate function, so it should have zero effect.
Comment 33 daniel 2012-05-29 02:34:33 UTC
this was the final bisect screen:
cat bisect-final.txt 
git bisect bad | tee -a /root/bisect.log
1519b9956eb4b4180fa3f47c73341463cdcfaa37 is the first bad commit
commit 1519b9956eb4b4180fa3f47c73341463cdcfaa37
Author: Keith Packard <keithp@keithp.com>
Date:   Sat Aug 6 10:35:34 2011 -0700

    drm/i915: Fix PCH port pipe select in CPT disable paths
    
    CPT pipe select is different from previous generations (using two bits
    instead of one). All of the paths from intel_disable_pch_ports were
    not making this distinction.
    
    Mode setting with pipe A turned off would then also force all outputs
    on pipe B to get turned off as the disable code would mistakenly
    decide that all of these outputs were on pipe A and turn them off.

    This is an extension of the CPT DP disable fix (why didn't I fix this then?)

    Signed-off-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

:040000 040000 5fb94b34dcaeed70c5da97a371fc2c13a62ddc60 99272b6f031b75fc9f91d8c08abba0d70cc9a527 M      drivers
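
The failure mode described in that commit message can be illustrated with a small Python sketch. The bit positions below (a single bit at 30 for the older layout, a two-bit field at 30:29 for CPT, an enable bit at 31) are assumptions chosen for illustration, and the function names are hypothetical; the driver's actual definitions live in its register headers.

```python
PIPE_A, PIPE_B = 0, 1

# Assumed layouts, for illustration only.
PORT_ENABLE = 1 << 31
ONE_BIT_SHIFT = 30                    # older generations: one pipe-select bit
TWO_BIT_SHIFT = 29                    # CPT: two-bit pipe-select field
TWO_BIT_MASK = 0x3 << TWO_BIT_SHIFT


def port_value_cpt(pipe):
    """Build a hypothetical CPT port register value: enabled, driven by `pipe`."""
    return PORT_ENABLE | (pipe << TWO_BIT_SHIFT)


def pipe_one_bit(val):
    """Decode the pipe using the older single-bit layout."""
    return (val >> ONE_BIT_SHIFT) & 0x1


def pipe_two_bit(val):
    """Decode the pipe using the two-bit CPT layout."""
    return (val & TWO_BIT_MASK) >> TWO_BIT_SHIFT


val = port_value_cpt(PIPE_B)
print("one-bit decode:", pipe_one_bit(val))
print("two-bit decode:", pipe_two_bit(val))
```

With the one-bit decode, a port that is actually driven by pipe B appears to be on pipe A, so code disabling pipe A would also shut that port off, which matches the behaviour the commit message describes.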
Comment 34 Daniel Vetter 2012-05-29 02:40:17 UTC
Oops, I've mixed things up, that makes quite a bit more sense.
Comment 35 Jesse Barnes 2012-06-21 12:33:33 UTC
Ok let's try to tackle these one at a time.  With current kernels, is the register dump still identical between the fresh boot and after a lid close & open?

If so, does this patch help?  Also try "intel_reg_write 0xc7204 0x3" as root before doing the close/open.


--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1885,8 +1885,8 @@ static void intel_disable_pch_ports(struct drm_i915_privat
 {
        u32 reg, val;
 
-       val = I915_READ(PCH_PP_CONTROL);
-       I915_WRITE(PCH_PP_CONTROL, val | PANEL_UNLOCK_REGS);
+//     val = I915_READ(PCH_PP_CONTROL);
+//     I915_WRITE(PCH_PP_CONTROL, val | PANEL_UNLOCK_REGS);
 
        disable_pch_dp(dev_priv, pipe, PCH_DP_B, TRANS_DP_PORT_SEL_B);
        disable_pch_dp(dev_priv, pipe, PCH_DP_C, TRANS_DP_PORT_SEL_C);
Comment 36 Daniel Vetter 2012-08-21 07:27:02 UTC
Please test this patch here:

http://cgit.freedesktop.org/~danvet/drm-intel/patch/?id=e9a851ed634628489ca4a392740694d0ded78cb9

Symptoms seem to match, and if it tests out ok I can forward it to -fixes for 3.6, cc: stable.
Comment 37 daniel 2012-08-21 22:32:42 UTC
(In reply to comment #36)
> Please test this patch here:
> 
> http://cgit.freedesktop.org/~danvet/drm-intel/patch/?id=e9a851ed634628489ca4a392740694d0ded78cb9
> 
> Symptoms seem to match, and if it tests out ok I can forward it to -fixes for
> 3.6, cc: stable.

Tested: good. Yay! I am happy now :) Thanks!
Comment 38 Florian Mickler 2012-09-05 20:40:03 UTC
A patch referencing this bug report has been merged in Linux v3.6-rc4:

commit b70ad586162609141f0aa9eb34790f31a8954f89
Author: Xu, Anhua <anhua.xu@intel.com>
Date:   Mon Aug 13 03:08:33 2012 +0000

    drm/i915: fix wrong order of parameters in port checking functions
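
The title of that commit points at a class of bug that is easy to miss: two parameters of compatible types passed in the wrong order. The Python sketch below shows the general idea using a hypothetical `port_on_pipe` helper and made-up bit positions; it does not reproduce the driver's actual functions or the content of the fix.

```python
def port_on_pipe(pipe, val):
    """Hypothetical check: is the port described by register value `val`
    enabled and driven by `pipe`? Bit positions are assumptions."""
    enabled = bool(val & (1 << 31))
    selected = (val >> 29) & 0x3
    return enabled and selected == pipe


val = (1 << 31) | (1 << 29)   # made-up value: enabled, driven by pipe 1

print(port_on_pipe(1, val))            # arguments in the intended order
print(port_on_pipe(val, 1))            # swapped: no error, but the answer is wrong
print(port_on_pipe(pipe=1, val=val))   # keyword arguments rule out the swap
```

Because both arguments are plain integers, the swapped call raises no error and simply reports the port as not enabled on that pipe, which is the kind of silent misbehaviour that only a bisect or a careful review tends to uncover.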
