Summary: | [snb regression bisected] Attaching VGA or HDMI cable causes LVDS screen corruption | ||
---|---|---|---|
Product: | DRI | Reporter: | daniel <sec> |
Component: | DRM/Intel | Assignee: | Jani Nikula <jani.nikula> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | blocker | ||
Priority: | medium | CC: | ben, chris, daniel, florian, jbarnes |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
I think a starting point is to compile intel_reg_dumper from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ (which may actually be available from your local distributor now!) and grab the registers before and after closing the lid.

HDMI cable:
1. xdm/xorg started -> reg_dump via ssh
2. connected HDMI -> rebooted (screen corruption) -> reg_dump via ssh

LID:
1. xdm/xorg started -> reg_dump via ssh
2. lid closed and re-opened (screen corruption) -> reg_dump via ssh (diff shows no differences)

Vanilla Kernel 3.2.1 and 3.1.9
Gentoo
Dell E6420

Created attachment 55730 [details]
without hdmi cable
Created attachment 55731 [details]
with hdmi cable
Created attachment 55732 [details]
lid open
Created attachment 55733 [details]
lid closed and reopened
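
The before/after comparison of register dumps described above can be sketched in shell roughly as follows. This is only an illustration: the helper name `changed_registers` and the file names are hypothetical, and it assumes that each dump is plain text with one register per line.

```shell
#!/bin/sh
# Sketch: list only the register lines that differ between two dump files.
# Assumes each dump is plain text with one register per line.

changed_registers() {
    # Keep only diff's "<" (first file) and ">" (second file) lines.
    # grep exits non-zero when nothing differs; "|| true" keeps the
    # function's exit status at 0 in that case.
    diff "$1" "$2" | grep '^[<>]' || true
}

# Hypothetical session on the affected laptop (run as root):
#   intel_reg_dumper > lid-open.txt
#   ... close the lid, wait a few seconds, reopen it ...
#   intel_reg_dumper > lid-reopened.txt
#   changed_registers lid-open.txt lid-reopened.txt
```

Empty output means the two dumps are identical, which matches the "diff shows no differences" result reported for the lid case.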
the screen corruption is gone, as soon as the kernel? idle timer blanks the screen for power saving (after 5 minutes idle time?). After a key press the screen is normal till the next closing of lid.

No change whatsoever when closing/opening the lid. Plugging the HDMI cable obviously brings up the second pipe, and switches the first pipe to a new resolution. But no clues there I am afraid. Next up is append drm.debug=0xe to your kernel boot parameters and attach the debug logs. Hopefully one of the other guys has an inspired guess...

drm.debug=0xe
LID:
1. dmesg right after boot
2. dmesg after lid close cycle (screen corruption after reopening the lid)

Created attachment 55755 [details]
right after boot xdm start screen everything fine
Created attachment 55756 [details]
dmesg after closing, 5 seconds wait, re-opening lid with corrupted screen afterwards
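
For readers wondering what the value 0xe selects, here is a small shell sketch that decodes a drm.debug bitmask. The bit names are an assumption based on the kernel's DRM_UT_* debug categories (CORE 0x01, DRIVER 0x02, KMS 0x04, PRIME 0x08); the 0x08 bit was only given a meaning in kernels newer than some of those tested in this report. The function name `decode_drm_debug` is hypothetical.

```shell
#!/bin/sh
# Sketch: decode a drm.debug bitmask into category names.
# The bit-to-name mapping is an assumption, see the note above.

decode_drm_debug() {
    mask=$(( $1 ))
    names=""
    if [ $(( mask & 0x01 )) -ne 0 ]; then names="$names CORE"; fi
    if [ $(( mask & 0x02 )) -ne 0 ]; then names="$names DRIVER"; fi
    if [ $(( mask & 0x04 )) -ne 0 ]; then names="$names KMS"; fi
    if [ $(( mask & 0x08 )) -ne 0 ]; then names="$names PRIME"; fi
    # Strip the leading space before printing.
    echo "${names# }"
}

decode_drm_debug 0xe
```

Under that mapping, 0xe enables every category except CORE, which keeps the log focused on driver and modesetting messages.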
Hm, maybe we've got a nice dpms issue here. Can you try to run

$ xset dpms force off
$ xset dpms force on

when the screen corruption happens, from a console within X? Also just to check: the "lid closed and reopened" reg dump, is that while the screen corruption is still there or afterwards?

1. LID
dpms force off/on doesn't change anything.
xdm -> logged in -> konsole prepared with xset dpms force off -> lid close -> screen corruption -> blindly pressed enter and blindly typed the xset dpms force on command -> nothing happened, screen corruption still there.

> when the screen corruptions happens from a console within X? Also just to check:
> the "lid closed and reopened" reg dump, is that while the screen corruptions
> are still there or afterwards?

the reg dump was created while the screen corruption was still there.

2. HDMI cable
System off -> connected HDMI cable -> system boot -> during kernel startup same content on hdmi and lvds -> xdm start -> hdmi OK xdm login screen -> lvds screen corruption (the screen corruption looks the same for lid close and for hdmi)
created dmesg.

Created attachment 55790 [details]
drm.debug connected hdmi cable screen corruption on lvds as soon as xdm starts
Created attachment 55791 [details]
picture of HDMI cable attached screen corruption
Created attachment 55792 [details]
HDMI cable run, waited for idle blank -> dmesg
Created attachment 55793 [details]
HDMI run, dmesg after key press after idle blank --> screen is good
news: the directory /sys/class/backlight/ contained two subdirs, "intel_backlight" and "dell_backlight". I disabled the dell_laptop option under x86 and now there is some different behavior. Still screen corruption, but when I close and reopen the lid the screen is now black. And there are some light/shady characters from the boot process like [drm] initialized drm 1.1.0 etc.

wrong. The change is not from the kernel option. The change is from the kernel command line acpi=off.

3.0.17 is _not_ affected --> this is a regression

cp .config 3.1.10 to 3.0.17 ; make oldconfig
everything works. LID and HDMI --> no screen corruptions

On Fri, Jan 20, 2012 at 02:20:18AM +0000, bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=44876
>
> --- Comment #20 from daniel <sec@dschroeder.info> 2012-01-19 18:20:18 PST ---
> 3.0.17 is _not_ affected --> this is a regression
>
> cp .config 3.1.10 to 3.0.17 ; make oldconfig
> everything works. LID and HDMI --> no screen corruptions

Hm, that's bad and I've got no idea - it might very well be something going on in acpi, too. Can you please bisect this one with git? Knowing the bad commit usually helps a _lot_.

Thanks, Daniel

LKML thread with the same problem. Adding as reference: http://lkml.org/lkml/2011/9/14/299

Did 3.1 have FBC enabled? One of the first side-effects of FBC was LVDS corruption during hotplug.

All the errors occurred whilst using FBC. Can you please try i915.i915_enable_fbc=0 or a more recent kernel?

tested this: i915.i915_enable_fbc=0 with Kernel 3.3.6.
tested how: booted -> xdm login screen -> closed the lid -> waited 10 seconds -> reopened -> screen corruption.

I am pretty sure that this only happens on this type/model of notebook (E6420). A colleague using stock Ubuntu with the same notebook has the same troubles. If it were more widespread, more people would complain. So, possibly no faults/errors in the Intel driver, and maybe a specific firmware/BIOS problem of the vendor...
You've mentioned in comment #20 that this is a regression. Can you please try to bisect this? I guess otherwise we're pretty much stuck.

kernel 3.1-rc3 ==> bad
kernel 3.1-rc2 ==> good

git bisect log
git bisect start
# bad: [fcb8ce5cfe30ca9ca5c9a79cdfe26d1993e65e0c] Linux 3.1-rc3
git bisect bad fcb8ce5cfe30ca9ca5c9a79cdfe26d1993e65e0c
# good: [93ee7a9340d64f20295aacc3fb6a22b759323280] Linux 3.1-rc2
git bisect good 93ee7a9340d64f20295aacc3fb6a22b759323280
# bad: [fbad8991ef9d41d1fad587dff23fa6deff01af83] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
git bisect bad fbad8991ef9d41d1fad587dff23fa6deff01af83
# bad: [870d3be1249b1397395ed3164987397993a16d91] Merge branch 'docs-move' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs
git bisect bad 870d3be1249b1397395ed3164987397993a16d91
# good: [cedf03bd9aa54d1d7a9065dddc9e76505f476b12] x86: fix mm/fault.c build
git bisect good cedf03bd9aa54d1d7a9065dddc9e76505f476b12
# good: [798c794df81e0a1af62c1d7e48b464f4096f3b9a] Docs: MSI-HOWTO: MSI -> MSIs
git bisect good 798c794df81e0a1af62c1d7e48b464f4096f3b9a
# bad: [c3613de92ebea302137d21d8938421c3f88d8741] drm/i915: Can't do accurate vblank timestamps with UMS
git bisect bad c3613de92ebea302137d21d8938421c3f88d8741
# bad: [4e6343898fe7eed6b3c0c3c809347bc88d5b4a1e] drm/i915: Remove unused 'reg' argument to dp_pipe_enabled
git bisect bad 4e6343898fe7eed6b3c0c3c809347bc88d5b4a1e
# good: [ed10fca9c351c83ab89a97f3515089e0d36bdccc] drm/i915: Leave LVDS registers unlocked
git bisect good ed10fca9c351c83ab89a97f3515089e0d36bdccc
# bad: [1519b9956eb4b4180fa3f47c73341463cdcfaa37] drm/i915: Fix PCH port pipe select in CPT disable paths
git bisect bad 1519b9956eb4b4180fa3f47c73341463cdcfaa37

Ok, so according to your bisect log this commit should be the culprit:

commit 13d83a672e9bbd52ae82c2f611dfd845a957e8b4
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Wed Aug 3 12:59:20 2011 -0700

    drm/i915: split out PCH refclk update code

Can you also please test the latest 3.4 release? The pch refclock code changed quite a bit lately, so double-checking whether you still have the same problem would be good (and any patches to fix things would be on top of the latest code anyway).

tested it already with linux-3.4-rc6 ==> bad. Should I test it again with 3.4 final?

No, 3.4-rc6 should be good enough. Can you also please double-check the bisect result? I.e. whether 13d83a672e9bbd52 is really broken and the commit right before that really works (i.e. da64c6fc4aba6f02aa800db)? I'm asking because that commit only extracts a bit of code into a separate function, so it should have zero effect.

this was the final bisect screen:

cat bisect-final.txt
git bisect bad | tee -a /root/bisect.log
1519b9956eb4b4180fa3f47c73341463cdcfaa37 is the first bad commit
commit 1519b9956eb4b4180fa3f47c73341463cdcfaa37
Author: Keith Packard <keithp@keithp.com>
Date: Sat Aug 6 10:35:34 2011 -0700

    drm/i915: Fix PCH port pipe select in CPT disable paths

    CPT pipe select is different from previous generations (using two bits instead of one). All of the paths from intel_disable_pch_ports were not making this distinction. Mode setting with pipe A turned off would then also force all outputs on pipe B to get turned off as the disable code would mistakenly decide that all of these outputs were on pipe A and turn them off.

    This is an extension of the CPT DP disable fix (why didn't I fix this then?)

    Signed-off-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

:040000 040000 5fb94b34dcaeed70c5da97a371fc2c13a62ddc60 99272b6f031b75fc9f91d8c08abba0d70cc9a527 M drivers

Oops, I've mixed things up, that makes quite a bit more sense.

Ok let's try to tackle these one at a time. With current kernels, is the register dump still identical between the fresh boot and after a lid close & open? If so, does this patch help? Also try "intel_reg_write 0xc7204 0x3" as root before doing the close/open.
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1885,8 +1885,8 @@ static void intel_disable_pch_ports(struct drm_i915_privat
 {
 	u32 reg, val;
 
-	val = I915_READ(PCH_PP_CONTROL);
-	I915_WRITE(PCH_PP_CONTROL, val | PANEL_UNLOCK_REGS);
+//	val = I915_READ(PCH_PP_CONTROL);
+//	I915_WRITE(PCH_PP_CONTROL, val | PANEL_UNLOCK_REGS);
 
 	disable_pch_dp(dev_priv, pipe, PCH_DP_B, TRANS_DP_PORT_SEL_B);
 	disable_pch_dp(dev_priv, pipe, PCH_DP_C, TRANS_DP_PORT_SEL_C);

Please test this patch here:

http://cgit.freedesktop.org/~danvet/drm-intel/patch/?id=e9a851ed634628489ca4a392740694d0ded78cb9

Symptoms seem to match, and if it tests out ok I can forward it to -fixes for 3.6, cc: stable.

(In reply to comment #36)
> Please test this patch here:
>
> http://cgit.freedesktop.org/~danvet/drm-intel/patch/?id=e9a851ed634628489ca4a392740694d0ded78cb9
>
> Symptoms seem to match, and if it tests out ok I can forward it to -fixes for
> 3.6, cc: stable.

tested.good. yay! I am happy now :) thx!

A patch referencing this bug report has been merged in Linux v3.6-rc4:

commit b70ad586162609141f0aa9eb34790f31a8954f89
Author: Xu, Anhua <anhua.xu@intel.com>
Date: Mon Aug 13 03:08:33 2012 +0000

    drm/i915: fix wrong order of parameters in port checking functions
Created attachment 55706 [details]
Picture of screen corruption

see attached image. Same happened when LID got closed and reopened.
A workaround is to disable ACPI/Power Management in Kernel.

<may be related dmesg output>
[drm] Changing LVDS panel from (-hsync, -vsync) to (+hsync, -vsync)
</dmesg>