Bug 57365 - [i945GM regression] kernel 3.7-rc1 onwards. backlight on, screen dark after KMS kicks in during boot
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: All Linux (All)
Importance: high major
Assigned To: Mika Kuoppala
QA Contact: Intel GFX Bugs mailing list
Reported: 2012-11-21 12:15 UTC by Tomas M.
Modified: 2013-09-10 09:13 UTC

Attachments
dmesg with drm.debug=0xe on kernel 3.7-rc4 (456.67 KB, text/plain)
2012-11-21 12:15 UTC, Tomas M.
working kernel intel reg dump (15.40 KB, text/plain)
2012-11-23 11:08 UTC, Tomas M.
nonworking kernel intel reg dump (13.18 KB, text/plain)
2012-11-23 11:09 UTC, Tomas M.
/var/log/gdm/:0.log for a working kernel (17.67 KB, text/plain)
2012-12-05 23:24 UTC, Tomas M.
/var/log/X11/xorg.0.log for a working kernel (23.04 KB, text/plain)
2012-12-05 23:26 UTC, Tomas M.
/var/log/gdm/:0.log for a not working kernel (17.68 KB, text/plain)
2012-12-05 23:27 UTC, Tomas M.
/var/log/X11/xorg.0.log for a not working kernel (23.06 KB, text/plain)
2012-12-05 23:27 UTC, Tomas M.
intel reg dump with x running in the background. not working kernel (13.73 KB, text/plain)
2012-12-05 23:46 UTC, Tomas M.
intel_reg_dump on a boot that for some reason worked! (13.71 KB, text/plain)
2012-12-05 23:48 UTC, Tomas M.
boot system, after kms, plugged monitor and restored the broken display (15.18 KB, text/plain)
2013-01-22 20:42 UTC, Tomas M.
backporting a commit which fails on 3.8 (5.14 KB, text/plain)
2013-03-06 12:14 UTC, Tomas M.

Description Tomas M. 2012-11-21 12:15:22 UTC
Created attachment 70363 [details]
dmesg with drm.debug=0xe on kernel 3.7-rc4

as a reference, this is the original thread in the dri-devel mailing list.

http://lists.freedesktop.org/archives/dri-devel/2012-November/030565.html

The computer boots correctly, but right after the mode is set by the i915 kernel module, the display goes dark (backlight is on).

The display is restored with a suspend-resume cycle.


Attached is a dmesg output with drm.debug=0xe in the kernel boot parameters.


i did try to bisect this between 3.6 and 3.7-rc1 but other bugs in between make this impossible. (lockups everywhere).

3.6.6 works correctly.


PS. The dirty mark on my kernel is due to skipping depmod with my build script. The kernel has not been modified in any way, and no binary blobs are used with it.
Comment 1 Daniel Vetter 2012-11-21 12:31:09 UTC
Ok a few things for you to test:

- Does this still happen with the latest drm-intel-nightly tree from http://cgit.freedesktop.org/~danvet/drm-intel

- For unrelated reasons I've had to backport the current drm-intel-next onto 3.6: http://cgit.freedesktop.org/~danvet/drm/log/?h=backport-3.6 Can you try to reproduce the bug there and if you can reproduce, attempt a bisect? The commits are all pretty much the same, so should help a lot (if it works out).
Comment 2 Tomas M. 2012-11-21 23:33:14 UTC
Ok, im building the nightly branch right now.

will test with the backport later too.


one additional manifestation on this kernel:

the gnome-shell overview which should fade the desktop background, just fades it to a black background instead of a darker version of it.

maybe its unrelated.
Comment 3 Tomas M. 2012-11-22 22:23:01 UTC
(In reply to comment #1)
> Ok a few things for you to test:
> 
> - Does this still happen with the latest drm-intel-nightly tree from
> http://cgit.freedesktop.org/~danvet/drm-intel
> 
> - For unrelated reasons I've had to backport the current drm-intel-next onto
> 3.6: http://cgit.freedesktop.org/~danvet/drm/log/?h=backport-3.6 Can you try
> to reproduce the bug there and if you can reproduce, attempt a bisect? The
> commits are all pretty much the same, so should help a lot (if it works out).


Both branches have the same result.

bisected backport-3.6 to the following commit:

24929352481f085c5f85d4d4cbc919ddf106d381
drm/i915: read out the modeset hw state at load and resume time


does it make sense? i could not revert it on linus's tree
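
The bisect described above is a binary search over an ordered commit range. A minimal sketch of the idea follows, not of how `git bisect` is implemented; the commit names and the `boots_dark` check are hypothetical placeholders, and the sketch assumes that every commit after the first bad one is also bad:

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[-1] is known
    bad, and once a commit is bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the first bad commit is mid or earlier
        else:
            lo = mid + 1    # the first bad commit is after mid
    return commits[lo]


# Hypothetical history: placeholder names, not real kernel commits.
history: List[str] = [f"c{i}" for i in range(1, 9)]
culprit = "c6"
tested: List[str] = []


def boots_dark(commit: str) -> bool:
    """Stand-in for 'build this commit, boot it, check the panel'."""
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


print(first_bad(history, boots_dark), "found after", len(tested), "test boots")
```

Each test halves the remaining range, which is why a bisect over a few hundred commits needs only a handful of builds. Commits that cannot be tested at all (such as the lockups mentioned in the description) break the ordering assumption and are not handled by this sketch.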
Comment 4 Daniel Vetter 2012-11-23 09:22:21 UTC
Can you please grab the latest intel-gpu-tools (http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/) and then run the intel_reg_dumper tool both on a working and a broken kernel?

The bisected commit makes some sense, since after that we drop a bunch of spurious state-transitions. Could be that those papered over some bugs and now we can't enable the hw correctly any longer.
Comment 5 Tomas M. 2012-11-23 10:51:12 UTC
(In reply to comment #4)
> Can you please grab the latest intel-gpu-tools
> (http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/) and then run the
> intel_reg_dumper tool both on a working and a broken kernel?
> 
> The bisected commit makes some sense, since after that we drop a bunch of
> spurious state-transitions. Could be that those papered over some bugs and
> now we can't enable the hw correctly any longer.

what do you make of this error?

#  intel_reg_dumper
Gen2/3 Ranges are not supported. Please use unsafe access.Aborted

i cant find out how to use "unsafe access". any hint?
Comment 6 Tomas M. 2012-11-23 11:08:39 UTC
Created attachment 70472 [details]
working kernel intel reg dump
Comment 7 Tomas M. 2012-11-23 11:09:41 UTC
Created attachment 70473 [details]
nonworking kernel intel reg dump
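
Two such dumps can be compared register by register instead of by eye. A minimal sketch, assuming each dump line has the shape `NAME: 0xVALUE (decoded text)`; that line shape, the helper names, and the sample data are assumptions made for illustration, not taken from the attachments:

```python
import re
from typing import Dict, Optional, Tuple

# Assumed line shape: "<register name>: 0x<hex value> (<optional decoding>)".
REG_LINE = re.compile(r"^\s*(?P<name>[^:]+?):\s*(?P<value>0x[0-9a-fA-F]+)")


def parse_dump(text: str) -> Dict[str, int]:
    """Map register name to value for every line that matches REG_LINE."""
    regs: Dict[str, int] = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group("name")] = int(match.group("value"), 16)
    return regs


def diff_dumps(good: str, bad: str) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {name: (good_value, bad_value)} for registers that differ.

    A register missing from one dump is reported with None on that side.
    """
    g, b = parse_dump(good), parse_dump(bad)
    return {
        name: (g.get(name), b.get(name))
        for name in sorted(g.keys() | b.keys())
        if g.get(name) != b.get(name)
    }


def fmt(value: Optional[int]) -> str:
    """Format a register value, or mark it as absent from the dump."""
    return "missing" if value is None else f"{value:#010x}"


# Made-up sample data; register names and values are placeholders.
good_dump = """
    REG_A: 0x80000000 (enabled)
    REG_B: 0x00000000 (disabled)
    REG_C: 0x00000301
"""
bad_dump = """
    REG_A: 0x80000000 (enabled)
    REG_B: 0x80000000 (enabled)
    REG_C: 0x00000301
"""

for name, (good, bad) in diff_dumps(good_dump, bad_dump).items():
    print(f"{name}: good={fmt(good)} bad={fmt(bad)}")
```

To use it on real files, read each attachment with `open(path).read()` and pass the two strings to `diff_dumps`.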
Comment 8 Tomas M. 2012-11-23 11:11:28 UTC
to disable safe access i had to modify the source code;

-                 intel_register_access_init(pci_dev, 1);
+                 intel_register_access_init(pci_dev, 0);


hope this was correct.

thanks!
Comment 9 Chris Wilson 2012-12-05 13:17:58 UTC
It looks very much like X failed to start. Can you please attach the working/non-working Xorg.logs as well?
Comment 10 Chris Wilson 2012-12-05 13:18:36 UTC
Hmm, might need to be the output of the dm as well, such as the xdm.log or /var/log/gdm/:0 etc.
Comment 11 Tomas M. 2012-12-05 17:27:07 UTC
(In reply to comment #9)
> It looks very much like X failed to start. Can you please attach the
> working/non-working Xorg.logs as well?

Sorry Chris.

I can provide this info later on just in case. but keep in mind:

I can log myself blindly into GNOME3.
in these cases i dont think i let gdm finish loading. i used a virtual console.
Comment 12 Tomas M. 2012-12-05 23:24:50 UTC
Created attachment 71055 [details]
/var/log/gdm/:0.log for a working kernel
Comment 13 Tomas M. 2012-12-05 23:26:36 UTC
Created attachment 71056 [details]
/var/log/X11/xorg.0.log for a working kernel
Comment 14 Tomas M. 2012-12-05 23:27:12 UTC
Created attachment 71057 [details]
/var/log/gdm/:0.log for a not working kernel
Comment 15 Tomas M. 2012-12-05 23:27:50 UTC
Created attachment 71058 [details]
/var/log/X11/xorg.0.log for a not working kernel
Comment 16 Tomas M. 2012-12-05 23:28:48 UTC
here are the requested logs.

from a quick inspection i dont find anything relevant...but im no expert :(
Comment 17 Chris Wilson 2012-12-05 23:33:39 UTC
Hmm, ok. My suspicion was based on the non-working intel_reg_dump having the fbcon attached to the scanout, so I presumed that X failed leaving the system in disarray. Can you (blindly) regrab that intel_reg_dump after X starts? (Or use ssh.)
Comment 18 Tomas M. 2012-12-05 23:46:47 UTC
Created attachment 71061 [details]
intel reg dump with x running in the background. not working kernel
Comment 19 Tomas M. 2012-12-05 23:48:16 UTC
Created attachment 71062 [details]
intel_reg_dump on a boot that for some reason worked!

ok. booted to do as asked. and to my surprise, this boot worked correctly.

maybe a race condition during KMS init?

maybe you can make out something out of this reg dump too

hope it helps
Comment 20 Tomas M. 2012-12-17 10:42:33 UTC
sorry to bump but the line has been a bit quiet. is there anything else i can test? i'd think this bug would hit more people. no one uses 945gm anymore?


BTW, tested my distribution's 3.7.0 build and failed too.
Comment 21 Adam 2012-12-22 15:34:57 UTC
This bug is probably a duplicate of #53926
https://bugs.freedesktop.org/show_bug.cgi?id=53926
Comment 22 hadrons123 2013-01-21 15:23:41 UTC
I have an ivybridge lenovo y580 and I have the same issue starting from kernel 3.7.X series in Fedora , Debian sid with experimental kernel, Arch. Until Linux kernel 3.6.11 there were no issues.
Comment 23 diolu 2013-01-22 08:54:22 UTC
I have a kernel crash at the insertion of the i915 module with the kernel 3.7.3-1 in archlinux. Kernels 3.6.x work fine if I boot with the VIDEO=SVIDEO-1:d kernel parameter. I have put a comment in this bug (with a link to logs): https://bugs.archlinux.org/task/33062#comment104966
Comment 24 Daniel Vetter 2013-01-22 09:11:27 UTC
Tomas, sorry for the long delay in answering. I've just looked at the reg dumps, but unfortunately nothing important is different :( We do know of random lvds issues in general, so there's probably still something broken in our code somewhere. And big timing changes while booting break it or fix it again.

The only indication is the pipe B underrun thus far.

Another thing to try is whether subsequent modesets fix things again, i.e. in X enable/disable the lvds output a few times. Easiest way to do it is to connect a 2nd screen ...

To everyone else: Since there are tons of different ways to end up with a black screen, please file your own bug reports about your issues - if we intermingle different bugs in the same bug report we'll quickly lose track of things.
Comment 25 Tomas M. 2013-01-22 10:25:57 UTC
(In reply to comment #24)
> The only indication is the pipe B underrun thus far.
> 
> Another thing to try is whether subsequent modesets fix things again, i.e.
> in X enable/disable the lvds output a few times. Easiest way to do it is to
> connect a 2nd screen ...

Hi Daniel,

Thanks for the reply. plugging a monitor restores the display correctly. it happened in the middle of a boot process (no X started yet) do you still need X running for the test?
Comment 26 Daniel Vetter 2013-01-22 10:44:31 UTC
So just plugging in a 2nd screen fixes things up, even when X is not running?
Comment 27 Tomas M. 2013-01-22 10:48:18 UTC
(In reply to comment #26)
> So just plugging in a 2nd screen fixes things up, even when X is not
> running?

yes. in the middle of the boot process.

1. only internal panel connected.
2. turn notebook on.
3. kms kicks in, screen goes dark.
4. plug in external monitor.
5. both screens light up cloned (systemd still doing its thing).
6. unplug external screen
7. profit ?
Comment 28 Daniel Vetter 2013-01-22 11:55:20 UTC
Certainly interesting. Can you please attach a reg dump after you've plugged in your screen?
Comment 29 Tomas M. 2013-01-22 20:42:44 UTC
Created attachment 73475 [details]
boot system, after kms, plugged monitor and restored the broken display

here it goes, same as previous test. reg_dump attached
Comment 30 Tomas M. 2013-01-22 21:27:45 UTC
i should add that by the time the intel-reg-dump tool was run, X11 was already up and running.
do you need a dump without xorg?
Comment 31 Tomas M. 2013-03-04 21:57:44 UTC
hello.

3.9-rc1 appears to have fixed the problem for me. rebooted approx. 10 times with success!

Daniel. do you need to know which commit could have fixed it? was it by luck (race condition harder to trigger?) or was there a fix i missed?
Comment 32 Daniel Vetter 2013-03-04 22:51:13 UTC
We've improved the handling of the panel fitter a bit for older platforms in 3.9-rc1, so could be that the underlying race is now finally fixed. Unfortunately those patches still have a few issues (regressions) left, so I don't like to backport them to stable kernels for now.

Relevant patches are

commit 24a1f16de97c4cf0029d9acd04be06db32208726
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Feb 8 16:35:37 2013 +0200

    drm/i915: disable shared panel fitter for pipe

and

commit 9d6d9f19e8146fa24903cb561e204a22232740e3
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Feb 8 16:35:38 2013 +0200

    drm/i915: clean up panel fitter handling in lvds

If those two indeed fix things for you, we could backport them once the regression they introduce is tackled.

If those aren't the fixes, a reverse bisect to hunt for the first good commit would be interesting.
Comment 33 Tomas M. 2013-03-05 20:32:36 UTC
Hi Daniel,

yes, indeed reverting both patches on 3.9-rc1 brings the issue back.

im still testing more just in case but it seems to be fixed.

Thanks a lot for all your help.

backporting, even if its interesting, is not of much consequence to me since im usually running the -rc series.

Of course, if you need additional testing for those regression fixes, just let me know.
Comment 34 Daniel Vetter 2013-03-06 09:20:09 UTC
It'd be great if you can test whether backporting the 2nd patch on top of 3.8.1 fixes things, too. Then we could submit that patch to stable and close this issue here.
Comment 35 Tomas M. 2013-03-06 12:14:50 UTC
Created attachment 76012 [details]
backporting a commit which fails on 3.8

(In reply to comment #34)
> It'd be great if you can test whether backporting the 2nd patch on top of 3.8.1
> fixes things, too. Then we could submit that patch to stable and close this
> issue here.


hello Daniel,

Sorry, but adding commit drm/i915: clean up panel fitter handling in lvds
fails on 3.8



patching file drivers/gpu/drm/i915/intel_display.c
Hunk #1 succeeded at 3664 (offset 49 lines).
patching file drivers/gpu/drm/i915/intel_lvds.c
Hunk #1 FAILED at 51.
Hunk #2 succeeded at 92 with fuzz 1 (offset -59 lines).
Hunk #3 succeeded at 138 with fuzz 2 (offset -57 lines).
Hunk #4 FAILED at 224.
Hunk #5 succeeded at 413 (offset -55 lines).
Hunk #6 FAILED at 1107.
3 out of 6 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_lvds.c.rej

im attaching the patch i generated from git as a reference

I do not know how to fix these conflicts.
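
A log like the one above can be summarized per file to see which hunks need manual porting. A minimal sketch, assuming the message format shown in that log (`patching file ...`, `Hunk #N FAILED at LINE.`); the function name `failed_hunks` is made up for illustration:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

HUNK_FAILED = re.compile(r"Hunk #(\d+) FAILED at (\d+)\.")
FILE_PREFIX = "patching file "


def failed_hunks(log: str) -> Dict[str, List[Tuple[int, int]]]:
    """Return {file: [(hunk_number, line), ...]} for every rejected hunk."""
    failed: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith(FILE_PREFIX):
            current = line[len(FILE_PREFIX):]   # hunks below belong to this file
            continue
        match = HUNK_FAILED.match(line)
        if match and current is not None:
            failed[current].append((int(match.group(1)), int(match.group(2))))
    return dict(failed)


# The patch output quoted in this comment.
log = """\
 patching file drivers/gpu/drm/i915/intel_display.c
Hunk #1 succeeded at 3664 (offset 49 lines).
patching file drivers/gpu/drm/i915/intel_lvds.c
Hunk #1 FAILED at 51.
Hunk #2 succeeded at 92 with fuzz 1 (offset -59 lines).
Hunk #3 succeeded at 138 with fuzz 2 (offset -57 lines).
Hunk #4 FAILED at 224.
Hunk #5 succeeded at 413 (offset -55 lines).
Hunk #6 FAILED at 1107.
3 out of 6 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_lvds.c.rej
"""

for path, hunks in failed_hunks(log).items():
    for number, line in hunks:
        print(f"{path}: hunk #{number} rejected near line {line}")
```

Files whose hunks all applied (here `intel_display.c`) do not appear in the result; the rejected hunks themselves are in the `.rej` file named on the last log line.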
Comment 36 Daniel Vetter 2013-03-06 13:39:11 UTC
Ok, I've mixed things up too much and meant to edit another bug.

It looks like there's no simple way to backport the 2nd patch, I'll hence close this one here for now as fixed.
Comment 37 Lucas 2013-04-06 19:10:23 UTC
Please check this thread for backporting and inclusion in 3.8 series:

http://comments.gmane.org/gmane.linux.kernel.stable/48872

Thanks,
Lucas
Comment 38 Lucas 2013-04-06 19:34:51 UTC
I forgot to mention that, as 3.6.11 is the last working kernel, both Linux 3.7 and Linux 3.8 need backporting.

Thanks,
Lucas
Comment 39 Daniel Vetter 2013-04-07 19:23:27 UTC
Lucas: Can you pls test 3.9-rc kernels and confirm that these two patches indeed fix your regression, too?
Comment 40 Jani Nikula 2013-09-10 09:13:11 UTC
I'm closing this bug as fixed since 3.9.

Please reopen if you face this issue with kernel 3.10 or later.

Kernel versions prior to 3.10 that had the issue are EOL, and there will be no more backports. Sorry.