Description
Tomas M.
2012-11-21 12:15:22 UTC
Ok a few things for you to test:

- Does this still happen with the latest drm-intel-nightly tree from http://cgit.freedesktop.org/~danvet/drm-intel

- For unrelated reasons I've had to backport the current drm-intel-next onto 3.6: http://cgit.freedesktop.org/~danvet/drm/log/?h=backport-3.6 Can you try to reproduce the bug there and if you can reproduce, attempt a bisect? The commits are all pretty much the same, so should help a lot (if it works out).

Ok, I'm building the nightly branch right now. Will test with the backport later too.

One additional manifestation on this kernel: the gnome-shell overview, which should fade the desktop background, just fades it to a black background instead of a darker version of it. Maybe it's unrelated.

(In reply to comment #1)
> Ok a few things for you to test:
>
> - Does this still happen with the latest drm-intel-nightly tree from
> http://cgit.freedesktop.org/~danvet/drm-intel
>
> - For unrelated reasons I've had to backport the current drm-intel-next onto
> 3.6: http://cgit.freedesktop.org/~danvet/drm/log/?h=backport-3.6 Can you try
> to reproduce the bug there and if you can reproduce, attempt a bisect? The
> commits are all pretty much the same, so should help a lot (if it works out).

Both branches have the same result. Bisected backport-3.6 to the following commit:

24929352481f085c5f85d4d4cbc919ddf106d381 drm/i915: read out the modeset hw state at load and resume time

Does it make sense? I could not revert it on Linus's tree.

Can you please grab the latest intel-gpu-tools (http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/) and then run the intel_reg_dumper tool both on a working and a broken kernel?

The bisected commit makes some sense, since after that we drop a bunch of spurious state-transitions. Could be that those papered over some bugs and now we can't enable the hw correctly any longer.

(In reply to comment #4)
> Can you please grab the latest intel-gpu-tools
> (http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/) and then run the
> intel_reg_dumper tool both on a working and a broken kernel?
>
> The bisected commit makes some sense, since after that we drop a bunch of
> spurious state-transitions. Could be that those papered over some bugs and
> now we can't enable the hw correctly any longer.

What do you make of this error?

# intel_reg_dumper
Gen2/3 Ranges are not supported. Please use unsafe access.
Aborted

I can't find out how to use "unsafe access". Any hint?

Created attachment 70472 [details]
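For readers following along: the bisect reported above is a binary search over commit history. What follows is only a minimal Python sketch of that idea, not git's actual implementation. The names `first_bad` and `first_good` and the commit labels are hypothetical, chosen for illustration. It assumes a linear history ordered oldest to newest and a test that gives the same answer every time; a failure that only shows up on some boots can send the search to the wrong commit.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumptions (hypothetical setup, not taken from the thread):
    commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad
    one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


def first_good(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Same search with the test inverted: find the first fixed commit.

    Here commits[0] is known broken and commits[-1] is known fixed.
    """
    return first_bad(commits, is_good)
```

Each round halves the range, so roughly log2(N) builds and reboots are needed for N commits. Because every probe is a manual boot test, repeating the boot a few times per probe is a cheap guard against a flaky result.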
working kernel intel reg dump
Created attachment 70473 [details]
nonworking kernel intel reg dump
To disable safe access I had to modify the source code:

- intel_register_access_init(pci_dev, 1);
+ intel_register_access_init(pci_dev, 0);

Hope this was correct. Thanks!

It looks very much like X failed to start. Can you please attach the working/non-working Xorg.logs as well?

Hmm, might need to be the output of the dm as well, such as the xdm.log or /var/log/gdm/:0 etc.

(In reply to comment #9)
> It looks very much like X failed to start. Can you please attach the
> working/non-working Xorg.logs as well?

Sorry Cris. I can provide this info later on just in case. But keep in mind: I can log myself blindly into GNOME3. In these cases I don't think I let gdm finish loading; I used a virtual console.

Created attachment 71055 [details]
/var/log/gdm/:0.log for a working kernel
Created attachment 71056 [details]
/var/log/X11/xorg.0.log for a working kernel
Created attachment 71057 [details]
/var/log/gdm/:0.log for a not working kernel
Created attachment 71058 [details]
/var/log/X11/xorg.0.log for a not working kernel
Here are the requested logs. From a quick inspection I don't find anything relevant... but I'm no expert :(

Hmm, ok. My suspicion was based on the non-working intel_reg_dump having the fbcon attached to the scanout, so I presumed that X failed leaving the system in disarray. Can you (blindly) regrab that intel_reg_dump after X starts? (Or use ssh.)

Created attachment 71061 [details]
intel reg dump with x running in the background. not working kernel
Created attachment 71062 [details]
intel_reg_dump on a boot that for some reason worked!
Ok, booted to do as asked, and to my surprise this boot worked correctly.
Maybe a race condition during KMS init?
Maybe you can make something out of this reg dump too.
Hope it helps.
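Comparing the working and non-working register dumps attached above by eye is tedious, so here is a small Python sketch that lists only the registers whose values differ. It assumes each dump line has the shape `NAME: 0xVALUE (optional decoded text)`; that shape, the helper names `parse_dump` and `diff_dumps`, and the file names in the usage note are assumptions for illustration, not something confirmed by the attachments.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed line shape: "NAME: 0xVALUE (optional decoded text)".
LINE_RE = re.compile(r"^\s*(?P<name>[^:]+?):\s*(?P<value>0x[0-9a-fA-F]+)")


def parse_dump(text: str) -> Dict[str, int]:
    """Map register name to raw value; lines that do not match are skipped."""
    regs: Dict[str, int] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            regs[match.group("name").strip()] = int(match.group("value"), 16)
    return regs


def diff_dumps(good: str, bad: str) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {name: (good_value, bad_value)} for registers that differ.

    A value of None means the register is missing from that dump.
    """
    a, b = parse_dump(good), parse_dump(bad)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

A possible use, with hypothetical file names: read `working.txt` and `broken.txt` with `open(...).read()`, pass both strings to `diff_dumps`, and print each entry in hex. Status bits that change from boot to boot will show up as noise, so the output is a starting point for reading the dumps rather than a verdict.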
Sorry to bump but the line has been a bit quiet. Is there anything else I can test? I'd think this bug would hit more people. No one uses 945gm anymore?

BTW, tested my distribution's 3.7.0 build and it failed too.

This bug is probably a duplicate of #53926 https://bugs.freedesktop.org/show_bug.cgi?id=53926 I have an ivybridge lenovo y580 and I have the same issue starting from the kernel 3.7.X series in Fedora, Debian sid with experimental kernel, and Arch. Until Linux kernel 3.6.11 there were no issues.

I have a kernel crash at the insertion of the i915 module with the kernel 3.7.3-1 in archlinux. Kernels 3.6.x work fine if I boot with the VIDEO=SVIDEO-1:d kernel parameter. I have put a comment in this bug (with a link to logs): https://bugs.archlinux.org/task/33062#comment104966

Tomas, sorry for the long delay in answering. I've just looked at the reg dumps, but unfortunately nothing important is different :( We do know of random lvds issues in general, so there's probably still something broken in our code somewhere. And big timing changes while booting break it or fix it again. The only indication is the pipe B underrun thus far.

Another thing to try is whether subsequent modesets fix things again, i.e. in X enable/disable the lvds output a few times. Easiest way to do it is to connect a 2nd screen ...

To everyone else: Since there are tons of different ways to end up with a black screen, please file your own bug reports about your issues - if we intermingle different bugs in the same bug report we'll quickly lose track of things.

(In reply to comment #24)
> The only indication is the pipe B underrun thus far.
>
> Another thing to try is whether subsequent modesets fix things again, i.e.
> in X enable/disable the lvds output a few times. Easiest way to do it is to
> connect a 2nd screen ...

Hi Daniel,

Thanks for the reply. Plugging in a monitor restores the display correctly.
It happened in the middle of a boot process (no X started yet). Do you still need X running for the test?

So just plugging in a 2nd screen fixes things up, even when X is not running?

(In reply to comment #26)
> So just plugging in a 2nd screen fixes things up, even when X is not
> running?

Yes, in the middle of the boot process.

1. only internal panel connected.
2. turn notebook on.
3. kms kicks in, screen goes dark.
4. plug in external monitor.
5. both screens light up cloned (systemd still doing its thing).
6. unplug external screen
7. profit ?

Certainly interesting. Can you please attach a reg dump after you've plugged in your screen?

Created attachment 73475 [details]
boot system, after kms, plugged monitor and restored the broken display
here it goes, same as previous test. reg_dump attached
Shall I add that by the time the intel-reg-dump tool was run, X11 was already up and running. Do you need a dump without xorg?

Hello. 3.9-rc1 appears to have fixed the problem for me. Rebooted approx. 10 times with success! Daniel, do you need to know which commit could have fixed it? Was it by luck (race condition harder to trigger?) or was there a fix I missed?

We've improved the handling of the panel fitter a bit for older platforms in 3.9-rc1, so could be that the underlying race is now finally fixed. Unfortunately those patches still have a few issues (regressions) left, so I don't like to backport them to stable kernels for now. Relevant patches are

commit 24a1f16de97c4cf0029d9acd04be06db32208726
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Feb 8 16:35:37 2013 +0200

    drm/i915: disable shared panel fitter for pipe

and

commit 9d6d9f19e8146fa24903cb561e204a22232740e3
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Feb 8 16:35:38 2013 +0200

    drm/i915: clean up panel fitter handling in lvds

If those two indeed fix things for you, we could backport them once the regression they introduce is tackled. If those aren't the fixes, a reverse bisect to hunt for the first good commit would be interesting.

Hi Daniel, yes, indeed reverting both patches on 3.9-rc1 brings the issue back. I'm still testing more just in case but it seems to be fixed. Thanks a lot for all your help. Backporting, even if it's interesting, is not of much consequence to me since I'm usually running the -rc series. Of course, if you need additional testing for those regression fixes, just let me know.

It'd be great if you can test whether backporting the 2nd patch on top of 3.8.1 fixes things, too. Then we could submit that patch to stable and close this issue here.

Created attachment 76012 [details]
backporting a commit which fails on 3.8

(In reply to comment #34)
> It'd be great if you can test whether backporting the 2nd patch on top of 3.8.1
> fixes things, too. Then we could submit that patch to stable and close this
> issue here.

Hello Daniel,

Sorry, but adding commit "drm/i915: clean up panel fitter handling in lvds" fails on 3.8:

patching file drivers/gpu/drm/i915/intel_display.c
Hunk #1 succeeded at 3664 (offset 49 lines).
patching file drivers/gpu/drm/i915/intel_lvds.c
Hunk #1 FAILED at 51.
Hunk #2 succeeded at 92 with fuzz 1 (offset -59 lines).
Hunk #3 succeeded at 138 with fuzz 2 (offset -57 lines).
Hunk #4 FAILED at 224.
Hunk #5 succeeded at 413 (offset -55 lines).
Hunk #6 FAILED at 1107.
3 out of 6 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_lvds.c.rej

I'm attaching the patch I generated from git as a reference. I do not know how to fix these conflicts.

Ok, I've mixed things up too much and meant to edit another bug. It looks like there's no simple way to backport the 2nd patch, I'll hence close this one here for now as fixed.

Please check this thread for backporting and inclusion in 3.8 series: http://comments.gmane.org/gmane.linux.kernel.stable/48872

Thanks,
Lucas

I forgot to mention that, as 3.6.11 is the last working kernel, both Linux 3.7 and Linux 3.8 need backporting.

Thanks,
Lucas

Lucas: Can you pls test 3.9-rc kernels and confirm that these two patches indeed fix your regression, too?

I'm closing this bug as fixed since 3.9. Please reopen if you face this issue with kernel 3.10 or later. Kernel versions prior to 3.10 that had the issue are EOL, and there will be no more backports. Sorry.