Description
andreas.sturmlechner
2012-12-30 01:18:12 UTC
Created attachment 72288 [details]
intel-reg-dump-3.8git-noterm.log
OK, either it is more like 9 in 10 times or this is again related to external display usage - I've lost track of that somewhen after the n-th reboot today.
The issue would appear to be a transient failure to read the EDID, but no idea why.
Created attachment 72291 [details]
dmesg_3.8git_extdvi-drmdebug.log
dmesg log with drm.debug=6 attached
Created attachment 72294 [details]
dmesg_3.8git_extdvi-drmdebug-v2.log
now actually containing full dmesg output
OK, this seems to always work for LVDS, and never for the external display.
Access to a different setup, new results - it seems I can isolate the trouble to the DVI setup. In short:
1) LVDS: OK
2) DP* to display: OK
3) DP* + DVI-D-Adaptor to display: no fbcon
* two UltraBase docking stations at different locations
Setup (3) is fine with older kernels - except that it usually needs the BIOS to bring up the connection.
I have updated to 3.8_rc2 and I'm back to setup (3) for a couple of days - it's still the same, and it definitely happens only there. Since it's 100% reproducible I could go on and bisect now, unless someone has an idea what patch could be the problem.
Hmm, a bisect would be valuable. There were a few DPCD handling patches that aimed to improve handling of dongles, but knowing what caused the regression here may lead to further improvements.
git bisect has ended with ce4a9cc579381bc70b12ebb91c57da31baf8e3b7 being the first bad commit, which doesn't make any sense to me at all. Maybe the actual bad commit is somewhere close before, but I was already far beyond the point of getting a healthy amount of sleep, so I stopped there. I don't have access to that setup now for a few days.
Hm, DPCD shouldn't affect things on the DP-DVI dongle, we'll use the HDMI encoder in that case. Still, a bisect would be really useful to track things down.
I would like to bisect more, but it's hard. During testing it became apparent that this is another timing-related issue. For several reboots it sometimes works, then at random it doesn't anymore. The same kernel image that seemingly reproduced the behaviour with absolute certainty weeks ago suddenly worked again a few days ago. The only thing I can say for sure is that I never saw that behaviour before the 3.8 merge window.
Just a small notice that it's still present in 3.8_rc6 (instant 1st boot experience).
I haven't had time for more kernel building during the last weeks, and I should really outsource this to a beefier machine in the future - unless Lenovo brings out a non-pathetic successor to the X200s.
Now something interesting just happened. Coming back to that setup, I had forgotten to re-enable xdm in the default runlevel, and once again was greeted with a black screen. Then, after what must have been about half a minute - right before resignation - suddenly the terminal appeared in a non-native resolution (possibly the one from closed-lid LVDS). I've so far seen that non-native terminal res only when booting open-lid. Observation made with vanilla-kernel-3.8.7. Any possibly related change to EDID or lid handling?
I was just trying 3.10 (rc5) for the first time and had a few reboots/cold starts, all with fbcon brought up successfully. Out of experience I won't call it fixed just yet, but it looks good. :)
So maybe we are indeed getting better at this DP whack-a-mole game ... /me is hopeful
Please update this bug once you're confident that it works (or that it broke again).
Sorry, it has already happened again. :/ Someone with a similar problem (but with LVDS) has it working with i915 as a module, will look into that, but then the next thing will be looking into compile offload for all the bisecting that is going to be needed.
Created attachment 82124 [details]
bisect.log
Well, I spent the day bisecting but seem to have failed again. Based on the result I produced a revert and applied that over 3.10.0 sources, but a few reboots later it was the same old trouble again. I'm on the verge of just throwing away that display and being done with it...
...except that the display is not to blame, as I just reproduced the same failure on the family's other shiny new LCD. Either way, the cost of bisecting this is just way too much with the countless reboots required to gain *some* *questionable* certainty about whether it's good/bad.
Another release, another try? I was following 3.11 since rc2 and it's all the same. But I don't trust that setup anymore, so I will finally carry over my second docking station just to be sure.
OK, this isn't funny anymore: For months, I have been fighting with 3.11 RCs to detect my external display (the behaviour had regressed insofar as after fbcon it also didn't bring it up in X anymore), often rebooting several times until it worked.
- so I try the same on another external display: positive, same failure
- so I transfer my other docking station to reproduce it: positive
Then, having ruled out hardware issues with my DVI display, docking stations, DVI cables, I think to myself: let's try out good old 3.4 series, because I can't test this DP-DVI dongle on another system (soon, there'll be a Haswell box to the rescue), so at least I can try to reproduce success on older kernels to rule out a broken dongle.
- so, I build and boot into 3.4.61 once: success
- so, I build and boot into 3.10.11 afterwards: success with fbcon as well as X (??)
- so, I boot again into 3.11.0: success (fbcon and X) (???)
All the while, I have zero problems in my flat where the Thinkpad is connected to a DisplayPort screen. Every time I think I could come to a conclusion, there comes my system and hits me right back in my face.
Today my system is back to normal. No fbcon/X with 3.11 all the time, X only after manually enabling output on the external display (while often it isn't even detected). The only constant being that 3.4.61/62 (anything pre-3.8) works all the time.
3.12.0 Update: Blank screen (no fbcon, no X) on second try, so nothing has changed.
Meanwhile, 3.4 (.68) runs great and I'm glad it continues to be supported for some time.
(In reply to comment #23)
> 3.12.0 Update: Blank screen (no fbcon, no X) on second try, so nothing has
> changed.
>
> Meanwhile, 3.4 (.68) runs great and I'm glad it continues to be supported
> for some time.
Hm, can you please try to bisect where this regression has been introduced?
(In reply to comment #24)
> (In reply to comment #23)
> > 3.12.0 Update: Blank screen (no fbcon, no X) on second try, so nothing has
> > changed.
> >
> > Meanwhile, 3.4 (.68) runs great and I'm glad it continues to be supported
> > for some time.
>
> Hm, can you please try to bisect where this regression has been introduced?
Either that, or a retry of drm-intel-nightly http://cgit.freedesktop.org/~danvet/drm-intel/log/?h=drm-intel-nightly which has some DP dongle fixes since the last try.
I'm a few boots into current drm-intel-nightly, and so far it looks good! Also threw in Linus' git master (insta-fail), while the following reboot into the drm-intel-nightly image was successful once more.
Too soon once more. Unfortunately, the blank screen is back again...
(In reply to comment #24)
> (In reply to comment #23)
> > 3.12.0 Update: Blank screen (no fbcon, no X) on second try, so nothing has
> > changed.
> >
> Hm, can you please try to bisect where this regression has been introduced?
I tried 3.8/3.9/3.10 again and now X never comes up when there's no fbcon. So I guess there was not a further regression in kernel, but rather a change in xorg-server.
Created attachment 91268 [details]
20131228-2203_3.13.0-rc4+_dmesg.log
attaching new dmesg, in one year the output has changed a bit.
(In reply to comment #26)
> I'm a few boots into current drm-intel-nightly, and so far it looks good!
> Also threw in Linus' git master (insta-fail), while the following reboot
> into the drm-intel-nightly image was successful once more.
Tentatively closing as working, thanks for reporting this issue and please reopen when it breaks again.
Sorry, perhaps I didn't state it clearly enough in my last two answers that the issue was back again the day after my tests.
Hi Andreas, Could you please retry latest drm-intel-nightly? And attach log please. Also it would be great if you could bisect between the version that worked on Dec 27 and the one that didn't work on Dec 28, trying to find the patch that reintroduced the issue. Thanks
If you look at the log it happened anyway, just the desktop environment queried the configuration so often that the fact that it failed once made no difference. It reported the EDID at the vital time and so everything appeared to work. i.e. I don't think the problem was ever mysteriously fixed, just bad/good timing.
Created attachment 93201 [details]
20140201-2343_3.13.0+_dmesg.log
took me a few cold boots, but here is the latest blank screen dmesg log with drm-intel-nightly - would the output of intel_reg_dumper also be useful?
(In reply to comment #31)
> Also it would be great if you could bisect between the version that worked
> on Dec 27 and the one that didn't work on Dec 28, trying to find the patch
> that reintroduced the issue.
Every single kernel image built since 3.8-rc1 has at some point (repeatedly) failed to bring up the DP screen. git bisect between start of merge window and rc1 so far has been like a lottery with that kind of error...
Yes, I'd like to see the reg dumps before and after the failure. But also I'm curious about the dmesg output when it *works*. Could you attach a dmesg with drm.debug=0xe from a good case where detection and EDID read go right?
Do suspend/resume sequences affect your results anyhow?
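Editorial aside on the "lottery" remark above: a rough way to see why bisecting an intermittent failure costs so many reboots. This is only a sketch under assumptions that are not from the thread: each boot of a bad kernel is taken to fail independently with a fixed probability (the reports of reboot vs. cold-boot differences suggest that is optimistic), and the function name and the example rates are made up for illustration. Only the "9 in 10" figure echoes the first comment.

```python
import math


def boots_needed(failure_rate: float, confidence: float) -> int:
    """Clean boots in a row needed before calling a kernel 'good'.

    Assumption: a bad kernel fails each boot independently with probability
    failure_rate. Then n clean boots in a row happen by pure luck with
    probability (1 - failure_rate) ** n, and we want that to be no larger
    than 1 - confidence.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical failure rates; 0.9 corresponds to "9 in 10 times".
    for rate in (0.9, 0.5, 0.2):
        n = boots_needed(rate, 0.95)
        print(f"failure rate {rate:.1f}: {n} clean boots per bisect step")
```

The per-step count multiplies with the number of bisect steps, and a single wrongly marked "good" sends the bisect down the wrong half, which would fit the nonsensical first-bad-commit results reported earlier.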
Created attachment 100724 [details]
20140609-1030_3.15.0_dmesg-OFF.log
3.15.0 - no connection - dmesg with drm.debug=0xe
Created attachment 100725 [details]
20140609-1030_3.15.0_i915regdump-OFF.log
3.15.0 - no connection - intel-gpu-tools-1.6 regdump
Created attachment 100727 [details]
20140609-1330_3.15.0_dmesg-ON.log
3.15.0 - success - dmesg with drm.debug=0xe
Created attachment 100728 [details]
20140609-1330_3.15.0_i915regdump-ON.log
3.15.0 - success - intel-gpu-tools-1.6 regdump
Created attachment 100731 [details]
20140609-1410_3.4.92-gentoo_dmesg-ON.log
3.4.92 - 100% success rate - dmesg with drm.debug=0xe
Created attachment 100732 [details]
20140609-1410_3.4.92-gentoo_i915regdump-ON.log
3.4.92 - 100% success rate - intel-gpu-tools-1.6 regdump
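Editorial aside: with an OFF and an ON register dump attached for the same kernel, comparing them by hand is tedious. Below is a minimal sketch for listing registers whose values differ between two dumps. It assumes each register line looks roughly like `NAME: 0xVALUE (decoded text)`; that layout is an assumption about the intel_reg_dumper output, not something confirmed in this thread, and the function names are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, Tuple

# Assumed line shape: optional indent, register name, colon, hex value.
LINE_RE = re.compile(r"^\s*(?P<name>[A-Za-z0-9_]+)\s*:\s*(?P<value>0x[0-9a-fA-F]+)")


def parse_dump(lines: Iterable[str]) -> Dict[str, int]:
    """Map register name to value; lines that do not match are skipped."""
    regs: Dict[str, int] = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            regs[match.group("name")] = int(match.group("value"), 16)
    return regs


def diff_dumps(off: Dict[str, int], on: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {name: (off_value, on_value)} for registers present in both
    dumps whose values differ."""
    return {
        name: (off[name], on[name])
        for name in sorted(off.keys() & on.keys())
        if off[name] != on[name]
    }


if __name__ == "__main__":
    # Runs only when two dump files are given on the command line.
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_off, open(sys.argv[2]) as f_on:
            changed = diff_dumps(parse_dump(f_off), parse_dump(f_on))
        for name, (v_off, v_on) in changed.items():
            print(f"{name}: 0x{v_off:08x} -> 0x{v_on:08x}")
```

Saved under a name of your choice, it would be run with the two 3.15.0 dumps attached above (the `-OFF` and `-ON` regdump logs) as arguments. Registers that differ only because of unrelated runtime state will show up too, so the output is a starting point rather than a diagnosis.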
(In reply to comment #34)
> Do suspend/resume sequences affect your results anyhow?
I do not suspend/resume at all - however, I could look into configuring it since the swap partition is rather bored anyway. There's also no consistency with cold boots or reboots - sometimes 3.15.0 (as a placeholder for any >=3.8-rc1 kernel image) will work after a reboot out of 3.4, sometimes not, but then maybe - not necessarily - a cold boot into 3.15.0 will work fine. However, I don't think it has ever brought up a connection from *only* rebooting from 3.15.0 when it didn't work the first time - though I didn't try that for too long since staring at my constantly standbied display (only waking up for POST and grub2) is a rather sad experience.
Thanks for the logs and dump. They make me think that something is disabling the primary plane before it shows. Could you please continue the investigation with the following branch: http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=58876-investigation
Please attach the logs for the good and bad case. Don't need the reg dumps.
Created attachment 107287 [details]
20141003-2338_3.17.0-rc7+_dmesg-stop-OFF.log
Thanks for the effort - new log with your branch.
I have not managed to get a 'good case' yet, however interesting stuff. The old match on 'no connectors reported' doesn't work anymore, and there even appear modelines for the external display when I switch between ttys. No signal though.
Created attachment 108073 [details]
20141019-2125_3.17.0-rc7+_dmesg-stop-ON.log (drm.debug=6)
I couldn't get any output on the external display in weeks, but today I accidentally used drm.debug=6 instead of 0xe, and suddenly there's a screen working as intended. And indeed, switching back and forth from these two debug settings, it either works or not. Oh fun... ;)
As a side note, the above mentioned 'No connectors reported' message is triggered by my trusty old (default) kernel param i915.panel_ignore_lid=0, so no change actually.
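Editorial aside: for readers comparing the two debug settings, drm.debug is a bitmask. The sketch below decodes a value into category names. The bit assignments (core 0x01, driver 0x02, kms 0x04, prime 0x08) are how I recall the DRM_UT_* definitions in kernels of this era, so treat them as an assumption and check the headers of the kernel actually in use; the function name is made up.

```python
from typing import List

# Assumed DRM_UT_* bit assignments for kernels of this era (not confirmed
# anywhere in this thread).
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(value: int) -> List[str]:
    """Return the names of the debug categories enabled in value."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if value & bit]


if __name__ == "__main__":
    for setting in (6, 0xE):
        print(f"drm.debug={setting:#x}: {', '.join(decode_drm_debug(setting))}")
```

Under that assumption the two settings differ only in the prime bit, so the change in behaviour would more plausibly come from altered log volume and timing than from what is being logged, though that is a guess.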
Fun fact: drm.debug=6 seems to raise the chance for screen output _considerably_, compared both to drm.debug=0xe and no drm.debug. Several reboots into the same 3.17 image were successful with that parameter, one unsuccessful.
Ok, this is definitely just the gmbus controller being pissed somehow and refusing to work. No idea why, but given that it's only happening at boot-up it's probably leftover BIOS state.
Created attachment 110134 [details] [review]
idle gmbus harder in takeover
A quick patch for you to test.
Thanks for new stuff to try out, I'm finally back at the setup. No change though, mostly bad runs and a few good ones with either no debug param, or drm.debug=6 or drm.debug=0xe, applied over 3.17.4 as well as 3.18-rc8.
Daniel, did you forget to add the new location for the i2c reset? It just looks like you moved the function in the file as-is...
Given a new version, I would have time today for another round of testing on my weekend setup.
Andreas, did you try Daniel's patch? On looking again I see I missed the fact he added a new function call in there that could help.
(In reply to Jesse Barnes from comment #51)
> Andreas, did you try Daniel's patch?
Yes I tried, results are in comment #48 - unfortunately no success. However, once again I have high hopes for a new kernel version - 4.0.0 RCs look good so far!
I've had a few reboots today on the troublesome setup, and while the failure was reproducible once more with 3.17.8 (including the latest patch), all good so far with 4.0.0. It looks as if the non-detection cases have been replaced by wrong-resolution detections there - much easier to live with, if it stays that way.
Well, we'd still like to fix any resolution detection problems! Sounds like maybe a failed EDID read. Can you file a new bug for that with logs against 4.0-rc? Thanks, Jesse