Created attachment 109341 [details] dmesg log showing delay Boot is delayed and the text console is flickering & distorted after commit commit 773538e86081d146e0020435d614f4b96996c1f9 Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Thu Sep 4 14:54:56 2014 +0300 drm/i915: Reset power sequencer pipe tracking when disp2d is off has been introduced. As can be seen from the kernel logs there are delays of several seconds where power to eDP is turned on and off. Once the system is up the fbdev text console is flickering. The display is an eDP connected panel, surprisingly an externally connected (DVI) display flickers as well.
Created attachment 109343 [details] dmesg log of kernel prior to offending patch
PCI ID of GPU: 8086:0f31
You have this? commit 1e74a324465e5a4f8e3ee7c423aef92c54de8a72 Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Tue Oct 28 16:15:51 2014 +0200 drm/i915: Initialize PPS timestamps on vlv/chv
(In reply to Ville Syrjala from comment #3) > You have this? > > commit 1e74a324465e5a4f8e3ee7c423aef92c54de8a72 > Author: Ville Syrjälä <ville.syrjala@linux.intel.com> > Date: Tue Oct 28 16:15:51 2014 +0200 > > drm/i915: Initialize PPS timestamps on vlv/chv Yup. HEAD of my build is at commit ea361c1700ccb5acdc0f5970fd4d64c94c6aa6ee Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Nov 12 10:28:50 2014 +0100 drm-intel-nightly: 2014y-11m-12d-09h-28m-33s UTC integration manifest
(In reply to Egbert Eich from comment #4) > (In reply to Ville Syrjala from comment #3) > > You have this? > > > > commit 1e74a324465e5a4f8e3ee7c423aef92c54de8a72 > > Author: Ville Syrjälä <ville.syrjala@linux.intel.com> > > Date: Tue Oct 28 16:15:51 2014 +0200 > > > > drm/i915: Initialize PPS timestamps on vlv/chv > > Yup. > > HEAD of my build is at > > commit ea361c1700ccb5acdc0f5970fd4d64c94c6aa6ee > Author: Daniel Vetter <daniel.vetter@ffwll.ch> > Date: Wed Nov 12 10:28:50 2014 +0100 > > drm-intel-nightly: 2014y-11m-12d-09h-28m-33s UTC integration manifest The log is from an older kernel since it doesn't have the port information in the vdd debug messages. But I guess you're sayng it still happens with the more recent kernels too. Apart from the massive delay there's one other WTF in the logs: [ 3.840671] [drm:intel_dp_init_panel_power_sequencer_registers] panel power sequencer register settings: PP_ON 0x87d00001, PP_OFF 0x1f40001, PP_DIV 0x270f06 [ 5.068936] [drm:intel_dp_init_panel_power_sequencer_registers] panel power sequencer register settings: PP_ON 0x80000001, PP_OFF 0x1, PP_DIV 0x270f00 I'm left wondering what happened to the delays we had in those registers. Somehow they got zeroed after we already computed them to !=0 values. Althoguh that issue was already present in the "good" log. Also can you make edp_panel_vdd_on() dump a backtrace so we can see where it's coming from. I guess it's some AUX stuff (EDID read perhaps). I can't see why it would suddenly take forever. Especially since it doesn't appear to time out or anything. I don't see any "disabling display" power well debug messages in the log, so at least resetting pps_pipe doesn't seem to happen anywhere. The other part of the patch adds the power_domain get/put around the pps_mutex, but since we have init power enabled here, these should be mostly nops. It does mean we grab and release the power domain mutex a few extra times as well, so there's some overhead, but I would think there would have to some serious contention on the lock to get this kind of effect. Sadly I can't reproduce this here.
Created attachment 109670 [details] [review] [PATCH] drm/i915: Try to avoid pps_{lock,unlock}() on DP ports Attempt at reducing the locking overhead for regular DP AUX transfers. In case the extra locking we now do is really the culprit, can you test this one?
*** Bug 86202 has been marked as a duplicate of this bug. ***
(In reply to Ville Syrjala from comment #5) > The log is from an older kernel since it doesn't have the port information > in the vdd debug messages. But I guess you're sayng it still happens with > the more recent kernels too. The dump was from the very version that introduced this regression but I'm seeingthe same behavior on recent intel nightly kernels (as of last week). I will generate more dumps on more recent kernels. > > Apart from the massive delay there's one other WTF in the logs: > [ 3.840671] [drm:intel_dp_init_panel_power_sequencer_registers] panel > power sequencer register settings: PP_ON 0x87d00001, PP_OFF 0x1f40001, > PP_DIV 0x270f06 > [ 5.068936] [drm:intel_dp_init_panel_power_sequencer_registers] panel > power sequencer register settings: PP_ON 0x80000001, PP_OFF 0x1, PP_DIV > 0x270f00 > > I'm left wondering what happened to the delays we had in those registers. > Somehow they got zeroed after we already computed them to !=0 values. > Althoguh that issue was already present in the "good" log. Right. I did notice those delays as well hadn't paid much attention to them, yet so I didn't check the logs for them explicitly. As it looks like a v3.13 kernel (with a regression fix for DP on top) is the most stable version on this hardware. I've seen other regressions introduced before the mentioned patch (for instance even without this patch the fb console is not entirely stable) but I haven't bisected these, yet. > Also can you make edp_panel_vdd_on() dump a backtrace so we can see where > it's coming from. I guess it's some AUX stuff (EDID read perhaps). I can't > see why it would suddenly take forever. Especially since it doesn't appear > to time out or anything. Will do! > I don't see any "disabling display" power well debug messages in the log, so > at least resetting pps_pipe doesn't seem to happen anywhere. The other part > of the patch adds the power_domain get/put around the pps_mutex, but since > we have init power enabled here, these should be mostly nops. It does mean > we grab and release the power domain mutex a few extra times as well, so > there's some overhead, but I would think there would have to some serious > contention on the lock to get this kind of effect. > > Sadly I can't reproduce this here. I will play with this some more and also check more recent kernels. (In reply to Ville Syrjala from comment #6) > Created attachment 109670 [details] [review] [review] > [PATCH] drm/i915: Try to avoid pps_{lock,unlock}() on DP ports > > Attempt at reducing the locking overhead for regular DP AUX transfers. > > In case the extra locking we now do is really the culprit, can you test this > one? Sure!
This issue is driving me nuts. I've built drm-intel-nightly yesterday but unable to reproduce the issue - in fact everything got initialized flawlessly. The patch in comment #6 didn't play a role as a. The long delay between calls to intel_dp_init_panel_power_sequencer_registers() did not exists.edp_panel_vdd_off_sync b. edp_panel_vdd_off_sync() was called a lot less often. In fact the only evidence I had that I was not dreaming last week when I openend this ticket was the log in https://bugs.freedesktop.org/attachment.cgi?id=109342 which clearly indicated that those issues existed in a kernel based on 3.18-rc4. Comparing this log with the ones I generated yesterday revealed that the drm modules were loaded a lot later during the boot process. I've now been able to rebuild a kernel which exhibits this issue. I will now go on comparing the configurations of the two builds. So far it seems that for the 'bad' kernel build the patch in comment #6 at least fixed the problem in the EDID read out which can be seen in https://bugs.freedesktop.org/attachment.cgi?id=109342 as without it I can easily reproduce the bogus EDIDs. So for now, let me figure out what is so different in the two builds of exactly the same kernel version.
Created attachment 109950 [details] [review] Fix: When enabling panel VDD cancel pending disable worker I found the time to debug this. There are two issues: 1. In edp_panel_vdd_on() we don't cancel a vdd_off worker which may be pending. This may lead to the worker firing right in the middle of an EDID transfer. While this is most likely not what is expected it didn't cause the EDID to be misread - at least for my panel. Attached os a patch that attempts to fix this. 2. The long delay happens when the HZ is set to 250. I don't see it when it is set to 1000. Here is what's happening: on eDP pps_lock()/unlock() gets called on each low level transfer. This function can be quite expensive however: when checking if the power well is enabled vlv_sideband_rw() gets called this performs its tests in a wait_for() loop. Since these tests don't seem to succeed the first time around the test is redone after an 1 ms wait. With HZ set to 250 the granularity is higher 1 ms however and those waits add up. This can be seen easily by replacing wait_for() by wait_for_atomic() which retests after a 0 wait. In this case the long delay period disappears even for HZ=250. It is however waistful to perfrom those operations at the lowest level at intel_dp_aux_ch(). It would be better to wrap the entire i2c transfer by single calls to these functions. The current design of drm_dp_helper.c doesn't allow this however. I'm going to propose a patchset for this to the ML.
The wait_for debugging seems to point at a potential solution. I think Ville is debugging this along with some internal bugs, so I'll assign to him.
(In reply to Jesse Barnes from comment #11) > The wait_for debugging seems to point at a potential solution. I think > Ville is debugging this along with some internal bugs, so I'll assign to him. -ENOCONTXT I may be missing some discussion here. The idea to avoid the numerous calls to pps_lock()/unlock() during an EDID transfer by locking on the outer instead of the inner layer to me still seems to have some merit. Daniel and Ville were afraid that the slightly simplistic solution I proposed won't scale when new DPCP transfers are added. Thus it was proposed to add the pps_lock()/unlock() operations to the outer layer where appropriate and extend the locking/unlocking itself with some ref counter to avoid deadlocking by calling these operations on the outer and inner layer. Not sure if this idea is void now or still worth pursuing - Ville?
Created attachment 110958 [details] [review] [PATCH] drm/i915: Kill check_power_well() calls Thanks to Jani we may now have a real idea why it gets so slow with the get/put. This patch should hopefully fix it. Please test. Skipping the pps_lock/unlock for DP ports might still be worthwile though.
Presumed fixed by commit 7f1241ed1a06b4846ad7a2a57eb088b757e58e16 Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Thu Dec 18 11:44:06 2014 +0200 drm/i915: Kill check_power_well() calls in drm-intel-next-fixes (and -nightly). Thanks for the report, please do reopen if the problem persists with this.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.