The test igt@kms_flip@basic-flip-vs-dpms generated a dmesg error (but still passed) when running on fi-skl-6700hq on CI_DRM_2640: [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training [drm:intel_dp_start_link_train [i915]] Link Training failed at link rate = 270000, lane count = 2 This is possibly related to the following message a little above in the logs (up to the warning reported by piglit): [...] [ 420.613916] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 420.622421] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 420.630911] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 420.630939] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110 [ 420.634122] [drm:intel_dp_sink_dpms [i915]] failed to enable sink power state [...] [ 421.172326] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.180857] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.180869] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110 [ 421.180893] [drm:intel_dp_set_signal_levels [i915]] Using signal levels 00000000 [ 421.180911] [drm:intel_dp_set_signal_levels [i915]] Using vswing level 0 [ 421.180971] [drm:intel_dp_set_signal_levels [i915]] Using pre-emphasis level 0 [ 421.180991] [drm:intel_dp_program_link_training_pattern [i915]] Using DP training pattern TPS1 [...] [ 421.412348] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.420917] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.429499] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.438068] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.446637] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.455206] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7d4003ff [ 421.455218] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110 [ 421.455240] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training [ 421.455261] [drm:intel_dp_start_link_train [i915]] Link Training failed at link rate = 270000, lane count = 2 [ 421.455353] [drm:intel_dp_modeset_retry_work_fn [i915]] [CONNECTOR:48:eDP-1]
I think it still passed because now we handle link failures and so it probably executed the link training fallback and passed at the lower link rate/lane count.
(In reply to Manasi from comment #1) > I think it still passed because now we handle link failures and so it > probably executed the link training fallback and passed at the lower link > rate/lane count. It did, but that changed the result of the piglit test from pass to dmesg-warn, so we likely have to filter these strings out in IGT ;)
The same happened on pre-merge: https://intel-gfx-ci.01.org/CI/Patchwork_4772/fi-skl-6700hq/igt@kms_busy@basic-flip-default-a.html
(In reply to Manasi from comment #1) > I think it still passed because now we handle link failures and so it > probably executed the link training fallback and passed at the lower link > rate/lane count. In theory true, in practice there's 0 igt testcases that exercise this, which is a gap we should fix. Before we've fixed that gap in test coverage I don't think hiding these is a good idea.
CI_DRM_2640, CI_DRM_2641, CI_DRM_2642, CI_DRM_2644, CI_DRM_2646, CI_DRM_2650 - Dmesg warn on: [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training CI_DRM_2643 - Pass CI_DRM_2645, CI_DRM_2647, CI_DRM_2648, CI_DRM_2649, CI_DRM_2651 - Fail: (kms_flip:3948) igt-kms-CRITICAL: Failed assertion: drmModeConnectorSetProperty(fd, connector->connector_id, dpms, mode) == 0 It appears as if someone filed both failing test and dmesg-warning to this bug.
(In reply to Marta Löfstedt from comment #5) > CI_DRM_2640, CI_DRM_2641, CI_DRM_2642, CI_DRM_2644, CI_DRM_2646, CI_DRM_2650 > - Dmesg warn on: [drm:intel_dp_start_link_train [i915]] *ERROR* failed to > enable link training > > CI_DRM_2643 - Pass > > CI_DRM_2645, CI_DRM_2647, CI_DRM_2648, CI_DRM_2649, CI_DRM_2651 - Fail: > (kms_flip:3948) igt-kms-CRITICAL: Failed assertion: > drmModeConnectorSetProperty(fd, connector->connector_id, dpms, mode) == 0 > > It appears as if someone filed both failing test and dmesg-warning to this > bug. This is what happen when you reduce coverage, you don't see new errors :s Thanks for reporting this, I'll file a new bug!
Actually this one is failing because it is not able to do the drm_dp_dpcd_write over AUX is not able to write all the bytes for the DP_TRAINING_PATTERN_SET in intel_dp_set_link_train function. So it never starts link training and returns false. So this is a different case than link training failure handling. Martin could you please attach a complete dmesg log for this test on SKL 6700hq? Manasi
(In reply to Manasi from comment #7) > Actually this one is failing because it is not able to do the > drm_dp_dpcd_write over AUX is not able to write all the bytes for the > DP_TRAINING_PATTERN_SET in intel_dp_set_link_train function. So it never > starts link training and returns false. So this is a different case than > link training failure handling. Well it is *handled* in intel_dp_start_link_train() as if link training failure happened. The question is, would it have worked if we still plunged on with channel equalisation regardless of errors?
(In reply to Jani Nikula from comment #8) > Well it is *handled* in intel_dp_start_link_train() as if link training > failure happened. The question is, would it have worked if we still plunged > on with channel equalisation regardless of errors? See bug 101167.
But if the aux transactions are failing then it is a real failure and we should not try the fallback I think. So how do we differentiate between this error and the actual link training error. Manasi
(In reply to Manasi from comment #10) > But if the aux transactions are failing then it is a real failure and we > should not try the fallback I think. > So how do we differentiate between this error and the actual link training > error. > > Manasi Could we set a work item that would wait for 250ms and try again the mode-set, and only set the link-training issue at this point? I mean, if a panel does not answer to AUX requests, it is probably not up yet, right?
Created attachment 131702 [details] dmesg when only dmesg-warn
(In reply to Manasi from comment #7) ... > Martin could you please attach a complete dmesg log for this test on SKL > 6700hq? > > Manasi I added dmesg from: https://intel-gfx-ci.01.org/CI/CI_DRM_2640/fi-skl-6700hq/igt@kms_flip@basic-flip-vs-dpms.html This is when we only have dmesg-warn which this bug is about, when we also fail the test please see bug: https://bugs.freedesktop.org/show_bug.cgi?id=101167
(In reply to Martin Peres from comment #11) > (In reply to Manasi from comment #10) > > But if the aux transactions are failing then it is a real failure and we > > should not try the fallback I think. > > So how do we differentiate between this error and the actual link training > > error. > > > > Manasi > > Could we set a work item that would wait for 250ms and try again the > mode-set, and only set the link-training issue at this point? > > I mean, if a panel does not answer to AUX requests, it is probably not up > yet, right? Hi Martin, But that should happen even before starting the link training, if the AUX is not up yet, then we should not go ahead and do the clock recovery phase at all, why not wait here for 250ms and then find some way to just fail without attempting link training. I am trying to find out at what point can we first check for dp aux transactions and wait longer to try again? Manasi
(In reply to Manasi from comment #14) > (In reply to Martin Peres from comment #11) > > (In reply to Manasi from comment #10) > > > But if the aux transactions are failing then it is a real failure and we > > > should not try the fallback I think. > > > So how do we differentiate between this error and the actual link training > > > error. > > > > > > Manasi > > > > Could we set a work item that would wait for 250ms and try again the > > mode-set, and only set the link-training issue at this point? > > > > I mean, if a panel does not answer to AUX requests, it is probably not up > > yet, right? > > Hi Martin, > > But that should happen even before starting the link training, if the AUX is > not up yet, then we should not go ahead and do the clock recovery phase at > all, why not wait here for 250ms and then find some way to just fail without > attempting link training. I fully agree with the beginning. We should not go to the clock recovery phase if we did not even go through link training at all. However, not sure we can just wait there for 250ms before re-trying. This is a question for a kernel developer, as I never even read this code. > > I am trying to find out at what point can we first check for dp aux > transactions and wait longer to try again? Yes, that sounds like the right thing to do! Maybe Jani can chime in here :)
One peculiar thing see in some of the logs at least is a double long HPD when we turn on the panel. I wonder if there's some kind of transient glitch on the VDD line or something that causes to panel to do some kind of off->on->off->on transition. To get some more debug information maybe someone could hack up a test patch that checks the state of the HPD pin in a few places during the modeset; Eg. just after the panel supposedly powered on and just before starting link training, and maybe in a few places in between. What kind of machine is this? I'm wondering of the panel power delays might be off somehow, eg. the power cycle delay might be too short and the panel doesn't like it if we power it back up too soon...
(In reply to Manasi from comment #14) > I am trying to find out at what point can we first check for dp aux > transactions and wait longer to try again? Can you explain how we'd end up in link training if native aux was failing altogether? It must have worked for some time before link training, right? Ville's explanation would cover this as well.
I found another skl lap-top to test on. The issue can not be reproduced. So, at least we now have some facts that something is different with this specific machine.
(In reply to Ville Syrjala from comment #16) > What kind of machine is this? I'm wondering of the panel power delays might > be off somehow, eg. the power cycle delay might be too short and the panel > doesn't like it if we power it back up too soon... Ville, information about the lab machines is here: https://intel-gfx-ci.01.org/CI/hardware.html fi-skl-6700hq Toshiba Satellite P50-C-18C Skylake i7-6700hq HD Graphics 530 / GT2 eDP (HDMI)
Since, we don't have batteries in the lap-tops in the lab. I decided to test the far fetched idea that maybe the powering-up timing on this device could somehow be affected by battery being present or not. But this turned out to give no difference in the results.
(In reply to Ville Syrjala from comment #16) > One peculiar thing see in some of the logs at least is a double long HPD > when we turn on the panel. I wonder if there's some kind of transient glitch > on the VDD line or something that causes to panel to do some kind of > off->on->off->on transition. > > To get some more debug information maybe someone could hack up a test patch > that checks the state of the HPD pin in a few places during the modeset; Eg. > just after the panel supposedly powered on and just before starting link > training, and maybe in a few places in between. > > What kind of machine is this? I'm wondering of the panel power delays might > be off somehow, eg. the power cycle delay might be too short and the panel > doesn't like it if we power it back up too soon... Ville, I have looked through a lot of the dmesg and this double long HPD thing appear to always happen before we get the aux timeout and after that the link training dmesg-warn (and after that the bug 101167 issue), actually I have even seen 3 long HPDs in a short time.
(In reply to Jani Nikula from comment #17) > (In reply to Manasi from comment #14) > > I am trying to find out at what point can we first check for dp aux > > transactions and wait longer to try again? > > Can you explain how we'd end up in link training if native aux was failing > altogether? It must have worked for some time before link training, right? > > Ville's explanation would cover this as well. Actually it fails inside the function intel_dp_link_training_clock_reciovery() when it tries to write the TP1 and errors out saying "Failed to enable link training". Yes but it worked fine for DPCD reads over AUX, so that's why I am confused as to where we need to add the delay for the AUX reads? Also, shouldn't the spec for that specific panel or the eDP spec talk about how much delay should be given for the AUX read/writes? Any thoughts on double checking this in the spec? Manasi
(In reply to Manasi from comment #22) > (In reply to Jani Nikula from comment #17) > > (In reply to Manasi from comment #14) > > > I am trying to find out at what point can we first check for dp aux > > > transactions and wait longer to try again? > > > > Can you explain how we'd end up in link training if native aux was failing > > altogether? It must have worked for some time before link training, right? > > > > Ville's explanation would cover this as well. > > Actually it fails inside the function > intel_dp_link_training_clock_reciovery() when it tries to write the TP1 and > errors out saying "Failed to enable link training". > Yes but it worked fine for DPCD reads over AUX, so that's why I am confused > as to where we need to add the delay for the AUX reads? > > Also, shouldn't the spec for that specific panel or the eDP spec talk about > how much delay should be given for the AUX read/writes? > Any thoughts on double checking this in the spec? > > Manasi Manasi, I believe that one theory is that device have actually become broken/unstable and that we should try to prove this somehow. If you or Ville has some ideas on how to prove/disprove this, could you try this on the try-bot, or provide a patch that I can run on the machine early in the morning when the CI is usually idle.
That's is why I was trying to see if we could find an exact same machine with the same eDP panel here and if we cant reproduce then we know that may be that device is unstable. Do you have the remote login details for that system? Else I will work on a patch to give to you for testing. Manasi
Created attachment 131941 [details] [review] Test Patch to see if powering up the panel again helps
Hi Marta, After looking through the dmesg logs, looks like in the modeset in intel_ddi_pre_enable_dp(), after the edp panel is powered on, it gets two /three HPD pulses after which it shows AUX ch timeouts. That could be because the panel is not powered on or eDP connector is disconnected. The attached patch adds debug print for checking the connector status here and also powers up panel again just before link training. Could you please test it out and see what happens and attach the debug log? Manasi (In reply to Marta Löfstedt from comment #23) > (In reply to Manasi from comment #22) > > (In reply to Jani Nikula from comment #17) > > > (In reply to Manasi from comment #14) > > > > I am trying to find out at what point can we first check for dp aux > > > > transactions and wait longer to try again? > > > > > > Can you explain how we'd end up in link training if native aux was failing > > > altogether? It must have worked for some time before link training, right? > > > > > > Ville's explanation would cover this as well. > > > > Actually it fails inside the function > > intel_dp_link_training_clock_reciovery() when it tries to write the TP1 and > > errors out saying "Failed to enable link training". > > Yes but it worked fine for DPCD reads over AUX, so that's why I am confused > > as to where we need to add the delay for the AUX reads? > > > > Also, shouldn't the spec for that specific panel or the eDP spec talk about > > how much delay should be given for the AUX read/writes? > > Any thoughts on double checking this in the spec? > > > > Manasi > > Manasi, I believe that one theory is that device have actually become > broken/unstable and that we should try to prove this somehow. > > If you or Ville has some ideas on how to prove/disprove this, could you try > this on the try-bot, or provide a patch that I can run on the machine early > in the morning when the CI is usually idle.
Created attachment 131945 [details] dmesg for Mansis test patch Manasi, here is dmesg with you testpatch. igt test started: [ 89.255054] [IGT] kms_flip: executing subtest started: [ 89.273659] [IGT] kms_flip: starting subtest basic-flip-vs-dpms first everything looks fine: "[ 89.575307] [drm:intel_hpd_irq_handler] Received HPD interrupt on PIN 4 - cnt: 0 ... [ 90.345461] [drm:intel_ddi_pre_enable] Manasi Debug [CONNECTOR:48:eDP-1] status connected [ 90.345763] [drm:edp_panel_vdd_on] PP_STATUS: 0x80000008 PP_CONTROL: 0x0000000b [ 90.346891] [drm:intel_dp_set_signal_levels] Using signal levels 00000000" This HPD also goes fine: [ 91.624826] [drm:intel_hpd_irq_handler] digital hpd port A - long [ 91.689787] [drm:intel_ddi_pre_enable] Manasi Debug [CONNECTOR:48:eDP-1] status connected Then we run into the usual issue, i.e. double HPDs: "[ 92.925623] [drm:intel_hpd_irq_handler] Received HPD interrupt on PIN 4 - cnt: 1 ... [ 92.961714] [drm:intel_hpd_irq_handler] Received HPD interrupt on PIN 4 - cnt: 2 ... [ 92.990129] [drm:intel_ddi_pre_enable] Manasi Debug [CONNECTOR:48:eDP-1]" Then the usual aux timeouts: "[ 92.990367] [drm:edp_panel_vdd_on] Turning eDP port A VDD on [ 92.990414] [drm:edp_panel_vdd_on] PP_STATUS: 0x80000008 PP_CONTROL: 0x0000000b [ 92.998844] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7c1003ff" and then: [ 94.616008] [drm:intel_dp_start_link_train] *ERROR* failed to enable link training
Hi Marta, Yes that didn't help since the driver thinks that the eDP panel is already ON, but may be its turning on and off many times generating multiple HPD pulses. So just disregard the patch. One more thing I want you to try is in the igt test itself (kms_flip.c), in the run_test_on_crtc_set() function, I see that kmstest_unset_all_crtcs() is called which actually powers off the panel. Could you add a usleep() call here to sleep for a few ms before actually doing a modeset where it will power the eDP panel back on. This might give some time to the panel power sequencer to stabilize before turning it back on. This is one of the tests, I have few more things to debug in mind and I will send details regarding the second test soon.
Hi Marta, This is the second test I want you to do with the attached patch Manasi
Created attachment 131965 [details] [review] debug_patch_0614
(In reply to Ville Syrjala from comment #16) > One peculiar thing see in some of the logs at least is a double long HPD > when we turn on the panel. I wonder if there's some kind of transient glitch > on the VDD line or something that causes to panel to do some kind of > off->on->off->on transition. > > To get some more debug information maybe someone could hack up a test patch > that checks the state of the HPD pin in a few places during the modeset; Eg. > just after the panel supposedly powered on and just before starting link > training, and maybe in a few places in between. > > What kind of machine is this? I'm wondering of the panel power delays might > be off somehow, eg. the power cycle delay might be too short and the panel > doesn't like it if we power it back up too soon... Hi Ville, I want to add the check for HPD live status after the intel_edp_panel_on() call in intel_ddi_pre_enable_dp() and right before intel_dp_start_link_train(). I am looking at the register PORT_HOTPLUG_STAT in the bspec that gives the live status. But it does not have a bit for Port A, how do I check the live status for Port A?
(In reply to Manasi from comment #28) > Hi Marta, > > Yes that didn't help since the driver thinks that the eDP panel is already > ON, but may be its turning on and off many times generating multiple HPD > pulses. So just disregard the patch. > > One more thing I want you to try is in the igt test itself (kms_flip.c), in > the run_test_on_crtc_set() function, I see that kmstest_unset_all_crtcs() is > called which actually powers off the panel. Could you add a usleep() call > here to sleep for a few ms before actually doing a modeset where it will > power the eDP panel back on. > This might give some time to the panel power sequencer to stabilize before > turning it back on. > > This is one of the tests, I have few more things to debug in mind and I will > send details regarding the second test soon. I tested sleeping for 50ms, but is doesn't help. Also, the issue can also start at: igt@kms_busy@basic-flip-default-*.
(In reply to Manasi from comment #29) > Hi Marta, > > This is the second test I want you to do with the attached patch > > Manasi With the patch that force the eDP to not be turned off, the test pass. I believe the try-bot would be faster than the Marta-bot for this type of testing ;) Marta
Hi Marta, Could you apply Ville's patch : https://patchwork.freedesktop.org/patch/161828/ and then my debug patch attached? Then I want you to give me two dmesg logs: 1. Boot the system and capture the dmesg log (Hopefully you get some display on edp panel) 2. Boot the system and run only 1 subtest from kms_flip.c that is giving this error , so just run the flip-vs-dpms test - collect dmesg 3. Boot the system and run some other subtest from kms_flip.c that is not having this error/warning - collect dmesg 4. Boot the system and do a full kms_flip.c run and collect dmesg. Thanks you so much for helping out. I will try to stay up later in the night so I can be online when you are testing. Manasi
Created attachment 131988 [details] [review] debug_patch_0615
After finishing the previous tests, could you please do the same tests with Ville's Patch, my debug patch and the third patch (debug_with_panel_power_cycle) attached above? Thanks a lot for your time. Manasi
Created attachment 131989 [details] [review] debug_with_power_cycle_0615
Created attachment 131996 [details] dmesg 2 Dmesg for Manasis requests: 1. Boot the system and capture the dmesg log (Hopefully you get some display on edp panel) 2. Boot the system and run only 1 subtest from kms_flip.c that is giving this error , so just run the flip-vs-dpms test - collect dmesg
Created attachment 131997 [details] Dmesg 3 Dmesg for Manasis request: 3. Boot the system and run some other subtest from kms_flip.c that is not having this error/warning - collect dmesg
Created attachment 131998 [details] Dmesg 4 Dmesg for Manasis request: 4. Boot the system and do a full kms_flip.c run and collect dmesg.
Created attachment 131999 [details] dmesg 2 debug_with_panel_power_cycle dmesg for Manasis request: After finishing the previous tests, could you please do the same tests with Ville's Patch, my debug patch and the third patch (debug_with_panel_power_cycle) attached above?
Manasi, I added a tweak to you last patch: diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c index 2750633..dff8830 100644 --- a/drivers/gpu/drm/i915/intel_ddi.c +++ b/drivers/gpu/drm/i915/intel_ddi.c @@ -1669,6 +1669,7 @@ static void intel_ddi_pre_enable_dp(struct intel_encoder *encoder, DRM_DEBUG_KMS("\nManasi Debug: HPD Live status after turning Panel On = %d", test); if (!test) { DRM_DEBUG_KMS("\nManasi: HPD not active still so power off and power on"); + intel_dp->want_panel_vdd = true; intel_edp_panel_off(intel_dp); test = intel_digital_port_connected(dev_priv, dig_port); DRM_DEBUG_KMS("\nManasi Debug: HPD Live status after turning Panel On = %d", test); Now the test pass.
Created attachment 132000 [details] Ville + 2 Manasi + tweak This is dmesg where the test passed a couple of times.
NOTE: I have now managed to hang the machine hard with the patches: Ville + 2 x Manasi + tweak. So, I assume the turn off then on again needs to be done properly. I have also managed to fail the test with above patches.
Thanks for tests, I will analyze all the collected logs now.
You said the test passes with these, does it hang as well? (In reply to Marta Löfstedt from comment #44) > NOTE: I have now managed to hang the machine hard with the patches: > Ville + 2 x Manasi + tweak. > > So, I assume the turn off then on again needs to be done properly. > > I have also managed to fail the test with above patches.
(In reply to Marta Löfstedt from comment #38) > Created attachment 131996 [details] > dmesg 2 > > Dmesg for Manasis requests: > > 1. Boot the system and capture the dmesg log (Hopefully you get some display > on edp panel) > 2. Boot the system and run only 1 subtest from kms_flip.c that is giving > this error , so just run the flip-vs-dpms test - collect dmesg Thanks Marta. Does this include the entire log when the system is booting up? If it is then it is a problem because I do not see the DMESG statements in intel_dp_init_panel_power_sequencer() which means it might not have initialized the delays for power on and off correctly. Can you make sure this includes the dmesg log at the system boot time when it will initialize the edp panel with the VBT values. Manasi
(In reply to Marta Löfstedt from comment #43) > Created attachment 132000 [details] > Ville + 2 Manasi + tweak > > This is dmesg where the test passed a couple of times. I donno how the test passed here since the dmesg log still shoes no difference and still see the AUX CH timeouts. Is this the correct log? Manai
I have sent several patch sets to trybot for trying out several experiments with panel power delays. Lets see how the results look. Manasi
I tried a few things on Trybot, but I am guessing the system probably needs a hard reboot or something for the edp panel to be back on Marta, Could you first test the attached patches set 1? And then also test patches Set 2? Manasi
Created attachment 132103 [details] [review] 1_Patch_set_1
Created attachment 132104 [details] [review] 2_Patch_Set_1
Created attachment 132105 [details] [review] 3_Patch_Set_1
Created attachment 132106 [details] [review] 1_Patch_Set_2
Created attachment 132107 [details] [review] 2_Patch_Set_2
Created attachment 132108 [details] [review] 3_Patch_Set_3
Comment on attachment 132108 [details] [review] 3_Patch_Set_3 This is the 3 rd patch from Patch set 2
Actually I figured the Patchwork link would probably be the best: Test 1: https://patchwork.freedesktop.org/series/26090/ Test2: https://patchwork.freedesktop.org/series/26091/
Here's an update on this bug: Found that this specific panel requires a higher panel power cycle delay no less than 800msecs which is 300ms greater than the eDP min value. With this value it is able to power cycle the edp panel correctly and can turn it back on correctly without any aux ch timeouts. I tried this fix on Trybot multiple times and consistently got all Success on 6700hq system: https://patchwork.freedesktop.org/series/26172/ I am working on a patch that would add it in the quirk database for this panel. JaniN, Ville let me know what you think about this quirk? Manasi
(In reply to Manasi from comment #59) > Here's an update on this bug: > > Found that this specific panel requires a higher panel power cycle delay no > less than 800msecs which is 300ms greater than the eDP min value. With this > value it is able to power cycle the edp panel correctly and can turn it back > on correctly without any aux ch timeouts. > > I tried this fix on Trybot multiple times and consistently got all Success > on 6700hq system: > https://patchwork.freedesktop.org/series/26172/ > > I am working on a patch that would add it in the quirk database for this > panel. > > JaniN, Ville let me know what you think about this quirk? > > Manasi A quirk for this panels sound good to me. We have run the test on another eDP SKL laptop and this issue couldn't be reproduced.
I am working on adding this to the EDID quirk database. Marta, could you please give me EDID hexdump on that system? cd /sys/class/drm/card0-edp/ hexdump -C edid I will need EDID to find the correct Product ID and Vendor ID to add to the quirk database. Thanks Manasi
00000000 00 ff ff ff ff ff ff 00 30 e4 70 04 00 00 00 00 |........0.p.....| 00000010 00 18 01 04 95 23 13 78 ea dc 95 a3 58 55 a0 26 |.....#.x....XU.&| 00000020 0d 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01 |.PT.............| 00000030 01 01 01 01 01 01 2e 36 80 a0 70 38 1f 40 30 20 |.......6..p8.@0 | 00000040 35 00 59 c2 10 00 00 1a 00 00 00 00 00 00 00 00 |5.Y.............| 00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 fe 00 4c |...............L| 00000060 47 20 44 69 73 70 6c 61 79 0a 20 20 00 00 00 fe |G Display. ....| 00000070 00 4c 50 31 35 36 57 46 36 2d 53 50 41 31 00 7b |.LP156WF6-SPA1.{| 00000080
Couple of observations I made: [ 5.001356] [drm:intel_pps_dump_state [i915]] cur t1_t3 0 t8 0 t9 0 t10 500 t11_t12 6000 [ 5.001412] [drm:intel_pps_dump_state [i915]] vbt t1_t3 2000 t8 2000 t9 2000 t10 500 t11_t12 5000 So the BIOS doesn't program the t1+t3 delay at all. Not sure why. The VBT seems to have these pps delays for most of the panel indexes: Power Sequence: T3 2000 T7 10 T9 2000 T10 500 T12 5000 except panel 7 (which is what we end up using: Power Sequence: T3 2000 T7 2000 T9 2000 T10 500 T12 5000 So only T7 is changed, which shouldn't matter anyway. The panel spec would appear to list the following requirements: T1 0.5-10 ms T2/T3 0-200 ms T11 <= 10ms T12 >= 120 ms So it looks like the values we use should be fine in theory.
Quirk patches have been submitted to the intel-gfx M-L and it is passing all the CI tests now. Needs to be reviewed by Ville
This fixed the issue: c99a259 drm/i915/edp: Add a T12 panel delay quirk to fix DP AUX CH timeouts
No need to ask QA to verify, so forcing to CLOSED.
Oh well, this is still visible, even without the AUX timeouts: https://intel-gfx-ci.01.org/CI/CI_DRM_2801/fi-skl-6700hq/igt@kms_busy@basic-flip-default-b.html
There is AUX timeouts in this as well, but the delay is set to 800 ms as per the quirk. So for now I have started to believe that this system is probably really unstable and is giving AUX timeouts without any certain pattern/behavior. We should remove this system from CI
(In reply to Manasi from comment #68) > There is AUX timeouts in this as well, but the delay is set to 800 ms as per > the quirk. So for now I have started to believe that this system is probably > really unstable and is giving AUX timeouts without any certain > pattern/behavior. We should remove this system from CI Isn't that a good test for IGT then? Right now, IGT assumes that it can always apply a resolution, and that is not true for DP. We may end up getting rid of the system, but what I see right now is software that is not ready for the new requirements forced upon us by the display port.
But the error is seen because of AUX timeouts and this is happening because the systems looks unstable, because despite the edp panel power cycle delays as per the spec and increasing them as per the quirk, we still see the AUX timeouts. And since this is very specific to this system and this test, this is more of an issue with the system and not the SW. Manasi
Martin, is this still seen on CI? I tried increasing the delay further and I see it only runs 3 IGT tests and those are all passing on SKL 6700hq system. Manasi
(In reply to Manasi from comment #71) > Martin, is this still seen on CI? I tried increasing the delay further and I > see it only runs 3 IGT tests and those are all passing on SKL 6700hq system. > > Manasi Indeed, they have all been passing since the run after I re-opened the bug. Maybe there was more than one run in the queue and I got confused? Well, let's hope so, but I am sorry for wasting your time by re-opening the bug too quickly :s I'll close the bug!
Aaannnnd it's back... https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2903/fi-skl-6700hq/igt@kms_busy@basic-flip-default-a.html
add one more: igt@kms_busy@basic-flip-a
From the logs it looks like the quirk is getting applied but still there are aux channel timeouts. Ia m going to try increasing the power cycle delay even more and see what happens. This system is connected to Trybot right so I can do some tests by pushing patches to trybot? Manasi
According to this change on https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3023/commits_short.log But still fails: https://intel-gfx-ci.01.org/tree/drm-tip/fi-skl-6700hq.html
But now system has been quite stable. Any reason why?
Lets resolve now and wait to pop up again.
Again unfortunately.
Hmm, why is this system so flaky... Do you have the latest logs where you saw this again? I will start working on this bug as the topmost priority. Manasi
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3055/fi-skl-6700hq/igt@kms_flip@basic-flip-vs-dpms.html
Is it possible to remove the system from CI and give me a remote access to this so I can run this specific IGT test on that since the error could be in the way the test is power cycling the panel. Manasi
Series merged https://patchwork.freedesktop.org/series/31361/
Unfortunately this issue appear to be back on IGT_4057: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_busy@basic-flip-b.html dmesg-warn: [ 302.302951] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training then this is folloed by dmesg-fail or warn on on the following tests: (kms_busy:3998) igt-kms-WARNING: connector 59/eDP-1 has no modes [ 310.115358] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_busy@basic-flip-c.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-flip-after-cursor-varying-size.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-flip-before-cursor-legacy.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@bad-nb-words-1.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@bad-nb-words-3.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@bad-pipe.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@bad-source.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b-frame-sequence.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-c.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-c-frame-sequence.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@read-crc-pipe-a.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@read-crc-pipe-b.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@read-crc-pipe-c.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@read-crc-pipe-c-frame-sequence.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_psr_sink_crc@psr_basic.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_setmode@basic-clone-single-crtc.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@kms_sink_crc_basic.html https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/fi-skl-6700hq/igt@pm_backlight@basic-brightness.html
The issue was not reproduced on IGT_4058, so I don't believe this is a simple regression.
I really feel that this system SKL 6700hq is unstable since this happens because the panel doesnt turn on inspite of the quirk for increasing the panel power cycle delay. With that delay it works for few days and then needs increased delay.
Is this happening on a KBL system? Could you attach the logs please? Manasi
I will close this again since the issue has not been seen since that bad run.
Unfortunately I have to reopen this again: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3628/fi-skl-6700hq/igt@kms_flip@basic-flip-vs-dpms.html
Here the issue happened again: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4146/issues.html
I will close this again since it hasn't happened for a long time.
Reopened since the issue was hit while running the shards testlist on BAT machines: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-skl-6700hq/igt@kms_flip@basic-flip-vs-modeset.html [ 594.867729] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training [ 594.867808] [drm:intel_dp_start_link_train [i915]] *ERROR* [CONNECTOR:71:eDP-1] Link Training failed at link rate = 270000, lane count = 2
I am going to archive this old bug from cibuglog and create a new one.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.