When running the test igt@kms_setmode@basic-clone-single-crtc on fi-skl-6700hq, starting from CI_DRM_2744, we get the following message in the test's stderr:
(kms_setmode:4033) igt-kms-WARNING: no modes for connector 48
Full logs: https://intel-gfx-ci.01.org/CI/CI_DRM_2744/fi-skl-6700hq/igt@firstname.lastname@example.org
Connector 48 from the logs is eDP-1.
What's interesting is that I see this:
[ 392.746337] [drm:drm_mode_prune_invalid] Not using 1920x1080 mode: CLOCK_HIGH
Some more debug output from dmesg:
[ 382.469405] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:eDP-1]
[ 382.469523] [drm:intel_dp_detect [i915]] [CONNECTOR:48:eDP-1]
[ 382.469656] [drm:intel_power_well_enable [i915]] enabling DC off
[ 382.469777] [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
[ 382.469871] [drm:intel_dp_detect [i915]] Display Port TPS3 support: source yes, sink yes
[ 382.469947] [drm:intel_dp_print_rates [i915]] source rates: 162000, 216000, 270000, 324000, 432000, 540000
[ 382.470015] [drm:intel_dp_print_rates [i915]] sink rates: 162000, 270000
[ 382.470081] [drm:intel_dp_print_rates [i915]] common rates: 162000, 270000
[ 382.470249] [drm:edp_panel_vdd_on [i915]] Turning eDP port A VDD on
[ 382.470501] [drm:edp_panel_vdd_on [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x0000000f
[ 382.471733] [drm:drm_dp_read_desc] DP sink: OUI 00-22-b9 dev-ID sivarT HW-rev 0.0 SW-rev 0.0 quirks 0x0000
[ 382.472916] [drm:drm_edid_to_eld] ELD: no CEA Extension found
[ 382.472944] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:eDP-1] probed modes :
[ 382.472970] [drm:drm_mode_debug_printmodeline] Modeline 49:"1920x1080" 60 138700 1920 1968 2000 2080 1080 1083 1088 1111 0x48 0x9
Ok, looks good.
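For readers following the log: the "common rates" line is simply the set of link rates present in both the source and the sink lists. A minimal sketch of that step, using the values printed above. The function name `common_rates` is hypothetical and this is not the i915 code.

```python
def common_rates(source, sink):
    """Return the link rates supported by both ends, in ascending order."""
    return sorted(set(source) & set(sink))


# Values copied from the intel_dp_print_rates lines in the log
source_rates = [162000, 216000, 270000, 324000, 432000, 540000]
sink_rates = [162000, 270000]

print(common_rates(source_rates, sink_rates))
```

With the rates from this log, only 162000 and 270000 survive, which matches the "common rates" line.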
After that we get weird HPD events and tons of dp_aux_ch timeouts.
But here is what I also noticed:
$ grep ERROR dmesg-during.log
[ 385.088911] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[ 390.392461] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[ 391.236144] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[ 407.184193] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[ 417.936164] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[ 420.311740] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
So I guess we fail with warn, without adding the dmesg error?
It should be dmesg-fail I think, from piglit/framework/dmesg.py
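A rough sketch of the expected behaviour: if the kernel log gained `*ERROR*` lines while the test ran, the test's own status is upgraded to a dmesg-* status. The mapping table is an assumption based on the expectation stated above (warn plus dmesg errors gives dmesg-fail), not a copy of piglit/framework/dmesg.py, and `update_result` is a hypothetical name.

```python
import re

ERROR_RE = re.compile(r"\*ERROR\*")

# Assumed mapping: status reported by the test -> status once dmesg errors are seen
DMESG_STATUS = {"pass": "dmesg-warn", "warn": "dmesg-fail", "fail": "dmesg-fail"}


def update_result(status, dmesg_lines):
    """Upgrade status if any dmesg line logged during the test is an error."""
    if any(ERROR_RE.search(line) for line in dmesg_lines):
        return DMESG_STATUS.get(status, status)
    return status


dmesg = [
    "[ 385.088911] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training",
]
print(update_result("warn", dmesg))
```

Statuses that are not in the table (for example skip) are returned unchanged.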
When adding a quick test for this in igt/tests/meta_test.c, it seems the bug is in intel-CI. The piglit html summary is correctly generated and shows it as dmesg-fail.
Patch sent https://patchwork.freedesktop.org/series/26130/
Manasi, now that you fixed the failing platform, could you have a look into this IGT bug?
Basically, IGT does not know how to deal with the kernel pruning modes. Do we want to simply ignore this possibility or do we want to be more robust to it?
At the very least, it would be nice to add a debug message saying that the modes may all have been pruned, according to the DP spec.
*** Bug 101519 has been marked as a duplicate of this bug. ***
Assigning Manasi to this issue, because it is a fallout of the link-status patch.
We may argue all we want about whether the platform is bad or not, but the tests are for sure wrong since they don't check for their dependencies. Sorry to throw you under the bus, Manasi, but this is only a problem after your patch.
I see the old logs. Do you have the dmesg logs from the most recent testing, after the T12 delay fix patch went in?
Do we still see AUX timeouts? Like I mentioned, the problem here is those aux timeouts. We should not have those since for an eDP panel we should never fail link training. These aux timeouts are putting the system in an unexpected state.
(In reply to Manasi from comment #8)
> I see the old logs. Do you have the dmesg logs from the most recent testing,
> after the T12 delay fix patch went in?
> Do we still see AUX timeouts? Like I mentioned, the problem here is those
> aux timeouts. We should not have those since for an eDP panel we should
> never fail link training. These aux timeouts are putting the system in an
> unexpected state.
Sure, it should not happen, but the failure mode taken is also wrong.
Why are we pruning the mode when we fail to enable link training? A mode should only be pruned if the actual link training failed.
You still need to send the hotplug event, though, to ask userspace to redo the modeset.
And also, we should NEVER prune the last mode. It is not a problem on DP, but it is on eDP. The quickest fix would be to never prune modes on eDP since there is only one mode anyway.
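To make the proposal concrete, here is a minimal sketch of a pruning step that follows both rules: skip pruning entirely on eDP, and otherwise never return an empty mode list. Everything here is hypothetical (the `prune_modes` name, the tuple layout, and the assumption that the preferred mode comes first); it is not the kernel's implementation.

```python
def prune_modes(modes, max_clock_khz, is_edp):
    """Drop modes whose pixel clock exceeds max_clock_khz.

    modes is a list of (name, clock_khz) tuples, preferred mode first
    (an assumption of this sketch).
    """
    if is_edp:
        # Proposed quick fix: never prune on eDP, there is only one mode anyway.
        return list(modes)
    kept = [m for m in modes if m[1] <= max_clock_khz]
    # Never prune the last mode: fall back to the first (preferred) one.
    return kept or list(modes[:1])


edp_modes = [("1920x1080", 138700)]
print(prune_modes(edp_modes, 100000, is_edp=True))
```

The eDP panel keeps its single 1920x1080 mode even when the clock limit is below the mode's pixel clock, so userspace never sees a connector with no modes.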
Here are newer logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2903/fi-skl-6700hq/igt@email@example.com
Please get familiar with the results we export on https://intel-gfx-ci.01.org, the landing page describes everything. If it is confusing, talk to me about it on IRC.
Probably a direct consequence of bug #101144 and the dp link status patch then nuking the mode list.
Time to revert link status handling since it doesn't work?
No, I think it is time to fix the AUX CH timeouts for good.
That's the real cause, not the DP link status patch.
That said, it might be a good idea to remove the link status and fallback handling for eDP: link training should not fail on eDP, and if it does we should just fail. There is no need to retry with a different mode since there is only one mode.
(In reply to Manasi from comment #12)
> No, I think it is time to fix the AUX CH timeouts for good.
> That's the real cause, not the DP link status patch.
> That said, it might be a good idea to remove the link status and fallback
> handling for eDP: link training should not fail on eDP, and if it does we
> should just fail. There is no need to retry with a different mode since
> there is only one mode.
Yes, nuke the mode pruning from eDP altogether. This will allow us to un-blacklist a lot of tests for this platform and we'll be able to start checking what to do with the AUX channel issue.
The system has been quite stable lately.
(In reply to Jani Saarinen from comment #14)
> The system has been quite stable lately.
Has anyone committed anything that would affect the machine?
Not sure. I asked about that on the intel-gfx mailing list too ;)
I'll resolve this now and wait for it to pop up again.
The patch that I submitted to fix the T12 delay and increase it further to 900ms got merged (SHA: 5b2eff59160e), so that could have fixed the AUX timeouts on the SKL system.
Came back again on CI_DRM_3055
Series now merged:
No Jani, this bug has not been addressed yet.
Manasi thinks it has...
Manasi, did you land the patch that prevents pruning the last mode?
OK, so this bug was mainly caused by the AUX timeouts seen on the panel and the mode getting pruned because of them.
It should therefore be fixed now that the patch series that prevents the AUX timeouts has been merged.
As a precaution, we should not prune the preferred mode on eDP.
I had submitted a patch for that as well:
I am still waiting for feedback on it, since I really want to know how to handle the following case: the preferred mode cannot be driven at the lowered link rate and we don't prune it, but then the next modeset fails in encoder config because the requested BW > available BW.
What to do in that case?
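For illustration, the "requested BW > available BW" condition can be sketched as below. The assumptions are loud ones: the link rates in the log are taken to be the link symbol clock in kHz with one payload byte per symbol per lane (8b/10b encoding), and the lane count and bpp values are made up for the example since the log does not show them. `mode_fits` is a hypothetical helper, not an i915 function.

```python
def mode_fits(pixel_clock_khz, bpp, link_clock_khz, lanes):
    """Return True if the mode's data rate fits the link (both in kBytes/s)."""
    required = (pixel_clock_khz * bpp + 7) // 8   # bits -> bytes, rounded up
    available = link_clock_khz * lanes            # assumed: 1 byte per symbol per lane
    return required <= available


PIXEL_CLOCK_KHZ = 138700  # 1920x1080 modeline from the log
BPP = 24                  # assumption, not from the log
LANES = 2                 # assumption, not from the log

for rate in (270000, 162000):
    print(rate, mode_fits(PIXEL_CLOCK_KHZ, BPP, rate, LANES))
```

Under these assumed values the mode fits at 270000 but not at 162000, which is the situation described above: after a fallback to the lower rate, the only mode the panel has no longer fits the link.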