Bug 101518 - [BAT][SKL] WARNING: no modes for connector 48 when running igt@kms_setmode@basic-clone-single-crtc
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: IGT
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Manasi
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 101519
Depends on:
Blocks:
 
Reported: 2017-06-20 13:38 UTC by Martin Peres
Modified: 2018-06-19 13:53 UTC
CC List: 2 users

See Also:
i915 platform: ALL
i915 features:


Description Martin Peres 2017-06-20 13:38:03 UTC
When running the test igt@kms_setmode@basic-clone-single-crtc on fi-skl-6700hq, starting from CI_DRM_2744, we get the following message in the test's stderr:
(kms_setmode:4033) igt-kms-WARNING: no modes for connector 48

Full logs: https://intel-gfx-ci.01.org/CI/CI_DRM_2744/fi-skl-6700hq/igt@kms_setmode@basic-clone-single-crtc.html
Comment 1 Maarten Lankhorst 2017-06-21 08:57:29 UTC
Connector 48 from the logs is eDP-1.

What's interesting is that I see this:
[  392.746337] [drm:drm_mode_prune_invalid] Not using 1920x1080 mode: CLOCK_HIGH

Some more debugging from dmesg..

[  382.469405] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:eDP-1]
[  382.469523] [drm:intel_dp_detect [i915]] [CONNECTOR:48:eDP-1]
[  382.469656] [drm:intel_power_well_enable [i915]] enabling DC off
[  382.469777] [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
[  382.469871] [drm:intel_dp_detect [i915]] Display Port TPS3 support: source yes, sink yes
[  382.469947] [drm:intel_dp_print_rates [i915]] source rates: 162000, 216000, 270000, 324000, 432000, 540000
[  382.470015] [drm:intel_dp_print_rates [i915]] sink rates: 162000, 270000
[  382.470081] [drm:intel_dp_print_rates [i915]] common rates: 162000, 270000
[  382.470249] [drm:edp_panel_vdd_on [i915]] Turning eDP port A VDD on
[  382.470501] [drm:edp_panel_vdd_on [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x0000000f
[  382.471733] [drm:drm_dp_read_desc] DP sink: OUI 00-22-b9 dev-ID sivarT HW-rev 0.0 SW-rev 0.0 quirks 0x0000
[  382.472916] [drm:drm_edid_to_eld] ELD: no CEA Extension found
[  382.472944] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:eDP-1] probed modes :
[  382.472970] [drm:drm_mode_debug_printmodeline] Modeline 49:"1920x1080" 60 138700 1920 1968 2000 2080 1080 1083 1088 1111 0x48 0x9

Ok, looks good.

Now getting weird HPD and tons of dp_aux_ch timeouts..

But what I also noticed..

$ grep ERROR dmesg-during.log 
[  385.088911] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  390.392461] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  391.236144] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[  407.184193] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[  417.936164] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[  420.311740] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status

So I guess we fail with warn, without adding the dmesg error?
Comment 2 Maarten Lankhorst 2017-06-21 09:06:52 UTC
It should be dmesg-fail I think, from piglit/framework/dmesg.py
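
As a rough illustration of the status handling discussed here, the sketch below scans a dmesg capture for `*ERROR*` lines and upgrades the test result when any are found. The mapping (pass becomes dmesg-warn, warn and fail become dmesg-fail) is an assumption modelled on my reading of piglit/framework/dmesg.py, not a copy of it, and the helper names are hypothetical.

```python
# Hypothetical helper names; the mapping is an assumption modelled on
# piglit's framework/dmesg.py, not a copy of it.
DMESG_STATUS = {
    "pass": "dmesg-warn",
    "warn": "dmesg-fail",
    "fail": "dmesg-fail",
}


def dmesg_errors(lines):
    """Return the dmesg lines that carry the kernel's *ERROR* marker."""
    return [line for line in lines if "*ERROR*" in line]


def update_result(result, new_dmesg_lines):
    """Upgrade a test result when new dmesg errors appeared during the run."""
    if not dmesg_errors(new_dmesg_lines):
        return result
    # Statuses without a dmesg variant (e.g. "skip") are left untouched.
    return DMESG_STATUS.get(result, result)


# Two lines taken from the logs quoted in comment 1.
log = [
    "[  385.088911] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training",
    "[  392.746337] [drm:drm_mode_prune_invalid] Not using 1920x1080 mode: CLOCK_HIGH",
]
print(update_result("warn", log))
```

Under that assumed mapping, a test that ends in warn while the kernel logs an `*ERROR*` line would be reported as dmesg-fail rather than plain warn.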
Comment 3 Maarten Lankhorst 2017-06-21 09:24:27 UTC
When adding a quick test for this in igt/tests/meta_test.c, it seems the bug is in intel-CI. The piglit html summary is correctly generated and shows it as dmesg-fail.
Comment 4 Maarten Lankhorst 2017-06-21 09:32:50 UTC
Patch sent https://patchwork.freedesktop.org/series/26130/
Comment 5 Martin Peres 2017-07-05 07:50:09 UTC
Manasi, now that you fixed the failing platform, could you have a look into this IGT bug?

Basically, IGT does not know how to deal with the kernel pruning modes. Do we want to simply ignore this possibility or do we want to be more robust to it?

At the very least, it would be nice to add a debug message saying that the modes may all have been pruned, according to the DP spec.
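
To make the "more robust" option concrete, here is a minimal sketch of the behaviour being asked for: log why the mode list may be empty, then skip rather than fail. It is a plain Python model, not IGT code; `SkipRequested` and `pick_mode` are hypothetical names.

```python
class SkipRequested(Exception):
    """Hypothetical stand-in for a test framework's skip mechanism."""


def pick_mode(connector_id, modes, log=print):
    """Return the first mode of a connector, or skip if none are left."""
    if not modes:
        # Debug hint so the reader of the log knows pruning is a likely cause.
        log(f"no modes for connector {connector_id}; "
            "they may all have been pruned after a link failure")
        raise SkipRequested(f"connector {connector_id} has no usable modes")
    return modes[0]


try:
    pick_mode(48, [])
except SkipRequested as reason:
    print("SKIP:", reason)
```

The point of the design is that a missing dependency (no usable mode) ends the test as a skip with an explanation, instead of a warning followed by a failure.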
Comment 6 Martin Peres 2017-07-05 07:51:44 UTC
*** Bug 101519 has been marked as a duplicate of this bug. ***
Comment 7 Martin Peres 2017-07-31 13:08:19 UTC
Assigning Manasi to this issue, because it is a fallout of the link-status patch.

We may argue all we want about whether the platform is bad or not, but the tests are for sure wrong since they don't check for their dependencies. Sorry to throw you under the bus, Manasi, but this is only a problem after your patch.
Comment 8 Manasi 2017-08-03 18:25:45 UTC
I see the old logs. Do you have the dmesg logs from the most recent testing, after the T12 delay fix patch went in?
Do we still see AUX timeouts? Like I mentioned, the problem here is those AUX timeouts. We should not have them, since link training should never fail for an eDP panel. These AUX timeouts are putting the system in an unexpected state.
Comment 9 Martin Peres 2017-08-03 20:13:14 UTC
(In reply to Manasi from comment #8)
> I see the old logs. Do you have the dmesg logs from the most recent testing,
> after the T12 delay fix patch went in?
> Do we still see AUX timeouts? Like I mentioned, the problem here is those
> AUX timeouts. We should not have them, since link training should never fail
> for an eDP panel. These AUX timeouts are putting the system in an
> unexpected state.

Sure, it should not happen, but the failure mode taken is also wrong.

Why are we pruning the mode when we fail to enable link training? A mode should only be pruned if the actual link training failed.

You still need to send the hotplug event, though, to ask userspace to redo the modeset.

And also, we should NEVER prune the last mode. It is not a problem on DP, but it is on eDP. The quickest fix would be to never prune modes on eDP since there is only one mode anyway.
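
A minimal sketch of the rule proposed above, written as a plain Python model rather than kernel code. All names (`prune_modes`, `required_rate`, the mode strings) are hypothetical, and keeping the first entry as the preferred mode is an assumption about list ordering.

```python
def prune_modes(modes, max_data_rate, required_rate, is_edp):
    """Drop modes the link cannot carry, but never empty an eDP mode list.

    modes         -- list of mode names (hypothetical representation)
    max_data_rate -- what the current link configuration can carry
    required_rate -- callable returning the rate a given mode needs
    is_edp        -- True for a fixed internal panel
    """
    kept = [m for m in modes if required_rate(m) <= max_data_rate]
    if is_edp and not kept and modes:
        # Assumption: the first entry is the panel's preferred mode.
        # A fixed panel has nothing to fall back to, so keep it.
        return modes[:1]
    return kept


rates = {"1920x1080": 300, "1024x768": 100}  # made-up relative numbers
print(prune_modes(["1920x1080"], 200, rates.get, is_edp=True))
print(prune_modes(["1920x1080", "1024x768"], 200, rates.get, is_edp=False))
```

On an external DP connector the list can still shrink to the modes that fit, while an eDP connector always keeps at least one mode for userspace to try.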
Comment 10 Martin Peres 2017-08-03 20:15:41 UTC
Here are newer logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2903/fi-skl-6700hq/igt@kms_busy@basic-flip-default-a.html

Please get familiar with the results we export on https://intel-gfx-ci.01.org, the landing page describes everything. If it is confusing, talk to me about it on IRC.
Comment 11 Daniel Vetter 2017-08-07 09:05:50 UTC
Probably a direct consequence of bug #101144 and the dp link status patch then nuking the mode list.

Time to revert link status handling since it doesn't work?
Comment 12 Manasi 2017-08-15 19:46:54 UTC
No, I think it is time to fix the AUX CH timeouts for good.
That's the real cause, not the DP link status patch.
Although it might be a good idea to remove the link status and fallback handling for eDP: link training should not fail there, and if it does, we can just fail. There is no need to retry with a different mode since there is only one mode.

Manasi
Comment 13 Martin Peres 2017-08-15 20:12:34 UTC
(In reply to Manasi from comment #12)
> No, I think it is time to fix the AUX CH timeouts for good.
> That's the real cause, not the DP link status patch.
> Although it might be a good idea to remove the link status and fallback
> handling for eDP: link training should not fail there, and if it does, we
> can just fail. There is no need to retry with a different mode since there
> is only one mode.
> 
> Manasi

Yes, nuke the mode pruning from eDP altogether. This will allow us to un-blacklist a lot of tests for this platform and we'll be able to start checking what to do with the AUX channel issue.
Comment 14 Jani Saarinen 2017-09-05 11:52:46 UTC
The system has been quite stable lately.
Comment 15 Martin Peres 2017-09-05 13:02:17 UTC
(In reply to Jani Saarinen from comment #14)
> The system has been quite stable lately.

Has anyone committed anything that would affect the machine?
Comment 16 Jani Saarinen 2017-09-05 14:22:50 UTC
Not sure. I asked that on the intel-gfx ML too ;)
Comment 17 Jani Saarinen 2017-09-05 14:31:42 UTC
I'll resolve this now and wait to see if it pops up again.
Comment 18 Manasi 2017-09-05 18:49:53 UTC
So the patch that I submitted to fix the T12 delay and increase it further to 900ms got merged (SHA: 5b2eff59160e), so that could have fixed the AUX timeouts on the SKL system.
Comment 19 Jani Saarinen 2017-09-07 17:35:55 UTC
Came back again on CI_DRM_3055:
https://intel-gfx-ci.01.org/tree/drm-tip/fi-skl-6700hq.html
Comment 20 Jani Saarinen 2017-10-05 06:59:37 UTC
Series now merged:
https://patchwork.freedesktop.org/series/31361/
Comment 21 Martin Peres 2017-10-05 08:18:22 UTC
No Jani, this bug has not been addressed yet.
Comment 22 Jani Saarinen 2017-10-06 07:47:56 UTC
Manasi thinks it has...
Comment 23 Martin Peres 2017-10-06 09:53:30 UTC
Manasi, did you land the patch that prevents pruning the last mode?
Comment 24 Manasi 2017-10-06 19:19:24 UTC
OK, so this bug was mainly caused by the AUX timeouts seen on the panel and the mode getting pruned because of that.
So this bug should be fixed, because the patch series that prevents the AUX timeouts got merged.
As a precaution, we should not prune the preferred mode on eDP.
I had submitted a patch for that as well:
https://patchwork.freedesktop.org/series/31102/
And I am still waiting for feedback on this, since I really want to know how to handle the case where the preferred mode cannot be carried at the lowered link rate: we don't prune it, but then the next modeset fails encoder config because the requested BW > available BW.
What to do in that case?

Manasi
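
The bandwidth question above can be worked through with the numbers from comment 1. The sketch below is a simplified model of the check, under stated assumptions: the link rates in the log are link symbol clocks in kHz carrying one byte per lane per clock, the lowest colour depth considered is 18 bpp, and the lane count (which the logs do not show) is varied between 1 and 2. The function names are illustrative only.

```python
def link_required(pixel_clock_khz, bpp):
    """Data rate a mode needs in kB/s, rounded up."""
    return (pixel_clock_khz * bpp + 7) // 8


def max_data_rate(link_clock_khz, lanes):
    """Link capacity in kB/s, assuming one byte per lane per symbol clock."""
    return link_clock_khz * lanes


def mode_fits(pixel_clock_khz, bpp, link_clock_khz, lanes):
    """True when the link can carry the mode at the given colour depth."""
    return link_required(pixel_clock_khz, bpp) <= max_data_rate(link_clock_khz, lanes)


PIXEL_CLOCK_KHZ = 138700  # from the 1920x1080 modeline in comment 1
MIN_BPP = 18              # assumption: 6 bits per colour as the lowest depth

for link_clock_khz in (270000, 162000):  # common rates from the log
    for lanes in (2, 1):                 # assumption: lane count is not in the log
        ok = mode_fits(PIXEL_CLOCK_KHZ, MIN_BPP, link_clock_khz, lanes)
        print(f"{link_clock_khz} kHz x {lanes} lane(s): {'fits' if ok else 'CLOCK_HIGH'}")
```

With these assumptions the mode needs 312075 kB/s at 18 bpp, so a single lane cannot carry it at either common rate. That is the situation described above: if the fallback lowers the link below what the only mode needs and the mode is kept anyway, the next modeset has no valid configuration to pick.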
Comment 25 Jani Saarinen 2018-05-23 10:23:24 UTC
Ping, what to do with this?
Comment 26 Martin Peres 2018-05-23 21:34:54 UTC
(In reply to Jani Saarinen from comment #25)
> Ping, what to do with this?

The patch Manasi was proposing needs to land. Then we also need to make IGT more resistant against modes disappearing (skip instead of fail).
Comment 27 Martin Peres 2018-06-19 13:53:07 UTC
(In reply to Martin Peres from comment #26)
> (In reply to Jani Saarinen from comment #25)
> > Ping, what to do with this?
> 
> The patch Manasi was proposing needs to land. Then we also need to make IGT
> more resistant against modes disappearing (skip instead of fail).

Manasi, could you please prioritise this bug? We will be expecting an update weekly on this issue until this is resolved...