Bug 101518

Summary:	[BAT][SKL] WARNING: no modes for connector 48 when running igt@kms_setmode@basic-clone-single-crtc
Product:	DRI	Reporter:	Martin Peres <martin.peres>
Component:	IGT	Assignee:	Manasi <manasi.d.navare>
Status:	CLOSED FIXED	QA Contact:
Severity:	critical
Priority:	medium	CC:	intel-gfx-bugs, manasi.d.navare
Version:	DRI git
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:	ALL	i915 features:

Description Martin Peres 2017-06-20 13:38:03 UTC

When running the test igt@kms_setmode@basic-clone-single-crtc on fi-skl-6700hq, starting from CI_DRM_2744, we get the following message in the test's stderr:
(kms_setmode:4033) igt-kms-WARNING: no modes for connector 48

Full logs: https://intel-gfx-ci.01.org/CI/CI_DRM_2744/fi-skl-6700hq/igt@kms_setmode@basic-clone-single-crtc.html

Comment 1 Maarten Lankhorst 2017-06-21 08:57:29 UTC

Connector 48 from the logs is eDP-1.

What's interesting is that I see this:
[  392.746337] [drm:drm_mode_prune_invalid] Not using 1920x1080 mode: CLOCK_HIGH

Some more debugging output from dmesg:

[  382.469405] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:eDP-1]
[  382.469523] [drm:intel_dp_detect [i915]] [CONNECTOR:48:eDP-1]
[  382.469656] [drm:intel_power_well_enable [i915]] enabling DC off
[  382.469777] [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
[  382.469871] [drm:intel_dp_detect [i915]] Display Port TPS3 support: source yes, sink yes
[  382.469947] [drm:intel_dp_print_rates [i915]] source rates: 162000, 216000, 270000, 324000, 432000, 540000
[  382.470015] [drm:intel_dp_print_rates [i915]] sink rates: 162000, 270000
[  382.470081] [drm:intel_dp_print_rates [i915]] common rates: 162000, 270000
[  382.470249] [drm:edp_panel_vdd_on [i915]] Turning eDP port A VDD on
[  382.470501] [drm:edp_panel_vdd_on [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x0000000f
[  382.471733] [drm:drm_dp_read_desc] DP sink: OUI 00-22-b9 dev-ID sivarT HW-rev 0.0 SW-rev 0.0 quirks 0x0000
[  382.472916] [drm:drm_edid_to_eld] ELD: no CEA Extension found
[  382.472944] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:eDP-1] probed modes :
[  382.472970] [drm:drm_mode_debug_printmodeline] Modeline 49:"1920x1080" 60 138700 1920 1968 2000 2080 1080 1083 1088 1111 0x48 0x9

Ok, looks good.

After that, I'm getting weird HPD events and tons of dp_aux_ch timeouts.

But I also noticed this:

$ grep ERROR dmesg-during.log 
[  385.088911] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  390.392461] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  391.236144] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[  407.184193] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[  417.936164] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status
[  420.311740] [drm:intel_dp_check_link_status [i915]] *ERROR* Failed to get link status

So I guess we report a warn without taking the dmesg errors into account?
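The CLOCK_HIGH rejection can be checked with back-of-the-envelope arithmetic. The C sketch below compares the bandwidth a mode needs with what a DP link offers, in the spirit of the driver's mode validation. The function names are hypothetical. The 18 bpp (6 bpc) minimum and the lane counts are assumptions, since the logs above show neither. The 138700 kHz pixel clock and the 162000/270000 kHz link rates come from the dmesg excerpt.

```c
#include <stdbool.h>

/* Hypothetical helpers (not i915 functions). Clocks are in kHz and
 * rates in kB/s. */

/* Bandwidth a mode needs: pixel clock (kHz) * bits per pixel / 8,
 * rounded up. The bpp value is an assumption supplied by the caller. */
long mode_rate_kBps(long pixel_clock_khz, int bpp)
{
	return (pixel_clock_khz * bpp + 7) / 8;
}

/* Bandwidth a link offers: each lane carries one data byte per link
 * symbol clock (kHz), so the 8b/10b channel coding is already
 * accounted for. */
long link_rate_kBps(long link_clock_khz, int lanes)
{
	return link_clock_khz * lanes;
}

/* true if the mode fits on the link; false is the CLOCK_HIGH case. */
bool mode_fits(long pixel_clock_khz, int bpp, long link_clock_khz, int lanes)
{
	return mode_rate_kBps(pixel_clock_khz, bpp) <=
	       link_rate_kBps(link_clock_khz, lanes);
}
```

Under these assumptions the 1920x1080 mode needs 312075 kB/s. It fits on two lanes at 162000 kHz (324000 kB/s) but not on a single lane at either rate. Once link-training fallback has shrunk the link far enough, the panel's only mode is rejected.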

Comment 2 Maarten Lankhorst 2017-06-21 09:06:52 UTC

I think it should be dmesg-fail, according to piglit/framework/dmesg.py.
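For reference, the status promotion referred to here can be sketched as below. This C re-statement assumes piglit's framework/dmesg.py works as follows. When the run captured relevant kernel messages, pass is promoted to dmesg-warn, and warn or fail are promoted to dmesg-fail. Any other status is left alone. It is not piglit's actual code, and the function name is hypothetical. Under that assumption, a warn accompanied by *ERROR* lines should end up as dmesg-fail.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of the assumed dmesg status promotion. */
const char *apply_dmesg(const char *status, bool dmesg_has_errors)
{
	if (!dmesg_has_errors)
		return status;
	if (strcmp(status, "pass") == 0)
		return "dmesg-warn";
	if (strcmp(status, "warn") == 0 || strcmp(status, "fail") == 0)
		return "dmesg-fail";
	/* Other statuses (e.g. skip) are left unchanged. */
	return status;
}
```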

Comment 3 Maarten Lankhorst 2017-06-21 09:24:27 UTC

After adding a quick test for this in igt/tests/meta_test.c, it seems the bug is in Intel CI: the piglit HTML summary is generated correctly and shows the result as dmesg-fail.

Comment 4 Maarten Lankhorst 2017-06-21 09:32:50 UTC

Patch sent https://patchwork.freedesktop.org/series/26130/

Comment 5 Martin Peres 2017-07-05 07:50:09 UTC

Manasi, now that you fixed the failing platform, could you have a look into this IGT bug?

Basically, IGT does not know how to deal with the kernel pruning modes. Do we want to simply ignore this possibility or do we want to be more robust to it?

At the very least, it would be nice to add a debug message saying that the modes may all have been pruned, according to the DP spec.

Comment 6 Martin Peres 2017-07-05 07:51:44 UTC

*** Bug 101519 has been marked as a duplicate of this bug. ***

Comment 7 Martin Peres 2017-07-31 13:08:19 UTC

Assigning Manasi to this issue, because it is a fallout of the link-status patch.

We may argue all we want about whether the platform is bad or not, but the tests are certainly wrong, since they don't check for their dependencies. Sorry to throw you under the bus, Manasi, but this only became a problem after your patch.

Comment 8 Manasi 2017-08-03 18:25:45 UTC

I only see the old logs. Do you have the dmesg logs from the most recent testing, after the T12 delay fix patch went in?
Do we still see AUX timeouts? As I mentioned, the problem here is those AUX timeouts. We should not get them, since link training should never fail on an eDP panel. These AUX timeouts are putting the system in an unexpected state.

Comment 9 Martin Peres 2017-08-03 20:13:14 UTC

(In reply to Manasi from comment #8)
> I only see the old logs. Do you have the dmesg logs from the most recent
> testing, after the T12 delay fix patch went in?
> Do we still see AUX timeouts? As I mentioned, the problem here is those
> AUX timeouts. We should not get them, since link training should never
> fail on an eDP panel. These AUX timeouts are putting the system in an
> unexpected state.

Sure, it should not happen, but the failure mode taken is also wrong.

Why are we pruning the mode when we fail to enable link training? We should only prune a mode if the actual link training failed.

You still need to send the hotplug event, though, to ask userspace to redo the modeset.

Also, we should NEVER prune the last mode. This is not a problem on DP, but it is on eDP. The quickest fix would be to never prune modes on eDP, since there is only one mode anyway.
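The policy proposed in this comment can be summarized as a small decision function plus a pruning step that never empties the mode list. The C sketch below is a hypothetical model, not i915 code. The types and names are invented. Whether a mode "fits" is reduced to a single pixel-clock limit for brevity. The first list entry is assumed to be the preferred mode.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed policy; not i915 code. */
enum conn_type { CONN_DP, CONN_EDP };

struct link_fail_action {
	bool fall_back;    /* retry with a lower link rate or lane count */
	bool prune_modes;  /* drop modes the reduced link can no longer carry */
	bool send_hotplug; /* ask userspace to re-probe and redo the modeset */
};

/* What to do when link training fails. training_ran is false when we
 * failed to even enable link training (e.g. because of AUX timeouts). */
struct link_fail_action on_link_train_failure(enum conn_type type,
					      bool training_ran)
{
	/* Always send the hotplug so userspace redoes the modeset. */
	struct link_fail_action a = { false, false, true };

	/* eDP has a single fixed mode: falling back and pruning can only
	 * leave the panel with no modes at all. */
	if (type == CONN_EDP)
		return a;

	/* Only an actual training failure says the link is too weak. */
	a.fall_back = training_ran;
	a.prune_modes = training_ran;
	return a;
}

/* Compact clocks[] in place, keeping modes whose pixel clock is at most
 * max_clock_khz (a stand-in for the real bandwidth check). Never returns
 * 0 for a non-empty list: if nothing fits, nothing was overwritten, so
 * clocks[0] (assumed to be the preferred mode) is kept. */
int prune_modes_keep_one(long *clocks, int n, long max_clock_khz)
{
	int kept = 0;

	for (int i = 0; i < n; i++)
		if (clocks[i] <= max_clock_khz)
			clocks[kept++] = clocks[i];

	return (kept == 0 && n > 0) ? 1 : kept;
}
```

For eDP, a training failure leaves the mode list alone and only asks userspace to retry. For DP, modes are pruned only after training actually ran, and even then the list is never emptied.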

Comment 10 Martin Peres 2017-08-03 20:15:41 UTC

Here are newer logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2903/fi-skl-6700hq/igt@kms_busy@basic-flip-default-a.html

Please get familiar with the results we export on https://intel-gfx-ci.01.org, the landing page describes everything. If it is confusing, talk to me about it on IRC.

Comment 11 Daniel Vetter 2017-08-07 09:05:50 UTC

Probably a direct consequence of bug #101144, with the DP link-status patch then nuking the mode list.

Time to revert link status handling since it doesn't work?

Comment 12 Manasi 2017-08-15 19:46:54 UTC

No, I think it's time to fix the AUX CH timeouts for good.
That's the real cause, not the DP link-status patch.
Still, it might be a good idea to remove the link-status and fallback handling for eDP: on eDP, link training should not fail, and if it does, we should just fail. There is no need to retry with a different mode, since there is only one mode.

Manasi

Comment 13 Martin Peres 2017-08-15 20:12:34 UTC

(In reply to Manasi from comment #12)
> No, I think it's time to fix the AUX CH timeouts for good.
> That's the real cause, not the DP link-status patch.
> Still, it might be a good idea to remove the link-status and fallback
> handling for eDP: on eDP, link training should not fail, and if it does,
> we should just fail. There is no need to retry with a different mode,
> since there is only one mode.
> 
> Manasi

Yes, nuke the mode pruning from eDP altogether. This will allow us to un-blacklist a lot of tests for this platform, and we'll be able to start working out what to do about the AUX channel issue.

Comment 14 Jani Saarinen 2017-09-05 11:52:46 UTC

The system has been quite stable lately.

Comment 15 Martin Peres 2017-09-05 13:02:17 UTC

(In reply to Jani Saarinen from comment #14)
> The system has been quite stable lately.

Has anyone committed anything that would affect the machine?

Comment 16 Jani Saarinen 2017-09-05 14:22:50 UTC

Not sure. I asked about that on the intel-gfx mailing list too ;)

Comment 17 Jani Saarinen 2017-09-05 14:31:42 UTC

Resolving this now; we'll wait to see whether it pops up again.

Comment 18 Manasi 2017-09-05 18:49:53 UTC

The patch I submitted to fix the T12 delay, increasing it further to 900 ms, got merged (SHA: 5b2eff59160e), so that could have fixed the AUX timeouts on the SKL system.
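T12 is the eDP panel power-cycle delay: the minimum time the panel must stay powered off before it is powered on again. The C sketch below shows one way to enforce it. It assumes the fix amounts to raising the delay to at least 900 ms on top of the firmware-provided (VBT) value. The names are hypothetical, and this is not the code of commit 5b2eff59160e.

```c
#include <stdint.h>

/* Hypothetical floor for T12, taken from the 900 ms mentioned above. */
#define T12_MIN_MS 900u

/* Effective T12: the larger of the firmware-provided value and the
 * floor (assumed behavior, not the merged patch). */
uint32_t effective_t12_ms(uint32_t vbt_t12_ms)
{
	return vbt_t12_ms > T12_MIN_MS ? vbt_t12_ms : T12_MIN_MS;
}

/* How long to wait before powering the panel back on, given when it was
 * last powered off. Both times come from the same monotonic clock, with
 * now_ms >= last_off_ms. */
uint32_t power_on_wait_ms(uint64_t last_off_ms, uint64_t now_ms,
			  uint32_t t12_ms)
{
	uint64_t elapsed = now_ms - last_off_ms;

	return elapsed >= t12_ms ? 0 : (uint32_t)(t12_ms - elapsed);
}
```

Powering the panel back on before T12 has elapsed violates the panel's power sequencing, which is one plausible way to end up with an unresponsive sink and AUX timeouts.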

Comment 19 Jani Saarinen 2017-09-07 17:35:55 UTC

Came back on CI_DRM_3055:
https://intel-gfx-ci.01.org/tree/drm-tip/fi-skl-6700hq.html

Comment 20 Jani Saarinen 2017-10-05 06:59:37 UTC

Series now merged:
https://patchwork.freedesktop.org/series/31361/

Comment 21 Martin Peres 2017-10-05 08:18:22 UTC

No Jani, this bug has not been addressed yet.

Comment 22 Jani Saarinen 2017-10-06 07:47:56 UTC

Manasi thinks it has...

Comment 23 Martin Peres 2017-10-06 09:53:30 UTC

Manasi, did you land the patch that prevents pruning the last mode?

Comment 24 Manasi 2017-10-06 19:19:24 UTC

OK, so this bug was mainly caused by the AUX timeouts seen on the panel, and by the mode getting pruned because of them.
This bug should now be fixed, since the patch series that prevents the AUX timeouts got merged.
As a precaution, we should not prune the preferred mode on eDP.
I submitted a patch for that as well:
https://patchwork.freedesktop.org/series/31102/
I am still waiting for feedback on it, because I really want to know how to handle one case. The lowered link rate cannot carry the preferred mode, but we don't prune it. Then the next modeset fails in encoder config, because the requested bandwidth exceeds the available bandwidth.
What should we do in that case?

Manasi
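The case asked about here can be illustrated by the link-configuration search done at modeset time. The C sketch below is loosely modeled on the approach i915's compute_config took at the time, as far as is known here. It tries decreasing bpp and, for each bpp, the lowest rate and lane count that carries the mode. The names, the 24-to-18 bpp range, and the units are assumptions for illustration. If fallback lowered the maximum rate or lane count but the preferred mode was kept, the search finds nothing and the modeset fails.

```c
#include <errno.h>

/* Hypothetical link configuration; rates in kHz (link symbol clock). */
struct link_cfg {
	long rate_khz;
	int lanes;
	int bpp;
};

/* Pick the first (bpp, rate, lanes) combination that carries the mode.
 * rates[] must be sorted in ascending order. Returns 0 on success or
 * -EINVAL when the mode does not fit any available configuration. */
int pick_link_config(long pixel_clock_khz, const long *rates, int n_rates,
		     int max_lanes, struct link_cfg *out)
{
	for (int bpp = 24; bpp >= 18; bpp -= 6) {
		/* Required bandwidth in kB/s, rounded up. */
		long need = (pixel_clock_khz * bpp + 7) / 8;

		for (int r = 0; r < n_rates; r++)
			for (int lanes = 1; lanes <= max_lanes; lanes <<= 1)
				if (rates[r] * lanes >= need) {
					out->rate_khz = rates[r];
					out->lanes = lanes;
					out->bpp = bpp;
					return 0;
				}
	}
	/* The mode survived pruning, but no reduced config carries it. */
	return -EINVAL;
}
```

With the 138700 kHz mode and rates of 162000 and 270000 kHz, two lanes are enough, but a single lane is not at any bpp. At that point the only options are to fail the modeset or to undo the fallback, which is exactly the open question in this comment.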

Comment 25 Jani Saarinen 2018-05-23 10:23:24 UTC

Ping, what to do with this?

Comment 26 Martin Peres 2018-05-23 21:34:54 UTC

(In reply to Jani Saarinen from comment #25)
> Ping, what to do with this?

The patch Manasi was proposing needs to land. Then we also need to make IGT more resistant against modes disappearing (skip instead of fail).
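The C sketch below shows "skip instead of fail", using hypothetical stand-in types rather than IGT's API. In IGT itself this would presumably be an igt_require_f() check on the count_modes field of libdrm's drmModeConnector. The sketch keeps the decision logic self-contained. It also prints the debug hint suggested in comment 5: the modes may have been pruned.

```c
#include <stdio.h>

/* Hypothetical stand-ins for a connector and a test result. */
enum test_result { RESULT_PASS, RESULT_SKIP, RESULT_FAIL };

struct fake_connector {
	unsigned id;
	int count_modes;
};

/* A connector with no modes may simply have had them pruned by the
 * kernel (e.g. after a link-training fallback), so log that and skip
 * instead of warning or failing. */
enum test_result check_connector_modes(const struct fake_connector *c)
{
	if (c->count_modes > 0)
		return RESULT_PASS;

	fprintf(stderr, "connector %u has no modes; the kernel may have "
		"pruned them after a link failure, skipping\n", c->id);
	return RESULT_SKIP;
}
```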

Comment 27 Martin Peres 2018-06-19 13:53:07 UTC

(In reply to Martin Peres from comment #26)
> (In reply to Jani Saarinen from comment #25)
> > Ping, what to do with this?
> 
> The patch Manasi was proposing needs to land. Then we also need to make IGT
> more resistant against modes disappearing (skip instead of fail).

Manasi, could you please prioritise this bug? We will expect weekly updates on this issue until it is resolved.

Comment 28 Manasi 2018-06-22 01:51:37 UTC

The patch for not pruning the modes for eDP got merged already in drm-tip:

drm/i915/edp: Do not do link training fallback or prune modes on EDP

Could you check if this failure is still seen?

I also have a new patch on the mailing list to handle link training on eDP in a better way:
https://patchwork.freedesktop.org/patch/223573/

This needs a respin as per Jani's comments and I am working on it.

Regards
Manasi

Comment 29 Martin Peres 2018-06-22 10:07:04 UTC

(In reply to Manasi from comment #28)
> The patch for not pruning the modes for eDP got merged already in drm-tip:
> 
> drm/i915/edp: Do not do link training fallback or prune modes on EDP
> 
> Could you check if this failure is still seen?
> 
> I also have a new patch to handle link training on EDP in a better way on
> the M-L:
> https://patchwork.freedesktop.org/patch/223573/
> 
> This needs a respin as per Jani's comments and I am working on it.
> 
> Regards
> Manasi

Thanks Manasi! Dropping the priority now, as the likelihood of hitting this issue again is low.

Let's close the bug after we finish reviewing the IGT tests that assume modes cannot disappear. Could you try hacking something together that prunes modes while a test runs, and see how the IGT tests react? This is something you can do locally.

Comment 30 Manasi 2019-01-10 23:56:02 UTC

Hi Martin,

Can we close this bug, since the patches to fix it have already been upstreamed?

Manasi

Comment 31 Lakshmi 2019-02-19 08:34:10 UTC

Closing this bug as fixed.
