Bug 105338 - Regression - system fails to boot with Link Rate Fallback
Summary: Regression - system fails to boot with Link Rate Fallback
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: high major
Assignee: Manasi
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2018-03-04 18:54 UTC by alexander.wilson
Modified: 2018-10-24 07:33 UTC
CC List: 4 users

See Also:
i915 platform: BDW
i915 features: display/DP


Attachments
Bisect log of kernel (2.77 KB, text/plain)
2018-03-04 18:54 UTC, alexander.wilson
Xorg.log of nonfunctioning v4.16-rc4 (31.76 KB, text/plain)
2018-03-08 00:36 UTC, alexander.wilson
systemd journal log of nonfunctioning v4.16-rc4 (114.01 KB, text/plain)
2018-03-08 00:36 UTC, alexander.wilson
systemd journal log of functioning v4.17-rc2 (93.29 KB, text/plain)
2018-04-24 20:27 UTC, alexander.wilson
dmesg for 4.17-rc2 (85.26 KB, text/x-log)
2018-04-25 15:31 UTC, alexander.wilson
dmesg.log from latest 4.17rc2 (84.95 KB, text/x-log)
2018-04-25 19:10 UTC, alexander.wilson
dmesg of patched 4.18-rc1 kernel (99.08 KB, text/x-log)
2018-06-24 22:28 UTC, alexander.wilson

Description alexander.wilson 2018-03-04 18:54:05 UTC
Created attachment 137780
Bisect log of kernel

Commit 9301397a63b3bf1090dffe846c6f1c8efa032236 and the related commit 713946d16f45ad0509434970ae6ff71529faab4b cause the Linux kernel boot process to fail on a Dell Chromebook 13 running Arch Linux.

I bisected the kernel and the log is attached. Reverting those two commits fixes the regression.
Comment 1 Elizabeth 2018-03-05 15:43:10 UTC
Is it possible to get xorg.logs and/or kern.logs from the working and non-working kernels? Thank you.
Comment 2 alexander.wilson 2018-03-08 00:35:14 UTC
I'm suddenly having trouble patching the current kernel; I'll keep working on that. For now, I'm attaching the Xorg and systemd journal logs for v4.16-rc4 (the problematic kernel on this computer).
Comment 3 alexander.wilson 2018-03-08 00:36:03 UTC
Created attachment 137878
Xorg.log of nonfunctioning v4.16-rc4
Comment 4 alexander.wilson 2018-03-08 00:36:52 UTC
Created attachment 137879
systemd journal log of nonfunctioning v4.16-rc4
Comment 5 Elizabeth 2018-03-28 21:54:24 UTC
For reference:

commit 9301397a63b3bf1090dffe846c6f1c8efa032236
Author: Manasi Navare <manasi.d.navare@intel.com>
Date:   Thu Apr 6 16:44:19 2017 +0300

    drm/i915: Implement Link Rate fallback on Link training failure

commit 713946d16f45ad0509434970ae6ff71529faab4b
Author: Manasi Navare <manasi.d.navare@intel.com>
Date:   Thu Oct 26 14:52:00 2017 -0700

    drm/i915: Cancel the modeset retry work during modeset cleanup
Comment 6 Jani Saarinen 2018-03-29 07:10:57 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 7 alexander.wilson 2018-04-06 01:39:55 UTC
As to the mass-comment above, the problem persists with the latest pre-upstream tree. In general, if more tests or logs are needed from my system, please don't hesitate to ask.
Comment 8 Jani Saarinen 2018-04-24 06:59:00 UTC
Jani, Manasi, options here?
Comment 9 Jani Nikula 2018-04-24 08:29:18 UTC
(In reply to alexander.wilson from comment #4)
> Created attachment 137879
> systemd journal log of nonfunctioning v4.16-rc4

That's a completely different problem, fixed by

commit a95845ba184b854106972f5d8f50354c2d272c06
Author: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Date:   Thu Apr 5 06:51:15 2018 -0300

    media: v4l2-core: fix size of devnode_nums[] bitarray

and also backported to v4.16.

The original report here is a bit short on details, but it's possible this has been fixed by

commit a306343bcd7df89d9d45a601929e26866e7b7a81
Author: Manasi Navare <manasi.d.navare@intel.com>
Date:   Thu Oct 12 12:13:38 2017 -0700

    drm/i915/edp: Do not do link training fallback or prune modes on EDP

and also backported to v4.15+.

Please retry with up-to-date kernels.
Comment 10 alexander.wilson 2018-04-24 20:27:29 UTC
Created attachment 139072
systemd journal log of functioning v4.17-rc2

The latest kernel does indeed run. There is still a related error logged by journald:

kernel: [drm:intel_dp_start_link_train [i915]] *ERROR* [CONNECTOR:65:eDP-1] Link Training failed at link rate >

I am unsure what the consequences of this are, if any. The full journal log is attached.
Comment 11 Jani Nikula 2018-04-25 06:26:32 UTC
(In reply to alexander.wilson from comment #10)
> Created attachment 139072
> systemd journal log of functioning v4.17-rc2
> 
> The latest kernel does indeed run. There is still a related error logged by
> journald:
> 
> kernel: [drm:intel_dp_start_link_train [i915]] *ERROR* [CONNECTOR:65:eDP-1]
> Link Training failed at link rate >
> 
> I am unsure what the consequences of this are, if any. The full journal log
> is attached.

Please add the drm.debug=14 module parameter and reproduce. Please also make sure the log lines are not cut off. For example, get the dmesg using 'dmesg > dmesg.log' and attach that.
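
For readers unfamiliar with the parameter, drm.debug is a bitmask of debug categories. Below is a minimal Python sketch of how the value 14 decomposes, assuming the category bits listed in the kernel's DRM documentation (only the lowest six are shown, and the helper name is made up for illustration):

```python
# Debug category bits for the drm.debug module parameter (assumed from the
# kernel's DRM documentation; higher bits are omitted in this sketch).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(14))
```

Read this way, drm.debug=14 enables driver, modesetting (KMS), and PRIME messages while leaving out the noisier core category.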
Comment 12 alexander.wilson 2018-04-25 15:31:10 UTC
Created attachment 139104
dmesg for 4.17-rc2

OK, here is the dmesg with the drm.debug=14 kernel parameter on the same kernel as my last log. I'll compile the latest kernel and test that today.
Comment 13 alexander.wilson 2018-04-25 19:10:53 UTC
Created attachment 139109
dmesg.log from latest 4.17rc2

Replacing the previous dmesg with one from the latest kernel.
Comment 14 Manasi 2018-04-25 19:29:06 UTC
Looking at the logs, on boot the optimum link parameters are link rate = 27000 and lane count = 2, but with these, link training fails in the clock recovery phase after 5 retries.
That is why you see the "Link training failed" message, yet the display still comes up after the pipe is enabled. This is probably one of those panels that does not handle the voltage swing values according to the spec, so link training fails but we still have a display.

Jani, it looks like the hack of retrying clock recovery 5 times and then giving up, which was removed during the compliance efforts, needs to be added back. I can provide a test patch that adds these retries before declaring link failure. What are your thoughts here?

Manasi
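
The fallback behaviour under discussion can be pictured roughly as follows. This is a simplified Python sketch, not the i915 code: the function names and the rate values (illustrative DisplayPort rates in kHz) are assumptions, and it assumes the common strategy of first lowering the link rate and only then halving the lane count.

```python
def next_fallback(rates, rate, lanes):
    """Return the next (rate, lanes) pair to try, or None if none is left.

    `rates` must be sorted in ascending order and contain `rate`.
    """
    idx = rates.index(rate)
    if idx > 0:
        # A lower link rate is still available at the same lane count.
        return rates[idx - 1], lanes
    if lanes > 1:
        # Already at the lowest rate: halve the lanes, restart at the top rate.
        return rates[-1], lanes // 2
    return None


def fallback_sequence(rates, rate, lanes):
    """List every configuration that would be tried, starting with the first."""
    tried = []
    config = (rate, lanes)
    while config is not None:
        tried.append(config)
        config = next_fallback(rates, *config)
    return tried


if __name__ == "__main__":
    for rate, lanes in fallback_sequence([162000, 270000], 270000, 2):
        print(f"link rate {rate} kHz, {lanes} lane(s)")
```

On a panel that cannot complete clock recovery at any setting, a scheme like this walks all the way down the list, which is why falling back on eDP needs extra care.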
Comment 15 Jani Saarinen 2018-05-04 12:22:19 UTC
Jani, any advice on Manasi's comment?
Comment 16 alexander.wilson 2018-05-13 15:23:45 UTC
The status of the bug is currently NEEDINFO. Is there anything more I can provide of use? Current logs show the same link training failure.
Comment 17 Francesco Balestrieri 2018-05-14 12:33:08 UTC
No more info is needed from the reporter AFAICT, back to ASSIGNED.
Comment 18 Manasi 2018-06-22 01:54:31 UTC
I rewrote the link training fallback in this patch:
https://patchwork.freedesktop.org/patch/223573/

Could you try this patch to see if it fixes the issue?

Regards
Manasi
Comment 19 alexander.wilson 2018-06-24 22:28:37 UTC
Created attachment 140308
dmesg of patched 4.18-rc1 kernel

The patch appears to work! I'm attaching the dmesg.log just in case. Thank you for your effort on this. How can I follow this patch to see when it gets into mainline and/or LTS?
Comment 20 Manasi 2018-08-15 18:15:51 UTC
That's great!
I will rebase this patch and submit it to the Intel-GFX mailing list. If you are subscribed to that list, you can track the status there and also give a Tested-By tag if you can.

Manasi
Comment 21 alexander.wilson 2018-08-15 20:07:56 UTC
Great, I've just subscribed to the mailing list. For the Tested-By tag, is this something I would commit to the patch as submitted?
Comment 22 Jani Saarinen 2018-08-16 09:47:21 UTC
(In reply to alexander.wilson from comment #21)
> Great, I've just subscribed to the mailing list. For the Tested-By tag, is
> this something I would commit to the patch as submitted?

You just reply via email with 
Tested-By: your name
Comment 23 Lakshmi 2018-09-10 06:23:03 UTC
Manasi, any updates here?
Comment 24 Manasi 2018-09-10 16:00:32 UTC
I am working on the patch to address Jani Nikula's review comments: also comparing against the downclock mode, and decoupling the downclock mode from the DRRS mode.
Let's keep this as ASSIGNED for now and close it once the patch is upstreamed.

Manasi
Comment 25 Jani Nikula 2018-10-24 07:32:49 UTC
Presumed fixed by

commit 1e712535c51ab025ebc776d4405683d81521996d
Author: Manasi Navare <manasi.d.navare@intel.com>
Date:   Tue Oct 9 14:28:04 2018 -0700

    drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode

Thanks for the report and testing.
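
The condition named in that commit title can be illustrated with a small bandwidth check. This is a hedged Python sketch rather than the driver's implementation: the function names, the 148500 kHz pixel clock, and the 24 bpp figure are assumptions chosen for illustration, and the link clock is taken to be the per-lane symbol clock in kHz, so each lane carries one payload byte per clock under 8b/10b coding.

```python
def max_data_rate(link_clock_khz, lane_count):
    """Payload capacity in kB/s: one byte per symbol clock per lane."""
    return link_clock_khz * lane_count


def link_required(pixel_clock_khz, bpp):
    """Bandwidth a mode needs in kB/s, rounded up."""
    return -(-(pixel_clock_khz * bpp) // 8)


def fallback_fits_mode(link_clock_khz, lane_count, pixel_clock_khz, bpp):
    """True if the fallback link configuration can still carry the mode."""
    return max_data_rate(link_clock_khz, lane_count) >= link_required(
        pixel_clock_khz, bpp
    )


if __name__ == "__main__":
    # Hypothetical 1080p-class native mode at 24 bits per pixel.
    print(fallback_fits_mode(270000, 2, 148500, 24))
    print(fallback_fits_mode(162000, 2, 148500, 24))
```

With these example numbers the mode needs 445500 kB/s, which two lanes at 270000 kHz can carry (540000 kB/s) but two lanes at 162000 kHz cannot (324000 kB/s), so falling back to the lower rate would not be allowed for such a panel.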

