Bug 109398 - booting gives black screen [drm:intel_dp_start_link_train] [CONNECTOR:67:eDP-1] Link Training failed at link rate = 162000, lane count = 2
Summary: booting gives black screen [drm:intel_dp_start_link_train] [CONNECTOR:67:eDP-...
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev,
Keywords: regression
Duplicates: 109399
Depends on:
Blocks:
 
Reported: 2019-01-20 11:02 UTC by Seppe
Modified: 2019-03-07 09:37 UTC (History)
CC List: 3 users

See Also:
i915 platform: BDW
i915 features: display/eDP


Attachments
log files and computer information (203.84 KB, application/gzip)
2019-01-20 11:02 UTC, Seppe
journal booting the kernel minus two commits (223.98 KB, text/plain)
2019-01-28 20:33 UTC, Seppe

Description Seppe 2019-01-20 11:02:56 UTC
Created attachment 143169 [details]
log files and computer information

Booting gives a black screen with flickering. No recovery is possible.


I am using the 5.0.0-rc2 kernel from:
> git clone git://anongit.freedesktop.org/drm-tip
I added a WiFi driver and btrfs to the .config generated by "make
defconfig".


I booted with the kernel parameters
"splash=silent quiet showopts drm.debug=0xe log_buf_len=4M".
The kernel boots into a flashing screen and the computer cannot be used.
No image is visible. Text terminals (Ctrl-Alt-F1) are not visible either.
The flickering is mostly at the bottom of the screen, roughly once every
two seconds.

The exact process is the following:
- grub loads
- the kernel boots
- the screen goes black and starts flashing, with white flashes about every second
- the openSUSE Tumbleweed logo appears, with its animation running
- the screen goes black with small flashes of white at the bottom of the screen


This behavior is not present when running the openSUSE Leap 15.0 kernel:
kernel-default-4.12.14-lp150.11.4.x86_64.rpm
http://download.opensuse.org/distribution/leap/15.0/repo/oss/x86_64/kernel-default-4.12.14-lp150.11.4.x86_64.rpm.mirrorlist

I also commented on this bug at
https://bugzilla.suse.com/show_bug.cgi?id=1119621#c15
Since I have tested it on the drm-tip kernel, I have reported the (same?) bug
here.

Note that the hardware I am using is a Chromebook (Dell). Its firmware has
bugs that I cannot fix, and these may also interfere here.


The logs include a file I made after the next (normal) boot:
# sudo journalctl --boot=-1 > journalctl_boot.txt

I could not reach the machine remotely, since WiFi did not come up. I could
therefore only capture a log afterwards, so no additional log files are
included.

In 'journalctl_boot.txt' I see the following line repeatedly:
[drm:intel_dp_start_link_train] [CONNECTOR:67:eDP-1] Link Training failed at link rate = 162000, lane count = 2
It appears among a whole lot of other lines. There is no clear error or
warning in the log.
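
To see how often the message repeats and whether the link rate or lane count ever changes, the saved journal can be scanned mechanically. This is a minimal sketch, assuming the journal was saved as plain text as described above. The function name link_training_failures and the sample list are illustrative; the pattern is modelled only on the single line quoted above (plus the "[i915]" module suffix that journal output often carries), so other kernels may need a different pattern.

```python
import re
from collections import Counter

# Pattern modelled on the log line quoted in this report (an assumption, not a
# stable kernel interface). "\]+" tolerates both "...link_train]" and
# "...link_train [i915]]".
FAILURE_RE = re.compile(
    r"\[drm:intel_dp_start_link_train[^\]]*\]+\s+"
    r"\[CONNECTOR:(?P<id>\d+):(?P<name>[^\]]+)\]\s+"
    r"Link Training failed at link rate = (?P<rate>\d+), lane count = (?P<lanes>\d+)"
)


def link_training_failures(lines):
    """Count failures per (connector name, link rate, lane count)."""
    counts = Counter()
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            counts[(m["name"], int(m["rate"]), int(m["lanes"]))] += 1
    return counts


# Sample input: the quoted line, a variant with the module suffix, and noise.
sample = [
    "[drm:intel_dp_start_link_train] [CONNECTOR:67:eDP-1] "
    "Link Training failed at link rate = 162000, lane count = 2",
    "kernel: [drm:intel_dp_start_link_train [i915]] [CONNECTOR:67:eDP-1] "
    "Link Training failed at link rate = 162000, lane count = 2",
    "kernel: some unrelated line",
]

for key, count in link_training_failures(sample).items():
    print(key, count)
```

For a real log, pass an open file instead of the sample list, for example link_training_failures(open("journalctl_boot.txt", errors="replace")).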


Also included is a file "dmesg_before.txt". It is from a boot with the
additional kernel parameter "i915.fastboot=1" set, in which case the computer
boots correctly. This file is the same as the one included in another bug
report, where the command "xset dpms force off" triggers the same state.
"dmesg_before.txt" was captured before that command was given, so it reflects
a normal running state.

Also included are the machine info and lspci info, captured while the machine
was running with the "i915.fastboot=1" kernel parameter set.


> inxi -bxx
System:    Host: linux-6axq Kernel: 5.0.0-rc2test-g1fb7b31e074e x86_64 bits: 64 compiler: gcc v: 8.2.1 Console: N/A 
           dm: LightDM Distro: openSUSE Tumbleweed 20190115 
Machine:   Type: Desktop System: GOOGLE product: Lulu v: Pilot serial: 123456789 Chassis: type: 3 serial: N/A 
           Mobo: N/A model: N/A serial: N/A BIOS: coreboot v: N/A date: 03/28/2016 
Battery:   ID-1: BAT0 charge: 41.2 Wh condition: 70.1/67.0 Wh (105%) volts: 11.6/11.4 model: SMP-LIS DELL MJ serial: 0340 
           status: Discharging 
CPU:       Dual Core: Intel Core i3-5005U type: MT MCP arch: Broadwell speed: 1743 MHz min/max: 500/2000 MHz 
Graphics:  Device-1: Intel HD Graphics 5500 driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:1616 
           Display: server: X.org 1.20.3 driver: intel tty: 180x31 
           Message: Advanced graphics data unavailable in console for root. 
Network:   Device-1: Intel Wireless 7260 driver: iwlwifi v: kernel port: 1840 bus ID: 01:00.0 chip ID: 8086:08b1 
Drives:    Local Storage: total: 111.79 GiB used: 71.60 GiB (64.0%) 
Info:      Processes: 189 Uptime: N/A Memory: 3.78 GiB used: 410.2 MiB (10.6%) Init: systemd v: 239 runlevel: 5 
           target: graphical.target Compilers: gcc: 8.2.1 alt: 8 Shell: dump-info-befor inxi: 3.0.29
Comment 1 Seppe 2019-01-20 11:15:27 UTC
I have filed a new bug report for a bug triggered by the command:
> xset dpms force off
The behavior and logs of the bug are similar. I think this is related.
This is https://bugs.freedesktop.org/show_bug.cgi?id=109399
Comment 2 Seppe 2019-01-22 19:16:21 UTC
I think this is a duplicate of:
https://bugs.freedesktop.org/show_bug.cgi?id=109215
Comment 3 Seppe 2019-01-26 08:54:38 UTC
solution (on the mainline kernel):
git revert 49218c83e25b6f0708f246b07d570b2c43a98223

The problem is caused by the commit 49218c83e25b6f0708f246b07d570b2c43a98223
"drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode".

I have not dug into this code; I only tested whether reverting helps. It does: the problem is gone without this commit.
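
The commit title refers to checking whether a fallback link's bandwidth can fit the panel's native mode. A minimal sketch of that kind of arithmetic follows, assuming 8b/10b encoding (one data byte per lane per link symbol clock, so a link rate of 162000 kHz on 2 lanes carries 324000 kB/s). The function names are illustrative and this is not the driver's code. The panel mode used here (138500 kHz pixel clock) is hypothetical; the real native mode of this panel is not stated in this report.

```python
def link_required(pixel_clock_khz, bpp):
    """Data rate a mode needs in kB/s: pixel clock (kHz) * bits per pixel / 8, rounded up."""
    return -(-pixel_clock_khz * bpp // 8)


def max_data_rate(link_clock_khz, lanes):
    """Data rate a link can carry in kB/s, assuming 8b/10b encoding
    (one data byte per lane per link symbol clock)."""
    return link_clock_khz * lanes


def mode_fits(pixel_clock_khz, bpp, link_clock_khz, lanes):
    """True if the mode's required data rate fits within the link's capacity."""
    return link_required(pixel_clock_khz, bpp) <= max_data_rate(link_clock_khz, lanes)


# Hypothetical native mode: 138500 kHz pixel clock. Not taken from this report.
HYPOTHETICAL_PIXEL_CLOCK_KHZ = 138500

for bpp in (24, 18):
    for link_clock_khz in (162000, 270000):
        fits = mode_fits(HYPOTHETICAL_PIXEL_CLOCK_KHZ, bpp, link_clock_khz, 2)
        print(f"bpp={bpp} link={link_clock_khz} kHz x2 lanes: fits={fits}")
```

With these assumed numbers the mode does not fit at 24 bpp on 162000 kHz x 2 lanes, which is the kind of case where a fallback to the lowest link rate would leave no usable native mode.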
Comment 4 Seppe 2019-01-26 08:56:43 UTC
*** Bug 109399 has been marked as a duplicate of this bug. ***
Comment 5 Jani Saarinen 2019-01-28 07:13:41 UTC
Manasi, the patch is yours. Jani, this seems to be a regression.
Comment 6 Jani Nikula 2019-01-28 10:25:09 UTC
(In reply to Seppe from comment #0)
> Created attachment 143169 [details]
> log files and computer information

Please attach plain text files, one attachment per file.
Comment 7 Jani Nikula 2019-01-28 10:42:40 UTC
Regression of a fix to another regression, makes your head explode.

When we started to handle clock recovery failures on link training, we knew there were eDP panels out there that failed the initial clock recovery but passed it in the channel equalization phase. We no longer have that failure path within channel equalization.

We failed to root cause the original problem, and added two layers of duct tape on top. So here we are.

---

Please try reverting *both*

1e712535c51a ("drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode")

and

c0cfb10d9e1d ("drm/i915/edp: Do not do link training fallback or prune modes on EDP")

How does that work?

Please do *not* use the i915.fastboot parameter for any further testing.
Comment 8 Jani Nikula 2019-01-28 10:43:38 UTC
And please add the drm.debug=14 module parameter, and attach dmesg all the way from boot to the problem.
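
Note that drm.debug is a bitmask, so drm.debug=14 is the same value as the drm.debug=0xe used in the original description. A small sketch for decoding such a value follows. The bit names are an assumption based on the drm module's debug parameter description for kernels of this era and may differ in other versions; the helper name decode_drm_debug is illustrative.

```python
# Bit meanings assumed from the drm.debug parameter description of kernels
# around 4.20/5.0; check the running kernel's documentation before relying on them.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(value):
    """Return the names of the debug categories enabled by a drm.debug value.

    Accepts an int or a string such as "14" or "0xe".
    """
    mask = int(value, 0) if isinstance(value, str) else value
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


for setting in ("0xe", "14", "0x1e"):
    print(setting, decode_drm_debug(setting))
```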
Comment 9 Seppe 2019-01-28 20:33:14 UTC
Created attachment 143244 [details]
journal booting the kernel minus two commits
Comment 10 Seppe 2019-01-28 20:35:04 UTC
On the stable kernel I did:

> git checkout -b branch.4.20.4.test v4.20.4
> git revert 1e712535c51ab025ebc776d4405683d81521996d
Revert "drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode"
> git revert c0cfb10d9e1de490e36d3b9d4228c0ea0ca30677
Revert "drm/i915/edp: Do not do link training fallback or prune modes on EDP" revert

I built this kernel with the default .config options for graphics.
I booted with the kernel parameters: splash=silent quiet showopts drm.debug=14 log_buf_len=4M

The result:
Two white flashes, then a black screen. 
No logo, no nothing.

I could not log in remotely, so I attached the full log from the journal.
Comment 11 Seppe 2019-01-28 20:43:34 UTC
The log.txt file shows:

Jan 28 21:01:22 linux-6axq kernel: [drm:intel_power_well_enable [i915]] enabling always-on
Jan 28 21:01:25 linux-6axq kernel: [drm:intel_power_well_disable [i915]] disabling always-on
Jan 28 21:02:00 linux-6axq kernel: [drm:intel_power_well_enable [i915]] enabling always-on
Jan 28 21:02:03 linux-6axq kernel: [drm:intel_power_well_disable [i915]] disabling always-on
Jan 28 21:03:00 linux-6axq kernel: [drm:intel_power_well_enable [i915]] enabling always-on
Jan 28 21:03:03 linux-6axq kernel: [drm:intel_power_well_disable [i915]] disabling always-on
Jan 28 21:04:00 linux-6axq kernel: [drm:intel_power_well_enable [i915]] enabling always-on
Jan 28 21:04:03 linux-6axq kernel: [drm:intel_power_well_disable [i915]] disabling always-on
Jan 28 21:04:30 linux-6axq kernel: [drm:intel_power_well_enable [i915]] enabling always-on
Jan 28 21:04:33 linux-6axq kernel: [drm:intel_power_well_disable [i915]] disabling always-on


When I close the lid of the laptop, the computer enters state S3. Battery consumption is still higher than expected, and the log then shows the same enable / disable sequence. I assume this is what causes the higher-than-expected battery drain. I had planned to look into this later, but it might be related.
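
The enable / disable pairs above can be turned into on-times to check whether the pattern is regular. A minimal sketch follows, assuming journal lines in the short timestamp format shown above. The year is not part of that format, so 2019 is supplied explicitly as an assumption, and the function name power_well_on_times is illustrative.

```python
import re
from datetime import datetime

# Pattern modelled on the journal lines quoted above.
EVENT_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) .*"
    r"\[drm:intel_power_well_(?P<action>enable|disable)[^\]]*\]+ "
    r"(?:enabling|disabling) (?P<well>.+)$"
)


def power_well_on_times(lines, year=2019):
    """Return (well, seconds enabled) for each enable/disable pair, in log order."""
    enabled_at = {}
    durations = []
    for line in lines:
        m = EVENT_RE.match(line.rstrip("\n"))
        if not m:
            continue
        # The short journal timestamp has no year; prepend the assumed one.
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        well = m["well"]
        if m["action"] == "enable":
            enabled_at[well] = when
        elif well in enabled_at:
            durations.append((well, (when - enabled_at.pop(well)).total_seconds()))
    return durations


# First four lines of the excerpt above.
sample = [
    "Jan 28 21:01:22 linux-6axq kernel: [drm:intel_power_well_enable [i915]] enabling always-on",
    "Jan 28 21:01:25 linux-6axq kernel: [drm:intel_power_well_disable [i915]] disabling always-on",
    "Jan 28 21:02:00 linux-6axq kernel: [drm:intel_power_well_enable [i915]] enabling always-on",
    "Jan 28 21:02:03 linux-6axq kernel: [drm:intel_power_well_disable [i915]] disabling always-on",
]

for well, seconds in power_well_on_times(sample):
    print(well, seconds)
```

In the excerpt every enable is followed by a disable three seconds later, which is what this helper makes easy to confirm over the full log.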
Comment 12 Seppe 2019-01-28 20:49:01 UTC
(In reply to Jani Nikula from comment #6)
> (In reply to Seppe from comment #0)
> > Created attachment 143169 [details]
> > log files and computer information
> 
> Please attach plain text files, one attachment per file.

I have now included a text file. 

Do you want the previous attachment (the tar.zip files) attached as separate files? Or did you mean just the new ones?
Comment 13 Manasi 2019-01-31 19:51:52 UTC
You absolutely cannot revert the "drm/i915/edp: Do not do link training fallback or prune modes on EDP" patch: with it reverted, link training falls back to a lower value and prunes out the only native mode on the panel.

I think this is clearly a panel where we need to follow a sequence of retrying clock recovery (CR) after 5 failures in Channel EQ. To test this we will need to revert all the patches that fix the CR and Channel EQ sequencing for compliance.

Meanwhile, let me also try to create a patch that retries CR after a Channel EQ failure.

Manasi
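
The sequence proposed in the comment above can be sketched as a small control loop. This is only an illustration of the idea, not the i915 implementation and not the patch mentioned above: clock_recovery and channel_eq are hypothetical callables standing in for the real handshakes, max_rounds is an invented bound, and retrying a failed CR in the next round is an assumption. Only the figure of 5 Channel EQ failures comes from the comment.

```python
MAX_EQ_TRIES = 5  # "after 5 failures on Channel EQ", per the comment above


def train_link(clock_recovery, channel_eq, max_rounds=3):
    """Return True once channel equalization succeeds, False if all rounds fail.

    clock_recovery() and channel_eq() are hypothetical callables returning
    True on success. max_rounds is an invented bound so the loop terminates.
    """
    for _ in range(max_rounds):
        if not clock_recovery():
            continue  # assumption: a failed CR is simply retried next round
        for _ in range(MAX_EQ_TRIES):
            if channel_eq():
                return True
        # Five EQ failures in a row: go back and redo clock recovery.
    return False


# Simulated panel: CR always passes, EQ fails five times and then passes.
cr_calls = []
eq_results = iter([False] * 5 + [True])
trained = train_link(
    lambda: cr_calls.append(1) or True,
    lambda: next(eq_results),
)
print(trained, len(cr_calls))
```

In this simulation the link only trains because clock recovery is repeated after the fifth Channel EQ failure; with a single CR attempt the same panel would be reported as failed.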
Comment 14 Seppe 2019-03-05 21:22:01 UTC
For me this can be marked as solved


I have replaced the BIOS with a UEFI one (and also updated the system). I did this because the kernel sometimes failed to load a hibernate image, and the errors in dmesg then pointed at the BIOS. These were the following two errors:

: Hibernate inconsistent memory map detected!
: PM: Image mismatch: architecture specific data

I was using the BIOS from the "Install/Update RW_LEGACY Firmware" option of the Mr Chromebox ChromeOS Firmware Utility Script (https://mrchromebox.tech/#fwscript). I replaced it using the "Full ROM firmware" option of the same script, which essentially replaced the whole BIOS with a new UEFI-based one. This required me to remove the write-protect screw from my Dell Chromebook.

After reinstalling the system (required because of the switch to UEFI), I found that the new BIOS solved not only my hibernate problem but also the black screen problem described here. This surprised me, since RW_LEGACY is a normal option to use and I could not find any reports of bad side effects on the internet. I did not think that changing the BIOS would also solve the black screen problem; otherwise I would have tried it earlier.

I can now conclude that changing the BIOS also solves this problem, in addition to the "git revert 49218c83e25b6f0708f246b07d570b2c43a98223" solution.

I will no longer be able to reproduce the error. That would only be possible by flashing back the original BIOS and reinstalling the system, which is too much work for me.

I cannot change the status to "solved", so I will leave it as it is.

If additional information is required, I can provide it.
Comment 15 Lakshmi 2019-03-07 09:37:50 UTC
Thanks for your feedback Seppe.
To proceed further we need a reproducer. Until then I can close this issue.
Please reopen the issue if it occurs with the latest drm-tip (https://cgit.freedesktop.org/drm-tip).

Remember to attach the full dmesg from boot with kernel parameters drm.debug=0x1e log_buf_len=4M.

