Bug 107503 - Native screen resolution not working correctly over HDMI LSPCON (i915)
Summary: Native screen resolution not working correctly over HDMI LSPCON (i915)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-07 00:59 UTC by Nicholas Stommel
Modified: 2018-08-20 11:26 UTC
CC List: 4 users

See Also:
i915 platform: KBL, SKL
i915 features: display/HDMI, display/LSPCON


Attachments
dmesg-drm-tip (64.48 KB, text/plain)
2018-08-10 00:02 UTC, Nicholas Stommel
bad-boot-xrandr (6.16 KB, text/plain)
2018-08-10 00:07 UTC, Nicholas Stommel
good-boot-xrandr (8.34 KB, text/plain)
2018-08-10 00:07 UTC, Nicholas Stommel
i915_display_info (bad boot 1280x800) (3.77 KB, text/plain)
2018-08-11 17:18 UTC, Nicholas Stommel
i915_display_info (bad boot 1680x1050) (5.10 KB, text/plain)
2018-08-11 17:19 UTC, Nicholas Stommel
i915_display_info (good boot 1920x1080) (5.66 KB, text/plain)
2018-08-11 17:20 UTC, Nicholas Stommel
Add a retry loop to lspcon_wait_mode (1.10 KB, patch)
2018-08-12 14:47 UTC, Fredrik Schön
Increase timeout in lspcon_wait_mode (624 bytes, patch)
2018-08-12 18:04 UTC, Fredrik Schön

Description Nicholas Stommel 2018-08-07 00:59:28 UTC
When booting Fedora, Ubuntu, openSUSE, or any other distribution running Linux kernel 4.16 or later, the Intel integrated GPU (in my case the Kaby Lake HD Graphics 630) does not reliably enable the connected monitor's native resolution over HDMI on LSPCON. Kernels 4.15 and earlier do not show this issue. The issue is confirmed on the Red Hat Bugzilla but has so far lacked an actual kernel bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1570392

dmesg logs show: 
[drm:intel_dp_get_link_train_fallback_values [i915]] *ERROR* Link Training Unsuccessful 

Followed by repeating lines of:
[drm:lspcon_wait_mode [i915]] *ERROR* LSPCON mode hasn't settled 

In particular, 1280x800 and 1680x1050 are the two resolutions I find myself limited to when the kernel boots and doesn't correctly use the full 1920x1080 native resolution of the monitor (in this case, a Samsung CF591). This issue does not, however, appear to be related to this particular monitor, as two others I have tried show the same issue. Nor is it a cable problem; I swapped the cable several times as well. After much testing, the issue appears to be non-deterministic. Grub and Plymouth don't appear to be culprits either.

Basically, the kernel isn't correctly setting full HD resolution of the monitor around *half* the time. Booting Fedora 28 or Ubuntu 18.04 with kernel version 4.16 or newer on my machine with Intel graphics using the HDMI LSPCON connector is like tossing a die: I either get Full HD, those two variants of stretched and incorrect low resolution, or (rarely, but still) failure to boot at all. 

After some confusion, I determined that the HDMI option board on the system motherboard of the HP Elitedesk 800 G3 DM does, in fact, use an LSPCON converter (DP->HDMI). I worked this out from this helpful info from Intel i915 dev Imre Deak: "There are two ways to connect HDMI to the APL RVPs: via the DDI1 DP++ plug with an DP->HDMI dongle, or via the DDI0 HDMI plug which is connected to the SoC through the LSPCON converter. You seem to be using the second scenario with LSPCON being in the protocol converter mode (configured as such by BIOS). In that case the connection will show up as a DP connector." 

Other users on machines including Intel NUCs have confirmed this issue with HDMI over LSPCON resulting in incorrect resolution on boot. It appears that a regression in LSPCON handling was introduced in kernel 4.16 and persists in later versions.
Comment 1 Imre Deak 2018-08-07 10:21:15 UTC
Could you check whether the problem is present in the latest drm-tip kernel:

git://anongit.freedesktop.org/drm-tip  (drm-tip branch)

Please attach a full dmesg log booting with this kernel and the drm.debug=0x1e kernel parameter.

Also could you try bisecting the problem between 4.15 and 4.16 kernel versions?
Comment 2 Nicholas Stommel 2018-08-09 18:12:39 UTC
Huh, after 4.17.7 I cannot replicate the problem. 4.18-rc8 as well as drm-tip (built from today 08/09) are both okay. I rebooted repeatedly selecting drm-tip for around half an hour (seriously) and not a single instance did this strange screen-resolution issue occur. Bisecting the kernel should no longer be necessary, it appears the problem is resolved in drm-tip and the latest release candidate build of 4.18. 

I will mark this issue as resolved for now. If anyone manages to replicate the bug in drm-tip or in the latest rc kernel, please provide Imre Deak on https://bugs.freedesktop.org/show_bug.cgi?id=107503 with a full dmesg log from boot with the "drm.debug=0x1e log_buf_len=10M" kernel parameters. I am now marking this bug as resolved/fixed.
Comment 3 Francesco Balestrieri 2018-08-09 19:07:04 UTC
Thank you for verifying. Closing.
Comment 4 Nicholas Stommel 2018-08-10 00:02:42 UTC
Created attachment 141030 [details]
dmesg-drm-tip

Well, it seems I spoke too soon. This issue is *very* much still present on drm-tip, but for some odd reason I could not reproduce it with drm.debug=0x1e. Fredrik from https://bugzilla.redhat.com/show_bug.cgi?id=1570392 was also unable to reproduce the bug using the "drm.debug=0x1e" kernel parameter. Without using the debugging parameter, the incorrect native resolution bug most certainly happens.

-- Occurrence: Resolution is incorrect on slightly more than a third of boots.
-- Chipset: Intel® Core™ i7-7700T CPU using Intel® HD Graphics 630 (Kaby Lake GT2)
-- System architecture: x86_64
-- Kernel version (drm-tip) 4.18.0-994-generic
-- Linux distribution: Ubuntu 18.04.1 LTS (also confirmed on Fedora 28, appears distro-independent)
-- Machine: HP Elitedesk 800 G3 DM 35W
-- Display connector: LSPCON HDMI

I have attached the dmesg from a boot without the drm.debug=0x1e kernel parameter. I will keep trying to reproduce the bug with this parameter to get the full debugging dmesg. Even without the debugging parameter, the dmesg clearly shows these lines:

[drm:lspcon_wait_mode [i915]] *ERROR* LSPCON mode hasn't settled
[drm:intel_dp_get_link_train_fallback_values [i915]] *ERROR* Link Training Unsuccessful
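
For anyone comparing many boots, a small script can tally these two messages in a saved dmesg log. This is only a sketch: the helper name and the sample log (including its timestamps) are made up for illustration, and only the two error strings come from this report.

```python
# The two i915 error messages quoted in this report.
SIGNATURES = {
    "lspcon_not_settled": "lspcon_wait_mode [i915]] *ERROR* LSPCON mode hasn't settled",
    "link_training_failed": "intel_dp_get_link_train_fallback_values [i915]] *ERROR* Link Training Unsuccessful",
}

def count_signatures(dmesg_text):
    """Return how many log lines contain each error message."""
    counts = {name: 0 for name in SIGNATURES}
    for line in dmesg_text.splitlines():
        for name, text in SIGNATURES.items():
            if text in line:
                counts[name] += 1
    return counts

# Made-up sample log; a real one would come from `dmesg > boot.log`.
sample = """\
[    3.100000] [drm:lspcon_wait_mode [i915]] *ERROR* LSPCON mode hasn't settled
[    3.200000] [drm:intel_dp_get_link_train_fallback_values [i915]] *ERROR* Link Training Unsuccessful
[    3.300000] [drm:lspcon_wait_mode [i915]] *ERROR* LSPCON mode hasn't settled
"""
print(count_signatures(sample))
```

A boot whose log gives zero for both counters is a candidate 'good' boot, although the resolution should still be checked with xrandr.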
Comment 5 Nicholas Stommel 2018-08-10 00:07:08 UTC
Created attachment 141031 [details]
bad-boot-xrandr

I am also attaching the output of xrandr --verbose from a 'good' boot (correct native full HD resolution) and a 'bad' boot (incorrect 1280x800 resolution)
Comment 6 Nicholas Stommel 2018-08-10 00:07:39 UTC
Created attachment 141032 [details]
good-boot-xrandr
Comment 7 Nicholas Stommel 2018-08-10 00:21:32 UTC
Fredrik made a good point about why we might be having difficulties reproducing the bug with the drm.debug=0x1e parameter:

(In reply to Hector Martin from comment #21)
> If the issue is timing-related (which is quite likely given the randomness
> and the error message associated with it) then it's quite possible that
> enabling additional debugging might slow things down enough to stop it from
> happening.

This may well be a timing-related bug, as the drm.debug=0x1e parameter writes hundreds of lines to the kernel log very quickly. I can confirm this bug is still very much a problem. I have posted my dmesg without the debugging parameter as well as the output of xrandr --verbose from a good and a bad boot. The bug is *very* reproducible without any extra kernel parameters on Fedora 28 and Ubuntu 18.04. I will do my best to find the point at which the bug started happening,
Comment 8 Nicholas Stommel 2018-08-10 00:24:38 UTC
...although bisecting the kernel to determine what is wrong, especially when this bug could likely be a timing-related non-deterministic event, may be largely beyond my capabilities unfortunately.
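
As a rough aid for bisecting an intermittent bug like this one: if a bad kernel shows the problem on a fraction p of boots, the chance of seeing n good boots in a row on that kernel is (1 - p)^n, so each bisection step needs enough boots to make that chance small. The sketch below assumes boots are independent, which may not hold here, and the function name and the 1% threshold are illustrative choices rather than anything from this report.

```python
import math

def boots_needed(p_bad, max_false_good):
    """Smallest n such that (1 - p_bad) ** n <= max_false_good.

    p_bad: assumed fraction of boots that show the bug on a bad kernel.
    max_false_good: accepted risk of wrongly marking a bad kernel as good.
    """
    if not 0 < p_bad < 1 or not 0 < max_false_good < 1:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1 - p_bad))

# Failure rates mentioned in this report: about a third, and about half.
for p in (1 / 3, 1 / 2):
    print(f"p_bad={p:.2f}: boot each candidate kernel {boots_needed(p, 0.01)} times")
```

A single bad boot is enough to mark a kernel as bad, so the full count only has to be reached for kernels that keep booting correctly.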
Comment 9 Nicholas Stommel 2018-08-10 01:58:26 UTC
Okay, I can confirm that this bug exists on and started with 4.16-rc1, and cannot be replicated with any of the 4.15 series up to and including 4.15.18. I'm afraid I cannot pinpoint what exactly went wrong between 4.15.x and 4.16-rc1, as the sheer volume of changes for the first release candidate build alone is massive. I would appreciate it if someone could assist and look into this further, as it does constitute a major blocking-level problem and remains present even on drm-tip.
Comment 10 Jani Saarinen 2018-08-10 06:00:40 UTC
Can you also attach output of /sys/kernel/debug/dri/0/i915_display_info?
Comment 11 Nicholas Stommel 2018-08-11 17:18:12 UTC
Created attachment 141045 [details]
i915_display_info (bad boot 1280x800)
Comment 12 Nicholas Stommel 2018-08-11 17:19:08 UTC
Created attachment 141046 [details]
i915_display_info (bad boot 1680x1050)
Comment 13 Nicholas Stommel 2018-08-11 17:20:13 UTC
Created attachment 141047 [details]
i915_display_info (good boot 1920x1080)
Comment 14 Nicholas Stommel 2018-08-11 17:27:11 UTC
Okay, I have attached the output of /sys/kernel/debug/dri/0/i915_display_info for all three cases here: for a 'good boot' where full 1920x1080 native resolution is correctly used and for the two cases of 'bad boots' where the non-native resolutions 1280x800 and 1680x1050 are used. I am currently seeing this issue happen actually closer to *half* of all boots using HDMI LSPCON even on drm-tip, so this does constitute a fairly severe problem.
Comment 15 Nicholas Stommel 2018-08-11 17:37:40 UTC
Even worse, the problem can be triggered after a 'good boot', and is then irreversible, if xrandr is used to change a display property such as "Broadcast RGB", as in the following (note how six of the listed resolutions disappear):

$ xrandr
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
DP-3 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
   1920x1080     60.00*+  50.00    59.94  
   1680x1050     59.88  
   1600x900      60.00  
   1280x1024     60.02  
   1440x900      59.90  
   1280x800      59.91  
   1280x720      60.00    50.00    59.94  
   1024x768      70.07    60.00  
   800x600       72.19    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       66.67    60.00    59.94  
   720x400       70.08  

$ xrandr --output DP-3 --set "Broadcast RGB" "Full"
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  140 (RANDR)
  Minor opcode of failed request:  21 (RRSetCrtcConfig)
  Serial number of failed request:  61
  Current serial number in output stream:  61

$ xrandr
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 8192 x 8192
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
DP-3 connected primary 1280x800+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
   1280x800      59.91* 
   1024x768      60.00  
   800x600       72.19    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       66.67    60.00    59.94  
   720x400       70.08  

$ xrandr --output DP-3 --mode 1920x1080
xrandr: cannot find mode 1920x1080
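
To compare two such listings without reading them by eye, something like the following could be used. It is a minimal sketch: the function names are made up, the parser only handles the plain `xrandr` layout shown above, and the sample text is trimmed from the two listings in this comment.

```python
import re

# A mode line is indented and starts with a WIDTHxHEIGHT token.
MODE_LINE = re.compile(r"^\s+(\d+x\d+)\s")

def modes_by_output(xrandr_text):
    """Map each connected output name to the resolutions listed under it."""
    modes, current = {}, None
    for line in xrandr_text.splitlines():
        if line and not line[0].isspace():
            # Header line such as "DP-3 connected primary ..." or "Screen 0: ...".
            parts = line.split()
            current = parts[0] if len(parts) > 1 and parts[1] == "connected" else None
            if current:
                modes[current] = []
        elif current:
            match = MODE_LINE.match(line + " ")
            if match:
                modes[current].append(match.group(1))
    return modes

def dropped_modes(before, after, output):
    """Resolutions listed for `output` in `before` but missing in `after`."""
    remaining = set(modes_by_output(after).get(output, []))
    return [m for m in modes_by_output(before).get(output, []) if m not in remaining]

before = """\
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
DP-1 disconnected (normal left inverted right x axis y axis)
DP-3 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
   1920x1080     60.00*+  50.00    59.94
   1680x1050     59.88
   1600x900      60.00
   1280x1024     60.02
   1440x900      59.90
   1280x800      59.91
   1280x720      60.00    50.00    59.94
   1024x768      70.07    60.00
   800x600       72.19    60.32    56.25
   720x576       50.00
   720x480       60.00    59.94
   640x480       66.67    60.00    59.94
   720x400       70.08
"""

after = """\
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 8192 x 8192
DP-1 disconnected (normal left inverted right x axis y axis)
DP-3 connected primary 1280x800+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
   1280x800      59.91*
   1024x768      60.00
   800x600       72.19    60.32    56.25
   720x576       50.00
   720x480       60.00    59.94
   640x480       66.67    60.00    59.94
   720x400       70.08
"""

print(dropped_modes(before, after, "DP-3"))
```

On a live system the two strings would come from running `xrandr` before and after the property change.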
Comment 16 Nicholas Stommel 2018-08-11 17:47:41 UTC
Interestingly, this behavior does not always happen, suggesting that it may be a timing issue indeed. The seemingly random nature of it is confounding. Using the kernel parameter "drm.debug=0x1e", this behavior cannot be replicated for some reason. Booting without that parameter, the issue is very reliably reproduced, much closer to (if not over) half of all boots on 4.16-rc1 all the way to drm-tip.
Comment 17 Fredrik Schön 2018-08-12 14:47:50 UTC
Created attachment 141049 [details] [review]
Add a retry loop to lspcon_wait_mode

Patch attached. It probably needs both testing and polish, but it seems to work for me.
Comment 18 Fredrik Schön 2018-08-12 17:03:00 UTC
I have attempted an analysis of this bug at https://bugzilla.redhat.com/show_bug.cgi?id=1570392#c25
Comment 19 Fredrik Schön 2018-08-12 18:04:55 UTC
Created attachment 141051 [details] [review]
Increase timeout in lspcon_wait_mode

So I overlooked this much simpler fix. 7/7 boots OK, no errors logged.
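
For readers unfamiliar with this kind of fix, the general idea of a polling wait that gives up too early can be sketched as follows. This is not the kernel code: the function names and mode strings are hypothetical, time is simulated so the result is deterministic, and the millisecond values are illustrative rather than taken from the patch.

```python
def wait_for_mode(read_mode, wanted, timeout_ms, step_ms=10):
    """Poll read_mode(elapsed_ms) until it returns `wanted` or the timeout expires.

    Returns the simulated elapsed time in ms on success, or None on timeout.
    """
    elapsed = 0
    while elapsed <= timeout_ms:
        if read_mode(elapsed) == wanted:
            return elapsed
        elapsed += step_ms
    return None

def make_device(settle_ms):
    """Hypothetical device that reports "PCON" only once settle_ms have passed."""
    return lambda elapsed_ms: "PCON" if elapsed_ms >= settle_ms else "LS"

slow_device = make_device(settle_ms=250)  # illustrative settle time

print(wait_for_mode(slow_device, "PCON", timeout_ms=100))  # None: gave up too early
print(wait_for_mode(slow_device, "PCON", timeout_ms=400))  # 250: settled within the limit
```

A retry loop around the shorter wait would reach the same result by a different route, which may be why a longer single timeout is the simpler change.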
Comment 20 Nicholas Stommel 2018-08-14 02:27:13 UTC
Fredrik, can you confirm whether changing xrandr settings after applying your patch and successfully booting in the correct resolution results in display modes disappearing permanently for the connection? Like ending up on 1280x800 or 1680x1050? See comment #15, and try flipping through settings repeatedly, like color range or refresh rate. Because if that happens after your patch, we have a bigger problem on our hands that may be a result of underlying kernel changes in completely different places than the i915 module. If so, we have a mysterious and particularly nasty case which could involve various parts of the Intel graphics stack and/or changes in the kernel.
Comment 21 Fredrik Schön 2018-08-14 15:40:18 UTC
Confirmed.

I can reproduce your testcase on my machine.

[schon@localhost ~]$ xrandr
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
DP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
   1920x1080     60.00*+  50.00    59.94  
   1680x1050     59.88  
   1600x900      60.00  
   1280x1024     60.02  
   1440x900      59.90  
   1280x800      59.91  
   1280x720      60.00    50.00    59.94  
   1024x768      70.07    60.00  
   800x600       72.19    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       72.81    66.67    60.00    59.94  
   720x400       70.08  
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  139 (RANDR)
  Minor opcode of failed request:  21 (RRSetCrtcConfig)
  Serial number of failed request:  61
  Current serial number in output stream:  61
[schon@localhost ~]$ xrandr | wc -l
18
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
18
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
18
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  139 (RANDR)
  Minor opcode of failed request:  21 (RRSetCrtcConfig)
  Serial number of failed request:  61
  Current serial number in output stream:  61
[schon@localhost ~]$ xrandr | wc -l
13
[schon@localhost ~]$ 

With the increased-timeout patch applied, modes are no longer dropped.

[schon@localhost ~]$ xrandr
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
DP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
   1920x1080     60.00*+  50.00    59.94  
   1680x1050     59.88  
   1600x900      60.00  
   1280x1024     60.02  
   1440x900      59.90  
   1280x800      59.91  
   1280x720      60.00    50.00    59.94  
   1024x768      70.07    60.00  
   800x600       72.19    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       72.81    66.67    60.00    59.94  
   720x400       70.08  
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$
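
The manual toggle-and-count procedure above could be automated along these lines. This is a hedged sketch: the helper names are made up, the real runner needs a running X session with xrandr installed, and the demonstration uses a stand-in runner that only imitates a loss of output lines.

```python
import subprocess

def run_xrandr(args):
    """Run xrandr with the given arguments and return its stdout."""
    return subprocess.run(["xrandr", *args], capture_output=True, text=True).stdout

def toggle_broadcast_rgb(output, cycles, run=run_xrandr):
    """Alternate "Broadcast RGB" between Full and Automatic.

    Returns the line count of plain `xrandr` output before the first change
    and after every change, like the `xrandr | wc -l` checks above.
    """
    counts = [len(run([]).splitlines())]
    for i in range(cycles):
        value = "Full" if i % 2 == 0 else "Automatic"
        run(["--output", output, "--set", "Broadcast RGB", value])
        counts.append(len(run([]).splitlines()))
    return counts

def modes_stable(counts):
    """True if the listing never changed length during the run."""
    return len(set(counts)) == 1

class FakeXrandr:
    """Stand-in runner that loses 5 lines of output after the second change."""
    def __init__(self):
        self.lines = 19
        self.sets = 0

    def __call__(self, args):
        if "--set" in args:
            self.sets += 1
            if self.sets == 2:
                self.lines -= 5
            return ""
        return "line\n" * self.lines

counts = toggle_broadcast_rgb("DP-1", 4, run=FakeXrandr())
print(counts, modes_stable(counts))
```

On real hardware, calling `toggle_broadcast_rgb("DP-1", 8)` with the default runner would repeat the test from this comment; the output name depends on the machine.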
Comment 22 Nicholas Stommel 2018-08-15 03:11:08 UTC
Ah, Fredrik, your patch works well for me! No weird stuff in X and I haven't had a single bad boot where display resolution is incorrect after applying it. Good stuff. I think increasing the wait time to ensure nothing weird happens over LSPCON is just about the only real solution here.
Comment 23 Jani Saarinen 2018-08-16 10:54:44 UTC
Reference: https://patchwork.freedesktop.org/series/48183/
Comment 24 Jani Saarinen 2018-08-20 10:51:47 UTC
Last series: https://patchwork.freedesktop.org/series/48414/
Comment 25 Jani Nikula 2018-08-20 11:26:12 UTC
commit 59f1c8ab30d6f9042562949f42cbd3f3cf69de94
Author: Fredrik Schön <fredrikschon@gmail.com>
Date:   Fri Aug 17 22:07:28 2018 +0200

    drm/i915: Increase LSPCON timeout

