When booting Fedora, Ubuntu, openSUSE, or really any distribution running Linux kernel 4.16 or later, the Intel integrated graphics (in my case the Kaby Lake HD 630) does not reliably enable the connected monitor's native resolution over HDMI on LSPCON. Kernels 4.15 and earlier do not show this issue. The issue is confirmed on the Red Hat Bugzilla, but lacks an actual kernel bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1570392

dmesg logs show:

[drm:intel_dp_get_link_train_fallback_values [i915]] *ERROR* Link Training Unsuccessful

followed by repeating lines of:

[drm:lspcon_wait_mode [i915]] *ERROR* LSPCON mode hasn't settled

In particular, 1280x800 and 1680x1050 are the two resolutions I find myself limited to when the kernel boots and doesn't use the full 1920x1080 native resolution of the monitor (in this case, a Samsung CF591). The issue does not appear to be related to this particular monitor, as two others I have tried show the same behavior. Nor is it a cable problem; I swapped the cable several times as well.

After much testing, the issue seems non-deterministic. GRUB and Plymouth don't appear to be culprits either. Basically, the kernel fails to set the monitor's Full HD resolution around *half* the time. Booting Fedora 28 or Ubuntu 18.04 with kernel version 4.16 or newer on my machine with Intel graphics using the HDMI LSPCON connector is like tossing a die: I either get Full HD, one of those two stretched and incorrect low resolutions, or (rarely, but still) a failure to boot at all.
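For anyone comparing boots, the two error signatures quoted above can be counted in a saved kernel log. This is only a rough sketch: count_lspcon_errors is a hypothetical helper name, and it assumes the log is piped in on stdin (for example from dmesg or from an attached dmesg file).

```shell
#!/bin/sh
# count_lspcon_errors: read a kernel log on stdin and print
# "<link_training_failures> <lspcon_not_settled>".
# The helper name is made up for this sketch; the two patterns are the
# error messages quoted in this report.
count_lspcon_errors() {
    awk '
        /Link Training Unsuccessful/ { lt++ }
        /LSPCON mode hasn.t settled/ { ls++ }   # "." stands in for the apostrophe
        END { printf "%d %d\n", lt, ls }
    '
}
```

Typical use would be `dmesg | count_lspcon_errors` (dmesg may need root on some systems), run once after a good boot and once after a bad boot.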
After some confusion, I determined that the HDMI option board on the system motherboard of the HP Elitedesk 800 G3 DM is, in fact, using an LSPCON converter from DP->HDMI using this helpful info from Intel i915 dev Imre Deak: "There are two ways to connect HDMI to the APL RVPs: via the DDI1 DP++ plug with an DP->HDMI dongle, or via the DDI0 HDMI plug which is connected to the SoC through the LSPCON converter. You seem to be using the second scenario with LSPCON being in the protocol converter mode (configured as such by BIOS). In that case the connection will show up as a DP connector." Other users on machines including Intel NUCs have confirmed this issue with HDMI over LSPCON resulting in incorrect resolution on boot. It appears a regression occurred in LSPCON handling through kernel 4.16 and above.
Could you try if the problem is present in the latest drm-tip kernel: git://anongit.freedesktop.org/drm-tip (drm-tip branch) Please attach a full dmesg log booting with this kernel and the drm.debug=0x1e kernel parameter. Also could you try bisecting the problem between 4.15 and 4.16 kernel versions?
Huh, after 4.17.7 I cannot replicate the problem. 4.18-rc8 as well as drm-tip (built from today 08/09) are both okay. I rebooted repeatedly selecting drm-tip for around half an hour (seriously) and not a single instance did this strange screen-resolution issue occur. Bisecting the kernel should no longer be necessary, it appears the problem is resolved in drm-tip and the latest release candidate build of 4.18. I will mark this issue as resolved for now. If anyone manages to replicate the bug in drm-tip or in the latest rc kernel, please provide Imre Deak on https://bugs.freedesktop.org/show_bug.cgi?id=107503 with a full dmesg log from boot with the "drm.debug=0x1e log_buf_len=10M" kernel parameters. I am now marking this bug as resolved/fixed.
Thank you for verifying. Closing.
Created attachment 141030 [details]
dmesg-drm-tip

Well, it seems I spoke too soon. This issue is *very* much still present on drm-tip, but for some odd reason I could not reproduce it with drm.debug=0x1e. Fredrik from https://bugzilla.redhat.com/show_bug.cgi?id=1570392 was also unable to reproduce the bug using the "drm.debug=0x1e" kernel parameter. Without using the debugging parameter, the incorrect native resolution bug most certainly happens.

-- Occurrence: Happens roughly over a third of the time, resolution is incorrect on boot.
-- Chipset: Intel® Core™ i7-7700T CPU using Intel® HD Graphics 630 (Kaby Lake GT2)
-- System architecture: x86_64
-- Kernel version (drm-tip): 4.18.0-994-generic
-- Linux distribution: Ubuntu 18.04.1 LTS (also confirmed on Fedora 28, appears distro-independent)
-- Machine: HP Elitedesk 800 G3 DM 35W
-- Display connector: LSPCON HDMI

I have attached the dmesg from boot without the drm.debug=0x1e kernel parameter. I will keep trying to see if I can replicate the bug with this parameter to get the full debugging dmesg. Here you can clearly see in the dmesg sans debugging parameter the lines:

[drm:lspcon_wait_mode [i915]] *ERROR* LSPCON mode hasn't settled
[drm:intel_dp_get_link_train_fallback_values [i915]] *ERROR* Link Training Unsuccessful
Created attachment 141031 [details]
bad-boot-xrandr

I am also attaching the output of xrandr --verbose from a 'good' boot (correct native Full HD resolution) and a 'bad' boot (incorrect 1280x800 resolution).
Created attachment 141032 [details] good-boot-xrandr
Fredrik made a good point about why we might be having difficulties reproducing the bug with the drm.debug=0x1e parameter:

(In reply to Hector Martin from comment #21)
> If the issue is timing-related (which is quite likely given the randomness
> and the error message associated with it) then it's quite possible that
> enabling additional debugging might slow things down enough to stop it from
> happening.

This may very well be a timing-related bug, as the drm.debug=0x1e parameter writes hundreds of lines to the kernel log very quickly.

I can confirm this bug is still very much a problem. I have posted my dmesg sans the debugging parameter as well as the output of xrandr --verbose from a good and a bad boot. The bug is *very* reproducible without adding any extra kernel parameters on Fedora 28 and Ubuntu 18.04. I will do my best to find the point at which the bug started happening, although bisecting the kernel, especially when this bug is likely a timing-related, non-deterministic event, may unfortunately be largely beyond my capabilities.
Okay, I can confirm that this bug started with 4.16-rc1 and cannot be replicated with any of the 4.15 series up to and including 4.15.18. I'm afraid I cannot pinpoint what exactly went wrong between 4.15.x and 4.16-rc1, as the sheer volume of changes in the first release candidate alone is massive. I would appreciate it if someone could assist and look into this further, as it does constitute a major blocking-level problem and remains present even on drm-tip.
Can you also attach output of /sys/kernel/debug/dri/0/i915_display_info?
Created attachment 141045 [details] i915_display_info (bad boot 1280x800)
Created attachment 141046 [details] i915_display_info (bad boot 1680x1050)
Created attachment 141047 [details] i915_display_info (good boot 1920x1080)
Okay, I have attached the output of /sys/kernel/debug/dri/0/i915_display_info for all three cases here: for a 'good boot' where full 1920x1080 native resolution is correctly used and for the two cases of 'bad boots' where the non-native resolutions 1280x800 and 1680x1050 are used. I am currently seeing this issue happen actually closer to *half* of all boots using HDMI LSPCON even on drm-tip, so this does constitute a fairly severe problem.
Even worse, the problem happens and is irreversible after a 'good boot' if xrandr is used to change atomic settings like "Broadcast RGB", as in the following (note how six resolutions, including 1920x1080, completely disappear):

$ xrandr
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
DP-3 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
   1920x1080     60.00*+  50.00    59.94
   1680x1050     59.88
   1600x900      60.00
   1280x1024     60.02
   1440x900      59.90
   1280x800      59.91
   1280x720      60.00    50.00    59.94
   1024x768      70.07    60.00
   800x600       72.19    60.32    56.25
   720x576       50.00
   720x480       60.00    59.94
   640x480       66.67    60.00    59.94
   720x400       70.08

$ xrandr --output DP-3 --set "Broadcast RGB" "Full"
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  140 (RANDR)
  Minor opcode of failed request:  21 (RRSetCrtcConfig)
  Serial number of failed request:  61
  Current serial number in output stream:  61

$ xrandr
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 8192 x 8192
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
DP-3 connected primary 1280x800+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
   1280x800      59.91*
   1024x768      60.00
   800x600       72.19    60.32    56.25
   720x576       50.00
   720x480       60.00    59.94
   640x480       66.67    60.00    59.94
   720x400       70.08

$ xrandr --output DP-3 --mode 1920x1080
xrandr: cannot find mode 1920x1080
Interestingly, this behavior does not always happen, suggesting that it may be a timing issue indeed. The seemingly random nature of it is confounding. Using the kernel parameter "drm.debug=0x1e", this behavior cannot be replicated for some reason. Booting without that parameter, the issue is very reliably reproduced, much closer to (if not over) half of all boots on 4.16-rc1 all the way to drm-tip.
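Since reproduction seems to depend on whether drm.debug=0x1e was passed at boot, it may help to record that alongside each good or bad boot. A minimal sketch, assuming the kernel command line is fed in on stdin (on a running system it can be read from /proc/cmdline); has_kparam is a hypothetical helper name.

```shell
#!/bin/sh
# has_kparam PARAM: succeed (exit 0) if PARAM appears as a whole
# space-separated word in the kernel command line read from stdin.
# Hypothetical helper for this sketch, not part of any existing tool.
has_kparam() {
    tr ' ' '\n' | grep -qxF -- "$1"
}
```

For example, `has_kparam drm.debug=0x1e < /proc/cmdline && echo "debug boot"` would tag boots that had the debugging parameter enabled.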
Created attachment 141049 [details] [review]
Add a retry loop to lspcon_wait_mode

Patch attached. It probably needs both testing and polish, but it seems to work for me.
I have attempted an analysis of this bug at https://bugzilla.redhat.com/show_bug.cgi?id=1570392#c25
Created attachment 141051 [details] [review]
Increase timeout in lspcon_wait_mode

So I overlooked this much simpler fix. 7/7 boots OK, no errors logged.
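To illustrate why a larger wait budget could turn an intermittent failure into a reliable success, here is a toy model only. It is not the kernel's lspcon_wait_mode and not either attached patch: device_settled, wait_mode and SETTLE_AFTER are hypothetical names, and the budget is counted in polls rather than in real time so the behavior is deterministic.

```shell
#!/bin/sh
# Toy model: a device that reports the requested mode only from the
# SETTLE_AFTER-th poll onward (a made-up stand-in for the LSPCON).
SETTLE_AFTER=${SETTLE_AFTER:-5}
polls=0

# Pretend to query the device once; succeeds once enough polls have happened.
device_settled() {
    polls=$((polls + 1))
    [ "$polls" -ge "$SETTLE_AFTER" ]
}

# wait_mode MAX_POLLS: poll until the device settles or the budget runs out.
# Returns 0 if the mode settled within the budget, 1 otherwise.
wait_mode() {
    max_polls=$1
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        if device_settled; then
            return 0
        fi
    done
    return 1
}
```

With the default SETTLE_AFTER=5, `wait_mode 3` fails while `wait_mode 5` succeeds. If the real settle time varies from boot to boot, a budget close to the typical value would fail only on some boots, which would match the randomness described in this thread.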
Fredrik, can you confirm whether changing xrandr settings after applying your patch and successfully booting in the correct resolution results in display modes disappearing permanently for the connection, like ending up on 1280x800 or 1680x1050? See comment #15, and try flipping through settings such as color range or refresh rate repeatedly. If that still happens after your patch, we have a bigger problem on our hands that may be a result of underlying kernel changes in completely different places than the i915 module. In that case we would have a mysterious and particularly nasty bug which could involve various parts of the Intel graphics stack and/or changes in the kernel.
Confirmed. I can reproduce your testcase on my machine.

[schon@localhost ~]$ xrandr
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
DP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
   1920x1080     60.00*+  50.00    59.94
   1680x1050     59.88
   1600x900      60.00
   1280x1024     60.02
   1440x900      59.90
   1280x800      59.91
   1280x720      60.00    50.00    59.94
   1024x768      70.07    60.00
   800x600       72.19    60.32    56.25
   720x576       50.00
   720x480       60.00    59.94
   640x480       72.81    66.67    60.00    59.94
   720x400       70.08
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  139 (RANDR)
  Minor opcode of failed request:  21 (RRSetCrtcConfig)
  Serial number of failed request:  61
  Current serial number in output stream:  61
[schon@localhost ~]$ xrandr | wc -l
18
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
18
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
18
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  139 (RANDR)
  Minor opcode of failed request:  21 (RRSetCrtcConfig)
  Serial number of failed request:  61
  Current serial number in output stream:  61
[schon@localhost ~]$ xrandr | wc -l
13
[schon@localhost ~]$

With the increased timeout patch applied, modes are no longer dropped.
[schon@localhost ~]$ xrandr
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
DP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
   1920x1080     60.00*+  50.00    59.94
   1680x1050     59.88
   1600x900      60.00
   1280x1024     60.02
   1440x900      59.90
   1280x800      59.91
   1280x720      60.00    50.00    59.94
   1024x768      70.07    60.00
   800x600       72.19    60.32    56.25
   720x576       50.00
   720x480       60.00    59.94
   640x480       72.81    66.67    60.00    59.94
   720x400       70.08
DP-2 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
HDMI-2 disconnected (normal left inverted right x axis y axis)
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Automatic"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$ xrandr --output DP-1 --set "Broadcast RGB" "Full"
[schon@localhost ~]$ xrandr | wc -l
19
[schon@localhost ~]$
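The `xrandr | wc -l` check in the transcripts counts every line of output, including the Screen and connector lines. A slightly more targeted variant could count only the mode lines under one connector. This is a sketch under assumptions: count_modes is a hypothetical helper name, and it assumes the plain (non-verbose) xrandr layout shown in this thread, where mode lines are indented with spaces and connector lines are not.

```shell
#!/bin/sh
# count_modes OUTPUT: read plain `xrandr` output on stdin and print how many
# mode lines are listed under the connector named OUTPUT (e.g. DP-1).
# Hypothetical helper; relies on mode lines being indented with spaces.
count_modes() {
    awk -v out="$1" '
        $1 == out                     { inside = 1; next }  # header of our connector
        /^[^ ]/                       { inside = 0 }        # any other unindented line ends it
        inside && /^ +[0-9]+x[0-9]+/  { n++ }               # indented WIDTHxHEIGHT line
        END { print n + 0 }
    '
}
```

Running `xrandr | count_modes DP-1` before and after `xrandr --output DP-1 --set "Broadcast RGB" "Full"` should print the same number if no modes were dropped.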
Ah, Fredrik, your patch works well for me! No weird stuff in X and I haven't had a single bad boot where display resolution is incorrect after applying it. Good stuff. I think increasing the wait time to ensure nothing weird happens over LSPCON is just about the only real solution here.
Reference: https://patchwork.freedesktop.org/series/48183/
Last series: https://patchwork.freedesktop.org/series/48414/
commit 59f1c8ab30d6f9042562949f42cbd3f3cf69de94
Author: Fredrik Schön <fredrikschon@gmail.com>
Date:   Fri Aug 17 22:07:28 2018 +0200

    drm/i915: Increase LSPCON timeout