Bug 107583 - Linux: REGRESSION: drm: cmdline EDID override mechanisms broken since 4.15
Summary: Linux: REGRESSION: drm: cmdline EDID override mechanisms broken since 4.15
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: high normal
Assignee: harish.chegondi
QA Contact: Intel GFX Bugs mailing list
URL: https://bugs.debian.org/906180
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-15 15:51 UTC by Paul Wise
Modified: 2019-06-12 10:51 UTC (History)
4 users (show)

See Also:
i915 platform: ALL
i915 features: display/Other


Attachments
dmesg with the patch included (100.96 KB, text/plain)
2018-08-15 15:51 UTC, Paul Wise
no flags Details
dmesg without the patch included (105.66 KB, text/plain)
2018-08-15 15:52 UTC, Paul Wise
no flags Details
tarball with bisect log, configs, dmesg logs and results (420.34 KB, application/x-xz)
2018-09-04 06:45 UTC, Paul Wise
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Wise 2018-08-15 15:51:59 UTC
Created attachment 141115 [details]
dmesg with the patch included

Forwarding https://bugs.debian.org/906180

linux.git commit 53fd40a90f3c caused the Linux kernel cmdline EDID override mechanism to no longer override the EDID properly.

I have an external monitor that used to have a valid EDID but after replugging it at one point, the EDID became empty. I recovered a copy of it from backups of the Xorg logs. To get a valid EDID I then override the empty EDID using the backup copy and the drm.edid_firmware (or drm_kms_firmware.edid_firmware) parameter.

With the EDID correctly overridden, the VGA-1 EDID file looks like the following, the Linux virtual consoles are higher resolution, GNOME/xrandr are able to use the full resolution of the monitor (1680x1050) and GNOME is able to determine the make/model (it is an LG FLATRON W2242PM but GNOME says Goldstar Company Ltd 23"):

$ hd /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-VGA-1/edid
00000000  00 ff ff ff ff ff ff 00  1e 6d 77 56 64 78 09 00  |.........mwVdx..|
00000010  08 13 01 03 6a 31 20 78  ea ae c5 a2 57 4a 9c 25  |....j1 x....WJ.%|
00000020  12 50 54 a7 6b 80 95 0f  95 00 81 80 81 40 71 4f  |.PT.k........@qO|
00000030  01 01 01 01 01 01 7c 2e  90 a0 60 1a 1e 40 30 20  |......|...`..@0 |
00000040  36 00 da 28 11 00 00 1a  21 39 90 30 62 1a 27 40  |6..(....!9.0b.'@|
00000050  68 b0 36 00 da 28 11 00  00 1c 00 00 00 fd 00 38  |h.6..(.........8|
00000060  4b 1e 53 0f 00 0a 20 20  20 20 20 20 00 00 00 fc  |K.S...      ....|
00000070  00 57 32 32 34 32 0a 20  20 20 20 20 20 20 00 5d  |.W2242.       .]|
00000080

Without the override working correctly, the EDID file is empty, the Linux virtual consoles are lower resolution and only the standard modes (1024x768, 800x600 and 848x480) are available in xrandr/GNOME and GNOME settings is not able to determine the make/model of the monitor.

In addition to using the EDID override, I can manually run some xrandr commands but that isn't as convenient and doesn't alter the resolution of the Linux virtual consoles:

xrandr  --output VGA-1 --newmode 1680x1050_60.0  146.25  1680 1784 1960 2240  1050 1053 1059 1089 -hsync +vsync
xrandr --addmode VGA-1 1680x1050_60.0
xrandr --output VGA-1 --mode 1680x1050_60.0
xrandr --output VGA-1 --auto

Adding video=VGA-1:e to the Linux kernel command-line parameters seems to fix the EDID override problem.

I will attach dmesg files with the drm.debug=14 Linux kernel command-line parameter.
Comment 1 Paul Wise 2018-08-15 15:52:32 UTC
Created attachment 141116 [details]
dmesg without the patch included
Comment 2 Jani Nikula 2018-08-16 06:33:39 UTC
The logs reference v4.13 but commits 53fd40a90f3c ("drm: handle override and firmware EDID at drm_do_get_edid() level") and ac6c35a4d8c7 ("drm: add backwards compatibility support for drm_kms_helper.edid_firmware") were added in v4.15.

The dmesg from comment #1 has 
[   13.716143] drm: unknown parameter 'edid_firmware' ignored
while the dmesg from comment #0 has
[   14.264421] drm_kms_helper: unknown parameter 'edid_firmware' ignored

Make sure you have CONFIG_DRM_LOAD_EDID_FIRMWARE=y.

Please check your config, and ensure you're running kernel and modules from a proper v4.15+ build.
Comment 3 Jani Saarinen 2018-08-16 06:55:01 UTC
You could also test to try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot.

What system you have?
Comment 4 Jani Nikula 2018-08-16 07:21:10 UTC
Note to my future self: intel_crt_detect() bypasses DDC detect if hotplug detect works.
Comment 5 Lakshmi 2018-08-30 07:16:34 UTC
Paul, do you still have the issue even after following Jani's instructions with latest drmtip?
Comment 6 Paul Wise 2018-08-30 07:19:39 UTC
I haven't yet had time to go through the instructions, sorry about that.
Comment 7 Paul Wise 2018-09-02 09:02:30 UTC
I'm now working on a new bisect in a more systematic way to verify my earlier results, test drm-tip and include all the requested info. This will take a while because my systems do not have SSD nor fast CPUs.

Some replies to clarify some things in the meantime:

In a fresh clone of linux.git, the commits are described as being based on v4.13 RCs, I guess that is why the version numbers seem strange:

$ git describe --tags 53fd40a90f3c
v4.13-rc5-841-g53fd40a90f3c
$ git describe --tags ac6c35a4d8c7
v4.13-rc5-842-gac6c35a4d8c7

The system I am testing this on is a Lenovo X201 Tablet (laptop with touchscreen) with an Intel GPU:

00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02)

I also have a second system with an NVIDIA GPU using nouveau. I'll test that using the same Linux builds once I am done with the laptop.

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK107 [GeForce GT 740] [10de:0fc8] (rev a1)
Comment 8 Paul Wise 2018-09-04 06:45:37 UTC
Created attachment 141437 [details]
tarball with bisect log, configs, dmesg logs and results

I have re-done the bisect in a more organised way and come up with the same result:

53fd40a90f3c0bdad86ec266ee5df833f54ace39 is the first bad commit

I also built and tested drm-tip (1b18cb66428c) and ac6c35a4d8c7.

I have attached a tarball containing bisect log, configs, dmesg logs and results for each of the commits with the parameters requested.
Comment 9 Jani Nikula 2018-09-04 08:38:32 UTC
(In reply to Paul Wise from comment #7)
> In a fresh clone of linux.git, the commits are described as being based on
> v4.13 RCs, I guess that is why the version numbers seem strange:
> 
> $ git describe --tags 53fd40a90f3c
> v4.13-rc5-841-g53fd40a90f3c

Ah. The development branch the commit was in was based on v4.13-rc5. However according to 'git tag --contains 53fd40a90f3c' it was merged to Linus' upstream in v4.15-rc1.
Comment 10 Lakshmi 2018-09-26 08:10:51 UTC
Paul, Do you still have the issue?
Comment 11 Paul Wise 2018-09-26 15:53:13 UTC
Have there been any new commits that could have fixed this?

I'm not subscribed to LKML or anywhere other than this bug report but AFAICT there has not been any indication that the issue has been confirmed, investigated, cause determined or fix created.
Comment 12 Paul Wise 2018-09-26 15:58:44 UTC
BTW, do you need any more info from me? If not, the NEEDINFO tag needs removing.
Comment 13 Ville Syrjala 2018-09-27 12:34:31 UTC
I believe we have all the information we need at this time. Just need to figure out some way to untangle the drm_get_edid()+override vs. drm_probe_ddc() ball of yarn.
Comment 14 Paul Wise 2018-12-21 23:50:45 UTC
I'd like to delete the bisect results (code, kernels etc) from my local disk. Is that fine or am I likely to need them again at some point?
Comment 15 Jani Nikula 2018-12-28 09:32:20 UTC
(In reply to Paul Wise from comment #14)
> I'd like to delete the bisect results (code, kernels etc) from my local
> disk. Is that fine or am I likely to need them again at some point?

Go ahead. We've got the info, thanks.
Comment 16 Jani Saarinen 2019-04-22 14:44:21 UTC
Jani, Ville, any easy way to proceed or find time?
Comment 17 Dave Airlie 2019-05-27 19:51:21 UTC
I probably don't need to point out how unacceptable it is to have functionality regressions that are lasting this long, can we get a plan for this asap?
Comment 18 Jani Nikula 2019-05-28 10:04:21 UTC
The problem here and in the AST case [1] is that drm_helper_probe_detect() correctly reports connected, but the DDC probe in drm_get_edid() fails. Here the connected status comes from hotplug detect, in AST it's always connected.

Using the DDC probe even with override/firmware EDID is useful in cases where DDC works, but the EDID data is corrupt.

The idea was that one would use video=VGA-1:e to force the connector on when DDC does not work at all. In the AST case where the connector status is always connected, there is no functional regression, but it does require adding another parameter. The same with all cases where detection is based solely on DDC probe in general, and you have to use connector force anyway.

In this bug, the hotplug detect works, and the connected status from that takes precedence over DDC. Which means that in this bug, the requirement to use video=VGA-1:e regresses the hotplug functionality.

The override/firmware EDID mechanism was moved to the low level to transparently fix a plethora of bugs where, confusingly, both the real EDID and the override/firmware EDID were used. For example, it was impossible to override audio details.

I think there's a conflicting set of valid requirements here, and the only way to preserve functionality for all cases is to add another level of connector force probe, which skips DDC probe in case of override/firmware EDID.

[1] http://mid.mail-archive.com/alpine.DEB.2.20.1905262211270.24390@whs-18.cs.helsinki.fi
Comment 19 harish.chegondi 2019-06-01 07:28:44 UTC
(In reply to Jani Nikula from comment #18)
> 
> I think there's a conflicting set of valid requirements here, and the only
> way to preserve functionality for all cases is to add another level of
> connector force probe, which skips DDC probe in case of override/firmware
> EDID.

If I understand Jani's comment correctly, in drm_get_edid(), DDC probe should be skipped if the connector force is unspecified and EDID has to be overridden. When I analyzed the code further, I felt the check that does DDC probe if connector force is unspecified can be eliminated and will also fix this bug. I mentioned my proposed change below. I also submitted this change to trybot and it was successful.

--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -1724,9 +1724,6 @@ struct edid *drm_get_edid(struct drm_connector *connector,
	if (connector->force == DRM_FORCE_OFF)
 		return NULL;
 
-	if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter))
-		return NULL;
-
 	edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter);
 	if (edid)
 		drm_get_displayid(connector, edid);
Comment 20 harish.chegondi 2019-06-01 07:38:13 UTC
(In reply to harish.chegondi from comment #19)
> -	if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter))
> -		return NULL;
> -
>  	edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter);

I think the above if statement can be removed as all drm_probe_ddc() does is call drm_do_probe_ddc_edid(). drm_do_get_edid() in the above case first handles override EDID if necessary and calls drm_do_probe_ddc_edid() which was passed as a function pointer to drm_do_get_edid().
Comment 21 harish.chegondi 2019-06-04 06:21:21 UTC
Paul,

I have a patch that I think will fix this bug. Will you be able to check it out on your system?

Thanks!
Comment 22 Paul Wise 2019-06-04 06:29:30 UTC
If you aren't able to test it then I can do it. It will take a while as my computer is fairly old and slow.

Is the patch above what you wanted me to test?
Comment 23 Jani Nikula 2019-06-04 07:30:52 UTC
(In reply to harish.chegondi from comment #19)
> (In reply to Jani Nikula from comment #18)
> > 
> > I think there's a conflicting set of valid requirements here, and the only
> > way to preserve functionality for all cases is to add another level of
> > connector force probe, which skips DDC probe in case of override/firmware
> > EDID.
> 
> If I understand Jani's comment correctly, in drm_get_edid(), DDC probe
> should be skipped if the connector force is unspecified and EDID has to be
> overridden. When I analyzed the code further, I felt the check that does DDC
> probe if connector force is unspecified can be eliminated and will also fix
> this bug. I mentioned my proposed change below. I also submitted this change
> to trybot and it was successful.
> 
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -1724,9 +1724,6 @@ struct edid *drm_get_edid(struct drm_connector
> *connector,
> 	if (connector->force == DRM_FORCE_OFF)
>  		return NULL;
>  
> -	if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter))
> -		return NULL;
> -
>  	edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter);
>  	if (edid)
>  		drm_get_displayid(connector, edid);

This loses the ability to use DDC probe for detecting display, leading to excessive failures in drm_do_get_edid() if the display isn't there.
Comment 24 harish.chegondi 2019-06-07 01:09:50 UTC
(In reply to Paul Wise from comment #22)
> If you aren't able to test it then I can do it. It will take a while as my
> computer is fairly old and slow.
> 
> Is the patch above what you wanted me to test?

Paul, Thanks for your offer to help me test this patch. I am currently addressing review comments to the patch. I will let you know once it is ready for you to test. Thank You!
Comment 25 harish.chegondi 2019-06-07 01:10:05 UTC
https://patchwork.freedesktop.org/series/61702/
Comment 26 harish.chegondi 2019-06-07 01:10:56 UTC
I am re-working the patch to address Jani's comments.
Comment 27 Jani Nikula 2019-06-07 11:03:50 UTC
Please try these two patches: https://patchwork.freedesktop.org/series/61764/
Comment 28 Paul Wise 2019-06-08 01:03:18 UTC
I'm building 5.2.0-rc3 with the two patches now and will test it once the build is done. Hopefully I'll be done today.
Comment 29 Paul Wise 2019-06-08 05:13:16 UTC
I've tested the two patches on top of Linux v5.2-rc3 and the EDID override works correctly on an Intel Ironlake GPU connected to a monitor that lost its EDID. I'll test with an nVidia GPU & nouveau later today.
Comment 30 Paul Wise 2019-06-08 05:49:38 UTC
While testing I noticed a couple of things:

While everything the GUI is the correct resolution, GNOME is unable to
identify the monitor vendor or model. This is a regression from the
previous edid override functionality. It looks like this is because the
edid file in /sys is not populated with the EDID override data.

I got a crash due to null pointer dereference at one point, I'll try to
track down when this happens.
Comment 31 Jani Nikula 2019-06-10 09:35:28 UTC
(In reply to Paul Wise from comment #30)
> While everything the GUI is the correct resolution, GNOME is unable to
> identify the monitor vendor or model. This is a regression from the
> previous edid override functionality. It looks like this is because the
> edid file in /sys is not populated with the EDID override data.

I've sent an update that should fix this; the patchwork link above has v2 now.
Comment 32 Paul Wise 2019-06-11 03:35:40 UTC
I'm building and testing v2 now.
Comment 33 Paul Wise 2019-06-11 08:29:29 UTC
I confirmed that v2 fixed the EDID override data.
Comment 34 Jani Nikula 2019-06-12 10:51:32 UTC
Thanks for the report and testing, sorry for taking so long to figure this out.

commit 48eaeb7664c76139438724d520a1ea4a84a3ed92
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Mon Jun 10 12:30:54 2019 +0300

    drm: add fallback override/firmware EDID modes workaround


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.