Created attachment 141115 [details] dmesg with the patch included Forwarding https://bugs.debian.org/906180 linux.git commit 53fd40a90f3c caused the Linux kernel cmdline EDID override mechanism to no longer override the EDID properly. I have an external monitor that used to have a valid EDID but after replugging it at one point, the EDID became empty. I recovered a copy of it from backups of the Xorg logs. To get a valid EDID I then override the empty EDID using the backup copy and the drm.edid_firmware (or drm_kms_firmware.edid_firmware) parameter. With the EDID correctly overridden, the VGA-1 EDID file looks like the following, the Linux virtual consoles are higher resolution, GNOME/xrandr are able to use the full resolution of the monitor (1680x1050) and GNOME is able to determine the make/model (it is an LG FLATRON W2242PM but GNOME says Goldstar Company Ltd 23"): $ hd /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-VGA-1/edid 00000000 00 ff ff ff ff ff ff 00 1e 6d 77 56 64 78 09 00 |.........mwVdx..| 00000010 08 13 01 03 6a 31 20 78 ea ae c5 a2 57 4a 9c 25 |....j1 x....WJ.%| 00000020 12 50 54 a7 6b 80 95 0f 95 00 81 80 81 40 71 4f |.PT.k........@qO| 00000030 01 01 01 01 01 01 7c 2e 90 a0 60 1a 1e 40 30 20 |......|...`..@0 | 00000040 36 00 da 28 11 00 00 1a 21 39 90 30 62 1a 27 40 |6..(....!9.0b.'@| 00000050 68 b0 36 00 da 28 11 00 00 1c 00 00 00 fd 00 38 |h.6..(.........8| 00000060 4b 1e 53 0f 00 0a 20 20 20 20 20 20 00 00 00 fc |K.S... ....| 00000070 00 57 32 32 34 32 0a 20 20 20 20 20 20 20 00 5d |.W2242. .]| 00000080 Without the override working correctly, the EDID file is empty, the Linux virtual consoles are lower resolution and only the standard modes (1024x768, 800x600 and 848x480) are available in xrandr/GNOME and GNOME settings is not able to determine the make/model of the monitor. In addition to using the EDID override, I can manually run some xrandr commands but that isn't as convenient and doesn't alter the resolution of the Linux virtual consoles: xrandr --output VGA-1 --newmode 1680x1050_60.0 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync xrandr --addmode VGA-1 1680x1050_60.0 xrandr --output VGA-1 --mode 1680x1050_60.0 xrandr --output VGA-1 --auto Adding video=VGA-1:e to the Linux kernel command-line parameters seems to fix the EDID override problem. I will attach dmesg files with the drm.debug=14 Linux kernel command-line parameter.
Created attachment 141116 [details] dmesg without the patch included
The logs reference v4.13 but commits 53fd40a90f3c ("drm: handle override and firmware EDID at drm_do_get_edid() level") and ac6c35a4d8c7 ("drm: add backwards compatibility support for drm_kms_helper.edid_firmware") were added in v4.15. The dmesg from comment #1 has [ 13.716143] drm: unknown parameter 'edid_firmware' ignored while the dmesg from comment #0 has [ 14.264421] drm_kms_helper: unknown parameter 'edid_firmware' ignored Make sure you have CONFIG_DRM_LOAD_EDID_FIRMWARE=y. Please check your config, and ensure you're running kernel and modules from a proper v4.15+ build.
You could also test to try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot. What system you have?
Note to my future self: intel_crt_detect() bypasses DDC detect if hotplug detect works.
Paul, do you still have the issue even after following Jani's instructions with latest drmtip?
I haven't yet had time to go through the instructions, sorry about that.
I'm now working on a new bisect in a more systematic way to verify my earlier results, test drm-tip and include all the requested info. This will take a while because my systems do not have SSD nor fast CPUs. Some replies to clarify some things in the meantime: In a fresh clone of linux.git, the commits are described as being based on v4.13 RCs, I guess that is why the version numbers seem strange: $ git describe --tags 53fd40a90f3c v4.13-rc5-841-g53fd40a90f3c $ git describe --tags ac6c35a4d8c7 v4.13-rc5-842-gac6c35a4d8c7 The system I am testing this on is a Lenovo X201 Tablet (laptop with touchscreen) with an Intel GPU: 00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) I also have a second system with an NVIDIA GPU using nouveau. I'll test that using the same Linux builds once I am done with the laptop. 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK107 [GeForce GT 740] [10de:0fc8] (rev a1)
Created attachment 141437 [details] tarball with bisect log, configs, dmesg logs and results I have re-done the bisect in a more organised way and come up with the same result: 53fd40a90f3c0bdad86ec266ee5df833f54ace39 is the first bad commit I also built and tested drm-tip (1b18cb66428c) and ac6c35a4d8c7. I have attached a tarball containing bisect log, configs, dmesg logs and results for each of the commits with the parameters requested.
(In reply to Paul Wise from comment #7) > In a fresh clone of linux.git, the commits are described as being based on > v4.13 RCs, I guess that is why the version numbers seem strange: > > $ git describe --tags 53fd40a90f3c > v4.13-rc5-841-g53fd40a90f3c Ah. The development branch the commit was in was based on v4.13-rc5. However according to 'git tag --contains 53fd40a90f3c' it was merged to Linus' upstream in v4.15-rc1.
Paul, Do you still have the issue?
Have there been any new commits that could have fixed this? I'm not subscribed to LKML or anywhere other than this bug report but AFAICT there has not been any indication that the issue has been confirmed, investigated, cause determined or fix created.
BTW, do you need any more info from me? If not, the NEEDINFO tag needs removing.
I believe we have all the information we need at this time. Just need to figure out some way to untangle the drm_get_edid()+override vs. drm_probe_ddc() ball of yarn.
I'd like to delete the bisect results (code, kernels etc) from my local disk. Is that fine or am I likely to need them again at some point?
(In reply to Paul Wise from comment #14) > I'd like to delete the bisect results (code, kernels etc) from my local > disk. Is that fine or am I likely to need them again at some point? Go ahead. We've got the info, thanks.
Jani, Ville, any easy way to proceed or find time?
I probably don't need to point out how unacceptable it is to have functionality regressions that are lasting this long, can we get a plan for this asap?
The problem here and in the AST case [1] is that drm_helper_probe_detect() correctly reports connected, but the DDC probe in drm_get_edid() fails. Here the connected status comes from hotplug detect, in AST it's always connected. Using the DDC probe even with override/firmware EDID is useful in cases where DDC works, but the EDID data is corrupt. The idea was that one would use video=VGA-1:e to force the connector on when DDC does not work at all. In the AST case where the connector status is always connected, there is no functional regression, but it does require adding another parameter. The same with all cases where detection is based solely on DDC probe in general, and you have to use connector force anyway. In this bug, the hotplug detect works, and the connected status from that takes precedence over DDC. Which means that in this bug, the requirement to use video=VGA-1:e regresses the hotplug functionality. The override/firmware EDID mechanism was moved to the low level to transparently fix a plethora of bugs where, confusingly, both the real EDID and the override/firmware EDID were used. For example, it was impossible to override audio details. I think there's a conflicting set of valid requirements here, and the only way to preserve functionality for all cases is to add another level of connector force probe, which skips DDC probe in case of override/firmware EDID. [1] http://mid.mail-archive.com/alpine.DEB.2.20.1905262211270.24390@whs-18.cs.helsinki.fi
(In reply to Jani Nikula from comment #18) > > I think there's a conflicting set of valid requirements here, and the only > way to preserve functionality for all cases is to add another level of > connector force probe, which skips DDC probe in case of override/firmware > EDID. If I understand Jani's comment correctly, in drm_get_edid(), DDC probe should be skipped if the connector force is unspecified and EDID has to be overridden. When I analyzed the code further, I felt the check that does DDC probe if connector force is unspecified can be eliminated and will also fix this bug. I mentioned my proposed change below. I also submitted this change to trybot and it was successful. --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -1724,9 +1724,6 @@ struct edid *drm_get_edid(struct drm_connector *connector, if (connector->force == DRM_FORCE_OFF) return NULL; - if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter)) - return NULL; - edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter); if (edid) drm_get_displayid(connector, edid);
(In reply to harish.chegondi from comment #19) > - if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter)) > - return NULL; > - > edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter); I think the above if statement can be removed as all drm_probe_ddc() does is call drm_do_probe_ddc_edid(). drm_do_get_edid() in the above case first handles override EDID if necessary and calls drm_do_probe_ddc_edid() which was passed as a function pointer to drm_do_get_edid().
Paul, I have a patch that I think will fix this bug. Will you be able to check it out on your system? Thanks!
If you aren't able to test it then I can do it. It will take a while as my computer is fairly old and slow. Is the patch above what you wanted me to test?
(In reply to harish.chegondi from comment #19) > (In reply to Jani Nikula from comment #18) > > > > I think there's a conflicting set of valid requirements here, and the only > > way to preserve functionality for all cases is to add another level of > > connector force probe, which skips DDC probe in case of override/firmware > > EDID. > > If I understand Jani's comment correctly, in drm_get_edid(), DDC probe > should be skipped if the connector force is unspecified and EDID has to be > overridden. When I analyzed the code further, I felt the check that does DDC > probe if connector force is unspecified can be eliminated and will also fix > this bug. I mentioned my proposed change below. I also submitted this change > to trybot and it was successful. > > --- a/drivers/gpu/drm/drm_edid.c > +++ b/drivers/gpu/drm/drm_edid.c > @@ -1724,9 +1724,6 @@ struct edid *drm_get_edid(struct drm_connector > *connector, > if (connector->force == DRM_FORCE_OFF) > return NULL; > > - if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter)) > - return NULL; > - > edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter); > if (edid) > drm_get_displayid(connector, edid); This loses the ability to use DDC probe for detecting display, leading to excessive failures in drm_do_get_edid() if the display isn't there.
(In reply to Paul Wise from comment #22) > If you aren't able to test it then I can do it. It will take a while as my > computer is fairly old and slow. > > Is the patch above what you wanted me to test? Paul, Thanks for your offer to help me test this patch. I am currently addressing review comments to the patch. I will let you know once it is ready for you to test. Thank You!
https://patchwork.freedesktop.org/series/61702/
I am re-working the patch to address Jani's comments.
Please try these two patches: https://patchwork.freedesktop.org/series/61764/
I'm building 5.2.0-rc3 with the two patches now and will test it once the build is done. Hopefully I'll be done today.
I've tested the two patches on top of Linux v5.2-rc3 and the EDID override works correctly on an Intel Ironlake GPU connected to a monitor that lost its EDID. I'll test with an nVidia GPU & nouveau later today.
While testing I noticed a couple of things: While everything the GUI is the correct resolution, GNOME is unable to identify the monitor vendor or model. This is a regression from the previous edid override functionality. It looks like this is because the edid file in /sys is not populated with the EDID override data. I got a crash due to null pointer dereference at one point, I'll try to track down when this happens.
(In reply to Paul Wise from comment #30) > While everything the GUI is the correct resolution, GNOME is unable to > identify the monitor vendor or model. This is a regression from the > previous edid override functionality. It looks like this is because the > edid file in /sys is not populated with the EDID override data. I've sent an update that should fix this; the patchwork link above has v2 now.
I'm building and testing v2 now.
I confirmed that v2 fixed the EDID override data.
Thanks for the report and testing, sorry for taking so long to figure this out. commit 48eaeb7664c76139438724d520a1ea4a84a3ed92 Author: Jani Nikula <jani.nikula@intel.com> Date: Mon Jun 10 12:30:54 2019 +0300 drm: add fallback override/firmware EDID modes workaround
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.