Summary: | Linux: REGRESSION: drm: cmdline EDID override mechanisms broken since 4.15 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Paul Wise <pabs3> | ||||||||
Component: | DRM/Intel | Assignee: | harish.chegondi | ||||||||
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||||||
Severity: | normal | ||||||||||
Priority: | high | CC: | harish.chegondi, intel-gfx-bugs, jani.nikula, lakshminarayana.vudum | ||||||||
Version: | unspecified | ||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||
OS: | Linux (All) | ||||||||||
URL: | https://bugs.debian.org/906180 | ||||||||||
Whiteboard: | Triaged, ReadyForDev | ||||||||||
i915 platform: | ALL | i915 features: | display/Other | ||||||||
Attachments: |
|
Description
Paul Wise
2018-08-15 15:51:59 UTC
Created attachment 141116 [details]
dmesg without the patch included
The logs reference v4.13 but commits 53fd40a90f3c ("drm: handle override and firmware EDID at drm_do_get_edid() level") and ac6c35a4d8c7 ("drm: add backwards compatibility support for drm_kms_helper.edid_firmware") were added in v4.15. The dmesg from comment #1 has [ 13.716143] drm: unknown parameter 'edid_firmware' ignored while the dmesg from comment #0 has [ 14.264421] drm_kms_helper: unknown parameter 'edid_firmware' ignored Make sure you have CONFIG_DRM_LOAD_EDID_FIRMWARE=y. Please check your config, and ensure you're running kernel and modules from a proper v4.15+ build. You could also test to try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot. What system you have? Note to my future self: intel_crt_detect() bypasses DDC detect if hotplug detect works. Paul, do you still have the issue even after following Jani's instructions with latest drmtip? I haven't yet had time to go through the instructions, sorry about that. I'm now working on a new bisect in a more systematic way to verify my earlier results, test drm-tip and include all the requested info. This will take a while because my systems do not have SSD nor fast CPUs. Some replies to clarify some things in the meantime: In a fresh clone of linux.git, the commits are described as being based on v4.13 RCs, I guess that is why the version numbers seem strange: $ git describe --tags 53fd40a90f3c v4.13-rc5-841-g53fd40a90f3c $ git describe --tags ac6c35a4d8c7 v4.13-rc5-842-gac6c35a4d8c7 The system I am testing this on is a Lenovo X201 Tablet (laptop with touchscreen) with an Intel GPU: 00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) I also have a second system with an NVIDIA GPU using nouveau. I'll test that using the same Linux builds once I am done with the laptop. 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK107 [GeForce GT 740] [10de:0fc8] (rev a1) Created attachment 141437 [details]
tarball with bisect log, configs, dmesg logs and results
I have re-done the bisect in a more organised way and come up with the same result:
53fd40a90f3c0bdad86ec266ee5df833f54ace39 is the first bad commit
I also built and tested drm-tip (1b18cb66428c) and ac6c35a4d8c7.
I have attached a tarball containing bisect log, configs, dmesg logs and results for each of the commits with the parameters requested.
(In reply to Paul Wise from comment #7) > In a fresh clone of linux.git, the commits are described as being based on > v4.13 RCs, I guess that is why the version numbers seem strange: > > $ git describe --tags 53fd40a90f3c > v4.13-rc5-841-g53fd40a90f3c Ah. The development branch the commit was in was based on v4.13-rc5. However according to 'git tag --contains 53fd40a90f3c' it was merged to Linus' upstream in v4.15-rc1. Paul, Do you still have the issue? Have there been any new commits that could have fixed this? I'm not subscribed to LKML or anywhere other than this bug report but AFAICT there has not been any indication that the issue has been confirmed, investigated, cause determined or fix created. BTW, do you need any more info from me? If not, the NEEDINFO tag needs removing. I believe we have all the information we need at this time. Just need to figure out some way to untangle the drm_get_edid()+override vs. drm_probe_ddc() ball of yarn. I'd like to delete the bisect results (code, kernels etc) from my local disk. Is that fine or am I likely to need them again at some point? (In reply to Paul Wise from comment #14) > I'd like to delete the bisect results (code, kernels etc) from my local > disk. Is that fine or am I likely to need them again at some point? Go ahead. We've got the info, thanks. Jani, Ville, any easy way to proceed or find time? I probably don't need to point out how unacceptable it is to have functionality regressions that are lasting this long, can we get a plan for this asap? The problem here and in the AST case [1] is that drm_helper_probe_detect() correctly reports connected, but the DDC probe in drm_get_edid() fails. Here the connected status comes from hotplug detect, in AST it's always connected. Using the DDC probe even with override/firmware EDID is useful in cases where DDC works, but the EDID data is corrupt. The idea was that one would use video=VGA-1:e to force the connector on when DDC does not work at all. In the AST case where the connector status is always connected, there is no functional regression, but it does require adding another parameter. The same with all cases where detection is based solely on DDC probe in general, and you have to use connector force anyway. In this bug, the hotplug detect works, and the connected status from that takes precedence over DDC. Which means that in this bug, the requirement to use video=VGA-1:e regresses the hotplug functionality. The override/firmware EDID mechanism was moved to the low level to transparently fix a plethora of bugs where, confusingly, both the real EDID and the override/firmware EDID were used. For example, it was impossible to override audio details. I think there's a conflicting set of valid requirements here, and the only way to preserve functionality for all cases is to add another level of connector force probe, which skips DDC probe in case of override/firmware EDID. [1] http://mid.mail-archive.com/alpine.DEB.2.20.1905262211270.24390@whs-18.cs.helsinki.fi (In reply to Jani Nikula from comment #18) > > I think there's a conflicting set of valid requirements here, and the only > way to preserve functionality for all cases is to add another level of > connector force probe, which skips DDC probe in case of override/firmware > EDID. If I understand Jani's comment correctly, in drm_get_edid(), DDC probe should be skipped if the connector force is unspecified and EDID has to be overridden. When I analyzed the code further, I felt the check that does DDC probe if connector force is unspecified can be eliminated and will also fix this bug. I mentioned my proposed change below. I also submitted this change to trybot and it was successful. --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -1724,9 +1724,6 @@ struct edid *drm_get_edid(struct drm_connector *connector, if (connector->force == DRM_FORCE_OFF) return NULL; - if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter)) - return NULL; - edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter); if (edid) drm_get_displayid(connector, edid); (In reply to harish.chegondi from comment #19) > - if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter)) > - return NULL; > - > edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter); I think the above if statement can be removed as all drm_probe_ddc() does is call drm_do_probe_ddc_edid(). drm_do_get_edid() in the above case first handles override EDID if necessary and calls drm_do_probe_ddc_edid() which was passed as a function pointer to drm_do_get_edid(). Paul, I have a patch that I think will fix this bug. Will you be able to check it out on your system? Thanks! If you aren't able to test it then I can do it. It will take a while as my computer is fairly old and slow. Is the patch above what you wanted me to test? (In reply to harish.chegondi from comment #19) > (In reply to Jani Nikula from comment #18) > > > > I think there's a conflicting set of valid requirements here, and the only > > way to preserve functionality for all cases is to add another level of > > connector force probe, which skips DDC probe in case of override/firmware > > EDID. > > If I understand Jani's comment correctly, in drm_get_edid(), DDC probe > should be skipped if the connector force is unspecified and EDID has to be > overridden. When I analyzed the code further, I felt the check that does DDC > probe if connector force is unspecified can be eliminated and will also fix > this bug. I mentioned my proposed change below. I also submitted this change > to trybot and it was successful. > > --- a/drivers/gpu/drm/drm_edid.c > +++ b/drivers/gpu/drm/drm_edid.c > @@ -1724,9 +1724,6 @@ struct edid *drm_get_edid(struct drm_connector > *connector, > if (connector->force == DRM_FORCE_OFF) > return NULL; > > - if (connector->force == DRM_FORCE_UNSPECIFIED && !drm_probe_ddc(adapter)) > - return NULL; > - > edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter); > if (edid) > drm_get_displayid(connector, edid); This loses the ability to use DDC probe for detecting display, leading to excessive failures in drm_do_get_edid() if the display isn't there. (In reply to Paul Wise from comment #22) > If you aren't able to test it then I can do it. It will take a while as my > computer is fairly old and slow. > > Is the patch above what you wanted me to test? Paul, Thanks for your offer to help me test this patch. I am currently addressing review comments to the patch. I will let you know once it is ready for you to test. Thank You! I am re-working the patch to address Jani's comments. Please try these two patches: https://patchwork.freedesktop.org/series/61764/ I'm building 5.2.0-rc3 with the two patches now and will test it once the build is done. Hopefully I'll be done today. I've tested the two patches on top of Linux v5.2-rc3 and the EDID override works correctly on an Intel Ironlake GPU connected to a monitor that lost its EDID. I'll test with an nVidia GPU & nouveau later today. While testing I noticed a couple of things: While everything the GUI is the correct resolution, GNOME is unable to identify the monitor vendor or model. This is a regression from the previous edid override functionality. It looks like this is because the edid file in /sys is not populated with the EDID override data. I got a crash due to null pointer dereference at one point, I'll try to track down when this happens. (In reply to Paul Wise from comment #30) > While everything the GUI is the correct resolution, GNOME is unable to > identify the monitor vendor or model. This is a regression from the > previous edid override functionality. It looks like this is because the > edid file in /sys is not populated with the EDID override data. I've sent an update that should fix this; the patchwork link above has v2 now. I'm building and testing v2 now. I confirmed that v2 fixed the EDID override data. Thanks for the report and testing, sorry for taking so long to figure this out. commit 48eaeb7664c76139438724d520a1ea4a84a3ed92 Author: Jani Nikula <jani.nikula@intel.com> Date: Mon Jun 10 12:30:54 2019 +0300 drm: add fallback override/firmware EDID modes workaround |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.