Summary: | [BISECTED] EDID read failure breaks display mirroring | ||
---|---|---|---|
Product: | DRI | Reporter: | Justinas Narusevicius <junaru> |
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | harry.wentland, nicholas.kazlauskas, sunpeng.li |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Created attachment 140840 [details]
416.png Display mirroring available on 4.16.9 kernel
Created attachment 140841 [details]
417.png Display mirroring unavailable in 4.17+ kernels
Created attachment 140842 [details]
Philips 55PUS6401 (4k TV) EDID dump
Created attachment 140843 [details]
BenQ G2420HDBL (monitor) EDID dump
Did you find ac916c914c3156e53505e9ea3a9d1495518bf873 as the problematic commit by bisection? If so, does reverting it fix the problem? If not, can you bisect and verify that this is the actual commit that causes the problem?

Hey Alex,

Yes, ac916c914c3156e53505e9ea3a9d1495518bf873 was found by bisecting the mainline kernel between the tags v4.16 (0adb32858b0bddf4ada5f364a84ed60b196dbcda, good) and v4.17-rc1 (60cc43fc888428bb2f18f08997432d426a243338, bad).

I can confirm that reverting ac916c914c3156e53505e9ea3a9d1495518bf873 via the attached patch on current mainline kernel HEAD (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cd3f77d74ac31b4627cdfa70812338076a1ea475) fixes all three issues:

* Mirroring is available once again.
* Extended desktop mode can now use all the resolutions up to and including 4K.
* There's no 3rd erroneous display on HDMI-A-2 anymore.

Should I test this against https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next or any other specific branch?

Created attachment 140853 [details] [review]
Patch to revert the problematic commit

Harry, Leo, any objections to reverting this?

That commit is correct. I don't think we should revert it. That said I don't quite understand why it leads to issues.

Are you able to take another set of kernel logs from amd-staging-drm-next, both with the regression commit and without, with drm.debug=0x4 set both times?

(In reply to Harry Wentland from comment #9)
> That commit is correct. I don't think we should revert it. That said I don't
> quite understand why it leads to issues.

Isn't it strange that dc_link_detect goes on when edid_status==EDID_BAD_CHECKSUM but does return when edid_status==EDID_NO_RESPONSE? In both cases, one cannot expect to have read a valid EDID from the display at hand, but if it's ok to continue with an invalid EDID, why not also continue without one having been received?
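For readers unfamiliar with the two cases being compared, here is a minimal Python sketch of what "bad checksum" versus "no response" means for an EDID dump such as the ones attached to this bug. It relies only on the EDID rule that the bytes of every 128-byte block sum to 0 modulo 256. The function name and the returned labels (including "MALFORMED") are made up for this illustration; they are not the kernel's enum values and this is not how amdgpu implements the check.

```python
EDID_BLOCK_SIZE = 128
# Fixed 8-byte pattern that starts every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def classify_edid(raw: bytes) -> str:
    """Classify a raw EDID dump.

    The labels are illustrative strings only (assumption), loosely
    modelled on the statuses mentioned in the discussion above.
    """
    if not raw:
        # Nothing came back from the display at all.
        return "NO_RESPONSE"
    if len(raw) % EDID_BLOCK_SIZE != 0 or raw[:8] != EDID_HEADER:
        # Data arrived but does not look like EDID blocks.
        return "MALFORMED"
    # Every 128-byte block (base block and extensions) must sum to 0 mod 256.
    for offset in range(0, len(raw), EDID_BLOCK_SIZE):
        block = raw[offset:offset + EDID_BLOCK_SIZE]
        if sum(block) % 256 != 0:
            return "BAD_CHECKSUM"
    return "OK"
```

A dump read from disk with open(path, "rb").read() can be passed straight to classify_edid. A "BAD_CHECKSUM" result still leaves data available to parse, whereas "NO_RESPONSE" leaves nothing, which is the difference the question above turns on.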
(In reply to dwagner from comment #10)
> (In reply to Harry Wentland from comment #9)
> > That commit is correct. I don't think we should revert it. That said I don't
> > quite understand why it leads to issues.
> Isn't it strange that dc_link_detect goes on when
> edid_status==EDID_BAD_CHECKSUM but does return when
> edid_status==EDID_NO_RESPONSE? In both cases, one cannot expect to have read
> a valid EDID from the display at hand, but if it's ok to continue with an
> invalid EDID, why not also continue without one having been received?

Harry is the expert, but a lot of times, especially with TVs or receivers, when adding the audio information, they forget to update the checksum, so the data is actually good even if the checksum is bad.

Created attachment 140916 [details]
amd-staging-drm-next-5bb19d15d8f2-boot.log
Created attachment 140917 [details]
amd-staging-drm-next-5bb19d15d8f2-with-revert-patch-boot.log
(In reply to Harry Wentland from comment #9)
> That commit is correct. I don't think we should revert it.

You are probably correct. Some more things I have noticed:

When using GNOME Display Manager on kernels before ac916c914c3156e53505e9ea3a9d1495518bf873, my displays would always lose signal for half a second after login and then come right back up. The same 'signal flicker' (at least from the user's perspective) could be observed on one display when the other was being turned on or powered off. This no longer happens after ac916c914c3156e53505e9ea3a9d1495518bf873. The transition from GDM login to desktop is smooth and I can even see the desktop fade-in animation.

> Are you able to take another set of kernel logs from amd-staging-drm-next,
> both with the regression commit and without, with drm.debug=0x4 set both
> times?

Logs are attached. Please let me know if I can do anything more to help.

Can you try passing this on your kernel command line (with the bad commit): "video=HDMI-A-2:d"?

This will force HDMI-A-2 to report disconnected. I wonder if that helps.

(In reply to Harry Wentland from comment #15)
> Can you try passing this on your kernel command line (with the bad commit):
> "video=HDMI-A-2:d"?
>
> This will force HDMI-A-2 to report disconnected. I wonder if that helps.

It does! After forcefully disabling HDMI-A-2 everything looks to be working the same way as on kernels before ac916c914c3156e53505e9ea3a9d1495518bf873.

Looks like a faulty board that reports a port as connected when it shouldn't. The Windows driver has a policy to only report that as connected if it's DP, but not for HDMI, which is likely why we've never spotted this before.

Do you have more information on your graphics card (manufacturer/model)?

Can you also print vbios_version?
As sudo:
cd /sys
cat $(find -name 'vbios_version')

Created attachment 140922 [details] [review]
[PATCH] drm/amd/display: Report non-DP display as disconnected without EDID

Try this

(In reply to Harry Wentland from comment #17)
> Do you have more information on your graphics card (manufacturer/model)?

It's an ASUS DUAL-RX580-O8G
https://www.asus.com/Graphics-Cards/DUAL-RX580-O8G/specifications/

> Can you also print vbios_version?

# cat $(find -name 'vbios_version')
115-D009PI2-101
# lspci|grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)

> Try this

This solves the problem! Everything looks to be working fine again. Thank you! If you need any more info I'll gladly help.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/465.
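The policy named in the patch title in this thread ("Report non-DP display as disconnected without EDID") can be pictured with a small Python sketch. It is only a reading of the title and of the description of the Windows driver policy given above. The function name, parameter names and string values are hypothetical, and the actual patch in attachment 140922 may be structured quite differently.

```python
def reported_status(signal_type: str, sink_detected: bool, edid_read_ok: bool) -> str:
    """Hypothetical decision table, not code taken from amdgpu.

    signal_type   -- e.g. "DP", "HDMI", "DVI" (illustrative values)
    sink_detected -- the board claims something is plugged into the port
    edid_read_ok  -- an EDID could actually be read from the sink
    """
    if not sink_detected:
        return "disconnected"
    if edid_read_ok:
        return "connected"
    # No EDID could be read: keep trusting the detection only for DP,
    # and treat every other connector type as disconnected.
    return "connected" if signal_type == "DP" else "disconnected"
```

Under this reading, a phantom HDMI-A-2 that is flagged as present but never returns an EDID would come out as "disconnected", which matches the effect the reporter got manually with "video=HDMI-A-2:d".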
Created attachment 140839 [details]
Boot log showing EDID read failure

Two displays are connected to a POLARIS 10 GPU:

DVI-D-1 BenQ G2420HDBL (monitor)
HDMI-A-1 Philips 55PUS6401 (4k TV)

The displays worked fine in mirror mode at 1920x1080@60 until commit ac916c914c3156e53505e9ea3a9d1495518bf873:
see 416.png - gnome display settings working as expected on mainline 4.16.9 kernel.

As far as I can tell, ac916c914c3156e53505e9ea3a9d1495518bf873 introduces 3 issues (listing the latter two because they are probably side effects of the first one):

#### 1st issue ####

Builds at ac916c914c3156e53505e9ea3a9d1495518bf873 and later only allow the desktop to be extended and not mirrored, leaving the impression that AMDGPU thinks the displays have no compatible output modes for mirroring:
see 417.png - gnome display settings on mainline 4.17+ no longer allowing the two displays to be mirrored; the tab line present at the top of 416.png is missing in 417.png.

Grepping through the kernel logs indeed shows AMDGPU failing to read EDID (full boot log attached):

[drm:dm_logger_write [amdgpu]] *ERROR* No EDID read.

* The error message is present with only the monitor connected.
* The error message is also present with only the TV connected.

This leads me to believe that the EDID is bad on both of my displays and AMDGPU was tolerating it until now, or there might be some issues on AMDGPU's side too.

#### 2nd issue (probably related) ####

When using the "Join Displays" (extended desktop) mode it was previously possible to have an extended desktop span 3840x2160 on the TV and 1920x1080 on the monitor. After ac916c914c3156e53505e9ea3a9d1495518bf873, gnome display settings no longer allows choosing 3840x2160 on the TV when the monitor is also plugged in. Both displays are capped to 1920x1080@60. The 4k resolutions return when only the TV is connected.
#### 3rd issue (probably related) ####

After ac916c914c3156e53505e9ea3a9d1495518bf873, a third erroneous "Unknown display" is found and put in the 'enabled' state on what appears to be HDMI-A-2, with the following modes:

$ cat /sys/class/drm/card0-HDMI-A-2/modes
1024x768
800x600
800x600
848x480
640x480

There's nothing physically connected to HDMI-A-2.

#### SUMMARY ####

Since broken EDID is probably the root cause of all of this, I'm attaching both displays' EDID dumps as produced by read-edid 3.0.2.

I have no experience in kernel development but would gladly test patches if anyone has ideas on what could be wrong. I'll also understand if this is filed as "won't fix" due to display EDID issues.
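For anyone wanting to repeat the sysfs check from the 3rd issue across all connectors at once, a minimal Python sketch follows. It assumes the usual /sys/class/drm/cardN-CONNECTOR layout with "status" and "modes" files, as seen in the path used in this report. The function name is made up for this example.

```python
from pathlib import Path


def list_connectors(drm_root: str = "/sys/class/drm") -> dict:
    """Map each connector directory name to (status, list of modes).

    Only directories named like card0-HDMI-A-2 are considered; entries
    without a readable "status" file are skipped.
    """
    result = {}
    for conn in sorted(Path(drm_root).glob("card*-*")):
        status_file = conn / "status"
        modes_file = conn / "modes"
        if not status_file.is_file():
            continue
        status = status_file.read_text().strip()
        # "modes" holds one mode per line and is empty when nothing is attached.
        modes = modes_file.read_text().split() if modes_file.is_file() else []
        result[conn.name] = (status, modes)
    return result
```

Calling list_connectors() and printing the result on an affected kernel should show the phantom connector as "connected" with only low-resolution fallback modes, as in the listing above.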