Bug 107390

Summary: [BISECTED] EDID read failure breaks display mirroring
Product: DRI
Reporter: Justinas Narusevicius <junaru>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: harry.wentland, nicholas.kazlauskas, sunpeng.li
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
Boot log showing EDID read failure | none
416.png Display mirroring available on 4.16.9 kernel | none
417.png Display mirroring unavailable in 4.17+ kernels | none
Philips 55PUS6401 (4k TV) EDID dump | none
BenQ G2420HDBL (monitor) EDID dump | none
Patch to revert the problematic commit | none
amd-staging-drm-next-5bb19d15d8f2-boot.log | none
amd-staging-drm-next-5bb19d15d8f2-with-revert-patch-boot.log | none
[PATCH] drm/amd/display: Report non-DP display as disconnected without EDID | none

Description Justinas Narusevicius 2018-07-26 18:10:27 UTC
Created attachment 140839 [details]
Boot log showing EDID read failure

Two displays are connected to a POLARIS 10 GPU:

DVI-D-1   BenQ G2420HDBL (monitor)
HDMI-A-1  Philips 55PUS6401 (4k TV)

The displays worked fine in mirror mode at 1920x1080@60 until commit ac916c914c3156e53505e9ea3a9d1495518bf873. See 416.png, which shows GNOME display settings working as expected on the mainline 4.16.9 kernel.

As far as I can tell, ac916c914c3156e53505e9ea3a9d1495518bf873 introduces 3 issues (I am listing the latter two because they are probably side effects of the first one):

#### 1st issue ####
Builds at ac916c914c3156e53505e9ea3a9d1495518bf873 and later only allow the desktop to be extended, not mirrored, which gives the impression that AMDGPU thinks the displays have no compatible output modes for mirroring. See 417.png: GNOME display settings on mainline 4.17+ no longer allow the two displays to be mirrored, and the tab bar present at the top of 416.png is missing in 417.png.

Grepping through kernel logs indeed shows AMDGPU failing to read EDID (full boot log attached):

[drm:dm_logger_write [amdgpu]] *ERROR* No EDID read.

* The error message is present with only the monitor connected.
* The error message is also present with only the TV connected.
This leads me to believe that either the EDID is bad on both of my displays and AMDGPU was tolerating it until now, or there is an issue on AMDGPU's side too.

#### 2nd issue (probably related) ####
In the "Join Displays" (extended desktop) mode it was previously possible to have an extended desktop span 3840x2160 on the TV and 1920x1080 on the monitor.

After ac916c914c3156e53505e9ea3a9d1495518bf873, GNOME display settings no longer allow choosing 3840x2160 on the TV when the monitor is also plugged in. Both displays are capped to 1920x1080@60.
The 4K resolutions return when only the TV is connected.

#### 3rd issue (probably related) ####
After ac916c914c3156e53505e9ea3a9d1495518bf873, a third, erroneous "Unknown display" is detected and put in the 'enabled' state on what appears to be HDMI-A-2, with the following modes:
$ cat /sys/class/drm/card0-HDMI-A-2/modes 
1024x768
800x600
800x600
848x480
640x480

There's nothing connected to HDMI-A-2 physically.
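
The check above can also be scripted for every connector at once. The following is a minimal sketch, not part of the original report, that walks the connector directories under /sys/class/drm and collects each connector's status, enabled and modes attributes. The function name list_connectors and its drm_root parameter are illustrative assumptions; a phantom connector like the one described here would be expected to show up as "connected" and "enabled" despite nothing being plugged in.

```python
from pathlib import Path


def _read_attr(connector_dir, name):
    """Return the stripped text of a sysfs attribute, or "" if unreadable."""
    try:
        return (connector_dir / name).read_text().strip()
    except OSError:
        return ""


def list_connectors(drm_root="/sys/class/drm"):
    """Map connector directory names (e.g. card0-HDMI-A-2) to their state.

    drm_root is a parameter only so the function can be pointed at a test
    directory; on a real system the default sysfs path is used.
    """
    connectors = {}
    root = Path(drm_root)
    if not root.is_dir():
        return connectors
    # Connector directories are named <card>-<connector>, e.g. card0-DVI-D-1.
    for entry in sorted(root.glob("card*-*")):
        modes = _read_attr(entry, "modes")
        connectors[entry.name] = {
            "status": _read_attr(entry, "status"),    # connected / disconnected
            "enabled": _read_attr(entry, "enabled"),  # enabled / disabled
            "modes": modes.splitlines() if modes else [],
        }
    return connectors


if __name__ == "__main__":
    for name, info in list_connectors().items():
        print(f"{name}: {info['status']}, {info['enabled']}, modes={info['modes']}")
```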

#### SUMMARY ####
Since broken EDID is probably the root cause of all of this, I'm attaching both displays' EDID dumps as produced by read-edid 3.0.2.
I have no experience in kernel development but would gladly test patches if anyone has ideas on what could be wrong.
I'll also understand if this is filed as "won't fix" due to display EDID issues.
Comment 1 Justinas Narusevicius 2018-07-26 18:12:17 UTC
Created attachment 140840 [details]
416.png Display mirroring available on 4.16.9 kernel
Comment 2 Justinas Narusevicius 2018-07-26 18:13:17 UTC
Created attachment 140841 [details]
417.png Display mirroring unavailable in 4.17+ kernels
Comment 3 Justinas Narusevicius 2018-07-26 18:15:03 UTC
Created attachment 140842 [details]
Philips 55PUS6401 (4k TV) EDID dump
Comment 4 Justinas Narusevicius 2018-07-26 18:16:14 UTC
Created attachment 140843 [details]
BenQ G2420HDBL (monitor) EDID dump
Comment 5 Alex Deucher 2018-07-26 20:56:56 UTC
Did you identify ac916c914c3156e53505e9ea3a9d1495518bf873 as the problematic commit by bisection?  If so, does reverting it fix the problem?  If not, can you bisect and verify that this is the actual commit that causes the problem?
Comment 6 Justinas Narusevicius 2018-07-27 12:11:37 UTC
Hey Alex,

Yes, ac916c914c3156e53505e9ea3a9d1495518bf873 was found by bisecting the mainline kernel between the tags v4.16 (0adb32858b0bddf4ada5f364a84ed60b196dbcda good) and v4.17-rc1 (60cc43fc888428bb2f18f08997432d426a243338 bad).

I can confirm that reverting ac916c914c3156e53505e9ea3a9d1495518bf873 via the attached patch on current mainline kernel HEAD (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cd3f77d74ac31b4627cdfa70812338076a1ea475) fixes all three issues.

* Mirroring is available once again.
* Extended desktop mode can now use all the resolutions up to and including 4K.
* There's no 3rd erroneous display on HDMI-A-2 anymore.

Should I test this against https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next or any other specific branch?
Comment 7 Justinas Narusevicius 2018-07-27 12:12:22 UTC
Created attachment 140853 [details] [review]
Patch to revert the problematic commit
Comment 8 Alex Deucher 2018-07-27 14:33:28 UTC
Harry, Leo, any objections to reverting this?
Comment 9 Harry Wentland 2018-07-31 19:33:56 UTC
That commit is correct. I don't think we should revert it. That said I don't quite understand why it leads to issues.

Are you able to take another set of kernel logs from amd-staging-drm-next, both with the regression commit and without, with drm.debug=0x4 set both times?
Comment 10 dwagner 2018-07-31 21:16:40 UTC
(In reply to Harry Wentland from comment #9)
> That commit is correct. I don't think we should revert it. That said I don't
> quite understand why it leads to issues.
Isn't it strange that dc_link_detect goes on when edid_status==EDID_BAD_CHECKSUM but does return when edid_status==EDID_NO_RESPONSE? In both cases, one cannot expect to have read a valid EDID from the display at hand, but if it's ok to continue with an invalid EDID, why not also continue without one having been received?
Comment 11 Alex Deucher 2018-07-31 21:23:21 UTC
(In reply to dwagner from comment #10)
> (In reply to Harry Wentland from comment #9)
> > That commit is correct. I don't think we should revert it. That said I don't
> > quite understand why it leads to issues.
> Isn't it strange that dc_link_detect goes on when
> edid_status==EDID_BAD_CHECKSUM but does return when
> edid_status==EDID_NO_RESPONSE? In both cases, one cannot expect to have read
> a valid EDID from the display at hand, but if it's ok to continue with an
> invalid EDID, why not also continue without one having been received?

Harry is the expert, but a lot of times, especially with TVs or receivers, vendors add the audio information and forget to update the checksum, so the data is actually good even if the checksum is bad.
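
To make the checksum point concrete: every 128-byte EDID block is valid only if all of its bytes sum to 0 modulo 256, and the first block starts with a fixed 8-byte header. The following is a minimal sketch for checking a raw EDID dump against those two rules. The function name check_edid and the returned labels are illustrative assumptions and do not correspond to the kernel's edid_status values.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
BLOCK_SIZE = 128  # size of the base block and of each extension block


def check_edid(data):
    """Classify a raw EDID blob (labels are illustrative, not kernel names)."""
    if len(data) < BLOCK_SIZE:
        # Nothing usable was read, e.g. an empty sysfs edid file.
        return "no-edid"
    if data[:8] != EDID_HEADER:
        return "bad-header"
    # Check the base block and every complete extension block that follows.
    for offset in range(0, len(data) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        if sum(block) % 256 != 0:
            # The payload may still be usable; only the last byte of the
            # block (the checksum) might be stale.
            return "bad-checksum"
    return "ok"
```

A dump can be checked by reading the file in binary mode and passing its bytes to check_edid, for example the edid attribute under /sys/class/drm/card0-HDMI-A-1/ on the reporter's setup.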
Comment 12 Justinas Narusevicius 2018-08-01 11:59:01 UTC
Created attachment 140916 [details]
amd-staging-drm-next-5bb19d15d8f2-boot.log
Comment 13 Justinas Narusevicius 2018-08-01 11:59:38 UTC
Created attachment 140917 [details]
amd-staging-drm-next-5bb19d15d8f2-with-revert-patch-boot.log
Comment 14 Justinas Narusevicius 2018-08-01 12:49:15 UTC
(In reply to Harry Wentland from comment #9)
> That commit is correct. I don't think we should revert it.
You are probably correct. Some more things I have noticed:

When using GNOME Display Manager on kernels before ac916c914c3156e53505e9ea3a9d1495518bf873, my displays would always lose signal for half a second after login and then come right back up.

The same 'signal flicker' (at least from the user's perspective) could be observed on one display when the other was being turned on or powered off.

This no longer happens after ac916c914c3156e53505e9ea3a9d1495518bf873. The transition from GDM login to desktop is smooth and I can even see the desktop fade-in animation.

> Are you able to take another set of kernel logs from amd-staging-drm-next,
> both with the regression commit and without, with drm.debug=0x4 set both
> times?

Logs are attached. Please let me know if I can do anything more to help.
Comment 15 Harry Wentland 2018-08-01 13:55:27 UTC
Can you try passing this on your kernel command line (with the bad commit): "video=HDMI-A-2:d"?

This will force HDMI-A-2 to report disconnected. I wonder if that helps.
Comment 16 Justinas Narusevicius 2018-08-01 14:10:01 UTC
(In reply to Harry Wentland from comment #15)
> Can you try passing this on your kernel command line (with the bad commit):
> "video=HDMI-A-2:d"?
> 
> This will force HDMI-A-2 to report disconnected. I wonder if that helps.

It does! After forcefully disabling HDMI-A-2 everything looks to be working the same way as on pre ac916c914c3156e53505e9ea3a9d1495518bf873 kernels.
Comment 17 Harry Wentland 2018-08-01 14:51:16 UTC
Looks like a faulty board that reports a port as connected when it shouldn't. The Windows driver has a policy to report a port without EDID as connected only if it's DP, not HDMI, which is likely why we've never spotted this before.

Do you have more information on your graphics card (manufacturer/model)?

Can you also print vbios_version?

As root:
cd /sys
cat $(find -name 'vbios_version')
Comment 18 Harry Wentland 2018-08-01 14:53:44 UTC
Created attachment 140922 [details] [review]
[PATCH] drm/amd/display: Report non-DP display as disconnected without EDID

Try this
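
For readers following along, the policy named in the patch title and in comment 17 can be modelled roughly as below. This is a sketch of the described behaviour only, not the contents of the attached patch and not kernel code: the function report_connected, its parameters, and the EdidStatus enum are hypothetical names, with the enum members merely echoing the EDID_BAD_CHECKSUM and EDID_NO_RESPONSE states mentioned in comment 10.

```python
from enum import Enum


class EdidStatus(Enum):
    """Hypothetical stand-in for the EDID read outcomes discussed above."""
    OK = "ok"
    BAD_CHECKSUM = "bad checksum"
    NO_RESPONSE = "no response"


def report_connected(port_reports_connected, is_displayport, edid_status):
    """Decide whether a connector should be shown as connected.

    Assumed policy: a port that signals a connection but returns no EDID
    stays 'connected' only when it is DisplayPort; any other connector
    type is treated as disconnected. A bad checksum is tolerated because
    the EDID payload is often still usable.
    """
    if not port_reports_connected:
        return False
    if edid_status is EdidStatus.NO_RESPONSE:
        return is_displayport
    return True
```

Under this model, the phantom HDMI-A-2 port from the report (signalling a connection, not DP, no EDID response) comes out as disconnected, which matches the effect of the "video=HDMI-A-2:d" workaround.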
Comment 19 Justinas Narusevicius 2018-08-01 15:34:09 UTC
(In reply to Harry Wentland from comment #17)
> Do you have more information on your graphics card (manufacturer/model)?

It's an ASUS DUAL-RX580-O8G
https://www.asus.com/Graphics-Cards/DUAL-RX580-O8G/specifications/
 
> Can you also print vbios_version?
# cat $(find -name 'vbios_version')
115-D009PI2-101

# lspci|grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)

> Try this
This solves the problem! Everything looks to be working fine again. Thank you!

If you need any more info I'll gladly help.
Comment 20 Martin Peres 2019-11-19 08:45:48 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/465.
