My Tonga FirePro W7170M MXM GPU seems to react badly to kernels newer than Ubuntu's 4.18 on 18.10.
I tried a range of mainline kernels and Ubuntu 19.04 and narrowed it down: with any kernel newer than 4.18, the GPU can no longer read the EDID of my DreamColor IPS display.
Not sure how to best go about bisecting this...
The display uses a color board that takes the signal from a single eDP connector coming off the motherboard and converts it into two separate LVDS channels (I can only assume it splits the horizontal and vertical signalling information).
This results in the image shown on screen being completely unreadable with crazy scan wrapping from the channels getting mixed.
May be related to: https://bugs.freedesktop.org/show_bug.cgi?id=108806
Please attach the corresponding full output of dmesg.
Created attachment 143957 [details]
Dmesg output from an affected kernel
Here is the issue, you can see
[ 1.335096] [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
Which is about where the display turns into fruit salad.
Please let me know if I can submit any more debug info that would assist! I am very much in need of running a newer kernel.
Looks like this is a bug in DRM's video mode parsing itself.
[ 1.325111] [drm] parse error at position 12 in video mode 'firstname.lastname@example.org'
[ 1.335096] [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
It's hitting the "." as part of the video mode and erroring out because DRM doesn't consider it a valid character.
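To illustrate why a "." can trip up a strict mode parser, here is a minimal Python sketch. It assumes the mode syntax documented for the `video=` parameter, `<xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m][eDd]`. The regex and the `parse_mode` helper are hypothetical stand-ins for illustration only, not the kernel's actual parser, and the example mode strings are made up rather than taken from the log.

```python
import re

# Simplified approximation of the documented video= mode syntax:
#   <xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m][eDd]
# Illustrative only; this is NOT the kernel's real parsing code.
MODE_RE = re.compile(
    r"^(?P<xres>\d+)x(?P<yres>\d+)"
    r"(?P<cvt>M)?(?P<reduced>R)?"
    r"(?:-(?P<bpp>\d+))?"
    r"(?:@(?P<refresh>\d+))?"  # refresh is integer-only in this sketch
    r"(?P<interlace>i)?(?P<margins>m)?(?P<force>[eDd])?$"
)


def parse_mode(mode: str):
    """Return the fields that were present, or None if the string is rejected."""
    match = MODE_RE.match(mode)
    if match is None:
        return None
    return {key: value for key, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    # Hypothetical inputs: an integer refresh rate and a fractional one.
    for candidate in ("1920x1080@60", "1920x1080@59.94"):
        print(candidate, "->", parse_mode(candidate))
```

With a grammar like this, a fractional refresh rate is rejected because "." is not part of any field, which matches the kind of failure described above.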
(In reply to Nicholas Kazlauskas from comment #3)
> It's hitting the "." as part of the video mode and erroring out because DRM
> doesn't consider it a valid character.
Or maybe there's two separate issues? Failure to parse the mode name shouldn't affect whether or not DC picks up EDID, should it?
(In reply to Michel Dänzer from comment #4)
> (In reply to Nicholas Kazlauskas from comment #3)
> > It's hitting the "." as part of the video mode and erroring out because DRM
> > doesn't consider it a valid character.
> Or maybe there's two separate issues? Failure to parse the mode name
> shouldn't affect whether or not DC picks up EDID, should it?
I suppose that doesn't actually influence whether the EDID has been read or is valid, so that's a different bug.
However, the "No EDID read." error is reported when drm_get_edid(...) returns NULL.
While we're responsible for the actual transfers to and from the receiver, the logic there is shared between drivers.
(In reply to Babblebones from comment #2)
> Created attachment 143957 [details]
> Dmesg output from an affected kernel
> Here is the issue, you can see
> [ 1.335096] [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
> Which is about where the display turns into fruit salad.
> Please let me know if I can submit any more debug info that would assist! I
> am very much in need of running a newer kernel.
Can you post a dmesg log with drm.debug=4 as part of your kernel boot parameters?
Will do. Just switched my distro to Gentoo, specifically so I can stay on kernel 4.18 for as long as necessary to combat the issue and apply a patch when ready, and cleared all of the cruft out of the grub config.
I can drop my EDID itself here as well if that will help.
Give me a bit to get everything ready and I will post the drm debug of old and new.
Created attachment 144124 [details]
Funnily enough, it complains about a missing EDID in this one too. Now that I look at it more closely, that may not even be the issue.
But I seem to have found the commit where my panel breaks down.
It's a big drm-next merge. It has to be one of the core changes listed on the changelog.
I'm not a git wizard; is there any way to get more granular than this merge commit?
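For what it's worth, `git bisect` (`git bisect start`, then `git bisect good <rev>` and `git bisect bad <rev>`) works on the full commit graph, so it can descend into the commits that a merge brought in. The idea behind it is a binary search. Below is a rough Python sketch of that search over a simplified, linear, oldest-to-newest list of commits. The commit names and the `is_bad` callback are hypothetical; in practice "testing" a commit means building and booting that kernel.

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an oldest-to-newest list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history of 16 commits where the display breaks at "c11".
    history: List[str] = [f"c{i}" for i in range(16)]
    tested: List[str] = []

    def display_is_broken(commit: str) -> bool:
        tested.append(commit)
        return int(commit[1:]) >= 11

    print("first bad commit:", first_bad(history, display_is_broken))
    print("kernels that had to be built and booted:", tested)
```

Each step halves the remaining range, so even a large merge only needs a handful of test boots.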
Anyone have an idea what's broken in here?
I've been going down the rabbithole looking for the commit that soured my display.
Is the closest I can get so far; if I go one commit back, the kernel works with my display.
The commit directly before that
Works just fine.
Is there anything inside this merge that would cause this?
I don't know much of what I'm looking at or looking for in these commits but I'll continue dissecting.
I've found the exact commit!
Fixes the issue on a few of the affected kernels, but my problem is that the code base has since been modified so heavily, while retaining the same behavior, that I can't apply this to kernel 5.2 from the linux-stable git tree.
I can't even discern where to manually edit the related files to change the behavior.
It may be necessary to add another fix to that list of related patches to correct the behavior for my connector/panel. I'm not a programmer myself, so I'm not sure what's supposed to happen here.
As best I can tell, and I may be wrong, the error-checking code was moved from the DC part straight into DRM, which now replicates the exact bug that the above DC commits reverted; the equivalent fix was never implemented for DRM.
My DreamColor display uses an STDP8028 chip, sitting between the display and the motherboard just behind the screen, to convert the DisplayPort signal coming off the board into dual-channel LVDS to drive the panel.
For some reason the EDID can't be read through this chip, and no modelines are printed for the display at all. As a result the driver picks the lowest resolution possible and all the timings are incorrect, which produces the scrambled display.
I hope the behavior highlighted in the above commit can help someone search for the regression in the new DRM mode setting as it produces the exact same type of scramble and lack of modelines.
Created attachment 144277 [details]
I don't know if this helps, but ALL kernels seem unable to grab the EDID on startup. The changes after 4.18 break something further: they change the default mode chosen for the panel when no EDID is available, which makes the display unusable.
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.18.20 root=/dev/sdb2 ro amdgpu.ppfeaturemask=0xffffffff amdgpu.dpm=1 amdgpu.dc=1 amdgpu.gpu_recovery=1 amdgpu.powerplay=1 drm.edid_firmware=eDP-1:edid/edid.bin
[ 4.340989] [drm] Got external EDID base block and 0 extensions from "edid/edid.bin" for connector "eDP-1"
[ 4.451624] drm_do_probe_ddc_edid+0xb9/0x130
[ 4.451628] ? drm_edid_block_valid+0x180/0x180
[ 4.451629] drm_do_get_edid+0xb1/0x330
[ 4.451631] drm_get_edid+0x61/0x380
[ 4.451671] dm_helpers_read_local_edid+0x4c/0xe0 [amdgpu]
[ 4.630956] drm_do_probe_ddc_edid+0xb9/0x130
[ 4.630966] ? drm_edid_block_valid+0x180/0x180
[ 4.630969] drm_do_get_edid+0xb1/0x330
[ 4.630972] drm_get_edid+0x61/0x380
[ 4.631115] dm_helpers_read_local_edid+0x4c/0xe0 [amdgpu]
[ 4.801879] drm_do_probe_ddc_edid+0xb9/0x130
[ 4.801889] ? drm_edid_block_valid+0x180/0x180
[ 4.801892] drm_do_get_edid+0xb1/0x330
[ 4.801895] drm_get_edid+0x61/0x380
[ 4.802006] dm_helpers_read_local_edid+0x4c/0xe0 [amdgpu]
[ 4.972889] drm_do_probe_ddc_edid+0xb9/0x130
[ 4.972900] ? drm_edid_block_valid+0x180/0x180
[ 4.972903] drm_do_get_edid+0xb1/0x330
[ 4.972907] drm_get_edid+0x61/0x380
[ 4.973028] dm_helpers_read_local_edid+0x4c/0xe0 [amdgpu]
[ 5.145825] drm_do_probe_ddc_edid+0xb9/0x130
[ 5.145835] ? drm_edid_block_valid+0x180/0x180
[ 5.145837] drm_do_get_edid+0xb1/0x330
[ 5.145841] drm_get_edid+0x61/0x380
[ 5.145942] dm_helpers_read_local_edid+0x4c/0xe0 [amdgpu]
[ 5.170556] [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
This is what happens when I set the EDID manually from the kernel command line.
Setting it manually upsets newer kernels: my display won't modeset properly and the picture is a mess, but on 4.18 it seemed to drop this.
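For anyone trying to reproduce the override, here is a small Python sketch of how I understand the `drm.edid_firmware=` value to be structured: a comma-separated list of `[<connector>:]<file>` entries, with files looked up relative to the firmware directory. The `/lib/firmware` root is an assumption about the default search path, and `parse_edid_firmware` is a hypothetical helper, not kernel code.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Assumed default firmware search path; distributions may add others.
FIRMWARE_ROOT = PurePosixPath("/lib/firmware")


def parse_edid_firmware(value: str) -> Dict[Optional[str], PurePosixPath]:
    """Map each connector name to the EDID file that would override it.

    An entry without a "<connector>:" prefix is stored under the key None,
    meaning it is not tied to a specific connector.
    """
    overrides: Dict[Optional[str], PurePosixPath] = {}
    for entry in value.split(","):
        connector, sep, filename = entry.partition(":")
        if not sep:  # no connector prefix in this entry
            overrides[None] = FIRMWARE_ROOT / entry
        else:
            overrides[connector] = FIRMWARE_ROOT / filename
    return overrides


if __name__ == "__main__":
    # Value taken from the kernel command line shown in the log above.
    for connector, path in parse_edid_firmware("eDP-1:edid/edid.bin").items():
        print(connector, "->", path)
```

Under those assumptions, the command line above should make the kernel look for `/lib/firmware/edid/edid.bin` for connector eDP-1, which is consistent with the "Got external EDID base block" message.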
Any debug patches I can run to help you guys figure it out?
I have included my EDID file if you want to run it through anything to see what breaks.
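In case it helps anyone sanity-check the attached file before digging into the driver, here is a minimal Python sketch of the basic checks on an EDID base block: the fixed 8-byte header, the checksum (all 128 bytes must sum to 0 modulo 256), and the extension count in byte 126. The function name is hypothetical, and this only covers the base block, not the contents of the timing descriptors.

```python
from typing import List

EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
BLOCK_SIZE = 128


def check_edid_base_block(data: bytes) -> List[str]:
    """Return a list of problems found in the EDID data (empty if none)."""
    if len(data) < BLOCK_SIZE:
        return [f"too short: {len(data)} bytes, need at least {BLOCK_SIZE}"]

    problems: List[str] = []
    block = data[:BLOCK_SIZE]

    # Every EDID base block starts with the same fixed 8-byte pattern.
    if block[:8] != EDID_HEADER:
        problems.append("bad header")

    # The last byte is chosen so that the whole block sums to 0 mod 256.
    if sum(block) % 256 != 0:
        problems.append("bad checksum")

    # Byte 126 holds the number of 128-byte extension blocks that follow.
    expected_len = BLOCK_SIZE * (1 + block[126])
    if len(data) != expected_len:
        problems.append(
            f"length {len(data)} does not match {block[126]} extension(s) "
            f"(expected {expected_len} bytes)"
        )
    return problems
```

It could be run against the file with something like `check_edid_base_block(open("edid.bin", "rb").read())`, where the path is whatever the EDID dump is saved as.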