Bug 64858 - [BISECTED] Nouveau sees invalid EDID for eDP-1 whenever that display is turned off
Status: NEEDINFO
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 48090
Depends on:
Blocks:
 
Reported: 2013-05-22 09:50 UTC by gspr
Modified: 2013-10-02 07:28 UTC (History)

Attachments
git bisect log (2.14 KB, text/plain)
2013-05-25 14:48 UTC, gspr

Description gspr 2013-05-22 09:50:15 UTC
With kernels 3.8.8 and 3.9.0, nouveau sees an EDID with an incorrect checksum for the display on eDP-1 as soon as that connector is turned off. The system then stutters so badly that it becomes unusable. A kworker thread rises to near 100% CPU usage, so I suspect the stuttering is caused by the EDID being re-fetched repeatedly. Turning the connector back on restores normal behavior (the correct EDID, with correct checksum, is obtained, and there is no stuttering). Kernel 3.5.7.9 did not behave like this.

Extracting the correct EDID when the display is on and supplying it as a custom EDID does not seem to prevent re-fetching of the invalid one. Passing drm_kms_helper.poll=0 does not seem to prevent the re-fetching either.

I apologize in advance if it turns out that it is in fact my hardware that simply supplies the wrong EDID when the display is turned off. In that case, maybe one could add an option to stop re-fetching after N failed checksums? (I guess that's not in the domain of nouveau, though...)
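
To make the suggestion above concrete, here is a minimal Python sketch of what "stop re-fetching after N failed checksums" could look like. It is purely illustrative and not how nouveau or the DRM helpers are implemented; `fetch_edid_bounded`, `read_block` and `max_failures` are hypothetical names invented for this example.

```python
from typing import Callable, Optional


def fetch_edid_bounded(read_block: Callable[[], bytes],
                       max_failures: int = 4) -> Optional[bytes]:
    """Call read_block() until it returns a 128-byte block whose bytes sum
    to 0 modulo 256, giving up after max_failures bad reads.

    read_block is a hypothetical stand-in for whatever reads one EDID block.
    """
    for _ in range(max_failures):
        block = read_block()
        if len(block) == 128 and sum(block) % 256 == 0:
            return block
    # Give up instead of retrying forever; the caller would then treat the
    # connector as having no usable EDID.
    return None


if __name__ == "__main__":
    attempts = []

    def always_bad() -> bytes:
        attempts.append(1)
        return bytes(127) + b"\x01"  # sums to 1, so the checksum fails

    print(fetch_edid_bounded(always_bad, max_failures=3), len(attempts))
```

The point of the cap is only that a connector which keeps returning garbage costs a bounded number of reads per probe rather than an unbounded one.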

Device: 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] (rev a2)

Example of kernel message:
[  169.657456] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 64
[  169.657461] Raw EDID:
[  169.657464]          00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[  169.657465]          22 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
[  169.657466]          01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[  169.657468]          33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[  169.657469]          40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[  169.657470]          43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[  169.657471]          00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[  169.657473]          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  171.713571] nouveau E[     I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x01114000

(The tx req timeout message is not always present. The EDID checksum failure is repeated for as long as eDP-1 is disabled. As soon as eDP-1 is turned on, the messages stop appearing and the stuttering described above stops.)
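
For reference, the remainder in the message above can be reproduced by hand: a 128-byte EDID block is valid only if all of its bytes sum to 0 modulo 256. A minimal Python sketch, with the bytes copied from the log above (the helper name `edid_remainder` is made up for illustration and is not a kernel function):

```python
# Raw EDID block exactly as printed in the kernel log above (eDP-1 off).
RAW_EDID_OFF = """
00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
22 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""


def edid_remainder(hex_dump: str) -> int:
    """Sum all bytes of one EDID block modulo 256 (0 means a valid checksum)."""
    block = bytes.fromhex("".join(hex_dump.split()))
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, got {len(block)}")
    return sum(block) % 256


print(edid_remainder(RAW_EDID_OFF))  # 64, matching "remainder is 64" in the log
```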
Comment 1 gspr 2013-05-22 10:08:39 UTC
If it is of any interest, the EDIDs reported when off and when on are suspiciously similar:

When OFF
00ff ffff ffff ff00 06af 4741 0000 0000
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d
0000 0000 0000 0000 0000 0000 0000 0000

When ON
00ff ffff ffff ff00 06af 4741 0000 0000
0113 0104 951e 1378 0289 e594 5754 9327
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d

As you can see, the EDID when off simply lacks the second line of the dump (the 16 bytes at offsets 0x10 through 0x1f) and is padded with zeros at the end. Again, this could of course be my hardware misbehaving.
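
The comparison can also be done mechanically. The following Python sketch splits both dumps from this comment into 16-byte rows and lists the rows of the ON dump that never appear in the OFF dump; the function names `rows` and `missing_rows` are made up for this example.

```python
OFF_DUMP = """
00ff ffff ffff ff00 06af 4741 0000 0000
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d
0000 0000 0000 0000 0000 0000 0000 0000
"""

ON_DUMP = """
00ff ffff ffff ff00 06af 4741 0000 0000
0113 0104 951e 1378 0289 e594 5754 9327
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d
"""


def rows(dump: str) -> list:
    """Turn a hex dump into a list of 16-byte rows."""
    data = bytes.fromhex("".join(dump.split()))
    return [data[i:i + 16] for i in range(0, len(data), 16)]


def missing_rows(good: str, bad: str) -> list:
    """Return (offset, row) for every row of `good` that is absent from `bad`."""
    bad_rows = set(rows(bad))
    return [(i * 16, r) for i, r in enumerate(rows(good)) if r not in bad_rows]


for offset, row in missing_rows(ON_DUMP, OFF_DUMP):
    print(f"offset 0x{offset:02x}: " + " ".join(f"{b:02x}" for b in row))
# offset 0x10: 01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
```

The ON dump sums to 0 modulo 256, so it is the one with the valid checksum.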
Comment 2 gspr 2013-05-22 10:18:40 UTC
I spoke too soon. A missing second line is not the only thing that can happen. Here are some more incorrect ones with eDP-1 off:

[  166.677153] nouveau E[     I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x0111500f
[  166.683210] Raw EDID:
[  166.683213]          ff ff ff ff ff 00 06 af 47 41 00 00 00 00 01 13
[  166.683214]          01 04 95 1e 13 78 02 89 e5 94 57 54 93 27 22 50
[  166.683214]          54 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01
[  166.683215]          01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20 33 00
[  166.683216]          2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30 40 20
[  166.683217]          33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43 43 58
[  166.683218]          4d 57 80 42 31 34 31 50 57 34 00 00 00 00 00 00
[  166.683219]          41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d 00 00
[  167.595936] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[  167.595939] Raw EDID:
[  167.595941]          00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[  167.595942]          01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[  167.595943]          01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[  167.595944]          33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[  167.595944]          40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[  167.595945]          43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[  167.595946]          00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[  167.595947]          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  169.657456] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 64
[  169.657461] Raw EDID:
[  169.657464]          00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[  169.657465]          22 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
[  169.657466]          01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[  169.657468]          33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[  169.657469]          40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[  169.657470]          43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[  169.657471]          00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[  169.657473]          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  171.713571] nouveau E[     I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x01114000
[  173.092798] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[  173.092802] Raw EDID:
[  173.092804]          00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[  173.092805]          01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[  173.092806]          01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[  173.092807]          33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[  173.092807]          40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[  173.092808]          43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[  173.092809]          00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[  173.092810]          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  174.006147] nouveau E[     I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x0111500f
[  174.012152] Raw EDID:
[  174.012155]          ff ff ff ff ff 00 06 af 47 41 00 00 00 00 01 13
[  174.012156]          01 04 95 1e 13 78 02 89 e5 94 57 54 93 27 22 50
[  174.012157]          54 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01
[  174.012157]          01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20 33 00
[  174.012158]          2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30 40 20
[  174.012159]          33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43 43 58
[  174.012160]          4d 57 80 42 31 34 31 50 57 34 00 00 00 00 00 00
[  174.012161]          41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d 00 00
[  175.380475] nouveau E[     I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x0111500f
[  177.215513] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[  177.215516] Raw EDID:
[  177.215518]          00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[  177.215519]          01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[  177.215520]          01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[  177.215520]          33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[  177.215521]          40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[  177.215522]          43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[  177.215523]          00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[  177.215524]          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  178.818764] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[  178.818768] Raw EDID:
[  178.818770]          00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[  178.818771]          01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[  178.818772]          01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[  178.818773]          33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[  178.818774]          40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[  178.818775]          43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[  178.818776]          00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[  178.818777]          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
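
The blocks that start with "ff ff ff ff ff 00" appear to be a different kind of corruption: the valid EDID from comment 1 with its first two bytes dropped and two zero bytes appended, rather than a whole 16-byte line missing. A minimal Python sketch that checks this (the name `find_shift` and the `max_shift` limit are invented for this example):

```python
# Valid EDID from comment 1 (display on).
GOOD = """
00ff ffff ffff ff00 06af 4741 0000 0000
0113 0104 951e 1378 0289 e594 5754 9327
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d
"""

# First raw block from the log above (timestamp 166.683).
SHIFTED = """
ff ff ff ff ff 00 06 af 47 41 00 00 00 00 01 13
01 04 95 1e 13 78 02 89 e5 94 57 54 93 27 22 50
54 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20 33 00
2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30 40 20
33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43 43 58
4d 57 80 42 31 34 31 50 57 34 00 00 00 00 00 00
41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d 00 00
"""


def to_bytes(dump: str) -> bytes:
    """Parse a whitespace-separated hex dump into raw bytes."""
    return bytes.fromhex("".join(dump.split()))


def find_shift(good: bytes, bad: bytes, max_shift: int = 16):
    """Return k if `bad` is `good` with its first k bytes dropped and k zero
    bytes appended, otherwise None."""
    for k in range(1, max_shift + 1):
        if bad[:len(good) - k] == good[k:] and not any(bad[len(good) - k:]):
            return k
    return None


print(find_shift(to_bytes(GOOD), to_bytes(SHIFTED)))  # 2
```

So at least two distinct failure patterns show up while eDP-1 is off: a dropped 16-byte line and a read that starts two bytes late.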
Comment 3 gspr 2013-05-25 14:46:18 UTC
I've finished a git bisect (log attached). It concludes with commit 69787f7da6b2adc4054357a661aaa1701a9ca76f:

69787f7da6b2adc4054357a661aaa1701a9ca76f is the first bad commit
commit 69787f7da6b2adc4054357a661aaa1701a9ca76f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Oct 23 18:23:34 2012 +0000

    drm: run the hpd irq event code directly

    All drivers already have a work item to run the hpd code, so we don't
    need to launch a new one in the helper code. Dave Airlie mentioned
    that the cancel+re-queue might paper over DP related hpd ping-pongs,
    hence why this is split out.

    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

:040000 040000 34387171acedcebba1e05d10b819c9363833650b 2476d15ff51e66eec241d6ab5e8943c4c46c4360 M drivers
:040000 040000 e0c4edaf7c42932ee64d431177244b7000e93fb3 138f6ab2e8c8e5d3985ddaa6610f0e95f241ee59 M include
Comment 4 gspr 2013-05-25 14:48:02 UTC
Created attachment 79786
git bisect log

Concludes with commit 69787f7da6b2adc4054357a661aaa1701a9ca76f as the first bad one.
Comment 5 gspr 2013-05-27 08:04:18 UTC
The patch from https://patchwork.kernel.org/patch/2402211/ makes the symptoms go away for me with drm_kms_helper.poll=N.
Comment 6 Daniel Vetter 2013-08-28 08:11:48 UTC
Quick question: For older, working kernels, do you also need to set the poll=0 option to get a usable system, or does it work without any kernel option?
Comment 7 gspr 2013-08-28 17:04:25 UTC
I just checked with a known working 3.5 kernel. It does not need poll=0 in order to be symptom-free. More interestingly though, I accidentally booted into my current kernel (3.8.x with the patch from comment 5) without poll=0, and it also works fine. I am sorry if I overlooked this earlier. It could also be that my distribution has patched the kernel since May. I can look through the changelogs and/or try an upstream kernel when I have some free time (maybe this weekend).
Comment 8 Ilia Mirkin 2013-08-28 17:17:22 UTC
Testing with an upstream kernel would be preferable, either nouveau/git or 3.11-rc7 (or 3.11 if it's out by then).
Comment 9 Ilia Mirkin 2013-08-31 01:49:32 UTC
*** Bug 48090 has been marked as a duplicate of this bug. ***
Comment 10 gspr 2013-09-01 13:08:02 UTC
With 3.11-rc7, I no longer see the same stuttering described in the initial bug report. There is, however, still a 100% load kworker thread, lots and lots of EDID checksum failed messages, and long periods of unresponsiveness when changing resolution. Adding drm_kms_helper.poll=N removes the repeated EDID checksum messages, but the system still becomes unresponsive for large stretches of time after changing resolution, and there's still a kworker thread at 100%.
Comment 11 gspr 2013-10-02 07:28:27 UTC
Disregard my comment 7 about not needing poll=0 for older, working kernels. I just realized that I had it set as a modprobe option, so it was still in effect when I tested without it as a boot option. I've checked again, and in reference to Vetter's comment 6: I *do* need poll=0 also for older kernels that did not need the patch.

I'm very sorry for the misinformation.
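
For anyone retesting: the value actually in effect can be read back at runtime, which avoids the boot-option versus modprobe-option confusion described above. A minimal Python sketch, assuming the parameter is exposed at /sys/module/drm_kms_helper/parameters/poll and reads back as Y or N (both are assumptions about the running kernel, and reading the file may require root); `polling_enabled` is a made-up helper name.

```python
from pathlib import Path

# Assumed sysfs location of the runtime value of drm_kms_helper.poll.
POLL_PARAM = Path("/sys/module/drm_kms_helper/parameters/poll")


def polling_enabled(param_file: Path = POLL_PARAM):
    """Return True or False for the effective poll setting, or None if the
    file cannot be read (module not loaded, no permission, ...)."""
    try:
        value = param_file.read_text().strip()
    except OSError:
        return None
    return value in ("Y", "1")


print(polling_enabled())
```

Checking this after boot shows the setting regardless of whether it came from the kernel command line or from a modprobe configuration file.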

