With kernels 3.8.8 and 3.9.0, nouveau sees an EDID with an incorrect checksum for the display on eDP-1 as soon as that connector is turned off. Repeatedly re-fetching that EDID causes so much stuttering that the system becomes unusable. A kworker thread rises to near 100% CPU usage, which is why I suspect the EDID is being re-fetched over and over. Turning the connector back on restores normal behavior: the correct EDID, with a correct checksum, is obtained, and the stuttering stops. Kernel 3.5.7.9 did not behave like this.

Extracting the correct EDID while the display is on and supplying it as a custom EDID does not seem to prevent re-fetching of the invalid one. Passing drm_kms_helper.poll=0 does not seem to prevent the re-fetching either.

I apologize in advance if it turns out that my hardware simply supplies a wrong EDID when the display is turned off. In that case, maybe an option could be added to stop re-fetching after N failed checksums? (I guess that's not in the domain of nouveau, though...)

Device:
01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] (rev a2)

Example of kernel message:
[ 169.657456] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 64
[ 169.657461] Raw EDID:
[ 169.657464] 00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[ 169.657465] 22 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
[ 169.657466] 01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[ 169.657468] 33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[ 169.657469] 40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[ 169.657470] 43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[ 169.657471] 00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[ 169.657473] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 171.713571] nouveau E[ I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x01114000

(The tx req timeout message is not always present. The EDID checksum failure is repeated for as long as eDP-1 is disabled. As soon as eDP-1 is turned on, the messages stop appearing and the stuttering described above stops.)
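For anyone checking the numbers: as far as I can tell from drm_edid.c, drm_edid_block_valid adds up all 128 bytes of a block in an 8-bit counter and prints that value as the "remainder", which must be 0 for a valid block. The Python sketch below (function names are illustrative, not taken from the kernel) recomputes it for the dump above.

```python
# Sketch: recompute the "remainder" reported by drm_edid_block_valid.
# Assumption: an EDID block is 128 bytes and is valid only if all bytes
# sum to 0 modulo 256.
EDID_BLOCK_SIZE = 128


def parse_dump(text: str) -> bytes:
    """Parse whitespace-separated hex bytes (kernel timestamps removed)."""
    return bytes(int(tok, 16) for tok in text.split())


def checksum_remainder(block: bytes) -> int:
    """Sum of the block's bytes modulo 256; 0 means the checksum is fine."""
    if len(block) != EDID_BLOCK_SIZE:
        raise ValueError(f"expected {EDID_BLOCK_SIZE} bytes, got {len(block)}")
    return sum(block) % 256


# "Raw EDID" logged at [ 169.657464] while eDP-1 was off.
BAD_DUMP = """
00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
22 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

if __name__ == "__main__":
    print(checksum_remainder(parse_dump(BAD_DUMP)))  # prints 64
```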
If it is of any interest, the EDIDs reported when off and when on are suspiciously similar:

When OFF:
00ff ffff ffff ff00 06af 4741 0000 0000
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d
0000 0000 0000 0000 0000 0000 0000 0000

When ON:
00ff ffff ffff ff00 06af 4741 0000 0000
0113 0104 951e 1378 0289 e594 5754 9327
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d

As you can see, the EDID when off simply lacks the second line (words 9-16 of the dump, i.e. bytes 16-31); everything after it moves up one line and the end is padded with zeros. Again, this could of course be my hardware misbehaving.
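The "one line missing" observation can be checked mechanically. The Python sketch below (helper names are made up for illustration) converts both dumps to bytes, then tries removing each 16-byte row of the "on" EDID and zero-filling the end until the result matches the "off" EDID. It also confirms that the "on" EDID sums to 0 modulo 256 while the "off" one leaves the remainder of 64 seen in the log.

```python
# Sketch: check whether the "off" EDID is the "on" EDID with exactly one
# 16-byte row removed and the tail zero-filled. Helper names are hypothetical.
ROW = 16


def words_to_bytes(dump: str) -> bytes:
    """Convert a dump of 16-bit hex words ('00ff ffff ...') into bytes."""
    return bytes.fromhex("".join(dump.split()))


def find_dropped_row(good: bytes, bad: bytes):
    """Index of the 16-byte row whose removal turns good into bad, else None."""
    for r in range(len(good) // ROW):
        if good[:r * ROW] + good[(r + 1) * ROW:] + bytes(ROW) == bad:
            return r
    return None


EDID_ON = words_to_bytes("""
00ff ffff ffff ff00 06af 4741 0000 0000
0113 0104 951e 1378 0289 e594 5754 9327
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d
""")

EDID_OFF = words_to_bytes("""
00ff ffff ffff ff00 06af 4741 0000 0000
2250 5400 0000 0101 0101 0101 0101 0101
0101 0101 0101 9425 a03e 5184 0c30 4020
3300 2fbd 1000 001a 0d19 a03e 5184 0c30
4020 3300 2fbd 1000 001a 0000 00fe 0043
4358 4d57 8042 3134 3150 5734 0000 0000
0000 4121 1e00 0000 0009 010a 2020 009d
0000 0000 0000 0000 0000 0000 0000 0000
""")

if __name__ == "__main__":
    r = find_dropped_row(EDID_ON, EDID_OFF)
    print(r, f"bytes {r * ROW}-{r * ROW + ROW - 1}")  # prints: 1 bytes 16-31
    print(sum(EDID_ON) % 256, sum(EDID_OFF) % 256)   # prints: 0 64
```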
I spoke too soon. A missing second line (bytes 16-31) is not the only thing that can happen. Here are some more incorrect ones with eDP-1 off:

[ 166.677153] nouveau E[ I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x0111500f
[ 166.683210] Raw EDID:
[ 166.683213] ff ff ff ff ff 00 06 af 47 41 00 00 00 00 01 13
[ 166.683214] 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27 22 50
[ 166.683214] 54 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01
[ 166.683215] 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20 33 00
[ 166.683216] 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30 40 20
[ 166.683217] 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43 43 58
[ 166.683218] 4d 57 80 42 31 34 31 50 57 34 00 00 00 00 00 00
[ 166.683219] 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d 00 00
[ 167.595936] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[ 167.595939] Raw EDID:
[ 167.595941] 00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[ 167.595942] 01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[ 167.595943] 01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[ 167.595944] 33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[ 167.595944] 40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[ 167.595945] 43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[ 167.595946] 00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[ 167.595947] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 169.657456] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 64
[ 169.657461] Raw EDID:
[ 169.657464] 00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[ 169.657465] 22 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
[ 169.657466] 01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[ 169.657468] 33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[ 169.657469] 40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[ 169.657470] 43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[ 169.657471] 00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[ 169.657473] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 171.713571] nouveau E[ I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x01114000
[ 173.092798] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[ 173.092802] Raw EDID:
[ 173.092804] 00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[ 173.092805] 01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[ 173.092806] 01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[ 173.092807] 33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[ 173.092807] 40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[ 173.092808] 43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[ 173.092809] 00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[ 173.092810] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 174.006147] nouveau E[ I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x0111500f
[ 174.012152] Raw EDID:
[ 174.012155] ff ff ff ff ff 00 06 af 47 41 00 00 00 00 01 13
[ 174.012156] 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27 22 50
[ 174.012157] 54 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01
[ 174.012157] 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20 33 00
[ 174.012158] 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30 40 20
[ 174.012159] 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43 43 58
[ 174.012160] 4d 57 80 42 31 34 31 50 57 34 00 00 00 00 00 00
[ 174.012161] 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d 00 00
[ 175.380475] nouveau E[ I2C][0000:01:00.0] AUXCH(1): tx req timeout 0x0111500f
[ 177.215513] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[ 177.215516] Raw EDID:
[ 177.215518] 00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[ 177.215519] 01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[ 177.215520] 01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[ 177.215520] 33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[ 177.215521] 40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[ 177.215522] 43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[ 177.215523] 00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[ 177.215524] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 178.818764] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 48
[ 178.818768] Raw EDID:
[ 178.818770] 00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
[ 178.818771] 01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
[ 178.818772] 01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
[ 178.818773] 33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
[ 178.818774] 40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
[ 178.818775] 43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
[ 178.818776] 00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
[ 178.818777] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
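Every bad block in this log appears to be the good EDID (as read with eDP-1 on) with bytes lost: either one 16-byte row is missing and the end is zero-filled (remainder 64 or 48), or the block starts 2 bytes late. DisplayPort AUX transactions carry at most 16 data bytes each, so a lost or timed-out chunk is one plausible explanation, though the logs alone do not prove it. The Python sketch below checks this mechanically; the helper names are hypothetical, and the header threshold of 6 is my reading of the block-0 check in drm_edid.c, which may differ between kernel versions. It would also explain why the shifted blocks are printed without a "checksum is invalid" line: only 4 of their 8 header bytes match, so they presumably fail the header check before the checksum is considered.

```python
# Sketch: relate each logged "Raw EDID" block to the EDID read with eDP-1 on.
# Assumptions: helper names are hypothetical; HEADER_MIN_SCORE mirrors my
# reading of drm_edid.c and may not match every kernel version.
ROW = 16
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
HEADER_MIN_SCORE = 6


def parse(dump: str) -> bytes:
    """Parse whitespace-separated hex bytes (kernel timestamps removed)."""
    return bytes.fromhex("".join(dump.split()))


def header_score(block: bytes) -> int:
    """How many of the 8 fixed EDID header bytes are in place."""
    return sum(a == b for a, b in zip(block[:8], EDID_HEADER))


def classify(good: bytes, bad: bytes) -> str:
    """Name the simplest corruption that turns `good` into `bad`."""
    if bad == good:
        return "intact"
    # Case 1: one 16-byte row missing, zeros appended at the end.
    for r in range(len(good) // ROW):
        if good[:r * ROW] + good[(r + 1) * ROW:] + bytes(ROW) == bad:
            return f"row {r} (bytes {r * ROW}-{r * ROW + ROW - 1}) dropped"
    # Case 2: the first k bytes missing, zeros appended at the end.
    for k in range(1, ROW):
        if good[k:] + bytes(k) == bad:
            return f"shifted left by {k} bytes"
    return "unrecognised"


GOOD = parse("""
00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
22 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
""")

BAD_48 = parse("""
00 ff ff ff ff ff ff 00 06 af 47 41 00 00 00 00
01 13 01 04 95 1e 13 78 02 89 e5 94 57 54 93 27
01 01 01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20
33 00 2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30
40 20 33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43
43 58 4d 57 80 42 31 34 31 50 57 34 00 00 00 00
00 00 41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
""")  # e.g. [ 167.595941]

BAD_SHIFTED = parse("""
ff ff ff ff ff 00 06 af 47 41 00 00 00 00 01 13
01 04 95 1e 13 78 02 89 e5 94 57 54 93 27 22 50
54 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 94 25 a0 3e 51 84 0c 30 40 20 33 00
2f bd 10 00 00 1a 0d 19 a0 3e 51 84 0c 30 40 20
33 00 2f bd 10 00 00 1a 00 00 00 fe 00 43 43 58
4d 57 80 42 31 34 31 50 57 34 00 00 00 00 00 00
41 21 1e 00 00 00 00 09 01 0a 20 20 00 9d 00 00
""")  # e.g. [ 166.683213]

if __name__ == "__main__":
    for name, block in (("BAD_48", BAD_48), ("BAD_SHIFTED", BAD_SHIFTED)):
        print(name, classify(GOOD, block), sum(block) % 256,
              header_score(block) >= HEADER_MIN_SCORE)
    # prints:
    # BAD_48 row 2 (bytes 32-47) dropped 48 True
    # BAD_SHIFTED shifted left by 2 bytes 1 False
```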
I've finished a git bisect (log attached). It concludes with commit 69787f7da6b2adc4054357a661aaa1701a9ca76f:

69787f7da6b2adc4054357a661aaa1701a9ca76f is the first bad commit
commit 69787f7da6b2adc4054357a661aaa1701a9ca76f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Oct 23 18:23:34 2012 +0000

    drm: run the hpd irq event code directly

    All drivers already have a work item to run the hpd code, so we don't need to launch a new one in the helper code.

    Dave Airlie mentioned that the cancel+re-queue might paper over DP related hpd ping-pongs, hence why this is split out.

    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

:040000 040000 34387171acedcebba1e05d10b819c9363833650b 2476d15ff51e66eec241d6ab5e8943c4c46c4360 M drivers
:040000 040000 e0c4edaf7c42932ee64d431177244b7000e93fb3 138f6ab2e8c8e5d3985ddaa6610f0e95f241ee59 M include
Created attachment 79786: git bisect log
Concludes with commit 69787f7da6b2adc4054357a661aaa1701a9ca76f as the first bad one.
The patch from https://patchwork.kernel.org/patch/2402211/ makes the symptoms go away for me with drm_kms_helper.poll=N.
Quick question: with older, working kernels, do you also need to set the poll=0 option to get a usable system, or does it work without any kernel option?
I just checked with a known working 3.5 kernel. It does not need poll=0 in order to be symptom-free. More interestingly though, I accidentally booted into my current kernel (3.8.x with the patch from comment 5) without poll=0, and it also works fine. I am sorry if I overlooked this earlier. It could also be that my distribution has patched the kernel since May. I can look through the changelogs and/or try an upstream kernel when I have some free time (maybe this weekend).
Testing with an upstream kernel would be preferable, either nouveau/git or 3.11-rc7 (or 3.11 if it's out by then).
*** Bug 48090 has been marked as a duplicate of this bug. ***
With 3.11-rc7, I no longer see the stuttering described in the initial bug report. There is, however, still a kworker thread at 100% load, lots and lots of EDID checksum failure messages, and long periods of unresponsiveness when changing resolution. Adding drm_kms_helper.poll=N removes the repeated EDID checksum messages, but the system still becomes unresponsive for long stretches after changing resolution, and a kworker thread still sits at 100%.
Disregard my comment 7 about not needing poll=0 for older, working kernels. I just realized that I had it set as a modprobe option, so it was still in effect when I tested without it as a boot option. I've checked again, and, in reference to Vetter's comment 6: I *do* need poll=0 with the older kernels too, i.e. those that did not need the patch. I'm very sorry for the misinformation.
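Since a poll=0 set as a modprobe option stays in effect even when it is dropped from the boot command line, it may help to check every place the parameter can come from before each test run. A rough Python sketch, assuming the usual Linux locations (/proc/cmdline, /etc/modprobe.d/*.conf, and the module's parameter file under /sys/module, which may be readable only by root); the helper names are hypothetical, and an initramfs may carry its own copy of modprobe.d that this does not inspect.

```python
# Sketch: list every place drm_kms_helper.poll may be set, so a test run
# "without poll=0" really is without it. Paths are assumptions; an initramfs
# may carry its own copy of /etc/modprobe.d, which is not inspected here.
import glob
from pathlib import Path

PARAM = "drm_kms_helper.poll"


def _norm(name: str) -> str:
    # Module and parameter names treat '-' and '_' alike.
    return name.replace("-", "_")


def poll_from_cmdline(cmdline: str):
    """Last drm_kms_helper.poll=... value on a kernel command line, or None."""
    value = None
    for arg in cmdline.split():
        key, sep, val = arg.partition("=")
        if sep and _norm(key) == PARAM:
            value = val
    return value


def poll_from_modprobe(conf_text: str):
    """Last poll=... value from 'options drm_kms_helper ...' lines, or None."""
    value = None
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if (len(fields) >= 3 and fields[0] == "options"
                and _norm(fields[1]) == "drm_kms_helper"):
            for opt in fields[2:]:
                key, sep, val = opt.partition("=")
                if sep and key == "poll":
                    value = val
    return value


def report() -> None:
    print("kernel command line:",
          poll_from_cmdline(Path("/proc/cmdline").read_text()))
    for conf in sorted(glob.glob("/etc/modprobe.d/*.conf")):
        value = poll_from_modprobe(Path(conf).read_text())
        if value is not None:
            print(f"{conf}:", value)
    # The value actually in effect (Y or N); may need root, and is absent
    # if drm_kms_helper is not loaded.
    live = Path("/sys/module/drm_kms_helper/parameters/poll")
    try:
        print("in effect:", live.read_text().strip())
    except OSError as err:
        print(f"in effect: unknown ({err})")


if __name__ == "__main__":
    report()
```

Reading the sysfs value is the most direct check, since it reflects whatever source actually won; the other two lines only show where a stale setting might be hiding.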
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/45.