Summary: | "link training failed": nouveau does not recover from monitor suspend | ||
---|---|---|---|
Product: | xorg | Reporter: | Dan Callaghan <djc> |
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED MOVED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | normal | ||
Priority: | medium | CC: | bghome, jbeh, patrys, roflawl2009, sgonzalez |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Description
Dan Callaghan
2015-09-10 03:07:55 UTC
Created attachment 118176 [details]
kernel messages showing "link training failed"
Complete kernel messages showing "link training failed".
At 11:34 the console went blank and the monitor started to go into power save, but I hit a key while it was still powered on showing a "Power save mode" message, before it actually powered down. Nouveau said "training complete" and I got the image back.
At 11:44 the monitor went into power save again and I let it power all the way down. Nouveau said "link training failed" and then "training complete", but the monitor just went back to power save and there was no image.
The Nvidia proprietary driver is able to recover from the monitor going to sleep. Here is an mmiotrace showing the monitor going to sleep (xset dpms force suspend), eventually powering down, and then waking up and nvidia recovering the display: https://fedorapeople.org/~dcallagh/fdo-bz91954-mmiotrace.log.xz

For a similar issue with a Lenovo W530 and any Fedora kernel above 4.1.3-201, see https://bugzilla.redhat.com/show_bug.cgi?id=1260053 (External monitor remains black with kernel 4.1.6-200.fc22.x86_64 - nouveau reports "link training failed").

I have a similar issue on Fedora 23 using the rawhide kernel 4.4.0-0.rc5.git0.1.fc24.x86_64 + xorg-x11-drv-nouveau-1.0.12-1.fc23.x86_64. I have two monitors: DVI + DisplayPort. After letting them go into suspend mode, I moved the mouse; the DVI one woke up, but the image on the DisplayPort one was completely corrupted. I went to the monitor settings to change the resolution, as an attempt to fix the issue without a reboot, and the UI froze completely (I can still SSH in from another computer).
[14390.513212] nouveau 0000:01:00.0: disp: outp 05:0006:0f44: link training failed
[14390.670611] ------------[ cut here ]------------
[14390.670617] WARNING: CPU: 3 PID: 26365 at include/drm/drm_crtc.h:1565 drm_helper_choose_encoder_dpms+0x8a/0x90 [drm_kms_helper]()
[14390.670618] Modules linked in: tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_broute bridge stp llc ebtable_filter ebtable_nat ebtables ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle binfmt_misc gspca_zc3xx gspca_main v4l2_common videodev media snd_usb_audio snd_usbmidi_lib snd_rawmidi joydev snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm iTCO_wdt iTCO_vendor_support intel_rapl eeepc_wmi iosf_mbi x86_pkg_temp_thermal coretemp asus_wmi sparse_keymap kvm rfkill irqbypass snd_timer
[14390.670639] snd crct10dif_pclmul crc32_pclmul crc32c_intel soundcore mei_me mei lpc_ich i2c_i801 shpchp tpm_infineon tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc nouveau serio_raw r8169 mxm_wmi i2c_algo_bit drm_kms_helper uas mii ttm usb_storage drm hid_roccat_konepure hid_roccat hid_roccat_common video wmi fjes
[14390.670651] CPU: 3 PID: 26365 Comm: kworker/3:1 Tainted: G W 4.4.0-0.rc5.git0.1.fc24.x86_64 #1
[14390.670652] Hardware name: ASUS All Series/Z87-C, BIOS 2103 08/15/2014
[14390.670663] Workqueue: events nvif_notify_work [nouveau]
[14390.670664] 0000000000000000 000000005f9a7b6d ffff8801a0fefd10 ffffffff813b022f
[14390.670666] 0000000000000000 ffff8801a0fefd48 ffffffff810a2ef2 ffff880212424000
[14390.670667] ffff880036a50600 ffff880214c47000 0000000000000000 ffff880214eb63e8
[14390.670668] Call Trace:
[14390.670672] [<ffffffff813b022f>] dump_stack+0x44/0x55
[14390.670674] [<ffffffff810a2ef2>] warn_slowpath_common+0x82/0xc0
[14390.670675] [<ffffffff810a303a>] warn_slowpath_null+0x1a/0x20
[14390.670677] [<ffffffffa01192ca>] drm_helper_choose_encoder_dpms+0x8a/0x90 [drm_kms_helper]
[14390.670679] [<ffffffffa01193bb>] drm_helper_connector_dpms+0x4b/0x100 [drm_kms_helper]
[14390.670695] [<ffffffffa020574b>] nouveau_connector_hotplug+0x5b/0xb0 [nouveau]
[14390.670700] [<ffffffffa0165a77>] nvif_notify_work+0x27/0xa0 [nouveau]
[14390.670702] [<ffffffff81791f8e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
[14390.670704] [<ffffffff810ba9bd>] ? pwq_dec_nr_in_flight+0x4d/0xa0
[14390.670705] [<ffffffff810bb07e>] process_one_work+0x19e/0x3f0
[14390.670706] [<ffffffff810bb31e>] worker_thread+0x4e/0x450
[14390.670708] [<ffffffff8178de30>] ? __schedule+0x3e0/0x9b0
[14390.670709] [<ffffffff810bb2d0>] ? process_one_work+0x3f0/0x3f0
[14390.670710] [<ffffffff810bb2d0>] ? process_one_work+0x3f0/0x3f0
[14390.670711] [<ffffffff810c10a8>] kthread+0xd8/0xf0
[14390.670712] [<ffffffff810c0fd0>] ? kthread_worker_fn+0x160/0x160
[14390.670714] [<ffffffff8179284f>] ret_from_fork+0x3f/0x70
[14390.670715] [<ffffffff810c0fd0>] ? kthread_worker_fn+0x160/0x160
[14390.670716] ---[ end trace 0fc951b1df0a1d95 ]--

Created attachment 120791 [details]
Kernel logs
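
For anyone comparing kernel logs across kernel versions, the nouveau messages quoted in this report can be tallied from a saved dmesg with a short script. This is only a minimal sketch: the function name count_training_failures and the regular expression are my own, and the pattern assumes the message format shown in the logs in this report ("nouveau <pci address>: disp: outp <id>: ..."), which may differ on other kernels.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines quoted in this report, e.g.
#   nouveau 0000:01:00.0: disp: outp 05:0006:0f44: link training failed
# Other kernel versions may word these messages differently.
OUTP_RE = re.compile(
    r"nouveau (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"disp: outp (?P<outp>[0-9a-f:]+): "
    r"(?P<msg>(?:link )?training failed|link not trained before attach)"
)


def count_training_failures(lines):
    """Count failure messages per (PCI device, output, message) triple."""
    counts = Counter()
    for line in lines:
        match = OUTP_RE.search(line)
        if match:
            key = (match.group("pci"), match.group("outp"), match.group("msg"))
            counts[key] += 1
    return counts
```

It takes any iterable of lines, so an open file holding saved dmesg output can be passed in directly; lines that do not match the pattern are ignored.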
I just wanted to mention that kernel 4.5-rc5 is also affected by this issue.

Created attachment 121981 [details]
Kernel log of detaching and re-attaching external monitor
By looking at the log, my theory is that sometimes nouveau does not store the preferred mode correctly when probing for available modes and tries to set an invalid mode later.
In error_connecting_monitor.log, line 344 shows the list of available modes for the screen attached via DisplayPort. The native mode is 1920x1080, so Modeline 67 should be remembered. However, line 392 shows that Modeline 52 is set instead, which is not in the list of allowed modes.
Can any developer confirm this?
Created attachment 122088 [details]
Kernel 4.5-rc6 log of detaching and re-attaching external monitor
A fix has landed in kernel 4.5-rc6 which addresses this issue. See Ben's commit here: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95664e66fad964c3dd7945d6edfb1d0931844664
I have been running the rc6 kernel for 3 days now and haven't run into this bug since. However, I still see the "link training failed" error in the logs when the monitor is unplugged. Fortunately it does not bring down the Xorg server. See the attached log file for more details.

I'm still running into this issue. I posted more detailed information and the output of dmesg and journalctl on Fedoraforum, here: http://forums.fedoraforum.org/showthread.php?t=310633
When my system is suspended and wakes back up, the primary LCD is stuck on a blank white image and sometimes cannot recover when switching TTYs. Dmesg fills up with this:
[ 270.115821] nouveau 0000:01:00.0: disp: outp 00:0006:0344: link training failed
[ 270.122646] nouveau 0000:01:00.0: disp: outp 00:0006:0344: link training failed
[ 270.123698] nouveau 0000:01:00.0: disp: outp 00:0006:0344: link not trained before attach
Hardware: HP EliteBook 8440p, Legacy Boot Nvidia GT218M (NVS 3100M)
Software: Fedora 24, latest updates, Gnome 3.20.2 Gallium 0.4 on NVA8 Kernel - 4.6.3-300.fc24.x86_64

The issue is still present on my system several months and updates later. Fedora 24, latest updates; Gnome and LXDE both experience the issue, along with the login screen. GDM 3.20.1 Nouveau running on 4.8.7-200.fc24.x86_64

We are trying to push in DRM link-status that will allow the userspace to react to errors like this. I may be convinced to add support for this in Nouveau-drm and -nouveau, since I studied it and have patches for -modesetting already.

I'm no longer seeing this issue with the 4.15.10-200.fc26 kernel in Fedora.
When the monitor goes to sleep, and I wake it back up, I still see two kernel messages like this:
nouveau 0000:03:00.0: disp: outp 02:0006:0f42: training failed
nouveau 0000:03:00.0: disp: outp 02:0006:0f42: training failed
but it seems that nouveau recovers regardless. My X display comes back and everything keeps working.

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/213.