Bug 111044 - Resume up from suspend sometimes freezes system (Optimus/Nouveau)
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-03 00:00 UTC by JM9
Modified: 2019-07-10 02:53 UTC (History)
0 users

Description JM9 2019-07-03 00:00:38 UTC
I'm on an Optimus laptop. Every time I resume from suspend, I get this message and the resume succeeds:

tpm tpm0: tpm_try_transmit: send(): error -5

However, sometimes I get the following messages and the system freezes:

nouveau 0000:01:00.0: disp: outp 03:0006:0f81: link rate unsupported by sink
nouveau 0000:01:00.0: disp: outp 03:0006:0f81: training failed

Not sure if this is relevant, but I also have an external multi-monitor setup.


$ lspci -v | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
01:00.0 VGA compatible controller: NVIDIA Corporation GK104GLM [Quadro K3100M] (rev a1) (prog-if 00 [VGA controller])
Comment 1 JM9 2019-07-03 00:11:34 UTC
Linux jonmhome 5.1.15-arch1-1-ARCH #1 SMP PREEMPT Tue Jun 25 04:49:39 UTC 2019 x86_64 GNU/Linux
Comment 2 Ilia Mirkin 2019-07-03 00:16:11 UTC
(In reply to JM9 from comment #0)
> I'm on an Optimus laptop. Every time I resume from suspend, I get this
> message and the resume succeeds:
> 
> tpm tpm0: tpm_try_transmit: send(): error -5

This is related to the TPM driver - Trusted Platform Module. No clue what it's for. Intel ME interactions perhaps?

> 
> However, sometimes I get the following messages and the system freezes:
> 
> nouveau 0000:01:00.0: disp: outp 03:0006:0f81: link rate unsupported by sink
> nouveau 0000:01:00.0: disp: outp 03:0006:0f81: training failed
> 
> Not sure if this is relevant, but I also have an external multi-monitor setup.

Is the system actually frozen, or just the display is down? I suspect disconnecting and reconnecting would "fix" the issue in such a situation. Unfortunately I can't help much more than that, but perhaps someone else can suggest something.
Comment 3 JM9 2019-07-03 00:29:36 UTC
> Is the system actually frozen, or just the display is down?

I'm guessing it's frozen, since I can't even switch consoles at this point.

> I suspect disconnecting and reconnecting would "fix" the issue in such a
> situation. Unfortunately I can't help much more than that, but perhaps
> someone else can suggest something.

It is possible. But typically, I disconnect my external monitors, close the laptop and leave to a different location in the field with no external monitors. So when I open the laptop and encounter the freeze, I'm not near a monitor. Closing and reopening the laptop does not help, but next time I encounter this issue, I'll see if plugging the external monitors back when in this state does anything.
Comment 4 Ilia Mirkin 2019-07-03 00:41:49 UTC
(In reply to JM9 from comment #3)
> It is possible. But typically, I disconnect my external monitors, close the
> laptop and leave to a different location in the field with no external
> monitors. So when I open the laptop and encounter the freeze, I'm not near a
> monitor. Closing and reopening the laptop does not help, but next time I
> encounter this issue, I'll see if plugging the external monitors back when
> in this state does anything.

Oh. Is your primary screen driven by nouveau or by the intel chip? If it's the intel chip, the issue is probably not (directly) related to nouveau.
Comment 5 JM9 2019-07-03 15:57:16 UTC
I believe all screens (Display Ports) are wired to intel. If you think this is an intel issue, I can file a bug there, but the nouveau message threw me.
Comment 6 Ilia Mirkin 2019-07-03 16:05:30 UTC
(In reply to JM9 from comment #5)
> I believe all screens (Display Ports) are wired to intel. If you think this
> is an intel issue, I can file a bug there, but the nouveau message threw me.

The error messages suggest that there's something attached to a DP port on the nvidia chip. Perhaps it's a phantom connection though.

You can easily check to see where things are connected by doing

grep . /sys/class/drm/card*-*/status

which should show you the current state of each port, as well as what card it's on. (Then it's a matter of determining which card is which...)
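
One way to settle which card is which is to read each card's device/driver symlink in sysfs, which names the kernel driver bound to it. The following is only a minimal Python sketch, assuming the usual sysfs layout (a /sys/class/drm/cardN/device/driver symlink and cardN-<connector>/status files). The function name connectors_by_driver and its root parameter are made up for illustration and are not part of any existing tool.

```python
import os
import re
from pathlib import Path


def connectors_by_driver(root="/sys/class/drm"):
    """Map each DRM connector name to (driver name, connection status)."""
    root = Path(root)
    result = {}
    for entry in sorted(root.glob("card*-*")):
        match = re.match(r"(card\d+)-(.+)", entry.name)
        status_file = entry / "status"
        if not match or not status_file.is_file():
            continue
        card = match.group(1)
        driver_link = root / card / "device" / "driver"
        if driver_link.exists():
            # The last path component of the resolved symlink is the
            # driver name, e.g. "i915" or "nouveau".
            driver = os.path.basename(os.path.realpath(driver_link))
        else:
            driver = "unknown"
        result[entry.name] = (driver, status_file.read_text().strip())
    return result
```

Calling connectors_by_driver() with no arguments reads the live sysfs tree. Each key is a connector such as card1-DP-1, and each value pairs the driver name with the same status string the grep command prints.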
Comment 7 JM9 2019-07-03 18:56:38 UTC
ok, thanks. This is what it looks like with all the monitors connected:

/sys/class/drm/card0-eDP-1/status:connected
/sys/class/drm/card0-VGA-1/status:connected
/sys/class/drm/card1-DP-1/status:connected
/sys/class/drm/card1-DP-2/status:disconnected
/sys/class/drm/card1-DP-3/status:connected
Comment 8 Ilia Mirkin 2019-07-03 19:05:59 UTC
(In reply to JM9 from comment #7)
> ok, thanks. This is what it looks like with all the monitors connected:
> 
> /sys/class/drm/card0-eDP-1/status:connected
> /sys/class/drm/card0-VGA-1/status:connected
> /sys/class/drm/card1-DP-1/status:connected
> /sys/class/drm/card1-DP-2/status:disconnected
> /sys/class/drm/card1-DP-3/status:connected

OK, so this looks like all the DP ports are on the nvidia chip, while the internal screen (eDP-1) is on the intel chip.

On resume, is the internal screen messed up? If so, this is an intel issue (or at least intel-involved).

Also, what display system are you using? Xorg, or a wayland compositor (if so, which one)?
Comment 9 JM9 2019-07-03 19:09:39 UTC
Yes, on resume, I typically don't have the external monitors connected, so it is the internal screen that displays the nouveau messages and appears to freeze.

I'm using a Wayland compositor (SwayWM: https://github.com/swaywm/sway)
Comment 10 JM9 2019-07-03 19:35:35 UTC
ok, changing component to Driver/intel.

I'm using xf86-video-intel-git 1:2.99.917+863+g6afed33b-2
Comment 11 Ilia Mirkin 2019-07-03 19:41:45 UTC
Well, this isn't really related to Driver/intel, especially since you're on wayland.

I'm guessing this is some happy combination of intel waiting on something which never completes and/or sway doing something funky in this situation. This will require a lot of cross-functional investigation. I'd loop in the sway folks to see if they have some debugging suggestions.
Comment 12 JM9 2019-07-03 19:46:27 UTC
ok, thanks. This started its life as a swaywm issue, where I was told it was a nouveau bug. The issue is linked below. I've cross-referenced this bug there and will ask the sway developers to chime in here if they can.

https://github.com/swaywm/sway/issues/4150
Comment 13 JM9 2019-07-10 02:53:52 UTC
So I ran into this again and, on the way back, stopped off at the office to reconnect the monitors and see if the system would wake up. Unfortunately, that didn't work. But I was able to get this from the journal. I hope it is of some help:

LID opened
ACPI action undefined: PNP0C0A:00
kernel: WARNING: CPU: 0 PID: 6623 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:172 nvkm_dp_enable+0xf2/0x110 [nouveau]
kernel: Modules linked in: cmac nls_iso8859_1 nls_cp437 vfat fat snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device hid_generic usbhid hid rfcomm ccm fuse bnep 8021q garp mrp stp llc joydev mousedev arc4 snd_hda_codec_hdmi nouveau i915 intel_rapl mei_hdcp mei_wdt x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm >
kernel:  fb_sys_fops mei_me intel_uncore ecdh_generic rtsx_pci_ms input_leds i2c_i801 pcspkr rfkill memstick mei lpc_ich soundcore intel_rapl_perf psmouse ie31200_edac parport_pc tpm_tis tpm_tis_core hp_accel battery parport lis3lv02d input_polldev evdev mac_hid tpm wmi pcc_cpufreq rng_core hp_wireless ac sg crypto_user ip_tables x_>
kernel: CPU: 0 PID: 6623 Comm: kworker/0:2 Tainted: G        W         5.1.16-arch1-1-ARCH #1
kernel: Hardware name: Hewlett-Packard HP ZBook 17 G2/2255, BIOS M70 Ver. 01.24 04/17/2019
kernel: Workqueue: events nvkm_notify_work [nouveau]
kernel: RIP: 0010:nvkm_dp_enable+0xf2/0x110 [nouveau]
kernel: Code: 00 4c 89 e7 4c 8d 83 09 01 00 00 be 01 00 00 00 e8 23 04 fd ff 85 c0 74 0a 4c 89 e7 e8 37 02 fd ff eb 81 80 7c 24 07 10 74 02 <0f> 0b 4c 89 e7 e8 24 02 fd ff 89 e8 eb 83 e8 eb 2e 67 e1 66 66 2e
kernel: RSP: 0018:ffff8ff1c9717df8 EFLAGS: 00010287
kernel: RAX: 0000000000000000 RBX: ffff8c2139352600 RCX: ffff8ff1c9717dff
kernel: RDX: ffff8ff1c9717da8 RSI: ffff8ff1c700e5d4 RDI: ffff8ff1c700e5d4
kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8ff1c9717dff
kernel: R10: 0000000000000000 R11: 0000000000000018 R12: ffff8c2126c77800
kernel: R13: ffff8c2126d8a840 R14: 0000000000000000 R15: 0ffff8c213fc2770
kernel: FS:  0000000000000000(0000) GS:ffff8c213fc00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000035c1066b1020 CR3: 0000000859c0e004 CR4: 00000000001606f0
kernel: Call Trace:
kernel:  nvkm_dp_hpd+0xf1/0x100 [nouveau]
kernel:  nvkm_notify_work+0x1d/0x80 [nouveau]
kernel:  process_one_work+0x1d1/0x3e0
kernel:  worker_thread+0x4a/0x3d0
kernel:  kthread+0xfb/0x130
kernel:  ? process_one_work+0x3e0/0x3e0
kernel:  ? kthread_park+0x90/0x90
kernel:  ret_from_fork+0x35/0x40
kernel: ---[ end trace d1860b58087867eb ]---
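
For anyone triaging a journal excerpt like the one above, the WARNING site and the call chain are the parts worth pulling out first. Here is a rough Python sketch that extracts them from pasted journal text. The regular expressions are assumptions based only on the line shapes visible in this report, and summarize_warning is a hypothetical helper, not an existing tool. Frames the kernel marks with "?" are skipped, because the unwinder flags those as unreliable.

```python
import re

# Matches e.g. "WARNING: CPU: 0 PID: 6623 at path/file.h:172 func+0xf2/0x110 [module]"
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)
# Matches call-trace lines such as "kernel:  ? kthread_park+0x90/0x90"
FRAME_RE = re.compile(
    r"kernel:\s+(?P<unsure>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_warning(journal_text):
    """Return the first WARNING site and its reliable call-trace frames, or None."""
    summary = None
    in_trace = False
    for line in journal_text.splitlines():
        warning = WARNING_RE.search(line)
        if warning and summary is None:
            summary = {
                "file": warning["file"],
                "line": int(warning["line"]),
                "function": warning["func"],
                "module": warning["module"],
                "frames": [],
            }
        elif summary is not None and "Call Trace:" in line:
            in_trace = True
        elif in_trace:
            if "end trace" in line:
                break
            frame = FRAME_RE.search(line)
            if frame and not frame["unsure"]:
                summary["frames"].append(frame["func"])
    return summary
```

Applied to the excerpt in this comment, the sketch reports the warning in nvkm_dp_enable at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:172, reached from nvkm_dp_hpd via nvkm_notify_work on a kernel workqueue.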

