110996 – swaywm (wayland) crashes when turning off monitors through dpms

Status:	RESOLVED MOVED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/nouveau
Version:	unspecified
Hardware:	x86 (IA32) Linux (All)

Importance:	medium major
Assignee:	Nouveau Project
QA Contact:	Xorg Project Team

Reported:	2019-06-25 15:09 UTC by Sjon
Modified:	2019-12-04 09:50 UTC
CC List:	0 users

Attachments
filtered kernel messages with drm.debug=0x14 nouveau.debug=disp=trace (129.47 KB, application/gzip) 2019-06-25 16:40 UTC, Sjon

Description Sjon 2019-06-25 15:09:55 UTC

My Wayland setup always crashes when I power down my two DP monitors through DPMS. Afterwards I can't get the monitors to power back on (through ssh); the only fix is a reboot.
This is a GTX 1080 Ti, btw.

nouveau 0000:01:00.0: NVIDIA GP102 (132000a1)
nouveau 0000:01:00.0: bios: version 86.02.39.00.3d
nouveau 0000:01:00.0: fb: 11264 MiB GDDR5X
[TTM] Zone  kernel: Available graphics memory: 16407230 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
nouveau 0000:01:00.0: DRM: VRAM: 11264 MiB
nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
nouveau 0000:01:00.0: DRM: BIT table 'A' not found
nouveau 0000:01:00.0: DRM: BIT table 'L' not found
nouveau 0000:01:00.0: DRM: TMDS table version 2.0
nouveau 0000:01:00.0: DRM: DCB version 4.1
nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f42 00020030
nouveau 0000:01:00.0: DRM: DCB outp 01: 04811f96 04600020
nouveau 0000:01:00.0: DRM: DCB outp 02: 04011f92 00020020
nouveau 0000:01:00.0: DRM: DCB outp 03: 04822f86 04600010
nouveau 0000:01:00.0: DRM: DCB outp 04: 04022f82 00020010
nouveau 0000:01:00.0: DRM: DCB outp 06: 02033f62 00020020
nouveau 0000:01:00.0: DRM: DCB outp 07: 02844f76 04600010
nouveau 0000:01:00.0: DRM: DCB outp 08: 02044f72 00020010
nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
nouveau 0000:01:00.0: DRM: DCB conn 01: 02000146
nouveau 0000:01:00.0: DRM: DCB conn 02: 01000246
nouveau 0000:01:00.0: DRM: DCB conn 03: 00020361
nouveau 0000:01:00.0: DRM: DCB conn 04: 00010446
nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1

...

nouveau 0000:01:00.0: DRM: core notifier timeout
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616e18 [ IBUS ]
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616d48 [ IBUS ]
nouveau 0000:01:00.0: DRM: core notifier timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616618 [ IBUS ]
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616618 [ IBUS ]
nouveau 0000:01:00.0: DRM: core notifier timeout
nouveau 0000:01:00.0: DRM: base-0: timeout
nouveau 0000:01:00.0: DRM: base-0: timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: DRM: base-0: timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: DRM: base-0: timeout

..repeated for ~ an hour, unknown cause...

BUG: unable to handle kernel paging request at ffffaa2b3e7f6000
#PF error: [WRITE]
PGD 80ed39067 P4D 80ed39067 PUD 0 
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 7 PID: 728 Comm: sway Tainted: G           OE     5.1.14-arch1-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
Code: 00 00 c1 eb 02 4c 89 f7 e8 b3 64 c2 d5 89 da 44 01 e3 48 8d 04 95 00 00 00 00 81 fb f7 03 00 00 0f 86 86 00 00 00 48 8b 45 70 <c7> 04 90 00 00 00 20 f6 45 58 01 74 09 48 8b 7d 28 e8 50 e2 ff ff
RSP: 0018:ffffaa2a838cbae0 EFLAGS: 00010212
RAX: ffffaa2a83a05000 RBX: 000000002eb7c402 RCX: ffff8ecc7dc1b000
RDX: 000000002eb7c400 RSI: 0000000000000002 RDI: ffff8ecc7a84ec10
RBP: ffff8ecc7a84eb48 R08: 0000000000000000 R09: 0000000000000004
R10: ffff8ecc8ec03980 R11: ffff8ecc8933f600 R12: 0000000000000002
R13: ffff8ecc893be350 R14: ffff8ecc7a84ec10 R15: ffff8ecc893be000
FS:  00007fe98e9c53c0(0000) GS:ffff8ecc8f1c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffaa2b3e7f6000 CR3: 00000007e4196004 CR4: 00000000000606e0
Call Trace:
 base507c_update+0x20/0x70 [nouveau]
 nv50_disp_atomic_commit_wndw.isra.0+0x5e/0x80 [nouveau]
 nv50_disp_atomic_commit_tail+0x4bb/0x6c0 [nouveau]
 nv50_disp_atomic_commit+0x16d/0x1f0 [nouveau]
 drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
 drm_mode_obj_set_property_ioctl+0x159/0x2b0 [drm]
 ? drm_connector_set_obj_prop+0x90/0x90 [drm]
 drm_connector_property_set_ioctl+0x39/0x60 [drm]
 drm_ioctl_kernel+0xb0/0xf0 [drm]
 drm_ioctl+0x233/0x400 [drm]
 ? drm_connector_set_obj_prop+0x90/0x90 [drm]
 ? unix_stream_recvmsg+0x53/0x70
 ? unix_state_double_unlock+0x40/0x40
 nouveau_drm_ioctl+0x63/0xb0 [nouveau]
 do_vfs_ioctl+0x40c/0x670
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x5b/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe99088804b
Code: 0f 1e fa 48 8b 05 45 8e 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 8e 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc2148a5f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055fafaabcc00 RCX: 00007fe99088804b
RDX: 00007ffc2148a630 RSI: 00000000c01064ab RDI: 0000000000000009
RBP: 00007ffc2148a630 R08: 000055fafb0fd9d0 R09: 000055fafb0fd9a0
R10: 0000000000000007 R11: 0000000000000246 R12: 00000000c01064ab
R13: 0000000000000009 R14: 000055fafaabf670 R15: 000055fafaabf674
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache nct6775 hwmon_vid sunrpc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_codec_hdmi i915 nls_iso8859_1 nouveau nls_cp437 vfat fat crct1 scsi_mod xhci_hcd ehci_pci ehci_hcd bna
CR2: ffffaa2b3e7f6000
---[ end trace 48fbd0db141b831f ]---
RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
Code: 00 00 c1 eb 02 4c 89 f7 e8 b3 64 c2 d5 89 da 44 01 e3 48 8d 04 95 00 00 00 00 81 fb f7 03 00 00 0f 86 86 00 00 00 48 8b 45 70 <c7> 04 90 00 00 00 20 f6 45 58 01 74 09 48 8b 7d 28 e8 50 e2 ff ff
RSP: 0018:ffffaa2a838cbae0 EFLAGS: 00010212
RAX: ffffaa2a83a05000 RBX: 000000002eb7c402 RCX: ffff8ecc7dc1b000
RDX: 000000002eb7c400 RSI: 0000000000000002 RDI: ffff8ecc7a84ec10
RBP: ffff8ecc7a84eb48 R08: 0000000000000000 R09: 0000000000000004
R10: ffff8ecc8ec03980 R11: ffff8ecc8933f600 R12: 0000000000000002
R13: ffff8ecc893be350 R14: ffff8ecc7a84ec10 R15: ffff8ecc893be000
FS:  00007fe98e9c53c0(0000) GS:ffff8ecc8f1c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffaa2b3e7f6000 CR3: 00000007e4196004 CR4: 00000000000606e0

Comment 1 Ilia Mirkin 2019-06-25 15:15:17 UTC

What was the first group of errors before the first "core notifier timeout"? That should help explain why there was a GPU hang. Actually, can you boot with

drm.debug=0x14 nouveau.debug=disp=trace

and then reproduce this issue? Only the first few sets of errors are interesting (really the first, usually); everything after that is just follow-on failure.
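
Since only the first group of errors matters, a small filter can pull just that part out of a large debug log. The following is a minimal sketch, not part of the original report: the function name `first_error_context` and the default of 20 context lines are assumptions made for illustration. It returns the lines immediately preceding the first "core notifier timeout" message, plus that message itself.

```python
from collections import deque

# Marker string taken from the kernel messages quoted in this report.
MARKER = "core notifier timeout"


def first_error_context(lines, marker=MARKER, before=20):
    """Return up to `before` lines preceding the first line that contains
    `marker`, followed by that line. Returns [] if the marker never appears.

    `lines` can be any iterable of strings, e.g. an open log file.
    """
    window = deque(maxlen=before)  # rolling buffer of the most recent lines
    for line in lines:
        if marker in line:
            return list(window) + [line]
        window.append(line)
    return []
```

To use it on a saved log, pass an open file (for example one produced by redirecting `dmesg` output; the file name is up to you) and print the returned lines. Everything after the first match is deliberately ignored, since later timeouts are follow-on failures.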

Looks like there are two bugs here, BTW:

1. We hang the EVO engine somehow
2. We have some kind of page map fail with evo_wait (hence the BUG at the end)
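
For the second bug, the register dump in the description is enough to check where the bad write went. Reading the bytes at the `<c7>` marker in the Code: line as `movl $0x20000000, (%rax,%rdx,4)` (my own decoding of the dump, not something confirmed in this thread), the faulting address should be RAX + RDX * 4. The sketch below redoes that arithmetic with the values from the oops. The variable and function names are made up for illustration.

```python
# Register values copied from the oops in the description.
RAX = 0xFFFFAA2A83A05000  # base address used by the faulting store
RDX = 0x2EB7C400          # index, scaled by 4 (i.e. counted in 32-bit words)
R12 = 0x2                 # added to the index before the limit check
RBX = 0x2EB7C402          # index + R12, as seen in the dump
CR2 = 0xFFFFAA2B3E7F6000  # faulting address reported by the kernel

# Limit from the `cmp $0x3f7, %ebx` in the Code: bytes (again, my decoding).
LIMIT = 0x3F7


def effective_address(base, index, scale):
    """Compute base + index * scale, truncated to 64 bits like the CPU does."""
    return (base + index * scale) & 0xFFFFFFFFFFFFFFFF


fault = effective_address(RAX, RDX, 4)
byte_offset = RDX * 4

print("computed fault address:", hex(fault))
print("matches CR2:", fault == CR2)
print("byte offset into buffer:", hex(byte_offset))
print("index exceeds limit:", RBX > LIMIT)
```

The computed address equals CR2, so the write landed roughly 3 GB past the base in RAX because the index was far outside the small range the limit check implies. The byte offset works out to 0xbadf1000, which looks like the kind of placeholder value a failed register read can return. That last point is an assumption on my part, but it would fit the MMIO read FAULT messages earlier in the log.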

Comment 2 Sjon 2019-06-25 16:40:00 UTC

Created attachment 144636
filtered kernel messages with drm.debug=0x14 nouveau.debug=disp=trace

I'm not sure what caused the 'core notifier timeout' messages to start, but there was at least half an hour between that error and the actual crash. I'll attach the requested log; it's huge and doesn't show the same hang. The problem did occur, though: my monitors turned off and my machine needed a reboot.

Comment 3 Martin Peres 2019-12-04 09:50:23 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/491.
