Bug 110996 - swaywm (wayland) crashes when turning off monitors through dpms
Summary: swaywm (wayland) crashes when turning off monitors through dpms
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
: medium major
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-25 15:09 UTC by Sjon
Modified: 2019-06-25 16:41 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
filtered kernel messages with drm.debug=0x14 nouveau.debug=disp=trace (129.47 KB, application/gzip)
2019-06-25 16:40 UTC, Sjon
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sjon 2019-06-25 15:09:55 UTC
My wayland setup always crashes when powering my 2 DP monitors down through dpms. Afterwards I can't get the monitors to power on (through ssh) - only fix is a reboot.
This is a GTX 1080 Ti btw

nouveau 0000:01:00.0: NVIDIA GP102 (132000a1)
nouveau 0000:01:00.0: bios: version 86.02.39.00.3d
nouveau 0000:01:00.0: fb: 11264 MiB GDDR5X
[TTM] Zone  kernel: Available graphics memory: 16407230 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
nouveau 0000:01:00.0: DRM: VRAM: 11264 MiB
nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
nouveau 0000:01:00.0: DRM: BIT table 'A' not found
nouveau 0000:01:00.0: DRM: BIT table 'L' not found
nouveau 0000:01:00.0: DRM: TMDS table version 2.0
nouveau 0000:01:00.0: DRM: DCB version 4.1
nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f42 00020030
nouveau 0000:01:00.0: DRM: DCB outp 01: 04811f96 04600020
nouveau 0000:01:00.0: DRM: DCB outp 02: 04011f92 00020020
nouveau 0000:01:00.0: DRM: DCB outp 03: 04822f86 04600010
nouveau 0000:01:00.0: DRM: DCB outp 04: 04022f82 00020010
nouveau 0000:01:00.0: DRM: DCB outp 06: 02033f62 00020020
nouveau 0000:01:00.0: DRM: DCB outp 07: 02844f76 04600010
nouveau 0000:01:00.0: DRM: DCB outp 08: 02044f72 00020010
nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
nouveau 0000:01:00.0: DRM: DCB conn 01: 02000146
nouveau 0000:01:00.0: DRM: DCB conn 02: 01000246
nouveau 0000:01:00.0: DRM: DCB conn 03: 00020361
nouveau 0000:01:00.0: DRM: DCB conn 04: 00010446
nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1

...

nouveau 0000:01:00.0: DRM: core notifier timeout
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616e18 [ IBUS ]
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616d48 [ IBUS ]
nouveau 0000:01:00.0: DRM: core notifier timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616618 [ IBUS ]
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 616618 [ IBUS ]
nouveau 0000:01:00.0: DRM: core notifier timeout
nouveau 0000:01:00.0: DRM: base-0: timeout
nouveau 0000:01:00.0: DRM: base-0: timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: DRM: base-0: timeout
nouveau 0000:01:00.0: DRM: base-1: timeout
nouveau 0000:01:00.0: DRM: base-0: timeout

..repeated for ~ an hour, unknown cause...

BUG: unable to handle kernel paging request at ffffaa2b3e7f6000
#PF error: [WRITE]
PGD 80ed39067 P4D 80ed39067 PUD 0 
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 7 PID: 728 Comm: sway Tainted: G           OE     5.1.14-arch1-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
Code: 00 00 c1 eb 02 4c 89 f7 e8 b3 64 c2 d5 89 da 44 01 e3 48 8d 04 95 00 00 00 00 81 fb f7 03 00 00 0f 86 86 00 00 00 48 8b 45 70 <c7> 04 90 00 00 00 20 f6 45 58 01 74 09 48 8b 7d 28 e8 50 e2 ff ff
RSP: 0018:ffffaa2a838cbae0 EFLAGS: 00010212
RAX: ffffaa2a83a05000 RBX: 000000002eb7c402 RCX: ffff8ecc7dc1b000
RDX: 000000002eb7c400 RSI: 0000000000000002 RDI: ffff8ecc7a84ec10
RBP: ffff8ecc7a84eb48 R08: 0000000000000000 R09: 0000000000000004
R10: ffff8ecc8ec03980 R11: ffff8ecc8933f600 R12: 0000000000000002
R13: ffff8ecc893be350 R14: ffff8ecc7a84ec10 R15: ffff8ecc893be000
FS:  00007fe98e9c53c0(0000) GS:ffff8ecc8f1c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffaa2b3e7f6000 CR3: 00000007e4196004 CR4: 00000000000606e0
Call Trace:
 base507c_update+0x20/0x70 [nouveau]
 nv50_disp_atomic_commit_wndw.isra.0+0x5e/0x80 [nouveau]
 nv50_disp_atomic_commit_tail+0x4bb/0x6c0 [nouveau]
 nv50_disp_atomic_commit+0x16d/0x1f0 [nouveau]
 drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
 drm_mode_obj_set_property_ioctl+0x159/0x2b0 [drm]
 ? drm_connector_set_obj_prop+0x90/0x90 [drm]
 drm_connector_property_set_ioctl+0x39/0x60 [drm]
 drm_ioctl_kernel+0xb0/0xf0 [drm]
 drm_ioctl+0x233/0x400 [drm]
 ? drm_connector_set_obj_prop+0x90/0x90 [drm]
 ? unix_stream_recvmsg+0x53/0x70
 ? unix_state_double_unlock+0x40/0x40
 nouveau_drm_ioctl+0x63/0xb0 [nouveau]
 do_vfs_ioctl+0x40c/0x670
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x5b/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe99088804b
Code: 0f 1e fa 48 8b 05 45 8e 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 8e 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc2148a5f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055fafaabcc00 RCX: 00007fe99088804b
RDX: 00007ffc2148a630 RSI: 00000000c01064ab RDI: 0000000000000009
RBP: 00007ffc2148a630 R08: 000055fafb0fd9d0 R09: 000055fafb0fd9a0
R10: 0000000000000007 R11: 0000000000000246 R12: 00000000c01064ab
R13: 0000000000000009 R14: 000055fafaabf670 R15: 000055fafaabf674
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache nct6775 hwmon_vid sunrpc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_codec_hdmi i915 nls_iso8859_1 nouveau nls_cp437 vfat fat crct1 scsi_mod xhci_hcd ehci_pci ehci_hcd bna
CR2: ffffaa2b3e7f6000
---[ end trace 48fbd0db141b831f ]---
RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
Code: 00 00 c1 eb 02 4c 89 f7 e8 b3 64 c2 d5 89 da 44 01 e3 48 8d 04 95 00 00 00 00 81 fb f7 03 00 00 0f 86 86 00 00 00 48 8b 45 70 <c7> 04 90 00 00 00 20 f6 45 58 01 74 09 48 8b 7d 28 e8 50 e2 ff ff
RSP: 0018:ffffaa2a838cbae0 EFLAGS: 00010212
RAX: ffffaa2a83a05000 RBX: 000000002eb7c402 RCX: ffff8ecc7dc1b000
RDX: 000000002eb7c400 RSI: 0000000000000002 RDI: ffff8ecc7a84ec10
RBP: ffff8ecc7a84eb48 R08: 0000000000000000 R09: 0000000000000004
R10: ffff8ecc8ec03980 R11: ffff8ecc8933f600 R12: 0000000000000002
R13: ffff8ecc893be350 R14: ffff8ecc7a84ec10 R15: ffff8ecc893be000
FS:  00007fe98e9c53c0(0000) GS:ffff8ecc8f1c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffaa2b3e7f6000 CR3: 00000007e4196004 CR4: 00000000000606e0
Comment 1 Ilia Mirkin 2019-06-25 15:15:17 UTC
What was the first group of errors before the first "core notifier timeout"? That should help inform why there was a gpu hang. Actually, can you reproduce when booted with

drm.debug=0x14 nouveau.debug=disp=trace

and then reproducing this issue. Only the first few sets of errors are interesting (really the first, usually), everything after that is just follow-on fail.

Looks like there are two bugs here, BTW:

1. We hang the EVO engine somehow
2. We have some kind of page map fail with evo_wait (hence the BUG at the end)
Comment 2 Sjon 2019-06-25 16:40:00 UTC
Created attachment 144636 [details]
filtered kernel messages with drm.debug=0x14 nouveau.debug=disp=trace

I'm not sure what caused the 'core notifier timeout' messages to start - but there was at least half an hour between that error and the actual crash. I'll attach the requested log - it's huge and doesn't show the same hang. The problem did occur, my monitors turned off and my machine needed a reboot


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.