Bug 112239 - nouveau hangs video with TU116 - regression in kernel 5.3
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: 7.7 (2012.06)
Hardware: Other All
Assignee: Nouveau Project
QA Contact: Xorg Project Team
 
Reported: 2019-11-10 00:01 UTC by Marcin Zajaczkowski
Modified: 2019-12-04 09:55 UTC


Attachments
Logs when I open and later on close a lid (169.84 KB, text/x-log)
2019-11-10 00:08 UTC, Marcin Zajaczkowski

Description Marcin Zajaczkowski 2019-11-10 00:01:10 UTC
My GeForce GTX 1660 Ti Mobile (NV168/TU116) in a Hyperbook NH5/Clevo NH55RCQ worked "fine" with kernel 5.2 once some workarounds were applied (https://bugs.freedesktop.org/show_bug.cgi?id=110830#c14). After the upgrade to 5.3, however, video started to hang on the NVidia card state switch. I don't actually use the card to render the output (it is DynOff by default), but I cannot disable it in the BIOS, and whenever I open or close the laptop lid it is temporarily woken up and goes back to sleep after a few seconds. That works in 5.2, but in 5.3 it "hangs video" on the next switch (occasionally also during the first X/gdm setup).
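
The wake-up and return to sleep described above can be watched from userspace. The sketch below is only one possible way to do it, not something taken from this report: it assumes the card sits at PCI address 0000:01:00.0 (as in the logs below) and reads the standard runtime PM attributes `power/runtime_status` and `power/control` from sysfs. The function name `read_runtime_pm` is made up for illustration.

```python
from pathlib import Path

# PCI address of the NVIDIA card as it appears in the logs of this report.
NVIDIA_DEVICE = Path("/sys/bus/pci/devices/0000:01:00.0")


def read_runtime_pm(device_dir):
    """Return the runtime PM attributes of a PCI device as a dict.

    Attributes that cannot be read (missing file, no permission)
    are reported as None instead of raising.
    """
    result = {}
    for name in ("runtime_status", "control"):
        attr = Path(device_dir) / "power" / name
        try:
            result[name] = attr.read_text().strip()
        except OSError:
            result[name] = None
    return result


# Usage (hypothetical): call read_runtime_pm(NVIDIA_DEVICE) before and
# after opening the lid and compare the "runtime_status" values.
```

Polling this around a lid open/close should show whether the card ever returns to the suspended state or stays stuck in a transition.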

The key related errors in the system log:
> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
(a lot of these)

> kernel: nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 0000000000002000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [017fedf000 unknown]
(every few seconds)

> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> kernel: ------------[ cut here ]------------
> kernel: nouveau 0000:01:00.0: timeout
> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
(at the end)


On boot (here with a self-rebuilt kernel-core-5.4.0-0.rc6.git0.1.fc30.x86_64 on Fedora 30, but the errors are similar with 5.3) I see:

> Nov 1000:26:12 foobar kernel: Linux version 5.4.0-0.rc6.git0.1.fc30.x86_64 (me@foobar) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Sat Nov 9 18:47:45 CET 2019
...
> Nov 1000:26:12 foobar kernel: fb0: switching to inteldrmfb from EFI VGA
> Nov 1000:26:12 foobar kernel: Console: switching to colour dummy device 80x25
> Nov 1000:26:12 foobar kernel: i915 0000:00:02.0: vgaarb: deactivate vga console
> Nov 1000:26:12 foobar kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> Nov 1000:26:12 foobar kernel: [drm] Driver supports precise vblank timestamp query.
> Nov 1000:26:12 foobar kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
> Nov 1000:26:12 foobar kernel: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
> Nov 1000:26:12 foobar kernel: MXM: GUID detected in BIOS
> Nov 1000:26:12 foobar kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
> Nov 1000:26:12 foobar kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
> Nov 1000:26:12 foobar kernel: pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
> Nov 1000:26:12 foobar kernel: VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
> Nov 1000:26:12 foobar kernel: nouveau: detected PR support, will not use DSM
> Nov 1000:26:12 foobar kernel: nouveau 0000:01:00.0: enabling device (0106 -> 0107)
> Nov 1000:26:12 foobar kernel: nouveau 0000:01:00.0: NVIDIA TU116 (168000a1)
...
> Nov 1000:26:13 foobar kernel: [drm] Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0
> Nov 1000:26:13 foobar kernel: logitech-djreceiver 0003:046D:C52F.0002: hiddev96,hidraw1: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:14.0-1/input1
> Nov 1000:26:13 foobar kernel: logitech-djreceiver 0003:046D:C52F.0002: device of type QUAD or eQUAD (0x03) connected on slot 1
> Nov 1000:26:13 foobar kernel: input: Logitech Wireless Mouse PID:101f Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.1/0003:046D:C52F.0002/0003:046D:101F.0005/input/input17
> Nov 1000:26:13 foobar kernel: input: Logitech Wireless Mouse PID:101f Consumer Control as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.1/0003:046D:C52F.0002/0003:046D:101F.0005/input/input18
> Nov 1000:26:13 foobar kernel: hid-generic 0003:046D:101F.0005: input,hidraw4: USB HID v1.11 Mouse [Logitech Wireless Mouse PID:101f] on usb-0000:00:14.0-1/input1:1
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: queried max coordinates: x [..5658], y [..4722]
> Nov 1000:26:13 foobar kernel: ACPI: Video Device [PEGP] (multi-head: no  rom: yes  post: no)
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: bios: version 90.16.26.00.11
> Nov 1000:26:13 foobar kernel: input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:00/LNXVIDEO:00/input/input22
> Nov 1000:26:13 foobar kernel: ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
> Nov 1000:26:13 foobar kernel: input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:01/input/input23
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: fb: 6144 MiB GDDR6
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: queried min coordinates: x [1284..], y [1130..]
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: Your touchpad (PNP: PNP0f13) says it can support a different bus. If i2c-hid and hid-rmi are not used, you might want to try setting psmouse.synaptics_intertouch to 1 and report t>
> Nov 1000:26:13 foobar kernel: vga_switcheroo: enabled
> Nov 1000:26:13 foobar kernel: [TTM] Zone  kernel: Available graphics memory: 8047486 KiB
> Nov 1000:26:13 foobar kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
> Nov 1000:26:13 foobar kernel: [TTM] Initializing pool allocator
> Nov 1000:26:13 foobar kernel: [TTM] Initializing DMA pool allocator
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: VRAM: 6144 MiB
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB version 4.1
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 02002f52 00020010
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 04814f76 04600010
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 04814f72 00020010
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00010261
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: DCB conn 04: 01000446
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
> Nov 1000:26:13 foobar kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> Nov 1000:26:13 foobar kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> Nov 1000:26:13 foobar kernel: [drm] Driver supports precise vblank timestamp query.
> Nov 1000:26:13 foobar kernel: [drm] Cannot find any crtc or sizes
> Nov 1000:26:13 foobar kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
> Nov 1000:26:13 foobar kernel: [drm] Cannot find any crtc or sizes
> Nov 1000:26:13 foobar kernel: [drm] Cannot find any crtc or sizes
> Nov 1000:26:13 foobar kernel: psmouse serio2: synaptics: Touchpad model: 1, fw: 9.16, id: 0x1e2a1, caps: 0xf00123/0x840300/0x2e800/0x500000, board id: 3429, fw id: 2840755
> Nov 1000:26:13 foobar kernel: input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio2/input/input10
> Nov 1000:26:13 foobar kernel: fbcon: i915drmfb (fb0) is primary device
> Nov 1000:26:13 foobar kernel: Console: switching to colour frame buffer device 240x75
> Nov 1000:26:13 foobar kernel: i915 0000:00:02.0: fb0: i915drmfb frame buffer device


> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: fault 09 [PHYS_WRITE] at 000000017fef0000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 0d [REGION_VIOLATION] on channel -1 [0000000000 unknown]
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: fault 09 [PHYS_WRITE] at 000000017fef0000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 0d [REGION_VIOLATION] on channel -1 [0000000000 unknown]
> Nov 1000:26:28 foobar kernel: snd_hda_intel 0000:01:00.1: Disabling MSI
> Nov 1000:26:28 foobar kernel: snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 000000000028b000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 04 [UNBOUND_INST_BLOCK] on channel -1 [0000000000 unknown]
> Nov 1000:26:28 foobar kernel: ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
> Nov 1000:26:28 foobar kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
> Nov 1000:26:28 foobar kernel: usb usb3: root hub lost power or was reset
> Nov 1000:26:28 foobar kernel: usb usb4: root hub lost power or was reset
> Nov 1000:26:28 foobar systemd-udevd[1143]: Using default interface naming scheme 'v240'.
> Nov 1000:26:28 foobar systemd-udevd[1143]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
> Nov 1000:26:28 foobar kernel: iwlwifi 0000:00:14.3 wlp0s20f3: renamed from wlan0
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar wireless[1394]: setting regulatory domain to PL based on timezone (Europe/Warsaw)
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:26:28 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []

"nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []" happens a number of times per second, "fifo: fault 01 [VIRT_WRITE]" once every few seconds.

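The message rates mentioned above can be measured rather than estimated. This is a minimal sketch, assuming journal lines in the form pasted in this report (an optional "> " quote marker, then month, day and time); the names `count_per_second` and `PATTERNS` are made up for illustration, and the timestamp pattern is deliberately lenient about spacing between the day and the time.

```python
import re
from collections import Counter

# Optional "> " quote marker, month, day, then HH:MM:SS.
# Whitespace between day and time is optional on purpose.
TIME_RE = re.compile(r"^(?:>\s*)?\w{3}\s*\d{1,2}\s*(\d{2}:\d{2}:\d{2})\s")

# Message kinds quoted in this report; extend as needed.
PATTERNS = {
    "SCHED_ERROR": re.compile(r"fifo: SCHED_ERROR"),
    "VIRT_WRITE": re.compile(r"fifo: fault 01 \[VIRT_WRITE\]"),
}


def count_per_second(lines):
    """Count nouveau messages per (HH:MM:SS, kind) pair.

    Lines without a leading timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        if "nouveau" not in line:
            continue
        match = TIME_RE.match(line)
        if match is None:
            continue
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(match.group(1), kind)] += 1
    return counts
```

Feeding it the attached log, for example via `count_per_second(open(path))` with a path of your choosing, gives one counter entry per second and message kind.
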
On the Nvidia card state switch (here I opened a lid) I observe something like that:

> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:27:20 foobar systemd-logind[1719]: Lid opened.
> Nov 1000:27:20 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Allocate new frame buffer 3840x1200 stride
> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
> Nov 1000:27:20 foobar kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
...
> Nov 1000:27:21 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): EDID vendor "SAM", prod id 1415
> Nov 1000:27:21 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Using hsync ranges from config file
...
> Nov 1000:27:41 foobar kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> Nov 1000:27:41 foobar kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> Nov 1000:27:44 foobar tracker-store[2907]: OK
> Nov 1000:27:44 foobar systemd[2370]: tracker-store.service: Succeeded.
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> Nov 1000:27:47 foobar kernel: ------------[ cut here ]------------
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: timeout
> Nov 1000:27:47 foobar kernel: WARNING: CPU: 0 PID: 1085 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
> Nov 1000:27:47 foobar kernel: Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrac>
> Nov 1000:27:47 foobar kernel:  libarc4 snd_hda_codec videobuf2_v4l2 btintel videobuf2_common iwlwifi kvm snd_hda_core snd_hwdep snd_seq snd_seq_device irqbypass videodev mei_hdcp bluetooth intel_cstate iTCO_wdt mc iTCO_vendor_support sn>
> Nov 1000:27:47 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G           OE     5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:27:47 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:27:47 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:27:47 foobar kernel: RIP: 0010:g84_bar_flush+0xcf/0xe0 [nouveau]
> Nov 1000:27:47 foobar kernel: Code: 8b 40 10 48 8b 78 10 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 33 05 f0 e7 4c 89 ea 48 c7 c7 a4 74 92 c0 48 89 c6 e8 3f b4 96 e7 <0f> 0b eb a7 e8 58 b1 96 e7 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> Nov 1000:27:47 foobar kernel: RSP: 0018:ffffaa40c06cb640 EFLAGS: 00010086
> Nov 1000:27:47 foobar kernel: RAX: 0000000000000000 RBX: ffff95f1d47dfc00 RCX: 0000000000000006
> Nov 1000:27:47 foobar kernel: RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff95f1e0217900
> Nov 1000:27:47 foobar kernel: RBP: ffff95f1dd6e6748 R08: 0000000000000001 R09: 00000000000016f2
> Nov 1000:27:47 foobar kernel: R10: 000000000000cc44 R11: 0000000000000003 R12: 0000000000000246
> Nov 1000:27:47 foobar kernel: R13: ffff95f1dcd96050 R14: 0000000000000000 R15: ffff95f18c17a0c0
> Nov 1000:27:47 foobar kernel: FS:  0000000000000000(0000) GS:ffff95f1e0200000(0000) knlGS:0000000000000000
> Nov 1000:27:47 foobar kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov 1000:27:47 foobar kernel: CR2: 00007f792a718e20 CR3: 000000040a60a003 CR4: 00000000003606f0
> Nov 1000:27:47 foobar kernel: Call Trace:
> Nov 1000:27:47 foobar kernel:  nv50_instobj_release+0x2f/0xc0 [nouveau]
> Nov 1000:27:47 foobar kernel:  nvkm_vmm_iter.constprop.0+0x2bc/0x810 [nouveau]
> Nov 1000:27:47 foobar kernel:  ? gp100_vmm_join+0x20/0x20 [nouveau]
> Nov 1000:27:47 foobar kernel:  nvkm_vmm_map+0x136/0x360 [nouveau]
> Nov 1000:27:47 foobar kernel:  ? gp100_vmm_join+0x20/0x20 [nouveau]
> Nov 1000:27:47 foobar kernel:  nvkm_mem_map_dma+0x56/0x80 [nouveau]
> Nov 1000:27:47 foobar kernel:  nvkm_uvmm_mthd+0x66a/0x780 [nouveau]
> Nov 1000:27:47 foobar kernel:  nvkm_ioctl+0xde/0x180 [nouveau]
> Nov 1000:27:47 foobar kernel:  nvif_object_mthd+0x104/0x130 [nouveau]
> Nov 1000:27:47 foobar kernel:  nvif_vmm_map+0x115/0x130 [nouveau]
> Nov 1000:27:47 foobar kernel:  nouveau_mem_map+0x8d/0x100 [nouveau]
> Nov 1000:27:47 foobar kernel:  nouveau_vma_map+0x44/0x70 [nouveau]
> Nov 1000:27:47 foobar kernel:  nouveau_bo_move_ntfy+0xcd/0xe0 [nouveau]
> Nov 1000:27:47 foobar kernel:  ttm_bo_handle_move_mem+0xd2/0x5a0 [ttm]
> Nov 1000:27:47 foobar kernel:  ttm_bo_evict+0x16f/0x1f0 [ttm]
> Nov 1000:27:47 foobar kernel:  ? __drm_legacy_pci_free+0x66/0x90 [drm]
> Nov 1000:27:47 foobar kernel:  ttm_mem_evict_first+0x273/0x360 [ttm]
> Nov 1000:27:47 foobar kernel:  ttm_bo_force_list_clean+0xa4/0x170 [ttm]
> Nov 1000:27:47 foobar kernel:  nouveau_do_suspend+0x80/0x170 [nouveau]
> Nov 1000:27:47 foobar kernel:  nouveau_pmops_runtime_suspend+0x40/0xa0 [nouveau]
> Nov 1000:27:47 foobar kernel:  pci_pm_runtime_suspend+0x58/0x140
> Nov 1000:27:47 foobar kernel:  ? __switch_to_asm+0x40/0x70
> Nov 1000:27:47 foobar kernel:  ? pci_pm_thaw_noirq+0xa0/0xa0
> Nov 1000:27:47 foobar kernel:  __rpm_callback+0xc1/0x140
> Nov 1000:27:47 foobar kernel:  ? pci_pm_thaw_noirq+0xa0/0xa0
> Nov 1000:27:47 foobar kernel:  rpm_callback+0x1f/0x70
> Nov 1000:27:47 foobar kernel:  rpm_suspend+0x10a/0x5a0
> Nov 1000:27:47 foobar kernel:  ? __switch_to_asm+0x34/0x70
> Nov 1000:27:47 foobar kernel:  pm_runtime_work+0x86/0x90
> Nov 1000:27:47 foobar kernel:  process_one_work+0x1b0/0x350
> Nov 1000:27:47 foobar kernel:  worker_thread+0x50/0x3b0
> Nov 1000:27:47 foobar kernel:  kthread+0xfb/0x130
> Nov 1000:27:47 foobar kernel:  ? process_one_work+0x350/0x350
> Nov 1000:27:47 foobar kernel:  ? kthread_park+0x90/0x90
> Nov 1000:27:47 foobar kernel:  ret_from_fork+0x35/0x40
> Nov 1000:27:47 foobar kernel: ---[ end trace e70ebf987c8ad925 ]---
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> Nov 1000:27:47 foobar kernel: ------------[ cut here ]------------
> Nov 1000:27:47 foobar kernel: nouveau 0000:01:00.0: timeout
> Nov 1000:27:47 foobar kernel: WARNING: CPU: 0 PID: 1085 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c:44 tu102_vmm_flush+0x128/0x140 [nouveau]
> Nov 1000:27:47 foobar kernel: Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrac>
> Nov 1000:27:47 foobar kernel:  libarc4 snd_hda_codec videobuf2_v4l2 btintel videobuf2_common iwlwifi kvm snd_hda_core snd_hwdep snd_seq snd_seq_device irqbypass videodev mei_hdcp bluetooth intel_cstate iTCO_wdt mc iTCO_vendor_support sn>
> Nov 1000:27:47 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G        W  OE     5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:27:47 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:27:47 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:27:47 foobar kernel: RIP: 0010:tu102_vmm_flush+0x128/0x140 [nouveau]
> Nov 1000:27:47 foobar kernel: Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 ca 19 eb e7 4c 89 e2 48 c7 c7 dc 8e 92 c0 48 89 c6 e8 d6 c8 91 e7 <0f> 0b eb aa e8 ef c5 91 e7 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> Nov 1000:27:47 foobar kernel: RSP: 0018:ffffaa40c06cb678 EFLAGS: 00010286
> Nov 1000:27:47 foobar kernel: RAX: 0000000000000000 RBX: ffff95f1d47dfc00 RCX: 0000000000000006
> Nov 1000:27:47 foobar kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff95f1e0217900
> Nov 1000:27:47 foobar kernel: RBP: ffff95f1dce73220 R08: 0000000000000001 R09: 000000000000172b
> Nov 1000:27:47 foobar kernel: R10: 000000000000e120 R11: 0000000000000003 R12: ffff95f1dcd96050
> Nov 1000:27:47 foobar kernel: R13: ffff95f1d457f200 R14: 0000000000000000 R15: ffff95f18c17a0c0
> Nov 1000:27:47 foobar kernel: FS:  0000000000000000(0000) GS:ffff95f1e0200000(0000) knlGS:0000000000000000
> Nov 1000:27:47 foobar kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov 1000:27:47 foobar kernel: CR2: 00007f792a718e20 CR3: 000000040a60a003 CR4: 00000000003606f0
> Nov 1000:27:47 foobar kernel: Call Trace:
> Nov 1000:27:47 foobar kernel:  nvkm_vmm_iter.constprop.0+0x34b/0x810 [nouveau]
> Nov 1000:27:47 foobar kernel:  ? gp100_vmm_join+0x20/0x20 [nouveau]

The stack trace is repeated a few times; the attachment has a more complete log. The "" are no longer visible.

On the next laptop lid close the video hangs (the music is still playing, but Caps Lock doesn't turn on the LED on my keyboard). In the logs, after a call trace, I see "kernel: [TTM] Buffer eviction failed":

> Nov 1000:28:14 foobar systemd-logind[1719]: Lid closed.
...
> Nov 1000:28:15 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Allocate new frame buffer 1920x1200 stride
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): EDID vendor "CMN", prod id 5608
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Printing DDC gathered Modelines:
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): Modeline "1920x1080"x0.0  152.84  1920 2000 2054 2250  1080 1086 1094 1132 -hsync -vsync (67.9 kHz eP)
> Nov 1000:28:16 foobar /usr/libexec/gdm-x-session[2441]: (II) modeset(0): EDID vendor "SAM", prod id 1415
...
> Nov 1000:28:18 foobar kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> Nov 1000:28:18 foobar kernel: ------------[ cut here ]------------
> Nov 1000:28:18 foobar kernel: nouveau 0000:01:00.0: timeout
> Nov 1000:28:18 foobar kernel: WARNING: CPU: 0 PID: 1085 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
> Nov 1000:28:18 foobar kernel: Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrac>
> Nov 1000:28:18 foobar kernel:  libarc4 snd_hda_codec videobuf2_v4l2 btintel videobuf2_common iwlwifi kvm snd_hda_core snd_hwdep snd_seq snd_seq_device irqbypass videodev mei_hdcp bluetooth intel_cstate iTCO_wdt mc iTCO_vendor_support sn>
> Nov 1000:28:18 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G        W  OE     5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:28:18 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:28:18 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:28:18 foobar kernel: RIP: 0010:g84_bar_flush+0xcf/0xe0 [nouveau]
...
> Nov 1000:28:18 foobar kernel: ---[ end trace e70ebf987c8ad92c ]---
> Nov 1000:28:18 foobar kernel: [TTM] Buffer eviction failed
> Nov 1000:28:19 foobar abrt-dump-journal-oops[1695]: abrt-dump-journal-oops: Found oopses: 2
> Nov 1000:28:19 foobar abrt-dump-journal-oops[1695]: abrt-dump-journal-oops: Creating problem directories
> Nov 1000:28:19 foobar abrt-server[10758]: Package 'kernel-core' isn't signed with proper key
> Nov 1000:28:19 foobar abrt-server[10758]: 'post-create' on '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-0' exited with 1
> Nov 1000:28:19 foobar abrt-server[10758]: Deleting problem directory '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-0'
> Nov 1000:28:20 foobar abrt-server[10761]: Package 'kernel-core' isn't signed with proper key
> Nov 1000:28:20 foobar abrt-server[10761]: 'post-create' on '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-1' exited with 1
> Nov 1000:28:20 foobar abrt-server[10761]: Deleting problem directory '/var/spool/abrt/oops-2019-11-10-00:28:19-1695-1'
> Nov 1000:28:21 foobar abrt-dump-journal-oops[1695]: Reported 2 kernel oopses to Abrt
> Nov 1000:28:33 foobar kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> Nov 1000:28:33 foobar kernel: BUG: unable to handle page fault for address: ffffaa41c0386ffc
> Nov 1000:28:33 foobar kernel: #PF: supervisor write access in kernel mode
> Nov 1000:28:33 foobar kernel: #PF: error_code(0x0002) - not-present page
> Nov 1000:28:33 foobar kernel: PGD 45e550067 P4D 45e550067 PUD 0 
> Nov 1000:28:33 foobar kernel: Oops: 0002 [#1] SMP PTI
> Nov 1000:28:33 foobar kernel: CPU: 0 PID: 1085 Comm: kworker/0:4 Tainted: G        W  OE     5.4.0-0.rc6.git0.1.fc30.x86_64 #1
> Nov 1000:28:33 foobar kernel: Hardware name: Blue Technology Sp. z o.o. NH5_NH7/NH5_NH7, BIOS 1.07.03TBT 11/16/2018
> Nov 1000:28:33 foobar kernel: Workqueue: pm pm_runtime_work
> Nov 1000:28:33 foobar kernel: RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
...
> Nov 1000:28:40 foobar gsd-xsettings[2815]: Failed to get current display configuration state: Timeout was reached


> $ lspci | grep VGA
> 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile] (rev a1)


I've seen similar issues reported; however, in this case it is a regression: it worked fine with kernel 5.2 (tested from RC1 to 5.2.18) and it is broken in 5.3 (tested with 5.3.1 to 5.3.8 and 5.4.0-rc6).

I'm not sure which commit broke it (building the kernel takes some time), but given some candidate commits I could test whether the problem occurs before and after each of them.
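
Since every bisection step costs one kernel build, it may help to estimate the effort up front. The sketch below only does the arithmetic: a bisection halves the candidate range at each step, so roughly log2(N) builds are needed for N commits. The function name `bisect_steps` is hypothetical and the commit count in the usage note is a placeholder, not a measured value; `git rev-list --count v5.2..v5.3` would give the real number.

```python
def bisect_steps(n_commits):
    """Upper bound on the builds needed to bisect n_commits candidates.

    Each step halves the remaining range, so this is ceil(log2(n)),
    computed with integers to avoid floating-point rounding.
    """
    if n_commits < 1:
        raise ValueError("n_commits must be at least 1")
    return (n_commits - 1).bit_length()
```

For a placeholder range of 15000 commits, `bisect_steps(15000)` returns 14. Restricting the bisection to the nouveau sources (git bisect accepts path arguments after `--`) would shrink the range further, assuming the regression really lives there.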


Btw, I'm looking for potential workarounds (better than sticking to 5.2.18). I don't use the NVidia card to render the output, so I could blacklist nouveau and use bbswitch to keep the card off. However, that would make testing newer kernel versions somewhat harder. Maybe I can disable something in nouveau to keep the card off without suffering from the errors above?
Comment 1 Marcin Zajaczkowski 2019-11-10 00:08:10 UTC
Created attachment 145928 [details]
Logs when I open and later on close a lid
Comment 2 Martin Peres 2019-12-04 09:55:09 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516.

