Bug 96876

Summary: system freeze "fifo: gr engine fault on channel 6" NVIDIA
Product: xorg
Component: Driver/nouveau
Version: unspecified
Hardware: Other
OS: All
Status: NEW ---
Severity: normal
Priority: medium
Reporter: Ehsan Azar <dashesy>
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: bjorn.lie
Whiteboard:
i915 platform:
i915 features:

Attachments:
Description    Flags
acpidump       none
dmesg          none

Description Ehsan Azar 2016-07-09 23:31:24 UTC
Created attachment 124976: acpidump

This looks similar to bug #93629, but it is not related to any SCHED_ERROR. It also happens very often, and Chrome usually triggers it.
I previously had a problem with this graphics card turning off the monitors on startup, but that problem is now fixed in the latest kernel.

$ journalctl --no-pager -b -2 -p err
-- Logs begin at Thu 2016-05-05 05:39:48 PDT, end at Sat 2016-07-09 16:10:13 PDT. --
Jul 04 02:15:01 dashesy.wavelet kernel: nouveau 0000:01:00.0: priv: HUB0: 086014 ffffffff (1f70820c)
Jul 04 09:15:32 dashesy.wavelet mcelog[877]: Family 6 Model 5e CPU: only decoding architectural errors
Jul 04 09:15:33 dashesy.wavelet bluetoothd[873]: Failed to obtain handles for "Service Changed" characteristic
Jul 04 09:16:01 dashesy.wavelet spice-vdagent[1546]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
Jul 04 09:17:36 dashesy.wavelet spice-vdagent[2064]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
Jul 04 12:24:41 dashesy.wavelet kernel: nouveau 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
Jul 09 09:59:48 dashesy.wavelet kernel: nouveau 0000:01:00.0: fifo: read fault at 0000260000 engine 00 [GR] client 01 [GPC0/T1_0] reason 02 [PTE] on channel 6 [007fb61000 systemd-logind[859]]
Jul 09 09:59:48 dashesy.wavelet kernel: nouveau 0000:01:00.0: fifo: gr engine fault on channel 6, recovering...
Jul 09 09:59:48 dashesy.wavelet kernel: nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[859]]
Jul 09 09:59:48 dashesy.wavelet kernel: nouveau 0000:01:00.0: gr: GPC0/TPC0/TEX: 80000049
Jul 09 09:59:48 dashesy.wavelet kernel: nouveau 0000:01:00.0: gr: GPC0/TPC1/TEX: 80000049
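
Editorial note, not part of the original report: the "fifo: ... fault" line above packs several fields into one message (access type, faulting address, engine, client, reason, channel, and the owning process). A minimal Python sketch for pulling those fields out when comparing faults across boots might look like the following. The regular expression is an assumption modeled only on the line quoted in this report, so other kernel versions may format the message differently; `FAULT_RE` and `parse_fault` are hypothetical names.

```python
import re

# Assumption: the pattern mirrors the single "fifo: ... fault" line quoted in
# this report; it is not taken from the nouveau source code.
FAULT_RE = re.compile(
    r"fifo: (?P<access>read|write) fault at (?P<address>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]*)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]*)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]*)\] "
    r"on channel (?P<channel>\d+) "
    r"\[(?P<instance>[0-9a-f]+) (?P<process>.+)\]$"
)


def parse_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = FAULT_RE.search(line.strip())
    return match.groupdict() if match else None


# The fault line from the journal excerpt above.
sample = (
    "Jul 09 09:59:48 dashesy.wavelet kernel: nouveau 0000:01:00.0: "
    "fifo: read fault at 0000260000 engine 00 [GR] client 01 [GPC0/T1_0] "
    "reason 02 [PTE] on channel 6 [007fb61000 systemd-logind[859]]"
)
fault = parse_fault(sample)
print(fault["access"], fault["address"], fault["client"], fault["process"])
```

Running the same function over each boot's error log makes it easier to see whether the faulting address and reason stay constant while the client changes.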


$ lspci -v
01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 730] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 1083
	Flags: bus master, fast devsel, latency 0, IRQ 126
	Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=128M]
	Memory at d8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at df000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

$ uname -a
Linux dashesy.wavelet 4.5.7-202.fc23.x86_64 #1 SMP Tue Jun 28 18:22:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


This is a complete hang, and it happens quite often. I will gladly run any kernel/driver combination to sort this out.
Comment 1 Ilia Mirkin 2016-07-09 23:38:45 UTC
Could you include a full dmesg? Also are you using modesetting or nouveau ddx? If modesetting, please switch to nouveau.

Also, kernel 4.6 received a number of "various" fixes which could affect your situation. Or they might not; unfortunately it's incredibly hard to diagnose, but it's definitely an easy thing to try.
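
Editorial note, not part of the original comment: one way to answer the modesetting-versus-nouveau question is to look at which driver prefix appears in the Xorg log. The sketch below assumes that the nouveau DDX tags its messages with "NOUVEAU(<screen>)" and the modesetting DDX with "modeset(<screen>)". `DDX_MARKERS` and `detect_ddx` are hypothetical names, and the location of the Xorg log varies between distributions, so reading the file is left to the caller.

```python
import re

# Assumption: these per-screen log prefixes identify the active DDX driver.
DDX_MARKERS = {
    "nouveau": re.compile(r"NOUVEAU\(\d+\)"),
    "modesetting": re.compile(r"modeset\(\d+\)"),
}


def detect_ddx(log_text):
    """Return the sorted names of DDX drivers whose log prefix appears in log_text."""
    return sorted(
        name for name, pattern in DDX_MARKERS.items() if pattern.search(log_text)
    )
```

Pass the full text of the Xorg log to `detect_ddx`; an empty list means neither prefix was found and the log should be inspected by hand.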
Comment 2 Ehsan Azar 2016-07-09 23:49:33 UTC
I use vanilla Fedora 23 with nouveau. Thanks, will try 4.6 from rawhide.
Comment 3 Ehsan Azar 2016-07-09 23:52:47 UTC
Created attachment 124977: dmesg
Comment 4 Ehsan Azar 2016-07-23 16:52:59 UTC
I finally have time today to try a new kernel. But I just got this (no freeze, only Chrome broke) and thought it might be helpful.

[  685.293359] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293368] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 128, format = 0, storage type = 0
[  685.293401] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293408] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 136, format = 0, storage type = 0
[  685.293488] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293494] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 144, format = 0, storage type = 0
[  685.293501] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293506] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 152, format = 0, storage type = 0
[  685.293521] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293526] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 160, format = 0, storage type = 0
[  685.293540] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293545] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 168, format = 0, storage type = 0
[  685.293557] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293563] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 176, format = 0, storage type = 0
[  685.293575] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293581] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 184, format = 0, storage type = 0
[  685.293594] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293620] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 192, format = 0, storage type = 0
[  685.293638] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293645] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 208, format = 0, storage type = 0
[  685.293660] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293666] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 216, format = 0, storage type = 0
[  685.293701] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293707] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 224, format = 0, storage type = 0
[  685.293720] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293726] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 232, format = 0, storage type = 0
[  685.293735] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293741] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 240, format = 0, storage type = 0
[  685.293753] nouveau 0000:01:00.0: gr: TRAP ch 6 [007fb61000 systemd-logind[904]]
[  685.293758] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000200 [] x = 896, y = 248, format = 0, storage type = 0
[127998.748786] perf interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Comment 5 Ehsan Azar 2016-07-23 17:31:08 UTC
Ok I am now running:

$ uname -a
Linux dashesy.wavelet 4.6.4-301.fc24.x86_64 #1 SMP Tue Jul 12 11:50:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Is this recent enough?
I am waiting to see if the lockup happens again, but I am afraid I still see the same kind of precursor:

$ journalctl --no-pager -b -p err
-- Logs begin at Thu 2016-05-05 05:39:48 PDT, end at Sat 2016-07-23 10:26:44 PDT. --
Jul 23 03:23:01 dashesy.wavelet kernel: nouveau 0000:01:00.0: priv: HUB0: 085014 ffffffff (1a70820b)
Jul 23 10:23:12 dashesy.wavelet kernel: tpm_crb MSFT0101:00: can't request region for resource [mem 0xfed40040-0xfed4103f]
Jul 23 10:23:13 dashesy.wavelet mcelog[865]: Family 6 Model 5e CPU: only decoding architectural errors
Jul 23 10:23:15 dashesy.wavelet bluetoothd[869]: Failed to obtain handles for "Service Changed" characteristic
Jul 23 10:23:41 dashesy.wavelet spice-vdagent[1493]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
Jul 23 10:23:56 dashesy.wavelet spice-vdagent[1985]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
Jul 23 10:23:58 dashesy.wavelet pulseaudio[2107]: [pulseaudio] pid.c: Daemon already running.
Comment 6 Ehsan Azar 2016-07-24 15:28:35 UTC
The same lockup happens with 4.6.4. Should I try a newer kernel?


Jul 23 21:50:16 dashesy.wavelet kernel: nouveau 0000:01:00.0: fifo: write fault at 0000260000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 02 [PTE] on channel 6 [007fb71000 systemd-logind[867]]
Jul 23 21:50:16 dashesy.wavelet kernel: nouveau 0000:01:00.0: fifo: gr engine fault on channel 6, recovering...
