Bug 92922

Summary:	[NVE6] Xorg random freeze
Product:	xorg	Reporter:	Antonios Karagiannis <antokarag>
Component:	Driver/nouveau	Assignee:	Nouveau Project <nouveau>
Status:	RESOLVED MOVED	QA Contact:	Xorg Project Team <xorg-team>
Severity:	major
Priority:	medium	CC:	antokarag, davispuh, fdsfgs, nheghathivhistha, pytnik89
Version:	unspecified
Hardware:	x86-64 (AMD64)
OS:	Linux (All)

Description Antonios Karagiannis 2015-11-12 18:01:37 UTC

On Fedora (Gnome) Workstation 23, with the following:
xorg-x11-server-Xorg.x86_64   1.18.0-1.fc23
xorg-x11-drv-nouveau.x86_64   1:1.0.12-0.3.fc23
and
NVIDIA Corporation GK106 [GeForce GTX 660] (rev a1)

The screen freezes at random times. I suspect Google Chrome may be causing it,
or at least making it happen more easily.

I intentionally enabled seconds in the GNOME clock to see whether they keep changing during the freeze, but every time they freeze as well.

The mouse moves and usually, but not always, the keyboard works as well (the Caps Lock light toggles, etc.). A few times I could even switch to a text terminal session.

From the system journal I get:
kernel: nouveau E[  PGRAPH][0000:02:00.0] TRAP ch 9 [0x023f566000 gnome-shell[2271]]
kernel: nouveau E[  PGRAPH][0000:02:00.0] GPC0/PROP trap: ZETA_STORAGE_TYPE_MISMATCH
kernel: nouveau E[  PGRAPH][0000:02:00.0] x = 968, y = 136, format = 0, storage type = fe
kernel: nouveau E[  PGRAPH][0000:02:00.0] GPC1/PROP trap: ZETA_STORAGE_TYPE_MISMATCH
kernel: nouveau E[  PGRAPH][0000:02:00.0] x = 1016, y = 104, format = 0, storage type = fe
kernel: nouveau E[   PFIFO][0000:02:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
kernel: nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 10, recovering...

or just:
kernel: nouveau E[   PFIFO][0000:02:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
kernel: nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 10, recovering...

and this gets repeated a lot:
kernel: nouveau E[  PGRAPH][0000:02:00.0] TRAP ch 9 [0x023f566000 gnome-shell[2224]]
kernel: nouveau E[  PGRAPH][0000:02:00.0] GPC1/PROP trap: ZETA_STORAGE_TYPE_MISMATCH
kernel: nouveau E[  PGRAPH][0000:02:00.0] x = 576, y = 512, format = 0, storage type = fe

I also tried a GNOME Wayland session and it happens there too:
kernel: nouveau E[  PGRAPH][0000:02:00.0] TRAP ch 10 [0x023f315000 Xwayland[5816]]
kernel: nouveau E[  PGRAPH][0000:02:00.0] GPC2/TPC0/MP trap: INVALID_OPCODE


Please keep in mind that the GPU itself is OK;
I have been using it with Windows without any issues.

I plan to find another PC and try to SSH in during the freeze. Once I do, I will post an update. Until then, the above log may be useful to you.
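
For anyone who wants to pull the coordinates and storage type out of a large journal, here is a minimal sketch of a parser for the "x = ..., y = ..., format = ..., storage type = ..." part of the PROP trap lines above. The struct and function names are hypothetical, and it assumes that format and storage type are printed in hexadecimal (the "fe" value suggests this for storage type).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields reported by a ZETA_STORAGE_TYPE_MISMATCH trap line (hypothetical struct). */
struct prop_trap {
    unsigned x;
    unsigned y;
    unsigned format;       /* assumed to be printed in hex */
    unsigned storage_type; /* printed in hex, e.g. "fe" */
};

/* Returns 1 if all four fields were found on the line, 0 otherwise. */
int parse_prop_trap(const char *line, struct prop_trap *out)
{
    assert(line != NULL && out != NULL);

    /* Skip the journal/kernel prefix and start at the first "x = ". */
    const char *p = strstr(line, "x = ");
    if (p == NULL)
        return 0;

    return sscanf(p, "x = %u, y = %u, format = %x, storage type = %x",
                  &out->x, &out->y, &out->format, &out->storage_type) == 4;
}
```

Lines that are cut off before the last field, or that carry no coordinates at all, are reported as not parsed rather than being filled in with guesses.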

Comment 1 Ilia Mirkin 2015-11-12 18:09:59 UTC

Actually the version of mesa is the most relevant thing here. Looks like we're somehow feeding a color surface to a zeta endpoint? That shouldn't happen :) I'd believe the opposite... but I don't see how this happens.

[ zeta = depth/stencil buffer. storage type = 0xfe == color storage type fallback ]
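
As a rough illustration of the note above, the check can be written as a standalone C helper. The 0xd0 threshold is borrowed from the assert in the debug patch posted later in this thread. The macro and function names are hypothetical, and treating the trap's storage type and the buffer's memtype as the same kind of value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name. The value comes from the debug patch in this thread,
 * which asserts that the zeta buffer's memtype is below 0xd0. */
#define ZETA_MEMTYPE_LIMIT 0xd0u

/* Returns true when a memtype would pass the patch's zeta assertion. */
bool memtype_passes_zeta_check(uint32_t memtype)
{
    return memtype < ZETA_MEMTYPE_LIMIT;
}
```

With the value from the original report, memtype_passes_zeta_check(0xfe) is false, which matches the reading of 0xfe as a color storage type. A value of 0, as seen in the comment 5 log, would pass this check, so the threshold is only a coarse filter.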

Unfortunate that you're getting this with gnome-shell... would be nice if you could repro it with something that's easier to trace. If you're open to building your own mesa lib, I could hack up a patch that would assert that we're sticking zeta surfaces into the zeta endpoint, which would make any offenders die at the point of offence.

The INVALID_OPCODE thing is unrelated -- that sounds like good ol' bad code generated by our shader compiler, or perhaps some sort of confusion in which code gets uploaded where. Not entirely unheard of, but haven't had one of those in a while.

Comment 2 Antonios Karagiannis 2015-11-12 18:25:49 UTC

Installed Packages
mesa-dri-drivers.x86_64      11.0.4-1.20151105.fc23
mesa-filesystem.x86_64       11.0.4-1.20151105.fc23
mesa-libEGL.i686             11.0.4-1.20151105.fc23
mesa-libEGL.x86_64           11.0.4-1.20151105.fc23
mesa-libGL.i686              11.0.4-1.20151105.fc23
mesa-libGL.x86_64            11.0.4-1.20151105.fc23
mesa-libGLES.x86_64          11.0.4-1.20151105.fc23
mesa-libGLU.x86_64           9.0.0-9.fc23
mesa-libgbm.i686             11.0.4-1.20151105.fc23
mesa-libgbm.x86_64           11.0.4-1.20151105.fc23
mesa-libglapi.i686           11.0.4-1.20151105.fc23
mesa-libglapi.x86_64         11.0.4-1.20151105.fc23
mesa-libwayland-egl.x86_64   11.0.4-1.20151105.fc23
mesa-libxatracker.x86_64     11.0.4-1.20151105.fc23

I hope this clarifies the mesa versions installed.

Now as for building the custom mesa lib, I haven't done that before.
If it's just ./configure && make && make install, I just might give it a try ;-)

Comment 3 Ilia Mirkin 2015-11-12 18:45:55 UTC

You can try the patch below and see if it ever triggers. Make sure to build with --enable-debug (otherwise you won't get the assert). You'll also probably want --enable-texture-float (I'd check how your distro builds it and do the same thing).

Of course note that if gnome-shell is hitting this... the assert will happen in gnome-session and it will crash :) If you could run it in gdb and poke around at things when it does crash that'd be useful.

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
index 205e7dc..d7ee0763 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
@@ -150,6 +150,7 @@ nvc0_validate_fb(struct nvc0_context *nvc0)
         struct nv50_surface *sf = nv50_surface(fb->zsbuf);
         int unk = mt->base.base.target == PIPE_TEXTURE_2D;
 
+        assert(nouveau_bo_memtype(nv04_resource(fb->zsbuf->texture)->bo) < 0xd0);
         BEGIN_NVC0(push, NVC0_3D(ZETA_ADDRESS_HIGH), 5);
         PUSH_DATAh(push, mt->base.address + sf->offset);
         PUSH_DATA (push, mt->base.address + sf->offset);
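
The note about --enable-debug comes down to how the C assert() macro behaves: when NDEBUG is defined, assert() expands to nothing and its expression is never evaluated. The sketch below demonstrates that effect in isolation. It assumes that a non-debug Mesa build compiles with NDEBUG defined, which has not been checked here against any distro's build flags, and the function names are hypothetical.

```c
#define NDEBUG            /* what a release build typically passes as -DNDEBUG */
#include <assert.h>

/* With NDEBUG defined, assert() is a no-op, so the increment never runs. */
int asserts_evaluated_release(void)
{
    int evaluated = 0;
    assert(++evaluated);
    return evaluated;
}

#undef NDEBUG
#include <assert.h>       /* re-including <assert.h> re-enables assert() */

/* Without NDEBUG, the expression is evaluated and checked at run time. */
int asserts_evaluated_debug(void)
{
    int evaluated = 0;
    assert(++evaluated);
    return evaluated;
}
```

The first function returns 0 and the second returns 1, which is why the added assert only fires in a build where assertions are enabled.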

Comment 4 Julien HENRY 2016-04-05 07:56:32 UTC

It seems I have a similar issue but on kwin. I was also scrolling in Chrome when the problem occurred (screen was frozen, I had to reboot):

Logs at the time of crash were filled with tons of:
kernel: nouveau 0000:01:00.0: gr: TRAP ch 8 [007f439000 kwin_x11[1624]]
avril 05 09:39:21 localhost.localdomain kernel: nouveau 0000:01:00.0: gr: GPC1/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 1936, y = 1392, format =
kernel: nouveau 0000:01:00.0: gr: TRAP ch 8 [007f439000 kwin_x11[1624]]

(journal file size is 160Mb)

mesa-dri-drivers-11.1.0-2.20151218.fc23.x86_64

Comment 5 Dāvis 2016-05-20 16:39:39 UTC

I might be hitting this same bug with Nvidia GTX 650 Ti on Arch Linux using released kernel 4.6, xf86-video-nouveau 1.0.12 and mesa 11.2.2

It happens when using GNOME Shell and Chromium:

[ 3589.722408] nouveau 0000:01:00.0: fifo: FB_FLUSH_TIMEOUT
[ 3672.824102] nouveau 0000:01:00.0: fifo: read fault at 0002708000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 02 [PTE] on channel 5 [003f8aa000 gnome-shell[4706]]
[ 3672.824111] nouveau 0000:01:00.0: fifo: gr engine fault on channel 5, recovering...
[ 3672.824126] nouveau 0000:01:00.0: gr: TRAP ch 5 [003f8aa000 gnome-shell[4706]]
[ 3672.824139] nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 64, y = 56, format = 0, storage type = 0
[ 3672.824151] nouveau 0000:01:00.0: gr: GPC1/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 48, y = 56, format = 0, storage type = 0


Maybe it's related to bug #93629, which also happens sometimes.

Comment 6 Victor Zhavoronkov 2016-07-22 08:22:00 UTC

It seems that I am hitting the same bug while using the Opera browser on my Fedora 24 Workstation (GNOME) with an NVidia GTX 650 Ti. The desktop freezes completely; keyboard and mouse do not respond.

Part of journald log:

Jul 22 10:58:15 localhost.localdomain kernel: nouveau 0000:01:00.0: gr: TRAP ch 9 [003f548000 gnome-shell[1489]]
Jul 22 10:58:15 localhost.localdomain kernel: nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 128, y = 64, format = 0, storage type = fe
Jul 22 10:58:15 localhost.localdomain kernel: nouveau 0000:01:00.0: gr: GPC1/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 80, y = 64, format = 0, storage type = fe

Package versions:

kernel - 4.6.3
mesa - 12.0.1
xorg-x11-drv-nouveau - 1.0.12

Comment 7 David Kredba 2017-01-18 19:24:00 UTC

Is this related please?

[44058.076846] nouveau 0000:01:00.0: fb: trapped write at 0041c27000 on channel 9 [0f1d4000 vlc[14761]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]
[44058.096561] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000040 [RT_FAULT] - Address 0041c27000
[44058.096564] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00000000, e20: 00002e00, e24: 00030000
[44058.096574] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00000040 [RT_FAULT] - Address 0041c27000
[44058.096576] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 00000010, e20: 00002e00, e24: 00030000
[44058.096584] nouveau 0000:01:00.0: gr: 00200000 [] ch 9 [000f1d4000 vlc[14761]] subc 3 class 8297 mthd 1224 data 00000001
[44058.096600] nouveau 0000:01:00.0: fb: trapped write at 0041c27000 on channel 9 [0f1d4000 vlc[14761]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]

I am getting random freezes in different web browsers. During a freeze, killing various KDE Plasma processes is possible, but killing xorg-server afterwards over SSH is not. I will try the assertion patch.

After switching VLC to a non-accelerated video output, I got a bunch of the above messages.

media-libs/mesa-13.0.3
xf86-video-nouveau-1.0.13

mpv never caused the same type of crash; the profile chosen is opengl-hq.

Comment 8 Ivan 2017-07-14 08:15:36 UTC

On Arch 4.11.9-1-ARCH x86_64 with the following:
xorg-server 1.19.3-2
xf86-video-nouveau 1.0.15-1 
(the versions don't really matter, because I have had this issue for a long time)
GK106 [GeForce GTX 650 Ti]
It happens at random times, but I suspect the skypeforlinux-bin package (sometimes it core dumps without producing any log in the journal, sometimes it produces logs like these):

nouveau 0000:01:00.0: fifo: gr engine fault on channel 6, recovering...
nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:01:00.0: gr: GPC2/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp
nouveau 0000:01:00.0: gr: GPC1/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp
nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp
nouveau 0000:01:00.0: gr: TRAP ch 5 [003f8aa000 compton[470]]

nouveau 0000:01:00.0: compton[22165]: channel 5 killed!
nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 40912c [ IBUS ]
nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: channel 5: killed
nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:01:00.0: gr: GPC1/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 848,
nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 832,
nouveau 0000:01:00.0: gr: TRAP ch 5 [003f8aa000 compton[22165]]

I think these things are related to this bug.

Comment 9 Martin Peres 2019-12-04 09:06:24 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/234.
