Xorg hangs randomly with the nouveau driver. It can sometimes be reproduced by playing video or starting LibreOffice, but it is not limited to those cases. If I press Ctrl+Alt+Backspace, the monitor goes to sleep immediately. Alt+SysRq combinations usually still work, and so does ssh.
System journal contains:
kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 16 [soffice.bin] subc 5 mthd 0184 data beef0201
... and many similar lines with changing numbers after subc, mthd and data.
/usr/libexec/gdm-x-session: QXcbConnection: XCB error: 3 (BadWindow), sequence: 55765, resource id: 100663298, major code: 18 (ChangeProperty), minor code: 0
kernel: nouveau 0000:01:00.0: gr: TRAP_CCACHE 00000001 [FAULT]
kernel: nouveau 0000:01:00.0: gr: TRAP_CCACHE 000e0080 00000000 00000000 00000000 00000000 00000004 00000000
kernel: nouveau 0000:01:00.0: gr: 00200000  ch 16 [001eb0f000 soffice.bin] subc 3 class 8597 mthd 13bc data 00000054
kernel: nouveau 0000:01:00.0: fb: trapped read at 002027ff00 on channel 16 [1eb0f000 soffice.bin] engine 00 [PGRAPH] client 05 [CCACHE] subclient 00 [CB] reason 00.......
kernel: nouveau 0000:01:00.0: gr: PGRAPH TLB flush idle timeout fail
kernel: nouveau 0000:01:00.0: gr: PGRAPH_STATUS 00000503 [BUSY DISPATCH CTXPROG CCACHE_PREGEOM]
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS0: 00000008 [CCACHE]
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS1: 00000000 
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS2: 00000000 
(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE) 0: /usr/libexec/Xorg (mieqEnqueue+0x253) [0x578753]
(EE) 1: /usr/libexec/Xorg (QueuePointerEvents+0x52) [0x44f352]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x30eb) [0x7f1f83f13dfb]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x3855) [0x7f1f83f15035]
(EE) 4: /usr/libexec/Xorg (DPMSSupported+0xe8) [0x4769c8]
(EE) 5: /usr/libexec/Xorg (xf86SerialModemClearBits+0x2b2) [0x49fe62]
(EE) 6: /lib64/libc.so.6 (__restore_rt+0x0) [0x7f1f8df6fb1f]
(EE) 7: /lib64/libc.so.6 (ioctl+0x5) [0x7f1f8e033705]
(EE) 8: /lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f1f8f32f508]
(EE) 9: /lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f1f8f33208b]
(EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0xbc) [0x7f1f88a2637c]
(EE) 11: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x75f9) [0x7f1f88c3ed19]
(EE) 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x801d) [0x7f1f88c400ed]
(EE) 13: /usr/libexec/Xorg (DRI2SwapBuffers+0x1c8) [0x569268]
(EE) 14: /usr/libexec/Xorg (DRI2GetParam+0xb7c) [0x56ae0c]
(EE) 15: /usr/libexec/Xorg (SendErrorToClient+0x2df) [0x4369bf]
(EE) 16: /usr/libexec/Xorg (remove_fs_handlers+0x453) [0x43a9e3]
(EE) 17: /lib64/libc.so.6 (__libc_start_main+0xf0) [0x7f1f8df5b580]
(EE) 18: /usr/libexec/Xorg (_start+0x29) [0x424ce9]
(EE) 19: ? (?+0x29) [0x29]
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
(EE) [mi] EQ overflow continuing. 100 events have been dropped.
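For anyone collecting similar logs, the CACHE_ERROR lines can be tallied per channel and process to see which client was active when the hang started. The following is a minimal sketch, not part of the original report: the regular expression is modelled only on the single CACHE_ERROR line quoted above, and the function name `summarize_cache_errors` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the CACHE_ERROR line quoted in this report (assumption:
# other kernel versions may format the message differently).
CACHE_ERROR_RE = re.compile(
    r"nouveau (?P<pci>[0-9a-f:.]+): fifo: CACHE_ERROR - "
    r"ch (?P<ch>\d+) \[(?P<proc>[^\]]+)\] "
    r"subc (?P<subc>\d+) mthd (?P<mthd>[0-9a-f]+) data (?P<data>[0-9a-f]+)"
)


def summarize_cache_errors(lines):
    """Count CACHE_ERROR messages per (channel, process) pair."""
    counts = Counter()
    for line in lines:
        match = CACHE_ERROR_RE.search(line)
        if match:
            counts[(int(match.group("ch")), match.group("proc"))] += 1
    return counts


# Two lines copied from the journal excerpt above; only the first is a
# CACHE_ERROR and should be counted.
sample = [
    "kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 16 [soffice.bin] "
    "subc 5 mthd 0184 data beef0201",
    "kernel: nouveau 0000:01:00.0: gr: TRAP_CCACHE 00000001 [FAULT]",
]
summary = summarize_cache_errors(sample)
for (channel, process), count in summary.items():
    print(f"ch {channel} [{process}]: {count} CACHE_ERROR line(s)")
```

The same function can be fed the lines of a saved journal or dmesg dump instead of the inline sample.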
What hardware? What kernel version? What mesa version?
Just forgot to mention, I am running xorg-x11-drv-nouveau-1.0.12-1.fc23.x86_64
version (latest from fedora 23 repository).
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. EN210 SILENT
Flags: bus master, fast devsel, latency 0, IRQ 45
Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at e000 [size=128]
Expansion ROM at fe000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nouveau
Kernel modules: nouveau
Linux marek.grepo.lan 4.3.3-301.fc23.x86_64 #1 SMP Fri Jan 15 14:03:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
The problem did appear after a mesa or kernel update, not after an Xorg update. I am not sure which, but I think mesa is more likely. The kernel was updated on Jan 12th (to 4.3.3-300, which I was running when the problem first appeared) and mesa on Jan 16th. I am sure I did not have this problem before Jan 12th. I am not sure whether I had it between Jan 12th and Jan 16th.
I think I hit exactly the same crash today on a Slackware 14.2 x86 system.
Chromium was loading a page when the whole of X froze. I was able to SSH into the system, but only a restart really fixed it.
OS info :
kernel : 4.4.14 #2 SMP Fri Jun 24 13:38:27 CDT 2016 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux
GPU: 01:00.0 VGA compatible controller: NVIDIA Corporation G84GL [Quadro FX 370] (rev a1)
Packet / library info & versions
I attach some xorg log and kernel log
Created attachment 126000 [details]
xorg-log after crash
Created attachment 126001 [details]
Kernel log when crash
I got the same problem here with randomly occurring freezes (only mouse pointer can be moved but I can still ssh into it).
- Debian Jessie
- newest backports kernel (which is 4.7.2 currently, it also happened with 4.6*)
- Dual monitor setup with 8800 GTS 320MB
Logs will be attached from dmesg and Xorg.0.log
Created attachment 126923 [details]
dmesg output for Debian Jessie + 4.7.2
Created attachment 126924 [details]
Xorg.0.log for Debian Jessie + 4.7.2
I've got the same issue with kernel 4.4, but not with 4.1.15, when the screensaver starts up.
How can I help? With a trace or log file?
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux] subc 6 mthd 01c8 data beef0201
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux] subc 6 mthd 01c4 data beef0201
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux] subc 6 mthd 01c0 data beef0201
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux] subc 6 mthd 01b8 data beef0201
With a newer kernel I get new messages, in case that helps.
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux] get 0000000000 put 0000000000 ib_get 00000000 ib_put 00000002 state c0000000 (err: MEM_FAULT) push 00400040
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010000 on channel 13 [3eebf000 flux] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux] get 0000000000 put 0000000000 ib_get 00000002 ib_put 00000004 state c0000000 (err: MEM_FAULT) push 00400040
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010010 on channel 13 [3eebf000 flux] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux] get 0000000000 put 0000000000 ib_get 00000004 ib_put 00000007 state c0000000 (err: MEM_FAULT) push 00400040
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010020 on channel 13 [3eebf000 flux] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
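The "fb: trapped read" lines carry the faulting address, channel, engine, client and reason in a fixed order, so they can be split into fields for comparison across reports. This is a rough sketch based only on the lines quoted in this thread: the pattern is an assumption about the message layout, and `parse_trap` is a hypothetical helper name.

```python
import re

# Layout inferred from the "fb: trapped read" lines quoted above (assumption:
# the kernel may print other variants that this pattern does not cover).
TRAP_RE = re.compile(
    r"fb: trapped (?P<op>\w+) at (?P<addr>[0-9a-f]+) on channel (?P<ch>\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>[^\]]+)\] "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]*)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]*)\] "
    r"subclient (?P<subclient_id>[0-9a-f]+) \[(?P<subclient>[^\]]*)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]*)\]"
)


def parse_trap(line):
    """Return the fields of one trapped-access line as a dict, or None."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["addr"] = int(fields["addr"], 16)  # faulting address as an integer
    fields["ch"] = int(fields["ch"])
    return fields


# Line copied from the log excerpt above.
line = (
    "Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at "
    "0020010000 on channel 13 [3eebf000 flux] engine 05 [PFIFO] client 08 "
    "[PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]"
)
trap = parse_trap(line)
print(hex(trap["addr"]), trap["proc"], trap["engine"], trap["reason"])
```

Lines that do not fit the pattern, such as ones with a truncated reason field, simply return None.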
Created attachment 127928 [details]
I attached a kernel trace which may be related. I got this when:
1. Upgraded Fedora 24 to Fedora 25.
2. Disabled wayland for gdm.
3. Created script with export QSG_RENDER_LOOP=basic in profile.d.
4. Logged in as a first user to the kde session.
5. Pressed Ctrl+Alt+F1 to get gdm login screen.
6. Logged in as a second user to the kde session.
7. Both kde sessions were stuck, but unlike before I was able to kill both sessions with Ctrl+Alt+Backspace (maybe because of the earlier steps, which I had not used before).
8. After shutting down the second session I got the kernel trace attached above.
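To try the QSG_RENDER_LOOP=basic setting from step 3 on a single program rather than for every login via profile.d, the variable can be set for one child process only. The snippet below is just a sketch: `run_with_basic_render_loop` is a made-up helper, and the child process is a stand-in that echoes the variable instead of a real Qt Quick application.

```python
import os
import subprocess
import sys


def run_with_basic_render_loop(argv):
    """Run argv with QSG_RENDER_LOOP=basic added to a copy of the environment."""
    env = dict(os.environ, QSG_RENDER_LOOP="basic")
    return subprocess.run(argv, env=env, capture_output=True, text=True)


# Stand-in child process that only prints the variable it received; in
# practice argv would name the Qt Quick application to test.
result = run_with_basic_render_loop(
    [sys.executable, "-c", "import os; print(os.environ['QSG_RENDER_LOOP'])"]
)
print(result.stdout.strip())
```

The parent environment is left untouched, so other programs started from the same session keep their default render loop.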
Created attachment 128304 [details]
Extraction of crash info when using modesetting driver
I tried switching to the modesetting driver, but my X sessions still crash. I have just attached the crash info.
QSG_RENDER_LOOP=basic was also applied.
Is there any workaround available to avoid crashes? I do not need any 3D or anything, just stable 2D desktop.
I've got the same kind of issue.
Hang when starting up some software.
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2) (a pretty old card).
nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 12 [gwenview] subc 3 mthd 01a8 data beef0201
(only X/keyboard hang... ssh still working).
I cannot switch to a console; the console does not work with nouveau without fb.
(I had no more issues after recompiling xscreensaver/flux.)
I have now had to replace my graphics card because the old one died. So could these problems arise from memory corruption in a dying card? I now have a GTX 550ti, which has not shown these error messages yet.
I worked around this issue by using the nvidia 340 drivers. But after the upgrade to Fedora 25 these drivers were not available for a rather long period. Since nothing had been done on this bug for almost a year, I tried replacing the hardware.
First I tried a GT730. I ended up with the same behaviour: intermittent GUI lockups where only the mouse cursor moved. Only the logs were different. If I remember correctly they were similar to bug 93629, which has also been open for almost a year. I suspect these bugs have something in common and that the logs merely differ between hardware. So I returned the GT730, sold the GT210, and bought an AMD Radeon 6450, and the desktop is rock solid now.
I am not closing the bug because other people have the same problem, but once it is solved, please do not wait for my confirmation. I no longer have the nvidia hardware to test.
Thanks for your interest and for the hard work without specs from the vendor.
The nouveau driver hangs with mythfrontend. My system is a desktop with Intel i7 onboard graphics running KDE Plasma, plus a separate user (seat) running mythfrontend directly on X (not KDE) using an nvidia GT240. This allows me to use my desktop system while the TV in an adjacent room is running mythtv. When the nouveau driver hangs it has no effect on the desktop system, but the last frame or menu remains static on the TV even after restarting the mythfrontend application.
I can reliably cause the hang by stopping the video player using the remote control and then trying to stop it again before the first stop completes. I see the last frame of video and the remote no longer works. Restarting the mythfrontend application, as root on the desktop, blanks the screen, then the last frame of video is shown on the TV even though the application has not started a player. Restarting should display a menu, but once hung the picture does not change.
I suspect that some internal registers or kernel data structures have not finished being cleaned up, and the second stop prevents this from completing.