Bug 93828

Summary: Xorg hangs randomly with nouveau driver
Product: xorg
Reporter: Marek Greško <mgresko8>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: medium
CC: aebenjam, dontmind, fdsfgs, fourdan, jan.public, pachoramos1, ss.c
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                                              Flags
xorg-log after crash                                     none
Kernel log when crash                                    none
dmesg output for Debian Jessie + 4.7.2                   none
Xorg.0.log for Debian Jessie + 4.7.2                     none
Kernel trace                                             none
Extraction of crash info when using modesetting driver   none

Description Marek Greško 2016-01-22 21:10:25 UTC
Xorg hangs randomly with the nouveau driver. It can sometimes be reproduced when playing video or starting LibreOffice, but it is not limited to those cases. When I press Ctrl+Alt+Backspace, the monitor goes to sleep immediately. Alt+SysRq combinations usually still work, and so does ssh.

System journal contains:
 kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 16 [soffice.bin[30009]] subc 5 mthd 0184 data beef0201

... and many similar lines with changing numbers after subc, mthd and data.
Followed by:
/usr/libexec/gdm-x-session[11881]: QXcbConnection: XCB error: 3 (BadWindow), sequence: 55765, resource id: 100663298, major code: 18 (ChangeProperty), minor code: 0

and
kernel: nouveau 0000:01:00.0: gr: TRAP_CCACHE 00000001 [FAULT]
kernel: nouveau 0000:01:00.0: gr: TRAP_CCACHE 000e0080 00000000 00000000 00000000 00000000 00000004 00000000
kernel: nouveau 0000:01:00.0: gr: 00200000 [] ch 16 [001eb0f000 soffice.bin[30009]] subc 3 class 8597 mthd 13bc data 00000054
kernel: nouveau 0000:01:00.0: fb: trapped read at 002027ff00 on channel 16 [1eb0f000 soffice.bin[30009]] engine 00 [PGRAPH] client 05 [CCACHE] subclient 00 [CB] reason 00.......
kernel: nouveau 0000:01:00.0: gr: PGRAPH TLB flush idle timeout fail
kernel: nouveau 0000:01:00.0: gr: PGRAPH_STATUS 00000503 [BUSY DISPATCH CTXPROG CCACHE_PREGEOM]
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS0: 00000008 [CCACHE]
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS1: 00000000 []
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS2: 00000000 []
(EE) [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
(EE) 
(EE) Backtrace:
(EE) 0: /usr/libexec/Xorg (mieqEnqueue+0x253) [0x578753]
(EE) 1: /usr/libexec/Xorg (QueuePointerEvents+0x52) [0x44f352]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x30eb) [0x7f1f83f13dfb]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x3855) [0x7f1f83f15035]
(EE) 4: /usr/libexec/Xorg (DPMSSupported+0xe8) [0x4769c8]
(EE) 5: /usr/libexec/Xorg (xf86SerialModemClearBits+0x2b2) [0x49fe62]
(EE) 6: /lib64/libc.so.6 (__restore_rt+0x0) [0x7f1f8df6fb1f]
(EE) 7: /lib64/libc.so.6 (ioctl+0x5) [0x7f1f8e033705]
(EE) 8: /lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f1f8f32f508]
(EE) 9: /lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f1f8f33208b]
(EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0xbc) [0x7f1f88a2637c]
(EE) 11: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x75f9) [0x7f1f88c3ed19]
(EE) 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x801d) [0x7f1f88c400ed]
(EE) 13: /usr/libexec/Xorg (DRI2SwapBuffers+0x1c8) [0x569268]
(EE) 14: /usr/libexec/Xorg (DRI2GetParam+0xb7c) [0x56ae0c]
(EE) 15: /usr/libexec/Xorg (SendErrorToClient+0x2df) [0x4369bf]
(EE) 16: /usr/libexec/Xorg (remove_fs_handlers+0x453) [0x43a9e3]
(EE) 17: /lib64/libc.so.6 (__libc_start_main+0xf0) [0x7f1f8df5b580]
(EE) 18: /usr/libexec/Xorg (_start+0x29) [0x424ce9]
(EE) 19: ? (?+0x29) [0x29]
(EE) 
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause.  It is a victim.
(EE) [mi] EQ overflow continuing.  100 events have been dropped.
(EE)
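
For anyone who wants to see which processes and channels the CACHE_ERROR lines quoted above come from, here is a minimal Python sketch that parses them. It is based only on the line format shown in this report; the function names `parse_cache_error` and `summarize` are hypothetical, and treating `ch` and `subc` as decimal and `mthd` and `data` as hexadecimal is an assumption drawn from the sample values.

```python
import re
from collections import Counter

# Matches nouveau "fifo: CACHE_ERROR" lines in the form quoted in this report.
CACHE_ERROR_RE = re.compile(
    r"nouveau (?P<pci>\S+): fifo: CACHE_ERROR - "
    r"ch (?P<ch>\d+) \[(?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\] "
    r"subc (?P<subc>\d+) mthd (?P<mthd>[0-9a-f]+) data (?P<data>[0-9a-f]+)"
)


def parse_cache_error(line):
    """Return the fields of one CACHE_ERROR line as a dict, or None if no match."""
    m = CACHE_ERROR_RE.search(line)
    if m is None:
        return None
    return {
        "pci": m.group("pci"),
        "channel": int(m.group("ch")),    # assumed decimal
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "subc": int(m.group("subc")),     # assumed decimal
        "mthd": int(m.group("mthd"), 16),  # assumed hexadecimal
        "data": int(m.group("data"), 16),  # assumed hexadecimal
    }


def summarize(lines):
    """Count CACHE_ERROR lines per (process, channel, subc) triple."""
    counts = Counter()
    for line in lines:
        rec = parse_cache_error(line)
        if rec is not None:
            counts[(rec["process"], rec["channel"], rec["subc"])] += 1
    return counts
```

Feeding it the saved journal text line by line (for example from a file) gives a quick tally of which client was active when the errors started.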
Comment 1 Ilia Mirkin 2016-01-22 21:12:26 UTC
What hardware? What kernel version? What mesa version?
Comment 2 Marek Greško 2016-01-22 21:19:48 UTC
I forgot to mention: I am running xorg-x11-drv-nouveau-1.0.12-1.fc23.x86_64 (the latest version from the Fedora 23 repository).
Comment 3 Marek Greško 2016-01-24 07:53:52 UTC
Hardware:
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. EN210 SILENT
        Flags: bus master, fast devsel, latency 0, IRQ 45
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at fe000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nouveau


Kernel:
Linux marek.grepo.lan 4.3.3-301.fc23.x86_64 #1 SMP Fri Jan 15 14:03:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Mesa:
mesa-dri-drivers-11.1.0-2.20151218.fc23.i686


The problem arose after a mesa or kernel update, not after an Xorg update. I am not sure, but I consider mesa the more likely cause. The kernel was updated on Jan 12th (to 4.3.3-300, which I was running when the problem first appeared) and mesa on Jan 16th. I am sure I did not have this problem before Jan 12th. I am not sure whether I had it between Jan 12th and Jan 16th.
Comment 4 joni 2016-08-24 13:01:54 UTC
I think I hit exactly the same crash today on a Slackware 14.2 x86 system.

Chromium was loading a page when all of X froze. I was able to SSH into the system, but only a restart really fixed it.

OS info :
Slackware 14.2

kernel : 4.4.14 #2 SMP Fri Jun 24 13:38:27 CDT 2016 x86_64 Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz GenuineIntel GNU/Linux

GPU: 01:00.0 VGA compatible controller: NVIDIA Corporation G84GL [Quadro FX 370] (rev a1)

Package / library info and versions:
kernel-huge-4.4.14-x86_64-1
mesa-11.2.2-x86_64-1
xf86-video-nouveau-1.0.12-x86_64-1
xorg-server-1.18.3-x86_64-2

I am attaching an Xorg log and a kernel log.
Comment 5 joni 2016-08-24 13:02:57 UTC
Created attachment 126000 [details]
xorg-log after crash
Comment 6 joni 2016-08-24 13:03:22 UTC
Created attachment 126001 [details]
Kernel log when crash
Comment 7 steph 2016-09-30 20:53:00 UTC
I have the same problem here: randomly occurring freezes (only the mouse pointer can be moved, but I can still ssh into the machine).

Environment:
- Debian Jessie
- newest backports kernel (which is 4.7.2 currently, it also happened with 4.6*)
- Dual monitor setup with 8800 GTS 320MB

Logs will be attached from dmesg and Xorg.0.log
Comment 8 steph 2016-09-30 20:54:28 UTC
Created attachment 126923 [details]
dmesg output for Debian Jessie + 4.7.2
Comment 9 steph 2016-09-30 20:55:13 UTC
Created attachment 126924 [details]
Xorg.0.log for Debian Jessie + 4.7.2
Comment 10 derrierdo 2016-11-05 15:52:31 UTC
Hi,

I have the same issue with kernel 4.4, but not with 4.1.15, when the screensaver starts up.
How can I help? With a trace or log file?


01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)


Nov  5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01c8 data beef0201
Nov  5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01c4 data beef0201
Nov  5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01c0 data beef0201
Nov  5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01b8 data beef0201
Comment 11 derrierdo 2016-11-06 22:31:56 UTC
With a newer kernel I get new messages, in case that helps.

Nov  6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux[28527]] get 0000000000 put 0000000000 ib_get 00000000 ib_put 00000002 state c0000000 (err: MEM_FAULT) push 00400040
Nov  6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010000 on channel 13 [3eebf000 flux[28527]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
Nov  6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux[28527]] get 0000000000 put 0000000000 ib_get 00000002 ib_put 00000004 state c0000000 (err: MEM_FAULT) push 00400040
Nov  6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010010 on channel 13 [3eebf000 flux[28527]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
Nov  6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux[28527]] get 0000000000 put 0000000000 ib_get 00000004 ib_put 00000007 state c0000000 (err: MEM_FAULT) push 00400040
Nov  6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010020 on channel 13 [3eebf000 flux[28527]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
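
The "fb: trapped read" lines above can be compared more easily once they are split into fields. The following Python sketch is one way to do that, assuming the format is exactly as quoted in this report; `parse_fb_trap` and `fault_address_steps` are hypothetical helper names, and reading the address as hexadecimal is an assumption based on the sample values.

```python
import re

# Matches nouveau "fb: trapped <access> at ..." lines in the form quoted above.
# The bracketed reason name is optional because one line in this report is truncated.
FB_TRAP_RE = re.compile(
    r"fb: trapped (?P<access>\w+) at (?P<addr>[0-9a-f]+) "
    r"on channel (?P<ch>\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\] "
    r"engine (?P<engine>[0-9a-f]+) \[(?P<engine_name>\w*)\] "
    r"client (?P<client>[0-9a-f]+) \[(?P<client_name>\w*)\] "
    r"subclient (?P<subclient>[0-9a-f]+) \[(?P<subclient_name>\w*)\] "
    r"reason (?P<reason>[0-9a-f]+)(?: \[(?P<reason_name>\w*)\])?"
)


def parse_fb_trap(line):
    """Return the fields of one fb trap line as a dict, or None if no match."""
    m = FB_TRAP_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["addr"] = int(rec["addr"], 16)  # assumed hexadecimal
    rec["ch"] = int(rec["ch"])          # assumed decimal
    rec["pid"] = int(rec["pid"])
    return rec


def fault_address_steps(lines):
    """Return the differences between consecutive trapped addresses."""
    addrs = [rec["addr"] for rec in map(parse_fb_trap, lines) if rec is not None]
    return [b - a for a, b in zip(addrs, addrs[1:])]
```

Applied to the three trap lines in this comment, `fault_address_steps` reports a step of 0x10 between consecutive faulting addresses, alongside ib_get advancing in the DMA_PUSHER lines.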
Comment 12 Marek Greško 2016-11-12 11:49:41 UTC
Created attachment 127928 [details]
Kernel trace
Comment 13 Marek Greško 2016-11-12 11:54:22 UTC
I attached a kernel trace which may be related. I got this when:

1. Upgraded Fedora 24 to Fedora 25.

2. Disabled Wayland for gdm.

3. Created a script with export QSG_RENDER_LOOP=basic in profile.d.

4. Logged in as the first user to a KDE session.

5. Pressed Ctrl+Alt+F1 to get the gdm login screen.

6. Logged in as a second user to another KDE session.

7. Both KDE sessions were stuck, but unlike the previous behaviour I was able to Ctrl+Alt+Backspace both sessions (maybe because of the earlier steps, which I had not used before).

8. After shutting down the second session, I got the kernel trace attached above.
Comment 14 Marek Greško 2016-12-01 20:12:35 UTC
Created attachment 128304 [details]
Extraction of crash info when using modesetting driver
Comment 15 Marek Greško 2016-12-01 20:14:54 UTC
I tried switching to the modesetting driver, but my X sessions crash with it as well. I have just attached the crash info.

QSG_RENDER_LOOP=basic was also applied.

Is there any workaround available to avoid the crashes? I do not need 3D or anything else, just a stable 2D desktop.
Comment 16 derrierdo 2016-12-27 02:29:50 UTC
Hi Guys,

I have the same kind of issue.

X hangs when starting up some software.
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)  (a pretty old card).
nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 12 [gwenview[13360]] subc 3 mthd 01a8 data beef0201
...
(Only X and the keyboard hang; ssh still works.)


I cannot switch to a console; the console did not work with nouveau without fb.
(I had no more issues after recompiling xscreensaver/flux.)

Dominique
Comment 17 steph 2016-12-31 12:50:21 UTC
I have had to replace my graphics card because the old one died. So could these problems arise from memory corruption in a dying card? I now have a GTX 550 Ti, which has not shown these error messages yet.
Comment 18 Marek Greško 2017-01-05 18:13:12 UTC
I worked around this issue by using the nvidia 340 drivers. But after the upgrade to Fedora 25 these drivers were not available for a rather long period. Since nothing had been done on this bug for almost a year, I tried replacing the hardware.

First I tried a GT730. I ended up with the same behaviour: intermittent GUI lockups in which only the mouse cursor moved. Only the logs were different. If I remember correctly, they were similar to those in bug 93629, which has also been open for almost a year. I suspect these bugs have something in common and the logs just differ between hardware. So I returned the GT730, sold the GT210, and bought an AMD Radeon 6450, and the desktop is rock solid now.

I am not closing the bug because other people have the same problem, but once it is fixed, please do not wait for my confirmation. I no longer have the nvidia hardware to test with.

Thanks for your interest and for the hard work done without specs from the vendor.
Comment 19 BitPit 2017-09-13 15:24:59 UTC
The nouveau driver hangs with mythfrontend. My system is a desktop with Intel i7 onboard graphics running KDE Plasma, plus a separate user (seat) running mythfrontend directly on X (not KDE) on an NVIDIA GT240. This allows me to use my desktop system while the TV is running MythTV in an adjacent room. When the nouveau driver hangs, it has no effect on the desktop system; the last frame or menu remains static on the TV even after restarting the mythfrontend application.

I can reliably cause the hang by stopping the video player with the remote control and then trying to stop it again before the first stop completes. I see the last frame of video and the remote no longer works. Restarting the mythfrontend application, as root on the desktop, blanks the screen, and then the last frame of video is shown on the TV again even though the application has not started a player. Restarting should display a menu, but once the driver has hung the picture does not change.

I suspect that some internal registers or kernel data structures have not finished being cleaned up, and the second stop does not allow this to complete.
Comment 20 Martin Peres 2019-12-04 09:10:12 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/251.
