Created attachment 128783
dmesg showing error
Intermittently (a few times a day), my display freezes completely and does not recover. The kernel doesn't hang and I can still ssh in, but I can't chvt to a non-graphical VT.
Whenever this occurs, I see a message like this in dmesg:
[12535.260195] nouveau 0000:01:00.0: gr: TRAP ch 6 [007f778000 Xwayland]
[12535.260211] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000  warp 3d0001 [STACK_ERROR]
[12539.595312] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[12539.595318] nouveau 0000:01:00.0: fifo: gr engine fault on channel 5, recovering...
Despite the "recovering..." message, it never actually recovers, and only a reboot solves the problem.
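For anyone trying to correlate freezes with these messages, here is a minimal Python sketch that pulls nouveau lines out of dmesg text and keeps the ones that look like faults. The function names (`nouveau_events`, `fault_events`) and the keyword list are illustrative assumptions, not part of nouveau or any existing tool, and the regular expression is only modelled on the four lines quoted above.

```python
import re

# Modelled on lines such as:
# [12535.260195] nouveau 0000:01:00.0: gr: TRAP ch 6 [007f778000 Xwayland]
NOUVEAU_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+(?P<dev>\S+):\s+"
    r"(?P<unit>\w+):\s+(?P<msg>.*)$"
)


def nouveau_events(dmesg_text):
    """Return (timestamp, unit, message) tuples for every nouveau line."""
    events = []
    for line in dmesg_text.splitlines():
        match = NOUVEAU_LINE.match(line.strip())
        if match:
            events.append((float(match["ts"]), match["unit"], match["msg"]))
    return events


def fault_events(events, keywords=("trap", "sched_error", "fault")):
    """Keep events whose message contains any keyword (case-insensitive).

    The default keywords are a guess based on the messages in this report.
    """
    return [
        event for event in events
        if any(keyword in event[2].lower() for keyword in keywords)
    ]
```

Feeding it the saved output of `dmesg` gives the timestamps of each trap and scheduler error, which makes it easier to see how long after the first TRAP the CTXSW_TIMEOUT appears.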
I'm using GNOME on Wayland, and I'm typically running gnome-terminal, Firefox, and/or Chromium. So far I haven't identified any specific action that triggers this failure.
My computer is a mid-2014 MacBook Pro with a GeForce 750M (GK107). I'm running nouveau with "nouveau.nofbaccel=1". I've tried adding "nouveau.config=NvGrUseFW=1", but it complains about not finding /lib/firmware/nvidia/gk107/fecs_inst.bin. Is there an external firmware blob available for this card?
I recently updated my kernel and several relevant packages, and have seen no difference in behavior. I'm running Linux 4.9.0 and the latest version of Mesa from git (36b5f1d200).
See my attached dmesg output. Is there any debug flag I can enable to shed more light on the situation?
To answer your immediate question about firmware, you can get blob firmware by following the instructions at
to the letter. Unfortunately, I think Linux 4.9 expects some of these files to be renamed, but there's a patch that makes it look for the "old" names as well:
It should be included in a later 4.9.x release.
However, I'm only aware of the blob firmware fixing issues for some GTX 660 owners.
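Since the file naming is the confusing part, a small Python sketch like the following can show what is actually present in the firmware directories before rebooting with NvGrUseFW=1. The helper names (`list_firmware`, `has_firmware`) are made up for illustration; the only paths taken from this report are /lib/firmware/nvidia/gk107 and fecs_inst.bin, and any other directory you pass in is your own assumption about where the files were extracted.

```python
from pathlib import Path


def list_firmware(directories):
    """Map each directory to a sorted list of its files, or None if missing."""
    found = {}
    for directory in directories:
        path = Path(directory)
        if path.is_dir():
            found[str(directory)] = sorted(
                entry.name for entry in path.iterdir() if entry.is_file()
            )
        else:
            found[str(directory)] = None
    return found


def has_firmware(directory, filename):
    """Return True if the named firmware file exists in the directory."""
    return (Path(directory) / filename).is_file()
```

For example, `has_firmware("/lib/firmware/nvidia/gk107", "fecs_inst.bin")` checks for the file the kernel complained about, and `list_firmware` makes it easy to compare names across directories.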
nofbaccel=1 is unlikely to be of much help; it only disables acceleration of the fbdev device used by your terminals.
Note that there are additional patches in Linux 4.10-rc1+ which are likely to improve stability, such as
and you might also want this one, although it will be a little annoying to apply to an upstream tree:
Thanks for the quick reply! That's all very informative, especially the bit about the firmware filenames. (I was wondering why the filenames in /lib/firmware/nouveau were so different from the ones in /lib/firmware/nvidia!)
I'll give those patches a shot and follow up with an update soon.