|Summary:||[NVE7] read/write faults leave machine unusable|
|Component:||Driver/nouveau||Assignee:||Nouveau Project <nouveau>|
|Status:||NEW ---||QA Contact:||Xorg Project Team <xorg-team>|
|i915 platform:||i915 features:|
Description tethys 2014-10-16 11:45:10 UTC
Created attachment 107921
nouveau errors (from journalctl)

I came in this morning to find my machine at a text console, displaying the following error:

Oct 15 20:57:55 localhost.localdomain kernel: nouveau E[Xorg] failed to idle channel 0xcccc0000 [Xorg]
Oct 15 20:58:10 localhost.localdomain kernel: nouveau E[Xorg] failed to idle channel 0xcccc0000 [Xorg]
Oct 15 20:58:15 localhost.localdomain kernel: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0001e60000 [PTE] from CE2/GR_COPY on channel 0x007f8ef000 [unknown]

Possibly relevant things:

tet:~# uname -a
Linux tet.colet10.gambit 3.17.0 #9 SMP Mon Oct 13 16:26:23 BST 2014 x86_64 x86_64 x86_64 GNU/Linux
tet:~# lspci -v | sed -n '/VGA/,/^$/p'
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [NVS 510] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 0967
	Flags: bus master, fast devsel, latency 0, IRQ 25
	Memory at f2000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at f3000000 [disabled] [size=512K]
	Capabilities:  Power Management version 3
	Capabilities:  MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities:  Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities:  Virtual Channel
	Capabilities:  Power Budgeting <?>
	Capabilities:  Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities:  #19
	Kernel driver in use: nouveau
	Kernel modules: nvidiafb
tet:~# fgrep Monitor /var/log/Xorg.0.log
[ 51.449] (**) | |-->Monitor "<default monitor>"
[ 51.714] (II) NOUVEAU(0): Monitor name: DELL U2713HM
[ 51.748] (II) NOUVEAU(0): Monitor name: SDM-S205F/K
[ 51.817] (II) NOUVEAU(0): Monitor name: DELL U2711
tet:~# rpm -qa | fgrep nouveau
xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64
tet:~# rpm -qa | fgrep mesa
mesa-libGL-10.1.5-1.20140607.fc20.i686
mesa-libglapi-10.1.5-1.20140607.fc20.x86_64
mesa-filesystem-10.1.5-1.20140607.fc20.x86_64
mesa-libglapi-10.1.5-1.20140607.fc20.i686
mesa-libGLU-9.0.0-5.fc20.x86_64
mesa-libgbm-10.1.5-1.20140607.fc20.x86_64
mesa-libGLES-10.1.5-1.20140607.fc20.x86_64
mesa-libwayland-egl-10.1.5-1.20140607.fc20.x86_64
mesa-dri-drivers-10.1.5-1.20140607.fc20.x86_64
mesa-libEGL-10.1.5-1.20140607.fc20.x86_64
mesa-libxatracker-10.1.5-1.20140607.fc20.x86_64
mesa-libgbm-10.1.5-1.20140607.fc20.i686
mesa-libGL-10.1.5-1.20140607.fc20.x86_64
mesa-libEGL-10.1.5-1.20140607.fc20.i686
Comment 1 tethys 2014-10-16 11:46:14 UTC
Created attachment 107922
xrandr output

The only option I had after the error was to reboot.
Comment 2 tethys 2014-10-16 11:51:34 UTC
Created attachment 107925
Screenshot

Possibly relevant. The on-screen output seems to have converted what I assume is a tab character into a checkerboard pattern.
Comment 3 Tobias Klausmann 2014-10-16 15:56:56 UTC
The first step to solving this would be a way to reproduce the problem. With a reliable reproducer, you could bisect or test patches and make sure it does not happen again.
Comment 4 Ilia Mirkin 2014-10-16 16:09:39 UTC
In an effort to reduce irrelevant information, you've cut out potentially relevant things. Best to attach unredacted logs...

(a) Is this an optimus setup? If so, can you try booting with nouveau.runpm=0 [this will cause the nvidia card not to auto-suspend when not used]
(b) Please attach a full kernel log, from boot until the errors.

It would appear that the errors are brought on by a channel hang, from which nouveau recovers in a... less-than-graceful fashion. This is, generally, a known "issue" (more like 100 inter-related issues). However, if you update your userspace (specifically mesa), perhaps whatever issue causes the hang has been fixed in 10.3.1. Upgrading the ddx to 1.0.11 wouldn't hurt either.

As Tobias suggested, if you can identify an action that causes the issue, that would be extremely helpful.
Comment 5 tethys 2014-10-16 18:09:11 UTC
Created attachment 107942
Full boot log (from journalctl -k -b bootid)

It's been doing this for a while, and I'd been advised to try a newer kernel, as there had apparently been a fair number of nouveau-related improvements. So I tried various -rc kernels and now the final 3.17.0 kernel. It seems (although I don't have enough data to be certain) that this has improved things slightly, in that the crashes seem less frequent now. But it obviously still crashes.

Reproducing it is simply a matter of waiting. If I leave the machine overnight, there's a reasonable chance that it'll have crashed by the time I get back into work the following morning. In terms of specific behaviour that triggers the crash, no, I can't really give any information other than that I leave the machine basically idle overnight, running one of the less demanding screensavers from xscreensaver (cubicgrid).

I don't think it's an optimus setup, but I'm not familiar enough with the Nvidia product range to say for sure. It's a PCI GeForce card in a desktop machine.

By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau package? I'm already at the latest distribution-supplied version (as I am for mesa), but I guess I could look at compiling it from source if you think it might help.
Comment 6 Ilia Mirkin 2014-10-16 18:55:04 UTC
(In reply to tethys from comment #5)
> By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau
> package? I'm already at the latest distribution supplied version (as
> I am for mesa), but I guess I could look at compiling it from source
> if you think it might help.

Right... but it'd be silly to chase bugs already resolved upstream. Distros tend to be cautious with new versions, which makes sense if your starting position is that things work. When things don't work, get the latest :)