Bug 85086

Summary: [NVE7] read/write faults leave machine unusable
Product: xorg Reporter: tethys
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: NEW --- QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: ahippo
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
nouveau errors (from journalctl) none
xrandr output none
Screenshot none
Full boot log (from journalctl -k -b bootid) none

Description tethys 2014-10-16 11:45:10 UTC
Created attachment 107921 [details]
nouveau errors (from journalctl)

I came in this morning to find my machine at a text console, displaying
the following error:

Oct 15 20:57:55 localhost.localdomain kernel: nouveau E[Xorg[4149]] failed to idle channel 0xcccc0000 [Xorg[4149]]
Oct 15 20:58:10 localhost.localdomain kernel: nouveau E[Xorg[4149]] failed to idle channel 0xcccc0000 [Xorg[4149]]
Oct 15 20:58:15 localhost.localdomain kernel: nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0001e60000 [PTE] from CE2/GR_COPY on channel 0x007f8ef000 [unknown]

Possibly relevant things:

tet:~# uname -a
Linux tet.colet10.gambit 3.17.0 #9 SMP Mon Oct 13 16:26:23 BST 2014 x86_64 x86_64 x86_64 GNU/Linux

tet:~# lspci -v | sed -n '/VGA/,/^$/p'
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [NVS 510] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device 0967
        Flags: bus master, fast devsel, latency 0, IRQ 25
        Memory at f2000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at f3000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb

tet:~# fgrep Monitor /var/log/Xorg.0.log
[    51.449] (**) |   |-->Monitor "<default monitor>"
[    51.714] (II) NOUVEAU(0): Monitor name: DELL U2713HM
[    51.748] (II) NOUVEAU(0): Monitor name: SDM-S205F/K
[    51.817] (II) NOUVEAU(0): Monitor name: DELL U2711

tet:~# rpm -qa | fgrep nouveau
xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64

tet:~# rpm -qa | fgrep mesa
mesa-libGL-10.1.5-1.20140607.fc20.i686
mesa-libglapi-10.1.5-1.20140607.fc20.x86_64
mesa-filesystem-10.1.5-1.20140607.fc20.x86_64
mesa-libglapi-10.1.5-1.20140607.fc20.i686
mesa-libGLU-9.0.0-5.fc20.x86_64
mesa-libgbm-10.1.5-1.20140607.fc20.x86_64
mesa-libGLES-10.1.5-1.20140607.fc20.x86_64
mesa-libwayland-egl-10.1.5-1.20140607.fc20.x86_64
mesa-dri-drivers-10.1.5-1.20140607.fc20.x86_64
mesa-libEGL-10.1.5-1.20140607.fc20.x86_64
mesa-libxatracker-10.1.5-1.20140607.fc20.x86_64
mesa-libgbm-10.1.5-1.20140607.fc20.i686
mesa-libGL-10.1.5-1.20140607.fc20.x86_64
mesa-libEGL-10.1.5-1.20140607.fc20.i686

Comment 1 tethys 2014-10-16 11:46:14 UTC
Created attachment 107922 [details]
xrandr output

The only option I had after the error was to reboot.

Comment 2 tethys 2014-10-16 11:51:34 UTC
Created attachment 107925 [details]
Screenshot

Possibly relevant: the on-screen output seems to have rendered what I
assume is a tab character as a checkerboard pattern.

Comment 3 Tobias Klausmann 2014-10-16 15:56:56 UTC
The first step towards solving this would be finding a way to reproduce the problem. With that you could bisect or test patches and make sure it does not happen again.

Comment 4 Ilia Mirkin 2014-10-16 16:09:39 UTC
In an effort to reduce irrelevant information, you've cut out potentially relevant things. Best to attach unredacted logs...

(a) Is this an optimus setup? If so, can you try booting with nouveau.runpm=0 [this will cause the nvidia card not to auto-suspend when not used]
(b) Please attach a full kernel log, from boot until the errors.

It would appear that the errors are brought on by a channel hang, from which nouveau recovers in a... less-than-graceful fashion. This is, generally, a known "issue" (more like 100 inter-related issues). However, you could try updating your userspace (specifically mesa); whatever issue causes the hang may already have been fixed in 10.3.1. Upgrading the ddx to 1.0.11 wouldn't hurt either.

As Tobias suggested, if you can identify an action that causes the issue, that would be extremely helpful.
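
A minimal sketch of requests (a) and (b), assuming a systemd machine where journalctl is available (as the reporter's attachments suggest). The function names runpm_disabled and save_kernel_log are made up for illustration; only nouveau.runpm=0, /proc/cmdline and journalctl -k -b are real.

```shell
# Succeeds if the kernel command line passed as $1 contains nouveau.runpm=0.
runpm_disabled() {
    case " $1 " in
        *" nouveau.runpm=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Writes the kernel log of the current boot to the file named in $1.
save_kernel_log() {
    journalctl -k -b > "$1"
}
```

After rebooting with the option added, runpm_disabled "$(cat /proc/cmdline)" && echo "runpm is off" confirms it took effect, and save_kernel_log kernel.log produces a file to attach. Note that -b with no argument selects the current boot; if the machine has already been rebooted after a crash, journalctl -k -b -1 selects the previous one instead.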

Comment 5 tethys 2014-10-16 18:09:11 UTC
Created attachment 107942 [details]
Full boot log (from journalctl -k -b bootid)

It's been doing this for a while, and I'd been advised to try a newer
kernel as there had apparently been a fair number of nouveau-related
improvements. So I tried various -rc kernels and now the final 3.17.0
kernel. It seems (although I don't have enough data to be certain)
that this has improved things slightly, in that the crashes seem less
frequent now. But it obviously still crashes.

Reproducing it is simply a matter of waiting. If I leave the machine
overnight, there's a reasonable chance that it will have crashed by the
time I get back into work the following morning. As for specific
behaviour that triggers the crash, no, I can't really give any
information other than that I leave the machine basically idle
overnight, running one of the less demanding screensavers from
xscreensaver (cubicgrid).
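
Since the fault only shows up after hours of idling, a small filter over the kernel log may help pin down when it happened. This is a rough sketch, assuming the messages keep the format quoted in the description; the function name nouveau_faults is made up for illustration.

```shell
# Reads kernel log lines on stdin and prints only the nouveau error lines
# of the two kinds seen in this report: "failed to idle channel" and
# PFIFO read/write faults.
nouveau_faults() {
    grep -E 'nouveau E\[.*(failed to idle channel|(read|write) fault at)'
}
```

After a crash and reboot, journalctl -k -b -1 | nouveau_faults would list the matching lines from the previous boot, with their timestamps.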

I don't think it's an optimus setup, but I'm not familiar enough with
the Nvidia product range to say for sure. It's a PCI GeForce card in
a desktop machine.

By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau
package? I'm already at the latest distribution supplied version (as
I am for mesa), but I guess I could look at compiling it from source
if you think it might help.

Comment 6 Ilia Mirkin 2014-10-16 18:55:04 UTC
(In reply to tethys from comment #5)
> By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau
> package? I'm already at the latest distribution supplied version (as
> I am for mesa), but I guess I could look at compiling it from source
> if you think it might help.

Right... but it'd be silly to chase bugs already resolved upstream. Distros tend to be cautious with new versions, which makes sense if your starting position is that things work. When things don't work, get the latest :)
