Bug 85086

Summary:

[NVE7] read/write faults leave machine unusable

Product:

xorg

Reporter:

tethys

Component:

Driver/nouveau

Assignee:

Nouveau Project <nouveau>

Status:

RESOLVED MOVED

QA Contact:

Xorg Project Team <xorg-team>

Severity:

normal

Priority:

medium

CC:

ahippo

Version:

unspecified

Hardware:

Other

OS:

All

Attachments:

Description	Flags
nouveau errors (from journalctl)	none
xrandr output	none
Screenshot	none
Full boot log (from journalctl -k -b bootid)	none

Description tethys 2014-10-16 11:45:10 UTC

Created attachment 107921
nouveau errors (from journalctl)

I came in this morning to find my machine at a text console, displaying
the following error:

Oct 15 20:57:55 localhost.localdomain kernel: nouveau E[Xorg[4149]] failed to idle channel 0xcccc0000 [Xorg[4149]]
Oct 15 20:58:10 localhost.localdomain kernel: nouveau E[Xorg[4149]] failed to idle channel 0xcccc0000 [Xorg[4149]]
Oct 15 20:58:15 localhost.localdomain kernel: nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0001e60000 [PTE] from CE2/GR_COPY on channel 0x007f8ef000 [unknown]

Possibly relevant things:

tet:~# uname -a
Linux tet.colet10.gambit 3.17.0 #9 SMP Mon Oct 13 16:26:23 BST 2014 x86_64 x86_64 x86_64 GNU/Linux

tet:~# lspci -v | sed -n '/VGA/,/^$/p'
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [NVS 510] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device 0967
        Flags: bus master, fast devsel, latency 0, IRQ 25
        Memory at f2000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at f3000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb

tet:~# fgrep Monitor /var/log/Xorg.0.log
[    51.449] (**) |   |-->Monitor "<default monitor>"
[    51.714] (II) NOUVEAU(0): Monitor name: DELL U2713HM
[    51.748] (II) NOUVEAU(0): Monitor name: SDM-S205F/K
[    51.817] (II) NOUVEAU(0): Monitor name: DELL U2711

tet:~# rpm -qa | fgrep nouveau
xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64

tet:~# rpm -qa | fgrep mesa
mesa-libGL-10.1.5-1.20140607.fc20.i686
mesa-libglapi-10.1.5-1.20140607.fc20.x86_64
mesa-filesystem-10.1.5-1.20140607.fc20.x86_64
mesa-libglapi-10.1.5-1.20140607.fc20.i686
mesa-libGLU-9.0.0-5.fc20.x86_64
mesa-libgbm-10.1.5-1.20140607.fc20.x86_64
mesa-libGLES-10.1.5-1.20140607.fc20.x86_64
mesa-libwayland-egl-10.1.5-1.20140607.fc20.x86_64
mesa-dri-drivers-10.1.5-1.20140607.fc20.x86_64
mesa-libEGL-10.1.5-1.20140607.fc20.x86_64
mesa-libxatracker-10.1.5-1.20140607.fc20.x86_64
mesa-libgbm-10.1.5-1.20140607.fc20.i686
mesa-libGL-10.1.5-1.20140607.fc20.x86_64
mesa-libEGL-10.1.5-1.20140607.fc20.i686

Comment 1 tethys 2014-10-16 11:46:14 UTC

Created attachment 107922
xrandr output

The only option I had after the error was to reboot.

Comment 2 tethys 2014-10-16 11:51:34 UTC

Created attachment 107925
Screenshot

Possibly relevant: the on-screen output seems to have converted what I
assume is a tab character into a checkerboard pattern.

Comment 3 Tobias Klausmann 2014-10-16 15:56:56 UTC

The first step toward solving this is finding a way to reproduce the problem. With a reproducer, you could bisect or test patches and confirm that it does not happen again.

Comment 4 Ilia Mirkin 2014-10-16 16:09:39 UTC

In an effort to reduce irrelevant information, you've cut out potentially relevant things. Best to attach unredacted logs...

(a) Is this an Optimus setup? If so, can you try booting with nouveau.runpm=0 [this will cause the NVIDIA card not to auto-suspend when not used]
(b) Please attach a full kernel log, from boot until the errors.

It would appear that the errors are brought on by a channel hang, from which nouveau recovers in a... less-than-graceful fashion. This is, generally, a known "issue" (more like 100 inter-related issues). However, you could update your userspace (specifically mesa); whatever causes the hang may already have been fixed in 10.3.1. Upgrading the ddx to 1.0.11 wouldn't hurt either.

As Tobias suggested, if you can identify an action that causes the issue, that would be extremely helpful.
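
For illustration only, a minimal shell sketch of how the requested logs might be gathered. The helper name nouveau_errors is hypothetical, and the pattern it matches is taken from the "nouveau E[" prefix in the excerpt above. The journalctl invocations in the comments assume a systemd journal that persists across reboots.

```shell
#!/bin/sh
# Hypothetical helper (not part of nouveau or systemd): read a kernel log on
# stdin and keep only nouveau error lines, i.e. lines containing the literal
# "nouveau E[" prefix seen in the excerpt above.
nouveau_errors() {
    # -F treats the pattern as a fixed string; "|| true" keeps the exit
    # status zero when no line matches.
    grep -F 'nouveau E[' || true
}

# Example usage (assumes the journal is persistent, so the boot that ended
# in the hang is still available after rebooting):
#   journalctl -k -b -1 > full-kernel.log   # unredacted log to attach
#   journalctl -k -b -1 | nouveau_errors    # quick look at the errors only
```

The full, unfiltered log is what should be attached; the filter is only a convenience for spotting when the first fault occurred.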

Comment 5 tethys 2014-10-16 18:09:11 UTC

Created attachment 107942
Full boot log (from journalctl -k -b bootid)

It's been doing this for a while, and I'd been advised to try a newer
kernel as there had apparently been a fair number of nouveau-related
improvements. So I tried various -rc kernels and now the final 3.17.0
kernel. It seems (although I don't have enough data to be certain)
that this has improved things slightly, in that the crashes seem less
frequent now. But it obviously still crashes.

Reproducing it is simply a matter of waiting. If I leave the machine
overnight, there's a reasonable chance that it will have crashed by
the time I get back into work the following morning. As for specific
behaviour that triggers the crash, I can't really give any information
other than that I leave the machine basically idle overnight, running
one of the less demanding screensavers from xscreensaver (cubicgrid).

I don't think it's an optimus setup, but I'm not familiar enough with
the Nvidia product range to say for sure. It's a PCI GeForce card in
a desktop machine.

By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau
package? I'm already at the latest distribution supplied version (as
I am for mesa), but I guess I could look at compiling it from source
if you think it might help.

Comment 6 Ilia Mirkin 2014-10-16 18:55:04 UTC

(In reply to tethys from comment #5)
> By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau
> package? I'm already at the latest distribution supplied version (as
> I am for mesa), but I guess I could look at compiling it from source
> if you think it might help.

Right... but it'd be silly to chase bugs already resolved upstream. Distros tend to be cautious with new versions, which makes sense if your starting position is that things work. When things don't work, get the latest :)

Comment 7 Martin Peres 2019-12-04 08:50:20 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate in the new bug on our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/139.
