Bug 85086 - [NVE7] read/write faults leave machine unusable
Summary: [NVE7] read/write faults leave machine unusable
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Depends on:
Reported: 2014-10-16 11:45 UTC by tethys
Modified: 2016-04-12 17:03 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:

nouveau errors (from journalctl) (7.11 KB, text/plain)
2014-10-16 11:45 UTC, tethys
xrandr output (1.72 KB, text/plain)
2014-10-16 11:46 UTC, tethys
Screenshot (381.69 KB, image/jpeg)
2014-10-16 11:51 UTC, tethys
Full boot log (from journalctl -k -b bootid) (125.73 KB, text/x-log)
2014-10-16 18:09 UTC, tethys

Description tethys 2014-10-16 11:45:10 UTC
Created attachment 107921
nouveau errors (from journalctl)

I came in this morning to find my machine at a text console, displaying
the following error:

Oct 15 20:57:55 localhost.localdomain kernel: nouveau E[Xorg[4149]] failed to idle channel 0xcccc0000 [Xorg[4149]]
Oct 15 20:58:10 localhost.localdomain kernel: nouveau E[Xorg[4149]] failed to idle channel 0xcccc0000 [Xorg[4149]]
Oct 15 20:58:15 localhost.localdomain kernel: nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0001e60000 [PTE] from CE2/GR_COPY on channel 0x007f8ef000 [unknown]

Possibly relevant things:

tet:~# uname -a
Linux tet.colet10.gambit 3.17.0 #9 SMP Mon Oct 13 16:26:23 BST 2014 x86_64 x86_64 x86_64 GNU/Linux

tet:~# lspci -v | sed -n '/VGA/,/^$/p'
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [NVS 510] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device 0967
        Flags: bus master, fast devsel, latency 0, IRQ 25
        Memory at f2000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at f3000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb

tet:~# fgrep Monitor /var/log/Xorg.0.log
[    51.449] (**) |   |-->Monitor "<default monitor>"
[    51.714] (II) NOUVEAU(0): Monitor name: DELL U2713HM
[    51.748] (II) NOUVEAU(0): Monitor name: SDM-S205F/K
[    51.817] (II) NOUVEAU(0): Monitor name: DELL U2711

tet:~# rpm -qa | fgrep nouveau

tet:~# rpm -qa | fgrep mesa
Comment 1 tethys 2014-10-16 11:46:14 UTC
Created attachment 107922
xrandr output

The only option I had after the error was to reboot.
Comment 2 tethys 2014-10-16 11:51:34 UTC
Created attachment 107925
Screenshot

Possibly relevant. The on-screen output seems to have converted what I
assume is a tab character into a checkerboard pattern.
Comment 3 Tobias Klausmann 2014-10-16 15:56:56 UTC
The first step to solve this would be a way to reproduce the problem. With this you could bisect or test patches and make sure it does not happen again.
Comment 4 Ilia Mirkin 2014-10-16 16:09:39 UTC
In an effort to reduce irrelevant information, you've cut out potentially relevant things. Best to attach unredacted logs...

(a) Is this an Optimus setup? If so, can you try booting with nouveau.runpm=0? [This will stop the NVIDIA card from auto-suspending when not in use.]
(b) Please attach a full kernel log, from boot until the errors.
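Both requests can be handled from a shell. The following is a minimal sketch, assuming a Fedora-style system (the reporter uses rpm) with grubby and systemd-journald; the helper `has_runpm_off` is hypothetical and only checks whether the option actually reached the running kernel's command line.

```shell
# Hypothetical helper: succeed if the given kernel command line
# (normally the contents of /proc/cmdline) contains nouveau.runpm=0
# as a whole, space-separated word.
has_runpm_off() {
  case " $1 " in
    *" nouveau.runpm=0 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, run as root (not executed by this snippet):
#   grubby --update-kernel=ALL --args="nouveau.runpm=0"   # then reboot
#   has_runpm_off "$(cat /proc/cmdline)" && echo "runpm disabled"
#   journalctl -k -b -1 > kernel-log.txt   # full kernel log of the previous boot
```

Matching whole words rather than a bare substring avoids false positives such as `nouveau.runpm=00`.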

It would appear that the errors are brought on by a channel hang, from which nouveau recovers in a... less-than-graceful fashion. This is, generally, a known "issue" (more like 100 inter-related issues). However, it may be that whatever causes the hang has already been fixed in mesa 10.3.1, so try updating your userspace (specifically mesa). Upgrading the ddx to 1.0.11 wouldn't hurt either.
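To see whether the installed packages already meet the suggested versions (mesa 10.3.1, ddx 1.0.11), a version comparison helps. This is a sketch under stated assumptions: `version_at_least` is a hypothetical helper relying on GNU sort's `-V` (version sort), and the Fedora package name `mesa-dri-drivers` is assumed (`xorg-x11-drv-nouveau` is the ddx package named in comment 5).

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Version-sorts the pair; if $2 comes out first (or they are equal),
# then $1 >= $2. Requires GNU sort -V.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (package names assumed; assumes one installed instance of each):
#   mesa=$(rpm -q --qf '%{VERSION}' mesa-dri-drivers)
#   version_at_least "$mesa" 10.3.1 || echo "mesa older than 10.3.1"
#   ddx=$(rpm -q --qf '%{VERSION}' xorg-x11-drv-nouveau)
#   version_at_least "$ddx" 1.0.11 || echo "ddx older than 1.0.11"
```

A version sort is used instead of a plain string comparison so that, for example, 1.0.11 correctly ranks above 1.0.9.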

As Tobias suggested, if you can identify an action that causes the issue, that would be extremely helpful.
Comment 5 tethys 2014-10-16 18:09:11 UTC
Created attachment 107942
Full boot log (from journalctl -k -b bootid)

It's been doing this for a while, and I'd been advised to try a newer
kernel as there had apparently been a fair amount of nouveau-related
improvements. So I tried various -rc kernels and now the final 3.17.0
kernel. It seems (although I don't have enough data to be certain)
that it's improved things slightly in that the crashes seem less
frequent now. But it obviously still crashes.

Reproducing it is simply a matter of waiting. If I leave the machine
overnight, there's a reasonable chance that it'll be crashed when I
get back into work the following morning. In terms of specific
behaviour to trigger the crash, then no, I can't really give any
information other than that I leave the machine basically idle
overnight, running one of the less demanding screensavers from
xscreensaver (cubicgrid).

I don't think it's an Optimus setup, but I'm not familiar enough with
the Nvidia product range to say for sure. It's a PCI GeForce card in
a desktop machine.
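One rough check for a hybrid-graphics (Optimus-style) setup is to count display controllers in `lspci` output: such systems typically list an integrated GPU plus an NVIDIA controller, whereas the lspci output in the description shows a single NVIDIA VGA controller. This is a heuristic only, and `count_display_controllers` is a hypothetical helper.

```shell
# Hypothetical helper: count display controllers in `lspci` output on stdin.
# More than one (e.g. an Intel VGA controller plus an NVIDIA 3D controller)
# hints at hybrid graphics such as Optimus; this is only a heuristic.
count_display_controllers() {
  grep -cE 'VGA compatible controller|3D controller|Display controller'
}

# Usage (not executed by this snippet):
#   lspci | count_display_controllers
```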

By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau
package? I'm already at the latest distribution supplied version (as
I am for mesa), but I guess I could look at compiling it from source
if you think it might help.
Comment 6 Ilia Mirkin 2014-10-16 18:55:04 UTC
(In reply to tethys from comment #5)
> By upgrading the ddx, I'm assuming you mean the xorg-x11-drv-nouveau
> package? I'm already at the latest distribution supplied version (as
> I am for mesa), but I guess I could look at compiling it from source
> if you think it might help.

Right... but it'd be silly to chase bugs already resolved upstream. Distros tend to be cautious with new versions, which makes sense if your starting position is that things work. When things don't work, get the latest :)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.