Bug 27211 - endless PROTECTION_FAULT logs, Nouveau drm, TNT card
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/other
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2010-03-20 12:28 UTC by Brent
Modified: 2013-05-24 14:03 UTC

Attachments
/var/log/messages.log (31.81 KB, text/plain)
2010-03-20 12:28 UTC, Brent
Xorg log (1.7.6-3, 0.0.15_git20100314-1, 2.4.19-2, 2.6.34-rc5) (16.93 KB, text/plain)
2010-05-12 14:59 UTC, Brent
messages.log (74.62 KB, text/plain)
2010-05-13 09:51 UTC, Brent
Xorg log (1.7.6-3, 0.0.16, 2.4.20, 2.6.34-rc5) (18.74 KB, application/octet-stream)
2010-05-13 12:56 UTC, Brent
Xorg log (1.7.6-3, 0.0.16, 2.4.20, 2.6.34-rc5) (18.74 KB, text/plain)
2010-05-13 13:01 UTC, Brent
kernel log spam (18.31 KB, application/octet-stream)
2010-05-13 13:07 UTC, Brent

Description Brent 2010-03-20 12:28:43 UTC
Created attachment 34263
/var/log/messages.log

I put Arch Linux on a Pentium II with a Diamond Viper V550 card (Riva TNT).  If I use the VESA driver, it works fine.  But if I use Nouveau, the boot sequence never finishes.  After the udev message, the screen goes blank and I hear endless disk activity.  Thousands of these messages were being logged:

Mar 17 07:39:33 talc kernel: [drm] nouveau 0000:01:00.0: PGRAPH_NOTIFY - nSource: PATCH_EXCEPTION, nStatus: INVALID_STATE PROTECTION_FAULT
Mar 17 07:39:33 talc kernel: [drm] nouveau 0000:01:00.0: PGRAPH_NOTIFY - Ch 0/3 Class 0x004a Mthd 0x0c18 Data 0x00000720:0x00000000

Relevant packages:
[root@talc log]# pacman -Qs kernel26
local/kernel26 2.6.32.10-1 (base)
    The Linux Kernel and modules
local/kernel26-firmware 2.6.32.10-1 (base)
    The included firmware files of the Linux Kernel
[root@talc log]# pacman -Qs nouveau
local/nouveau-drm 0.0.15_20100212-1
    nvidia opensource X driver
local/xf86-video-nouveau 0.0.15_git20100221-1
    Open Source 3D acceleration driver for nVidia cards (experimental)
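To gauge how badly a log is being flooded, the fault lines can be counted rather than read. This is a minimal sketch, not part of the original report: the function name count_pgraph_faults is made up for illustration, and the sample input is just the two lines quoted above written to a temporary file.

```shell
#!/bin/sh
# Count PGRAPH protection-fault lines in a kernel log file.
# count_pgraph_faults is a hypothetical helper name, not a Nouveau tool.
count_pgraph_faults() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'PGRAPH_NOTIFY.*PROTECTION_FAULT' "$1"
}

# Sample input: the two lines quoted in the report, in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Mar 17 07:39:33 talc kernel: [drm] nouveau 0000:01:00.0: PGRAPH_NOTIFY - nSource: PATCH_EXCEPTION, nStatus: INVALID_STATE PROTECTION_FAULT
Mar 17 07:39:33 talc kernel: [drm] nouveau 0000:01:00.0: PGRAPH_NOTIFY - Ch 0/3 Class 0x004a Mthd 0x0c18 Data 0x00000720:0x00000000
EOF

faults=$(count_pgraph_faults "$sample")
echo "protection faults: $faults"
rm -f "$sample"
```

On a real system the same function could be pointed at the log file named in the report, /var/log/messages.log.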
Comment 1 Marcin Kościelnicki 2010-03-20 12:42:01 UTC
Sigh. I know that one.

The issue here is that you're using the original NV04, which doesn't
implement some setup methods in hardware on the 2d objects, and we
need to write emulation for them.

I'll tackle that soon, in the meantime you can use "noaccel=1" to work
around the problem...
Comment 2 Brent 2010-03-20 19:58:16 UTC
I tried "noaccel=1" and just "noaccel" as kernel boot parameters in /boot/grub/menu.lst.  Neither helped.  X is still using the VESA driver, and when I try to switch to a text screen with ctrl-alt-F1, I get a blank screen, and those messages are logged continuously until I switch back with ctrl-alt-F7.

But thanks for saying it'll be worked on, rather than saying something like "oh, we decided we're not going to support NV04 after all."  I'll keep an eye out for updates, and try them when I see them.

I am currently working around the problem by having "MODULES=(!drm)" in /etc/rc.conf, and using VESA drivers of course.
Comment 3 Marcin Kościelnicki 2010-03-21 01:48:30 UTC
It has to be "nouveau.noaccel=1" if passed on main kernel line.
Comment 4 Pekka Paalanen 2010-03-21 01:59:21 UTC
The noaccel parameter is a Nouveau module parameter, so either pass it to the nouveau module on load (modprobe.conf), or if nouveau is built-in, nouveau.noaccel=1 on kernel command line.

Your X uses the Vesa driver? That will screw up the card, if Nouveau is loaded.
Either use Nouveau DRM with Nouveau DDX, or no Nouveau at all with Vesa DDX.
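The difference between "noaccel=1" and "nouveau.noaccel=1" on the kernel line is easy to check mechanically. The following is only a sketch: has_kernel_param is a hypothetical helper, and the two command lines are invented for illustration. It tests for the exact, module-prefixed form described in Comments 3 and 4.

```shell
#!/bin/sh
# Succeed if the exact parameter appears as a whole word on the command line.
# has_kernel_param is a hypothetical helper name.
has_kernel_param() {
    # $1 = parameter, e.g. nouveau.noaccel=1
    # $2 = command line string, e.g. the contents of /proc/cmdline
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical command lines, for illustration only.
good="root=/dev/sda1 ro nouveau.noaccel=1"
bad="root=/dev/sda1 ro noaccel=1"

for cmdline in "$good" "$bad"; do
    if has_kernel_param "nouveau.noaccel=1" "$cmdline"; then
        echo "ok: $cmdline"
    else
        echo "missing module prefix: $cmdline"
    fi
done
```

To check the running kernel instead of a sample string, pass "$(cat /proc/cmdline)" as the second argument.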
Comment 5 Brent 2010-03-22 17:40:00 UTC
Thank you for that clarification.  To /etc/modprobe.d/modprobe.conf, I added this line:

options nouveau noaccel=1

And now I can use the Nouveau driver.  Both the text screen and the graphics screen work.  I also had to create a config file for X with "Xorg -configure" to stop it from defaulting to vesa.  I have HAL running, and it seems that without any config file, X tries the nv driver first, then fbdev, then vesa.  X is now using Nouveau.

A clarification of my original report.  The computer did finish booting up.  I just couldn't tell because the text screen was blank, the disk never quit spinning, and I was still setting up the system and hadn't yet switched to runlevel 5.  So long as it was trying to display the text screen, those errors were being logged.
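For readers who want to pin the X driver by hand, a Device section is the part of the X configuration that names the driver. The sketch below writes one to a scratch file for review. The Identifier value is a made-up label, and the sketch assumes the server fills in the sections that are left out; it is not the file Brent generated.

```shell
#!/bin/sh
# Write a minimal Device section to a scratch file for review.
# "Card0" is a hypothetical label; any unique string works as an Identifier.
conf=$(mktemp)
cat > "$conf" <<'EOF'
Section "Device"
    Identifier "Card0"
    Driver     "nouveau"
EndSection
EOF

# Show the result; merging it into /etc/X11/xorg.conf is left to the reader.
cat "$conf"
```

Keeping the fragment in a scratch file first avoids overwriting a working configuration by accident.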
Comment 6 Marcin Kościelnicki 2010-04-11 11:19:03 UTC
Could you try with http://0x04.net/~mwk/0002-drm-nv04-Implement-missing-nv04-PGRAPH-methods-in-so.patch and without the noaccel option?
Comment 7 Brent 2010-05-11 23:35:47 UTC
Tried the above patch against the stock 2.6.33.3 kernel, with acceleration turned on, and got "(EE) [drm] failed to open device" in Xorg.0.log.  But it did give me an 80x50 text console, so it seems the framebuffer works.  And it is no longer filling the logs with error messages.  So that's something.

From what I read, nouveau and drm are very touchy about matching versions.  I started with the kernel configuration for Arch Linux, but ended up removing many modules as otherwise that 350 MHz Pentium II needed 6 hours to compile.  I updated the system before I tried the patch.  These are the current versions for Arch Linux:

nouveau-drm 0.0.16_20100313-2
xf86-video-nouveau 0.0.15_git20100314-1
kernel26 2.6.33.3-2


Here is the tail of Xorg.0.log:

(II) Module nouveau: vendor="X.Org Foundation"
        compiled for 1.7.5.902, module version = 0.0.15
        Module class: X.Org Video Driver
        ABI class: X.Org Video Driver, version 6.0
(II) NOUVEAU driver 
(II) NOUVEAU driver for NVIDIA chipset families :
        RIVA TNT    (NV04)
        RIVA TNT2   (NV05)
        GeForce 256 (NV10)
        GeForce 2   (NV11, NV15)
        GeForce 4MX (NV17, NV18)
        GeForce 3   (NV20)
        GeForce 4Ti (NV25, NV28)
        GeForce FX  (NV3x)
        GeForce 6   (NV4x)
        GeForce 7   (G7x)
        GeForce 8   (G8x)
(II) Primary Device is: PCI 01@00:00:0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 7, (OK)
drmOpenByBusid: Searching for BusID pci:0000:01:00.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 7, (OK)
drmOpenByBusid: drmOpenMinor returns 7
drmOpenByBusid: drmGetBusid reports pci:0000:01:00.0
(EE) [drm] failed to open device
(EE) No devices detected.

Fatal server error:
no screens found
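When a log is longer than the tail shown above, the error lines can be pulled out directly, since the X server prefixes them with "(EE)". This is a small sketch with a hypothetical helper name, xorg_errors, run against three lines copied from the log above.

```shell
#!/bin/sh
# Print only the error lines, which the X server prefixes with "(EE)".
# xorg_errors is a hypothetical helper name.
xorg_errors() {
    # In a basic regular expression the parentheses are literal characters.
    grep '^(EE)' "$1"
}

# Sample input: three lines taken from the log tail in this comment.
sample=$(mktemp)
cat > "$sample" <<'EOF'
drmOpenByBusid: drmGetBusid reports pci:0000:01:00.0
(EE) [drm] failed to open device
(EE) No devices detected.
EOF

errors=$(xorg_errors "$sample")
count=$(printf '%s\n' "$errors" | grep -c '^(EE)')
echo "$errors"
echo "error lines: $count"
rm -f "$sample"
```

On a real system the function would be given the path of the X server log, for example the Xorg.0.log mentioned above.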
Comment 8 Xavier 2010-05-12 02:45:17 UTC
I am not sure I understood, but if you built a custom kernel, you need to rebuild the external modules (nouveau-drm 0.0.16_20100313-2) for it, because this package only provides modules built for the Arch kernel. Check the list of installed files with pacman -Ql nouveau-drm.

It's also in nouveau-drm (the external modules) that you need to apply the patch. Did you apply the patch to a 2.6.33 kernel and try to use the nouveau module shipped with that kernel? If so, that will give you ABI 0.0.15, so you would also need to downgrade both libdrm and xf86-video-nouveau to the previous version to match the ABI.

IMO the simplest way is to follow the instructions here to fetch the kernel from git, apply mwk's patch with git am, then build, install, and enjoy:
http://nouveau.freedesktop.org/wiki/InstallDRM

If you insist on using a package, you could try:
http://aur.archlinux.org/packages.php?ID=30158

By getting it from git, you will have nouveau 0.0.16 which will work with your current libdrm and xf86-video-nouveau.
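Whether a module was built for the kernel that is running can be read from its vermagic string, whose first field is the kernel release. The sketch below compares that field with a release string. The helper name module_matches_kernel and both sample values are hypothetical; real values would come from "modinfo -F vermagic nouveau" and "uname -r".

```shell
#!/bin/sh
# Succeed if a module's vermagic string names the given kernel release.
# The first whitespace-separated field of vermagic is the kernel release.
# module_matches_kernel is a hypothetical helper name.
module_matches_kernel() {
    # $1 = vermagic string, $2 = kernel release (as printed by uname -r)
    [ "${1%% *}" = "$2" ]
}

# Hypothetical values for illustration; real ones come from
#   modinfo -F vermagic nouveau   and   uname -r
vermagic="2.6.33-ARCH SMP preempt mod_unload"
custom_kernel="2.6.33.3"

if module_matches_kernel "$vermagic" "$custom_kernel"; then
    verdict="module matches kernel"
else
    verdict="module was built for ${vermagic%% *}, rebuild it for $custom_kernel"
fi
echo "$verdict"
```

A mismatch like the one in this sample is the situation described above: a packaged module built for the distribution kernel, installed next to a custom kernel.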
Comment 9 Brent 2010-05-12 14:59:26 UTC
Created attachment 35601
Xorg log (1.7.6-3, 0.0.15_git20100314-1, 2.4.19-2, 2.6.34-rc5)

xorg-server 1.7.6-3
xf86-video-nouveau 0.0.15_git20100314-1
libdrm 2.4.19-2
kernel 2.6.34-rc5
Comment 10 Brent 2010-05-12 15:01:37 UTC
I built a custom kernel with the source of 2.6.33.3 from kernel.org, not from an Arch Linux package.  I patched it with the above mwk patch.  That resulted in the error I gave in the previous comment.  

I am not clear on which parts of Nouveau are which.  The nouveau-drm package seems to be only kernel modules.  When I built my custom kernel, I made sure it had its own Nouveau modules.  I am not sure what is in libdrm (/usr/lib/libdrm_nouveau.so), but Pacman reports a version number of 2.4.19-2.  From what I read, the version of Nouveau in 2.6.33.3 is too old to work with versions of libdrm from 2.4.18 on.  I suppose this means that the Arch Linux people must have patched their version of Linux kernel 2.6.33.3, otherwise it should not have worked with that version of libdrm?  Finally, there is the xf86-video-nouveau package, which now is version 0.0.15_git20100314-1.

This time, I cloned the git repository of the Nouveau sources with this command:
"git clone --depth 1 git://anongit.freedesktop.org/nouveau/linux-2.6", which turned out to be version 2.6.34-rc5 of the Linux kernel.  The patch utility refused to apply the mwk patch, as its changes were already there.  So I tried that kernel as is, and this time X segfaulted, dropping the computer back to the 80x50 text screen.  Should I build different versions of libdrm, or xf86-video-nouveau?

Attached is the Xorg.0.log resulting from this combination of Nouveau kernel 2.6.34-rc5, libdrm 2.4.19-2, and xf86-video-nouveau 0.0.15_git20100314-1.
Comment 11 Xavier 2010-05-13 02:10:38 UTC
AFAIK, your last combination is fine and is what I wanted you to try.
Can you also paste a full kernel log from that combination?

I am afraid that very few people use nouveau on a TNT card, so it's much less likely to work than anything newer (even the first GeForce cards).

I doubt updating libdrm and xf86-video-nouveau would fix your problem. But building them yourself should at least give you binaries with debug symbols that have not been stripped, which could provide a better segfault trace.
Instructions are here: http://nouveau.freedesktop.org/wiki/InstallNouveau

(btw Arch Linux doesn't patch the kernel. As you noticed yourself, they ship a nouveau-drm package with external modules, which can be more recent than what the kernel version provides).
Comment 12 Brent 2010-05-13 09:51:11 UTC
Created attachment 35628
messages.log

3 boots in this log:

Arch Linux kernel 2.6.33.3, noaccel
Nouveau Linux kernel 2.6.34-rc5, noaccel
Nouveau Linux kernel 2.6.34-rc5, accel
Comment 13 Brent 2010-05-13 12:56:48 UTC
Created attachment 35639
Xorg log (1.7.6-3, 0.0.16, 2.4.20, 2.6.34-rc5)

Tried again, this time with the latest source code packages, obtained with git from Freedesktop.org according to the instructions given above.  (It was painful trying to figure out all the packages that were needed.  Here is the list: autoconf, automake, libtool, pkgconfig, pth, git, xf86driproto, dri2proto, xorg-util-macros.  libpthread-stubs from x.org is also needed.)

So with kernel 2.6.34-rc5, libdrm 2.4.20, and xf86-video-nouveau 0.0.16, it worked!  For about 15 minutes, and then X crashed :(.  Here are some logs.
Comment 14 Brent 2010-05-13 13:01:29 UTC
Created attachment 35640
Xorg log (1.7.6-3, 0.0.16, 2.4.20, 2.6.34-rc5)
Comment 15 Brent 2010-05-13 13:07:23 UTC
Created attachment 35641
kernel log spam

Spam generated by adding "options drm debug=1" to /etc/modprobe.d/modprobe.conf
Comment 16 Marcin Kościelnicki 2010-05-13 13:28:57 UTC
The original bug is fixed already, please open a new one if you're seeing other problems.
Comment 17 Mauro Molinari 2013-05-24 14:03:59 UTC
Hi Marcin,
I have a Creative Labs Graphics Blaster Riva TNT and I encountered this problem on both Debian Squeeze (with 2.6.32 kernel) and Linux Mint 12 LXDE (with 3.0 kernel). Symptom: during boot, the screen goes blank (some garbage shown) and the boot process does not terminate. dmesg shows an endless list of "PGRAPH_NOTIFY - nSource: PATCH_EXCEPTION, nStatus: INVALID_STATE PROTECTION_FAULT" errors.

To work around the problem I was using the "nomodeset" kernel boot parameter. Then I found this bug report and switched to "nouveau.noaccel=1", which at least gives me a high-resolution console.

If I understand correctly, the problem is now solved (i.e. nouveau.noaccel=1 is no longer needed), but which kernel version carries the fixed nouveau driver?

