Bug 28095 - X crash with PFIFO_CACHE_ERROR. (Nouveau on Riva TNT).
Summary: X crash with PFIFO_CACHE_ERROR. (Nouveau on Riva TNT).
Status: NEEDINFO
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-13 14:38 UTC by Brent
Modified: 2015-03-17 16:10 UTC (History)
0 users

See Also:


Attachments
kernel log (14.32 KB, application/octet-stream)
2010-05-13 14:38 UTC, Brent
Xorg log (1.7.6-3, 0.0.16, 2.4.20, 2.6.34-rc5) (18.74 KB, text/plain)
2010-05-13 14:39 UTC, Brent
/var/log/messages, Puppy Linux, kernel version 3.14.20 (9.99 KB, text/plain)
2015-01-26 01:12 UTC, Brent
/var/log/Xorg.0.log, Puppy Linux, kernel version 3.14.20 (4.46 KB, text/plain)
2015-01-26 01:13 UTC, Brent
/var/log/messages, Puppy Linux, kernel version 3.14.20 (50.11 KB, text/plain)
2015-01-26 01:15 UTC, Brent
/var/log/Xorg.0.log, Puppy Linux, kernel version 3.14.20 (18.04 KB, text/plain)
2015-01-26 01:16 UTC, Brent
picture of screen in 800x600 mode (365.85 KB, image/jpeg)
2015-02-11 00:25 UTC, Brent
/var/log/messages with debug, Puppy Linux, kernel version 3.17.7 (75.49 KB, text/plain)
2015-02-17 21:15 UTC, Brent
Xorg.0.log with debug, Puppy Linux 6 with kernel 3.17.7 (17.46 KB, text/plain)
2015-02-17 21:18 UTC, Brent

Description Brent 2010-05-13 14:38:57 UTC
Created attachment 35642 [details]
kernel log

System is a Pentium II with a Diamond Viper V550 (Riva TNT) card.  OS is Arch Linux, with latest versions of Nouveau from git repositories on freedesktop.org:

kernel 2.6.34-rc5
libdrm 2.4.20
xf86-video-nouveau 0.0.16

I have "options drm debug=1" in /etc/modprobe.d/modprobe.conf.  Acceleration is enabled.

With this setup, X works for a few minutes and then crashes.  I was running Firefox and lxterminal (I am using LXDE with the Openbox window manager).  The first problem I saw was that the titlebar for lxterminal was not always rendered; in its place was graphics grabbed from the web page behind it.  The crash happened when I tried to search for something in Firefox's search dialog at the upper left.  It went into a loop in which the pop-up menu of suggested search terms flickered on and off perhaps 10 times, then X crashed.  I saw many of these messages in kernel.log:

[drm] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 15/5 Mthd 0x0638 Data 0x00dcdad5
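
To see how often this error fires and whether it always hits the same method, the kernel log can be scanned with a short script. This is only a sketch and not part of nouveau: the names `PFIFO_RE`, `parse_pfifo_errors` and `count_methods` are made up for illustration, the pattern is modelled on the single message quoted above, and reading `Ch 15/5` as channel/subchannel is an assumption.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in this report (assumption:
# other kernel versions may print the line differently). \s+ also matches
# a newline, so a message wrapped across two lines is still found.
PFIFO_RE = re.compile(
    r"PFIFO_CACHE_ERROR - Ch\s+(\d+)/(\d+)\s+"
    r"Mthd\s+(0x[0-9a-fA-F]+)\s+Data\s+(0x[0-9a-fA-F]+)"
)


def parse_pfifo_errors(log_text):
    """Return (channel, subchannel, method, data) tuples of ints."""
    return [
        (int(ch), int(subc), int(mthd, 16), int(data, 16))
        for ch, subc, mthd, data in PFIFO_RE.findall(log_text)
    ]


def count_methods(errors):
    """Count how many times each method offset appears."""
    return Counter(mthd for _, _, mthd, _ in errors)


sample = (
    "[drm] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch \n"
    "15/5 Mthd 0x0638 Data 0x00dcdad5\n"
)
errors = parse_pfifo_errors(sample)
for mthd, hits in count_methods(errors).items():
    print(f"method {mthd:#06x}: {hits} hit(s)")
```

Feeding it the full contents of the attached kernel log instead of `sample` would show whether method 0x0638 is the only one involved.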
Comment 1 Brent 2010-05-13 14:39:28 UTC
Created attachment 35643 [details]
Xorg log (1.7.6-3, 0.0.16, 2.4.20, 2.6.34-rc5)
Comment 2 Pierre Moreau 2014-12-09 17:57:06 UTC
Moving to Nouveau.

Could you please retest with a recent kernel?
(The bug went unnoticed as it wasn't linked to Nouveau.)
Comment 3 Brent 2015-01-26 01:11:29 UTC
Tested with Puppy Linux with kernel 3.14.20.  This is with Tahrpup 6.0, with an updated kernel released by the maintainer of Tahrpup.

The Xserver has no mouse pointer, and displays only vertical stripes.  But ctrl-alt-F2 worked and I can still use the text screen console.  Looked at /var/log/messages and saw PROTECTION_FAULT errors.

Tahrpup 6.0 is here:
http://distro.ibiblio.org/puppylinux/puppy-tahr/iso/tahrpup%20-6.0-CE/tahr-6.0-CE_PAE.iso
And the update to kernel 3.14.20 is here:
http://distro.ibiblio.org/puppylinux/puppy-tahr/kernels/kernel-3.14.20_PAE_update.tar.xz

I see that there is a newer update to kernel 3.17, which I will try next.

Sample of the error messages:

Jan 26 16:04:59 puppypc29622 user.err kernel: nouveau E[  PGRAPH][0000:01:00.0]  NOTIFY nsource: PROTECTION_ERROR nstatus: PROTECTION_FAULT
Jan 26 16:04:59 puppypc29622 user.err kernel: nouveau E[  PGRAPH][0000:01:00.0] ch 1 [X[10566]] subc 2 class 0x0042 mthd 0x0180 data 0x00003a04
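
The second line of each pair carries the details (channel, process, subchannel, object class, method, data). A small parser makes it easier to check whether every fault in /var/log/messages involves the same class and method. This is a sketch built only from the two sample lines above; `PgraphError`, `PGRAPH_RE` and `parse_pgraph_errors` are hypothetical names, and the field meanings are inferred from the labels in the message.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PgraphError:
    channel: int
    process: str
    pid: int
    subchannel: int
    obj_class: int
    method: int
    data: int


# Matches the detail line only; the "NOTIFY nsource: ..." line is skipped
# because it has no "ch N [...]" part.
PGRAPH_RE = re.compile(
    r"PGRAPH\]\[[0-9a-fA-F:.]+\]\s+ch\s+(\d+)\s+\[([^\[\]]+)\[(\d+)\]\]\s+"
    r"subc\s+(\d+)\s+class\s+(0x[0-9a-fA-F]+)\s+"
    r"mthd\s+(0x[0-9a-fA-F]+)\s+data\s+(0x[0-9a-fA-F]+)"
)


def parse_pgraph_errors(log_text):
    """Return one PgraphError per matching detail line in log_text."""
    found = []
    for m in PGRAPH_RE.finditer(log_text):
        ch, proc, pid, subc, cls, mthd, data = m.groups()
        found.append(PgraphError(int(ch), proc, int(pid), int(subc),
                                 int(cls, 16), int(mthd, 16), int(data, 16)))
    return found


sample = (
    "Jan 26 16:04:59 puppypc29622 user.err kernel: nouveau E[  PGRAPH][0000:01:00.0]"
    "  NOTIFY nsource: PROTECTION_ERROR nstatus: PROTECTION_FAULT\n"
    "Jan 26 16:04:59 puppypc29622 user.err kernel: nouveau E[  PGRAPH][0000:01:00.0]"
    " ch 1 [X[10566]] subc 2 class 0x0042 mthd 0x0180 data 0x00003a04\n"
)
faults = parse_pgraph_errors(sample)
# Distinct (class, method) pairs seen in the log.
pairs = {(f.obj_class, f.method) for f in faults}
print(sorted(pairs))
```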
Comment 4 Brent 2015-01-26 01:12:50 UTC
Created attachment 112819 [details]
/var/log/messages, Puppy Linux, kernel version 3.14.20
Comment 5 Brent 2015-01-26 01:13:41 UTC
Created attachment 112820 [details]
/var/log/Xorg.0.log, Puppy Linux, kernel version 3.14.20
Comment 6 Brent 2015-01-26 01:15:35 UTC
Created attachment 112821 [details]
/var/log/messages, Puppy Linux, kernel version 3.14.20
Comment 7 Brent 2015-01-26 01:16:04 UTC
Created attachment 112822 [details]
/var/log/Xorg.0.log, Puppy Linux, kernel version 3.14.20
Comment 8 Brent 2015-01-27 21:14:57 UTC
Tried Puppy Linux 6 with Linux kernel version 3.17.7, and got the identical errors as with 3.14.20.

I can't say if the PFIFO_CACHE_ERROR is still present, because I can't get past this error to run any XWindows apps.  X is running, it has not crashed, but the display is corrupted and unusable.
Comment 9 Ilia Mirkin 2015-01-27 21:18:42 UTC
(In reply to Brent from comment #8)
> Tried Puppy Linux 6 with Linux kernel version 3.17.7, and got the identical
> errors as with 3.14.20.
> 
> I can't say if the PFIFO_CACHE_ERROR is still present, because I can't get
> past this error to run any XWindows apps.  X is running, it has not crashed,
> but the display is corrupted and unusable.

Those protection errors are expected... there's some minor difference between NV04 and NV05, and so those happen.... iirc with the NOP handle or class or something. I looked into it at one point and decided I didn't care enough (see https://bugs.freedesktop.org/show_bug.cgi?id=68854). Does it cause any actual issues though?
Comment 10 Ilia Mirkin 2015-01-27 21:33:50 UTC
(In reply to Brent from comment #8)
> X is running, it has not crashed, but the display is corrupted and unusable.

Oh, that's the issue. OK. At what point does the display become corrupted? Can you describe the corruption, and/or take a photo?
Comment 11 Brent 2015-02-04 02:26:02 UTC
The display shows thin vertical stripes.  The very top line is solid, the rest of the lines alternate between white and light gray.  Best as I can make out, the stripes are 3 pixels wide.  I do not see a mouse pointer.  But if I move the mouse to the upper left and try a right click, the left side of the screen turns black, and if I hit Esc while this black area is on the screen, it turns back to the stripes.  I'll have to run Puppy Linux on another computer to see what menu I'm calling up when I right click, but X is accepting mouse and keyboard input.
Comment 12 Brent 2015-02-04 02:29:32 UTC
The display is corrupt at the start.  After Puppy leaves the text console during boot, the screen goes black for a minute, then straight to the vertical stripes.
Comment 13 Brent 2015-02-11 00:25:23 UTC
Created attachment 113332 [details]
picture of screen in 800x600 mode

What happened to ctrl-alt + and - to change resolutions?  Was able to switch between 1024x768 and 800x600 with xrandr.
Comment 14 Ilia Mirkin 2015-02-14 20:54:55 UTC
So by the sounds of it, we broke the display stuff at some point between pre-history (i.e. before nouveau was in mainline) and... now. I'm fairly sure that it's a fairly self-contained problem -- my NV05 TNT2 M64 works fine (or at least did recently, I think I last tested after 3.10 or so). Another user's NV04 also worked OK on linux-3.9 (from bug 68854), although it was the Creative Labs one. However he also had the resolution issue (maxing out at 1024x768).

You have one of the (infamous?) BMP v0 vbioses. It may be interesting to attach it to this bug, although I sincerely doubt that it is the source of the issues (before BMP v2 or so, we just ignore it entirely). With nouveau loaded, you can get it from /sys/kernel/debug/dri/0/vbios.rom .

The unfortunate reality here is that you are in possession of a fairly unique piece of hardware, and that (I believe) similar HW works fine. So... it's something "funny" going on.

You might try booting with

nouveau.debug=debug drm.debug=14

and capturing the kernel messages from that. Perhaps something interesting will come up.
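
For reference, drm.debug is a bitmask, so 14 turns on several message categories at once, while the debug=1 used in the original report turns on only one. The sketch below decodes a value into category names. The four bits listed are the ones I believe the 3.x-era DRM core defines (core, driver, KMS, PRIME); treat the table as an assumption and check drm_print.h or drmP.h for your kernel, since later kernels add more bits. `DRM_DEBUG_BITS` and `decode_drm_debug` are hypothetical names.

```python
# Assumed drm.debug bit layout for 3.x kernels; newer kernels define more.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(value):
    """Return the names of the categories enabled by a drm.debug value."""
    names = [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if value & bit]
    # Report any bits this table does not know about instead of dropping them.
    unknown = value & ~sum(DRM_DEBUG_BITS)
    if unknown:
        names.append(f"unknown({unknown:#x})")
    return names


for setting in (1, 14):
    print(setting, decode_drm_debug(setting))
```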
Comment 15 Brent 2015-02-17 18:43:35 UTC
Yes, someone else said that hardware was unusual.  But Nouveau did work with it, at least somewhat, in the later 2.6 kernels.  If it helps at all, the Diamond Viper V550 video card reports a version of 1.93E during POST.

Tried older versions of Puppy Linux, and ran into other problems.  First, the older Puppy tried to load the saved configuration files from the newer Puppy.  Once I stopped that, I got this.  In Xorg.0.log, 5.3.3 Slacko Puppy says:

failed to load kernel module "nouveau"

Puppy falls back to the vesa driver, which, while very slow, does work, and at 1280x1024.

If I do "modprobe nouveau" I get 7 error messages of the form "WARNING: Error inserting xxx (/lib/modules/3.1.10-slacko_4gA/kernel/drivers/.." where xxx is:

button
wmi
agpgart
drm
ttm
drm_kms_helper
nouveau

Tried the debug boot parameters you suggested, with Slacko Puppy and Tahr Puppy but they had no effect.  There is no debug subdirectory in /proc/sys/kernel.  No doubt the stock Puppy kernel does not have debugging included.  Suppose I'll have to build my own kernel to get that, which means a good bit of work.
Comment 16 Ilia Mirkin 2015-02-17 19:03:30 UTC
(In reply to Brent from comment #15)
> Tried the debug boot parameters you suggested, with Slacko Puppy and Tahr
> Puppy but they had no effect.  There is no debug subdirectory in
> /proc/sys/kernel.  No doubt the stock Puppy kernel does not have debugging
> included.  Suppose I'll have to build my own kernel to get that, which means
> a good bit of work.

/sys/kernel/debug not /proc/sys/kernel/debug
Comment 17 Brent 2015-02-17 21:15:53 UTC
Created attachment 113580 [details]
/var/log/messages with debug, Puppy Linux, kernel version 3.17.7

There is a /sys/kernel/debug directory, but it is empty.  However, /var/log/messages has a lot more lines, so I have attached a copy.
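
An empty /sys/kernel/debug usually means that debugfs is simply not mounted, rather than that the kernel lacks it; on a kernel built with CONFIG_DEBUG_FS it can normally be mounted by root with `mount -t debugfs none /sys/kernel/debug`. A quick way to check is to look for a debugfs entry in /proc/mounts. The helper below is a sketch; the function names are made up, and it is shown running on sample text so that it does not depend on the machine it runs on.

```python
def debugfs_mount_points(mounts_text):
    """Return the mount points of every debugfs entry in /proc/mounts text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def read_mounts(path="/proc/mounts"):
    """Read the mount table, returning an empty string if it is unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


sample = (
    "proc /proc proc rw,relatime 0 0\n"
    "sysfs /sys sysfs rw,relatime 0 0\n"
    "debugfs /sys/kernel/debug debugfs rw,relatime 0 0\n"
)
print(debugfs_mount_points(sample))
```

On the test machine, `debugfs_mount_points(read_mounts())` returning an empty list would confirm that debugfs is not mounted.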
Comment 18 Brent 2015-02-17 21:18:48 UTC
Created attachment 113581 [details]
Xorg.0.log with debug, Puppy Linux 6 with kernel 3.17.7
Comment 19 Brent 2015-03-10 22:58:53 UTC
Been trying to track down the oldest kernel that shows the corrupted display, but have run into further problems.  It seems nouveau with the early 3.x kernels does not work at all on this NV04 video card.  I think 3.4 is the first version with nouveau included in the kernel?  Have tried 3.1 (Slacko Puppy 5.3.3), a 3.4.106 kernel I built on the Arch Linux system with the 2.6.35 kernel that does work, and 3.5 (AntiX Linux 12).  Puppy and AntiX fall back to the VESA driver, and Arch with the custom 3.4.106 kernel and X 1.8.1.902 just drops to a text screen.  All try to use nouveau, and all give the same "(EE) [drm] failed to open device" message in Xorg.0.log.
Comment 20 Ilia Mirkin 2015-03-11 18:41:12 UTC
(In reply to Brent from comment #19)
> Been trying to track down the oldest kernel that shows the corrupted
> display, but have run into further problems.  It seems nouveau with the
> early 3.x kernels does not work at all on this NV04 video card.  I think 3.4
> is the first version with nouveau included in the kernel?  Have tried 3.1
> (Slacko Puppy 5.3.3), a 3.4.106 kernel I built on the Arch Linux system with
> the 2.6.35 kernel that does work, and 3.5 (AntiX Linux 12).  Puppy and AntiX
> fall back to the VESA driver, and Arch with the custom 3.4.106 kernel and X
> 1.8.1.902 just drops to a text screen.  All try to use nouveau, and all give
> the same "(EE) [drm] failed to open device" message in Xorg.0.log.

nouveau was first merged in 2.6.33 as a staging driver. It moved out of staging in kernel 3.4. It's possible that your userspace expects a newer abi than the old nouveau provided.

This feels like some sort of tiling fail. Do things work better if you boot with

nouveau.noaccel=1 nouveau.nofbaccel=1
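
Since some live distributions rewrite the boot line, it can be worth confirming that the options actually reached the kernel by looking at /proc/cmdline after boot. A rough sketch follows; `parse_cmdline` and `missing_options` are hypothetical helpers, and the parser splits on whitespace only, so it ignores quoted values.

```python
def parse_cmdline(cmdline_text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as 'quiet') map to an empty string.
    """
    params = {}
    for token in cmdline_text.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def missing_options(cmdline_text, wanted):
    """Return the wanted name=value pairs that are absent or set differently."""
    params = parse_cmdline(cmdline_text)
    return {k: v for k, v in wanted.items() if params.get(k) != v}


wanted = {"nouveau.noaccel": "1", "nouveau.nofbaccel": "1"}
sample = "BOOT_IMAGE=/vmlinuz nouveau.noaccel=1 quiet"
# On a real system, read the text from /proc/cmdline instead of `sample`.
print(missing_options(sample, wanted))
```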
Comment 21 Brent 2015-03-17 16:10:13 UTC
Tried the noaccel and nofbaccel kernel options under Puppy Linux 6.0 with the 3.17.7 kernel.  Didn't make any difference to the display, it is still bad, in the same way as without those options.  Xorg.0.log confirms that acceleration is indeed off.  It is mostly the same as the log I uploaded here, except after the line "(--) Depth 24 pixmap format is 32 bpp", I see:

(EE) NOUVEAU(0): Error creating GPU channel: -19
(EE) NOUVEAU(0): Error initialising acceleration.  Falling back to NoAccel
(==) NOUVEAU(0): Backing store enabled
...

I also tried kernel 3.10.71 on the Arch system.  It was no different from 3.4.106: "[drm] failed to open device".  Yes, I know there could be an API mismatch, and maybe the 3.4 and 3.10 kernels would work with a more recent XWindows.

