Bug 71659

Summary:

[NVD9] Hangs under load with ![ PFIFO][0000:01:00.0] unhandled status 0x00800000

Product:

xorg

Reporter:

Rebecca Palmer <rebecca_palmer>

Component:

Driver/nouveau

Assignee:

Nouveau Project <nouveau>

Status:

RESOLVED MOVED

QA Contact:

Xorg Project Team <xorg-team>

Severity:

major

Priority:

high

CC:

gamaro100, jv356, rapiteanu.catalin

Version:

git

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

Attachments:

Description	Flags
Xorg log	none
kernel log	none
kernel log 2	none
Xorg log 2	none
Ubuntu 14.04 (Linux 3.13.0) log	none
kernel log 3.13rc7 and 3.11+firmware	none
kernel log 3.13rc+firmware	none
kernel log 3.14rc-Mar-26	none
Xorg log 3.14rc-Mar-26	none
kernel log 3.15rc-Jun-27	none
system log	none
system log nouveau kernel linux-3.16 branch	none
Xorg-21-11-15 - Nouveau Xorg failure	none
dmesg log after the freeze occurred.	none
Added a more detailed dmesg debug log for the problem.	none
Full dmesg log with kernel 4.4.1	none

Description Rebecca Palmer 2013-11-15 20:14:42 UTC

With the Nouveau driver selected, my NVS 4200M/Ubuntu 13.10 system frequently hangs when under heavy graphics load (flightgear usually triggers this within a few minutes) and/or on battery power.  This does not happen with the nvidia-319 binary driver.

The mouse pointer continues to move at first, but often freezes later; the keyboard LEDs do not react.  Sounds already playing continue until finished, but no new ones start (i.e. applications are frozen).  Alt+SysRq works.

The kernel log always has the error
nouveau ![   PFIFO][0000:01:00.0] unhandled status 0x00800000
usually preceded by several instances of the warning
nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
and followed by a wide range of other errors.
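
For anyone sifting through the attached logs, the message shapes quoted above can be picked out mechanically. The following is only a sketch in Python: the regular expression is inferred from the lines quoted in this report (a level marker, a bracketed unit name, an optional bracketed PCI address), and the function name `parse_nouveau_line` is made up for illustration, not part of any nouveau tool.

```python
import re

# Pattern inferred only from lines quoted in this report, e.g.
#   nouveau ![   PFIFO][0000:01:00.0] unhandled status 0x00800000
#   nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
# It is an assumption that other nouveau messages follow the same shape.
NOUVEAU_LINE = re.compile(
    r"nouveau\s+(?P<level>[!EW]?)\[\s*(?P<unit>\w+)\]"
    r"(?:\[(?P<pci>[0-9a-fA-F:.]+)\])?\s+(?P<msg>.*)$"
)


def parse_nouveau_line(line):
    """Return a dict with level, unit, pci and msg, or None if no match.

    The level is returned as the raw marker character ('!', 'E', 'W' or
    an empty string); no meaning is attached to it here.
    """
    match = NOUVEAU_LINE.search(line)
    if match is None:
        return None
    return match.groupdict()
```

Calling `parse_nouveau_line` on each line of a kernel log and keeping the results whose `unit` is `PFIFO` gives the sequence of warnings and errors leading up to a hang.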

This happens with both the default (3.11.0) kernel (Ubuntu bug https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1243557 ) and the latest git kernel, with the default userspace (libdrm-nouveau 2.4.46/xserver-xorg-video-nouveau 1.0.9/mesa 9.2.1); the attached logs are with the latter (crashes at Nov 15 18:05 and 18:42 in the kernel log, the Xorg log is from the second of these).

With the latest git userspace (xorg/proto/dri3proto, mesa/drm, xcb/libxcb, xorg/lib/libxshmfence, mesa/mesa, xorg/proto/presentproto, xcb/proto, nouveau/xf86-video-nouveau), the system hangs on boot (at a correctly displayed graphical splash screen), with Alt+SysRq not working and nothing recognisable as an error in the logs.  I suspect this is due to an incompatible combination of Ubuntu and latest-git (the recommended mesa/drm, nouveau/xf86-video-nouveau, mesa/mesa wouldn't compile on its own, as it needs dri3 which isn't in Ubuntu yet).

Comment 1 Rebecca Palmer 2013-11-15 20:16:10 UTC

Created attachment 89286 [details]
Xorg log

Comment 2 Ilia Mirkin 2013-11-15 20:19:30 UTC

There was a bug introduced in 3.11 (I think) for nvc1,nvd7,nvd9 which is fixed by http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=89ad841ffd3eccd06e2a12649f4a5028ecb973b7. I'm not sure what the user-visible effect of the bug is; I suspect it depends on a lot of local configuration settings. I don't know if this will help you, but I believe it's worth trying.

Comment 3 Rebecca Palmer 2013-11-15 20:20:09 UTC

Created attachment 89287 [details]
kernel log

Comment 4 Rebecca Palmer 2013-11-15 20:29:11 UTC

> http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=89ad841ffd3eccd06e2a12649f4a5028ecb973b7

The git kernel I tested was yesterday's, so would already have that fix.

Comment 5 Rebecca Palmer 2013-11-18 18:54:11 UTC

Created attachment 89421 [details]
kernel log 2

Comment 6 Rebecca Palmer 2013-11-18 18:54:51 UTC

Created attachment 89422 [details]
Xorg log 2

Comment 7 Rebecca Palmer 2013-11-18 18:59:34 UTC

Switching to the 9.2 branch of mesa makes the git userspace start, but it still has the original bug.  In the attached logs:

Nov 18 16:29 and Xorg log: git kernel, git userspace (libdrm head, xf86-video-nouveau head, mesa 9.2 branch head)
Nov 18 16:37: Ubuntu kernel, git userspace

Comment 8 Rebecca Palmer 2013-11-18 23:46:11 UTC

This bug seemed to become harder to trigger when I installed git-as-of-Oct-26, then easier again when I installed git-as-of-Nov-14, but given that this change seemed to persist after returning to 3.11 (suggesting a left-behind configuration change) and the general randomness of the bug, this would not give a reliable bisection (so the date mismatch doesn't rule out #71662 being the same bug).

In 3.8.0 the same symptoms occur but the log message is
nouveau ![   PFIFO][0000:01:00.0] unhandled status 0x01000000
I suspect (but have not tested) that this change was http://cgit.freedesktop.org/nouveau/linux-2.6/commit/drivers/gpu/drm/nouveau/core/engine/fifo/nvc0.c?id=32256c87ead3edec86bed5023a0ff96a6d907931 , i.e. the error in 3.8.0 is what is now reported as the warning.

Is 0x00800000 unhandled because it is an inherently fatal error, or because nobody outside Nvidia knows what it means? https://github.com/envytools/envytools/blob/master/hwdocs/fifo/nvc0-pfifo.rst is a contentless stub.
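
The status values in these messages are bit masks, so it can help to see which bits are set. A small Python sketch (the helper name `set_bits` is illustrative, and nothing here says what any bit means):

```python
def set_bits(status):
    """Return the positions (0 = least significant) of the bits set in status."""
    return [bit for bit in range(32) if status & (1 << bit)]


# Values taken from the log messages quoted in this report.
for value in (0x00800000, 0x01000000, 0x00000005):
    print(f"0x{value:08x}: bits {set_bits(value)}")
```

This shows that 0x00800000 is bit 23 alone, 0x01000000 is bit 24 alone, and the 0x00000005 value attached to the warning has bits 0 and 2 set.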

Comment 9 Ilia Mirkin 2013-11-27 05:06:57 UTC

*** Bug 71662 has been marked as a duplicate of this bug. ***

Comment 10 Rebecca Palmer 2014-01-09 21:25:51 UTC

Created attachment 91782 [details]
Ubuntu 14.04 (Linux 3.13.0) log

This bug is easier to trigger in Ubuntu Trusty (kernel 3.13.0, libdrm-nouveau2 2.4.50, xserver-xorg-video-nouveau 1.0.10, mesa 10.0.1), occurring immediately on heavy graphics load and within a few minutes even in ordinary use.

The attached log contains more instances of the warning than were normal in 13.10, but not the final error (I suspect it wasn't synced to disk).

Comment 11 Ilia Mirkin 2014-01-09 21:34:04 UTC

Out of curiosity, does using blob pgraph fw help with this issue?

You can extract it yourself by using the instructions at http://nouveau.freedesktop.org/wiki/NVC0_Firmware/ or you can try using my script (https://raw2.github.com/imirkin/re-vp2/master/extract_firmware.py) with the 325.15 blob, although I'm not _100%_ sure that it produces the correct graph fw for nvd9; that's a very recently-added feature.

Note that you'll need to add nouveau.config=NvGrUseFW=1 to your kernel cmdline, and make sure that the fw is reachable when the nouveau module loads.
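
To confirm that the option actually reached the kernel, the command line can be inspected after boot. A minimal Python sketch, assuming that `nouveau.config=` takes a comma-separated list of `Name=value` items (only the single `NvGrUseFW=1` form appears in this report); the function name `nouveau_config` is hypothetical.

```python
def nouveau_config(cmdline):
    """Extract nouveau.config options from a kernel command line string.

    Assumes a comma-separated list of Name=value items after
    'nouveau.config='; returns them as a dict of strings.
    """
    prefix = "nouveau.config="
    options = {}
    for token in cmdline.split():
        if token.startswith(prefix):
            for item in token[len(prefix):].split(","):
                name, _, value = item.partition("=")
                if name:
                    options[name] = value
    return options
```

On a running Linux system, passing the contents of `/proc/cmdline` to `nouveau_config` and looking up `"NvGrUseFW"` should give `"1"` if the option was applied.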

Comment 12 Rebecca Palmer 2014-01-10 20:39:35 UTC

Created attachment 91842 [details]
kernel log 3.13rc7 and 3.11+firmware

The 3.13-rc7 upstream kernel in Ubuntu 13.10 also sometimes crashes, but not as often as the near-identical kernel in Trusty; this might mean the regression is in userspace, or might be the same persistence (presumably a left-behind configuration change from 3.12) we saw earlier.

The blob firmware (in the standard 3.11 kernel) turns the login screen blank, with no recognisable error in the log.

Comment 13 Ilia Mirkin 2014-01-10 20:49:25 UTC

(In reply to comment #12)
> The blob firmware (in the standard 3.11 kernel) turns the login screen
> blank, with no recognisable error in the log.

Was this with mmiotrace'd firmware or with firmware extracted using my script? If the latter, try moving nvd7_fuc*[cd] over nvd9_fuc*[cd] -- perhaps they're swapped. Also 3.11 contained a bug for nvd7/nvd9 (and nvc1) which was fixed by 89ad841ffd3e and backported to 3.12.x iirc. Although I'm not sure if that's important when the blob fw is used.

Comment 14 Rebecca Palmer 2014-01-13 23:08:54 UTC

It was with your script. Kernel 3.13-rc7 with either the "nvd9" or "nvd7" firmware also gives a blank screen.

Comment 15 Kelly Doran 2014-01-15 03:40:30 UTC

Jan 10 20:09:20 lap14 kernel: [    1.610566] nouveau  [  PGRAPH][0000:01:00.0] using external firmware
Jan 10 20:09:20 lap14 kernel: [    1.611187] nouveau E[  PGRAPH][0000:01:00.0] failed to load fuc409c
Jan 10 20:09:20 lap14 kernel: [    1.611190] nouveau E[  DEVICE][0000:01:00.0] failed to create 0x1800d915, -22
Jan 10 20:09:20 lap14 kernel: [    1.611193] nouveau E[     DRM] failed to create 0x80000080, -22

You must have named the fuc files wrong, or they were not in the initrd. If nouveau fails to load the firmware like this, it simply bails out and the driver won't finish loading, hence the blank screen.

Comment 16 Rebecca Palmer 2014-01-15 09:28:56 UTC

Created attachment 92127 [details]
kernel log 3.13rc+firmware

What does "not in the initrd" mean?  The files were in /lib/firmware/nouveau with the names the script gave them (nvd9_fuc*).

The errors with kernel 3.13-rc7 (attached) are similar:
Jan 13 22:47:57 lap14 kernel: [    1.653189] nouveau  [  PGRAPH][0000:01:00.0] using external firmware
Jan 13 22:47:57 lap14 kernel: [    1.653203] nouveau 0000:01:00.0: Direct firmware load failed with error -2
Jan 13 22:47:57 lap14 kernel: [    1.653204] nouveau 0000:01:00.0: Falling back to user helper
Jan 13 22:47:57 lap14 kernel: [    1.653447] nouveau 0000:01:00.0: Direct firmware load failed with error -2
Jan 13 22:47:57 lap14 kernel: [    1.653452] nouveau 0000:01:00.0: Falling back to user helper
Jan 13 22:47:57 lap14 kernel: [    1.653651] nouveau E[  PGRAPH][0000:01:00.0] failed to load fuc409c
Jan 13 22:47:57 lap14 kernel: [    1.653658] nouveau E[  DEVICE][0000:01:00.0] failed to create 0x1800d916, -22
Jan 13 22:47:57 lap14 kernel: [    1.653663] nouveau E[     DRM] failed to create 0x80000080, -22
Jan 13 22:47:57 lap14 kernel: [    1.654258] nouveau: probe of 0000:01:00.0 failed with error -22

Comment 17 Ilia Mirkin 2014-01-15 09:34:51 UTC

(In reply to comment #16)
> Created attachment 92127 [details]
> kernel log 3.13rc+firmware
> 
> What does "not in the initrd" mean?  The files were in /lib/firmware/nouveau
> with the names the script gave them (nvd9_fuc*).

But was this /lib/firmware/nouveau available when the nouveau module loaded? E.g. if it loads from initrd, this needs to be in the initrd. If nouveau is built-in, the firmware files need to be baked into the kernel (I think).

Comment 18 Rebecca Palmer 2014-01-15 19:20:35 UTC

It still doesn't find the firmware after updating the initramfs (sudo update-initramfs -u -k all), whether it is placed at /lib/firmware/nouveau/nvd9_fuc*, /lib/firmware/nouveau/fuc*, /lib/firmware/<kernel_version>/nouveau/nvd9_fuc* or /lib/firmware/<kernel_version>/nouveau/fuc*.  (The script uses nvd9_fuc* but http://nouveau.freedesktop.org/wiki/NVC0_Firmware/ says just fuc* for pre-NVE0 cards.)

My 3.13-rc7 test kernel was compiled with the procedure in https://wiki.ubuntu.com/KernelTeam/GitKernelBuild; the .config lines containing "nouveau" are
CONFIG_DRM_NOUVEAU=m
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
CONFIG_DRM_NOUVEAU_BACKLIGHT=y
and kernel/drivers/gpu/drm/nouveau/nouveau.ko is on the modules.order list, not the modules.builtin list.

Comment 19 Ilia Mirkin 2014-01-15 19:25:53 UTC

(In reply to comment #18)
> It still doesn't find the firmware after updating the initramfs (sudo
> update-initramfs -u -k all), whether it is placed at
> /lib/firmware/nouveau/nvd9_fuc*, /lib/firmware/nouveau/fuc*,
> /lib/firmware/<kernel_version>/nouveau/nvd9_fuc* or
> /lib/firmware/<kernel_version>/nouveau/fuc*.  (The script uses nvd9_fuc* but
> http://nouveau.freedesktop.org/wiki/NVC0_Firmware/ says just fuc* for
> pre-NVE0 cards.)

Erm, that's a lie. I've been meaning to fix it. I'm like 99.999% sure it always has to be nvXX_fucYYY[cd]. I'll double-check it again before updating the wiki. Putting it in /lib/firmware/nouveau is sufficient -- adding in the kernel version also works, but you'd normally just do that for firmware that was kernel version dependent, which this isn't.
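
A quick way to see which files in a firmware directory follow the nvXX_fucYYY[cd] naming is sketched below in Python. The function name `find_fuc_firmware` and its default directory are illustrative; the check only looks at the mounted filesystem and says nothing about what is inside the initrd.

```python
import glob
import os


def find_fuc_firmware(chipset, firmware_dir="/lib/firmware/nouveau"):
    """List firmware files named <chipset>_fuc*[cd] in firmware_dir.

    chipset is a prefix such as "nvd9". Files without the chipset prefix
    (plain fuc*) are deliberately not matched.
    """
    pattern = os.path.join(firmware_dir, f"{chipset}_fuc*[cd]")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))
```

For example, `find_fuc_firmware("nvd9")` returning an empty list would mean no correctly named nvd9 files are present in `/lib/firmware/nouveau`.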

> 
> My 3.13-rc7 test kernel was compiled with the procedure in
> https://wiki.ubuntu.com/KernelTeam/GitKernelBuild; the .config lines
> containing "nouveau" are
> CONFIG_DRM_NOUVEAU=m
> CONFIG_NOUVEAU_DEBUG=5
> CONFIG_NOUVEAU_DEBUG_DEFAULT=3
> CONFIG_DRM_NOUVEAU_BACKLIGHT=y
> and kernel/drivers/gpu/drm/nouveau/nouveau.ko is on the modules.order list,
> not the modules.builtin list.

Don't know what to say. It just uses request_firmware(), so it's whatever the kernel's normal mechanism for loading firmware is.

Comment 20 Rebecca Palmer 2014-03-30 11:45:48 UTC

Created attachment 96610 [details]
kernel log 3.14rc-Mar-26

Comment 21 Rebecca Palmer 2014-03-30 11:46:25 UTC

Created attachment 96611 [details]
Xorg log 3.14rc-Mar-26

Comment 22 Rebecca Palmer 2014-03-30 11:53:52 UTC

This bug still exists in 3.14rc commit f217c44ebd41ce7369d2df07622b2839479183b0 (26 Mar Linus' tree, Ubuntu userspace; as the nouveau/master branch hasn't been used for 5 months, should we stop suggesting that people test with it?).

Is there anything else I can do to help?  I will probably only have this machine for a few more months.

Comment 23 Rebecca Palmer 2014-06-29 21:58:58 UTC

Created attachment 101990 [details]
kernel log 3.15rc-Jun-27

This bug still exists in drm-nouveau-next (commit 242a42eadfc17448a0d5b2ffc0cb191c8b51971a) with Ubuntu 14.04 userspace.  The error message has changed to "E[   PFIFO][0000:01:00.0] INTR 0x00800000", and some of the "INTR 0x01000000: 0x00000005" warnings now come _after_ it.

In Ubuntu 14.04 (not 13.10) with either this git kernel or its standard 3.13, there is also a hang on resume from suspend (https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1333417), but it is not clear whether this is a driver or BIOS problem.

In the attached log, the original bug is at 08:09:47 and 08:20:06; the resume failure (which can log "GPU lockup", "failed to idle channel 0xcccc0001 [Xorg[1185]]", or nothing) is at 08:14:53.

Comment 24 Adrian Băcîrcea 2014-07-22 20:28:30 UTC

Hi, sorry for barging in but I'm also hitting this on the same graphics card (NVS 4200M) on a Dell Latitude E6520 running Arch Linux.
I'm trying to use PRIME for a VA-API + VDPAU setup and also for some games, but as soon as I start I get this message in dmesg:
kernel: nouveau E[   PFIFO][0000:01:00.0] INTR 0x00800000
kernel: nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
I'm running the 3.15.6-1-ARCH kernel with the following packages:
intel-dri-git 10.3.0_devel.64000-1
lib32-intel-dri-git 10.3.0_devel.64007-1
lib32-mesa-git 10.3.0_devel.64007-1
lib32-mesa-libgl-git 10.3.0_devel.64007-1
lib32-nouveau-dri-git 10.3.0_devel.64007-1
lib32-opencl-mesa-git 10.3.0_devel.64007-1
lib32-svga-dri-git 10.3.0_devel.64007-1
libdrm-git 2.4.54.19.gc0b34dc-1
mesa-git 10.3.0_devel.64000-1
mesa-libgl-git 10.3.0_devel.64000-1
nouveau-dri-git 10.3.0_devel.64000-1
opencl-mesa-git 10.3.0_devel.64000-1
xf86-video-nouveau-git 1.0.10.34.gedd1608-1
If I stop the application that uses the NVIDIA card as soon as I see the message, everything is OK. But if I continue, X freezes and the only way I can regain control is by a cold reboot.

Comment 25 Adrian Băcîrcea 2014-07-22 20:29:54 UTC

Created attachment 103302 [details]
system log

Comment 26 Adrian Băcîrcea 2014-07-22 23:12:19 UTC

Created attachment 103307 [details]
system log nouveau kernel linux-3.16 branch

I've compiled the kernel on the linux-3.16 branch from http://cgit.freedesktop.org/nouveau/linux-2.6/ and X still freezes but now it recovers temporarily. 
After it I can close the program that caused the freeze and then I can't use the nouveau card anymore (glxgears would cause another temporary freeze followed by a black window). 
After a while, though, everything started to become unresponsive and I had to cold reboot to do anything.

Comment 27 Viorel-Cătălin Răpițeanu 2015-11-20 22:47:37 UTC

I'm having the same problem on a Dell Latitude e6420 having an NVIDIA Corporation GF119M [NVS 4200M] GPU with the following software installed:
xf86-video-nouveau 1.0.11+31+g1ff13a9-1
mesa 11.0.5-1
xorg-server 1.18.0-3

The error I'm getting on dmesg is the following:
[  215.375729] nouveau E[   PFIFO][0000:01:00.0] INTR 0x00800000

I've attached the complete Xorg log under the name "Xorg-21-11-15.log".

Comment 28 Viorel-Cătălin Răpițeanu 2015-11-20 22:52:09 UTC

Created attachment 120001 [details]
Xorg-21-11-15 - Nouveau Xorg failure

Comment 29 Viorel-Cătălin Răpițeanu 2015-11-22 04:27:07 UTC

Downgrading libdrm from 2.4.65 to 2.4.64, the kernel to 4.2.3 and xorg-server to 1.17.2 hasn't changed the behaviour. If there is anything else I should try, leave a message.

Comment 30 Yoram 2015-11-23 10:46:43 UTC

I experience the same issue.
I have noticed that it only occurs if plasma is built with gles2 support (on gentoo, requiring the whole Qt+KDE stack to be built with gles2).

A very similar freeze occurs with enlightenment/wayland, so I assume it's a bug in nouveau.

Comment 31 Viorel-Cătălin Răpițeanu 2015-12-02 22:18:26 UTC

Created attachment 120290 [details]
dmesg log after the freeze occurred.

Comment 32 Viorel-Cătălin Răpițeanu 2016-01-12 21:27:57 UTC

Created attachment 120994 [details]
Added a more detailed dmesg debug log for the problem.

The freeze occurs after this error:
[  409.260043] nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]

As can be seen, immediately after the freeze this error is spammed to dmesg every 5 seconds.

Comment 33 Viorel-Cătălin Răpițeanu 2016-01-22 14:18:26 UTC

I've tested this problem again with the latest nouveau driver and the latest devel kernel, and it is still reproducible.

Comment 34 Viorel-Cătălin Răpițeanu 2016-02-01 23:37:10 UTC

Created attachment 121445 [details]
Full dmesg log with kernel 4.4.1

Comment 35 dusan 2018-02-11 11:41:37 UTC

I have Dell 6520 with NVS 4200M.

The problem is still there with Mag6 64-bit, latest update. I think the problem is with graphics RAM management, because when I play video (with XVideo) the desktop freezes much sooner than when using other programs.

kernel-desktop-4.14.16
lib64drm_nouveau2-2.4.89
x11-driver-video-nouveau-1.0.15

Comment 36 Martin Peres 2019-12-04 08:40:50 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/74.