Summary: | [NVD9] Hangs under load with ![ PFIFO][0000:01:00.0] unhandled status 0x00800000 | | |
---|---|---|---|
Product: | xorg | Reporter: | Rebecca Palmer <rebecca_palmer> |
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED MOVED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | major | | |
Priority: | high | CC: | gamaro100, jv356, rapiteanu.catalin |
Version: | git | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Description
Rebecca Palmer
2013-11-15 20:14:42 UTC
Created attachment 89286 [details]
Xorg log
There was a bug introduced in 3.11 (I think) for nvc1, nvd7 and nvd9, which is fixed by http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=89ad841ffd3eccd06e2a12649f4a5028ecb973b7. I'm not sure what the user-visible effect of the bug is; I suspect it depends on a lot of local configuration settings. I don't know if this will help you, but I believe it's worth trying.
Created attachment 89287 [details]
kernel log
> http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=89ad841ffd3eccd06e2a12649f4a5028ecb973b7
The git kernel I tested was yesterday's, so would already have that fix.
Created attachment 89421 [details]
kernel log 2
Created attachment 89422 [details]
Xorg log 2
Switching to the 9.2 branch of mesa makes the git userspace start, but it still has the original bug.
In the attached logs:
Nov 18 16:29 and Xorg log: git kernel, git userspace (libdrm head, xf86-video-nouveau head, mesa 9.2 branch head)
Nov 18 16:37: Ubuntu kernel, git userspace
This bug seemed to become harder to trigger when I installed git-as-of-Oct-26, then easier again when I installed git-as-of-Nov-14, but given that this change seemed to persist after returning to 3.11 (suggesting a left-behind configuration change) and the general randomness of the bug, this would not give a reliable bisection (so the date mismatch doesn't rule out #71662 being the same bug).
In 3.8.0 the same symptoms occur but the log message is
nouveau ![ PFIFO][0000:01:00.0] unhandled status 0x01000000
I suspect (but have not tested) that this change was http://cgit.freedesktop.org/nouveau/linux-2.6/commit/drivers/gpu/drm/nouveau/core/engine/fifo/nvc0.c?id=32256c87ead3edec86bed5023a0ff96a6d907931 (i.e. this error was what is now the warning).
Is 0x00800000 unhandled because it is an inherently fatal error, or because nobody outside Nvidia knows what it means? https://github.com/envytools/envytools/blob/master/hwdocs/fifo/nvc0-pfifo.rst is a contentless stub.
*** Bug 71662 has been marked as a duplicate of this bug. ***
Created attachment 91782 [details]
Ubuntu 14.04 (Linux 3.13.0) log
This bug is easier to trigger in Ubuntu Trusty (kernel 3.13.0, libdrm-nouveau2 2.4.50, xserver-xorg-video-nouveau 1.0.10, mesa 10.0.1), occurring immediately on heavy graphics load and within a few minutes even in ordinary use.
The attached log contains more instances of the warning than were normal in 13.10, but not the final error (I suspect it wasn't synced to disk).
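To compare how often the warning and the error show up between logs, the PFIFO lines can be tallied mechanically. The following is a minimal sketch, not part of the original report: it assumes the messages look like the ones quoted in this thread ("unhandled status 0x..." or "INTR 0x..." after the `[ PFIFO][device]` prefix), and the names `PFIFO_RE` and `count_pfifo_events` are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: PFIFO lines follow the shapes quoted in this report, e.g.
#   nouveau ![ PFIFO][0000:01:00.0] unhandled status 0x00800000
#   nouveau W[ PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
PFIFO_RE = re.compile(
    r"nouveau\s+(?P<level>\S?)\[\s*PFIFO\]\[(?P<dev>[^\]]+)\]\s+"
    r"(?:unhandled status|INTR)\s+(?P<status>0x[0-9a-fA-F]+)"
)


def count_pfifo_events(lines):
    """Return a Counter keyed by (level, status) for PFIFO status lines."""
    counts = Counter()
    for line in lines:
        match = PFIFO_RE.search(line)
        if match:
            key = (match.group("level"), match.group("status").lower())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Reads a kernel log on stdin and prints one line per (level, status).
    for (level, status), n in sorted(count_pfifo_events(sys.stdin).items()):
        print(f"{level or '-'} {status}: {n}")
```

Run against two kernel logs, the per-status counts give a rough way to check whether one setup really logs the warning more often than another. Lines such as SCHED_ERROR, which carry no status word, are deliberately not counted.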
Out of curiosity, does using blob pgraph fw help with this issue? You can extract it yourself by using the instructions at http://nouveau.freedesktop.org/wiki/NVC0_Firmware/ or you can try using my script (https://raw2.github.com/imirkin/re-vp2/master/extract_firmware.py) with the 325.15 blob, although I'm not _100%_ sure that it produces the correct graph fw for nvd9; that's a very recently-added feature. Note that you'll need to add nouveau.config=NvGrUseFW=1 to your kernel cmdline, and make sure that the fw is reachable when the nouveau module loads.
Created attachment 91842 [details]
kernel log 3.13rc7 and 3.11+firmware
The 3.13-rc7 upstream kernel in Ubuntu 13.10 also sometimes crashes, but not as often as the near-identical kernel in Trusty; this might mean the regression is in userspace, or might be the same persistence (presumably a left-behind configuration change from 3.12) we saw earlier.
The blob firmware (in the standard 3.11 kernel) turns the login screen blank, with no recognisable error in the log.
(In reply to comment #12)
> The blob firmware (in the standard 3.11 kernel) turns the login screen
> blank, with no recognisable error in the log.
Was this with mmiotrace'd firmware or with firmware extracted using my script? If the latter, try moving nvd7_fuc*[cd] over nvd9_fuc*[cd] -- perhaps they're swapped.
Also 3.11 contained a bug for nvd7/nvd9 (and nvc1) which was fixed by 89ad841ffd3e and backported to 3.12.x iirc. Although I'm not sure if that's important when the blob fw is used.
Your script, and kernel 3.13-rc7 with either the "nvd9" or "nvd7" firmware, also gives a blank screen.
Jan 10 20:09:20 lap14 kernel: [ 1.610566] nouveau [ PGRAPH][0000:01:00.0] using external firmware
Jan 10 20:09:20 lap14 kernel: [ 1.611187] nouveau E[ PGRAPH][0000:01:00.0] failed to load fuc409c
Jan 10 20:09:20 lap14 kernel: [ 1.611190] nouveau E[ DEVICE][0000:01:00.0] failed to create 0x1800d915, -22
Jan 10 20:09:20 lap14 kernel: [ 1.611193] nouveau E[ DRM] failed to create 0x80000080, -22
You must have named the fuc files wrong or they were not in the initrd; if nouveau fails to load the firmware like this it simply bails out and the driver won't finish loading, hence the blank screen.
Created attachment 92127 [details]
kernel log 3.13rc+firmware
What does "not in the initrd" mean? The files were in /lib/firmware/nouveau with the names the script gave them (nvd9_fuc*).
The errors with kernel 3.13-rc7 (attached) are similar:
Jan 13 22:47:57 lap14 kernel: [ 1.653189] nouveau [ PGRAPH][0000:01:00.0] using external firmware
Jan 13 22:47:57 lap14 kernel: [ 1.653203] nouveau 0000:01:00.0: Direct firmware load failed with error -2
Jan 13 22:47:57 lap14 kernel: [ 1.653204] nouveau 0000:01:00.0: Falling back to user helper
Jan 13 22:47:57 lap14 kernel: [ 1.653447] nouveau 0000:01:00.0: Direct firmware load failed with error -2
Jan 13 22:47:57 lap14 kernel: [ 1.653452] nouveau 0000:01:00.0: Falling back to user helper
Jan 13 22:47:57 lap14 kernel: [ 1.653651] nouveau E[ PGRAPH][0000:01:00.0] failed to load fuc409c
Jan 13 22:47:57 lap14 kernel: [ 1.653658] nouveau E[ DEVICE][0000:01:00.0] failed to create 0x1800d916, -22
Jan 13 22:47:57 lap14 kernel: [ 1.653663] nouveau E[ DRM] failed to create 0x80000080, -22
Jan 13 22:47:57 lap14 kernel: [ 1.654258] nouveau: probe of 0000:01:00.0 failed with error -22
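The "-2" in "Direct firmware load failed with error -2" is the kernel's ENOENT (no such file), and "-22" is EINVAL, so the log says the file was not found at the moment nouveau asked for it. A minimal sketch for checking the two preconditions discussed in this thread (the NvGrUseFW=1 option on the kernel command line, and the firmware files being present in whatever file listing the loader will see) might look like the following. The helper names are hypothetical, and the list of four file suffixes is an assumption: only fuc409c is actually named in the logs here.

```python
import errno
import re

# Assumption: these four images are what nouveau requests for graph firmware
# on this chip family. Only fuc409c appears in the logs in this report.
ASSUMED_SUFFIXES = ("fuc409c", "fuc409d", "fuc41ac", "fuc41ad")


def errno_name(code):
    """Map a kernel-style negative error code (e.g. -2) to its errno name."""
    return errno.errorcode.get(abs(code), "unknown")


def uses_external_firmware(cmdline):
    """True if the kernel command line carries nouveau.config=...NvGrUseFW=1."""
    pattern = r"(?:^|\s)nouveau\.config=\S*?NvGrUseFW=1(?=,|\s|$)"
    return re.search(pattern, cmdline) is not None


def missing_firmware(paths, chipset="nvd9", suffixes=ASSUMED_SUFFIXES):
    """Return expected 'nouveau/<chipset>_<suffix>' names absent from paths.

    paths is any iterable of file paths, for example a listing of
    /lib/firmware or of the files packed into the initramfs.
    """
    listing = [p.strip() for p in paths]
    wanted = [f"nouveau/{chipset}_{s}" for s in suffixes]
    return [
        w for w in wanted
        if not any(p == w or p.endswith("/" + w) for p in listing)
    ]
```

As a usage idea, `uses_external_firmware(open("/proc/cmdline").read())` checks the running kernel, and `missing_firmware` can be fed a listing of the initramfs contents (on Ubuntu, for instance, the output of lsinitramfs) to see whether the files made it into the image that is present when the module loads.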
(In reply to comment #16)
> Created attachment 92127 [details]
> kernel log 3.13rc+firmware
>
> What does "not in the initrd" mean? The files were in /lib/firmware/nouveau
> with the names the script gave them (nvd9_fuc*).
But was this /lib/firmware/nouveau available when the nouveau module loaded? E.g. if it loads from initrd, this needs to be in the initrd. If nouveau is built-in, the firmware files need to be baked into the kernel (I think).
It still doesn't find the firmware after updating the initramfs (sudo update-initramfs -u -k all), whether it is placed at /lib/firmware/nouveau/nvd9_fuc*, /lib/firmware/nouveau/fuc*, /lib/firmware/<kernel_version>/nouveau/nvd9_fuc* or /lib/firmware/<kernel_version>/nouveau/fuc*. (The script uses nvd9_fuc* but http://nouveau.freedesktop.org/wiki/NVC0_Firmware/ says just fuc* for pre-NVE0 cards.)
My 3.13-rc7 test kernel was compiled with the procedure in https://wiki.ubuntu.com/KernelTeam/GitKernelBuild; the .config lines containing "nouveau" are
CONFIG_DRM_NOUVEAU=m
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
CONFIG_DRM_NOUVEAU_BACKLIGHT=y
and kernel/drivers/gpu/drm/nouveau/nouveau.ko is on the modules.order list, not the modules.builtin list.
(In reply to comment #18)
> It still doesn't find the firmware after updating the initramfs (sudo
> update-initramfs -u -k all), whether it is placed at
> /lib/firmware/nouveau/nvd9_fuc*, /lib/firmware/nouveau/fuc*,
> /lib/firmware/<kernel_version>/nouveau/nvd9_fuc* or
> /lib/firmware/<kernel_version>/nouveau/fuc*. (The script uses nvd9_fuc* but
> http://nouveau.freedesktop.org/wiki/NVC0_Firmware/ says just fuc* for
> pre-NVE0 cards.)
Erm, that's a lie. I've been meaning to fix it. I'm like 99.999% sure it always has to be nvXX_fucYYY[cd]. I'll double-check it again before updating the wiki.
Putting it in /lib/firmware/nouveau is sufficient -- adding in the kernel version also works, but you'd normally just do that for firmware that was kernel version dependent, which this isn't.
>
> My 3.13-rc7 test kernel was compiled with the procedure in
> https://wiki.ubuntu.com/KernelTeam/GitKernelBuild; the .config lines
> containing "nouveau" are
> CONFIG_DRM_NOUVEAU=m
> CONFIG_NOUVEAU_DEBUG=5
> CONFIG_NOUVEAU_DEBUG_DEFAULT=3
> CONFIG_DRM_NOUVEAU_BACKLIGHT=y
> and kernel/drivers/gpu/drm/nouveau/nouveau.ko is on the modules.order list,
> not the modules.builtin list.
Don't know what to say. It just uses request_firmware() so it's whatever the kernel's normal mechanism for loading firmware is.
Created attachment 96610 [details]
kernel log 3.14rc-Mar-26
Created attachment 96611 [details]
Xorg log 3.14rc-Mar-26
This bug still exists in 3.14rc commit f217c44ebd41ce7369d2df07622b2839479183b0 (26 Mar Linus' tree, Ubuntu userspace; as the nouveau/master branch hasn't been used for 5 months, should we stop suggesting that people test with it?).
Is there anything else I can do to help? I will probably only have this machine for a few more months.
Created attachment 101990 [details]
kernel log 3.15rc-Jun-27
This bug still exists in drm-nouveau-next (commit 242a42eadfc17448a0d5b2ffc0cb191c8b51971a) with Ubuntu 14.04 userspace. The error message has changed to "E[ PFIFO][0000:01:00.0] INTR 0x00800000", and some of the "INTR 0x01000000: 0x00000005" warnings now come _after_ it.
In Ubuntu 14.04 (not 13.10) with either this git kernel or its standard 3.13, there is also a hang on resume from suspend (https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1333417), but it is not clear whether this is a driver or BIOS problem.
In the attached, the original bug is at 08:09:47 and 08:20:06, the resume failure (which can log "GPU lockup", "failed to idle chanel 0xcccc0001 [Xorg[1185]]", or nothing) is at 08:14:53.
Hi, sorry for barging in but I'm also hitting this on the same graphics card (NVS 4200M) on a Dell Latitude E6520 running Arch Linux.
I'm trying to use PRIME for a VA-API + VDPAU setup and also for some games, but as soon as I start I get this message in dmesg:
kernel: nouveau E[ PFIFO][0000:01:00.0] INTR 0x00800000
kernel: nouveau W[ PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005
I'm running the 3.15.6-1-ARCH kernel with the following packages:
intel-dri-git 10.3.0_devel.64000-1
lib32-intel-dri-git 10.3.0_devel.64007-1
lib32-mesa-git 10.3.0_devel.64007-1
lib32-mesa-libgl-git 10.3.0_devel.64007-1
lib32-nouveau-dri-git 10.3.0_devel.64007-1
lib32-opencl-mesa-git 10.3.0_devel.64007-1
lib32-svga-dri-git 10.3.0_devel.64007-1
libdrm-git 2.4.54.19.gc0b34dc-1
mesa-git 10.3.0_devel.64000-1
mesa-libgl-git 10.3.0_devel.64000-1
nouveau-dri-git 10.3.0_devel.64000-1
opencl-mesa-git 10.3.0_devel.64000-1
xf86-video-nouveau-git 1.0.10.34.gedd1608-1
If I stop the application that uses the NVIDIA card as soon as I see the message, everything is OK. But if I continue, X freezes and the only way I can regain control is by a cold reboot.
Created attachment 103302 [details]
system log
Created attachment 103307 [details]
system log nouveau kernel linux-3.16 branch
I've compiled the kernel on the linux-3.16 branch from http://cgit.freedesktop.org/nouveau/linux-2.6/ and X still freezes but now it recovers temporarily. After it I can close the program that caused the freeze and then I can't use the nouveau card anymore (glxgears would cause another temporary freeze followed by a black window). After a while, though, everything started to become unresponsive and I had to cold reboot to do anything.
I'm having the same problem on a Dell Latitude e6420 having an NVIDIA Corporation GF119M [NVS 4200M] GPU with the following software installed:
xf86-video-nouveau 1.0.11+31+g1ff13a9-1
mesa 11.0.5-1
xorg-server 1.18.0-3
The error I'm getting on dmesg is the following:
[ 215.375729] nouveau E[ PFIFO][0000:01:00.0] INTR 0x00800000
I've attached the complete Xorg log under the name "Xorg-21-11-15.log".
Created attachment 120001 [details]
Xorg-21-11-15 - Nouveau Xorg failure
Downgrading libdrm from 2.4.65 to 2.4.64, the kernel to 4.2.3 and xorg-server to 1.17.2 hasn't changed the behaviour. If there is anything else I should try, leave a message.
I experience the same issue. I have noticed that it only occurs if plasma is built with gles2 support (on gentoo, requiring the whole QT+KDE stack to be built with gles2). A very similar freeze occurs with enlightenment/wayland, so I assume it's a bug in nouveau.
Created attachment 120290 [details]
dmesg log after the freeze occurred.
Created attachment 120994 [details]
Added a more detailed dmesg debug log for the problem.
The freeze occurs after this error:
[ 409.260043] nouveau E[ PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
As can be seen, immediately after the freeze this error is spammed to dmesg every 5 seconds.
I've tested this problem again with the latest nouveau driver and the latest devel kernel, and it is still reproducible.
Created attachment 121445 [details]
Full dmesg log with kernel 4.4.1
I have a Dell 6520 with an NVS 4200M. The problem is still there with Mag6 64-bit and the latest updates. I think the problem is with graphics RAM management, because when I play video (with XVideo) the desktop freezes much sooner than when using other programs.
kernel-desktop-4.14.16
lib64drm_nouveau2-2.4.89
x11-driver-video-nouveau-1.0.15
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/74.