Bug 53101 - [NVC0+] Fedora 17 "PFIFO - playlist update failed" on boot
Status: NEEDINFO
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: Other All
Importance: high critical
Assigned To: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2012-08-03 17:53 UTC by mog55356
Modified: 2015-01-16 23:31 UTC (History)

Attachments
Kernel log from boot (95.56 KB, text/plain)
2012-08-03 17:53 UTC, mog55356
W520-4276CTO-NVC0 dmesg commitish-872dcac gdm + suspend/resume cycle (242.82 KB, text/plain)
2012-08-15 17:47 UTC, michael.weirauch

Description mog55356 2012-08-03 17:53:25 UTC
Created attachment 65099 [details]
Kernel log from boot

Problem:
    Can no longer start x server.

Steps to reproduce:
    1. Boot the computer
Expected behaviour:
    User expects to reach the login prompt.
Actual behaviour:
    X Server fails to start

History:
    Updated to latest kernel on Fedora 17 x86_64: Linux version 3.5.0-2.fc17.x86_64 (mockbuild@buildvm-16.phx2.fedoraproject.org) (gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC) ) #1 SMP Mon Jul 30 14:48:59 UTC 2012
    Rebooted
    Saw Fedora begin to boot, but instead of a login screen I saw noise and leftover images from the previous boot
    X server terminated and I saw some nouveau errors on screen:
        [   43.155163] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
        [   53.020045] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
        [   57.019076] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
        [   60.017783] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
        [   64.016807] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
    The screen then went back to noise/leftovers for a few seconds, then displayed those error messages again in sequence
    This continued endlessly until I booted with the previous kernel.
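
To put numbers on the loop described above, the timestamps of the repeating message can be pulled out of the log. This is only a rough sketch: the regular expression assumes the exact line layout quoted in this report, and the helper name `playlist_failure_times` is made up for illustration.

```python
import re

# Assumed layout of the messages quoted above:
# "[   43.155163] [drm] nouveau 0000:01:00.0: <message>"
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\]\s+nouveau\s+(?P<dev>\S+):\s+(?P<msg>.*)"
)


def playlist_failure_times(log_text):
    """Return timestamps (seconds since boot) of 'playlist update failed' lines."""
    times = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m and "playlist update failed" in m.group("msg"):
            times.append(float(m.group("ts")))
    return times


# The five lines quoted in the description.
SAMPLE = """\
[   43.155163] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
[   53.020045] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
[   57.019076] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
[   60.017783] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[   64.016807] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
"""

times = playlist_failure_times(SAMPLE)
# Seconds elapsed between consecutive failures.
gaps = [round(b - a, 3) for a, b in zip(times, times[1:])]
print(times)
print(gaps)
```

Running the same function over the full attached log would show whether the failures keep recurring at a steady interval.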

Hardware information:
    The model is a GTX 580m. According to the wiki, this is an NVCE (GF114).
    sudo lspci -v | less found this:
        01:00.0 VGA compatible controller: nVidia Corporation Device 1211 (rev a1) (prog-if 00 [VGA controller])
            Subsystem: CLEVO/KAPOK Computer Device 7100
            Flags: bus master, fast devsel, latency 0, IRQ 16
            Memory at f4000000 (32-bit, non-prefetchable) [size=32M]
            Memory at e8000000 (64-bit, prefetchable) [size=128M]
            Memory at f0000000 (64-bit, prefetchable) [size=64M]
            I/O ports at e000 [size=128]
            Expansion ROM at f6000000 [disabled] [size=512K]
            Capabilities: [60] Power Management version 3
            Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
            Capabilities: [78] Express Endpoint, MSI 00
            Capabilities: [b4] Vendor Specific Information: Len=14 <?>
            Capabilities: [100] Virtual Channel
            Capabilities: [128] Power Budgeting <?>
            Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
            Kernel driver in use: nouveau

Attached files:
    messages.txt
        This is the kernel log from the boot with the new kernel. The Fatal X server error and the PFIFO errors can be found near the end of the log. If I had let the computer keep running, the last few messages would have looped, presumably endlessly.
        NOTE: THE LOG CAN ALSO BE ACCESSED HERE - http://pastebin.com/rrVddzgq

Thank you for taking the time to look into this matter. Please let me know if you require any additional information.
Comment 1 mog55356 2012-08-03 18:54:38 UTC
I was asked to try booting with option nouveau.noaccel=1. Grub didn't complain when I added it to the boot instructions, but the results were identical so I'm not sure whether or not the command "took." Below is a pastebin link to the new /var/log/messages. I hope it is useful.

http://pastebin.com/t39ZHCwP
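
One way to tell whether a boot option "took" is to look at the command line the running kernel actually received, which Linux exposes in /proc/cmdline. The following is a minimal sketch; the helper names and the example command line are hypothetical, and it only shows that the option reached the kernel, not that the driver honoured it.

```python
import shlex


def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    A bare flag with no '=value' part is reported as an empty string.
    If the parameter appears more than once, the last occurrence wins.
    """
    value = None
    for token in shlex.split(cmdline):
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


# Hypothetical command line, used only for illustration.
example = "BOOT_IMAGE=/vmlinuz-3.5.0-2.fc17.x86_64 ro quiet nouveau.noaccel=1"
print(kernel_param(example, "nouveau.noaccel"))
print(kernel_param(example, "nomodeset"))
```

On the affected machine, `kernel_param(running_cmdline(), "nouveau.noaccel")` would return None if the option never made it past the boot loader.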
Comment 2 michael.weirauch 2012-08-15 17:44:45 UTC
Hijacking this bug as I get the same messages, just after resume.

ThinkPad W520 4276CTO NVC0 (2000M)
openSUSE 12.2 + nouveau 20120813 872dcac

* proposed nouveau.noaccel=1 crashes kernel (nouveau_abi16_ioctl_channel_alloc>nouveau_channel_new)

* Booting works (nox2apic, W520 ACPI table issue)
* gdm has graphics distortions though (see early dmesg excerpt)
* double ctrl+alt+backspace "fixes" this and gdm looks good
* suspend from gnome-shell 3.4.2 works
* resume shows gdm-password prompt and usually a white-noise background
** the gnome-shellish top-panel looks intact, though
** mouse cursor not movable, cpu load
** looks like "something" tries to restart gdm/X over and over again
* switching to vt possible with some insisting
* restarting gdm does lock up the system
* the "channel x kick timeout" seems new since some commits IIRC

repeatedly in dmesg:
[  156.925301] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  159.924800] nouveau E[     DRM][0000:01:00.0] failed to idle channel 0xcccc0000
[  161.924690] nouveau E[   PFIFO][0000:01:00.0] channel 1 kick timeout
[  161.924787] nouveau  [   PFIFO][0000:01:00.0] unknown status 0x00000100
[  163.924603] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  163.989722] nouveau  [   PFIFO][0000:01:00.0] unknown status 0x00000100
[  165.989535] nouveau E[   PFIFO][0000:01:00.0] channel 3 kick timeout
[  165.989670] nouveau  [   PFIFO][0000:01:00.0] unknown status 0x00000100
[  167.989455] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  167.989517] nouveau ![   PFIFO][0000:01:00.0] unhandled status 0x00000001
[  170.649537] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  172.660200] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  185.103713] nouveau E[     DRM][0000:01:00.0] failed to idle channel 0xcccc0001
[  187.103627] nouveau E[   PFIFO][0000:01:00.0] channel 2 kick timeout
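
Since the same few messages repeat, a per-message count makes the pattern easier to see than the raw dmesg. The sketch below assumes the "nouveau LEVEL[SUBSYS][DEVICE] message" layout shown in the lines above; the function name `summarise` and the choice to replace channel ids and status values with "N" are illustrative decisions, not anything the driver defines.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted above:
# "[  156.925301] nouveau E[   PFIFO][0000:01:00.0] playlist update failed"
NOUVEAU_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+(?P<level>[^\s\[]?)"
    r"\[\s*(?P<subsys>\w+)\]\[(?P<dev>[^\]]+)\]\s+(?P<msg>.*)"
)


def summarise(log_text):
    """Count messages per (level, subsystem, normalised message text)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = NOUVEAU_RE.search(line)
        if not m:
            continue
        # Collapse channel ids and status values so repeats group together.
        msg = re.sub(r"0x[0-9a-fA-F]+|\b\d+\b", "N", m.group("msg"))
        # Lines without a severity letter are recorded with "-".
        counts[(m.group("level") or "-", m.group("subsys"), msg)] += 1
    return counts


# A subset of the lines quoted above.
SAMPLE = """\
[  156.925301] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  159.924800] nouveau E[     DRM][0000:01:00.0] failed to idle channel 0xcccc0000
[  161.924690] nouveau E[   PFIFO][0000:01:00.0] channel 1 kick timeout
[  161.924787] nouveau  [   PFIFO][0000:01:00.0] unknown status 0x00000100
[  163.924603] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  165.989535] nouveau E[   PFIFO][0000:01:00.0] channel 3 kick timeout
"""

counts = summarise(SAMPLE)
for key, n in counts.most_common():
    print(n, key)
```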

I tried an fc17 install and the original kernel (3.3.4-5.fc17.x86_64) worked. Suspend/resume was fine, at least when not in the docking station. After updating that test install to 3.5.1-1.fc17.x86_64, the same issues I see in openSUSE 12.2 cropped up. So this looks distribution-agnostic.

Any pointers on what to try to help diagnose this issue are welcome.
Comment 3 michael.weirauch 2012-08-15 17:47:07 UTC
Created attachment 65608 [details]
W520-4276CTO-NVC0 dmesg commitish-872dcac gdm + suspend/resume cycle
Comment 4 mog55356 2012-08-18 23:39:51 UTC
*** Bug 53566 has been marked as a duplicate of this bug. ***
Comment 5 michael.weirauch 2012-08-21 19:23:43 UTC
Bisection rounds testing successful suspend/resume cycles on NVC0/2000M:
note:
* gdm greeter is showing garbage (screen content from before reboot) somewhere before the last known good commits
** this issue was ignored and still present in the last good commit but is not the topic of this bug

$ git bisect log
# bad: [f9b495fca46836a6a05cedde8058ccb8a3e62c3d] drm/nouveau: use ioread32_native/iowrite32_native for fifo control registers
# good: [f887c425f9eeed8ffbca64c8be45da62b07096c0] drm/nouveau: bump version to 1.0.0
git bisect start 'HEAD' 'f887c425f9eeed8ffbca64c8be45da62b07096c0' '--' 'drivers/gpu/drm/nouveau/'
# bad: [9bd0c15fcfb42f6245447c53347d65ad9e72080b] drm/nouveau/fbcon: using nv_two_heads is not a good idea
git bisect bad 9bd0c15fcfb42f6245447c53347d65ad9e72080b
# good: [5132f37700210740117f5163b5df7aa1c8469a55] drm/nve0/fifo: initial implementation
git bisect good 5132f37700210740117f5163b5df7aa1c8469a55
# bad: [71af5e62db5d7d6348e838d0f79533653e2f8cfe] drm/nv50/gr: make sure NEXT_TO_CURRENT is executed even if nothing done
git bisect bad 71af5e62db5d7d6348e838d0f79533653e2f8cfe
# good: [afada5e0bb3cac8530c2ae36aa0abca41d60e063] drm/nv04/disp: disable vblank interrupts when disabling display
git bisect good afada5e0bb3cac8530c2ae36aa0abca41d60e063
# bad: [5e120f6e4b3f35b741c5445dfc755f50128c3c44] drm/nouveau/fence: convert to exec engine, and improve channel sync
git bisect bad 5e120f6e4b3f35b741c5445dfc755f50128c3c44
# good: [35bcf5d55540e47091a67e5962f12b88d51d7131] drm/nouveau: move flip-related channel setup to software engine
git bisect good 35bcf5d55540e47091a67e5962f12b88d51d7131
# good: [d375e7d56dffa564a6c337d2ed3217fb94826100] drm/nouveau/fence: minor api changes for an upcoming rework
git bisect good d375e7d56dffa564a6c337d2ed3217fb94826100


5e120f6e4b3f35b741c5445dfc755f50128c3c44 is the first bad commit
commit 5e120f6e4b3f35b741c5445dfc755f50128c3c44
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Mon Apr 30 13:55:29 2012 +1000

    drm/nouveau/fence: convert to exec engine, and improve channel sync
    
    Now have a somewhat simpler semaphore sync implementation for nv17:nv84,
    and a switched to using semaphores as fences on nv84+ and making use of
    the hardware's >= acquire operation.
    
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

:040000 040000 8f2ca4ddf4969c75f688a96fdb152e449fda4852 da67a1bd8d608577e659a26715cf8af3644d8efe M	drivers
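
For anyone comparing bisection results across the related bugs, the log above can be reduced to a list of verdicts. This is a minimal sketch that relies only on the "# good:" / "# bad:" comment lines git writes into the log; the function name `parse_bisect_log` is made up, and the sample is an excerpt of the log above rather than the full run.

```python
import re

# Comment lines written by git into a bisect log:
# "# bad: [<40-hex sha>] <commit subject>"
COMMENT_RE = re.compile(r"^# (good|bad): \[([0-9a-f]{40})\] (.*)$")


def parse_bisect_log(text):
    """Return (verdict, sha, subject) tuples in the order they were recorded."""
    steps = []
    for line in text.splitlines():
        m = COMMENT_RE.match(line.strip())
        if m:
            steps.append((m.group(1), m.group(2), m.group(3)))
    return steps


# Excerpt of the log above.
SAMPLE = """\
# bad: [f9b495fca46836a6a05cedde8058ccb8a3e62c3d] drm/nouveau: use ioread32_native/iowrite32_native for fifo control registers
# good: [f887c425f9eeed8ffbca64c8be45da62b07096c0] drm/nouveau: bump version to 1.0.0
git bisect start 'HEAD' 'f887c425f9eeed8ffbca64c8be45da62b07096c0' '--' 'drivers/gpu/drm/nouveau/'
# bad: [5e120f6e4b3f35b741c5445dfc755f50128c3c44] drm/nouveau/fence: convert to exec engine, and improve channel sync
git bisect bad 5e120f6e4b3f35b741c5445dfc755f50128c3c44
# good: [d375e7d56dffa564a6c337d2ed3217fb94826100] drm/nouveau/fence: minor api changes for an upcoming rework
git bisect good d375e7d56dffa564a6c337d2ed3217fb94826100
"""

steps = parse_bisect_log(SAMPLE)
# In a finished bisection, the most recently recorded "bad" commit is the culprit.
last_bad = [sha for verdict, sha, _ in steps if verdict == "bad"][-1]
print(last_bad)
for verdict, sha, subject in steps:
    print(f"{verdict:4} {sha[:12]} {subject}")
```

To repeat the run itself rather than just read it, the saved log can be passed to `git bisect replay <logfile>` in a checkout of the same tree.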
Comment 6 Vlad K 2012-08-21 19:46:12 UTC
Michael, either your bug is a different regression and needs new bug report, or I will reopen bug 53566.
Comment 7 michael.weirauch 2012-08-21 19:53:27 UTC
(In reply to comment #6)
> Michael, either your bug is a different regression and needs new bug report, or
> I will reopen bug 53566.

I am not even sure bug 53566 is a duplicate, as the first bad commit your bisection found differs from the one I bisected.

What's the stance from the devs on this?
Reopen 53566? Me filing a new bug (replicating the info here)? Both?
Comment 8 mog55356 2012-08-21 22:59:40 UTC
Based on the description the bug Michael is describing sounds different from mine. Your description of the problem in 53566 sounds exactly like my problem, and matches what I saw in my own kernel log. I must have done a poor job explaining the problem because when Michael hijacked this bug he said that he thought it was the same problem I was having; it obviously is not. His problem probably belongs in a different bug.
Comment 9 michael.weirauch 2012-08-22 05:44:54 UTC
I assumed I was hitting the same issue as you based on your log output, with "PFIFO - playlist update failed" and "Failed to idle channel x", which are exactly the errors I get when resuming (just not on boot).

I will create a new bug. Sorry for the noise guys. Perhaps we are bitten by the same root cause, nevertheless.
Comment 10 michael.weirauch 2012-08-22 06:56:46 UTC
Ok, after finding out the bad commit and looking for it around here I have found bug 50121 where I attached my info (again).
Comment 11 Kelly Doran 2012-08-24 02:34:33 UTC
I have been playing around with this a bit and made some progress.  It seems to affect any nvc0 card (I have a GTX 580).  I went through the commits between 3.4.0 and 3.5.0-rc1 and determined that the cause of the error is http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=1a46098e910b96337f0fe3838223db43b923bad4

The cards work fine with the latest nouveau git tree if you comment out:
		{ "COPY1", 5, 0x90b8, nvc0_bo_move_copy, nvc0_bo_move_init },
		{ "COPY0", 4, 0x90b5, nvc0_bo_move_copy, nvc0_bo_move_init },

which seems to imply that the nvc0_bo_move_copy function is not working correctly.  I don't know nearly enough about nouveau to try to fix this function or know what consequence commenting out these lines has, but hopefully this helps.

On a possibly related note, running glxinfo seems to crash xorg and produce some more PFIFO errors in dmesg. I have no idea whether this is related to those lines being commented out (this is the first time I have ever gotten nouveau working on this computer). Everything else seems stable so far.
Comment 12 3vi1 2012-08-25 17:14:20 UTC
I seem to be seeing the exact same thing at boot with the current Ubuntu 12.10 alphas and my GTX560 Ti (also a GF114).

Shouldn't this be marked as a high priority regression?  I would expect that in a month and a half we're going to see a lot of sad pandas saying that Linux sucks when they try the new Ubuntu release and get a looping LightDM crash.
Comment 13 mog55356 2012-08-25 18:17:13 UTC
Sorry, when I created this bug I had no idea it was affecting other nvc0 cards. I Googled extensively and couldn't find anyone else who had my exact error, so I assumed that it was some esoteric detail about my specific hardware configuration. I didn't want to make it seem like a big deal if it wasn't. Since this seems to be affecting all nvc0's on 3.5+, I'll mark it as high priority critical. If those are not the correct importance settings just let me know.
Comment 14 Vlad K 2012-08-25 18:23:30 UTC
In the meantime, you can just revert commit 1a46098e910b96337f0fe3838223db43b923bad4, which allowed me to boot properly. Ubuntu devs can do the same if it's not fixed in time for release.
Comment 15 3vi1 2012-08-25 20:11:37 UTC
>> ...when I created this bug I had no idea it was affecting other
>> nvc0 cards.  I Googled extensively and couldn't find anyone
>> else who had my exact error...

Understandable.  I would imagine that most users with these card models are using the proprietary drivers for performance reasons.  I wouldn't have even noticed it myself, if the new xserver 1.13 hadn't been pushed into Quantal before the supporting nvidia-current package was ready.

I'll open a bug in Ubuntu's launchpad with a reference to this one, as I don't think they're aware of the problem yet.
Comment 16 Andrei Amuraritei 2012-09-17 16:47:32 UTC
Same problem here, ever since kernel 3.5.x. Using an Nvidia GTX 570 card with the nouveau driver and kernel 3.5.x results in no X start-up. I get the same messages, and I've tried Fedora 17 x64, Fedora 18 Alpha x64, and Ubuntu 12.10 x64.

lspci -v

01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 570] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: eVga.com. Corp. Device 1570
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at f0000000 (64-bit, prefetchable) [size=128M]
	Memory at f8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at cc00 [size=128]
	[virtual] Expansion ROM at fe900000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nvidia

uname -a
Linux 3.5.3-1.fc17.x86_64 #1 SMP Wed Aug 29 18:46:34 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
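
When comparing reports it helps to check which kernel driver each lspci listing was captured under; the listing in this comment reports "nvidia" on its last line, so it was presumably taken while the proprietary driver was loaded. A small sketch for extracting that field follows. The function name `drivers_in_use` is hypothetical, and the parsing assumes the `lspci -v` layout shown above, where device lines start at column 0 and detail lines are indented.

```python
def drivers_in_use(lspci_text):
    """Map each PCI address to its 'Kernel driver in use' value (None if absent)."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Device header line; the first field is the PCI address, e.g. "01:00.0".
            current = line.split(" ", 1)[0]
            result[current] = None
        elif current and "Kernel driver in use:" in line:
            result[current] = line.split(":", 1)[1].strip()
    return result


# Trimmed copy of the listing above.
SAMPLE = (
    "01:00.0 VGA compatible controller: NVIDIA Corporation GF110 "
    "[GeForce GTX 570] (rev a1) (prog-if 00 [VGA controller])\n"
    "\tSubsystem: eVga.com. Corp. Device 1570\n"
    "\tFlags: bus master, fast devsel, latency 0, IRQ 16\n"
    "\tKernel driver in use: nvidia\n"
)

drivers = drivers_in_use(SAMPLE)
print(drivers)
```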
Comment 17 Raphael Groner 2013-02-28 11:56:43 UTC
Well, I think I can reproduce this, but I have a current Arch Linux with Cinnamon and an NV44.

https://bugs.freedesktop.org/show_bug.cgi?id=61463
https://bugs.freedesktop.org/show_bug.cgi?id=61611


Someone else asked about this on the kernel mailing list:
http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/01611.html
Comment 18 Raphael Groner 2013-02-28 11:57:24 UTC
Upstream bug:
https://bugzilla.redhat.com/show_bug.cgi?id=855568
Comment 19 Ilia Mirkin 2013-08-31 07:16:24 UTC
This is pseudo-similar to bug 53566 which I closed earlier. Do these issues persist, or are they all fixed in recent kernels?
Comment 20 sdlarsen 2013-09-12 06:26:46 UTC
I'm seeing this for the first time on a 3.11 kernel, so I'd say it's still a problem. I haven't seen it on 3.9.* or 3.10.* kernels though.
Comment 21 Giorgio Pretto 2013-10-29 16:27:28 UTC
Same here.
I recently switched to an NVIDIA card (GT 630) on kernel 3.11.6.
After startx I get a blank screen with the cursor in the top left, and then wide stair-like black-and-white stripes with noise.

After trying git kernel 3.12.0-rc7 there are no stripes, just a different kind of noise, but the result is the same.

I installed the video card today, so I can't yet tell whether it worked before (I am installing a <3.11 kernel now).

If further information would help, I will provide it.
Comment 22 Giorgio Pretto 2013-10-29 16:38:56 UTC
http://bpaste.net/show/144880/
Xorg.0.log

While this problem persists even with nouveau.noaccel=1, everything works fine with modesetting.
Comment 23 Tobias Klausmann 2015-01-16 23:31:00 UTC
Can you still reproduce this with a newer kernel?