Created attachment 65099 [details]
Kernel log from boot

Problem: Can no longer start X server.

Steps to reproduce:
1. Boot the computer

Expected behaviour: User expects to reach the login prompt.

Actual behaviour: X server fails to start.

History:
Updated to the latest kernel on Fedora 17 x86_64:
Linux version 3.5.0-2.fc17.x86_64 (mockbuild@buildvm-16.phx2.fedoraproject.org) (gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC) ) #1 SMP Mon Jul 30 14:48:59 UTC 2012

Rebooted. Saw Fedora begin to boot up, but instead of being presented with a login screen I saw noise/leftover images from the previous boot. The X server terminated and I saw some nouveau errors on screen:

[ 43.155163] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
[ 53.020045] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
[ 57.019076] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
[ 60.017783] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[ 64.016807] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed

The screen then went back to noise/leftovers for a few seconds, then displayed those error messages again in sequence. This continued endlessly until I booted with the previous kernel.

Hardware information:
The model is a GTX 580m. According to the wiki, this is an NVCE (GF114).
sudo lspci -v | less found this:

01:00.0 VGA compatible controller: nVidia Corporation Device 1211 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: CLEVO/KAPOK Computer Device 7100
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f4000000 (32-bit, non-prefetchable) [size=32M]
        Memory at e8000000 (64-bit, prefetchable) [size=128M]
        Memory at f0000000 (64-bit, prefetchable) [size=64M]
        I/O ports at e000 [size=128]
        Expansion ROM at f6000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau

Attached files:
messages.txt
This is the kernel log from the boot with the new kernel. The Fatal X server error and the PFIFO errors can be found near the end of the log. If I had let the computer keep running, the last few messages would have looped, presumably endlessly.

NOTE: THE LOG CAN ALSO BE ACCESSED HERE - http://pastebin.com/rrVddzgq

Thank you for taking the time to look into this matter. Please let me know if you require any additional information.
I was asked to try booting with option nouveau.noaccel=1. Grub didn't complain when I added it to the boot instructions, but the results were identical so I'm not sure whether or not the command "took." Below is a pastebin link to the new /var/log/messages. I hope it is useful. http://pastebin.com/t39ZHCwP
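One way to check whether a boot option reached the kernel at all is to look at the command line the running kernel was started with, which Linux exposes in /proc/cmdline. The following is a minimal Python sketch of that check; the helper name kernel_param is hypothetical, and finding the option there only shows that the bootloader passed it, not that nouveau acted on it.

```python
def kernel_param(cmdline, name):
    """Look up a parameter in a kernel command line string.

    Returns None if the parameter is absent, "" if it is present as a
    bare flag, and the text after "=" otherwise. If the parameter
    appears more than once, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value
```

On the affected machine this could be called as kernel_param(open("/proc/cmdline").read(), "nouveau.noaccel"); a result of "1" would mean the option was passed through by Grub.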
Hijacking this bug as I get the same messages, just after resume.

ThinkPad W520 4276CTO
NVC0 (2000M)
openSUSE 12.2 + nouveau 20120813 872dcac

* proposed nouveau.noaccel=1 crashes kernel (nouveau_abi16_ioctl_channel_alloc>nouveau_channel_new)
* Booting works (nox2apic, W520 ACPI table issue)
* gdm has graphics distortions though (see early dmesg excerpt)
* double ctrl+alt+backspace "fixes" this and gdm looks good
* suspend from gnome-shell 3.4.2 works
* resume shows gdm-password prompt and usually a white-noise background
** the gnome-shellish top-panel looks intact, though
** mouse cursor not movable, cpu load
** looks like "something" tries to restart gdm/X over and over again
* switching to vt possible with some insisting
* restarting gdm does lock up the system
* the "channel x kick timeout" seems new since some commits IIRC

repeatedly in dmesg:
[ 156.925301] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 159.924800] nouveau E[ DRM][0000:01:00.0] failed to idle channel 0xcccc0000
[ 161.924690] nouveau E[ PFIFO][0000:01:00.0] channel 1 kick timeout
[ 161.924787] nouveau [ PFIFO][0000:01:00.0] unknown status 0x00000100
[ 163.924603] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 163.989722] nouveau [ PFIFO][0000:01:00.0] unknown status 0x00000100
[ 165.989535] nouveau E[ PFIFO][0000:01:00.0] channel 3 kick timeout
[ 165.989670] nouveau [ PFIFO][0000:01:00.0] unknown status 0x00000100
[ 167.989455] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 167.989517] nouveau ![ PFIFO][0000:01:00.0] unhandled status 0x00000001
[ 170.649537] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 172.660200] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 185.103713] nouveau E[ DRM][0000:01:00.0] failed to idle channel 0xcccc0001
[ 187.103627] nouveau E[ PFIFO][0000:01:00.0] channel 2 kick timeout

I tried a fc17 install and the original kernel (3.3.4-5.fc17.x86_64) worked. Suspend/resume fine at least when not in docking station.
After updating that test install to 3.5.1-1.fc17.x86_64, the same issues I see in openSUSE 12.2 cropped up. So this looks distribution-agnostic. Any pointers on what to try to help diagnose this issue are welcome.
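For comparing logs across kernels and distributions, it may help to tally the repeating nouveau errors rather than read them line by line. The following is a rough Python sketch that assumes the "nouveau E[ UNIT][pci-address] message" line layout seen in the dmesg excerpt above; the function name summarize_errors and the choice to replace numbers with "N" so that different channels group together are illustrative assumptions, not anything nouveau provides.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 161.924690] nouveau E[ PFIFO][0000:01:00.0] channel 1 kick timeout
LINE_RE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+nouveau\s+(?P<level>[^\[\s]?)"
    r"\[\s*(?P<unit>\w+)\]\[(?P<pci>[0-9a-fA-F:.]+)\]\s+(?P<msg>.*)$"
)
# Decimal or hex numbers, e.g. channel ids and status codes.
NUMBER_RE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def summarize_errors(dmesg_text):
    """Count error-level ("E") nouveau lines per (unit, message).

    Numbers in the message are replaced by "N" so that, for example,
    "channel 1 kick timeout" and "channel 3 kick timeout" are counted
    under the same key. Lines in any other layout are ignored.
    """
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != "E":
            continue
        message = NUMBER_RE.sub("N", match.group("msg").strip())
        counts[(match.group("unit"), message)] += 1
    return counts
```

Feeding it the output of dmesg from a good and a bad kernel would show at a glance which messages only appear after the regression.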
Created attachment 65608 [details] W520-4276CTO-NVC0 dmesg commitish-872dcac gdm + suspend/resume cycle
*** Bug 53566 has been marked as a duplicate of this bug. ***
Bisection rounds testing successful suspend/resume cycles on NVC0/2000M:

note:
* gdm greeter is showing garbage (screen content from before reboot) somewhere before the last known good commits
** this issue was ignored and still present in the last good commit but is not the topic of this bug

$ git bisect log
# bad: [f9b495fca46836a6a05cedde8058ccb8a3e62c3d] drm/nouveau: use ioread32_native/iowrite32_native for fifo control registers
# good: [f887c425f9eeed8ffbca64c8be45da62b07096c0] drm/nouveau: bump version to 1.0.0
git bisect start 'HEAD' 'f887c425f9eeed8ffbca64c8be45da62b07096c0' '--' 'drivers/gpu/drm/nouveau/'
# bad: [9bd0c15fcfb42f6245447c53347d65ad9e72080b] drm/nouveau/fbcon: using nv_two_heads is not a good idea
git bisect bad 9bd0c15fcfb42f6245447c53347d65ad9e72080b
# good: [5132f37700210740117f5163b5df7aa1c8469a55] drm/nve0/fifo: initial implementation
git bisect good 5132f37700210740117f5163b5df7aa1c8469a55
# bad: [71af5e62db5d7d6348e838d0f79533653e2f8cfe] drm/nv50/gr: make sure NEXT_TO_CURRENT is executed even if nothing done
git bisect bad 71af5e62db5d7d6348e838d0f79533653e2f8cfe
# good: [afada5e0bb3cac8530c2ae36aa0abca41d60e063] drm/nv04/disp: disable vblank interrupts when disabling display
git bisect good afada5e0bb3cac8530c2ae36aa0abca41d60e063
# bad: [5e120f6e4b3f35b741c5445dfc755f50128c3c44] drm/nouveau/fence: convert to exec engine, and improve channel sync
git bisect bad 5e120f6e4b3f35b741c5445dfc755f50128c3c44
# good: [35bcf5d55540e47091a67e5962f12b88d51d7131] drm/nouveau: move flip-related channel setup to software engine
git bisect good 35bcf5d55540e47091a67e5962f12b88d51d7131
# good: [d375e7d56dffa564a6c337d2ed3217fb94826100] drm/nouveau/fence: minor api changes for an upcoming rework
git bisect good d375e7d56dffa564a6c337d2ed3217fb94826100

5e120f6e4b3f35b741c5445dfc755f50128c3c44 is the first bad commit
commit 5e120f6e4b3f35b741c5445dfc755f50128c3c44
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Mon Apr 30 13:55:29 2012 +1000

    drm/nouveau/fence: convert to exec engine, and improve channel sync

    Now have a somewhat simpler semaphore sync implementation for nv17:nv84,
    and a switched to using semaphores as fences on nv84+ and making use of
    the hardware's >= acquire operation.

    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

:040000 040000 8f2ca4ddf4969c75f688a96fdb152e449fda4852 da67a1bd8d608577e659a26715cf8af3644d8efe M	drivers
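The bisection above was judged by hand after each suspend/resume cycle. For anyone repeating it, the good/bad decision could be made less error-prone by deriving it from the kernel log. The following is a hedged Python sketch: the function name verdict and the marker strings are assumptions taken from the errors quoted in this bug, while the return values follow the exit-code convention documented for git bisect run (0 for good, 125 for "cannot be tested", other codes from 1 to 127 for bad).

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`

# Substrings of the errors reported in this bug (compared lower-case).
BAD_MARKERS = ("playlist update failed", "failed to idle channel")


def verdict(dmesg_text, nouveau_loaded=True):
    """Classify one tested kernel build from its dmesg output.

    Returns SKIP when nouveau never loaded (the build cannot be
    judged), BAD when any known error marker appears in the log,
    and GOOD otherwise.
    """
    if not nouveau_loaded:
        return SKIP
    lowered = dmesg_text.lower()
    if any(marker in lowered for marker in BAD_MARKERS):
        return BAD
    return GOOD
```

Because every step here needs a reboot into the freshly built kernel, this would most likely be used to decide which of git bisect good, git bisect bad or git bisect skip to type, rather than being driven unattended by git bisect run.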
Michael, either your bug is a different regression and needs new bug report, or I will reopen bug 53566.
(In reply to comment #6)
> Michael, either your bug is a different regression and needs new bug report, or
> I will reopen bug 53566.

I am not even sure bug 53566 is a duplicate, as the first bad commit your bisection found is different from the one I bisected.

What's the stance from the devs on this? Reopen 53566? Me filing a new bug (replicating the info here)? Both?
Based on the description the bug Michael is describing sounds different from mine. Your description of the problem in 53566 sounds exactly like my problem, and matches what I saw in my own kernel log. I must have done a poor job explaining the problem because when Michael hijacked this bug he said that he thought it was the same problem I was having; it obviously is not. His problem probably belongs in a different bug.
I assumed I was hitting the same issue as you because your log output shows "PFIFO - playlist update failed" and "Failed to idle channel x", which are exactly the errors I get when resuming (just not on boot). I will create a new bug. Sorry for the noise, guys. Perhaps we are bitten by the same root cause nevertheless.
Ok, after finding out the bad commit and looking for it around here I have found bug 50121 where I attached my info (again).
I have been playing around with this a bit and made some progress. It seems to affect any nvc0 card (I have a GTX 580). I went through the commits between 3.4.0 and 3.5.0-rc1 and determined that the cause of the error is
http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=1a46098e910b96337f0fe3838223db43b923bad4

The cards work fine with the latest nouveau git tree if you comment out:

    { "COPY1", 5, 0x90b8, nvc0_bo_move_copy, nvc0_bo_move_init },
    { "COPY0", 4, 0x90b5, nvc0_bo_move_copy, nvc0_bo_move_init },

which seems to imply that the nvc0_bo_move_copy function is not working correctly. I don't know nearly enough about nouveau to try to fix this function or to know what consequences commenting out these lines has, but hopefully this helps.

On a possibly related note, running glxinfo seems to crash xorg and produce some more PFIFO errors in dmesg. I have no idea whether this is related to those lines being commented out (this is the first time I have ever gotten nouveau working on this computer). Everything else seems stable... so far...
I seem to be seeing the exact same thing at boot with the current Ubuntu 12.10 alphas and my GTX560 Ti (also a GF114). Shouldn't this be marked as a high priority regression? I would expect that in a month and a half we're going to see a lot of sad pandas saying that Linux sucks when they try the new Ubuntu release and get a looping LightDM crash.
Sorry, when I created this bug I had no idea it was affecting other nvc0 cards. I Googled extensively and couldn't find anyone else who had my exact error, so I assumed that it was some esoteric detail about my specific hardware configuration. I didn't want to make it seem like a big deal if it wasn't. Since this seems to be affecting all nvc0's on 3.5+, I'll mark it as high priority critical. If those are not the correct importance settings just let me know.
In the meantime, you can just revert commit 1a46098e910b96337f0fe3838223db43b923bad4, which allowed me to boot properly. Ubuntu devs can do the same if it's not fixed in time for release.
>> ...when I created this bug I had no idea it was affecting other >> nvc0 cards. I Googled extensively and couldn't find anyone >> else who had my exact error... Understandable. I would imagine that most users with these card models are using the proprietary drivers for performance reasons. I wouldn't have even noticed it myself, if the new xserver 1.13 hadn't been pushed into Quantal before the supporting nvidia-current package was ready. I'll open a bug in Ubuntu's launchpad with a reference to this one, as I don't think they're aware of the problem yet.
Same problem here, ever since kernel 3.5.x. Using an Nvidia GTX 570 card with the nouveau driver and kernel 3.5.x results in no X start-up. Same messages, and I've tried Fedora 17 x64, Fedora 18 Alpha x64 and Ubuntu 12.10 x64.

lspci -v
01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 570] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. Device 1570
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at f0000000 (64-bit, prefetchable) [size=128M]
        Memory at f8000000 (64-bit, prefetchable) [size=32M]
        I/O ports at cc00 [size=128]
        [virtual] Expansion ROM at fe900000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nvidia

uname -a
Linux 3.5.3-1.fc17.x86_64 #1 SMP Wed Aug 29 18:46:34 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Well, I think I can reproduce this, although I have a current ArchLinux with Cinnamon and an NV44.
https://bugs.freedesktop.org/show_bug.cgi?id=61463
https://bugs.freedesktop.org/show_bug.cgi?id=61611

Someone else asked about this on the kernel mailing list:
http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/01611.html
Upstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=855568
This is pseudo-similar to bug 53566 which I closed earlier. Do these issues persist, or are they all fixed in recent kernels?
I'm seeing this for the first time on a 3.11 kernel, so I'd say it's still a problem. I haven't seen it on 3.9.* or 3.10.* kernels though.
Same here. I recently switched to an nvidia card (GT630) on kernel 3.11.6. After startx I get a blank screen with the cursor in the top left, and then wide stair-like black-and-white stripes with noise. With git kernel 3.12.0rc7 there are no stripes and a different kind of noise, but the result is the same. I installed the video card today, so I can't yet tell whether it worked before (I am installing a <3.11 kernel now). If further information would be helpful, I will provide it.
http://bpaste.net/show/144880/
Xorg.0.log

While this problem persists even with nouveau.noaccel=1, with modesetting everything works fine.
Can you still reproduce this with a newer kernel?
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/27.