recent git and the screen freezes after launching X. When just running a few xterms, it works OK for a while, then it freezes. The mouse continues to move fine, and you can still ssh to the box, etc. It _seems_ the freeze can be triggered by starting Firefox. This is on kernel 2.6.31-rc6.
Logs are missing.
Created attachment 29025
log from boot to hang (powered off)

And here is the log.
From your log:

Aug 30 12:10:02 tippex nvidiafb: EDID found from BUS1
Aug 30 12:10:02 tippex nvidiafb: Using CRT on CRTC 0
Aug 30 12:10:02 tippex nvidiafb: MTRR set to ON
Aug 30 12:10:02 tippex nvidiafb: PCI nVidia NV2 framebuffer (16MB @ 0xE2000000)

nvidiafb is bound to give problems. You must get rid of it. Reference:
http://nouveau.freedesktop.org/wiki/FAQ Troubleshooting
See also the KMS troubleshooting related items if you try KMS.
Douh. Sorry about that. Removing the nvidiafb modules does make it better (and I get KMS!!). However, if I leave the machine on overnight, it locks up in the screensaver: wiggling the mouse/keyboard doesn't light up the screen. The machine continues to route packets, though. I get a bunch of these in the log:

Sep 1 19:11:13 tippex [drm:drm_ioctl], pid=2376, cmd=0x40086485, nr=0x85, dev 0xe200, auth=1
Sep 1 19:11:13 tippex [drm:drm_ioctl], ret = fffffff5
Sep 1 19:11:13 tippex [drm:drm_ioctl], pid=2376, cmd=0x40086485, nr=0x85, dev 0xe200, auth=1
Sep 1 19:11:13 tippex [drm:drm_ioctl], ret = fffffff5
Sep 1 19:11:13 tippex [drm:drm_ioctl], pid=2376, cmd=0x40086485, nr=0x85, dev 0xe200, auth=1
Sep 1 19:11:13 tippex [drm:drm_ioctl], ret = fffffff5
Sep 1 19:11:13 tippex [drm:drm_ioctl], pid=2376, cmd=0x40086485, nr=0x85, dev 0xe200, auth=1
Sep 1 19:11:13 tippex [drm:drm_ioctl], ret = fffffff5

Do you want the entire log from boot?
(In reply to comment #4)
> Douh. Sorry about that. Removing the nvidiafb modules does make it better (and
> I get kms!!).

Cool.

> However, leaving the machine for the night locks the screensaver. Wiggling the
> mouse/kbd doesn't light up the screen. The machine continues to route packets
> though.

Which screensaver is that? Does it run an animation?

> I get a bunch of these in the log:
>
> Sep 1 19:11:13 tippex [drm:drm_ioctl], pid=2376, cmd=0x40086485, nr=0x85, dev
> 0xe200, auth=1
> Sep 1 19:11:13 tippex [drm:drm_ioctl], ret = fffffff5

This is still DRM_NOUVEAU_GEM_CPU_PREP returning EAGAIN. This is a sort of soft lockup, where the graphics memory manager gets stuck. Well, that is assuming it gets stuck here; a bunch of those might appear even without a bug, I guess.

> Do you want the entire log from boot?

That would be nice, yes.
(In reply to comment #5)
> (In reply to comment #4)
> > Douh. Sorry about that. Removing the nvidiafb modules does make it better (and
> > I get kms!!).
>
> Cool.
>
> > However, leaving the machine for the night locks the screensaver. Wiggling the
> > mouse/kbd doesn't light up the screen. The machine continues to route packets
> > though.
>
> Which screensaver is that, does it run an animation?
>

The in-kernel one, I guess. The LED on the display goes to "sleep" mode and the display powers off. Waiting a long time (e.g. overnight) triggers it. I have DPMS set to sleep/off after 10 minutes, and these DPMS-induced sleeps can be undone with keyboard/mouse. It's the long sleep sessions (hours) that tend to lock it up. I have no traditional singing-dancing screensaver in use.

> > I get a bunch of these in the log:
> >
> > Sep 1 19:11:13 tippex [drm:drm_ioctl], pid=2376, cmd=0x40086485, nr=0x85, dev
> > 0xe200, auth=1
> > Sep 1 19:11:13 tippex [drm:drm_ioctl], ret = fffffff5
>
> This is still DRM_NOUVEAU_GEM_CPU_PREP returning EAGAIN. This is a sort of a
> soft lockup, where the graphics memory manager gets stuck. Well, assuming it
> gets stuck here, a bunch of those might appear even without a bug, I guess.
>

I get about 100 round trips of them each second, and nothing else. Is that normal?

> > Do you want the entire log from boot?
>
> That would be nice, yes.
>

I'll post that tonight.
Created attachment 29120
new log, boot to hang, part 1

The tool forces me to split up the log...

Created attachment 29121
part 2

Created attachment 29122
part 3

Created attachment 29123
part 4/4
Argh. I looked only at the first part. The hang happens at 22:41:24.

First, either that log is not from boot, or the kernel message buffer has overflowed FAST. (Which might not be a surprise, since it is logging CRTC register access and whatnot.) Browsing the log, it is clear that the kernel message buffer repeatedly overflows. Not to mention that the log is a bitch to search through.

Okay, I'll see if I can limit the CRTC etc. logging in Nouveau during the weekend. The ioctl flood is a bit more annoying.

The kernel screen saver is just blanking and DPMS, and the mouse does not wake it up (not sure if it would with gpm). AFAIK X disables the kernel screen saver. Do you have X running, or just the nouveaufb virtual terminal? I'm asking again, since the information from when nvidiafb was enabled is useless. Was this nv05?

You probably do have X running, given all the activity in the log. What apps are running there? Could you stop them (to get smaller logs)? Does it hang then?

It would be useful to repeat here the still-valid basic information that you mentioned in emails to the list. People reading this bug don't know we discussed it on the list.
(In reply to comment #11)
> Argh. I looked only at the first part. The hang happens at 22:41:24.
>
> First, that log is not from boot, or then the kernel message buffer has
> overflown FAST. (Which might not be a surprise, since it is logging crtc
> register access and whatnot.)
>

I was surprised by the order of the initial stuff too.

> Browsing the log, it is clear that the kernel message buffer repeatedly
> overflows. Not to mention that that log is a bitch to search through.
>

Any ideas how to get around the overflows?

> Okay, I'll see if I could limit the crtc etc. logging in Nouveau during the
> weekend. The ioctl flood is a bit more annoying.
>

OK.

> The kernel screen saver is just blanking and DPMS and mouse does not wake it up
> (not sure if with gpm it would). AFAIK X disables the kernel screen saver. Do
> you have X running, or just nouveaufb virtual terminal?

I'm always in X when it hangs. I cannot recall a non-X hang. In VT mode I use gpm, if that's relevant.

> I'm asking again, since
> the information from when nvidiafb was enabled is useless. Was this nv05?
>

lspci says:

01:00.0 VGA compatible controller: nVidia Corporation NV5 [RIVA TNT2/TNT2 Pro] (rev 15)

but the nouveau and/or DRM code speaks of nv04 on probing...? I'll attach an Xorg log too.

> You probably do have X running since all the activity in the log.

Yes.

> What apps are
> running there?

Window Maker, a couple of xterms, exmh (a Tcl/Tk app), sometimes Firefox. I've seen hangs on X11 startup, at about the time exmh starts (Firefox is started manually).

> Could you stop them (to get smaller logs), does it hang then?

I'll let it run overnight with no WM and no apps, and we'll see what happens.

> It would be useful to repeat the still valid basic information here that you
> mentioned in emails to the list. People reading this bug don't know we
> discussed it on the list.
>

Right. See above. Some hangs happen in display-on mode, in the middle of operations: the mouse moves, but all windows freeze. Other hangs are discovered e.g. in the morning, when the display is off and won't come back on with mouse/keyboard. In almost all cases I can ssh etc. to the machine just fine, and I've seen nothing interesting in any log apart from the EBUSY thing already talked about.

As a newbie to nouveau, I made one interesting observation: when starting X (even after a warm reboot), the X screen is initially painted with the contents of the last X screen. If I bog down the CPU with a CPU-bound task (such as DRM logging to syslog), I can watch the old contents for ca. 5 seconds before they're cleared and the WM starts. That smells like a BIG security issue for public terminals. Shouldn't the GPU/fb memory be cleared before being handed to the process (as RAM is when provided to a process)?
Created attachment 29191
Xorg.0.log
(In reply to comment #12)
> lspci says:
> 01:00.0 VGA compatible controller: nVidia Corporation NV5 [RIVA TNT2/TNT2 Pro]
> (rev 15)
> but the nouveau and/or drm code speaks of nv04 on probing...? I'll attach an
> Xorg log too.
>

I've been trying to reproduce this for a while on another nv05 without success, though I'm using master. You said you were on master-compat but later that you were using a 2.6.31 release candidate, so which branch are you using exactly?

There might be a specific acceleration request deterministically triggering the lockup. Have you tried e.g. "$ x11perf -all"? It would be interesting to know if any of the tests it runs makes your system hang.

> [...]
> As a newbie to nouveau, I made one interesting observation: The X screen when
> starting X (even after warm-reboot) is initially painted with the contents of
> the last X screen. If I bog down the cpu with a cpu bound task (such as drm
> logging to syslog) I can watch the old contents for ca 5 seconds before it's
> cleared and the WM starts. That smells like a BIG security issue for public
> terminals. Shouldn't the GPU/fb memory be cleared before handed to the process
> (as RAM is when provided to a process?)
>

That could be easily done. In fact it's already done for the non-KMS case. I guess patches are welcome.
Created attachment 29248
40 min boot to crash log

40-minute run, boot to hang. The nouveau.ko module (and deps) was manually insmodded after boot, hence the initial rows in the log are more readable. There are still occasional overruns further down, though. This time no screensaver was involved; the display was on for the entire run.
> I've been trying to reproduce this for a while on another nv05 without success,
> I'm using master though. You said you were on master-compat but later that you
> were using a 2.6.31 release candidate so, which branch are you using exactly?
>

vanilla 2.6.31-rc6 kernel

nouveau-drm-99999999
xf86-video-nouveau-9999
libdrm-9999

from Gentoo's x11 overlay.

Looking inside the nouveau-drm-99999999 ebuild, I see that it pulls:
http://people.freedesktop.org/~pq/nouveau-drm/master-compat.tar.gz

The vanilla kernel has no nouveau stuff in it. The master-compat tarball builds fine against it.

> There might be an specific acceleration request deterministically triggering
> the lockup. Have you tried e.g. "$ x11perf -all"?

Running it now. I've seen one hang so far, but rebooting and running just that specific test failed to trigger it.

Just a thought... As the failure mode is "looping on EBUSY", and it seems to be the same (ioctl) call all the time, are there any debugging options that can be enabled for that particular call? printf'ing the reason for the EBUSY might give some lead...

>
> That could be easily done. In fact it's already done for the non-KMS case. I
> guess patches are welcome.
>

It seems I'm being lured into actually looking at the code. :-) I might end up doing exactly that, but I cannot seem to find the time...
(In reply to comment #16)
> > I've been trying to reproduce this for a while on another nv05 without success,
> > I'm using master though. You said you were on master-compat but later that you
> > were using a 2.6.31 release candidate so, which branch are you using exactly?
> >
> vanilla 2.6.31-rc6 kernel
>
> nouveau-drm-99999999
> xf86-video-nouveau-9999
> libdrm-9999
>
> from gentoo's x11 overlay.
>
> Looking inside the nouveau-drm-99999999 ebuild, I see that it pulls:
> http://people.freedesktop.org/~pq/nouveau-drm/master-compat.tar.gz
>
> The vanilla kernel has no nouveau stuff in it. The master-compat tarball builds
> fine against it.
>

This sounds like a master-compat issue, then. Would you mind confirming that by installing the DRM from master?
> This sounds like a master-compat issue then. Would you mind to confirm it is by
> installing the DRM from master?
>

Running 2.6.31-rc6-g1889587 now. Let's see what happens.
(In reply to comment #18)
> > This sounds like a master-compat issue then. Would you mind to confirm it is by
> > installing the DRM from master?
> >
>
> Running 2.6.31-rc6-g1889587 now. Let's see what happens.
>

Same hangs with master, unfortunately. I'll try to get a short log and post it. The local pattern (loop on EBUSY) is the same, though. Any chance this is caused by something outside nouveau? Should I downgrade any of the other modules (libdrm, xf86-video-nouveau, ...) to non-git versions?
Created attachment 29381
hang log from master

Here's a log from a hang using 2.6.31-rc6-g1889587. Effectively: start the kernel without nouveau, insmod it with modeset=1, run xinit, manually start Firefox. It hangs while Firefox starts.
Created attachment 29519
fence_emit_race.patch

I think I've finally reproduced your problem (for some reason it seldom happens here; I guess you either have a faster card or a slower CPU). Does the attached patch help?
(In reply to comment #21)
> Created an attachment (id=29519)
> fence_emit_race.patch
>
> I think I've finally reproduced your problem (for some reason it seldom happens
> here, I guess you either have a faster card or a slower CPU). Does the attached
> patch help?
>

I applied the patch and let it sit in X overnight. The machine was still up when I checked it this morning. This is definitely a good sign. I'll keep you posted.

Machine details:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : Pentium II (Klamath)
stepping : 4
cpu MHz : 300.664
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov mmx
bogomips : 601.32
clflush size : 32
power management:

01:00.0 VGA compatible controller: nVidia Corporation NV5 [RIVA TNT2/TNT2 Pro] (rev 15)

Definitely a slow machine by today's standards, but maybe the GPU is even slower??
(In reply to comment #22)
> I applied the patch and let it sit in X overnight. The machine was still up
> when I checked it this morning. This is definitely a good sign. I'll keep you
> posted.
>
> Machine details:
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 3
> model name : Pentium II (Klamath)
> stepping : 4
> cpu MHz : 300.664
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> mmx
> bogomips : 601.32
> clflush size : 32
> power management:
>

Ok, so that's it.

>
> 01:00.0 VGA compatible controller: nVidia Corporation NV5 [RIVA TNT2/TNT2 Pro]
> (rev 15)
>
> Definitely a slow machine by today's standards, but maybe the gpu is even
> slower??
>

Another annoying thing I realized:

> (II) NOUVEAU(0): Allocated 1MiB VRAM for offscreen pixmaps

FWIW, you can just enable driver pixmaps if you want to take advantage of the whole VRAM.
I'm happy to report that the machine is still up and running after another 24-hour session. I'm inclined to think that the bug is fixed. If it's still running by the end of the week, I'll close the bug.

(In reply to comment #23)
> (In reply to comment #22)
> > I applied the patch and let it sit in X overnight. The machine was still up
> > when I checked it this morning. This is definitely a good sign. I'll keep you
> > posted.
> >
> > Machine details:
<snip>
> > bogomips : 601.32
> > clflush size : 32
> > power management:
> >
> Ok, so that's it.
>

Hmm. What do you mean? That this is a relatively fast machine?

> Another annoying thing I realized:
> > (II) NOUVEAU(0): Allocated 1MiB VRAM for offscreen pixmaps
>
> FWIW you can just enable driver pixmaps if you want take advantage of the whole
> VRAM.
>

I'd be happy to do so. How? Searching the man pages and the web gave me nothing.
(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > I applied the patch and let it sit in X overnight. The machine was still up
> > > when I checked it this morning. This is definitely a good sign. I'll keep you
> > > posted.
> > >
> > > Machine details:
> <snip>
> > > bogomips : 601.32
> > > clflush size : 32
> > > power management:
> > >
> > Ok, so that's it.
> >
> Hmm. What do you mean? That this is a relatively fast machine?
>

I meant your odds are worse with such a slow machine :-)

> > Another annoying thing I realized:
> > > (II) NOUVEAU(0): Allocated 1MiB VRAM for offscreen pixmaps
> >
> > FWIW you can just enable driver pixmaps if you want take advantage of the whole
> > VRAM.
> >
> I'd be happy to do so. How? Searching the man pages & web gave me nothing.
>

You need a newer X server, at least a 1.7.0 release candidate. If you're using KMS, it's then enabled automatically.
*** Bug 23086 has been marked as a duplicate of this bug. ***
Having tested the patch (and a recent git where the patch is included) for more than a week without any issues, I believe the bug is fixed, so I'm closing it.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.