| Field | Value |
|---|---|
| Summary | X freezes (100% CPU, mouse moves) with Nvidia card but not with S3 S2k |
| Product | xorg |
| Component | Driver/nVidia (open) |
| Status | RESOLVED WONTFIX |
| Severity | critical |
| Priority | high |
| Version | unspecified |
| Hardware | x86 (IA32) |
| OS | Linux (All) |
| Reporter | Stefan Huszics <sauron> |
| Assignee | Aaron Plattner <aplattner> |
| QA Contact | Xorg Project Team <xorg-team> |
| CC | adampaetznick, benjamin.monate, benjsc, erik.andren, fdbug, MostAwesomedude, oholzer, omry, thomas.muehlfriedel |

Description
Stefan Huszics
2005-04-30 23:17:23 UTC
Hi. This is not solely an Nvidia problem; their rival has exactly the same one. It seems these two companies don't care about their Linux customers anymore. I have been asking ATI to fix this problem since last year, yet the bug is not even listed among their known problems. Nvidia has never acknowledged it either. Check out my thread in the Rage3D forum: the mouse moves but the screen freezes, the machine is still reachable over ssh, the X process consumes 99% CPU, and killing X only makes the system freeze completely. http://www.rage3d.com/board/showthread.php?t=33800697

I mentioned earlier that it never locks once I start watching a movie. Well, never say never: it did just that 2 days ago (though this is the only time so far). Also, when I read my own bug report I noticed I forgot to mention that it is possible to SSH in from a remote machine and run killall -9 X followed by startx to get graphics working again, so a complete reboot is not necessary (if you happen to have access to a second system).

There's not really enough information in this bug report to even remotely conclude whether it is a video driver bug, X server bug, mouse driver bug, kernel bug, proprietary kernel module bug, buggy motherboard or BIOS, overheating problem, bad memory, or any one of tens of other potential problems. If you want anyone to investigate the problem, you're more or less going to have to provide a lot more detail than what's here, and try to narrow the problem down as far as possible.

Join xorg@freedesktop.org and try to find other people who have the same problem. See what their system has in common with yours (if anything), including any motherboard, chipset, video card brand/model, revision, BIOS version, kernel version, compiler options used, absolutely anything at all that can provide enough evidence that it is:

- A common problem being experienced by more than 1 person, where "more" is more than 2 people if possible; if it's 10 or 50 people, that's even better.
- Perhaps common to a certain hardware and/or software combination and/or configuration options for the video card, CMOS, kernel, X server, etc.

If possible, debug the X server remotely and try to narrow down where the problem is happening. This is notoriously difficult to impossible if you're using proprietary drivers, unless you get insanely lucky.

The reason I say all of this is to point out what needs to happen for you to realistically have a chance of someone actively investigating the problem, because:

- Open source developers don't have the source code for proprietary drivers, and generally speaking can't debug and fix problems that only occur when using them. They generally turn out to be bugs in the drivers themselves. I say "generally", not "always", so no need for anyone to chime in with "this one time, I had a problem and it was the X server, not Nvidia|ATI|whoever" one-count stats. The fact is, problems tend to be in the drivers whether they are open or closed source, and so that leaves the likelihood of it being fixed generally up to the company who wrote the driver.
- Proprietary driver developers working at the hardware companies are not likely to investigate bug reports that one single user is reporting, or even 2 users, unless the problem report is very highly detailed and contains enough compelling information for the vendor to conclude it is probably a driver problem and assess it as something they consider important enough to investigate and fix.

So far, while I do see a problem being reported here, someone who experiences it is probably going to have to take out their secret decoder ring and fill bugzilla with config files, log files, stack traces, and other debugging information before it's useful to X.org developers, Nvidia, or anyone really. Sorry to be blunt, but I'm just trying to help by steering you in the right direction. Merely complaining about things won't get a solution for the problem. Hope this helps.

Same behavior on Suse 9.3 for x86-64 running on a Dell Precision 370 (P4 3.2 GHz with HT, 2 GB RAM, NVIDIA Quadro 330 PCIE card and "nv" driver).

Same behavior on Suse 9.3 for x86-64 running on a Dell Precision 370 (P4 3.2 GHz with HT, 2 GB RAM, nVidia Corporation NV37GL PCIE card and "nv" driver).

It is as the original complaint describes. X only hangs when the X server is redrawing the screen. Initially, my box (Nvidia 4200) would hang when my screen saver ran for a while. I thought the problem was the screen saver itself, so I disabled it. The hangs often occur when Firefox or Mozilla render images or if I quickly resize a window. The CPU goes to about 100% and the X server becomes unresponsive. The mouse moves but clicks are not registered. If I ssh in from another machine I can do a kill -9 on X to reset the session. No other kill signal seems to make a difference. After the X session is killed, it is impossible to get a character session to appear. That is, if you press Ctrl-Alt-F1 the text console never appears. If I shut down from the login screen at that point, the screen becomes corrupted (lots of colors and blinking characters) until the machine shuts down. I swapped in an Nvidia 5200 to test that my card was not bad and the same thing happens. The hangs happen in both the KDE and Gnome desktops.

Same problem on Fedora Core 4, x86_64, GeForce FX5200 with the nvidia or nv driver loaded.

I have a similar problem. System: Slackware 10, kernel 2.6.12 (and 2.6.11 too), X.org 6.8.1 (and 6.7.0 too), GeForce2 MX/MX 400 with NVidia drivers (7176 and 7667). With the vesa driver I don't see this problem. I know someone else who encounters similar problems with a GeForce4 MX440. With a TNT2 (my previous card) everything worked fine. One more observation: the X freeze occurs on several (I know only a few) HTML pages when they are opened in Mozilla. If the Mozilla window is not large enough, nothing bad happens! But if I maximize the window...
After I updated X.org to 6.8.1, some of the bad pages became good and I can view them freely. I know a method to "unfreeze" the computer without a remote console. Before the system becomes unusable, start a second X server with startx (for example on vt8). After the freeze: Alt-SysRq-K, then Alt-F8 (Alt-SysRq-R doesn't help). After this one can switch to vc/1, for example, or start an xterm (on the second server) and kill X. Without the second server, I'm unable to reset the console (on framebuffer) to a usable state.

--------------------

It also freezes with the "nv" driver and with the "nvidia" driver from www.nvidia.com. I think this is really an X.org problem because on XFree86 it does not freeze. Please do something, because otherwise I will be forced to switch back to XFree86 :( I have an AGP Riva TNT2. This problem appears to affect many people. It is discussed at this address: http://www.nvnews.net/vbulletin/showthread.php?t=49117&page=4

-- Slackware 10.1 (kernel 2.4.29 and 2.6.12.2; crashes on both after some time, last time when I scrolled something in kate, the KDE editor)

    me@darkstar:~# lspci
    00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
    00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
    00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
    00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
    00:07.2 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 16)
    00:07.3 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 16)
    00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
    00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
    00:0c.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 04)
    01:00.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model 64/Model 64 Pro] (rev 15)

AFTER the freeze I log in remotely with ssh, run 'killall -9 X' and then restart the server. I see this (for example) in dmesg after every crash:

    NVRM: Xid: 6, PE0000 03fc ffffffff 00000000 0014a7ed 00010001

You can see that a lot of people have this bug. Thank you in advance!!!

--------------------

    me@darkstar:~# dmesg | grep NVRM | grep Xid
    NVRM: Xid: 6, PE0000 03fc ffffffff 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0300 00000006 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 07ec 000b0000 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
    NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001

This happened today, until I switched to the newest version of XFree86: no freeze since. Hope this will help you. I have a Riva TNT2.

Steps to reproduce with the nvidia driver:
1. Open a font dialog in KDE.
2. Select a scalable font, and set its size to 64.
3. X locks up.

With the nv driver from xorg the X server runs fine. This is the only way I got the X server to stall. That is, if I go to a web page with very large fonts, the same thing happens.
My setup: GeForce 4800 SE, Abit NF7-S (nForce2 chipset), up-to-date Gentoo. Relevant software:

    [ebuild R ] media-video/nvidia-glx-1.0.7667
    [ebuild R ] media-video/nvidia-kernel-1.0.7667
    [ebuild R ] x11-base/opengl-update-2.2.1
    [ebuild R ] x11-base/xorg-x11-6.8.2-r2

It seems that with the "nvidia" module it freezes on XFree86 too, but for the moment XFree86 with the "nv" driver does not, after one day. (Maybe there are some strange things in the xorg "nv" driver?) I remind you that xorg 6.8.2 from my Slackware freezes with both "nv" and "nvidia", and I cannot reproduce this consistently. (I have a Riva TNT2.) I will keep you informed if it crashes with the XFree86 "nv" driver as well. It is pretty sad, because without "nvidia" I don't have OpenGL.

My xorg freezes with both drivers, nvidia or nv, on Fedora Core 4, x86_64, GeForce FX5200. Has anyone solved this problem with an old version of xorg?

I don't know what to say: XFree86 does NOT freeze with the "nv" driver. So there may be something useful there. I am referring to XFree86 4.5.0 (the latest available). Hoping for a better xorg.

Yep, XFree86 4.5.0 with the "nv" driver does not crash. 4 days of testing and no freeze. Maybe it is a good point to start. Just a suggestion.

(In reply to comment #3)
> - Open source developers don't have the source code for proprietary
> drivers, and generally speaking can't debug and fix problems that
> only occur when using them. They generally turn out to be bugs in
> the drivers themselves. I say "generally", not "always", so no need
> for anyone to chime in with "this one time, I had a problem and it
> was the X server, not Nvidia|ATI|whoever" one-count stats. The fact
> is, problems tend to be in the drivers whether they are open or closed
> source, and so that leaves the likelihood of it being fixed generally
> up to the company who wrote the driver.

I experience this freeze using nothing more than the open-source Xorg and Linux (2.6.12-mm2) Radeon drivers.
For the record, it appears to be linked to AGP: I have recompiled the Linux kernel without any AGP support and the freeze is gone. Previously I had disabled direct rendering in xorg.conf, without improvement, so it is definitely AGP rather than a derivative.

Component shifted to nv. If you are experiencing this bug on other drivers as well, please open new bugs for them, in that driver's component. Bugzilla is not a forum.

I solved my problem. I cannot say for certain where my system was broken or why X froze. But, as far as I understand, the problem was a badly installed fontconfig. I had not touched it before, but (by accident) found that fc-list reported: undefined symbol: FcFini. After recompiling and installing fontconfig and running ldconfig, everything works fine.

Stefan, are you still experiencing this problem using a current version of xorg?

Sorry for the slow reply, I have been in over my head with work lately, with 12+ hour shifts. Am I still experiencing the problem? Well, I don't know; I am still running my old Savage video card from the last millennium. Unfortunately right now I have no time at all to try to help with this bug, and when I did have time, 1 year ago, I only got a reply with a lot of BS about how this was anything but an xorg/nv bug (#3) instead of intelligent feedback from someone who at least knows the difference between the open nv driver and the proprietary nvidia driver...

I have a Radeon 9200SE. Fedora Core 5 (FC5), "yum upgrade" daily. When I use "tvtime" AND watch an mpg/mov with Kaffeine/xine under KDE, I sometimes get a lockup. This is a hard lock; the keyboard is completely dead. Mouse movement works and the pointer stays an arrow, but the buttons are dead. I am not on a network so I cannot test ssh.

Created attachment 6821 [details]
X log and configuration
I have the same problem. The lock happens even with no window manager, just
with an xterm running; I can use the xterm up to the first xterm "scroll".
After that the mouse works fine but the keyboard and applications freeze.
The only way out is to kill X remotely.
A workaround is to disable a single acceleration in xorg.conf:
Option "XaaNoScreenToScreenCopy"
With this option set everything works fine for me, albeit
scrolling is quite slow.
I use an HP xw4300 with:
OS:
Red Hat Enterprise Linux WS release 4 (Nahant Update 4)
vga adapter:
01:00.0 VGA compatible controller: nVidia Corporation NV43GL [Quadro FX 540] (rev a2)
and xorg:
X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2
and kernel:
Linux itvim2rd00087 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 17:57:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
I attach X log and configuration for the working case (apart from the
"XaaNoScreenToScreenCopy" they are just the same for the broken one anyway).
G.
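
For readers who want to try the workaround quoted above, this is roughly where the option would sit in xorg.conf. It is only a sketch: the Identifier value is a placeholder, and the rest of the Device section on a given system will differ.

```
Section "Device"
    # "Videocard0" is a placeholder; keep whatever Identifier your config already uses.
    Identifier "Videocard0"
    Driver     "nv"
    # Disables only the XAA screen-to-screen copy acceleration.
    Option     "XaaNoScreenToScreenCopy"
EndSection
```

The commenter above reports that scrolling becomes slow with this option set, since copies fall back to the unaccelerated path.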
I have this problem on FreeBSD 6.1-RELEASE for AMD64. I'm running an nvidia 6200 LE card with Xorg 7.2-RC3 and nv driver version 1.2.2.1. I have the same problem with Xorg 6.9.0. I was able to isolate the problem to the NVSync() function in nv_xaa.c:

    void NVSync(ScrnInfoPtr pScrn)
    {
        NVPtr pNv = NVPTR(pScrn);

        if(pNv->DMAKickoffCallback)
            (*pNv->DMAKickoffCallback)(pScrn);

        while(READ_GET(pNv) != pNv->dmaPut);

        while(pNv->PGRAPH[0x0700/4]);
    }

The problem is with the first while loop. Usually READ_GET(pNv) catches up with pNv->dmaPut and it falls right through. Eventually and inevitably the two never become equal and the driver goes into an infinite loop. This consumes 100% of the CPU and makes Xorg unusable. I posted this information on the mailing list and was told the FIFO is waiting to be processed by the card but the card is "stuck." I don't really know what that means nor do I know how to "unstick" it. I'm willing to help however I can. I don't know if this is the cause of the problems listed above but I have exactly the same symptoms.

Having the same issue with 7.2.0. It often occurs when minimising something to the taskbar in Gnome, or when scrolling; it doesn't matter whether I am using Firefox. It seems nothing is written to the X logs; the Xorg process just consumes 100% CPU, and the screen is unusable until I remotely stop gdm and then kill the Xorg process. That's if I am using the nv driver. With the nvidia driver, I have found the crash occurs more quickly and is not recoverable. As I can easily reproduce this bug, I will be more than happy to help gather info, just let me know what to do...

This bug seems to have reappeared. I'm seeing what is likely the same problem with
* xorg-server 1.4,
* xf86-video-nv 2.1.6,
* FreeBSD 8.0-CURRENT/amd64,
* GeForce 6200 LE card.

Starting X11 with twm and xterms works, but as soon as I run, say, firefox, the X11 server freezes within a few seconds. It goes into a tight loop (100% CPU) and will not respond to signals.
The mouse pointer moves, everything else is frozen. A workaround is to create an xorg.conf file (X -configure) and disable hardware acceleration for nv (Option "NoAccel" "true"). With this, the server will no longer freeze. This is probably the same problem as bug 10341.

Created attachment 14960 [details]
Tarball containing: Xorg log & config, gdb backtraces, and lspci output

Same problem here. X hangs and uses 99% of the CPU. I have to ssh in remotely and kill X to recover.
- Fedora Core 8
- GeForce4 MX 4000
- xorg-x11-server-1.3
- xorg-x11-drv-nv-2.1.6

Setting "NoAccel" to "true" (thanks to Christian) eliminates the problem. Attached is a tarball of my Xorg log and config, plus a couple of X gdb backtraces. I have debuginfo for the last backtrace. My results are similar to those in comment 23, but I was hung on the "while(READ_GET(pNv) != pNv->dmaPut);" not "while(pNv->PGRAPH[0x0700/4]);".

*** Bug 6161 has been marked as a duplicate of this bug. ***
*** Bug 5102 has been marked as a duplicate of this bug. ***
*** Bug 16003 has been marked as a duplicate of this bug. ***

This bug is common across at least NetBSD, FreeBSD and Linux, and is caused by some hardware state. Bug 5102 has a potential patch which fixes the issue.
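
The hang reported in the comments above is an unbounded busy-wait: `while(READ_GET(pNv) != pNv->dmaPut);` never exits if the card stops advancing its get pointer. As a rough illustration only, the sketch below shows the same polling pattern with a spin budget, so a stuck channel is reported instead of spinning forever. `FakeChannel`, `hw_get`, `dma_put` and `wait_for_idle` are hypothetical names invented for this example; this is not code from xf86-video-nv and not the fix that was eventually proposed in this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-card state (not the real NVRec). */
typedef struct {
    volatile uint32_t hw_get;  /* how far the card has consumed the FIFO */
    uint32_t dma_put;          /* how far the driver has filled the FIFO */
} FakeChannel;

/*
 * Poll until hw_get catches up with dma_put, but give up after max_spins
 * iterations. Returns 0 if the channel drained, -1 if the budget ran out.
 */
static int wait_for_idle(FakeChannel *ch, unsigned long max_spins)
{
    assert(ch != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        if (ch->hw_get == ch->dma_put)
            return 0;          /* card caught up, same exit as the original loop */
    }
    return -1;                 /* card looks stuck; the caller decides what to do */
}
```

A caller could log the failure or reset the channel when -1 comes back. A bound like this only turns the symptom into a visible error; it does not explain why the get pointer stops advancing in the first place.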
Update: at least for me (FreeBSD -CURRENT, amd64) the patch does not fix the lock. Full trace to the cause:

    0x000000080239c27d in NVSync (pScrn=0x80c000) at nv_xaa.c:303
    303         while(READ_GET(pNv) != pNv->dmaPut);
    (gdb) bt
    #0  0x000000080239c27d in NVSync (pScrn=0x80c000) at nv_xaa.c:303
    #1  0x0000000803b5635e in XAACopyAreaFallback (pSrc=0x26d0000, pDst=0x845280, pGC=0x85fac0, srcx=0, srcy=0, width=26, height=32, dstx=1727, dsty=100) at xaaFallback.c:83
    #2  0x0000000803b58739 in XAACopyArea (pSrcDrawable=0x26d0000, pDstDrawable=0x845280, pGC=0x85fac0, srcx=0, srcy=0, width=26, height=32, dstx=1727, dsty=100) at xaaCpyArea.c:72
    #3  0x0000000803baba55 in cwCopyArea (pSrc=0x26d0000, pDst=0x845280, pGC=0x85fac0, srcx=0, srcy=0, w=26, h=32, dstx=1727, dsty=100) at cw_ops.c:201
    #4  0x000000000059b591 in damageCopyArea (pSrc=0x26d0000, pDst=0x845280, pGC=0x85fac0, srcx=0, srcy=0, width=26, height=32, dstx=1727, dsty=100) at damage.c:830
    #5  0x000000000050b222 in miDCRestoreUnderCursor (pDev=0x84ee80, pScreen=0x829c00, x=1727, y=100, w=26, h=32) at midispcur.c:616
    #6  0x00000000005228f5 in miSpriteRemoveCursor (pDev=0x84ee80, pScreen=0x829c00) at misprite.c:938
    #7  0x00000000005224cf in miSpriteSetCursor (pDev=0x84ee80, pScreen=0x829c00, pCursor=0x265eb20, x=1736, y=108) at misprite.c:826
    #8  0x00000000005225e3 in miSpriteMoveCursor (pDev=0x84ee80, pScreen=0x829c00, x=1736, y=108) at misprite.c:857
    #9  0x0000000000518a7f in miPointerUpdateSprite (pDev=0x84ee80) at mipointer.c:451
    #10 0x000000000050cda4 in mieqProcessInputEvents () at mieq.c:386
    #11 0x00000000004a99ab in ProcessInputEvents () at xf86Events.c:241
    #12 0x000000000044a0dd in Dispatch () at dispatch.c:411
    #13 0x000000000042dd55 in main (argc=5, argv=0x7fffffffe630, envp=0x7fffffffe660) at main.c:435

Debian bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=370709 has some workarounds as well as an audit trail of which XAA options seem to prevent the problem.
It turns out to be a memory barrier issue with out-of-order CPU instruction execution. It was simple to reproduce the hang by running ls -R / in a transparent aterm on a non-i386 machine. The fix is similar to the NetBSD fix in Bug 5102, though it needs to be slightly different. In nv_local.h we have:

    #if defined(__i386__)
    #define _NV_FENCE() outb(0x3D0, 0);
    #else
    #define _NV_FENCE() mem_barrier();
    #endif

    #define WRITE_PUT(pNv, data) { \
      volatile CARD8 scratch; \
      _NV_FENCE() \
      scratch = (pNv)->FbStart[0]; \
      (pNv)->FIFO[0x0010] = (data) << 2; \
      mem_barrier(); \
    }

Under amd64, mem_barrier is a nop, hence

    scratch = (pNv)->FbStart[0]; \
    (pNv)->FIFO[0x0010] = (data) << 2; \

may be executed out of order. The NetBSD fix defined mem_barrier to actually do something. However, this caused a double barrier via _NV_FENCE and mem_barrier. The correct fix is to leave mem_barrier as a nop but define _NV_FENCE to be a barrier, i.e.:

    diff --git a/src/nv_local.h b/src/nv_local.h
    index 74cdc09..ecde69e 100644
    --- a/src/nv_local.h
    +++ b/src/nv_local.h
    @@ -82,7 +82,7 @@ typedef unsigned int U032;
     #if defined(__i386__)
     #define _NV_FENCE() outb(0x3D0, 0);
     #else
    -#define _NV_FENCE() mem_barrier();
    +#define _NV_FENCE() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)": : :"memory");
     #endif

     #define WRITE_PUT(pNv, data) { \

Hence we end up with one barrier and things work!

Hi, I have the same symptoms with a stock Ubuntu 8.04, latest updates, but with an ATI X800GT and the default open-source driver. Since the fix described here obviously does not apply, should I open another bug for the ati driver? How likely is it that the bug is also present in the open-source ati driver? :-)

xf86-video-nv has been officially unmaintained for a bit now, and we are closing all -nv bugs. If your problem was not addressed, and -nv is still broken, please try xf86-video-nouveau. Thank you.
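
The barrier diagnosis earlier in this thread comes down to an ordering rule: everything the consumer needs must be visible before the "put" position is advanced. The sketch below illustrates that rule in portable C11 with a release fence on the producer side and an acquire fence on the consumer side. It is an analogy under stated assumptions, not driver code: `FakeFifo`, `submit` and `consume` are made-up names, and C11 fences only order accesses between CPU threads. A real driver writing to device memory needs an architecture-specific barrier such as the one in the patch quoted in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FIFO_SLOTS 16u

/* Hypothetical stand-in for a command FIFO and its "put" register. */
typedef struct {
    uint32_t commands[FIFO_SLOTS];
    _Atomic uint32_t put;      /* number of slots published so far */
} FakeFifo;

/* Producer: store the command, fence, then publish the new put position. */
static void submit(FakeFifo *f, uint32_t index, uint32_t cmd)
{
    assert(index < FIFO_SLOTS);
    f->commands[index] = cmd;
    /* Release fence: the command store above cannot be reordered after
     * the put store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&f->put, index + 1, memory_order_relaxed);
}

/* Consumer: returns 1 and fills *out if slot index is published, else 0. */
static int consume(FakeFifo *f, uint32_t index, uint32_t *out)
{
    assert(index < FIFO_SLOTS);
    if (atomic_load_explicit(&f->put, memory_order_relaxed) <= index)
        return 0;
    /* Acquire fence: pairs with the release fence in submit(). */
    atomic_thread_fence(memory_order_acquire);
    *out = f->commands[index];
    return 1;
}
```

Without the fence in `submit`, the compiler or CPU would be free to make the put update visible before the command itself, which is the kind of reordering the comment above describes for the two stores inside WRITE_PUT.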