Bug 3168 - X freezes (100%CPU, mouse moves) with Nvidia card but not with S3 S2k
Status: RESOLVED WONTFIX
Product: xorg
Classification: Unclassified
Component: Driver/nVidia (open)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assigned To: Aaron Plattner
QA Contact: Xorg Project Team
Duplicates: 5102, 6161, 16003
 
Reported: 2005-04-30 23:17 UTC by Stefan Huszics
Modified: 2011-09-14 13:29 UTC (History)

Attachments
X log and configuration (47.39 KB, text/plain)
2006-09-05 06:13 UTC, gianluca.bobbo
Tarball containing: Xorg log & config, gdb backtraces, and lspci output (80.00 KB, application/x-tar)
2008-03-08 10:03 UTC, Adam

Description Stefan Huszics 2005-04-30 23:17:23 UTC
Details

- The X server freezes: the mouse pointer still moves, but clicks aren't
recognized and the keyboard is dead (no LEDs, no Ctrl+Alt+Fx works)

Solutions:
- Hardware reboot
- Use an old Savage 2000 vidcard (driver savage) instead of my Nvidia GF4
(driver nv)

This bug seems VERY similar to
https://bugs.freedesktop.org/show_bug.cgi?id=2155
but since I've concluded that neither Firefox nor the scrollwheel is the original
cause/culprit AND I have a 100% fix for the problem by changing the hardware, I
decided to file a new bug to help you track down where this problem lies instead
of risking you having to chase down multiple unrelated issues.

What I have tried

Even if I configure my mouse (MX500) as a regular 3-button mouse with no
scrollwheel in the X config file, the freezes still happen, so it's not likely
to be a scrollwheel-related problem originally.

If I run Firefox with multiple tabs and switch between the tabs, the screen will
freeze up within seconds. But it will freeze sooner or later even if I never
start Firefox. Thus I'm pretty confident FF is not the cause of this either, just
a catalyst that makes it happen faster. The main "last action" before it freezes
when not running FF is opening MC in a terminal window and maximizing it (the
window resizes and the content starts painting, but before it finishes the screen
freezes). Thus it seems to happen mostly when a lot of the screen content is
being overwritten/redrawn (switching tabs, maximizing a window).

However, with the S3 card, I can use Firefox etc. all day long without ever
having any problems with the screen freezing up and X using 100% CPU. The only
problem for me with using the S3 card all the time is that XV isn't working on it
(thus no video in mplayer or TVtime). Thus, if I want to watch a movie/TV I need
to shut down, swap the video card and keep my fingers crossed that it doesn't
lock up before I start watching (notably, though, if I manage to get e.g. a movie
playing, it never freezes/locks up during the show).

I'm running an up-to-date Gentoo stable with UDEV (no DEVFS) and an AMD Barton on
an Asus nForce2 motherboard.

So, tell me what more info you need to help you track down what the problem is.
Also I'm sort of a Linux newbie so please be relatively detailed in your
instructions of what you need.

== lspci ==

0000:00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev a2)
0000:00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev a2)
0000:00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev a2)
0000:00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev a2)
0000:00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev a2)
0000:00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev a2)
0000:00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a3)
0000:00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
0000:00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
0000:00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
0000:00:02.2 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
0000:00:04.0 Ethernet controller: nVidia Corporation nForce2 Ethernet Controller (rev a1)
0000:00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
0000:00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2)
0000:00:0d.0 FireWire (IEEE 1394): nVidia Corporation nForce2 FireWire (IEEE 1394) Controller (rev a3)
0000:00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev a2)
0000:01:08.0 Multimedia audio controller: Aureal Semiconductor Vortex 2 (rev fe)
0000:01:0a.0 Multimedia controller: Philips Semiconductors SAA7134 (rev 01)
0000:02:00.0 VGA compatible controller: S3 Inc. 86C410 Savage 2000 (rev 02)


== emerge -p nvidia-glx nvidia-kernel opengl-update xorg-x11 ==

[ebuild     UD] media-video/nvidia-kernel-1.0.6629-r4 [1.0.7174]
[ebuild   R   ] media-video/nvidia-glx-1.0.6629-r1
[ebuild   R   ] media-video/nvidia-kernel-1.0.7174
[ebuild   R   ] x11-base/opengl-update-2.1.1-r1
[ebuild   R   ] x11-base/xorg-x11-6.8.2-r1

==  emerge info ==

Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5-20050130,
glibc-2.3.4.20041102-r1, 2.6.10-gentoo-r6 i686)
=================================================================
System uname: 2.6.10-gentoo-r6 i686 AMD Athlon(tm) XP 2700+
Gentoo Base System version 1.4.16
Python:              dev-lang/python-2.3.4-r1 [2.3.4 (#1, Feb 18 2005, 10:55:25)]
distcc 2.16 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
dev-lang/python:     2.3.4-r1
sys-devel/autoconf:  2.59-r6, 2.13
sys-devel/automake:  1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.4
sys-devel/binutils:  2.15.92.0.2-r7
sys-devel/libtool:   1.5.14
virtual/os-headers:  2.6.8.1-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-mcpu=athlon-xp -O2 -pipe -fomit-frame-pointer -fstack-protector"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.2/share/config
/usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown
/usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-mcpu=athlon-xp -O2 -pipe -fomit-frame-pointer -fstack-protector"
DISTDIR="/var/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.du.se/pub/os/gentoo http://mirror.pudas.net/gentoo"
LC_ALL="sv_SE"
MAKEOPTS="-j4"
PKGDIR="/var/portage/packages"
PORTAGE_TMPDIR="/var/portage/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/portage/local"
SYNC="rsync://rsync.se.gentoo.org/gentoo-portage"
USE="x86 3dnow 3dnowext X aalib alsa apache2 apm arts avi berkdb bitmap-fonts
cdr crypt cups curl dga directfb divx4linux doc dvd dvdread emboss encode esd
fam fbcon flac foomaticdb fortran gdbm gif gimpprint gnome gpm gstreamer gtk
gtk2 imagemagick imlib ipv6 java jpeg junit kde ldap libg++ libwww live mad
matroska mikmod mmx mmxext mng motif mp3 mpeg msn mysql nas ncurses nls nozaptel
ogg oggvorbis opengl oss pam pdflib perl png ppds python qt quicktime readline
rtc samba scanner sdl slang speex spell sse ssl svga tcltk tcpd theora tiff
truetype truetype-fonts type1-fonts unicode usb v4l2 vorbis xine xinerama xml2
xmms xv xvid zlib"
Unset:  ASFLAGS, CBUILD, CTARGET, LANG, LDFLAGS, LINGUAS
Comment 1 Felix 2005-05-05 10:17:24 UTC
Hi,
This is not solely an Nvidia problem; their rival has exactly the same problem.
It seems these two companies don't care about their Linux customers anymore. I
have been crying to ATI to fix this problem since last year. Yet this bug is
not even listed among their known problems. Nvidia has never acknowledged this
problem either.
Check out my thread in the Rage3D forum. The mouse moves but the screen freezes,
the machine is still ssh-able, the X process consumes 99% CPU, and killing X only
makes the system freeze completely.

http://www.rage3d.com/board/showthread.php?t=33800697

Comment 2 Stefan Huszics 2005-05-05 10:41:47 UTC
I mentioned earlier that it never locks up once I start watching a movie. Well,
never say never: it did just that 2 days ago (though this is the only time so far).

Also, when I read my own bug report I noticed I forgot to mention that it is
possible to SSH in from a remote machine and
killall -9 X
startx
to get graphics working again, so a complete reboot is not necessary (if you
happen to have access to a second system).
Comment 3 Mike A. Harris 2005-05-10 23:09:08 UTC
There's not really enough information in this bug report to even remotely
conclude if it is a video driver bug, X server bug, mouse driver bug,
kernel bug, proprietary kernel module bug, buggy motherboard or BIOS,
overheating problem, bad memory, or any one of tens of other potential
problems.

If you want anyone to investigate the problem, you're more or less going
to have to provide a lot more detail than what's here, and try to narrow
the problem down as far as possible.  Join xorg@freedesktop.org and try
to find other people who have the same problem.  See what their system
has in common with yours (if anything), including any motherboard, chipset,
video card brand/model, revision, BIOS version, kernel version, compiler
options used, absolutely anything at all that can provide enough
evidence that it is:

- A common problem being experienced by more than 1 person.  Where
  "more", is more than 2 people if possible, and if it's 10 or 50
  people, that's even better.

- Perhaps common to certain hardware and/or software combination and/or
  configuration options for the video card, CMOS, kernel, X server,
  etc.

If possible debug the X server via remote, and try to narrow down where
the problem is happening.  This is notoriously difficult to impossible
if you're using proprietary drivers, unless you get insanely lucky.


The reason I say all of this, is to try to point out what needs to happen
for you to realistically have a chance of someone actively trying to
investigate the problem, because:

- Open source developers don't have the source code for proprietary
  drivers, and generally speaking can't debug and fix problems that
  only occur when using them.  They generally turn out to be bugs in
  the drivers themselves.  I say "generally", not "always", so no need
  for anyone to chime in with "this one time, I had a problem and it
  was the X server, not Nvidia|ATI|whoever" one-count stats.  The fact
  is, problems tend to be in the drivers whether they are open or closed
  source, and so that leaves the likelihood of it being fixed generally
  up to the company who wrote the driver.

- Proprietary driver developers, working at the hardware companies are
  not likely to investigate bug reports one single user is reporting, or
  even 2 users, unless the problem report is very highly detailed and
  contains enough compelling information for the vendor to conclude it
  is probably a driver problem and assess the problem as being something
  they consider important enough to investigate and fix.

So far, while I do see a problem being reported here, someone who experiences
it, is probably going to have to take out their secret decoder ring,
and fill bugzilla with config files, log files, stack traces, and other
debugging information before it's useful to X.org developers, Nvidia,
or anyone really.

Sorry to be blunt, but I'm just trying to help by steering you in the
right direction.  Merely complaining about things won't get a solution
for the problem.

Hope this helps.
Comment 4 FreeDesktop Bugzilla Database Corruption Fix User 2005-06-30 12:16:53 UTC
Same behavior on Suse 9.3 for x86-64 running on Dell Precision 370 (P4 3.2 GHz
with HT, 2 GB RAM, NVIDIA Quadro 330 PCIE card and "nv" driver).
Comment 5 FreeDesktop Bugzilla Database Corruption Fix User 2005-06-30 12:20:05 UTC
Same behavior on Suse 9.3 for x86-64 running on Dell Precision 370 (P4 3.2 GHz
with HT, 2 GB RAM, nVidia Corporation NV37GL PCIE card and "nv" driver).
Comment 6 FreeDesktop Bugzilla Database Corruption Fix User 2005-07-11 06:11:29 UTC
It is as the original complaint describes.  The X server only hangs when it
is redrawing the screen.  Initially, my box (Nvidia 4200) would hang when
my screen saver ran for a while.  I thought the problem to be the screen saver
itself so I disabled it.  The hangs often occur when Firefox or Mozilla render
images or if I quickly resize a window.  The CPU goes to about 100% and the X
server becomes unresponsive.  The mouse moves but clicks are not registered.  If
I ssh in from another machine I can do a kill -9 on X to reset the session.  No
other kill signal seems to make a difference.  After the X session is killed, it is
impossible to get a text console to appear.  That is, if you press Ctrl-Alt-F1 the
text console never appears.  If I shut down from the login screen at that point,
the screen becomes corrupted (lots of colors and blinking characters) until the
machine shuts down.  I swapped in an Nvidia 5200 to test that my card was not
bad and the same thing happens.  The hangs happen in both the KDE and Gnome
desktops.
Comment 7 Mauro 2005-07-30 19:38:00 UTC
Same problem on Fedora Core 4, x86_64, GeForce FX5200 with the nvidia or nv driver
loaded.
Comment 8 rgo 2005-08-01 11:45:58 UTC
I have a similar problem.
System: Slackware 10, kernel 2.6.12 (and 2.6.11 too), X.org 6.8.1 (and 6.7.0
too), GeForce2 MX/MX 400 with the NVidia drivers (7176 and 7667). With the vesa
driver I don't see this problem. I know another person who encounters similar
problems with a GeForce4 MX440. With a TNT2 (my previous card) everything worked well.

One more observation: the X freeze arises on several (I know only a few) HTML
pages when they are opened in Mozilla. If the Mozilla window is not large enough,
nothing bad happens! But if I maximize the window... After I updated X.org to
6.8.1, some bad pages became good and I can view them freely.

I know a method to "unfreeze" the computer without a remote console. Before the
system becomes unusable, start a second X server with startx (for example on
vt8). Then, after the freeze: Alt-SysRq-K, Alt-F8. (Alt-SysRq-R doesn't help.)
After this you can switch to vc/1, for example, or start an xterm (on the second
server) and kill X. Without a second server, I'm unable to reset the console (on
framebuffer) to a usable state.
Comment 9 Iulian Serbanoiu 2005-08-02 16:54:55 UTC
--------------------

It also freezes with the "nv" driver and with the "nvidia" driver from
www.nvidia.com. I think this is really an x.org problem because on
xfree86 it does not freeze.

Please do something because otherwise I will be forced to switch back to xfree86 :(

I have an AGP Riva TNT2. This problem appears to affect many people.

It is discussed at this address:

http://www.nvnews.net/vbulletin/showthread.php?t=49117&page=4

-- Slackware 10.1 (kernel 2.4.29 and 2.6.12.2; it crashes on both after some
time ... the last time was when I scrolled something in Kate, the KDE editor)

me@darkstar:~# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 16)
00:07.3 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 16)
00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
00:0c.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 04)
01:00.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model 64/Model 64 Pro] (rev 15)

AFTER the freeze I log in remotely with ssh, run 'killall -9 X' and then
restart the server.

I see this (for example) in dmesg after every crash:

NVRM: Xid: 6, PE0000 03fc ffffffff 00000000 0014a7ed 00010001

You can see that a lot of people have this bug.

Thank you in advance !!!

--------------------
Comment 10 Iulian Serbanoiu 2005-08-03 01:42:36 UTC
me@darkstar:~# dmesg | grep NVRM | grep Xid
NVRM: Xid: 6, PE0000 03fc ffffffff 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0300 00000006 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 07ec 000b0000 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001
NVRM: Xid: 6, PE0000 0414 00010001 00000000 0014a7ed 00010001


this happened today ....
until I switched to the newest version of xfree86 ... no freeze since

hope this will help you

I have a Riva TNT2.
Comment 11 Richard Jonsson 2005-08-03 06:09:13 UTC
Steps to reproduce with nvidia driver:
1. Open a font-dialog in kde.
2. Select a scalable font, and set its size to 64
3. X locked

With the nv driver from xorg the xserver will run fine.

This is the only way I got the x-server to stall. That is, if I go to a web page
with very large fonts, the same thing happens.

My setup:
GeForce 4800 SE, Abit NF7-S (nForce2 chipset)
up to date Gentoo, relevant software:

[ebuild   R   ] media-video/nvidia-glx-1.0.7667
[ebuild   R   ] media-video/nvidia-kernel-1.0.7667
[ebuild   R   ] x11-base/opengl-update-2.2.1
[ebuild   R   ] x11-base/xorg-x11-6.8.2-r2
Comment 12 Iulian Serbanoiu 2005-08-03 18:27:08 UTC
seems that with the "nvidia" module it freezes on xfree86 also
but for the moment with xfree86 with the "nv" driver it does not ... after
one day.

( maybe there are some strange things with the xorg "nv" driver ? )

I remind you that xorg 6.8.2 from my slackware freezes with "nv" and "nvidia"
and i cannot reproduce this consistently.

( I have riva tnt2 )

I will keep you informed if it crashes with xfree86 "nv" driver also.

It is pretty sad ... because without "nvidia" I don't have OpenGL.
Comment 13 Mauro 2005-08-04 05:44:24 UTC
My xorg freezes with both drivers, nvidia and nv, on Fedora Core 4, x86_64,
GeForce FX5200.
Has anyone solved this problem with an older version of xorg?
Comment 14 Iulian Serbanoiu 2005-08-04 15:31:49 UTC
don't know what to say ....

xfree86 does NOT freeze with the "nv" driver. So there may be something useful there.

I am referring to xfree86 4.5.0 (the latest available).

Hoping for a better xorg.
Comment 15 Iulian Serbanoiu 2005-08-09 16:10:39 UTC
yep ... Xfree 4.5.0 with the "nv" driver does not crash.

4 days testing and no freeze.

Maybe it is a good point to start. Just a suggestion.
Comment 16 Ryan Reich 2005-08-23 20:29:34 UTC
(In reply to comment #3)
> - Open source developers don't have the source code for proprietary
>   drivers, and generally speaking can't debug and fix problems that
>   only occur when using them.  They generally turn out to be bugs in
>   the drivers themselves.  I say "generally", not "always", so no need
>   for anyone to chime in with "this one time, I had a problem and it
>   was the X server, not Nvidia|ATI|whoever" one-count stats.  The fact
>   is, problems tend to be in the drivers whether they are open or closed
>   source, and so that leaves the likelihood of it being fixed generally
>   up to the company who wrote the driver.

I experience this freeze using nothing more than the open-source Xorg and Linux
(2.6.12-mm2) Radeon drivers.  For the record, it appears to be linked to AGP: I
have recompiled the Linux kernel without any AGP support and the freeze is gone.
 Previously I had disabled direct rendering in the xorg.conf, without
improvement, so it is definitely AGP rather than a derivative.
Comment 17 ajax at nwnk dot net 2005-08-28 14:46:10 UTC
component shift to nv.  if you are experiencing this bug on other drivers as
well please open new bugs for them, in that driver's component.

bugzilla is not a forum.
Comment 18 rgo 2005-09-02 12:18:33 UTC
I solved my problem. I cannot say definitely where my system was broken or why X
froze. But, as far as I understand, the problem was a badly installed fontconfig.
I hadn't touched it before, but (by accident) I found that fc-list was saying:
undefined symbol: FcFini. After recompiling and installing fontconfig and running
ldconfig, everything became good.
Comment 19 Erik Andren 2006-04-23 06:39:22 UTC
Stefan, are you still experiencing this problem using a current version of xorg?
Comment 20 Stefan Huszics 2006-05-10 13:33:15 UTC
Sorry for the slow reply, been up over my head in work lately with 12+h shifts.

Am I still experiencing the problem? Well, I don't know; I'm still running with my
old Savage video card from the last millennium. Unfortunately, right now I have no
time at all to help with this bug, and when I did have time, 1 year ago, I only
got a reply with a lot of BS about how this was anything but an xorg/nv
bug (#3) instead of intelligent feedback from someone who at least knows the
difference between the open nv driver and the proprietary nvidia driver...
Comment 21 Mick Mearns 2006-05-17 08:15:41 UTC
I have a Radeon 9200SE.
Fedora Core 5 (FC5), "yum upgrade" daily.

When I use "tvtime" AND watch an mpg/mov with Kaffeine/xine under KDE,
I sometimes get a lockup.
This is a hard lock; the keyboard is completely dead.
Mouse movement works and the pointer stays an arrow, but the buttons are dead.
I am not on a network so I cannot test ssh.
Comment 22 gianluca.bobbo 2006-09-05 06:13:21 UTC
Created attachment 6821
X log and configuration

I have the same problem. The lock happens even with no window manager, with
just an xterm running; I can use the xterm up to the first xterm "scroll".
After that the mouse works fine but the keyboard and the application freeze.
The only way out is to kill X remotely.

A workaround is to disable a single acceleration in xorg.conf:

Option "XaaNoScreenToScreenCopy"

With this option set everything works fine for me, albeit
scrolling is quite slow.

I use a hp xw4300 with:

OS:
Red Hat Enterprise Linux WS release 4 (Nahant Update 4)

vga adapter:
01:00.0 VGA compatible controller: nVidia Corporation NV43GL [Quadro FX 540] (rev a2)

and xorg: 
X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2

and kernel:
Linux itvim2rd00087 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 17:57:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

I attach X log and configuration for the working case (apart from the
"XaaNoScreenToScreenCopy" they are just the same for the broken one anyway).

G.
Comment 23 Chris Radlinski 2007-01-12 07:34:12 UTC
I have this problem on FreeBSD 6.1-RELEASE for AMD64.  I'm running an nvidia
6200 LE card with Xorg 7.2-RC3 and nv driver version 1.2.2.1.  I have the same
problem with Xorg 6.9.0.  I was able to isolate the problem to the NVSync()
function in nv_xaa.c:

void NVSync(ScrnInfoPtr pScrn)
{
    NVPtr pNv = NVPTR(pScrn);

    if(pNv->DMAKickoffCallback)
       (*pNv->DMAKickoffCallback)(pScrn);

    while(READ_GET(pNv) != pNv->dmaPut);

    while(pNv->PGRAPH[0x0700/4]);
}

The problem is with the first while loop.  Usually READ_GET(pNv) quickly becomes
equal to pNv->dmaPut and the loop falls right through.  Eventually and inevitably
the two values never become equal and the driver goes into an infinite loop.  This
consumes 100% of the CPU and makes Xorg unusable.
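
The spin described above can be illustrated with a bounded version of the same wait. This is only a sketch under stated assumptions: FakeFifo, wait_for_fifo and the spin limit are hypothetical names invented for illustration, not part of the nv driver, and ordinary memory stands in for the card's registers. Giving up after a limit would only stop the CPU from spinning; it would not recover a card that is really stuck.

```c
#include <stdint.h>

/* Hypothetical stand-in for the card's FIFO state; not the driver's NVRec. */
typedef struct {
    volatile uint32_t dmaGet;  /* position the hardware has consumed up to */
    uint32_t dmaPut;           /* position the driver has written up to */
} FakeFifo;

/*
 * Poll until GET catches up with PUT, or give up after max_spins polls.
 * Returns 0 when the FIFO has drained and -1 when it looks stuck.
 * The unbounded loop in NVSync() corresponds to max_spins being infinite.
 */
int wait_for_fifo(FakeFifo *fifo, unsigned long max_spins)
{
    unsigned long spins = 0;

    while (fifo->dmaGet != fifo->dmaPut) {
        if (++spins >= max_spins)
            return -1;  /* hardware never caught up: report instead of spinning */
    }
    return 0;
}
```

With a FIFO whose dmaGet already equals dmaPut the function returns 0 at once; with one whose dmaGet never advances it returns -1 after max_spins polls instead of looping forever.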

I posted this information on the mailing list and was told the FIFO is waiting
to be processed by the card but the card is "stuck."  I don't really know what
that means nor do I know how to "unstick" it.  I'm willing to help however I can.

I don't know if this is the cause of the problems listed above but I have
exactly the same symptoms.
Comment 24 Isaac Sutcliffe 2007-08-21 19:05:55 UTC
Having the same issue with 7.2.0

It often occurs when minimising something to the taskbar in GNOME, or when scrolling; it doesn't matter whether I am using Firefox.

It seems it doesn't write anything to the X logs; the Xorg process just consumes 100% CPU, and the screen is unusable until I remotely stop gdm and then kill the Xorg process...

That's if I am using the nv driver.

I have found with the nvidia driver, the crash occurs more quickly and is not recoverable.

As I can easily reproduce this bug, I will be more than happy to help gather info, just let me know what to do...
Comment 25 Isaac Sutcliffe 2007-09-19 11:16:10 UTC
This bug seems to have reappeared.
Comment 26 Christian Weisgerber 2008-01-26 11:14:41 UTC
I'm seeing what is likely the same problem with
* xorg-server 1.4,
* xf86-video-nv 2.1.6,
* FreeBSD 8.0-CURRENT/amd64,
* GeForce 6200 LE card.

Starting X11 with twm and xterms works, but as soon as I run, say, firefox, the X11 server freezes within a few seconds.  It goes into a tight loop (100% CPU) and will not respond to signals.  The mouse pointer moves, everything else is frozen.

A workaround is to create an xorg.conf file (X -configure) and disable hardware acceleration for nv (Option "NoAccel" "true").  With this, the server will no longer freeze.
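
The NoAccel workaround described above could look roughly like the following xorg.conf fragment. This is a minimal sketch: the Identifier value is a placeholder, and the commented-out line is the narrower XaaNoScreenToScreenCopy option reported in comment 22, which may or may not be enough on a given setup.

```
Section "Device"
    # Placeholder name; keep whatever identifier X -configure generated.
    Identifier "Card0"
    Driver     "nv"
    # Disable all hardware acceleration in the nv driver.
    Option     "NoAccel" "true"
    # Narrower alternative from comment 22:
    # Option   "XaaNoScreenToScreenCopy"
EndSection
```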

This is probably the same problem as bug 10341.
Comment 27 Adam 2008-03-08 10:03:01 UTC
Created attachment 14960
Tarball containing: Xorg log & config, gdb backtraces, and lspci output

Same problem here.  X hangs and uses 99% of the CPU.  Have to ssh in remotely and kill X to recover.
- Fedora Core 8 
- GeForce4 MX 4000.
- xorg-x11-server-1.3
- xorg-x11-drv-nv-2.1.6

Setting "NoAccel" to "true" (thanks to Christian) eliminates the problem.

Attached is a tarball of my Xorg log and config, plus a couple of X gdb backtraces.  I have debuginfo for the last backtrace.  My results are similar to those in comment 23, but I was hung on the "while(READ_GET(pNv) != pNv->dmaPut);" not "while(pNv->PGRAPH[0x0700/4]);".
Comment 28 Benjamin Close 2008-05-19 19:32:47 UTC
*** Bug 6161 has been marked as a duplicate of this bug. ***
Comment 29 Benjamin Close 2008-05-19 19:33:58 UTC
*** Bug 5102 has been marked as a duplicate of this bug. ***
Comment 30 Benjamin Close 2008-05-19 19:34:41 UTC
*** Bug 16003 has been marked as a duplicate of this bug. ***
Comment 31 Benjamin Close 2008-05-19 19:37:35 UTC
This bug is common across at least NetBSD, FreeBSD and Linux and is caused by some hardware state. Bug 5102 has a potential patch which fixes the issue.
Comment 32 Benjamin Close 2008-05-19 20:22:09 UTC
Update: at least for me (FreeBSD-CURRENT, amd64) the patch does not fix the lock.
Comment 33 Benjamin Close 2008-05-20 22:50:57 UTC
Full trace to the cause:

0x000000080239c27d in NVSync (pScrn=0x80c000) at nv_xaa.c:303
303         while(READ_GET(pNv) != pNv->dmaPut);
(gdb) bt
#0  0x000000080239c27d in NVSync (pScrn=0x80c000) at nv_xaa.c:303
#1  0x0000000803b5635e in XAACopyAreaFallback (pSrc=0x26d0000, pDst=0x845280, pGC=0x85fac0, srcx=0, srcy=0, width=26, height=32, dstx=1727, dsty=100)
    at xaaFallback.c:83
#2  0x0000000803b58739 in XAACopyArea (pSrcDrawable=0x26d0000, pDstDrawable=0x845280, pGC=0x85fac0, srcx=0, srcy=0, width=26, height=32, dstx=1727, dsty=100)
    at xaaCpyArea.c:72
#3  0x0000000803baba55 in cwCopyArea (pSrc=0x26d0000, pDst=0x845280, pGC=0x85fac0, srcx=0, srcy=0, w=26, h=32, dstx=1727, dsty=100) at cw_ops.c:201
#4  0x000000000059b591 in damageCopyArea (pSrc=0x26d0000, pDst=0x845280, pGC=0x85fac0, srcx=0, srcy=0, width=26, height=32, dstx=1727, dsty=100) at damage.c:830
#5  0x000000000050b222 in miDCRestoreUnderCursor (pDev=0x84ee80, pScreen=0x829c00, x=1727, y=100, w=26, h=32) at midispcur.c:616
#6  0x00000000005228f5 in miSpriteRemoveCursor (pDev=0x84ee80, pScreen=0x829c00) at misprite.c:938
#7  0x00000000005224cf in miSpriteSetCursor (pDev=0x84ee80, pScreen=0x829c00, pCursor=0x265eb20, x=1736, y=108) at misprite.c:826
#8  0x00000000005225e3 in miSpriteMoveCursor (pDev=0x84ee80, pScreen=0x829c00, x=1736, y=108) at misprite.c:857
#9  0x0000000000518a7f in miPointerUpdateSprite (pDev=0x84ee80) at mipointer.c:451
#10 0x000000000050cda4 in mieqProcessInputEvents () at mieq.c:386
#11 0x00000000004a99ab in ProcessInputEvents () at xf86Events.c:241
#12 0x000000000044a0dd in Dispatch () at dispatch.c:411
#13 0x000000000042dd55 in main (argc=5, argv=0x7fffffffe630, envp=0x7fffffffe660) at main.c:435
Comment 34 Benjamin Close 2008-05-20 22:56:58 UTC
Debian bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=370709 has some workarounds as well as an audit trail of which XAA options seem to prevent the problem.
Comment 35 Benjamin Close 2008-06-02 22:47:27 UTC
Turns out it is a memory barrier issue with out-of-order CPU instruction execution. It was simple to reproduce the hang by running ls -R / in a transparent aterm on a non-i386 machine.

The fix was similar to the NetBSD fix in Bug 5102 though it needs to be slightly different.

In nv_local.h we have:

#if defined(__i386__)
#define _NV_FENCE() outb(0x3D0, 0);
#else
#define _NV_FENCE() mem_barrier();
#endif

#define WRITE_PUT(pNv, data) {       \
  volatile CARD8 scratch;            \
  _NV_FENCE()                        \
  scratch = (pNv)->FbStart[0];       \
  (pNv)->FIFO[0x0010] = (data) << 2; \
  mem_barrier();                     \
}


Under amd64, mem_barrier is a nop, hence
  scratch = (pNv)->FbStart[0];
  (pNv)->FIFO[0x0010] = (data) << 2;

may be executed out of order. The NetBSD fix defined mem_barrier to actually do something. However, this caused a double barrier via _NV_FENCE and mem_barrier.
The correct fix is to leave mem_barrier as a nop but define _NV_FENCE to be a barrier, i.e.:

diff --git a/src/nv_local.h b/src/nv_local.h
index 74cdc09..ecde69e 100644
--- a/src/nv_local.h
+++ b/src/nv_local.h
@@ -82,7 +82,7 @@ typedef unsigned int   U032;
 #if defined(__i386__)
 #define _NV_FENCE() outb(0x3D0, 0);
 #else
-#define _NV_FENCE() mem_barrier();
+#define _NV_FENCE() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)": : :"memory");
 #endif
 
 #define WRITE_PUT(pNv, data) {       \


Hence we end up with one barrier and things work!
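
The ordering requirement above (one full barrier before the FIFO PUT register is written) can be sketched in portable C11. Everything here is an assumption for illustration: FakeCard and write_put are hypothetical stand-ins for the driver's NVPtr and WRITE_PUT, plain memory replaces the mapped registers, and atomic_thread_fence is swapped in for the inline assembly in the patch. Whether a C11 fence is sufficient for real memory-mapped I/O is not something this sketch establishes.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the framebuffer and FIFO mappings. */
typedef struct {
    volatile uint8_t  fb_start[1];  /* plays the role of FbStart[0] */
    volatile uint32_t fifo_put;     /* plays the role of FIFO[0x0010] */
} FakeCard;

/*
 * Mirror of the WRITE_PUT sequence with exactly one barrier:
 * fence first, then the scratch read, then the PUT update.
 */
void write_put(FakeCard *card, uint32_t data)
{
    volatile uint8_t scratch;

    /* Full barrier: earlier stores must be visible before the kick below. */
    atomic_thread_fence(memory_order_seq_cst);

    scratch = card->fb_start[0];  /* same dummy framebuffer read as the macro */
    (void)scratch;

    card->fifo_put = data << 2;   /* publish the new PUT offset */
}
```

The fence sits where _NV_FENCE() sits in the macro, so the sketch keeps the single-barrier shape the patch aims for rather than the double barrier of the NetBSD variant.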
Comment 36 tomsen 2008-09-29 01:08:38 UTC
Hi,
I have the same symptoms with a stock Ubuntu 8.04, latest updates, but with an ATI X800GT and the default open-source driver. Since the fix described here obviously does not apply, should I open another bug for the ati driver? How likely is it that the bug is also present in the open-source ati driver? :-)
Comment 37 Corbin Simpson 2011-09-14 12:35:26 UTC
xf86-video-nv has been officially unmaintained for a bit now, and we are closing all -nv bugs. If your problem was not addressed, and -nv is still broken, please try xf86-video-nouveau. Thank you.