Summary: | X "sporadically" heavily loading CPU | ||
---|---|---|---|
Product: | xorg | Reporter: | Daniel Stone <daniel> |
Component: | Server/General | Assignee: | Xorg Project Team <xorg-team> |
Status: | RESOLVED INVALID | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | critical | ||
Priority: | highest | CC: | m.debruijne |
Version: | 6.8.2 | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
FreeDesktop Bugzilla Database Corruption Fix User 2005-06-20 11:29:50 UTC

Created attachment 2930 [details] strace X log
Created attachment 2931 [details] strace -p X log

(The attachment descriptions recorded here were scrambled by the Bugzilla database corruption; the correct descriptions are taken from the xorg-team archive comments below.)
I have this problem on my SuSE 9.3 desktop and my Gentoo notebook, two quite different systems. I run nx and vnc on demand. The machine freezes when I try to attach to the X process with gdb (so does strace -p) :( gdb backtrace in such situations:

#0  0xffffe410 in ?? ()
#1  0xbffff688 in ?? ()
#2  0x00000000 in ?? ()
#3  0x08216440 in AnyClientsWriteBlocked ()
#4  0xb7f0df9d in ___newselect_nocancel () from /lib/tls/libc.so.6
#5  0x080e477e in WaitForSomething ()
#6  0x080c5c15 in Dispatch ()
#7  0x080d29d6 in main ()

I have relatively high X load right now, however far from the usual 70+%:

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
Processes with a thread ID of 14147
Processes with a thread ID of 14558
Processes with a thread ID of all
samples  %        samples  %        samples  %        image name      symbol name
175      11.8403  68       1.1210   0        0         Xorg            XYToWindow
111      7.5101   579      9.5450   0        0         libc-2.3.5.so   (no symbols)
83       5.6157   0        0        0        0         vmlinux         unix_poll
56       3.7889   0        0        0        0         vmlinux         do_select
54       3.6536   0        0        0        0         vmlinux         fget
53       3.5859   0        0        0        0         vmlinux         sock_poll
39       2.6387   0        0        0        0         vmlinux         __copy_to_user_ll
35       2.3681   0        0        0        0         vmlinux         add_wait_queue
34       2.3004   0        0        0        0         vmlinux         schedule
33       2.2327   127      2.0936   16       0.8408    Xorg            StandardReadRequestFromClient
31       2.0974   89       1.4672   0        0         Xorg            SecurityLookupIDByType
30       2.0298   156      2.5717   0        0         Xorg            SecurityLookupIDByClass
28       1.8945   0        0        0        0         vmlinux         remove_wait_queue
22       1.4885   0        0        0        0         vmlinux         buffered_rmqueue
21       1.4208   101      1.6650   29       1.5239    Xorg            WaitForSomething
20       1.3532   104      1.7145   9        0.4729    Xorg            FlushClientCaches

X just reached 85+% right now. The oprofile output looks quite the same, however:

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
Processes with a thread ID of 14147
Processes with a thread ID of 14558
Processes with a thread ID of all
samples  %        samples  %        samples  %        image name      symbol name
175      11.8403  191      1.8374   0        0         Xorg            XYToWindow
111      7.5101   911      8.7638   0        0         libc-2.3.5.so   (no symbols)
83       5.6157   0        0        0        0         vmlinux         unix_poll
56       3.7889   0        0        0        0         vmlinux         do_select
54       3.6536   0        0        0        0         vmlinux         fget
53       3.5859   0        0        0        0         vmlinux         sock_poll
39       2.6387   0        0        0        0         vmlinux         __copy_to_user_ll
35       2.3681   0        0        0        0         vmlinux         add_wait_queue
34       2.3004   0        0        0        0         vmlinux         schedule
33       2.2327   214      2.0587   16       0.8408    Xorg            StandardReadRequestFromClient
31       2.0974   158      1.5200   0        0         Xorg            SecurityLookupIDByType
30       2.0298   232      2.2318   0        0         Xorg            SecurityLookupIDByClass
28       1.8945   0        0        0        0         vmlinux         remove_wait_queue
22       1.4885   0        0        0        0         vmlinux         buffered_rmqueue
21       1.4208   161      1.5488   29       1.5239    Xorg            WaitForSomething
20       1.3532   146      1.4045   9        0.4729    Xorg            FlushClientCaches

I seem to have the same problem. I can reproduce it by running prboom for a while; sometimes it occurs only after a few hours. I'm running it inside a window; I haven't tested fullscreen. The problem starts with X not responding anymore. I can continue to move the mouse cursor for a short while, then that also freezes. The music of prboom continues. gdb shows me a long backtrace with a lot of question marks, and it also involves gettimeofday() and AnyClientsWriteBlocked(). If there is any specific information I can provide to help solve this problem, please let me know. By the way, any idea what causes the weird addresses like 0x00000000 and 0xffffe410 in the gdb backtrace?

I see addresses like these, too. I haven't had the problem for the last 2 weeks. I turned off kcompmgr (and use xcompmgr instead), don't use kooldock anymore, and uninstalled gtk-qt-engine. I don't know which of the three might have triggered the problem, or whether it was even a synergetic effect. Since I run X on my private machines for only a few hours in the evening, it's hard to make a reliable statement here.
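Regarding the question above about the odd backtrace addresses: on i386 Linux of that era, 0xffffe410 falls inside the kernel's fixed vsyscall/vDSO page (linux-gate.so.1), which gdb of the time could not symbolize, hence the "??" frames; 0xbffff688 is a stack address and the 0x00000000 frame is typically an end-of-unwind artifact. A minimal sketch (a hypothetical helper, not part of X.Org; it assumes a kernel that labels the mapping "[vdso]" in /proc/self/maps) for checking whether an address lies in that page:

```c
/* Hypothetical diagnostic helper (not from this bug's attachments): checks
 * whether an address such as 0xffffe410 lies inside the current process's
 * vDSO mapping, which /proc/<pid>/maps labels "[vdso]" on kernels that
 * annotate it. Pre-ASLR kernels mapped it at a fixed address, so checking
 * our own maps is indicative of what the X server saw. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    /* Default to the address from the gdb backtrace above. */
    unsigned long addr = (argc > 1) ? strtoul(argv[1], NULL, 0) : 0xffffe410UL;
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];

    if (!maps) {
        perror("fopen /proc/self/maps");
        return 1;
    }
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            strstr(line, "[vdso]") && addr >= start && addr < end) {
            printf("0x%lx lies in the vDSO mapping: %s", addr, line);
            fclose(maps);
            return 0;
        }
    }
    printf("0x%lx is not in this process's vDSO mapping\n", addr);
    fclose(maps);
    return 0;
}
```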
It's a bit early to tell, but I don't seem to be able to reproduce the problem after adding the line Option "NvAgp" "0" to the Device section of the X config file:

Section "Device"
    Identifier "GeForce 6800"
    Driver "nvidia"
    Option "NoLogo" "True"
    Option "CursorShadow" "True"
->  Option "NvAgp" "0"
EndSection

If I understand correctly, this disables AGP support. I'll keep trying for a while longer, and I will experiment with the values "1", "2", and "3" ("3" is the default according to /usr/share/doc/nvidia-kernel-1.0.6629-r4/README.gz on my system). If this does solve the problem, I wonder what actually causes it: the nvidia kernel module, an nvidia library, or the X server; it could even be BIOS-related or hardware-related. There seem to be some more bug reports that could be related to this one; just search for "cpu" or for "100" (as in "100%").

I'll try that, too. However, I also have this problem on an ATI X600 mobile machine.

kompmgr seems to be the problem, or at least part of it. I had an X CPU load of 70% some time after re-activating kompmgr, which dropped instantly after I switched kompmgr off. I filed a KDE bug report: https://bugs.kde.org/show_bug.cgi?id=110358

I just had the problem on my ATI-based notebook without (!) kompmgr. The problem was apparently triggered by kooldock; that is, after pkill -9 kooldock the CPU load instantly dropped to a normal level.

Apparently the problem on my system has a different cause, leading to similar symptoms. The main problem is that the system becomes completely unusable. I think that the X server should keep responding to (some) user events, whatever happens. No process should be able to make the system completely unresponsive. The user should always get a chance to kill a runaway process. Key combinations like CTRL-ALT-BACKSPACE and CTRL-ALT-F1 should keep working so that there is an opportunity to rescue the system without having to resort to the reset button. A warning message in the X server log file with the name and process ID of any client that generates events at a high rate, or a configurable event rate limiter, could also be very helpful (see the sketch after these comments).

Again, 80+%, without any identifiable cause. sysprof also doesn't reveal more details (in file /usr/X11R6/bin/Xorg)... Also, I noticed that X consumes a lot of memory after some hours (i.e. when I let the machine run day and night). Currently X has allocated about half a GB of RAM. I encountered heavy RAM consumption earlier *in conjunction* with CPU load.
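The rate-limiter idea suggested above was never implemented in this form; as a minimal sketch, a per-client token bucket like the following (hypothetical code, not part of the X server; names such as RateLimiter and limiter_allow are invented for illustration) could flag clients exceeding a configurable event rate:

```c
/* Minimal sketch of the suggested per-client event rate limiter
 * (hypothetical; not an actual X server feature). The bucket refills at
 * `rate` events per second up to `burst`; a client that runs out of
 * tokens could be logged by name/pid or throttled. */
#include <stdio.h>
#include <sys/time.h>

typedef struct {
    double tokens;       /* currently available tokens     */
    double rate;         /* refill rate, events per second */
    double burst;        /* bucket capacity                */
    struct timeval last; /* time of the previous refill    */
} RateLimiter;

static void limiter_init(RateLimiter *rl, double rate, double burst)
{
    rl->rate = rate;
    rl->burst = burst;
    rl->tokens = burst;
    gettimeofday(&rl->last, NULL);
}

/* Returns 1 if the event is within the allowed rate, 0 if it exceeds it. */
static int limiter_allow(RateLimiter *rl)
{
    struct timeval now;
    double elapsed;

    gettimeofday(&now, NULL);
    elapsed = (now.tv_sec - rl->last.tv_sec) +
              (now.tv_usec - rl->last.tv_usec) / 1e6;
    rl->last = now;

    rl->tokens += elapsed * rl->rate;
    if (rl->tokens > rl->burst)
        rl->tokens = rl->burst;

    if (rl->tokens < 1.0)
        return 0;   /* over the limit: here the server could warn or drop */
    rl->tokens -= 1.0;
    return 1;
}

int main(void)
{
    RateLimiter rl;
    int i, dropped = 0;

    limiter_init(&rl, 1000.0, 100.0);  /* 1000 events/s, burst of 100 */
    for (i = 0; i < 100000; i++)
        if (!limiter_allow(&rl))
            dropped++;
    printf("%d of 100000 back-to-back events exceeded the limit\n", dropped);
    return 0;
}
```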
Original bug reporter e-mail and comments lost in bugzilla disk death. xorg-team archives show:

Summary: X "sporadically" heavily loading CPU
Product: xorg
Version: 6.8.2
Platform: Other
OS/Version: Linux
Status: NEW
Severity: critical
Priority: P1
Component: Server/general
AssignedTo: xorg-team@lists.x.org
ReportedBy: freedesktop@nitwit.de

This bug report derives from http://bugs.gentoo.org/show_bug.cgi?id=96538: without identifiable cause, X sometimes (well, too often to work with the system) heavily loads the CPU. strace -p freezes the machine most of the time. I was able to provide a log which shows an endless gettimeofday loop:

gettimeofday({1119210368, 403147}, NULL) = 0
read(26, "5\30\4\0002\301\252\2F\0\0\0 \0 \0>\377\7\0\313\306\273"..., 4120) = 4120
read(26, "\377\377\377\377\377\377\377\377\250\250\250\377\375\375"..., 260) = 260
read(26, "5\1\4\0006\301\252\2F\0\0\0 \0 \0007\377\4\0007\301\252"..., 4120) = 1416
writev(26, [{"\16\211m\"4\301\252\2\0\0>\r\0\0\0\0\0\0\0\0 \0\0\0 \0"..., 64}, {"\377\377\377\377\241\241\241\377\317\317\317 \377\317\317"..., 4096}], 2) = 4160
read(26, 0xd935640, 4120) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1119210368, 404304}, NULL) = 0
select(256, [1 3 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 38], NULL, NULL, {1344, 469000}) = 1 (in [26], left {1344, 469000})
gettimeofday({1119210368, 404716}, NULL) = 0
read(26, "I\2\5\0:\304\273\2\0\0\0\0 \0 \0\377\377\377\377", 4120) = 20
read(26, 0xd935640, 4120) = -1 EAGAIN (Resource temporarily unavailable)
writev(26, [{"\1\10\203\"\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0@V\223"..., 1056}], 1) = 1056
gettimeofday({1119210368, 405298}, NULL) = 0
select(256, [1 3 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 38], NULL, NULL, {1344, 468000}) = 1 (in [26], left {1344, 468000})
gettimeofday({1119210368, 405831}, NULL) = 0
read(26, "5\30\4\0<\301\252\2F\0\0\0 \0 \0>\377\7\0007\304\273\2"..., 4120) = 4120
read(26, "\377\377\377\377\377\377\377\377\250\250\250\377\375\375"..., 260) = 260
read(26, "5\1\4\0@\301\252\2F\0\0\0 \0 \0007\377\4\0A\301\252\2@"..., 4120) = 1416
writev(26, [{"\16A\217\">\301\252\2\0\0>\t\0\0\0\0\0\0\0\0 \0\0\0 \0"..., 64}, {"\377\377\377\377\377\377\377\377\317\317\317 \377\317\317"..., 4096}], 2) = 4160
read(26, 0xd935640, 4120) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1119210368, 407633}, NULL) = 0
select(256, [1 3 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 38], NULL, NULL, {1344, 466000}) = 1 (in [19], left {1344, 466000})

------- Additional Comments From freedesktop@nitwit.de 2005-06-20 11:31 -------
Created an attachment (id=2930) --> (https://bugs.freedesktop.org/attachment.cgi?id=2930&action=view)
strace X log

------- Additional Comments From freedesktop@nitwit.de 2005-06-20 11:32 -------
Created an attachment (id=2931) --> (https://bugs.freedesktop.org/attachment.cgi?id=2931&action=view)
strace -p X log

Sorry about the phenomenal bug spam, guys. Adding xorg-team@ to the QA contact so bugs don't get lost in future.

Very mysterious old bug, and the reporter's account was lost in disk death. -> closing.
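For reference, the strace excerpt above has the classic shape of a select()-based dispatch loop that never blocks: one client fd (26) is readable on every pass, the server drains it until EAGAIN, writes replies, recomputes its timer deadline with gettimeofday(), and immediately selects again. A simplified, self-contained sketch of that loop shape (this is not the actual Xorg WaitForSomething/Dispatch code, just an illustration of the pattern):

```c
/* Simplified sketch of the dispatch-loop shape visible in the strace log
 * above (not the actual X.Org code). If a client socket is readable on
 * every iteration, select() returns immediately every time, the loop
 * never sleeps, and the process burns CPU exactly as profiled above. */
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

static void serve(int client_fd)
{
    char buf[4120];                     /* read size matching the log  */

    for (;;) {
        fd_set readable;
        struct timeval now, timeout;

        gettimeofday(&now, NULL);       /* recompute the timer deadline
                                           on every pass, as in the log */
        timeout.tv_sec = 1344;          /* remaining timeout from log   */
        timeout.tv_usec = 469000;

        FD_ZERO(&readable);
        FD_SET(client_fd, &readable);

        /* With an always-ready client this returns immediately. */
        if (select(client_fd + 1, &readable, NULL, NULL, &timeout) <= 0)
            return;

        for (;;) {
            ssize_t n = read(client_fd, buf, sizeof buf);
            if (n > 0)
                continue;               /* process requests, write replies */
            if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                break;                  /* drained: back to select()       */
            return;                     /* EOF or real error: drop client  */
        }
    }
}

int main(void)
{
    int sv[2];

    /* Demo: a nonblocking socketpair stands in for a client connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;
    fcntl(sv[1], F_SETFL, O_NONBLOCK);
    write(sv[0], "request", 7);
    close(sv[0]);                       /* EOF lets serve() return here  */
    serve(sv[1]);
    puts("client disconnected; in the bug, this loop never settled");
    return 0;
}
```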