Summary: | Xorg Xprt starts to consume 100% when being idle | ||
---|---|---|---|
Product: | xprint | Reporter: | Roland Mainz <roland.mainz> |
Component: | Server: Other | Assignee: | Roland Mainz <roland.mainz> |
Status: | RESOLVED WONTFIX | QA Contact: | |
Severity: | blocker | ||
Priority: | high | CC: | alan.coopersmith, dberkholz, eich, jay.hobson, kem, matthieu.herrb, MostAwesomedude, roland.mainz, stuart.kreitman |
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | 14771 | ||
Bug Blocks: | |||
Attachments: |
Description
Roland Mainz
2004-08-01 12:32:07 UTC
Is it possible that the recent XEVI, DAMAGE, XFIXES etc. checkins may have caused this (e.g. did the code add any timers or triggers signals etc.) ? Created attachment 553 [details]
truss -u :: output on Solaris 8
I've attached Solaris's "truss" after the last client exists ("xdpyinfo" in the
example above) but before the 100% CPU usage started.
truss shows a tight loop of the following instructions:
-- snip --
-> libc:select(0xff, 0x7ffc14, 0x0, 0x0)
-> libc:.mul(0x0, 0x10624dd3, 0x100, 0x0)
<- libc:.mul() = 0
-> libc:.mul(0x0, 0x10624dd3, 0x0, 0x0)
<- libc:.mul() = 0
-> libc:_poll(0xffbedda8, 0x2, 0x0, 0x0)
poll(0xFFBEDDA8, 2, 0) = 0
<- libc:_poll() = 0
<- libc:select() = 0
-> libc_psr:memset(0xffbee7b8, 0x0, 0x80, 0x0)
<- libc_psr:memset() = 0xffbee7b8
-> libc:gettimeofday(0xffbee5b4, 0x0, 0x0, 0x0)
<- libc:gettimeofday() = 0
-- snip --
And I have no clue what is going on... ;-(
Created attachment 592 [details]
BZip2-compressed "truss -d -u a.out -o xprt_amok_003.log ./Xprt ..." output (part1)
Part 1 of Solaris 8 "truss" output of
-- snip --
% (export MESA_DEBUG=1 LD_LIBRARY_PATH=$PWD/../../exports/lib/ ; export
XPCONFIGDIR=/tmp/xptestinstall007/usr/X11R6/lib/X11/xserver ; truss -d -u a.out
-o xprt_amok_003.log ./Xprt -ac -audit 4 -core -fp
/usr/openwin/lib/X11/fonts/F3/,/usr/openwin/lib/X11/fonts/F3bitmaps/,/usr/openwin/lib/X11/fonts/Type1/,/usr/openwin/lib/X11/fonts/Speedo/,/usr/openwin/lib/X11/fonts/misc/,/usr/openwin/lib/X11/fonts/75dpi/,/usr/openwin/lib/X11/fonts/100dpi/
:40)
-- snip --
This log includes _ALL_ function calls within the Xserver, including startup,
xdpyinfo connect/disconnect and the endless loop (until CTRL-C hit "truss")
Note:
Timestamp 372.0989 is the end of the xdpyinfo connection, after that Xprt waits
exactly 2mins and then goes amok... ;-(
Created attachment 593 [details]
BZip2-compressed "truss -d -u a.out -o xprt_amok_003.log ./Xprt ..." output (part2)
Part 2 of Solaris 8 "truss" output of
-- snip --
% (export MESA_DEBUG=1 LD_LIBRARY_PATH=$PWD/../../exports/lib/ ; export
XPCONFIGDIR=/tmp/xptestinstall007/usr/X11R6/lib/X11/xserver ; truss -d -u a.out
-o xprt_amok_003.log ./Xprt -ac -audit 4 -core -fp
/usr/openwin/lib/X11/fonts/F3/,/usr/openwin/lib/X11/fonts/F3bitmaps/,/usr/openwin/lib/X11/fonts/Type1/,/usr/openwin/lib/X11/fonts/Speedo/,/usr/openwin/lib/X11/fonts/misc/,/usr/openwin/lib/X11/fonts/75dpi/,/usr/openwin/lib/X11/fonts/100dpi/
:40)
-- snip --
This log includes _ALL_ function calls within the Xserver, including startup,
xdpyinfo connect/disconnect and the endless loop (until CTRL-C hit "truss")
Note:
Timestamp 372.0989 is the end of the xdpyinfo connection, after that Xprt waits
exactly 2mins and then goes amok... ;-(
Some more debugging output: -- snip -- % (export MESA_DEBUG=1 LD_LIBRARY_PATH=$PWD/../../exports/lib/ ; export XPCONFIGDIR=/tmp/xptestinstall007/usr/X11R6/lib/X11/xserver ; dbx -r ./Xprt -ac -audit 4 -core -fp /usr/openwin/lib/X11/fonts/F3/,/usr/openwin/lib/X11/fonts/F3bitmaps/,/usr/openwin/lib/X11/fonts/Type1/,/usr/openwin/lib/X11/fonts/Speedo/,/usr/openwin/lib/X11/fonts/misc/,/usr/openwin/lib/X11/fonts/75dpi/,/usr/openwin/lib/X11/fonts/100dpi/ :40) Running: Xprt -ac -audit 4 -core -fp /usr/openwin/lib/X11/fonts/F3/,/usr/openwin/lib/X11/fonts/F3bitmaps/,/usr/openwin/lib/X11/fonts/Type1/,/usr/openwin/lib/X11/fonts/Speedo/,/usr/openwin/lib/X11/fonts/misc/,/usr/openwin/lib/X11/fonts/75dpi/,/usr/openwin/lib/X11/fonts/100dpi/ :40 (process id 9489) Couldn't open RGB_DB '/usr/X11R6/lib/X11/rgb' _XSERVTransmkdir: ERROR: Mode of /tmp/.X11-unix must be set to 1777 _XSERVTransSocketUNIXCreateListener: mkdir(/tmp/.X11-unix) failed, errno = 1 _XSERVTransMakeAllCOTSServerListeners: failed to create listener for unix _XSERVTransmkdir: ERROR: Mode of /tmp/.X11-pipe must be set to 1777 _XSERVTransNAMEDOpenServer: mkdir(/tmp/.X11-pipe) failed, errno = 1 _XSERVTransOpen: transport open failed for named/north:40 _XSERVTransMakeAllCOTSServerListeners: failed to open listener for named Usage: lpget [-k key] [list|(printer) ...] error opening security policy file /usr/X11R6/lib/X11/xserver/SecurityPolicy Could not init font path element /usr/openwin/lib/X11/fonts/F3/, removing from list! Could not init font path element /usr/openwin/lib/X11/fonts/Type1/, removing from list! Could not init font path element /usr/openwin/lib/X11/fonts/Speedo/, removing from list! AUDIT: Tue Aug 10 21:55:08 2004: 9489 Xprt: client 1 connected from IP 127.0.0.1 AUDIT: Tue Aug 10 21:55:08 2004: 9489 Xprt: client 1 disconnected ^Cdbx: warning: Interrupt ignored but forwarded to child. dbx: no symbol table loaded yet -- use "prog -readsyms" to load them signal INT (Interrupt) at 0xe392c at 0xe392c 0x00000000000e392c: sra %l0, 0x1f, %g1 Entering debugger ... (dbx) where dbx: no symbol table loaded yet -- use "prog -readsyms" to load them (dbx) prog -readsyms Reading Xprt Reading ld.so.1 Reading libc.so.1 Reading libdl.so.1 Reading libc_psr.so.1 Reading libsocket.so.1 Reading libnsl.so.1 Reading libmp.so.2 Reading libm.so.1 (dbx) where =>[1] WaitForSomething(pClientsReady = 0xffbee800), line 193 in "WaitFor.c" [2] Dispatch(), line 384 in "dispatch.c" [3] main(argc = 8, argv = 0xffbeecec, envp = 0xffbeed10), line 442 in "main.c" (dbx) print timers timers = 0x8b5a70 (dbx) print wt wt = (nil) (dbx) print timers->expires timers->expires = 1246135329U (dbx) print now now = 1246156936U (dbx) print *timers *timers = { next = (nil) expires = 1246135329U callback = 0xf22f8 = &`Xprt`log.c`AuditFlush(OsTimerPtr timer, CARD32 now, pointer arg) arg = (nil) } (dbx) cont ^Cdbx: warning: Interrupt ignored but forwarded to child. signal INT (Interrupt) in _select at 0xff2cd334 0xff2cd334: _select+0x0148: orcc %o0, %o1, %o3 Current function is WaitForSomething 223 i = Select (MaxClients, &LastSelectMask, NULL, NULL, wt); (dbx) print *timers *timers = { next = (nil) expires = 1246135329U callback = 0xf22f8 = &`Xprt`log.c`AuditFlush(OsTimerPtr timer, CARD32 now, pointer arg) arg = (nil) } (dbx) print timers timers = 0x8b5a70 -- snip -- Somehow I think the audit code may be responsible for this mess (the timer ptr is always the same (and always the same callback) and the timestamp between timer "alarm time" and "now" is always negative) ... And the mysterious 120sec/2min timeout comes from |AUDIT_TIMEOUT| -- snip -- % tgrep AUDIT_TIMEOUT tgrep: Can't stat file/dir ./Xprint/Quarks.c, No such file or directory. Ignored. ./os/log.c :#ifndef AUDIT_TIMEOUT ./os/log.c :#define AUDIT_TIMEOUT ((CARD32)(120 * 1000)) /* 2 mn */ ./os/log.c : return AUDIT_TIMEOUT; ./os/log.c : auditTimer = TimerSet(auditTimer, 0, AUDIT_TIMEOUT, AuditFlush, NULL); -- snip -- Running Xprt without "-audit 4" works around the problem, however then the complaints that "... access to Xprt isn't logged ..." are back... ;-( Is there anyone who is familar with the audit code ? Created attachment 598 [details] [review] Possible workaound patch for 2004-08-11-trunk The patch "wallpapers" over the issue via removing the audit timer when there are no clients left. But I think this bug report is just a symptom of a FAR more serious issue with the timer code since this is at least the second problem with timers (bug 567 ("Xorg Xprt starts to consume 100% CPU when being idle for some time") was the previous one). I really think someone with better knowledge of the timer code should investigate that issue. Comment on attachment 598 [details] [review] Possible workaound patch for 2004-08-11-trunk Kevin: Can you do a quick review of the workaround patch, please ? Comment on attachment 598 [details] [review] Possible workaound patch for 2004-08-11-trunk Actually the "wallpaper" patch doesn't work in all cases. If any audit event occurs _after_ the last client left the Xserver the 100%-CPU consumption will start again. An example are the "cookie expiration" messages from the SECURITY extension. Created attachment 599 [details] [review] Workaround patch for 2004-08-11-trunk which restores X11R6.6 behaviour The patch restores the old audit logging behaviour of X11R6.6 for now until a better solution is found to cure this nightmare (the alternative is to hack the timer code - which is something which I'd like to avoid so close to the code freeze). (In reply to comment #10) > Created an attachment (id=599) > Workaround patch for 2004-08-11-trunk which restores X11R6.6 behaviour > > The patch restores the old audit logging behaviour of X11R6.6 for now until a > better solution is found to cure this nightmare (the alternative is to hack the > timer code - which is something which I'd like to avoid so close to the code > freeze). This patch needs to be reviewed by the person(s) whose code you are reverting in order to understand why the change was made originally and see if reverting the code will have any other unintended consequences. Kevin E. Martin wrote: > This patch needs to be reviewed by the person(s) whose code you are reverting Any idea who wrote the original code ? CVSblame on the Xorg repository isn't much helpfull. > in order to understand why the change was made originally and see if reverting > the code will have any other unintended consequences. The only consequence is that identical log messages are no longer be fold together (one message with the original text and one message that n-1 copies have been supressed). The only side-effect is that log files _may_ be smaller since no duplicates are stored anymore. Okay, I checked the XFree86 logs and found that the change was made by Matthieu Herrb. Adding him to the CC list and would like to see if he could comment on this one. This patch was added to XFree86 to avoid filling the partition holding /var/log to quickly with audit messages in XFree86.0.log when someone tries to brute-force the magic cookie. I'm currently away from my hacking machines, so I can't look at this issue until I come back (around aug, 18). On the release wranglers call, we decided to go ahead and apply the patch, which I have done. We will leave this bug open in the hope that others will be able to help figure out the root cause and suggest alternate solutions before the release. Moving to "held open" bug since this is no longer a blocker this must not be particularly urgent for anyone, since no one has looked at it since 6.8.2. moving out to 7.1. (In reply to comment #17) > this must not be particularly urgent for anyone, since no one has looked at it > since 6.8.2. moving out to 7.1. ... and again to 7.2. For 7.2 I'm probably going to flip this back to coalescing the log file entries, and if that tickles a bug in Xprt, so be it, Xprt needs fixing properly. shunting off the radar, as xprint is now unmaintained. Closing WONTFIX because nobody cares about Xprint. Reopen if you plan to address this bug. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.