Ok, this is a problem that is really annoying me. Unfortunately it is not easily reproducible: the system can run stable for several days (using openoffice) and then xorg will freeze with 100% cpu. I will list the steps to reproduce and some known facts I have collected from other users as well.
How to reproduce (not always reproducible):
1. Click on any openoffice menu.
2. Keyboard will freeze.
3. Mouse will keep moving in a jagged way - clicking doesn't work though.
4. Xorg shows 100% cpu in top.
Known facts:
- The problem has been reported by at least 6 people: me, David Liontooth, Joachim Müller, Syd Alsobrook, Jim Watson and Marcelo Roberto (friend of mine). see: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=411287
- It was reported under debian, opensuse and fedora.
- Reported with openoffice from 2.0.x to 2.1.
- Reported for any openoffice application (confirmed: writer, impress).
- Reported for any openoffice menu (usually on "file" but today my computer hung on "view").
- Confirmed with xorg 7.2 and 1.2.99.902 (1.3.0 RC 2).
- At least 2 people were using the nvidia binary driver (me and my friend). But another user reported it for Sunblade100.
I think I will try using the "nv" driver for a few days so I can rule the nvidia binary out (despite the sunblade report). Any suggestions are welcome...
Forgot to mention:
- /var/log/Xorg.0.log doesn't show anything special when it freezes.
- I have seen an issue with the mga driver (completely unrelated) where a forced "swsusp" effectively unfreezes the machine after resume. This is NOT the case here: the machine is still frozen after suspend/resume.
I don't know if these are useful, I'm just trying to provide all the information I have.
*** Bug 10633 has been marked as a duplicate of this bug. ***
I also have this happen with the mga driver. I think it only started when I upgraded from kernel 2.6.17 to 2.6.19... I don't know how to force a swsusp, but if someone told me, I'd be happy to try it...
I have dropped back to the 2.6.17 kernel to see if that kernel also exhibits this behavior. With what kernels are others seeing this behavior?
> With what kernels are others seeing this behavior?
2.6.18.8 here. note i'm also using X86_64 SMP.
- bugs.debian.org #411287 was also originally reported as x86_64 with 2.6.18 but it doesn't say anything about SMP.
- qa.openoffice.org #75578 reports 2.6.19-1.2911.6.5.fc6xen #1 SMP kernel _BUT_ running on a single processor machine (athlon xp - 32 bits).
- qa.openoffice.org #75578 also reports the problem with Linux sun 2.6.18-4-sparc64 (apparently a single processor machine).
- the bug has been confirmed so far for the following drivers: nvidia (binary), matrox mga and ati rage.
xorg devels willing to investigate this bug might consider checking the url below, it contains some interesting information.
http://qa.openoffice.org/issues/show_bug.cgi?id=75578
Yes, I forgot to add that I'm running x86_64 SMP kernel on a dual-opteron.
I have found that this bug also affects the 2.6.17 kernel...
A couple more observations:
I run Xinerama. When I have the OOo window on the left screen (Screen0), the screen locks up, but the pointer will still move within the left screen. The pointer will not leave the left screen, but the motion will. So if I have moved the mouse about 1/2 screen into Screen1, the pointer won't move but I have to bring it that far back to the left before it will move again on Screen0. When OOo is on the right screen (Screen1), the pointer freezes and never moves at all.
After a lockup, there are more lines in Xorg.0.log than after starting it up. The lines after a lockup are:
xkb_types { include "%" };
xkb_compatibility { include "%" };
xkb_symbols { include "%" };
xkb_geometry { include "%" };
(EE) Error loading keymap /var/tmp/server-0.xkm
That last one might be useful to someone who knows more than I. There is no server-0.xkm in /var/tmp. I cannot find a file (with locate) with ".xkm" in it anywhere on my system...
michael, it sounds like you're running a rather old xorg release?
I am running xorg 7.1, xorg-server 1.1.1-r4, and mga driver 1.4.2 (I upgraded that to 1.4.6.1, but I still had the problem). I could easily upgrade to xorg 7.2, xorg-server 1.2.0, and mga driver 1.4.6.1, though they are not yet marked stable in gentoo portage. I'm willing to do that for testing if it might be thought to help.
I got something - not sure if it is valid - the office started briefly then hung again while i did this. I got the same again later with another locked process.
Loaded symbols for /opt/o208/program/libsrtrs1.so
0xf68c40bc in poll () from /lib/libc.so.6
(gdb) bt
#0 0xf68c40bc in poll () from /lib/libc.so.6
#1 0xf6cfc72c in _XWaitForReadable (dpy=0x9f5f8) at ../../src/XlibInt.c:498
#2 0xf6cfcbb0 in _XRead (dpy=0xa2c88, data=0xffdcba6c "÷êò\024ÿÜ»\030ÿÜ»\034", size=
/build/buildd/gdb-6.6.dfsg/gdb/dwarf2-frame.c:1084: internal-error: Unknown register rule.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n
/build/buildd/gdb-6.6.dfsg/gdb/dwarf2-frame.c:1084: internal-error: Unknown register rule.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
) at ../../src/XlibInt.c:1087
#3 0xf6cfd550 in _XReply (dpy=0xa2c88, rep=0xffdcba6c, extra=32, discard=20) at ../../src/XlibInt.c:1714
#4 0xf6d4e894 in XkbGetKeyboardByName (dpy=0xa2c88, deviceSpec=<value optimized out>, names=0x0, want=<value optimized out>, need=<value optimized out>, load=<value optimized out>) at ../../../src/xkb/XKBGetByName.c:136
---Type <return> to continue, or q <return> to quit---
#5 0xf539eb5c in SalDisplay::GetKeyboardName () from /opt/o208/program/libvclplug_gen680ls.so
#6 0xf53927c0 in SalDisplay::GetKeyNameFromKeySym () from /opt/o208/program/libvclplug_gen680ls.so
#7 0xf53938e4 in SalDisplay::GetKeyName () from /opt/o208/program/libvclplug_gen680ls.so
#8 0xf5b596c0 in ?? () from /opt/o208/program/libvclplug_gtk680ls.so
#9 0xf7e62e9c in KeyCode::GetName () from /opt/o208/program/libvcl680ls.so
#10 0xf7e67740 in ?? () from /opt/o208/program/libvcl680ls.so
#11 0xf7e6e7b4 in ?? () from /opt/o208/program/libvcl680ls.so
#12 0xf7e70778 in PopupMenu::Execute () from /opt/o208/program/libvcl680ls.so
#13 0xf7f02878 in ?? () from /opt/o208/program/libvcl680ls.so
#14 0xf7f02878 in ?? () from /opt/o208/program/libvcl680ls.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Ok, here is a good backtrace of this problem from a friend of mine. He has exactly the same hw as I (x86_64, smp, opensuse 10.2, xorg pre-7.2, openoffice 2.0.4, binary nvidia etc). Here is the openoffice bt. Xorg bt will follow.
(gdb) bt
#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf6d5dea3 in poll () from /lib/libc.so.6
#2 0xf704b469 in XAddConnectionWatch () from /usr/lib/libX11.so.6
#3 0xf704b84f in _XRead () from /usr/lib/libX11.so.6
#4 0xf704c1c4 in _XReply () from /usr/lib/libX11.so.6
#5 0xf709e5b8 in XkbGetKeyboardByName () from /usr/lib/libX11.so.6
#6 0xf709e97f in XkbGetKeyboard () from /usr/lib/libX11.so.6
#7 0xf57aa49d in SalDisplay::GetKeyboardName () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#8 0xf57a39d7 in SalDisplay::GetKeyNameFromKeySym () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#9 0xf57a3b8e in SalDisplay::GetKeyName () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#10 0xf57793bd in X11SalFrame::GetKeyName () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#11 0xf7db0f4d in KeyCode::GetName () from /usr/lib/ooo-2.0/program/libvcl680li.so
#12 0xf7db55c9 in Menu::GetDisplayText () from /usr/lib/ooo-2.0/program/libvcl680li.so
#13 0xf7dbc380 in PopupMenu::IsInExecute () from /usr/lib/ooo-2.0/program/libvcl680li.so
#14 0xf7dbc853 in PopupMenu::IsInExecute () from /usr/lib/ooo-2.0/program/libvcl680li.so
#15 0xf7dbcaa5 in PopupMenu::IsInExecute () from /usr/lib/ooo-2.0/program/libvcl680li.so
#16 0xf7dbd10a in PopupMenu::IsInExecute () from /usr/lib/ooo-2.0/program/libvcl680li.so
#17 0xf7e02197 in Window::~Window () from /usr/lib/ooo-2.0/program/libvcl680li.so
#18 0xf7e03ae8 in Window::~Window () from /usr/lib/ooo-2.0/program/libvcl680li.so
#19 0xf7e027b8 in Window::~Window () from /usr/lib/ooo-2.0/program/libvcl680li.so
#20 0xf577ec55 in X11SalFrame::GetWindowState () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#21 0xf577b279 in X11SalFrame::HandleMouseEvent () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#22 0xf577e60e in X11SalFrame::Dispatch () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#23 0xf57a6a5a in SalX11Display::Dispatch () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#24 0xf57a5756 in SalX11Display::Yield () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#25 0xf57a48d5 in SalX11Display::IsEvent () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#26 0xf579fefe in SalXLib::Yield () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#27 0xf579fd7d in SalXLib::Yield () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#28 0xf57a7e3f in X11SalInstance::Yield () from /usr/lib/ooo-2.0/program/libvclplug_gen680li.so
#29 0xf7c951b5 in Application::Yield () from /usr/lib/ooo-2.0/program/libvcl680li.so
#30 0xf7c95251 in Application::Execute () from /usr/lib/ooo-2.0/program/libvcl680li.so
#31 0x0806df86 in desktop::Desktop::Main ()
#32 0xf7c99731 in InitVCL () from /usr/lib/ooo-2.0/program/libvcl680li.so
#33 0xf7c99847 in SVMain () from /usr/lib/ooo-2.0/program/libvcl680li.so
#34 0x08064a8b in sal_main ()
#35 0x08064ae0 in main ()
Xorg gurus, please look at this ;-)
X Window System Version 7.1.99.902 (7.2.0 RC 2)
Release Date: 13 November 2006
X Protocol Version 11, Revision 0, Release 7.1.99.902
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux genipapo 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 x86_64
Build Date: 09 January 2007
(gdb) bt
#0 0x00002b193d2915bd in fork () from /lib64/libc.so.6
#1 0x00000000005528dd in Popen ()
#2 0x000000000054878b in XkbDDXCompileKeymapByNames ()
#3 0x0000000000548983 in XkbDDXLoadKeymapByNames ()
#4 0x0000000000528d66 in ProcXkbGetKbdByName ()
#5 0x0000000000447e3b in Dispatch ()
#6 0x00000000004311ed in main ()
(In reply to comment #13)
> #1 0x00000000005528dd in Popen ()
> #2 0x000000000054878b in XkbDDXCompileKeymapByNames ()
more from the gdb session: the string used as the Popen parameter (buf) is
(gdb) x /500s 0x000000000369a0c0
0x369a0c0: "\"/usr/bin/xkbcomp\" -w 1 \"-R/usr/share/X11/xkb\" -xkm \"-\" -em1 \"The XKEYBOARD keymap compiler (xkbcomp) reports:\" -emp \"> \" -eml \"Errors from xkbcomp are not fatal to the X server\" \"/var/lib/xkb/compiled/server-0.xkm\""
it came from the following code in xorg/xkb/ddxLoad.c:
buf = Xprintf(
    "\"%s" PATHSEPARATOR "xkbcomp\" -w %d \"-R%s\" -xkm \"%s\" -em1 %s -emp %s -eml %s \"%s%s.xkm\"",
    xkbbindir,
    ((xkbDebugFlags<2)?1:((xkbDebugFlags>10)?10:(int)xkbDebugFlags)),
    xkbbasedir, xkmfile,
    PRE_ERROR_MSG,ERROR_PREFIX,POST_ERROR_MSG1,
    xkm_output_dir,keymap);
just to make clear the exact point where it hung inside popen/fork:
0x00002b193d2915bd in fork () from /lib64/libc.so.6
here is the disass:
(...)
0x00002b193d2915af <fork+127>: xor %esi,%esi
0x00002b193d2915b1 <fork+129>: mov $0x1200011,%edi
0x00002b193d2915b6 <fork+134>: mov $0x38,%eax
0x00002b193d2915bb <fork+139>: syscall
0x00002b193d2915bd <fork+141>: cmp $0xfffffffffffff000,%rax
0x00002b193d2915c3 <fork+147>: ja 0x2b193d291720 <fork+496>
(...)
confused. ip points to the instruction just past the syscall, i don't know how it can hang there.
---
if i do stepi, i get something that is interesting too:
0x00002b193d2915bd in fork () from /lib64/libc.so.6
(gdb) bt
#0 0x00002b193d2915bd in fork () from /lib64/libc.so.6
#1 0x00000000005528dd in Popen ()
#2 0x000000000054878b in XkbDDXCompileKeymapByNames ()
#3 0x0000000000548983 in XkbDDXLoadKeymapByNames ()
#4 0x0000000000528d66 in ProcXkbGetKbdByName ()
#5 0x0000000000447e3b in Dispatch ()
#6 0x00000000004311ed in main ()
(gdb) stepi
0x00000000005525b0 in SmartScheduleInit ()
(gdb) stepi
0x00000000005525b5 in SmartScheduleInit ()
(gdb) stepi
0x00000000005525ba in SmartScheduleInit ()
(gdb) stepi
0x00000000005525be in SmartScheduleInit ()
(gdb) stepi
0x000000000042fdd0 in __errno_location@plt ()
(gdb) stepi
0x00002b193d21be10 in __errno_location () from /lib64/libc.so.6
(gdb) stepi
0x00002b193d21be17 in __errno_location () from /lib64/libc.so.6
(gdb) stepi
0x00002b193d21be20 in __errno_location () from /lib64/libc.so.6
(gdb) stepi
0x00000000005525c3 in SmartScheduleInit ()
(gdb) stepi
0x00000000005525ca in SmartScheduleInit ()
(gdb) stepi
0x00000000005525cc in SmartScheduleInit ()
(gdb) stepi
0x00000000005525cf in SmartScheduleInit ()
(gdb) stepi
0x00000000005525d6 in SmartScheduleInit ()
(gdb) stepi
0x00000000005525d9 in SmartScheduleInit ()
(gdb) step
Single stepping until exit from function SmartScheduleInit, which has no line number information.
0x00002b193d22e5b0 in __restore_rt () from /lib64/libc.so.6
(gdb) step
Single stepping until exit from function __restore_rt, which has no line number information.
0x00002b193d2915bb in fork () from /lib64/libc.so.6
the patient (xorg) died here. i hope you may be able to continue from this...
---
btw, forking inside an xorg request to execute an external command sounds terribly dangerous to me... do we really need this?
This is getting interesting. For the first time ever, i have been able to unfreeze my Xorg. here is what i did:
# gdb Xorg <pid>
(...)
Program received signal SIGINT, Interrupt.
0x00002b2e77d6d5bd in fork () from /lib64/libc.so.6
(gdb) disass
Dump of assembler code for function fork:
0x00002b2e77d6d530 <fork+0>: push %rbp
0x00002b2e77d6d531 <fork+1>: mov %rsp,%rbp
0x00002b2e77d6d534 <fork+4>: push %r14
0x00002b2e77d6d536 <fork+6>: push %r13
0x00002b2e77d6d538 <fork+8>: push %r12
0x00002b2e77d6d53a <fork+10>: push %rbx
0x00002b2e77d6d53b <fork+11>: sub $0x30,%rsp
0x00002b2e77d6d53f <fork+15>: mov 2807650(%rip),%rcx # 0x2b2e7801aca8 <__fork_handlers>
0x00002b2e77d6d546 <fork+22>: test %rcx,%rcx
0x00002b2e77d6d549 <fork+25>: mov %rcx,%rbx
0x00002b2e77d6d54c <fork+28>: je 0x2b2e77d6d576 <fork+70>
0x00002b2e77d6d54e <fork+30>: mov 0x28(%rcx),%edx
0x00002b2e77d6d551 <fork+33>: test %edx,%edx
0x00002b2e77d6d553 <fork+35>: je 0x2b2e77d6d546 <fork+22>
0x00002b2e77d6d555 <fork+37>: lea 0x1(%rdx),%esi
0x00002b2e77d6d558 <fork+40>: mov %edx,%eax
0x00002b2e77d6d55a <fork+42>: lock cmpxchg %esi,0x28(%rcx)
0x00002b2e77d6d55f <fork+47>: cmp %eax,%edx
0x00002b2e77d6d561 <fork+49>: je 0x2b2e77d6d7c2 <fork+658>
0x00002b2e77d6d567 <fork+55>: mov 2807610(%rip),%rcx # 0x2b2e7801aca8 <__fork_handlers>
0x00002b2e77d6d56e <fork+62>: test %rcx,%rcx
0x00002b2e77d6d571 <fork+65>: mov %rcx,%rbx
0x00002b2e77d6d574 <fork+68>: jne 0x2b2e77d6d54e <fork+30>
0x00002b2e77d6d576 <fork+70>: xor %r12d,%r12d
0x00002b2e77d6d579 <fork+73>: callq 0x2b2e77d44e30 <__GI__IO_list_lock>
0x00002b2e77d6d57e <fork+78>: mov %fs:0x90,%r9d
0x00002b2e77d6d587 <fork+87>: mov %fs:0x94,%r8d
0x00002b2e77d6d590 <fork+96>: mov %r8d,%eax
0x00002b2e77d6d593 <fork+99>: neg %eax
0x00002b2e77d6d595 <fork+101>: mov %eax,%fs:0x94
0x00002b2e77d6d59d <fork+109>: mov %fs:0x10,%r10
0x00002b2e77d6d5a6 <fork+118>: xor %edx,%edx
0x00002b2e77d6d5a8 <fork+120>: add $0x90,%r10
0x00002b2e77d6d5af <fork+127>: xor %esi,%esi
0x00002b2e77d6d5b1 <fork+129>: mov $0x1200011,%edi
0x00002b2e77d6d5b6 <fork+134>: mov $0x38,%eax
0x00002b2e77d6d5bb <fork+139>: syscall
0x00002b2e77d6d5bd <fork+141>: cmp $0xfffffffffffff000,%rax
0x00002b2e77d6d5c3 <fork+147>: ja 0x2b2e77d6d720 <fork+496>
0x00002b2e77d6d5c9 <fork+153>: test %eax,%eax
0x00002b2e77d6d5cb <fork+155>: mov %eax,%r14d
note the PC is pointing to <fork+141>, so i'm assuming it must be hanging inside the kernel (syscall). i believe the relevant code from glibc is the following:
pid_t __libc_fork (void)
{
  (... stripped ...)
  _IO_list_lock ();   <- note __GI__IO_list_lock in disass above!
#ifndef NDEBUG
  pid_t ppid = THREAD_GETMEM (THREAD_SELF, tid);
#endif
  /* We need to prevent the getpid() code to update the PID field so
     that, if a signal arrives in the child very early and the signal
     handler uses getpid(), the value returned is correct. */
  pid_t parentpid = THREAD_GETMEM (THREAD_SELF, pid);
  THREAD_SETMEM (THREAD_SELF, pid, -parentpid);
#ifdef ARCH_FORK
  pid = ARCH_FORK ();
#else
# error "ARCH_FORK must be defined so that the CLONE_SETTID flag is used"
  pid = INLINE_SYSCALL (fork, 0);
#endif
---
i think the syscall in the disass above must be either ARCH_FORK or INLINE_SYSCALL. using x86_64's unistd.h and converting eax 0x38 => 56 => __NR_clone. this is funny because __NR_fork is 57, which is what i would expect. i386's unistd.h yields an even stranger syscall: __NR_mpx. still, this is not the value currently loaded in rax:
(gdb) info reg
rax 0xfffffffffffffdff -513
rbx 0x0 0
rcx 0xffffffffffffffff -1
rdx 0x0 0
rsi 0x0 0
rdi 0x1200011 18874385
(...)
now the great trick:
(gdb) set $rax = 0x38
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
-> done. xorg is good again.
does anybody have any idea what is going on here?
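A note on the numbers in that register dump, offered as a hedged sketch rather than a diagnosis. As far as I know, glibc implements fork() on top of the clone syscall, which would explain seeing 0x38 (56, __NR_clone) rather than 57, and the rdi value 0x1200011 decodes to CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID with SIGCHLD (17) in the low byte. The rax value -513 matches the kernel-internal ERESTARTNOINTR code, meaning "restart this syscall", which would fit a call that keeps getting interrupted and restarted. The macro and function names below are made up for illustration, and the numeric values are written out by hand, so treat them as assumptions to check against your own kernel headers.

```c
#include <assert.h>
#include <stdio.h>

/* Hand-written Linux x86_64 values (assumptions; verify against your headers). */
#define NR_CLONE_X86_64       56L           /* 0x38, __NR_clone */
#define FLAG_CHILD_CLEARTID   0x00200000UL  /* CLONE_CHILD_CLEARTID */
#define FLAG_CHILD_SETTID     0x01000000UL  /* CLONE_CHILD_SETTID */
#define EXIT_SIGNAL_MASK      0x000000ffUL  /* low byte: signal sent to parent on exit */
#define KERNEL_ERESTARTNOINTR 513L          /* kernel-internal restart code */

/* Signal number stored in the low byte of the clone flags. */
static unsigned clone_exit_signal(unsigned long flags)
{
    return (unsigned)(flags & EXIT_SIGNAL_MASK);
}

/* Non-zero if a raw syscall return value is the restart marker. */
static int is_restart_nointr(long rax)
{
    return rax == -KERNEL_ERESTARTNOINTR;
}

/* Print a readable view of the syscall number, flags and return value. */
static void describe_clone_call(long nr, unsigned long flags, long rax)
{
    assert(nr >= 0);
    printf("syscall %ld%s\n", nr, nr == NR_CLONE_X86_64 ? " (clone)" : "");
    printf("  CLONE_CHILD_CLEARTID: %s\n",
           (flags & FLAG_CHILD_CLEARTID) ? "set" : "clear");
    printf("  CLONE_CHILD_SETTID:   %s\n",
           (flags & FLAG_CHILD_SETTID) ? "set" : "clear");
    printf("  exit signal:          %u\n", clone_exit_signal(flags));
    printf("  return value:         %ld%s\n", rax,
           is_restart_nointr(rax) ? " (-ERESTARTNOINTR)" : "");
}
```

Calling describe_clone_call(0x38L, 0x1200011UL, -513L) with the values from the session reports clone, both CHILD flags set, and exit signal 17. If the restart reading is right, overwriting rax with 0x38 in gdb may simply have let the interrupted call be reissued at a moment when no signal was pending, but that part is a guess.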
in case anybody wants to try to reproduce the problem with openoffice, i've just confirmed using gdb breakpoints that it does only call this function on the very first time the menu is drawn. somebody (openoffice?) must be caching the XkbGetKeyboard's result.
bizarre, i can't imagine why fork() is hanging. this will go away when xkbcomp gets merged into the server, but in the meantime, you might want to check that out with a more minimal testcase, say.
I'd like to ask anybody who can reproduce the bug to post the result of the following command:
# cat /proc/`pidof Xorg`/status
thanks
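For anyone collecting that output from several machines, the interesting lines are the signal ones (SigQ, SigPnd, ShdPnd, SigBlk, SigIgn, SigCgt). Here is a minimal sketch for pulling one field out of the status text; the function name status_field is my own, and it works on text already read into memory rather than opening /proc itself.

```c
#include <stdio.h>
#include <string.h>

/* Copy the value of "key:" from /proc/<pid>/status-style text into out.
 * Returns 0 on success, -1 if the key is missing or out is too small. */
static int status_field(const char *text, const char *key,
                        char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t linelen = eol ? (size_t)(eol - line) : strlen(line);

        /* A match is the key immediately followed by a colon. */
        if (linelen > keylen && strncmp(line, key, keylen) == 0 &&
            line[keylen] == ':') {
            const char *val = line + keylen + 1;
            size_t vallen = linelen - keylen - 1;

            /* Skip the tab or spaces that separate key and value. */
            while (vallen > 0 && (*val == ' ' || *val == '\t')) {
                val++;
                vallen--;
            }
            if (vallen + 1 > outlen)
                return -1;
            memcpy(out, val, vallen);
            out[vallen] = '\0';
            return 0;
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

The exact-key check means asking for "SigQ" will not accidentally match "SigPnd" or any other line sharing a prefix.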
Created attachment 9757
locked up proc/id/status
This lock and attached status report is using the reduced test case provided by cmc at http://www.openoffice.org/issues/show_bug.cgi?id=75578
/*
 * gcc keyboard.c -lX11
 * ./a.out
 */
#include <X11/Xlib.h>
#include <X11/XKBlib.h>
#include <stdio.h>

int main(void)
{
    XkbDescPtr pXkbDesc = NULL;
    Display * pDisplay = XOpenDisplay(NULL);
    if (!pDisplay) /* no X connection, nothing to test */
        return 1;
    /* This request is the one that makes the server run xkbcomp. */
    pXkbDesc = XkbGetKeyboard(pDisplay, XkbAllComponentsMask, XkbUseCoreKbd );
    if (pXkbDesc)
    {
        char* pAtom = NULL;
        if( pXkbDesc->names && pXkbDesc->names->groups[0] )
        {
            pAtom = XGetAtomName( pDisplay, pXkbDesc->names->groups[0] );
            printf("Keyboard Name is %s\n", pAtom);
            XFree( pAtom );
        }
        XkbFreeKeyboard( pXkbDesc, XkbAllComponentsMask, True );
    }
    XCloseDisplay(pDisplay);
    return 0;
}
Thanks Jim! you have just confirmed my theory: there is a pending SIGALRM signal (actually of thread group type) which never gets served.
ShdPnd: 0000000000002000
the kernel has some code to abort the execution of the syscall, return to userspace (so the signal can be handled) and then reenter the syscall. it seems the mechanism is not working, so it just gets stuck forever. i posted a message to the linux kernel ml asking for advice but i was ignored :(
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0704.3/0717.html
another guess: the nonsense SigQ value might be a hint of a kernel bug.
SigQ: 1/18446744073709551615
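For readers wondering how 0000000000002000 turns into SIGALRM: the mask is hexadecimal and bit (n - 1) stands for signal n, so 0x2000 is bit 13, which is signal 14. A small sketch of that decoding follows; the helper names are mine, and it assumes signal numbers no higher than 64.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Non-zero if signal number signo (1..64) is set in a hex mask string
 * such as the ShdPnd value; bit (signo - 1) stands for signal signo. */
static int mask_has_signal(const char *hexmask, int signo)
{
    unsigned long long mask;

    assert(signo >= 1 && signo <= 64);
    mask = strtoull(hexmask, NULL, 16);
    return (int)((mask >> (signo - 1)) & 1ULL);
}

/* Lowest signal number set in the mask, or 0 if the mask is empty. */
static int first_signal_in_mask(const char *hexmask)
{
    int signo;

    for (signo = 1; signo <= 64; signo++) {
        if (mask_has_signal(hexmask, signo))
            return signo;
    }
    return 0;
}

/* Convenience check for the case discussed in this bug. */
static int alarm_is_pending(const char *shdpnd)
{
    return mask_has_signal(shdpnd, SIGALRM);
}
```

With the value above, first_signal_in_mask("0000000000002000") gives 14, which is SIGALRM on Linux.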
ok, cursory investigation reveals that Xorg is supposed to handle SIGALRM with those SmartSchedule* functions that appeared in my earlier gdb session. so it looks like the signal is being handled. but all those 18446744073709551615 signals might take a while to get served ;-)
TODO: check the kernel sources to understand what the second value of SigQ really means (qlim) and how it could have gotten that wrong.
(In reply to comment #22)
> ok, cursory investigation reveals that Xorg is suppose to handle SIGALRM by
> those SmartSchedule* functions that appeared in my earlier gdb session.
Hmm, could this be related to bug 10747?
(In reply to comment #23)
> Hmm, could this be related to bug 10747?
related, yes. but it looks like a different bug imho.
additional research information: 18446744073709551615 is -1 read as an unsigned 64-bit number, and it is supposed to be the RLIMIT_SIGPENDING value. having this negative rlimit may cause problems for the __sigqueue_alloc() kernel function. however, as far as i can see, this would possibly prevent new signals from being enqueued - not existing ones from being dequeued/cleared/whatever.
what a tortuous history for a bug... it starts like an openoffice issue, then xorg, and in the end it is possible that neither of them is actually faulty.
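To make the "-1" reading concrete: 18446744073709551615 is 2^64 - 1, the all-ones 64-bit value, which is what -1 becomes when stored in an unsigned 64-bit field. My understanding is that this is also how an unlimited rlimit shows up on 64-bit Linux, so the value may just mean "no limit" rather than corruption, but that is an assumption on my part. A small sketch for splitting the SigQ field (helper names are mine):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Split a SigQ value of the form "queued/limit" into its two numbers.
 * Returns 0 on success, -1 if the text does not have that shape. */
static int parse_sigq(const char *sigq, uint64_t *queued, uint64_t *limit)
{
    if (sscanf(sigq, "%" SCNu64 "/%" SCNu64, queued, limit) != 2)
        return -1;
    return 0;
}

/* Non-zero if the limit is the all-ones value, i.e. (uint64_t)-1. */
static int limit_is_all_ones(uint64_t limit)
{
    return limit == UINT64_MAX;
}
```

Fed "1/18446744073709551615" this reports one queued signal and an all-ones limit; fed "1/16381" it reports an ordinary finite limit.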
(In reply to comment #24)
> (In reply to comment #23)
> > Hmm, could this be related to bug 10747?
>
> related, yes. but it looks like a different bug imho.
i take back my words: now i think it IS the same bug.
i finally had a better chance of debugging it on my own system. first observation: RLIMIT_SIGPENDING is a false track. here it is not negative.
SigQ: 1/16381
SigPnd: 0000000000000000
ShdPnd: 0000000000002000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000051806ecb
by stepping into SmartScheduleTimer i was able to confirm the condition described in 10747 (where SmartScheduleIdle is FALSE):
(gdb) stepi
0x00000000005658f0 in SmartScheduleTimer ()
Dump of assembler code for function SmartScheduleTimer:
0x00000000005658f0 <SmartScheduleTimer+0>: mov %rbx,0xfffffffffffffff0(%rsp)
0x00000000005658f5 <SmartScheduleTimer+5>: mov %rbp,0xfffffffffffffff8(%rsp)
0x00000000005658fa <SmartScheduleTimer+10>: sub $0x18,%rsp
0x00000000005658fe <SmartScheduleTimer+14>: callq 0x431d78 <__errno_location@plt>
0x0000000000565903 <SmartScheduleTimer+19>: mov 2408582(%rip),%rdx # 0x7b1990 <_DYNAMIC+4192>
0x000000000056590a <SmartScheduleTimer+26>: mov (%rax),%ebp
0x000000000056590c <SmartScheduleTimer+28>: mov %rax,%rbx
0x000000000056590f <SmartScheduleTimer+31>: mov 2409906(%rip),%rax # 0x7b1ec8 <_DYNAMIC+5528>
0x0000000000565916 <SmartScheduleTimer+38>: mov (%rax),%rax
0x0000000000565919 <SmartScheduleTimer+41>: add %rax,(%rdx)
0x000000000056591c <SmartScheduleTimer+44>: mov 2405597(%rip),%rax # 0x7b0e00 <_DYNAMIC+1232>
0x0000000000565923 <SmartScheduleTimer+51>: mov (%rax),%esi
0x0000000000565925 <SmartScheduleTimer+53>: test %esi,%esi
0x0000000000565927 <SmartScheduleTimer+55>: je 0x56592e <SmartScheduleTimer+62>
0x0000000000565929 <SmartScheduleTimer+57>: callq 0x5657d0 <SmartScheduleStopTimer>
0x000000000056592e <SmartScheduleTimer+62>: mov %ebp,(%rbx)
0x0000000000565930 <SmartScheduleTimer+64>: mov 0x8(%rsp),%rbx
0x0000000000565935 <SmartScheduleTimer+69>: mov 0x10(%rsp),%rbp
0x000000000056593a <SmartScheduleTimer+74>: add $0x18,%rsp
0x000000000056593e <SmartScheduleTimer+78>: retq
0x000000000056593f <SmartScheduleTimer+79>: nop
(gdb) stepi
0x0000000000565927 in SmartScheduleTimer ()
(gdb) stepi
0x000000000056592e in SmartScheduleTimer ()
note the jump from 565927 to 56592e requires SmartScheduleIdle being false. by forcing it to true, i was able to unlock my system:
(gdb) set *(int *)$rax = 1
(gdb) p *(int *)$rax
$3 = 1
(gdb) quit
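Reading the disassembly above back into C, the handler appears to save errno, add an interval to a running time counter, stop the timer only when the idle flag is non-zero, and restore errno. The sketch below is a reconstruction from the listing, not a copy of the server source: SmartScheduleTimer, SmartScheduleStopTimer and SmartScheduleIdle are the names from this thread, while the lowercase names, the counter, the interval value of 20 and the timer_stopped flag are stand-ins I made up so the logic can be exercised on its own.

```c
#include <errno.h>

/* Stand-ins for the server state touched by the handler (assumed names). */
static long sched_time;          /* running time counter: add %rax,(%rdx) */
static long sched_interval = 20; /* amount added per tick (made-up value) */
static int  sched_idle;          /* plays the role of SmartScheduleIdle */
static int  timer_stopped;       /* set when the stop routine has run */

/* Plays the role of SmartScheduleStopTimer. */
static void stop_timer(void)
{
    timer_stopped = 1;
}

/* Shape of SmartScheduleTimer as suggested by the disassembly. */
static void smart_schedule_timer(int sig)
{
    int saved_errno = errno;     /* mov (%rax),%ebp */

    (void)sig;
    sched_time += sched_interval;
    if (sched_idle)              /* test %esi,%esi ; je skips the call */
        stop_timer();
    errno = saved_errno;         /* mov %ebp,(%rbx) */
}
```

With the idle flag at 0 the timer is never stopped, so the alarm keeps firing; writing 1 through rax in gdb (which still points at the flag at that step) flips the branch so the stop routine runs on the next tick.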
(In reply to comment #25)
> i take back my words: now i think it IS the same bug.
But new comment #2 on bug 10747 implies it is not related, or is it mistaken?
(In reply to comment #17)
> in case anybody wants to try to reproduce the problem with openoffice, i've
> just confirmed using gdb breakpoints that it does only call this function on
> the very first time the menu is drawn. somebody (openoffice?) must be caching
> the XkbGetKeyboard's result.
>
This fits with my experience. I have noticed the problem in OpenOffice.org Calc - but only the first time.
AMD 64, SuSE 10.2, X.org nv
It happens quite often...
This problem has gone away in GNU/Linux SPARC Debian/unstable. The reduced test case at comment #20 does not lock up any more. And a patch is reported here: http://cvs.fedora.redhat.com/viewcvs/devel/xorg-x11-server/xserver-1.3.0-xkb-and-loathing.patch?view=markup
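For context on what a fix in this area can look like: if the diagnosis in this thread is right and fork() keeps getting interrupted by the scheduler's SIGALRM, then keeping that signal away from the process while it forks should let the call finish. The sketch below shows one generic way to do that with sigprocmask. It is not the content of the linked patch, and the function name and the child's behaviour are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with child_status while SIGALRM is blocked in
 * the parent for the duration of fork(). Returns the child's exit status,
 * or -1 on error. */
static int fork_with_alarm_blocked(int child_status)
{
    sigset_t block, saved;
    pid_t pid, waited;
    int wstatus;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: a real caller would restore the mask and exec the
         * helper (xkbcomp in this bug) instead of exiting directly. */
        _exit(child_status);
    }

    /* Parent: put the original mask back whether or not fork worked. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    if (pid < 0)
        return -1;

    /* Retry waitpid if a signal handler interrupts it. */
    do {
        waited = waitpid(pid, &wstatus, 0);
    } while (waited < 0 && errno == EINTR);

    if (waited != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

Blocking rather than ignoring the signal means an alarm that arrives during the fork is delivered once the mask is restored instead of being lost. Which of the two suits a timer-driven scheduler better is a design choice this sketch does not settle.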
See also Novell Bug #245711.
(In reply to comment #20)
The test case crashes my X session, but I did a few experiments and found the following: commenting out Option "XkbModel" "latitude" in xorg.conf eliminates the problem; the test case runs without error and prints "Keyboard Name is Hungary". If I set the XkbModel back to latitude, it crashes again.
X.Org X Server 1.4.0.90
Release Date: 5 September 2007
X Protocol Version 11, Revision 0
Build Operating System: Linux Ubuntu (xorg-server 2:1.4.1~git20080131-1ubuntu6)
Current Operating System: Linux tibnote 2.6.24-12-generic #1 SMP Wed Mar 12 23:01:54 UTC 2008 i686
(In reply to comment #30)
> (In reply to comment #20)
>
> The test case crashes my X session, but did a few experiment and found the
> following:
> commenting out Option "XkbModel" "latitude" in xorg.conf eliminates the
> problem, the test case runs without error, prints "Keyboard Name is Hungary".
> If I set the XkbModel back to latitude, it crashes again.
>
> X.Org X Server 1.4.0.90
> Release Date: 5 September 2007
> X Protocol Version 11, Revision 0
> Build Operating System: Linux Ubuntu (xorg-server 2:1.4.1~git20080131-1ubuntu6)
> Current Operating System: Linux tibnote 2.6.24-12-generic #1 SMP Wed Mar 12
> 23:01:54 UTC 2008 i686
>
Gemes:
Are you able to try a current git master X server? I just tried the test program with latitude and without and could not reproduce the crash.
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #20)
> >
> Gemes:
> Are you able to try a current git master X server? I just tried the test
> program with latitude and without and could not reproduce the crash.
>
Sorry, but I cannot reproduce it now with the current up-to-date ubuntu hardy either, and OO is currently working fine as well: "Keyboard Name is Hungary - Standard"
Anyway, I did not try the git master; as a matter of fact I don't know how to, but this seems to be unnecessary - at least for me.
Tib
(In reply to comment #32)
> Sorry, but I cannot reproduce it now with the current up-to-date ubuntu hardy
> either, and OO is currently working fine as well: "Keyboard Name is Hungary -
> Standard"
>
>
> Anyway I did not try the git master as a matter of fact I don't know how to,
> but this seems to be unnecessary - at least for me.
good enough for me. Hardy has xserver 1.4, so we assume it has been fixed at some point in the meantime :)
Created attachment 21184
/proc/id/status of locked up xorg
see bug https://bugs.freedesktop.org/show_bug.cgi?id=10525
Created attachment 21185
/proc/id/status of locked up xorg - after kill -9
/proc/id/status of locked up xorg, after i tried kill -9 -ing it
see bug https://bugs.freedesktop.org/show_bug.cgi?id=10525
I'm still experiencing this bug, using xorg-server-1.4.2 in Debian Lenny package version 2:1.4.2-9.
I compared the deb-src and vanilla sources; xorg-server-1.4.2/os/utils.c is identical in both. And both seem to have included Adam Jackson's patch from http://cvs.fedora.redhat.com/viewvc/devel/xorg-x11-server/xserver-1.3.0-xkb-and-loathing.patch?view=markup
The entire section
> OsSigHandlerPtr old_alarm = NULL; /* XXX horrible awful hack */
in these sources is also the same in the current repository at http://cgit.freedesktop.org/xorg/xserver/tree/os/utils.c
I attached /proc/<pid>/status both before trying to kill -9 it and after at https://bugs.freedesktop.org/attachment.cgi?id=21184 (before) and https://bugs.freedesktop.org/attachment.cgi?id=21185 (after).
$ top says
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13814 root 20 0 0 0 0 R 95.2 0.0 70:25.70 Xorg
Any idea?
Also, this doesn't only happen with OpenOffice.org. I had it happen with rdesktop connected to a Windows XP machine, too. And only once I used OpenOffice.org on that Windows when this happened (didn't try again after). Just now I ran Opera or some other program I don't exactly remember (the Opera remained on the screen) when X locked up.
And another one. This time on Debian xserver-xorg-core 2:1.4.2-10, using rdesktop 1.6.0-2, connecting to a Win XP Pro SP3 while using Gimp 2.4.2 there. Again, I could supply /proc/<pid>/status, but they don't look much different from the last one. Again, nothing but a reboot solves this. :(
And it's still there. This time with xserver-xorg-core 2:1.4.2-11 and openoffice.org 1:3.0.1-6.
Ok, so I just read through this rather verbose bug's history. This is not an Xorg issue. FWIW, the reduced test case "just works" when I ran it on darwin with the Xorg DDX. This is likely a glibc or kernel issue. If you still have problems, please work with your distribution.