|Summary:||Calling free() in signal handler -> hang|
|Product:||xorg||Reporter:||Bernie Innocenti <bernie>|
|Component:||Server/General||Assignee:||Xorg Project Team <xorg-team>|
|Status:||RESOLVED FIXED||QA Contact:||Xorg Project Team <xorg-team>|
|i915 platform:||i915 features:|
|Bug Depends on:|
Description Bernie Innocenti 2007-03-07 07:10:52 UTC
Very often I can freeze the Xorg server (git head) by typing my username in the gdm editbox. Looks like a segfault happened and xf86SigHandler is trying to kill the server from _inside_ the SEGV signal handler, thus deadlocking on the global malloc arena lock. Calling free() (and many other libc functions) from signal handlers is illegal according to POSIX. So I think the codepath should be changed to avoid doing too much in SEGV... or add explicit checks to skip disallowed functions (i.e. almost any). (gdb) bt #0 0x0000003d3dad9ca8 in __lll_mutex_lock_wait () from /lib64/libc.so.6 #1 0x0000003d3da73382 in _L_lock_14395 () from /lib64/libc.so.6 #2 0x0000003d3da72411 in free () from /lib64/libc.so.6 #3 0x00000000004457aa in CloseDevice () #4 0x0000000000445b1a in CloseDownDevices () #5 0x000000000057ba67 in AbortServer () #6 0x000000000057bffe in FatalError () #7 0x0000000000483be7 in xf86SigHandler () #8 <signal handler called> #9 0x0000003d3da6f3f9 in _int_malloc () from /lib64/libc.so.6 #10 0x0000003d3da70b3d in malloc () from /lib64/libc.so.6 #11 0x0000000000576007 in Xalloc () #12 0x0000000000548ef2 in XkbCopyKeymap () #13 0x000000000045c990 in SwitchCoreKeyboard () #14 0x00000000004d8d0f in mieqProcessInputEvents () #15 0x00000000004842d1 in ProcessInputEvents () #16 0x000000000044c8a8 in Dispatch () #17 0x000000000043429a in main ()
Comment 1 Adam Jackson 2008-02-24 21:31:32 UTC
We should almost certainly siglongjmp() out of the signal handler, at a minimum.
Comment 2 Peter Hutterer 2008-04-22 23:44:01 UTC
Created attachment 16120 [details] [review] 0001-dix-set-up-siglongjmp-to-jump-to-in-case-of-a-Fatal.patch First attempt on a patch that uses siglongjmp to get out of the signal handler before calling FatalError. This is new territory for me, please review. I won't commit this before without an ACK. Tested it by intentionally screwing up a realloc within GetPointerEvents, seems to work ok.
Comment 3 Peter Hutterer 2008-04-30 21:06:11 UTC
Edgar Toernig states on the list: "This won't help at all - the lock is still held. A longjmp doesn't magically unlocks all locks and brings all data structures back into a consistent state. It's really only a jump and the rules which functions are save to be called still apply." http://lists.freedesktop.org/archives/xorg/2008-April/035039.html
Comment 4 Paulo César Pereira de Andrade 2008-04-30 21:57:48 UTC
Created attachment 16285 [details] [review] 0505-SAVE_CONTEXT-Mandriva-Custom-X-Server-patch.patch This patch added mainly as a sample of a use of sigsetjmp/siglongjmp. But I it is for server-1.4-branch, and addressing general error conditions. Basically the only failures it doesn't address are things like a busy loop with interrupts disabled (that usually only leave a few cicles with interrupts enabled and may not give enough time to handle keyboard events). The logic is simple: 1. When entering the main loop: 1.1. Sets a server and client "exception stack context" 2. When processing a client request: 2.1. Sets a volatile variable as a pointer to the client info. 3. When finished processing the client request: 3.1. Unsets the volatile pointer, as go do the server "idle" work, usually sleep in select, and/or awake on some timers deferred by signals, and some timers just server house keeping, like screensaver 4. Either the magic Ctrl+Alt+Backspace is pressed or a signal is received: 4.1. If processing a client event, sends BadImplementation result code, kills de client, and "jumps" to a safe location waiting for more requests, etc 4.2. If not processing a client event, jumps to a safe location and proceeds as if nothing did happen If the cause was a signal and it dropped to the signal handler, also sets an internal flag to know that the server is probably "corrupted", and if getting back to the signal handler, proceeds with usual behaviour. Note that the patch doesn't allocate memory, or try to work on problems with memory allocation from signal handlers, and it is in a single chunk with probably more code than it should. The patch also only works with the kbd driver (patch here: http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/x11-driver-input-keyboard/current/SOURCES/0006-Add-support-for-the-SAVE_CONTEXT-Mandriva-patch.patch?view=markup and yes, it is ugly :-) and requires at least initial xkb/keyboard handling working properly. If there is interest (and when I have time :-) I can try to work more on it, and port it to current X Server. The main reasons I never submited it to xorg is because: o it doesn't handle all failure cases o doesn't play nice with xkb that allows reconfiguring things like Ctrl+Alt+Backspace o only works with the kdb driver o Pressing Ctrl+Alt+Backspace when not in a real failure state may just kill some poor random client that happened to make a request at that time, and the user will need to press again to actually kill the X Server, i.e. to catch it when not handling a client event
Comment 5 Peter Hutterer 2008-07-02 01:34:14 UTC
Paulo: please send this patch to xorg, maybe it sparks some public discussion there. I don't feel qualified to comment on it, and there was no activity for over a month now.
Comment 6 Daniel Stone 2009-08-31 18:28:20 UTC
the XKB case no longer applies, as XkbCopyKeymap is now only called during event processing, not event enqueuing (which makes sense anyway). matthieu and peter have been pretty diligent about cleaning up the rest, so marking as fixed.