Very often I can freeze the Xorg server (git head) by typing
my username in the gdm editbox.
Looks like a segfault happened and xf86SigHandler is trying
to kill the server from _inside_ the SEGV signal handler,
thus deadlocking on the global malloc arena lock.
Calling free() (and many other libc functions) from signal
handlers is illegal according to POSIX.
So I think the codepath should be changed to avoid doing too
much in SEGV... or add explicit checks to skip disallowed
functions (i.e. almost any).
#0 0x0000003d3dad9ca8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
#1 0x0000003d3da73382 in _L_lock_14395 () from /lib64/libc.so.6
#2 0x0000003d3da72411 in free () from /lib64/libc.so.6
#3 0x00000000004457aa in CloseDevice ()
#4 0x0000000000445b1a in CloseDownDevices ()
#5 0x000000000057ba67 in AbortServer ()
#6 0x000000000057bffe in FatalError ()
#7 0x0000000000483be7 in xf86SigHandler ()
#8 <signal handler called>
#9 0x0000003d3da6f3f9 in _int_malloc () from /lib64/libc.so.6
#10 0x0000003d3da70b3d in malloc () from /lib64/libc.so.6
#11 0x0000000000576007 in Xalloc ()
#12 0x0000000000548ef2 in XkbCopyKeymap ()
#13 0x000000000045c990 in SwitchCoreKeyboard ()
#14 0x00000000004d8d0f in mieqProcessInputEvents ()
#15 0x00000000004842d1 in ProcessInputEvents ()
#16 0x000000000044c8a8 in Dispatch ()
#17 0x000000000043429a in main ()
We should almost certainly siglongjmp() out of the signal handler, at a minimum.
Created attachment 16120 [details] [review]
First attempt on a patch that uses siglongjmp to get out of the signal handler before calling FatalError. This is new territory for me, please review. I won't commit this before without an ACK.
Tested it by intentionally screwing up a realloc within GetPointerEvents, seems to work ok.
Edgar Toernig states on the list:
"This won't help at all - the lock is still held. A longjmp doesn't
magically unlocks all locks and brings all data structures back into
a consistent state. It's really only a jump and the rules which
functions are save to be called still apply."
Created attachment 16285 [details] [review]
This patch added mainly as a sample of a use of
But I it is for server-1.4-branch, and addressing
general error conditions. Basically the only failures
it doesn't address are things like a busy loop with
interrupts disabled (that usually only leave a few
cicles with interrupts enabled and may not give enough
time to handle keyboard events).
The logic is simple:
1. When entering the main loop:
1.1. Sets a server and client "exception stack context"
2. When processing a client request:
2.1. Sets a volatile variable as a pointer to the client
3. When finished processing the client request:
3.1. Unsets the volatile pointer, as go do the server
"idle" work, usually sleep in select, and/or awake
on some timers deferred by signals, and some timers
just server house keeping, like screensaver
4. Either the magic Ctrl+Alt+Backspace is pressed or
a signal is received:
4.1. If processing a client event, sends BadImplementation
result code, kills de client, and "jumps" to a safe
location waiting for more requests, etc
4.2. If not processing a client event, jumps to a safe
location and proceeds as if nothing did happen
If the cause was a signal and it dropped to the signal
handler, also sets an internal flag to know that the server
is probably "corrupted", and if getting back to the signal
handler, proceeds with usual behaviour.
Note that the patch doesn't allocate memory, or try to
work on problems with memory allocation from signal handlers,
and it is in a single chunk with probably more code than it
The patch also only works with the kbd driver (patch here:
and yes, it is ugly :-) and requires at least initial xkb/keyboard
handling working properly.
If there is interest (and when I have time :-) I can try
to work more on it, and port it to current X Server.
The main reasons I never submited it to xorg is because:
o it doesn't handle all failure cases
o doesn't play nice with xkb that allows reconfiguring things
o only works with the kdb driver
o Pressing Ctrl+Alt+Backspace when not in a real failure
state may just kill some poor random client that happened
to make a request at that time, and the user will need
to press again to actually kill the X Server, i.e. to
catch it when not handling a client event
please send this patch to xorg, maybe it sparks some public discussion there.
I don't feel qualified to comment on it, and there was no activity for over a
the XKB case no longer applies, as XkbCopyKeymap is now only called during event processing, not event enqueuing (which makes sense anyway). matthieu and peter have been pretty diligent about cleaning up the rest, so marking as fixed.