Bug 10212

Summary: Calling free() in signal handler -> hang
Product: xorg Reporter: Bernie Innocenti <bernie>
Component: Server/GeneralAssignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: esigra, matthieu.herrb
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 16399    
Attachments:
Description Flags
0001-dix-set-up-siglongjmp-to-jump-to-in-case-of-a-Fatal.patch
none
0505-SAVE_CONTEXT-Mandriva-Custom-X-Server-patch.patch none

Description Bernie Innocenti 2007-03-07 07:10:52 UTC
Very often I can freeze the Xorg server (git head) by typing
my username in the gdm editbox.

Looks like a segfault happened and xf86SigHandler is trying
to kill the server from _inside_ the SEGV signal handler,
thus deadlocking on the global malloc arena lock.

Calling free() (and many other libc functions) from signal
handlers is illegal according to POSIX.

So I think the codepath should be changed to avoid doing too
much in SEGV... or add explicit checks to skip disallowed
functions (i.e. almost any).

(gdb) bt
#0  0x0000003d3dad9ca8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
#1  0x0000003d3da73382 in _L_lock_14395 () from /lib64/libc.so.6
#2  0x0000003d3da72411 in free () from /lib64/libc.so.6
#3  0x00000000004457aa in CloseDevice ()
#4  0x0000000000445b1a in CloseDownDevices ()
#5  0x000000000057ba67 in AbortServer ()
#6  0x000000000057bffe in FatalError ()
#7  0x0000000000483be7 in xf86SigHandler ()
#8  <signal handler called>
#9  0x0000003d3da6f3f9 in _int_malloc () from /lib64/libc.so.6
#10 0x0000003d3da70b3d in malloc () from /lib64/libc.so.6
#11 0x0000000000576007 in Xalloc ()
#12 0x0000000000548ef2 in XkbCopyKeymap ()
#13 0x000000000045c990 in SwitchCoreKeyboard ()
#14 0x00000000004d8d0f in mieqProcessInputEvents ()
#15 0x00000000004842d1 in ProcessInputEvents ()
#16 0x000000000044c8a8 in Dispatch ()
#17 0x000000000043429a in main ()
Comment 1 Adam Jackson 2008-02-24 21:31:32 UTC
We should almost certainly siglongjmp() out of the signal handler, at a minimum.
Comment 2 Peter Hutterer 2008-04-22 23:44:01 UTC
Created attachment 16120 [details] [review]
0001-dix-set-up-siglongjmp-to-jump-to-in-case-of-a-Fatal.patch

First attempt on a patch that uses siglongjmp to get out of the signal handler before calling FatalError. This is new territory for me, please review. I won't commit this before without an ACK.

Tested it by intentionally screwing up a realloc within GetPointerEvents, seems to work ok.
Comment 3 Peter Hutterer 2008-04-30 21:06:11 UTC
Edgar Toernig states on the list:
"This won't help at all - the lock is still held.  A longjmp doesn't
magically unlocks all locks and brings all data structures back into
a consistent state.  It's really only a jump and the rules which
functions are save to be called still apply."


http://lists.freedesktop.org/archives/xorg/2008-April/035039.html
Comment 4 Paulo C├ęsar Pereira de Andrade 2008-04-30 21:57:48 UTC
Created attachment 16285 [details] [review]
0505-SAVE_CONTEXT-Mandriva-Custom-X-Server-patch.patch

  This patch added mainly as a sample of a use of
sigsetjmp/siglongjmp.
  But I it is for server-1.4-branch, and addressing
general error conditions. Basically the only failures
it doesn't address are things like a busy loop with
interrupts disabled (that usually only leave a few
cicles with interrupts enabled and may not give enough
time to handle keyboard events).

  The logic is simple:
1. When entering the main loop:
  1.1. Sets a server and client "exception stack context"
2. When processing a client request:
  2.1. Sets a volatile variable as a pointer to the client
       info.
3. When finished processing the client request:
  3.1. Unsets the volatile pointer, as go do the server
       "idle" work, usually sleep in select, and/or awake
       on some timers deferred by signals, and some timers
       just server house keeping, like screensaver
4. Either the magic Ctrl+Alt+Backspace is pressed or
   a signal is received:
  4.1. If processing a client event, sends BadImplementation
       result code, kills de client, and "jumps" to a safe
       location waiting for more requests, etc
  4.2. If not processing a client event, jumps to a safe
       location and proceeds as if nothing did happen
  If the cause was a signal and it dropped to the signal
  handler, also sets an internal flag to know that the server
  is probably "corrupted", and if getting back to the signal
  handler, proceeds with usual behaviour.

  Note that the patch doesn't allocate memory, or try to
work on problems with memory allocation from signal handlers,
and it is in a single chunk with probably more code than it
should.

  The patch also only works with the kbd driver (patch here:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/x11-driver-input-keyboard/current/SOURCES/0006-Add-support-for-the-SAVE_CONTEXT-Mandriva-patch.patch?view=markup
and yes, it is ugly :-) and requires at least initial xkb/keyboard
handling working properly.

  If there is interest (and when I have time :-) I can try
to work more on it, and port it to current X Server.

  The main reasons I never submited it to xorg is because:
o it doesn't handle all failure cases
o doesn't play nice with xkb that allows reconfiguring things
  like Ctrl+Alt+Backspace
o only works with the kdb driver
o Pressing Ctrl+Alt+Backspace when not in a real failure
  state may just kill some poor random client that happened
  to make a request at that time, and the user will need
  to press again to actually kill the X Server, i.e. to
  catch it when not handling a client event
Comment 5 Peter Hutterer 2008-07-02 01:34:14 UTC
Paulo:
please send this patch to xorg, maybe it sparks some public discussion there.
I don't feel qualified to comment on it, and there was no activity for over a
month now.
Comment 6 Daniel Stone 2009-08-31 18:28:20 UTC
the XKB case no longer applies, as XkbCopyKeymap is now only called during event processing, not event enqueuing (which makes sense anyway).  matthieu and peter have been pretty diligent about cleaning up the rest, so marking as fixed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.