Bug 22084

Summary: XFS server crash with many dropped connections
Product: xorg
Component: App/xfs
Status: RESOLVED FIXED
Reporter: Štefan Sakalík <rabbit6440>
Assignee: Xorg Project Team <xorg-team>
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
URL: http://dionysos.fi.muni.cz/core
Attachments:
        Patch against git master

Description Štefan Sakalík 2009-06-04 04:51:03 UTC
When I open many connections using netcat simultaneously, xfs crashes after I terminate all the netcat processes. It happens especially with low values of client-limit (for example 4). I'm using Gentoo, xfs version 1.0.8.

Command line parameters:
/usr/bin/xfs -config /etc/X11/fs/config -droppriv -user xfs -port 7100
Core dump is attached.
Comment 1 Štefan Sakalík 2009-06-04 05:09:53 UTC
I couldn't attach core dump, so here is link: http://dionysos.fi.muni.cz/core 
Comment 2 Alan Coopersmith 2009-06-09 16:10:55 UTC
Do you have an example of how you're running netcat?   I recently pushed some
old Sun bug fixes from the Solaris xfs to git, which include fixes for not
properly closing down connections that didn't complete the font protocol
handshake. That may fix this, but it's hard to tell unless you check out git
master and test it, or provide a test case I can run.   (I'm not going to
be able to do much with your Gentoo core file on OpenSolaris or Solaris.)
Comment 3 Štefan Sakalík 2009-06-10 02:38:32 UTC
I tried xfs from git and the problem still persists.
Here is my config file:
-----------------------------------------
client-limit = 64
clone-self = off
catalogue = /usr/share/fonts/75dpi,
        /usr/share/fonts/100dpi,
        /usr/share/fonts/misc,
        ...

default-point-size = 120
default-resolutions = 75,75,100,100
use-syslog = on
-----------------------------------------
I run xfs: xfs -config /etc/X11/fs/config -droppriv -user xfs -port 7100
When I run many (~50) 'nc <host> 7100 &' processes and then simply terminate all those netcats, I get a segmentation fault.
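
For reference, a minimal C sketch of the same reproduction, assuming xfs is
listening on localhost port 7100 (the nc commands above are what I actually
ran); it just opens many idle connections and then drops them all at once:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define NCONN 50

int main(void)
{
    int fds[NCONN];
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(7100);                  /* xfs port from the command line above */
    inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);

    /* Open the connections but never speak the font protocol,
     * just like 'nc <host> 7100 &'. */
    for (int i = 0; i < NCONN; i++) {
        fds[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (fds[i] < 0 || connect(fds[i], (struct sockaddr *)&sa, sizeof(sa)) < 0) {
            perror("connect");
            exit(1);
        }
    }

    sleep(2);                                   /* give xfs time to accept them all */

    /* Terminate them all at once, like killing the netcats. */
    for (int i = 0; i < NCONN; i++)
        close(fds[i]);

    return 0;
}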

Some info from the core dump (the one I've uploaded):
The segfault occurs at os/waitfor.c:207 (different line number in git, but the files are more or less the same):

      client = clients[conn];
      if (!client)
         continue;
HERE: pClientsReady[nready++] = conn;       
      client->last_request_time = current_time;
      client->clientGone = CLIENT_ALIVE;

where nready = 1700 (same as port?), current_time = 1244112571498
Comment 4 Alan Coopersmith 2009-06-16 21:04:37 UTC
Created attachment 26878: Patch against git master

Bug seems timing sensitive so hard to reproduce, but I finally got it
when running under the debugger - the crash is when nready exceeds
MaxClients, so the ClientsReady buffer is overflowed, since it only 
has MaxClients slots allocated.

While this seems possible to hit in a 32-bit build if a flood of sockets
hit select at once, I hit it in a 64-bit build: the code calls ffs to find
the next socket available for reading, but ffs is defined as taking only an
int, so when the only bits set in the fdmask were past the 32-bit limit it
didn't find them, and xfs got stuck in that loop incrementing nready until
it crashed.
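
A minimal sketch (not the xfs code itself, names here are illustrative only)
of that ffs() truncation on a 64-bit fd_mask word:

#include <stdio.h>
#include <strings.h>   /* ffs() */

int main(void)
{
    long fdmask = 1L << 40;   /* pretend fd 40 is the only ready descriptor */

    /* ffs() is declared as int ffs(int); passing the long narrows it to int
     * (the same implicit conversion the select loop performs), so the bit
     * above 31 is simply dropped and ffs() reports that nothing is set. */
    printf("ffs((int)fdmask) = %d\n", ffs((int)fdmask));   /* typically 0 */

    /* Scanning the full 64-bit word does find it. */
    int bit = 0;
    for (unsigned long m = (unsigned long)fdmask; m; m >>= 1, bit++)
        if (m & 1)
            break;
    printf("first set bit    = %d\n", bit);                /* 40 */

    return 0;
}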

Hit a couple other crashes while debugging - the attached patch fixes all
of them in my testing.
Comment 5 Štefan Sakalík 2009-06-17 07:21:49 UTC
Hi, thanks for the patch. It solved the problem.
Unfortunately, I've encountered another problem. I don't know if it's 
related but it has the same symptoms.
I've changed this in config file:
client-limit = 32
clone-self = on
It seems clone-self is causing this issue.
I've uploaded second core dump on my webserver.

Some info from that core dump:
Program terminated with signal 11, Segmentation fault.
[New process 15629]
#0  0x0000000000418fc0 in _FontTransSocketReopen (i=3, type=1, fd=3, 
port=0x7fff0b5f99f0 "7100") at /usr/include/X11/Xtrans/Xtranssock.c:531
528    ciptr->family = AF_UNIX;
529    memcpy(ciptr->peeraddr, ciptr->addr, sizeof(struct sockaddr));
530    ciptr->port = rindex(addr->sa_data, ':');
here -> 531    if (ciptr->port[0] == ':') ciptr->port++; /* port should now point to portnum or NULL */

where
ciptr->port = 0x0
addr->sa_data = "7100\000\000\000\000\000\000\000\000\000" (no ':' )
port = "7100"     ( 524: strlcpy(addr->sa_data, port, portlen)  )

With this change it works fine:
     char* ret_rindex = rindex(addr->sa_data, ':');
     if (ret_rindex == NULL)
       ciptr->port = addr->sa_data;
     else
       ciptr->port = ret_rindex;



Comment 6 Štefan Sakalík 2009-06-17 07:42:02 UTC
Oh, never mind, I have just noticed that the file was from Gentoo, not git master. It's fixed there; I'm closing the bug.
Comment 7 Alan Coopersmith 2009-06-17 08:06:42 UTC
(In reply to comment #5)
> Hi, thanks for the patch. It solved the problem.

Thanks for verifying - I've pushed the patch to git master.
