Summary: | XFS server crash with many dropped connections
---|---
Product: | xorg
Component: | App/xfs
Status: | RESOLVED FIXED
Severity: | normal
Priority: | medium
Version: | unspecified
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
URL: | http://dionysos.fi.muni.cz/core
Reporter: | Štefan Sakalík <rabbit6440>
Assignee: | Xorg Project Team <xorg-team>
QA Contact: | Xorg Project Team <xorg-team>

Description
Štefan Sakalík
2009-06-04 04:51:03 UTC
I couldn't attach the core dump, so here is a link: http://dionysos.fi.muni.cz/core

Do you have an example of how you're running netcat? I recently pushed some old Sun bug fixes from the Solaris xfs to git, including fixes for connections that were not properly closed down when they didn't complete the font protocol handshake. That may fix this, but it's hard to tell unless you check out git master and test it, or you provide a test case I can run. (I'm not going to be able to do much with your Gentoo core file on OpenSolaris or Solaris.)

I tried xfs from git and the problem still persists. Here is my config file:

-----------------------------------------
client-limit = 64
clone-self = off
catalogue = /usr/share/fonts/75dpi, /usr/share/fonts/100dpi, /usr/share/fonts/misc, ...
default-point-size = 120
default-resolutions = 75,75,100,100
use-syslog = on
-----------------------------------------

I run xfs like this:

    xfs -config /etc/X11/fs/config -droppriv -user xfs -port 7100

When I run many (~50) 'nc <host> 7100 &' and then simply terminate all those netcats, I get a segmentation fault.

Some info from the core (the one I've uploaded): the segfault occurs at os/waitfor.c:207 (different line number in git, but the files are more or less the same):

          client = clients[conn];
          if (!client)
              continue;
    HERE: pClientsReady[nready++] = conn;
          client->last_request_time = current_time;
          client->clientGone = CLIENT_ALIVE;

where nready = 1700 (same as port?), current_time = 1244112571498

Created attachment 26878
Patch against git master

Bug seems timing sensitive so hard to reproduce, but I finally got it when running under the debugger - the crash is when nready exceeds MaxClients, so the ClientsReady buffer is overflowed, since it only has MaxClients slots allocated.
While this seems possible to hit in a 32-bit build if a flood of sockets hit select at once, I hit it in a 64-bit build: xfs calls ffs to get the available socket for reading, and the only bits set in the fdmask were past the 32-bit limit. Since ffs is defined as taking only an int, it didn't find them, and xfs got stuck in that loop incrementing nready until it crashed.

Hit a couple other crashes while debugging - the attached patch fixes all of them in my testing.

Hi, thanks for the patch. It solved the problem. Unfortunately, I've encountered another problem. I don't know if it's related, but it has the same symptoms. I've changed this in the config file:

client-limit = 32
clone-self = on

It seems clone-self is causing this issue. I've uploaded a second core dump to my webserver. Some info from that core dump:

    Program terminated with signal 11, Segmentation fault.
    [New process 15629]
    #0  0x0000000000418fc0 in _FontTransSocketReopen (i=3, type=1, fd=3, port=0x7fff0b5f99f0 "7100")
        at /usr/include/X11/Xtrans/Xtranssock.c:531
            528     ciptr->family = AF_UNIX;
            529     memcpy(ciptr->peeraddr, ciptr->addr, sizeof(struct sockaddr));
            530     ciptr->port = rindex(addr->sa_data, ':');
    here -> 531     if (ciptr->port[0] == ':') ciptr->port++; /* port should now point to portnum or NULL */

where
    ciptr->port = 0x0
    addr->sa_data = "7100\000\000\000\000\000\000\000\000\000" (no ':')
    port = "7100" (524: strlcpy(addr->sa_data, port, portlen))

With this change it works fine:

    char* ret_rindex = rindex(addr->sa_data, ':');
    if (ret_rindex == NULL)
        ciptr->port = addr->sa_data;
    else
        ciptr->port = ret_rindex;

On 06/17/2009 06:04 AM, bugzilla-daemon@freedesktop.org wrote:
> http://bugs.freedesktop.org/show_bug.cgi?id=22084
>
> --- Comment #4 from Alan Coopersmith<alan.coopersmith@sun.com> 2009-06-16 21:04:37 PST ---
> Created an attachment (id=26878)
> --> (http://bugs.freedesktop.org/attachment.cgi?id=26878)
> Patch against git master
>
> Bug seems timing sensitive so hard to reproduce,
> but I finally got it
> when running under the debugger - the crash is when nready exceeds
> MaxClients, so the ClientsReady buffer is overflowed, since it only
> has MaxClients slots allocated.
>
> While this seems possible to hit in a 32-bit build if a flood of sockets
> hit select at once, I hit it in a 64-bit build due to calling ffs to get
> the available socket for reading when the only bits set in the fdmask were
> past the 32-bit limit, so ffs didn't find them, since it's defined as taking
> only an int, and thus xfs got stuck in that loop incrementing nready until
> it crashed.
>
> Hit a couple other crashes while debugging - the attached patch fixes all
> of them in my testing.

Oh never mind, I have just noticed that file was from gentoo not git master. It's fixed there, I'm closing the bug.

(In reply to comment #5)
> Hi, thanks for the patch. It solved the problem.

Thanks for verifying - I've pushed the patch to git master.
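
For readers following the first crash, here is a minimal sketch of the failure mode described in the thread: ffs takes an int, so a set bit above bit 31 of a 64-bit mask word goes unseen, and a scan loop that relies on it can keep appending to the ready list past its allocated size. This is not the xfs source or the attached patch. The names lowest_bit_truncating, lowest_bit_64 and collect_ready are hypothetical, and the hand-written bit scan is just one portable way to avoid the truncation.

```c
#define _XOPEN_SOURCE 700   /* make ffs() visible under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>        /* ffs() */

/* Hypothetical helper: what happens when a 64-bit mask word is handed to
 * ffs(). Only the low 32 bits survive the conversion, so a word whose only
 * set bits are at position 32 or above yields 0 ("no bit set"). */
static int lowest_bit_truncating(uint64_t word)
{
    return ffs((int)(uint32_t)word);   /* 1-based index, 0 if none seen */
}

/* Hypothetical width-safe replacement: 1-based index of the lowest set bit
 * in a full 64-bit word, or 0 if the word is empty. */
static int lowest_bit_64(uint64_t word)
{
    int pos = 1;

    if (word == 0)
        return 0;
    while ((word & 1) == 0) {
        word >>= 1;
        pos++;
    }
    return pos;
}

/* Hypothetical scan of one mask word: append each ready descriptor number
 * (base + bit index) to ready[], stopping when the word is exhausted OR the
 * array is full, so the count can never exceed capacity. Returns the new
 * count. */
static size_t collect_ready(uint64_t word, int base,
                            int *ready, size_t nready, size_t capacity)
{
    assert(ready != NULL);

    while (word != 0 && nready < capacity) {
        int bit = lowest_bit_64(word) - 1;   /* 0-based, always valid here */

        ready[nready++] = base + bit;
        word &= ~(UINT64_C(1) << bit);       /* clear the bit just handled */
    }
    return nready;
}
```

The two guards are independent: the 64-bit scan ensures every set bit is eventually cleared so the loop terminates, and the capacity check keeps the write in bounds even if more descriptors are ready than there are slots.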