Sorry about the version; according to MacPorts it's libXt-1.0.5_0. I don't know what release of X.org that corresponds to. Mac OS X 10.5.6 (Intel Core Duo MacBook Pro) and Mac OS X 10.4.11 + security update 2008-008 (iBook G4). This seems to have started with recent updates to Mac OS X (10.5.5 or 10.5.6 for Leopard, and security update 2008-007 or so for Tiger). However, I've been frequently updating X.org with MacPorts as well, so it might be recent X.org versions rather than any change in Mac OS X. In XEmacs current head version, if I start "xemacs -vanilla" (ie, no user configuration), xemacs will initialize itself and start its toplevel read-eval-redisplay loop. This eventually settles into a loop which calls XtAppPending, which calls _XtWaitForSomething, which calls IOWait, which is a wrapper for poll(2) in the MacPorts configuration of libXt. This works fine (normal editing etc is possible) until I start an asynchronous process, for example with the Lisp code (start-process "echo" nil "echo") The process does get started AFAICT, and XEmacs reports the status return from start-process, then freezes in a tight loop in _XtWaitForSomething: ^Z Program received signal SIGTSTP, Stopped (user). 0x929b9e0e in poll$UNIX2003 () (gdb) next Single stepping until exit from function poll$UNIX2003, which has no line number information. IoWait (wt=<value temporarily unavailable, due to optimizations>, wf=<value temporarily unavailable, due to optimizations>) at NextEvent.c:353 353 } (gdb) _XtWaitForSomething (app=0xe111a0, ignoreEvents=0, ignoreTimers=<value temporarily unavailable, due to optimizations>, ignoreInputs=<value temporarily unavailable, due to optimizations>, ignoreSignals=1, block=0, drop_lock=0, howlong=0x0) at NextEvent.c:619 619 if (nfds == -1) { (gdb) next 696 if (nfds == 0) { (gdb) 706 if (block && howlong != NULL) (gdb) 709 if (ignoreInputs && ignoreEvents) { (gdb) 715 FindInputs (app, &wf, nfds, (gdb) ** Stepping over inlined function code. ** 466 if (!ignoreInputs) { (gdb) 454 for (ii = 0; ii < wf->num_dpys; ii++, fdlp++) { (gdb) 466 if (!ignoreInputs) { (gdb) FindInputs [inlined] () at NextEvent.c:467 467 fdlp = &wf->fdlist[wf->num_dpys]; (gdb) FindInputs [inlined] () at NextEvent.c:468 468 for (ii = wf->num_dpys; ii < wf->fdlistlen; ii++, fdlp++) { (gdb) FindInputs [inlined] () at NextEvent.c:467 467 fdlp = &wf->fdlist[wf->num_dpys]; (gdb) FindInputs [inlined] () at NextEvent.c:468 468 for (ii = wf->num_dpys; ii < wf->fdlistlen; ii++, fdlp++) { (gdb) FindInputs [inlined] () at NextEvent.c:470 470 if (fdlp->revents) { (gdb) FindInputs [inlined] () at NextEvent.c:468 468 for (ii = wf->num_dpys; ii < wf->fdlistlen; ii++, fdlp++) { (gdb) FindInputs [inlined] () at NextEvent.c:470 470 if (fdlp->revents) { (gdb) FindInputs [inlined] () at NextEvent.c:471 471 if (fdlp->revents & (XPOLL_READ|POLLHUP|POLLERR) (gdb) FindInputs [inlined] () at NextEvent.c:477 477 if (fdlp->revents & XPOLL_WRITE) (gdb) 482 if (condition) { (gdb) FindInputs [inlined] () at NextEvent.c:468 468 for (ii = wf->num_dpys; ii < wf->fdlistlen; ii++, fdlp++) { (gdb) 719 if (dpy_no >= 0 || found_input) { (gdb) 588 AdjustTimes (app, block, howlong, ignoreTimers, &wt); (gdb) ** Stepping over inlined function code. ** 590 if (block && app->block_hook_list) { (gdb) 608 if (app->rebuild_fdlist) (gdb) 612 if (drop_lock) { (gdb) 618 nfds = IoWait (&wt, &wf); (gdb) Heh-heh, heh-heh, heh-heh Beavis, let's do that again! :-( If I enable printing of signals in gdb, I see SIGIO being delivered over and over again, but I can't tell what those are for. Moving the mouse or clicking or typing in the window should queue up input I suppose, but doesn't break the infloop although gdb reports it's generating SIGIOs. nfds and wf look like this in the infloop: (gdb) print nfds $3 = 1 (gdb) print wf $4 = {fdlist = 0xeb1b50, stack = 0xbffff170, fdlistlen = 3, num_dpys = 1} gdb) print wf.fdlist[0] $5 = {fd = 6, events = 1, revents = 0} (gdb) print wf.fdlist[1] $6 = {fd = 4, events = 195, revents = 0} (gdb) print wf.fdlist[2] $7 = {fd = 7, events = 195, revents = 32} revents = 32 is POLLNVAL. lsof identifies the fds in question as xemacs 88335 steve 4 PIPE 0x5464f40 16384 xemacs 88335 steve 6u unix 0x5a6d198 0t0 ->0xbc15cc0 xemacs 88335 steve 7u CHR 15,3 0t0 73199364 /dev/ptmx The PIPE in fd=4 is used internally by XEmacs's own event loop, the unix socket fd=6 is X I assume, and the character special fd=7 is associated with the subprocess, but I don't know the details. This configuration is just ignored by _XtWaitForSomething, and it just keeps reproducing itself (I don't understand poll, so I can't guess whether the signal is being reraised within every loop or if the state of the fd persists over the loop). Thus we just skip all the way down to goto WaitLoop ... lather, rinse, repeat .... XtAppPending promises not to block, so I think this is a pretty serious bug. See also Bug #13471.
Confirmed with XEmacs 21.4.22 (latest stable release), using the MacPorts libraries (@1.0.5_0). On Leopard, both with the Mercurial HEAD version of XEmacs 21.5 and with XEmacs 21.4 the problem does not occur with Xquartz project libraries (version 2.3.2.1).
Xquartz project libraries, version 2.4.0-alpha2, also work correctly for both XEmacs 21.5 HEAD and XEmacs 21.4.22. So it's related to the MacPorts build.
I tried rebuilding libXt with the latest version of NextEvent.c, which rebuilds the fdList every time through the WaitLoop. This didn't help. I tried changing the unconditional goto WaitLoop at the end of _XTWaitForSomething() to if (block) goto WaitLoop; else { #ifdef USE_POLL XtStackFree ((XtPointer) wf.fdlist, fdlist); #endif return -1; } This helps in the sense that it now blocks in XtAppProcessEvent, which documents the behavior. I still get an infloop in _XtWaitForSomething but that is presumably an XEmacs or Mac OS X bug. I'll take that to appropriate channels, but any advice would be appreciated.
Created attachment 22934 [details] [review] Patch to handle POLLNVAL condition as an exception.
Created attachment 22935 [details] [review] Patch to prevent reentering the WaitLoop if block==False.
In light testing, the attached patch patch-NextEvent.c-XPOLL_EXCEPT seems to win for me. I don't really understand all ramifications, so I can't actually recommend it, and I've only tried it on Mac OS X "Leopard" 10.5.6. On the other hand, it seems rather safe because it only triggers in the case that an app receives an fd with POLLNVAL set, which would seem to count as an exceptional condition, and is apparently very rare since nobody else is experiencing the infloop. I also attach a patch to respect block=False in the WaitLoop (patch-NextEvent.c-BLOCK_CORRECTLY), which seem to me to be The RightThang[tm]. for XtAppPending, but again I don't understand the other uses of _XtWaitForSomething so I can't recommend it whole-heartedly.
The reason this works in the xquartz.macosforge.org build is that we actually do: --- libXt-1.0.5.orig/configure.ac 2008-02-12 19:02:35.000000000 -0800 +++ libXt-1.0.5/configure.ac 2008-02-12 19:02:41.000000000 -0800 @@ -43,8 +43,6 @@ PKG_CHECK_MODULES(XT, sm x11 xproto kbpr AC_CHECK_HEADER([alloca.h], AC_DEFINE(INCLUDE_ALLOCA_H, 1, [Define to 1 if Xalloca.h should include <alloca.h>])) # Map function checks to old Imake #defines -AC_CHECK_FUNC(poll, AC_DEFINE(USE_POLL,1, - [Define to 1 if you have the `poll' function.])) AC_CHECK_FUNC(snprintf, AC_DEFINE(USE_SNPRINTF,1, [Define to 1 if you have the `snprintf' function.])) I inherited that, but the previous maintainer said that emacs exhibited a bug when we used poll() in libXt, but neither he nor I had the time to trace it any further.
OK, that doesn't help localize the bug very much, since it completely changes the behavior of the event loop with respect to IO. Do you have a pointer to the previous discussion offhand? I plan to keep looking since the USE_POLL code is much more straightforward, and XtAppPending should never block, even temporarily, but I guess using select() instead of poll() is a reasonable workaround for MacPorts. Not to mention that something is very likely wrong with the Emacsen, since somehow they're passing around dead fds. FWIW, I'm on the emacs-devel list and haven't seen anything there; nor have others on the XEmacs lists complained about it, so it's not an urgent problem. But I would prefer that it not be closed.
Stephen, sorry I don't have a link to that discussion. As I mentioned, it was something that I inherited and didn't dig deep into (when you inherit 200+ bugs, it's hard to find time to satisfy curiosities when there's not really something "broken"). The patches look good for me... although the *real* problem is figuring out why POLLNVAL is being returned when the fd is still valid.
(In reply to comment #9) > The patches look good for me... although the *real* problem is figuring out why > POLLNVAL is being returned when the fd is still valid. Unfortunately, xnu (the OS X kernel) does not properly support poll() on character devices (in this case, /dev/ptmx). This is being tracked by Apple bug number 3710161 -- if you Google for that, you'll find a couple other references to it e.g. http://lists.apple.com/archives/darwin-dev/2005/May/msg00220.html I think that POLLNVAL means "don't poll() me plz". The real solution for this is for autoconf to automatically detect that we are building for OS X and disable poll().
That's curious... so then why is Stephen eventually getting valid data at some point?
Jamey, didn't you say you touched something like this recently? Is this still a problem?
This symptom doesn't appear to be related to libX11 or libxcb, but we did fix a similar-looking bug in libxcb. Here's the patch that we used there: http://cgit.freedesktop.org/xcb/libxcb/commit/?id=f0565e8f06aadf760a9065a97b8cf5ab9cbd18de Perhaps that will inspire a solution?
*** Bug 13471 has been marked as a duplicate of this bug. ***
Resolved on git master in the same way as libxcb handled it: To ssh://git.freedesktop.org/git/xorg/lib/libXt 0d8ef50..70bb9e2 master -> master
I've reverted this change from libXt as it caused some regressions. #47203 #47216 Reopening for further investigation, but I think this might not be necessary any more since I believe we should fall through to _XIOError() from _XEventsQueued() from XEventsQueued() from FindInputs() because xcb should be flagging the relevant connection as broken and we check xcb_connection_has_error() in _XEventsQueued.
Comment on attachment 22935 [details] [review] Patch to prevent reentering the WaitLoop if block==False. >--- src/NextEvent.c.orig 2006-07-12 03:05:55.000000000 +0900 >+++ src/NextEvent.c 2009-02-14 17:22:36.000000000 +0900 >@@ -722,7 +722,14 @@ > #endif > return dpy_no; > } >- goto WaitLoop; >+ if (block) >+ goto WaitLoop; >+ else { >+#ifdef USE_POLL >+ XtStackFree ((XtPointer) wf.fdlist, fdlist); >+#endif >+ return -1; >+ } > } > > #define IeCallProc(ptr) \
I am also frequently getting same kind of error on a 32 bit Linux X application. When this error happened I forcefully dumped core and see the back trace. Below is the part of back trace. #11 0xf7381864 in _XIOError () from /lib/libX11.so.6 #12 0xf737ed2f in _XEventsQueued () from /lib/libX11.so.6 #13 0xf736f770 in XEventsQueued () from /lib/libX11.so.6 #14 0xf74a6c26 in _XtWaitForSomething () from /lib/libXt.so.6 #15 0xf74a7eda in XtAppProcessEvent () from /lib/libXt.so.6 Is this error related with this bug ?? Is this issue have some fix till now?? Please help.
Hi, Subrata I'm the OP, not a developer, but it seems the developers aren't able to respond immediately so I'll tell you what I can. I'd like to begin by advising you that the wait loop is quite complex, so the partial trace you sent is not enough to confirm any relationship with this bug. Please add the whole trace you get so that the developers can more accurately diagnose the situation. Now, in the case that I reported, there were two interacting problems. The first was that the loop entry test didn't handle all possible conditions, only those that the developers recognized as valid. This is a general practice in Xorg, apparently, and I was told my patches attached to the issue which validate the condition would not be accepted. The second problem was that due to an incomplete implementation on Apple's part (which didn't get fixed in the 3 years I tracked this bug), an invalid file descriptor was left in the table, and so it was possible to enter the loop because of an invalid condition. Since nothing in the loop can fix the invalid condition, this led to an infinite loop. I couldn't find a way to fix or workaround the invalid file descriptor, either, and an alternative configure option to Xt that used select() rather than poll() gave acceptable results so I eventually gave up on this bug. AFAIK there have never been any bugs in Linux that would cause an invalid file descriptor to be left in the table, or a spurious report of POLLNVAL. So if there is any relation of your issue to this bug, the most likely cause is in the program that is freezing. That is, the application is allocating a file descriptor, it doesn't get closed but somehow becomes invalid, and thus causes this behavior. Of course it is not impossible that a new bug has been introduced into Linux or into Xorg, but I think your best use of time would be to check on the possibility that the application has a bug in its resource cleanup as described above. It appears that libxcb uses a patch similar in intent to my "handle POLLNVAL" patch above. Perhaps if you switch to a libxcb-based build of Xorg you'll get better results. (That's a wild guess since I have no idea what your Xorg configuration looks like.) Good luck, sorry I can't be more helpful.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libxt/issues/3.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.