Summary: | The returning event of XNextEvent() has incorrect timestamp | ||
---|---|---|---|
Product: | xorg | Reporter: | Mike Qin <mikeandmore> |
Component: | Lib/Xlib | Assignee: | Xorg Project Team <xorg-team> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | major | ||
Priority: | high | CC: | hramrach, ishikawa, jeremyhu |
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
See Also: |
https://bugzilla.mozilla.org/show_bug.cgi?id=787943 https://bugzilla.gnome.org/show_bug.cgi?id=654831 |
||
Whiteboard: | 2011BRB_Reviewed | ||
i915 platform: | i915 features: | ||
Attachments: |
Description
Mike Qin
2011-07-19 02:58:47 UTC
Do you have a reduced test case to show this bug off? If this were the case, I'd expect many things to break... (In reply to comment #1) > Do you have a reduced test case to show this bug off? If this were the > case, > I'd expect many things to break... Actually, many things have begun to break. Thunderbird mail client for one. Some users of thunderbird (and firefox) have begun noticing a strange timestamp problems in the last 18 months or so, and lately one cause of the problem has been tracked to the uninitialized time (and possibly serial) field of XEvent. The problem mentioned by Mike Qin in the original report that "XNextEvent() has incorrent timestamp on every XIM commit event" fits the bill which the attached patch tries to fix. (You may want to remove the fprintf statemtns.) The first hunk in the patch address the uninitialized fieds in _XimCommitRecv before it puts back the event: thus it is quite likely that Mike Qin is seeing the problem caused by this use of uninitialized event fields. The second hunk is for an execution path which I have not experienced so far, but will plague some eventually (or already has done so already). See mozilla's bugzila. https://bugzilla.mozilla.org/show_bug.cgi?id=787943 Using XIM with Firefox will cause firefox menu cannot popup correctly. The problem reported in the mozilla bugzilla boils down to that the uninitialized time stamp value makes it into the X11 event queue, and somehow GDK library picks it up and later uses the bogus timestamp for its internal consistency check and many strange things happen: one is that thunderbird (and firefox) can no longer display pull down menus. It seems to have hurt only people using XIM to input characters outside US-ASCII, etc. There you can find that the unpatched imDefLkup.c will produce bogus time stamp: see the comment 38 of that bugzilla entry. https://bugzilla.mozilla.org/show_bug.cgi?id=787943#c38 The attached patch has been re-worked just to report that the time stamp is set to 0 and serial is set accordingly using LastKnownRequstProcessed. With the patch, I have not seen the problem mentioned in mozilla's bugzilla for a couple of days. (Befor it happens maybe a few times a day AT LEAST, and was very trouble some. And now that come to think of it, it happened more or less whenever I did some Japanese output or copy&paste inside gnome terminal console. Now I know why.) Hope this helps. Created attachment 68677 [details]
Patch to X11 modules/im/ximcp/{imDefLkup.c,imTrans.c}
The accompanying patch.
The line numbers may be a way off due to other debugging statements
which are removed from the final patch.
In both places, event data structure has uninitialized time and serial fields.
(I wonder how this was not noticed earlier.)
TIA
Wow, this bug has been a whole year. Pretty surprised that there are still people working on this... I'm not convinced by myself that setting timestamp to zero is always correct. In _XimCommitRecv() we have the pointer to XIM event packet and we could set ev.time to buf_l[3], just like how it's being done in imExten.c? How is the timestamp generated for other normal X11 event? (In reply to comment #4) > Wow, this bug has been a whole year. Pretty surprised that there are still > people working on this... > I started to work on this bug pretty lately since there is a serious usability issue now with thunderbird Necessity is the mother of invetions, er, debugging ? > I'm not convinced by myself that setting timestamp to zero is always > correct. In _XimCommitRecv() we have the pointer to XIM event packet and we > could set ev.time to buf_l[3], just like how it's being done in imExten.c? Right. However, value of timestamp 0 seems to have a special meaning of "CURRENT TIME", at least for GDK if I am not mistaken. So setting it to 0 is a safe escape, I think. > How is the timestamp generated for other normal X11 event? I have not investigated much but I am sure that a totally new event will use the X11 server timestamp. When an event is passed after subtle modification I am not quite sure what is the preferred or correct method. For example, what about an event that is putback after some of the field are modified or fabricated. The particular putback event seems to be of the latter variety. But I have no position here. If thunderbird works as expected without breaking the menu system, I am a happy camper :-) Seriously, I will place a print statement to see if buf_l[3] contains something useful. But as I mentioned above, setting a timestamp to 0 seems a safe option, too. TIA (In reply to comment #5) > (In reply to comment #4) > > I'm not convinced by myself that setting timestamp to zero is always > > correct. In _XimCommitRecv() we have the pointer to XIM event packet and we > > could set ev.time to buf_l[3], just like how it's being done in imExten.c? > > Right. > > However, value of timestamp 0 seems to have a special meaning of "CURRENT > TIME", at least for GDK if I am not mistaken. So setting it to 0 is a safe > escape, I think. > > > Seriously, I will place a print statement to see if buf_l[3] contains > something useful. But as I mentioned above, setting a timestamp to 0 seems a > safe option, too. > > TIA I have found out that the code in question in imExten.c is for forwarding X Event and thus have the Event data structure at hand. However, the place where the fix was made in my patch is handling XIM_COMITT message and the message does not have the Event structure associated with it. So from the passed data structure, it seems impossible to obtain any meaningful timestamp. The context information associated with input-method-ID (XIM) and input-context-ID (IC) also do not seem to store any time stamp or such. XIM is not designed that way, it seems. I am printing time field value near line 380 from an event passed to the routine there. The event seems to have a proper time stamp there. Unless such external event is passed, XIM related routine cannot extract the time stamp from the XIM message in general (except for XIM_FORWARD_EVENT message) So I think setting the field to 0L is the best we can hope right now. Now my position now is as follows: - Before many applications worked with random junk set to the time field. - Only recently in the last 18months or so, gdk libraries and metacity window manager, etc. have begun enforcing the checking of seemingly valid value in time field. (0L is used as indicating "CURRENT TIME"), and sometimes store the bogus timestamp to the demise of subsequent expected behavior. So my guess is that if we can get past by gdk library and window manager by setting the time field to 0, the rest of the X applications should be fine. (After all, they did not bother with the junk value in time field at all.) Now, 32 bit time counter counting 1/1000 seconds will wrap around in about 49 days. (Windows NT had a famous history of rebooting / crashing due to this.) So if setting 0 to time filed DOES cause a trouble, I will start seeing problems within a 49 day usage cycle sometime in the next few weeks. I am not sure if I can keep on running the same X session that long, but I will try and see if I can report a problem due to setting the time field to 0. TIA Created attachment 69001 [details] Renewed patch (adding an initalization to buf32 in imDefLkup.c) It has been almost full week. I have NOT had a single occurrence of the problem of pull-down menu not showing correctly for the full week after I applied the patch(es). So as far as my user experience is concerned, the patch(es) have improved it very much! I have less frustration on my linux desktop now. I am attaching a slightly renewed patch here. I added another initialization for buf32 in imDefLkup.c: access to uninitialized buf32 DID HAPPEN during my testing and random value may lead to the crash. (Of course, why this happened is anybody's guess.) Anyway, all those incorrect timestamp issues are gone! Well almost. (See the warnings below.) I wonder this also applies to people who use Chinese input engine. [WARNINGS] During the whole time (one full week now), the ill-effect of timestamp being 0 may not be that obvious, if any. Only THREE (3) warnings I got during the last week's operation from the window manager after my fix in ~/.xsession-errors: As usual "ウィンドウ・マネージャーの警告" means "Warning from the Window Manager" in Japanese. (1) I got the following sequence of messages TWICE: ウィンドウ・マネージャーの警告: Buggy client sent a _NET_ACTIVE_WINDOW message with a timestamp of 0 for 0x460008e (Iceweasel) ウィンドウ・マネージャーの警告: meta_window_activate called by a pager with a 0 timestamp; the pager needs to be fixed. metacity is fuzzy about this issue, and states that this problem can be ignored for now (in the source file, that is). And its definition of "pager" is very fuzzy, too. (2) Not sure if the following message is related to my fix. Already GTK library had an issue with the menu system : (see the thread which includes this post: Re: X11 timestamp problems? ( "Re: Minutes of the GNOME Board meeting 19 March 2001" ) from Jim Gettys https://mail.gnome.org/archives/gtk-devel-list/2001-March/msg00563.html ウィンドウ・マネージャーの警告: GtkMenu failed to grab the pointer (3) ??? transient problem caused by the replacement of libX11 ??? The following is a series of messges I got once: this could be a real problem. Or these are transient errors caused because I was replacing the libx11 library with different dump message: (gnome-settings-daemon:18356): color-plugin-WARNING **: not found device (null): Failed to FindDeviceByProperty: GDBus.Error:org.freedesktop.ColorManager.Failed: property match 'XRANDR_name'='default' does not exist gnome-session[18281]: WARNING: Client '/org/gnome/SessionManager/Client31' failed to reply before timeout gnome-session[18281]: WARNING: Unable to find desktop file 'thunderbird.desktop': 検索ディレクトリには妥当なキー・ファイルがありませんでした gnome-session[18281]: WARNING: Unable to find desktop file 'gnome-thunderbird.desktop': 検索ディレクトリには妥当なキー・ファイルがありませんでした ウィンドウ・マネージャーの警告: CurrentTime used to choose focus window; focus window may not be correct. ウィンドウ・マネージャーの警告: Got a request to focus the no_focus_window with a timestamp of 0. This shouldn't happen! (gnome-settings-daemon:18356): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed ** Message: applet now removed from the notification area (gnome-sound-applet:18403): Gdk-WARNING **: gnome-sound-applet: Fatal IO error 11 (リソースが一時的に利用できません) on X server :0. [Renewed Patch] I can only say that the following patch(es) have resolved my issues for now. I think trying to fill in the time-field with Xserver time is an overkill and may be a real overhead for XIM routine. (In Japanese, I found that the fixed portion is invoked only when the input is confirmed and passed to the application, but I don't know how it is used by Chinese and Korean input applicastions. If these parts are executed often (every keystroke or so), then the added overhead to obtain the timestamp would cause a very negative user experience. Imagine that you are running an application on remote machine and that you are typing into to it and each keystroke causes extra context switch and network round trip due to such fectching of xserver time. Here is the renwed patch (the lines may be a little off to your source due to the removed lines for additional dumps). Created attachment 69098 [details] [review] The same patch above but without fprintf to print the warning. Created attachment 69099 [details] [review] The same patch above but without fprintf to print the warning. I cleaned up the patch by eliminating the fprintf statements that print the warning about fixed timestamps to stderr. Hello, thanks for working on this. If you get less warnings by setting the current time on fabricated events then perhaps that's the way to go. As I understand it when you commit a sentence (or one or a few words) is the time when the application receives (completely fabricated) input. Before text is committed the input goes to the IM temporary buffer. This should be pretty much the same for all CJK or any other complex input methods. Naturally people try to optimize IMs so that they either perform direct letter for letter conversion or can convert long buffers so that extra input that is used solely for operating the conversion is reduced. That should also reduce fabricated input. Is it even possible to operate an IM remotely? I don't think I have seen this working. The IM socket is local to the machine and the IM helper is local to the X screen so IMs pass neither through X forwarding nor from one local server to another (eg. Xnest). Not sure how much is specific to the IM implementation and how much is inherent in XIM. > If you get less warnings by setting the current time on fabricated events then perhaps that's the way to go.
Hi, thank you for the comment.
I will see if setting the timestamp to the current *SERVER* time
causes any ill-effects or not (logically incorrect processing, warnings from
other programs, or performance hit when I type).
kinput2(-wnn) *may* allow a remote connection, but now I am not sure.
It may be probably only the connection to the backend server remotely
which I have certainly attempted before.
I will report my finding.
TIA
Clarification: Although I will see if I can fill in the current server time (non-zero) value and the subsequent behavior, I still think time (0L) that means CurrentTime is a good value for newly fabricated request events (not the forwarded event after a modification on the payload, etc.). I wish GDK or whatever user-level library that has caused the havoc with incorrect timestamp record internally the short history of received timestamps and who passed it so that we can figure out, after the fact, which object sent an event with bogus timestamp. At least we can identify who is most likely to blame and can focus on fixing it. Oh well. (In reply to comment #11) > > If you get less warnings by setting the current time on fabricated events then perhaps that's the way to go. > > After browsing some pages search for X server timestamp information, I came to a conclusion that "(CurrentTime is generally the best choice if no event is being processed)" Quote from the following page: http://standards.freedesktop.org/xembed-spec/xembed-spec-latest.html So I will keep the timestamp field to 0L unless there is an event being processed (for forwarding, etc.) By the way, as long as this XIM events with bogus timestamp causing problems for users, the patch posted fixed the issues for me (japanese users using kinput2 XIM front-end) and Chinese users who uses XIM front-ends for Chinese users. So the bug is indeed very real, and the patch fixes the problem. The only minor nit is maybe about once a week (I got two similar warnings in the last couple of weeks.) I get ウィンドウ・マネージャーの警告: Buggy client sent a _NET_ACTIVE_WINDOW message with a timestamp of 0 for 0x46005d4 (Mozilla Fi) ウィンドウ・マネージャーの警告: meta_window_activate called by a pager with a 0 timestamp; the pager needs to be fixed. But I have a feeling that this is not critical as compared to the near unusability of thunderbird or firefox the patch I posted tried to fix.. Also, since the value of 0 is supposed to be "CurrentTime" in Gnome and why the message is not treating it as such is a little mystery. The warnigs are from the metacity window manager: Quote from: metacity/src/core/window.c The relevant section that prints the warning messages: From below, within gnome's library we can call meta_display_get_current_time_roundtrip() to obtain the current server time, I presume. Not sure what is the difference between the _roundtrip version and the shorter one meta_display_get_current_time(). (I searched for it using google but the first page is full of reports of the function segfaulting.) Quote from: metacity/src/core/window.c 5131 /* FIXME: What a braindead protocol; no timestamp?!? */ 5132 timestamp = meta_display_get_current_time_roundtrip (display); 2978 /* For those stupid pagers, get a valid timestamp and show a warning */ 2979 if (timestamp == 0) 2980 { 2981 meta_warning ("meta_window_activate called by a pager with a 0 timestamp; " 2982 "the pager needs to be fixed.\n"); 2983 timestamp = meta_display_get_current_time_roundtrip (window->display); 2984 } 2985 ... 5284 if (timestamp == 0) 5285 { 5286 /* Client using older EWMH _NET_ACTIVE_WINDOW without a timestamp */ 5287 meta_warning ("Buggy client sent a _NET_ACTIVE_WINDOW message with a " 5288 "timestamp of 0 for %s\n", 5289 window->desc); 5290 timestamp = meta_display_get_current_time (display); 5291 } The metacity window manager seems to think(?) the use of 0 as the current timestamp ought to be reserved for the window manager only and clients ought to generate non-zero timestamps? Hard to deduce what is going on here. But, for now, good is better than broken albeit may not be perfect. So I stick to 0 value for the timestamp where no event is being passed to xim processing since obviously only metacity is complaining once a week (?). (Again, if a valid event is passed and then I should pick up the event timestamp to pass around. I agree with this.) As for XIM input front end, kinput2, here is a quote from its README. I think XIM implementation in the core libX11 in itself may be limited, but a client talking to kinput2 could be from another machine (I have not tested this.) So, probably a client on a PC-A, can talk to kinput2 on a PC-B that talks to a backend server on a client PC-C. (But again, I have not tested it yet. So there may be a limit.) --- quote README of kinput2-v3.1 Multiple Protocol: There are several protocols which define how the communication between input server and clients should be done. Kinput2 implements following protocols: * kinput protocol * kinput2 protocol * Matsushita's jinput protocol * Sony's xlc protocol * XIMP protocol (X11R5 Ximp implementation) * X Input Method Protocol (X11R6 standard) If a client can speak at least one of them, the client can communicate with kinput2 and get Japanese text. The support of X Input Method Protocol is not yet fully tested and might have many bugs. --- end quote PS: I think I found a few problems with XIM support indeed: while debugging the original problem and trying to figure out where the uninitialized timestamp crept in using valgrind, I found kinput2 accessing already freed areas sometimes. kinput seems to have a deep problem in handling clients dying suddenly. If request events are still in the input queue, it tries to pick up the related events and try to release resources. However, in doing so, it tries to already reference released data while trying to decipher what is in the subsequent events related to the dead client. This may explain why kinput2 dies silently now and then. (There is no tell-tale syslog messages, etc.) Bug hunting never seems to end. That is another story, but I am impressed with how powerful valgrind is, and how powerful typical desktop PC has become to run valgrind to uncover such subtle problems in real-time(!) Another week, and I see no ill-effect of the patch. There were a few messages regarding WM_TRANSIENT_FOR is invalid for some windows (without any mention of timestamp). I think the above warnings are related to Gnome's idea of handling of windows and not related to the bug discussed here. There was again one warning from metacity regarding "pager" using 0 timestamp ONCE. In order to learn how many XIM messages come and go, I inserted a counter in the main loop of kinput2 (XIM front end) to see how many XIM interactions it handles. Since Wednesday when I began counting, the counter went to almost 50,000 (!) late Friday afternoon while I used the system for work. [I print only the date/time for every 1000 XIM messages it handles]. 50000 messages per three days is a quite a large number. (I probably type similar number of keys (and more since I often type English only documents as well.) So I can certainly say it is being exercised very much. I think the patch is very safe. TIA Comment on attachment 69099 [details] [review] The same patch above but without fprintf to print the warning. This patch turned out to be rather safe (I see no ill-effect for the last several weeks), and is very valuable for XIM users who have been bitten with the problem of bogus timestamp. Now, I wonder how I can ask the xorg team to bring this patch into the next release? You post the patch to xorg-devel@lists.x.org for review. http://www.x.org/wiki/Development/Documentation/SubmittingPatches (In reply to comment #16) > You post the patch to xorg-devel@lists.x.org for review. > > http://www.x.org/wiki/Development/Documentation/SubmittingPatches Thank you. I posted a patch according the guideline, and it is in the queue managed by PatchWorks: http://patchwork.freedesktop.org/project/Xorg/list/ TIA and I wish everyone a peaceful holiday season. I took the liberty of increasing the importance (severity) of this bug entry due to the seriousness of the issue caused by the bug (and the patch posted here fixes it.) Here is a copy of an e-mail I wrote to xorg-devel mailing list. I have received inquiries over the course of last several months thanking that the patch works locally, but at the same time the people who wrote to me wished and wondered when the patch would be in the official release distributed by many linux distributions. So do I. Any tips to get this patch incorporated into bug-fix releases quickly would be appreciated. TIA --- quote from a post to xorg-devel mailing list --- I posted a patch to fix the problem reported in the following bugzilla entry. Bug 39367 - Summary: The returning event of XNextEvent() has incorrect timestamp [1] https://bugs.freedesktop.org/show_bug.cgi?id=39367 The Bug 39367 was filed in July 2011, but my patch was posted in Dec 18, 2012. I subsequently posted the final patch to xorg-devel mailing list. It is recorded in the following: [2] http://patchwork.freedesktop.org/patch/12686/ I wonder if someone can enlighten me to make this patch incorporated into new bug-fix release of Xserver and its libraries. Shall I file a different bugzilla report since the Bug 39367 seemed to have been reviewed in 2011? Or can someone point me to the right maintainer of the affected modules/files? Background: I think the lapse between the original report and my follow-up (which I posted after I noticied the problem indepently of the original report and found a fix) did not help to make my follow-up patch report stand out as important. (The original report had this line in " URL: Whiteboard: 2011BRB_Reviewed " and I am afraid that this bug discussion fell through the floor. Obviously 2011 BRB whatever happened before my follow-up fix. Also, although I posted the patch to xorg-devel@lists.x.org mailing list the timing of the post near the holiday season toward the end of the year 2012 may not have helped. Over the last several months, I have received inquiries from several people: the patch helped them to solve the issues of non-working menu system in Mozilla Thunderbird, and Firefox (which is confused to no end due to incorrect timestamp generated by XIM module). They wondered when this patch makes it into the next revision of Xorg server. The patch as it stands fixes a major issue for users of thunderbird/firefox under linux who need to use "XIM input method" to type non-ASCII, non-latin characters such as Chinese, Japanese, Korean and others. The problem that plagued these users is serious. Simply put, the pull-down menus or context-menu (invoked by right-button) no longer works if the bogus timestamp (created by Xlib's XIM module, for example ) is recorded by some programs including thunderbird, firefox, etc. We have to kill the programs (or worse re-start Xserver.) With the patch, we can avoid the bogus timestamp from Xlib. Since the serious issue is solved by the patch, these people contacted me to find out when the official release, which would be incorporated into many linux distributions, will incorporate the patch. TIA Chiaki PS: I looked at the list of maintainers and could not find relevant contact if I am not mistaken. I suspect that nobody suspected this 15 years old (or older?) code had a dormant bug that plagues people today, and that is why there is no explicit maintainer for XIM module --- end quote --- Just FYI: there is a different thread at mozilla bugzilla about a bug which will be fixed by the patch posted here. https://bugzilla.mozilla.org/show_bug.cgi?id=779900 Many people got frustrated by this bug and so I wish that the patch is incorporated soon. TIA (In reply to comment #19) > Just FYI: there is a different thread at mozilla bugzilla about a bug which > will be fixed by the patch posted here. > > https://bugzilla.mozilla.org/show_bug.cgi?id=779900 > > Many people got frustrated by this bug and so I wish that the patch is > incorporated soon. > > TIA Just want to confirm that your patch is actually committed? http://cgit.freedesktop.org/xorg/lib/libX11/commit/?id=8f58e54 Maybe we should close this bug for now? (Even though it has been 2 years and I already don't have the environment to confirm your fix.) I guess because of this issue, most of people already moved from XIM to gtk/qt immodule. Sorry, forgot to close the bug when committing the patch. Closed now. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.