I'm able to "lock up" the Unity session by opening menus quickly by using a touchscreen. Seems as if there's a grab active. I can see the tooltips from launcher icons, interact with focused apps, but that's it. Can't reproduce with plain metacity, because the menus open so quickly with it, whereas with Unity on this hw the effects slow it down so that the race is hit. Tried several of the recent patches on top of 1.13, but they haven't helped. Now I see there are newer patches available. I'll give them a try. Filed this one for tracking this particular issue.
tried patches from 56558 and 55738, also "Sync TouchListener memory.." from Carlos Garnacho, didn't help.
I just repeatedly tap on the top-most icon (the one which has the Ubuntu logo) of Ubuntu's launcher on a touchscreen. Those taps alternately open and close the dash (a fullscreen window that shows icons for applications, media and other files). Eventually those taps stop having any effect, i.e. the launcher no longer gets ButtonPress and ButtonRelease events out of them.

I've added a wealth of logging (see xorg.log attachment) to try to understand what's happening on the server. From looking at it I could see the following:

From touches 2 to 26, the launcher is the first window in the list of listeners.
From touch 27 onwards, the root window is the first one.

The problem is that from touch 27 onwards, the xserver fails to pass the touch ownership down to the launcher window because there's always an older pointer-emulated touch (touch 26) lying around which it apparently can't get rid of (i.e. properly process).
Created attachment 70064 [details] log output of the "repeated tapping on ubuntu logo launcher icon" use case
Do cross-check with Bug 56557 as well, this can cause issues if any grabs are activated on the root window and I wonder if that influences the behaviour here
(In reply to comment #4)
> Do cross-check with Bug 56557 as well, this can cause issues if any grabs
> are activated on the root window and I wonder if that influences the
> behaviour here

Yes, they are at least closely related (most likely have the same cause) as a pointer-emulated touch gets "stuck" because of failed resource lookups in RetrieveTouchDeliveryData() as well.
Created attachment 70422 [details] log output of use case with patches from bug 56557 applied With the 4 patches mentioned in bug 56557 applied (comments 3 and 4), the bug (missing ButtonPress and ButtonRelease events) manifests itself already on the second tap on the touchscreen. Again, due to a failure in RetrieveTouchDeliveryData()
New set of patches, please try those on top of the current set you already tested. http://patchwork.freedesktop.org/patch/12519/ http://patchwork.freedesktop.org/patch/12520/ http://patchwork.freedesktop.org/patch/12521/ http://patchwork.freedesktop.org/patch/12522/
Created attachment 70656 [details] log output of use case with patches from Comment 7 also applied

This is the log output I get with this new set of patches (from Comment 7) applied on top of those mentioned in Comment 6. Again, the same problem.

The first tap on the icon with the ubuntu logo in the launcher (top left corner of the screen) works fine and displays the dash (a fullscreen window showing application icons, etc). The launcher now has an active pointer grab. Upon the second tap on the ubuntu icon, xserver fails to deliver events to that listener (the launcher's active pointer grab) because the corresponding RetrieveTouchDeliveryData() call fails.

A snippet from the log:
"""
(II) TouchBeginDDXTouch: ddx id 0, touch 2 - returning with emulate pointer == 1
[ 2859.473] (II) ProcessTouchEvent: TouchBegin, master pointer, touch 2
...
[ 2859.474] (II) RetrieveTouchDeliveryData: listener(window=launcher, listener=1105199104, type=pointer_grab, state=begin, level=core)
[ 2859.474] (II) dixLookupClient: failed! - rid & SERVER_BIT
[ 2859.474] (II) - Not delivering to listener 1105199104 because his delivery data couldn't be retrieved.
"""
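For anyone going through similar logs, here is a rough Python sketch that pairs each "Not delivering to listener" line with the listener description logged just before it. The line format is taken from the snippet above, which comes from custom debug logging and is not a standard Xorg log format; the function name and regexes are my own assumptions.

```python
import re

# Patterns follow the custom debug lines quoted in the comment above.
LISTENER_RE = re.compile(
    r"RetrieveTouchDeliveryData: listener\(window=(?P<window>[^,]+), "
    r"listener=(?P<listener>\d+), type=(?P<type>[^,]+)"
)
FAILED_RE = re.compile(r"Not delivering to listener (?P<listener>\d+)")


def failed_listeners(log_text):
    """Return (window, type, listener id) for every listener that was skipped."""
    seen = {}       # listener id -> (window, type) from the last lookup line
    failures = []
    for line in log_text.splitlines():
        m = LISTENER_RE.search(line)
        if m:
            seen[m.group("listener")] = (m.group("window"), m.group("type"))
            continue
        m = FAILED_RE.search(line)
        if m:
            window, kind = seen.get(m.group("listener"), ("?", "?"))
            failures.append((window, kind, m.group("listener")))
    return failures


SAMPLE = """\
[ 2859.474] (II) RetrieveTouchDeliveryData: listener(window=launcher, listener=1105199104, type=pointer_grab, state=begin, level=core)
[ 2859.474] (II) dixLookupClient: failed! - rid & SERVER_BIT
[ 2859.474] (II) - Not delivering to listener 1105199104 because his delivery data couldn't be retrieved.
"""

print(failed_listeners(SAMPLE))
```

Run against a full log, this makes it easy to see whether the skipped listener is always the launcher's pointer grab or whether other listeners are affected too.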
We are also experiencing this bug with other touch screen software, not Unity related. The underlying X problem seems to be identical. Has a solution been found?
Nope, the bug is still there. Rasterman reproduced it with E17 and commented on the downstream bug: https://bugs.launchpad.net/ubuntu-nexus7/+bug/1068994/comments/24
can you test this branch here please? http://cgit.freedesktop.org/~whot/xserver/log/?h=touch-grab-race-condition-56578 Last 5 commits (currently), starting with 2cd9c4f709f105b7a7faf31b8c10993d0949563c
unfortunately still able to reproduce it :/ I needed these commits on top of 1.13.2 to be able to compile with the new patches: cc79107a5b60d2926e16ddbee04149e8d5acc969 fe59774c55e5d423633405e0869c22f4ce382548 91ab237358c6e33da854914d3de493a9cbea7637 9ad0fdb135a1c336771aee1f6eab75a6ad874aff
You'll need all of http://cgit.freedesktop.org/~whot/xserver/log/?h=server-1.13-branch, at the least. I haven't tested this on 1.13.x at all, purely working from git master for now.
Sorry, to clarify: you need that 1.13 branch linked above AND the patches from Comment 11
Created attachment 74845 [details] evemu-record from the touchscreen attached the evemu dump from reproducing the bug by hitting the unity indicators quickly a couple of times. I'll try the more complete 1.13 build next.
Ok, analysis of the bug as follows.

To trigger this bug, we need the following client stack:
* touch client with a passive touch grab
* core client with a passive button grab in GrabModeSync
* optional: core client with button mask on window

The touch client must reject the touch.

As the touch grab activates, all events are sent to the touch client, and stored in the touch event history. When the client rejects, the events are replayed on the next client. The replayed TouchBegin will trigger the core passive grab, and switch the device's processInputProc to EnqueueEvent().

BUG 1: because touch event history replaying calls DeliverTouchEvents directly, EnqueueEvent is side-stepped and no events end up in the sync'd queue. Later, when the client calls XAllowEvents no events are there for syncing, ComputeFreezes() exits early and the emulated motion/release events are not sent to the client.

Fixing that is possible so that EnqueueEvent is honoured. Tricky though, because it will have a number of side-effects, see below.

BUG 2: because the TouchEnd never ends up in the history (by design) no release event ends up in the queue. So when replaying, the emulated button release is missing. Not sure yet how to fix this.

BUG 3: If there's the optional third client, its implicit passive grab currently does not get released. That's the easiest one to fix.

Side effects of the first bug:
If we use EnqueueEvent() for event history replaying, we will replay touch events into the sync buffer, but not actually process them. If there is at least one touch client below the client with the sync passive core grab, it cannot get touch events until the grabbing client calls XAllowEvents. If that touch client has the ownership mask set, that behaviour is against the protocol spec. Coincidentally, this bug already exists anyway, it's just gone unnoticed so far because touch clients appear to be generally above the normal clients.
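To make BUG 1 easier to picture, here is a toy model in Python. It is not server code: the Device class, replay_history() and the rule that a frozen device silently drops directly delivered events are simplifying assumptions that only mirror the description above (replay bypasses the sync queue, so nothing is left to flush on XAllowEvents).

```python
from collections import deque


class Device:
    """Toy stand-in for a device that a sync passive grab can freeze."""

    def __init__(self):
        self.frozen = False
        self.sync_queue = deque()
        self.delivered = []

    def process(self, event):
        # Plays the role of processInputProc: queue while frozen.
        if self.frozen:
            self.sync_queue.append(event)
        else:
            self.deliver(event)

    def deliver(self, event):
        # Simplification: a frozen grab passes nothing on to the client.
        if self.frozen:
            return
        self.delivered.append(event)
        if event == "TouchBegin":
            # The replayed begin activates the sync button grab.
            self.frozen = True

    def allow_events(self):
        # Thaw the device and flush whatever was queued meanwhile.
        self.frozen = False
        while self.sync_queue:
            self.deliver(self.sync_queue.popleft())


def replay_history(honour_freeze):
    """Replay a rejected touch's history, then let the client allow events."""
    device = Device()
    history = ["TouchBegin", "TouchUpdate", "TouchUpdate"]  # no TouchEnd, see BUG 2
    for event in history:
        if honour_freeze:
            device.process(event)   # goes through the queue when frozen
        else:
            device.deliver(event)   # side-steps the queue, as in BUG 1
    device.allow_events()
    return device.delivered


print(replay_history(honour_freeze=False))
print(replay_history(honour_freeze=True))
```

With honour_freeze=False only the begin reaches the client; with honour_freeze=True the updates arrive once events are allowed. The missing TouchEnd (BUG 2) is deliberately left out of the history in both cases.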
To be compliant with the touch specs, we need to wrap EnqueueEvent to still handle touch events for clients with the ownership mask even if the device is currently synced.
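A minimal sketch of that wrapping idea, again as a Python toy rather than actual server code. The listener dictionaries, the wants_ownership key and the enqueue_event() helper are hypothetical names used only for illustration: while the device is synced the event is still queued, but touch listeners with the ownership mask are returned so they can be served right away.

```python
from collections import deque


def enqueue_event(event, listeners, sync_queue):
    """Queue `event` for the synced device and return the names of the
    listeners that must still receive it immediately."""
    sync_queue.append(event)
    if not event["type"].startswith("Touch"):
        # Core/pointer events simply wait for the grabbing client.
        return []
    # Touch events still reach clients that selected for ownership.
    return [l["name"] for l in listeners if l["wants_ownership"]]


listeners = [
    {"name": "core-sync-grab", "wants_ownership": False},
    {"name": "touch-client-below", "wants_ownership": True},
]
queue = deque()

early_touch = enqueue_event({"type": "TouchUpdate"}, listeners, queue)
early_motion = enqueue_event({"type": "Motion"}, listeners, queue)
print(early_touch, early_motion, len(queue))
```

Both events stay in the queue for the grabbing client; only the touch event is additionally handed to the ownership-mask listener.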
Branch available for testing here. I think this fixes the issue but I've been unsuccessful getting this backported to a 1.13 ubuntu server. http://cgit.freedesktop.org/~whot/xserver/log/?h=touch-grab-race-condition-56578-v2 If you can test this, that'd be much appreciated.
Hi Peter,

I think your recent patches do fix the issue. I compiled your server and a fresh xinput evdev 2.7.3. I confirmed TouchBegin and TouchEnd were being sent with a brief xinput test-xi2 test.

# xdpyinfo |grep -E '(vendor|version)'
version number: 11.0
vendor string: The X.Org Foundation
vendor release number: 11399902
X.Org version: 1.13.99.902

My usual scenario to experience this problem is:

run Chrome
xwininfo [tap root screen, get window id of chrome window]
xev -id 0x.... [use window id of chrome window]
tap screen a few times to see xev notify events
ctrl-C
on screen, touch a UI button
the press activates the UI button
screen switches to new page   <-- ButtonRelease is dropped somewhere from here
the new UI button underlying where my finger just pressed is stuck down   ^--- to here

With these same testing steps above I cannot get a stuck button on your new xserver branch. It seems that the ButtonRelease event arrives correctly.
Thanks John, much appreciated.

First patchset, minor changes for preparation:
http://patchwork.freedesktop.org/patch/13193/
http://patchwork.freedesktop.org/patch/13194/
http://patchwork.freedesktop.org/patch/13195/
http://patchwork.freedesktop.org/patch/13196/
http://patchwork.freedesktop.org/patch/13197/
http://patchwork.freedesktop.org/patch/13198/
http://patchwork.freedesktop.org/patch/13199/

Second patchset, the actual meat:
http://patchwork.freedesktop.org/patch/13204/
http://patchwork.freedesktop.org/patch/13205/
http://patchwork.freedesktop.org/patch/13206/
http://patchwork.freedesktop.org/patch/13207/
http://patchwork.freedesktop.org/patch/13208/
http://patchwork.freedesktop.org/patch/13209/
http://patchwork.freedesktop.org/patch/13210/
http://patchwork.freedesktop.org/patch/13211/
http://patchwork.freedesktop.org/patch/13212/
http://patchwork.freedesktop.org/patch/13213/
http://patchwork.freedesktop.org/patch/13214/
http://patchwork.freedesktop.org/patch/13215/
Ok I've tested them as well by building 1.14rc minus the video abi changing stuff (and commits on top of them), and added the touch branch. This allowed me to test on the nexus7 & tegra3 blob.

Looks like it's much better now, although sometimes the touch appears to get somewhat hung but can recover from it later on. When this happens it also generates messages like

[ 5101.196] [Xi] Too many valuators reported for device 'Virtual core pointer'. Ignoring event.

in the logfile. The buffered actions prior to the hang are replayed after waiting for a while. At this stage it's quite easy to crash the server.
do you have a good backtrace for the crashes? random, or always the same spot? Is it regular in response to some interaction? can it be caused by the backports?
Created attachment 76003 [details] backtrace

Here's the backtrace, seems to be the same every time. Way to reproduce here:

1. open an app, so there's a window around
2. attach an external pointer device
3. tailf the X logfile
4. hit the panel indicators frantically with the touchscreen, until the touch input is locked
5. move the window with the other pointer device
6. see how some "[Xi] ..." messages appear on the logfile
7. repeat the steps until..
8. .. when the touch input is locked the logfile will get these Xi messages after every touch.. when this happens keep hitting the screen until it crashes, can take a couple of minutes :)

so, it's only after using the other pointer device for a grab when the touch input grab is released. Also, while in step 8 I noticed that the multitouch gestures of unity seemed to work, while the panel menus failed to react. Also, Onboard seemed to work as well. So, while locked I can drag a window with a three-touch gesture but not by a single touch drag from the titlebar.

Not sure what backports you mean, this is 1.14 with your branch, but ajax's video abi commits reverted so the blob (and thus unity) work.
Created attachment 76034 [details] valgrind spam that occurs when following tjaalton's instructions I can reproduce this on x1.14 with my macbook pro in the manner tjaalton described. It didn't need the video abi revert.
did you rebuild the drivers too? just wondering, because I used to get a similar crash on my backports but only when running against the system drivers, not against the upstream ones.
I still crashed even if I rebuilt the drivers against the patched xorg-server, so it's not that.
It seems that the ubuntu patches for synaptics trigger it, most likely not these:

02-do-not-use-synaptics-for-keyboards.patch - makes synaptics no longer match input.keyboard
101_resolution_detect_option.patch - Add resolutiondetect atom and config option, to add a way to disable autodetect
115_evdev_only.patch - uncomment 50-synaptics.conf
118_quell_error_msg.patch - only affects tools
124_syndaemon_events.patch - only affects syndaemon

But these change some things around:

103_enable_cornertapping.patch - sets RTCornerButton default to 2, and RBCornerButton default to 3
104_always_enable_tapping.patch - always sets up tap buttons in set_default_parameters
106_always_enable_vert_edge_scroll.patch - guess :-)
128_disable_three_click_action.patch
129_disable_three_touch_tap.patch - both disable 3 touch actions, to make three-touch gestures work

Presumably one of those default tweaks would cause it. I'll try to nail it down.
well I'm not using synaptics, so it's not the same crasher then?
crashes unpatched too, after all :) I guess I didn't hammer enough on the touchpad like a 3 year old
The crash is in xorg-server by the way, not in the driver, and seems to involve memory freed in xorg-server. It just seems more likely that it involves multitouch handling in xorg-server in general, and is not a bug in a specific driver. Either that or there are 2 different bugs in evdev and synaptics that both cause a similar backtrace in xorg-server, this somehow seems less likely to me. :)
can you bisect the server then? I honestly don't know where it triggers and given that it's 19 patches it'll be easier to bisect than figure it out otherwise. fwiw, I've pushed the rebased branch (only a few squashes and reshuffling), please make sure you pull first.
for reference, 1.13 server branch (at time of writing 1.13.3 release) crashes just as hard.
Thanks a lot for the hard work here. We see the same issue in Sugar, the UI is basically unusable with touch as we have a "global" grab. I tested the patches from comment 19 against xserver-1.14.0, they do solve the problem, and I cannot see any new issues introduced by them. I also tested to 1.13.3. In order to do that I first had to backport a few commits: * Update the MD's position when a touch event is received * Don't use GetTouchEvents when replaying events * Don't use GetTouchEvents in EmitTouchEnd Then I added the patches from comment #19, and things are now working equally well there.
Created attachment 77097 [details] backtrace the backport seems incomplete, since it's trivial to crash the server with unity by switching between opening the dash or indicator menus
The backported patches on 1.13.3 (from comment #32) have now been in OLPC's development builds for over a week and we haven't seen any adverse effects. I've also done some testing on 1.14.0. I can make this crash (with no backtrace) simply by going a bit crazy on the touchscreen for a few minutes, both before and after this patch series. A problem for another day. Based on this I would vote for going ahead with the merge of this patch series into master. I also found a related bug with both 1.13.3 and 1.14.0 (both before and after these patches), and posted a patch here: http://lists.x.org/archives/xorg-devel/2013-April/035878.html
first valgrind error is on

int emulate_pointer = ! !(ev->device_event.flags & TOUCH_POINTER_EMULATED);

So I guess it's safe to assume that ev is garbage.. Other writes seem to be related to ev too, judging from the valgrind output I guess random stuff gets overwritten.

Looking at SetTapState output:
0 -> 1
1 -> 10
moving state stuff
10 -> 2
2 -> 10
moving state stuff
10 -> 2

and then a few more 2 -> 10 and 10 -> 2 with moves until valgrind starts complaining and xserver starts crashing:

(II) SetTapState - 10 -> 2 (millis:3928387395)
==25788== Invalid read of size 4
==25788==    at 0x24236E: ProcessOtherEvent (exevents.c:1519)
==25788==    by 0x264CAE: ProcessPointerEvent (xkbAccessX.c:751)
==25788==    by 0x166641: PlayReleasedEvents (events.c:1217)
==25788==    by 0x16DED4: ComputeFreezes (events.c:1297)
==25788==    by 0x16E2E3: AllowSome (events.c:1725)
==25788==    by 0x16E495: ProcAllowEvents (events.c:1785)
==25788==    by 0x15DC45: Dispatch (dispatch.c:432)
==25788==    by 0x14C5B9: main (main.c:295)
==25788==  Address 0x122336b0 is 16 bytes before a block of size 152 free'd
==25788==    at 0x4C2BA6C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25788==    by 0x806D84A: sna_mode_wakeup (sna_display.c:3500)
==25788==    by 0x161F3B: WakeupHandler (dixutils.c:426)
==25788==    by 0x2AF6E3: WaitForSomething (WaitFor.c:224)
==25788==    by 0x15D9A0: Dispatch (dispatch.c:361)
==25788==    by 0x14C5B9: main (main.c:295)

This is with the patches from comment #19 + daniel drake's patch

Digging more, looking up the InternalEvent struct..

int emulate_pointer = ! !(ev->device_event.flags & TOUCH_POINTER_EMULATED);

Now this is a function that is looking verrrrrrrry suspicious for type == ET_TouchOwnership..

I think it would make sense to have ET_TouchOwnership handled directly by ProcessTouchOwnershipEvent, rather than through ProcessTouchEvent. Patch attached below..
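To illustrate why reading device_event.flags on an ownership event is suspicious, here is a small Python ctypes sketch of a union with two members. The struct layouts, type codes and the bit value are made up for the demonstration and do not match the server's real InternalEvent; the point is only that a field read through the wrong union member picks up unrelated bytes, and that dispatching on the type first avoids it.

```python
import ctypes

# Hypothetical type codes and flag bit, for illustration only.
ET_TOUCH_BEGIN = 1
ET_TOUCH_OWNERSHIP = 2
TOUCH_POINTER_EMULATED = 1 << 5


class DeviceEvent(ctypes.Structure):
    _fields_ = [("type", ctypes.c_int32),
                ("flags", ctypes.c_uint32),
                ("detail", ctypes.c_uint32)]


class TouchOwnershipEvent(ctypes.Structure):
    # Same leading "type", but the second slot holds a resource id.
    _fields_ = [("type", ctypes.c_int32),
                ("resource", ctypes.c_uint32),
                ("flags", ctypes.c_uint32)]


class InternalEvent(ctypes.Union):
    _fields_ = [("device_event", DeviceEvent),
                ("touch_ownership_event", TouchOwnershipEvent)]


def emulate_pointer_unchecked(ev):
    # Reads flags through device_event no matter what the event really is.
    return bool(ev.device_event.flags & TOUCH_POINTER_EMULATED)


def emulate_pointer_checked(ev):
    # Ownership events are handled separately and never emulate a pointer.
    if ev.device_event.type == ET_TOUCH_OWNERSHIP:
        return False
    return bool(ev.device_event.flags & TOUCH_POINTER_EMULATED)


ev = InternalEvent()
ev.touch_ownership_event.type = ET_TOUCH_OWNERSHIP
ev.touch_ownership_event.resource = 0x400021  # happens to have bit 5 set
ev.touch_ownership_event.flags = 0

print(emulate_pointer_unchecked(ev), emulate_pointer_checked(ev))
```

In this toy layout the bogus read lands on the resource id; in the valgrind trace above it lands in freed memory instead, but the cause is the same kind of type confusion.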
Created attachment 77621 [details] [review] Call ProcessTouchOwnershipEvent directly
Valgrind came up with this complaint on 1.13.3 with the backported patches:

==15921== Invalid read of size 4
==15921==    at 0x1D0A00: DeliverTouchEvents (exevents.c:1297)
==15921==    by 0x1D2589: ProcessOtherEvent (exevents.c:1611)
==15921==    by 0x1567C1: TouchEventHistoryReplay (touch.c:491)
==15921==    by 0x1D0EBB: TouchPuntToNextOwner (exevents.c:1120)
==15921==    by 0x1D11EB: TouchRejected (exevents.c:1196)
==15921==    by 0x1D28B5: ProcessOtherEvent (exevents.c:1223)
==15921==    by 0x1E7DAB: ProcessPointerEvent (xkbAccessX.c:751)
==15921==    by 0x204DC5: mieqProcessDeviceEvent (mieq.c:556)
==15921==    by 0x1570A7: TouchListenerAcceptReject (touch.c:1013)
==15921==    by 0x1D6AD3: ProcXIAllowEvents (xiallowev.c:128)
==15921==    by 0x1D2BD5: ProcIDispatch (extinit.c:406)
==15921==    by 0x13CC0D: Dispatch (dispatch.c:428)
==15921==  Address 0xc6a0bac is 4 bytes inside a block of size 68 free'd
==15921==    at 0x482E5B0: free (vg_replace_malloc.c:446)
==15921==    by 0x14C129: DeletePassiveGrab (grabs.c:336)
==15921==    by 0x1527FD: doFreeResource (resource.c:873)
==15921==    by 0x152F7F: FreeResource (resource.c:903)
==15921==    by 0x14C49F: DeletePassiveGrabFromList (grabs.c:686)
==15921==    by 0x144A7D: ProcUngrabButton (events.c:5640)
==15921==    by 0x13CC0D: Dispatch (dispatch.c:428)
==15921==    by 0x132035: main (main.c:298)

I picked the fixes from 57301 to 1.13 too: Xi: fix touch event selction conflicts (#57301), and the commit before that to make it apply.

This brings 1.13 dix and Xi to the 1.14 equivalent minus pointer barriers, as far as I can tell, but then I was getting the following segfault:

==1748== Invalid read of size 4
==1748==    at 0x4831DCC: memcpy (mc_replace_strmem.c:878)
==1748==    by 0x156959: TouchConvertToPointerEvent (touch.c:637)
==1748==    by 0x1D0FA3: DeliverTouchEmulatedEvent.isra.0.part.1 (exevents.c:1375)
==1748==    by 0x1D0C5F: DeliverTouchEvents (exevents.c:1920)
==1748==    by 0x1D25B1: ProcessOtherEvent (exevents.c:1611)
==1748==    by 0x1E7E03: ProcessPointerEvent (xkbAccessX.c:751)
==1748==    by 0x1423B1: PlayReleasedEvents (events.c:1214)
==1748==    by 0x146D13: ComputeFreezes (events.c:1294)
==1748==    by 0x146F6B: AllowSome (events.c:1722)
==1748==    by 0x1470BF: ProcAllowEvents (events.c:1785)
==1748==    by 0x13CC0D: Dispatch (dispatch.c:428)
==1748==    by 0x132035: main (main.c:298)
==1748==  Address 0xcaa1284 is 156 bytes inside a block of size 280,000 free'd
==1748==    at 0x482E5B0: free (vg_replace_malloc.c:446)
==1748==    by 0x1570E3: TouchListenerAcceptReject (touch.c:1015)
==1748==    by 0x146D6F: ComputeFreezes (events.c:1282)
==1748==    by 0x146F6B: AllowSome (events.c:1722)
==1748==    by 0x1470BF: ProcAllowEvents (events.c:1785)
==1748==    by 0x13CC0D: Dispatch (dispatch.c:428)
==1748==    by 0x132035: main (main.c:298)
Note: macbook pro (synaptics) seems to work just fine with the 1.13.3 backports, so it looks like it's a separate bug due to different behavior on a true touch device. The valgrind backtraces were on arm/tegra, which also enables a software keyboard. The easiest way to crash on ubuntu's xserver on the tegra is by making sure valgrind is running with --free-fill=fe so the freed memory is always reset to an invalid value.
Created attachment 78124 [details] [review] touch-fix.patch

I have partial (full?) success (on the Lenovo Thinkpad Twist, an Intel-based convertible, see also https://launchpad.net/bugs/1068994 and https://launchpad.net/bugs/1015183):

I have rebuilt the current Ubuntu Raring package of xorg-server (1.13.3-0ubuntu5) with the following two patches:

1. http://cgit.freedesktop.org/~whot/xserver/commit/?h=touch-grab-race-condition-56578-v2&id=0498a4f0e0b90a850df7022a3356f10adabff855
   (found via comment #17)
2. http://lists.x.org/archives/xorg-devel/2013-April/035878.html

and after that clicking via touch screen on the Lenovo Thinkpad Twist works reliably.

Only remaining (minor) problems are (but the touch click ability does not get lost by them):

a. In Chromium when you create a new tab, the new tab contains icons for web apps (at least the app store and perhaps some examples). These icons cannot be clicked by touch, only with a mouse. All the rest in Chromium is clickable by touch.
b. Touch clicks do not work in XBMC, but after using and leaving XBMC with an external mouse on the normal desktop touch-clicking works again.

These are probably separate bugs which got revealed by the now working touch click.

Complete patch for xorg-server is attached.
Created attachment 78125 [details] [review] touch-fix.patch Sorry, patch is not complete. Here is the correct one.
ok, thanks to Maarten's debugging we've found the issue. listener->grab is not copied but rather referenced, leaving the grab stale once it was deleted. Reproducible test case is simply:

XGrabButton()
pointer-emulating touch down
XUngrabButton()
trigger touch update/end

This doesn't necessarily crash, but once you run through valgrind to reset memory after freeing it we have a reliable crasher.
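The reference-versus-copy problem can be sketched in a few lines of Python. This is a toy model and not the server's data structures: the Grab class, free_grab() and touch_sequence() are hypothetical, and "freeing" is imitated by overwriting the fields with a poison value, similar in spirit to running valgrind with --free-fill=fe as mentioned earlier in this bug.

```python
import copy

POISON = 0xFEFEFEFE  # imitates memory filled with 0xfe after free


class Grab:
    def __init__(self, window, button):
        self.window = window
        self.button = button


def free_grab(grab):
    """Imitate free(): anyone still holding this object now sees poison."""
    grab.window = POISON
    grab.button = POISON


def touch_sequence(copy_grab):
    """Walk through the four steps of the test case above."""
    grab = Grab(window=0x1400003, button=1)             # XGrabButton()
    # Pointer-emulating touch down: the listener remembers the grab,
    # either by reference (the bug) or as its own copy.
    listener_grab = copy.copy(grab) if copy_grab else grab
    free_grab(grab)                                     # XUngrabButton()
    return listener_grab.window                         # touch update/end


print(hex(touch_sequence(copy_grab=False)))
print(hex(touch_sequence(copy_grab=True)))
```

With the reference, the touch update reads the poisoned window; with a copy, the listener still sees the window it was set up with.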
I have built xorg-server with my patch also on the Nexus 7 (armhf) now and it works perfectly there with the desktop and all applications, too, and on the Nexus 7 XBMC and Chromium's web apps work with touch. It also seems to fix the Nexus 7.
ok, I'll be honest. this is a giant mess where we potentially access dangling pointers and sorting this out is nasty. my attempts to do so today have failed badly. fix will come, but not too soon I'm afraid
Yes, I can see how time consuming this must be. Thanks for continuing to work on it, at OLPC we can promise you some testing once code is ready.

In the meantime I will add the latest 2 patches to our development builds for further testing:

Xi: Do not handle ET_TouchOwnership in ProcessTouchEvent
dix: copy event in TouchConvertToPointerEvent correctly
Please have a test of this branch here: http://cgit.freedesktop.org/~whot/xserver/log/?h=touch-grab-race-condition-56578-v2 I'm not 100% sure yet if there's a memleak introduced - haven't done the required checks yet. but it fixes the crasher caused by the invalid memory dereference.
Peter, I have tested your new branch on the Lenovo Thinkpad Twist now. I do not get any crashes and left clicking by tapping is absolutely reliable for me. Right-clicking via onboard does not work for me though. If I activate the right-click mode and tap, the tap is interpreted as left click (right-click mode ignored). At least I do not get a stuck-left-button effect from the right click. I do not get any crash nor a stuck-button effect at all, regardless of what I am doing. What is missing now is a fix for the right click.
Thanks for continuing to work on this. I believe the touch-grab-race-condition-56578-v2 patch series so far creates a problem with mouse input. In Sugar's Paint application, I can't paint anything by moving the mouse around with the button held down. Running xev, I can see that clicking and holding the mouse button doesn't actually trigger any events. Only when I release, ButtonPress and ButtonRelease appear in quick succession. If nobody beats me to it, I'll bisect this later this week. Also, the above test was done on xserver-1.13.3, I should also test on a newer version to make sure there aren't any other factors at play.
Daniel, Peter, I am using the full GIT branch touch-grab-race-condition-56578-v2 which is 1.14 and here I have no problem with Sugar's Paint application (rgbPaint, am I right?). I can paint both with an external Bluetooth mouse with the left button held down and with my finger on the touch screen of the Lenovo Thinkpad Twist.
Thanks for testing. Sugar's paint app is http://activities.sugarlabs.org/en-US/sugar/addon/4082 It is probably more meaningful to do the xev test though. Click the mouse button and hold, you would expect a ButtonPress event to show immediately, but it doesn't. And do that under sugar, in case the global touch grabs are affecting things.
Daniel, on my 1.14 I do not see any problem, also when testing with xev. Both with the external mouse and my finger on the touch screen I see ButtonPress events when I press and hold the mouse button or when I put my finger onto the screen and I get ButtonRelease events when I release the mouse button or take my finger from the screen. This works all correctly for me.
Till, can you run this under valgrind please to make sure I didn't introduce any memory leaks?
Peter, how do I run the xorg server under Valgrind? I have a Ubuntu Raring system.
Another touch problem: If I run Chromium browser and try to drag and drop one of the tabs using the touch screen, the left button gets stuck down and it does not get even unstuck if I continue working with the external Bluetooth mouse. I can only kill the session. It also happens sometimes that X crashes but without any message in /var/log/syslog.
Created attachment 78472 [details] /etc/X11/X-valgrind For valgrinding xserver you want to install the xserver-xorg-core-dbg package from the binary you generated, and also install xserver-xorg-input*dbg and xserver-xorg-video*dbg and valgrind I enabled auto valgrinding by creating /etc/X11/X-valgrind with the contents of this adjustment, make the file executable and then point the /etc/X11/X symlink to it. It will append the log to /var/log/Xorg-valgrind.HOSTNAME, so if xserver crashes you'll get detailed information why. :-)
Also with 1.14 XBMC behaves as in comment #39, not reacting to touch clicks. Looking more deeply into XBMC's behavior, the mouse cursor is put into the lower right corner of the screen when touch-clicking an arbitrary place, perhaps all touch clicks are registered with the coordinates of the lower right corner.
I have set up running X under Valgrind now. I have installed

xserver-xorg-core-dbg valgrind xserver-xorg-video-intel-dbg xserver-xorg-video-modesetting-dbg xserver-xorg-input-evdev-dbg xserver-xorg-input-synaptics-dbg libdrm2-dbg libdrm-intel1-dbg

Then I have installed Maarten's script, made it executable, and linked it. After that I have restarted X via

sudo restart lightdm

X is much slower now, probably due to Valgrind's work.
First observation under Valgrind: onboard pops up when touch-clicking an input field, but onboard is non-functional. Regardless of whether I touch-click the keys or use my external mouse, the keys do not react. No change of the key's color, no character appearing in the input field. Also right-clicking does not work as one cannot operate the right-click button.
Created attachment 78475 [details] Xorg-valgrind.till-twist My Valgrind log as of now.
Installed libunwind8-dbg to improve Valgrind log, then restarted lightdm, logged in, and now onboard works.
Created attachment 78476 [details] Xorg-valgrind.till-twist Update of Valgrind log.
I have more experience with the onboard-aided right click (same running under Valgrind or without Valgrind): Touch-clicking the right-click key on onboard makes it turn grey. After that doing one touch click on the desktop background does nothing. A second touch click on the background makes the right-click menu open and onboard disappear. Right-clicking in Chromium does not work. The second click only makes onboard disappear but does not pop up the right-click menu of Chromium.
Same with the double-click emulation button of onboard: It also executes the double-click only on the second touch click (tested with Nautilus).
Created attachment 78483 [details] Xorg-valgrind.till-twist.gz

Finally I succeeded in making X crash again. I opened several programs (Firefox, Chromium, Thunderbird, Calculator, digikam), did some clicks in them, and closed them again. Then I opened LibreOffice Writer via the Launcher and got a window asking to recover a previous document which was not correctly closed. I rejected and when I answered the question whether I really want to reject with "Yes", X crashed. Valgrind log attached.
Created attachment 78484 [details] Xorg-valgrind.till-twist.gz With LibreOffice Writer I can reproduce the crash reliably. Right after login I touch-click its icon in the Launcher, get the dialog to recover the document of the previous session, I reject, and as soon as I click "Yes" to confirm, X crashes, and X crashes fast enough so that LibreOffice does not clean up the document which I have rejected. In the next session I will get asked again. If you cannot reproduce the crash as you do not have a broken document, try starting a new document and then "kill -9" LibreOffice. On the next session it should ask you for recovering your document.
Note: In the last two comments (and also in my other tests), I did all operations by touch clicking (if not otherwise stated).
Created attachment 78485 [details] Xorg-valgrind.till-twist.gz X crashes as well if I do the described steps with LibreOffice using my external Bluetooth mouse for all clicks and not the touch screen. Valgrind log attached.
(In reply to comment #47)
> I believe the touch-grab-race-condition-56578-v2 patch series so far creates
> a problem with mouse input. In Sugar's Paint application, I can't paint
> anything by moving the mouse around with the button held down.
>
> Running xev, I can see that clicking and holding the mouse button doesn't
> actually trigger any events. Only when I release, ButtonPress and
> ButtonRelease appear in quick succession.

I have reproduced this by checking out the git branch in question and building it directly, so it was not a side effect of my earlier attempt (above) where I had backported this to 1.13.3.

The problem can be reproduced very easily:

xinit /usr/bin/xev
(running over ssh from another machine, to be able to see stdout)

Move the mouse cursor to the top left (where the xev window is). Click and hold the mouse button, and keep holding. No output from xev. Now release the mouse button, ButtonPress and ButtonRelease arrive at the same time. No touch input is needed to see this problem.

A few churns of "git bisect" later I have tracked this down to:

3e1515898545b0ed9e1f0794800c07061c8c8039 is the first bad commit
commit 3e1515898545b0ed9e1f0794800c07061c8c8039
Author: Peter Hutterer <peter.hutterer@who-t.net>
Date:   Thu Apr 18 10:32:11 2013 +1000

    dix: drop DeviceIntRec's activeGrab struct
Created attachment 78643 [details] Xorg-valgrind.till-twist.gz Another crash, this time I was visiting http://www.tagesspiegel.de/ with the Chrome browser. As usual, Valgrind log attached.
Created attachment 78644 [details] Xorg-valgrind.till-twist.gz Another crash: Still visiting http://m.tagesspiegel.org/, watching one of the videos, tried to maximize the Chrome window -> X crashed. Valgrind log attached again.
the libreoffice hint helped a lot tracking this down. New branch posted (top commit b8a2de82e36dd922843618f15703113dd556b164 dix: fix cursor refcounting ). Please give this a test. looks like my test box here is happy and valgrind doesn't see any leaks (yet)
Created attachment 78801 [details] nexus valgrind log for latest attempt

Still a bit buggy. On the nexus7 I can still cause it to drop events in the same way. What I do is touch the ubuntu dash icon in upper left, then release finger and make a dragging motion with the dash icon. I'm not 100% sure if the touch was fully released, or it just stopped registering my finger. But this (still) results in the following spam from xserver:

[Xi] Virtual core pointer: Failed to get event 8 for touchpoint 1
[Xi] Virtual core pointer: Failed to get event 8 for touchpoint 1
[Xi] Virtual core pointer: Failed to get event 8 for touchpoint 2
source device 7: history size 100 overflowing for touch 12
(history size overflowing repeated a lot, for touch 12 and 13)

Stopping lightdm doesn't crash any more and shows no leak. Only thing that may or may not be relevant is a still reachable warning:

==3663== 16,384 bytes in 4 blocks are still reachable in loss record 245 of 246
==3663==    at 0x482D4B8: calloc (vg_replace_malloc.c:593)
==3663==    by 0x216F23: WriteToClient (io.c:1017)
==3663==    by 0x142667: WriteEventsToClient (events.c:5982)
==3663==    by 0x142747: TryClientEvents (events.c:1968)
==3663==    by 0x144905: DeliverEventToInputClients (events.c:2116)
==3663==    by 0x144A99: DeliverEventsToWindow (events.c:2151)
==3663==    by 0x144D51: ProcSendEvent (events.c:5411)
==3663==    by 0x13B9D5: Dispatch (dispatch.c:432)
==3663==    by 0x130D2F: main (main.c:295)

Full log for the session is attached as vg.nexus
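When the server log fills with this kind of spam, tallying the messages per touchpoint can make it easier to see which touches are affected. A small Python sketch, using only the lines quoted in this comment as sample input; the regular expression is an assumption based on that message format and may need adjusting for other server versions:

```python
import re
from collections import Counter

# Sample input: the three messages quoted in the comment above.
log = """\
[Xi] Virtual core pointer: Failed to get event 8 for touchpoint 1
[Xi] Virtual core pointer: Failed to get event 8 for touchpoint 1
[Xi] Virtual core pointer: Failed to get event 8 for touchpoint 2
"""

# Assumed message format: "Failed to get event <type> for touchpoint <id>".
pattern = re.compile(r"Failed to get event (\d+) for touchpoint (\d+)")

# Count failures per touchpoint id (kept as strings, as they appear in the log).
failures = Counter(match.group(2) for match in pattern.finditer(log))

for touchpoint, count in sorted(failures.items()):
    print(f"touchpoint {touchpoint}: {count} failed lookups")
```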
Peter, I have tried your new snapshot (comment #70) and so far I did not get crashes. Touch operation without right-clicking works well for me now. The right-click emulation via Onboard is still broken, though.
(In reply to comment #70)
> the libreoffice hint helped a lot tracking this down. New branch posted (top
> commit b8a2de82e36dd922843618f15703113dd556b164 dix: fix cursor refcounting
> ). Please give this a test. looks like my test box here is happy and
> valgrind doesn't see any leaks (yet)

I would like OLPC to help with this testing, but the xev problem in comment #67 is getting in our way. Have you had a chance to investigate this yet?
daniel - xev behaves normally for me in the last branch. is it still misbehaving for you?
Yep, reproduced with HEAD b8a2de82e3, bisection identifies the first bad commit as 3e15158985.
tried to bisect this, but I can't see any difference in the xev output before or after that commit. Tested several revisions after (and 3e15158985) and xev works as expected. fwiw, my test box here is Ubuntu 12.10 with the server branch above, rest as-is. mouse used is a trackpoint, which for all purposes looks like a mouse. test case was xinit /usr/bin/xev -- /opt/xorg/bin/Xorg -retro, then clicking+dragging into the xev window. events as expected.
> ==3663== 16,384 bytes in 4 blocks are still reachable in loss record 245 of
> 246
> ==3663==    at 0x482D4B8: calloc (vg_replace_malloc.c:593)
> ==3663==    by 0x216F23: WriteToClient (io.c:1017)
> ==3663==    by 0x142667: WriteEventsToClient (events.c:5982)
> ==3663==    by 0x142747: TryClientEvents (events.c:1968)
> ==3663==    by 0x144905: DeliverEventToInputClients (events.c:2116)
> ==3663==    by 0x144A99: DeliverEventsToWindow (events.c:2151)
> ==3663==    by 0x144D51: ProcSendEvent (events.c:5411)
> ==3663==    by 0x13B9D5: Dispatch (dispatch.c:432)
> ==3663==    by 0x130D2F: main (main.c:295)

This appears to be present in 1.14.0, not introduced by this series. I raise a white flag on the other issue though; like the bug Daniel sees, I cannot reproduce it here.
(In reply to comment #76)
> tried to bisect this, but I can't see any difference in the xev output
> before or after that commit. Tested several revisions after (and 3e15158985)
> and xev works as expected.

Thanks for testing - I have now looked closer. The patch removes a field from struct _GrabInfoRec. That is an ABI change; what does it affect? It does seem to break stuff outside of the xserver according to my initial test. If I re-add the field, even though it is now unused, xev works again.
oh, right. sorry, I forgot to mention this - it is indeed an ABI break so you have to recompile the drivers (or add the now-unused field back in). Maarten, this could also be the reason for your bug?
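To illustrate why dropping a struct field breaks drivers that were not rebuilt, here is a rough Python sketch using `ctypes` to model the memory layouts. The three structures and their field names are simplified, hypothetical stand-ins and not the real `GrabInfoRec` definition; the point is only that a field located after the removed one moves to a different offset, so a driver compiled against the old layout reads the wrong bytes.

```python
import ctypes


class GrabInfoOld(ctypes.Structure):
    # Hypothetical layout the driver was compiled against.
    _fields_ = [
        ("from_passive", ctypes.c_int),
        ("active_grab", ctypes.c_int * 4),  # stand-in for the removed field
        ("implicit_grab", ctypes.c_int),
    ]


class GrabInfoNew(ctypes.Structure):
    # Hypothetical layout after the field is dropped from the server.
    _fields_ = [
        ("from_passive", ctypes.c_int),
        ("implicit_grab", ctypes.c_int),
    ]


class GrabInfoPadded(ctypes.Structure):
    # Hypothetical ABI-preserving variant: the unused field stays as padding.
    _fields_ = [
        ("from_passive", ctypes.c_int),
        ("unused", ctypes.c_int * 4),
        ("implicit_grab", ctypes.c_int),
    ]


# The server fills in the new layout; a stale driver reads it as the old one.
server = GrabInfoNew(from_passive=1, implicit_grab=7)
raw = bytes(server).ljust(ctypes.sizeof(GrabInfoOld), b"\0")
driver_view = GrabInfoOld.from_buffer_copy(raw)

# With the padding kept, the stale driver finds the field where it expects it.
server_fixed = GrabInfoPadded(from_passive=1, implicit_grab=7)
padded_view = GrabInfoOld.from_buffer_copy(bytes(server_fixed))

print("old offset:", GrabInfoOld.implicit_grab.offset)
print("new offset:", GrabInfoNew.implicit_grab.offset)
print("stale driver reads:", driver_view.implicit_grab)
print("with padding kept:", padded_view.implicit_grab)
```

This mirrors the two remedies mentioned in this comment: recompile the drivers against the new layout, or keep the now-unused field so the offsets stay where old binaries expect them.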
Pushed the branch with a fix to keep the ABI, please test de12ce91d8e44ab9398e730b457e5abc8d1acbe6
I built it, and changing between dash and indicators soon hangs with this on the log:

[ 3110.957] (EE) BUG: triggered 'if (!pGrab)'
[ 3110.957] (EE) BUG: ../../dix/grabs.c:258 in FreeGrab()
[ 3110.957] (EE)
[ 3110.957] (EE) Backtrace:
[ 3110.957] (EE)

gdb doesn't give anything, just the usual WaitForSomething etc
(In reply to comment #80)
> Pushed the branch with a fix to keep the ABI, please test
> de12ce91d8e44ab9398e730b457e5abc8d1acbe6

Built this and can't see any problems after a quick test. I'll ship this in upcoming OLPC development builds for wider testing.
I have a lenovo S10-3t with full keyboard, synaptics touchpad and cando 2 touch screen that I'd like to try this on. I have ubuntu 13.04 on it. What are the git commands to access de12ce91d8e44ab9398e730b457e5abc8d1acbe6 and does it just replace the xserver-xorg or do I have to rebuild the other xorg parts too?
Sorry, I found the files on the pages referenced above, so don't need any reply.
Is make check failing for anyone else with v3?

(EE) test device: not enough space for touch events (max 5 touchpoints). Dropping this event.
(EE) test device: not enough space for touch events (max 5 touchpoints). Dropping this event.
(EE) test device: not enough space for touch events (max 5 touchpoints). Dropping this event.
/bin/bash: line 5: 26164 Segmentation fault MALLOC_PERTURB_=15 ${dir}$tst
FAIL: touch

Program received signal SIGSEGV, Segmentation fault.
TouchInitTouchPoint (t=t@entry=0x4196e950, v=0x0, index=index@entry=0) at ../../dix/touch.c:243
243 ti->valuators = valuator_mask_new(v->numAxes);
I still need help compiling the test branch of xserver on ubuntu 13.04. If I try to compile it, detailed here http://www.x.org/wiki/CompileXserverManually it fails with complaints of wrong versions of x11proto. But, I have verified that the correct packages are actually installed on my system. So, I tried using jhbuild which builds everything in your home directory, details here http://www.x.org/wiki/JhBuildInstructions But, when I launch the jhbuild version, it crashes because it doesn't include my synaptics touchpad or cando touch screen. It won't run without input devices. So, can you provide some insight as to how I can build and test this xserver? Since some of you are using ubuntu, perhaps more specific instructions would work for me. TIA
(In reply to comment #85)
> Is make check failing for anyone else with v3?

caused by a patch merged into master (and thus picked up on v3), fix is here: http://patchwork.freedesktop.org/patch/13687/
paul:

add-apt-repository ppa:canonical-x/x-staging
apt-get update
apt-get dist-upgrade
apt-get build-dep xorg-server

will get you 1.14 + necessary build dependencies. Copy the debian directory from xserver 1.14, and comment out each patch that fails to apply in debian/patches/series
(In reply to comment #88)
Hello. I would like to provide testing for this bug if possible but I'm not exactly clued up on compiling xorg-server from scratch. I figure it could be useful to have a non-standard (i.e. not a laptop or tablet device) low-end hardware test case but if it's unlikely to be useful then please let me know.

One thing I've noticed is that once this bug has triggered (rendering most GTK applications and the unity dash unusable), Nautilus continues to function normally with the touch screen. Can anyone else confirm this on a standard Ubuntu 13.04 installation?

Anyway, I can see the branch you're talking about and can clone the git repository no problem.

> Copy the debian directory from xserver 1.14, and comment out each patch that fails to apply in debian/patches/series

I'm not certain which directory / patches you're referring to here, could you point me in the right direction?

I can duplicate this bug every time with a custom application which uses a GtkToolPalette. It appears to trigger every time I tap a category which produces a smooth roll-out animation - the hardware is pretty low end so I suppose this additional load triggers a race condition? I can trigger the bug in other normal uses but this one is guaranteed every time.
grab http://people.canonical.com/~mlankhorst/xorg-server_1.14.1.orig.tar.gz and xorg-server_1.14.1-0ubuntu0.3+1.15rc1+touch.diff.gz
(In reply to comment #90)
Thank you, Maarten. I can patch and compile that copy but for some reason I'm getting a compilation error with the de12ce91d8e44ab9398e730b457e5abc8d1acbe6 branch in /dix/window.c line 421-425:

> REGION_INIT(pScreen, &pWin->clipList, &box, 1);
> REGION_INIT(pScreen, &pWin->winSize, &box, 1);
> REGION_INIT(pScreen, &pWin->borderSize, &box, 1);
> REGION_INIT(pScreen, &pWin->borderClip, &box, 1);
> window.c:421:5: error: the comparison will always evaluate as ‘true’ for the
> address of ‘box’ will never be NULL [-Werror=address]

Any ideas?
sorry guys, please take the compilation errors to the list. This bug is confusing enough with >90 comments and I'd like to keep off-topic stuff to a minimum. pushed a new version of the branch after fixing a cursor refcounting issue that crashed my server when dragging an email in thunderbird. new branch tip is 9a5ad65330693b3273972b63d10f2907d9ab954a. This one also includes the fix Daniel wrote originally to avoid stuck buttons (http://lists.x.org/archives/xorg-devel/2013-April/035878.html)
That fixed up the background corruptions and hangs on armhf/nexus 7, but I'm still seeing a stuck mouse button, and [ 77305.765] [Xi] Virtual core pointer: Failed to get event 8 for touchpoint 1.
nm, bg is still corrupt when running in valgrind :(
fwiw, the latest branch got merged into master. It's still buggy but an improvement over the previous state.

commit c76a1b343d6a56aa9529e87f0eda8d61355d562b
Merge: 891123c 9a5ad65
Author: Keith Packard <keithp@keithp.com>
Date: Thu May 23 19:58:36 2013 -0600

    Merge remote-tracking branch 'whot/touch-grab-race-condition-56578-v3'
Thanks for all your work on this. At OLPC we've been testing the branch but have been a couple of commits behind the tip. Anyway, I think it's still worth contributing the test result: no problems seen.
I would like to be any help I can with this bug fix. I am able to test on an 18.5" Winmate M185D as well as a 10" Winmate device (W10ID3S-PCH1). I am currently running Unity 13.04 and can make any necessary changes to the system. Please let me know what I can do to test and how to do it. I feel a bit over my head, but am willing to learn in order to be helpful.
Wondering if anything has been happening in a while...
Peter fixed a load of stuff and it got merged in xserver master. Unfortunately there have not been any development releases of xserver master since that happened, but that will come in time. If you are still seeing problems, and are definitely using xserver master, then I suggest explaining your problem here (if you are sure that you are seeing the same issue), or opening a new bug report (if it seems like your issue might be unrelated).
The fix for bug #66720 looks relevant, commit 8eeaa74bc241acb41f1d upstream, it seems something broke for me though, so I can't test it right now.
Nope, and I noticed a BUG on !pGrab in FreeGrab, I'll try it a bit more on monday.
(In reply to comment #99)
> Peter fixed a load of stuff and it got merged in xserver master.
> Unfortunately there have not been any development releases of xserver master
> since that happened, but that will come in time.
>
> If you are still seeing problems, and are definitely using xserver master,
> then I suggest explaining your problem here (if you are sure that you are
> seeing the same issue), or opening a new bug report (if it seems like your
> issue might be unrelated).

Thanks for the summary! I was just wondering. Thanks!
(In reply to comment #101)
> Nope, and I noticed a BUG on !pGrab in FreeGrab, I'll try it a bit more on
> monday.

merged as 0e3be0b25fcfeff386bad132526352c2e45f1932 yesterday. as for the rest, I really need something that's reproducible.
I think the changes to onboard to use xinput2 directly may have fixed the remaining issue I was having. When I checked out onboard from trunk and used it on my nexus7 things worked, and nothing got stuck.
Maarten, I have checked with the new onboard ("bzr branch lp:onboard") now on an up-to-date Saucy with xserver packages from the x-staging PPA and I do not get a stuck-mouse-button effect any more.
I did further testing over a longer time and saw no stuck button. It seems that with the current X from the x-staging PPA and the current Onboard from the onboard PPA the problem is solved.
Thanks for testing. I'm going to close this one as fixed since we definitely fixed quite a few bugs in this patch set. If there's something left please file a new bug so we can narrow down the new (old? :) issues.