| Field | Value |
|---|---|
| Summary | Mouse buttons sometimes stop responding when moving from one xinerama screen to another |
| Product | xorg |
| Component | Server/General |
| Status | RESOLVED FIXED |
| Severity | critical |
| Priority | highest |
| Version | 7.4 (2008.09) |
| Hardware | Other |
| OS | All |
| Reporter | Henk Spaaij <zaai> |
| Assignee | Peter Hutterer <peter.hutterer> |
| QA Contact | Xorg Project Team <xorg-team> |
| CC | davey, jebusthesaviour, remi, thomas.jarosch, tobias |

Description
Henk Spaaij
2008-11-22 18:50:15 UTC
On Sat, Nov 22, 2008 at 06:50:16PM -0800, bugzilla-daemon@freedesktop.org wrote:
> Anywhere between a few minutes and a few hours, the mouse buttons stop
> responding. The mouse itself keeps moving and remains visible.
> The trigger is when moving the mouse between two xinerama screens.

Just a guess: could it be that the button events are delivered to the wrong screen? If you have a full-screen xev on the other screen when it happens, does it show the button events?

Good suggestion, but that's not it, unfortunately. I ran xev on both screens. Neither reports any events when moving the mouse or clicking the buttons over the xev area.

A detailed description of this bug can be found here:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-177/+bug/296167
Don't get tricked by the title in the URL: according to the report, it also happens with the included Matrox and ATI drivers. I'm also seeing this on Fedora 9; the production workstation that is hit worst uses three Xinerama screens. Bug #10797 seems to be the same thing. As a temporary workaround I'll try to switch from Xinerama to TwinView on my coworker's boxes. Is there any way I can help to track this down? Detailed steps to reproduce this are in the Ubuntu bug report.

Here are some more reports of the same problem, I guess:
http://www.nabble.com/Fedora-9-minor-issue--4-td17317014.html
https://bugzilla.redhat.com/show_bug.cgi?id=475945
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/41301

The problem is reproducible here within 2-5 minutes by opening many new windows on different screens. ALT+TAB, switching to the text console, or using the keyboard mouse doesn't help. Running xev shows no events from the mouse, not even from the keyboard mouse. Is there a way to print mouse events on an X server-wide scope?

My coworker and I were able to reproduce it with just X, xterm and a little script:
---------------------------------------
#!/bin/bash
# Open 50 xterms at random positions spread across all screens.
# Uses bash because $RANDOM is not available in a plain POSIX sh.
export DISPLAY=:0.0
for COUNT in $(seq 1 50); do
    XPOS=$(( RANDOM % 4600 ))
    YPOS=$(( RANDOM % 1000 ))
    xterm -geometry "80x25+$XPOS+$YPOS" &
done
---------------------------------------
The trick is to move the mouse fast across the screen/GPU borders while the windows are being opened. We got the root window id via "xdpyinfo" and then started xev -id ROOT-ID. When the bug occurs, the mouse events all go to the root window instead of the applications. Huh?

Nice work Thomas. Peter, it's not related to evdev, as it also happens when evdev is not used (on my box). The script doesn't reproduce the problem for me, but then again, after switching from kde4 to xfce4 a few weeks ago the problem has disappeared. Maybe the slow nvidia performance on kde4 makes it worse: missing events or something along those lines. After switching to xfce4 I'm running the same xorg, kmail and other software as before, but things are snappy and fast, and the problem no longer appears.

I _think_ the cause is a race condition in WindowsRestructured(), which calls CheckMotion(). CheckMotion() in turn sets the wrong screen if the mouse has already moved into a position on the other screen. I'm not sure yet, but that's a path to look at if you have time.

Some further information for y'all:
setup:
2 x nvidia 8600gts video cards
3 x screens, all separate X screens, merged with xinerama
Arranged as GPU0S0 - GPU1S0 - GPU1S1
ubuntu 8.10, gnome, metacity

I've set up a xev process in the same location on each screen (so 3 xev instances). When I trigger the bug using the GPU0S0/GPU1S0 boundary, moving the mouse in the xev session on screen GPU0S0 (left) triggers events in the xev process running on screen GPU1S0 (middle). Moving the mouse within the xev processes on the other 2 screens (middle and right) causes no events to be registered anywhere.

The frequency of the bug's occurrence is definitely tied to system / graphical load. With nvidia cards, a few firefox windows (or anything with heavy xrender use) and/or a few open windowed opengl apps are enough for the bug to happen about 50% of the time when changing screens. It happens much more frequently when crossing the boundary between GPUs than when crossing the boundary between X sessions that are running on the same GPU.

To "reset" the bug without restarting the X server you can do the following:
1. alt-tab to a window (firefox is the one I usually use)
2. alt-space to bring up the window options menu (metacity)
3. select move (this binds the cursor to the center of the window)
4. use the arrow keys to move the window to another X session window
5. move the mouse
The bug should now be gone.

I'm happy to run some other tests if there's anything that would be useful.
cheers,
x

My last comment was a wrong trail, I had a debugging session on that today and
it has nothing to do with WindowsRestructured. ATM, it looks like a scaling
issue, with the reporter in the Red Hat bugzilla saying it happens more often
on fast transitions than on slow ones.
Can you verify this?
Also, he has a triple monitor setup with two cards, and cannot reproduce it
by switching between the first two monitors (the ones on the same card).
> I've setup a xev process in the same location on each screen (so 3 xev
> instances), when i trigger the bug using the GPU0S0/GPU1S0 boundary moving the
> mouse in the xev session on screen GPU0S0(left) triggers events in the xev
> process running on screen GPU1S0(middle). moving the mouse within xev
> processes on the other 2 screens (middle and right) causes no events to be
> registered anywhere.
these events are probably sent to the root window of one of the screens. Try
to get the root window id with xdpyinfo | grep root and then run xev -id <root
window id>
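
The suggestion above can be sketched as a small helper. This is only a sketch: the function name root_window_ids is made up for illustration, and it assumes that xdpyinfo prints one "root window id:" line per screen, which is how its per-screen output normally looks.

```shell
#!/bin/sh
# Hypothetical helper (not part of any X tool): print the root window id
# of every X screen, one per line. It reads xdpyinfo output on stdin, so
# it also works on output saved from an affected machine.
root_window_ids() {
    # A matching line looks like "  root window id:    0x13c",
    # so the id is the fourth whitespace-separated field.
    awk '/root window id:/ { print $4 }'
}

# Intended use on a live display, left commented out so that the sketch
# has no side effects when sourced:
#   for id in $(xdpyinfo | root_window_ids); do
#       xev -id "$id" &
#   done
```

If the events show up in one of these xev instances while the application windows see nothing, they are being delivered to that root window.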
*** Bug 18127 has been marked as a duplicate of this bug. ***

(In reply to comment #10)
> these events are probably sent to the root window of one of the screens. Try
> to get the root window id with xdpyinfo | grep root and then run xev -id <root
> window id>
>
I can confirm that xev then shows the mouse again. Even the button events are displayed in konsole. :)
Thanks!
Tobi

In response to comment 10, I have a three head machine (2 on an mga dual head card and 1 on a usbvga adaptor using the sis chipset) and can confirm that the problem does occur when switching between the two mga heads. I have not been able to confirm it, but a co-worker with a similar setup says he had the problem happen once without actually switching between heads. We both have the symptom occur several times a day, but so far the quick fix of moving a window from head to head fixes the issue. The problem does seem to be limited to xinerama, as I have another setup with an ATI card using randr in xorg.conf but otherwise identical software (all machines mentioned are running from an LTSP server with the same image, just different xorg.confs) that does not have the problem.

We may have narrowed it down to .... wait for it.... Animated Cursors!

Looks like if an animated cursor is active and the pointer crosses the screen, at least one of the calls to update the cursor image has the wrong screen information. This actually results in the cursor jumping back to the original screen and then immediately back again (only visible if you happen to have a breakpoint at the right position).
Once that happens, the screen info is out of sync, with the pointer image having a different target screen than the event delivery.

To verify this, I'd need you to install a cursor theme that does not use animated cursors (don't ask me which one or how to install it...) and try to reproduce the bug.

(In reply to comment #14)
> We may have narrowed it down to .... wait for it.... Animated Cursors!
>
> Looks like if an animated cursor is active and the pointer crosses the screen,
> at least one of the calls to update the cursor image has the wrong screen
> information. This actually results in the cursor jumping back to the original
> screen and then immediately back again (only visible if you happen to have a
> breakpoint at the right position).
> Once that happens, the screen info is out of sync, with the pointer image
> having a different target screen than the event delivery.
>
> To verify this, I'd need you to install a cursor theme that does not use
> animated cursors (don't ask me which one or how to install it...) and try to
> reproduce the bug.
>
I can see this, as the bug does relate to graphics load for me. If I move my cursor away from an app with an animated cursor, the likelihood of it not changing before reaching the X session border would increase (I would think), thus causing the bug to appear more often under load. If this is true, I would imagine that the bug appears much more for those who keep apps full screen on each session (as I do with VMs at times, and I do see an increase in frequency).

I'll test this by keeping windows away from the borders, thus giving the cursor more time to move to a non-animated state before crossing the border.
cheers,
x

(In reply to comment #14)
> We may have narrowed it down to .... wait for it.... Animated Cursors!
>
> Looks like if an animated cursor is active and the pointer crosses the screen,
> at least one of the calls to update the cursor image has the wrong screen
> information. This actually results in the cursor jumping back to the original
> screen and then immediately back again (only visible if you happen to have a
> breakpoint at the right position).
> Once that happens, the screen info is out of sync, with the pointer image
> having a different target screen than the event delivery.

Thanks for looking into this Peter, it's really appreciated.
I'm not 100% sure if it's related to animated cursors, as I found a new way to trigger the bug: There are some posts in various bug trackers that mention that if you assign a short-cut in your window manager to move a window via keyboard to another screen, the problem can be worked around.

For me, the opposite happens: I have a working mouse and press the keyboard short-cut to move the window from one screen to another. As soon as the mouse cursor in the middle of the moving window crosses the screen border, the window stops moving -> the keyboard events go somewhere else, too. Also the short-cut for moving the window stops working. Luckily I can still press CTRL+ALT+DEL for properly shutting down my KDE session :-)

While moving the window, the mouse cursor changes its shape to a cross.
Does that somehow also qualify as an animated cursor / execute the same code path?

btw: A coworker just phoned me to report that the problem happened to him while he was working on one screen only. Huh?

(In reply to comment #16)
>
> I'm not 100% sure if it's related to animated cursors as I found a new way to
> trigger the bug: There are some posts in various bug trackers that mention if
> you assign a short-cut to your window manager to move a window via keyboard to
> another screen, the problem can be worked around.
>
> For me, the opposite happens: I have a working mouse and press the keyboard
> short-cut to move the window from one screen to another. As soon as the mouse
> cursor in the middle of the moving window crosses the screen border, the window
> stops moving -> the keyboard events go somewhere else, too. Also the short-cut
> for moving the window stops working. Luckily I can still press CTRL+ALT+DEL for
> properly shutting down my KDE session :-)
>
> While moving the window, the mouse cursor changes its shape to a cross.
> Does that somehow also qualify as animated cursor/execute the same code path?
>
> btw: A coworker just phoned me to report that the problem happened to him while
> he was working on one screen only. Huh?
>
Interesting. I tried this and confirmed basically the same thing. Test case: select a window on the left screen (GPU0,S0), use the move window command, and use the keyboard to move the window to the middle or right screen (GPU1,S0 or S1). This triggers the bug: the mouse no longer registers events properly. Moving the window (or any window) back to the left screen GPU0,S0 fixes the bug. Also, ONLY moving something onto the left GPU0,S0 screen will fix the bug; moving something onto either of the other 2 screens will not fix it. Each of my screens is a separate X session.

I don't get the cross icon. On the left and middle screens I get the normal gnome/metacity closed hand icon (that icon does have an open and a closed hand position in other instances, so it may be using the animated cursor code path). Moving the window to the right screen gives the standard system arrow cursor.

I'm interested in knowing whether the left screen is special because it's on a separate GPU or because it is Xsession / Xscreen 0.
cheers,
x

Hi,
Just to clarify, this issue popped up in Ubuntu 8.10 and is affecting people with multi-screen systems (typically 3+) with Xinerama switched on. It's been ongoing since October 2008. Bottom line is that the people affected are "power users", so this bug is hitting pro-Linux people and developers VERY hard .. without pointing any fingers, nobody is coming up with a fix. As this bug was "introduced" into what was commonly thought to be a stable system (and hence relied upon for critical business systems), and given the scale of the problem, it would be rather nice if *someone* backed out of the problem. I've heard reports, for example from Ubuntu users, that selectively downgrading X.org components to 8.04 solves the problem for them.
Please please please can someone either:
a. Produce a fix (ok, this may not be possible ..)
b. Back out whatever changes caused the issue
c. Issue a recommendation to packagers that they back out / downgrade to a known working version
For my part I'm about to go downgrade my X components after the 10th lockup today.

(In reply to comment #14)
> We may have narrowed it down to .... wait for it.... Animated Cursors!

Happy new year everyone! I really appreciate the work on this bug and wanted to ask if there is any short news concerning the issue?

Warning: out of context.
I purchased this:
http://www.newegg.com/Product/Product.aspx?Item=N82E16815106011
3D frame rates have jumped ~25% in all apps, and I can use compiz (note that the 25% jump above is with compiz enabled). It may cost a bit more, but it destroys xinerama performance in every way.
cheers,
x

The launchpad thread has directions for a workaround and an easy reproduction for debugging. It is very sad that a bug as severe as this can lie around unassigned for such a long time, without even an assignee to take point on the issue.
https://bugs.launchpad.net/ubuntu/+bug/296167

Don't know if it helps, but I've set severity to critical. It's good to see a second workaround, for those for whom it might work. Since switching to XFCE the frequency has lessened for me, but it still happens once every two days. Unfortunately, lately the keyboard also seems to become unresponsive.

-Any- news, please?

Here's a braindump of what I know so far.
I have not been able to reproduce the bug yet and since everything so far indicates a race condition, this is not an easy one to track down. This information is from a (very) remote ssh debugging session.

The server has two screen pointers around the Event Queue (EQ). One is miEventQueue.pEnqueueScreen, the other miEventQueue.pDequeueScreen. Enqueue is used during signal handling to shove new events into the EQ, Dequeue during event processing to take them out and process them further. Both are modified through mieqSwitchScreen(), with Dequeue being conditional on a parameter.
Other interesting variables are miPointer.pScreen (the screen of the rendered sprite) and sprite.screen (the screen as seen during event processing).

The usual order of updates to these four variables is:
pEnqueueScreen -> miPointer.pScreen -> miEventQueue.pDequeueScreen -> sprite.screen

When the screwup happens, I noticed the order isn't 1,2,3,4 as above, but instead 1,2,1,2,3,4. From then on, the first two always have different values than the other two.

The question is how the screens get out of sync. I looked at the code and I can't explain it. One remote guess is XineramaCheckMotion(), where the root x/y coordinates are used. I think by then they should be in per-screen coordinates already, so that would give us the wrong screen, possibly triggering that. Although this should happen all the time, not just sometimes.

Without being able to run gdb on a busted server, I can't really say more.

Peter, thanks for picking this up. Two questions come to mind:
1. what is the trigger for the deQueue to happen?
2. what happens if an enqueue interrupts a dequeue?
The event order suggests that the dequeue never happens, or does not complete, or the next enqueue doesn't wait for an associated dequeue to complete. This type of problem looks like a missing semaphore or something like it. I don't know the code, but maybe asking some questions might help find a lead.

> 1. what is the trigger for the deQueue to happen?
dequeuing events is part of the main loop.

> 2. what happens if an enqueue interrupts a dequeue?
nothing, in theory those two should be independent.

> The event order suggests that the dequeue never happens, or does not complete,
> or the next enqueue doesn't wait for an associated dequeue to complete. This
> type of problem looks like a missing semaphore or something like it.
Reports that the bug occurs more frequently under load indicate that too.
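
As an aside, the update order from the braindump can be checked mechanically if a trace of the updates is available, for example from debug prints. The following is only a toy sketch and not X server code: the function name check_update_order and the step numbering (1 = pEnqueueScreen, 2 = miPointer.pScreen, 3 = pDequeueScreen, 4 = sprite.screen) are assumptions made for illustration.

```shell
#!/bin/sh
# Toy checker, not X server code. Each argument is one recorded update:
#   1 = miEventQueue.pEnqueueScreen   2 = miPointer.pScreen
#   3 = miEventQueue.pDequeueScreen   4 = sprite.screen
# A healthy trace repeats the cycle 1 2 3 4; the function reports the
# first update that breaks the cycle and returns a non-zero status.
check_update_order() {
    expected=1
    position=0
    for step in "$@"; do
        position=$((position + 1))
        if [ "$step" -ne "$expected" ]; then
            echo "out of order at position $position: got $step, expected $expected"
            return 1
        fi
        # After step 4 the cycle starts again at step 1.
        expected=$((expected % 4 + 1))
    done
    echo "in order"
}
```

Calling it as `check_update_order 1 2 1 2 3 4` flags the third update, which is the point where the enqueue side has moved on a second time before the dequeue side caught up.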
The one lead I have is that mi/mipointer.c stuff is being called both during SIGIO handling and during event processing, but I haven't found any significant overlap yet.

Thanks for the update Peter, it's highly appreciated!

> Here's a braindump of what I know so far.
> I have not been able to reproduce the bug yet and since everything so far
> indicates a race condition, this is not an easy one to track down. This
> information is from a (very) remote ssh debugging session.

I'm still able to reproduce it 100% of the time by using a keyboard shortcut in KDE to move a window from one screen to another. As soon as the mouse cursor in the middle of the moving window crosses the screen border, the window stops moving -> the keyboard events go somewhere else, too.

Would it help if I set up a box for you with two graphics cards and full root access to reproduce the thing? We have an IP-ready KVM switch, so you could access it from the outside like a normal user. Though reconfiguring all this stuff and putting it in a DMZ will require some time...

> Reports that the bug occurs more frequently under load indicate that too.
> The one lead I have is that mi/mipointer.c stuff is being called both during
> SIGIO handling and during event processing, but I haven't found any
> significant overlap yet.

Hmm, this also smells like an -EAGAIN error code issue. Suppose some code is in the middle of a read/write and a signal fires: your read/write call will be interrupted with -EAGAIN. Even without a signal firing, if a box is under load the kernel can issue -EAGAIN for IO operations, e.g. on sockets.

Created attachment 22373
0001-mi-don-t-call-UpdateSpriteForScreen-if-we-have-Xine.patch

This patch seems to fix the issue.

mi: don't call UpdateSpriteForScreen if we have Xinerama enabled. #18668

In Xinerama all windows hang off the first root window. Crossing the screens must not reset the spriteTrace, otherwise picking fails and events are sent to the root window.

I would be happy to test the patch, if you could explain how I go about it.

Fedora 10 users can get a scratch build, see
https://bugzilla.redhat.com/show_bug.cgi?id=473825#c35

Otherwise, you need to get the server sources from your respective distribution and apply the patch (or get your distribution to do it for you). Then build and install the patched server.
Alternatively you can check out the git repository, more on
http://wiki.x.org/wiki/JhBuildInstructions

(In reply to comment #28)
> Created an attachment (id=22373)
> 0001-mi-don-t-call-UpdateSpriteForScreen-if-we-have-Xine.patch
>
> This patch seems to fix the issue.

Peter, you're my hero! The patch is working fine on Fedora 9. Tested on my box and a triple head one. Congratulations also from my co-workers. Have a nice weekend!

Thank you Peter, this is a great find. I've reported your patch on the Gentoo bug forum as well (https://bugs.gentoo.org/show_bug.cgi?id=243496). As soon as I can confirm this works I'll close this bug. Thanks on behalf of the Gentoo users :)

Patch sent to list for final review.
http://lists.freedesktop.org/archives/xorg/2009-February/043171.html

Pushed as 9fe9b6e4ef669b192ee349e3290db5d2aeea273c and nominated for 1.6. Thanks for testing.

Thanks for fixing it Peter.

I tested the fix against Ubuntu Intrepid and it works great. System stable, thanks so much Peter!

*** Bug 20389 has been marked as a duplicate of this bug. ***