Bug 18668 - Mouse buttons sometimes stop responding when moving from one xinerama screen to another
Summary: Mouse buttons sometimes stop responding when moving from one xinerama screen ...
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: 7.4 (2008.09)
Hardware: Other All
Importance: highest critical
Assignee: Peter Hutterer
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 18127 20389
Depends on:
Blocks:
 
Reported: 2008-11-22 18:50 UTC by Henk Spaaij
Modified: 2010-03-30 08:49 UTC


Attachments
0001-mi-don-t-call-UpdateSpriteForScreen-if-we-have-Xine.patch (1.11 KB, patch)
2009-01-29 17:39 UTC, Peter Hutterer

Description Henk Spaaij 2008-11-22 18:50:15 UTC
After anywhere between a few minutes and a few hours of use, the mouse buttons stop responding. The pointer itself keeps moving and remains visible.

The trigger is moving the mouse between two Xinerama screens.
No messages about this are logged in Xorg.0.log, messages, or any other log file.
In this state xev does not see any mouse events.

When X is restarted the mouse buttons are recognized again.
Another recovery is running a synergy server locally and connecting to it with the client. Note that this only works when running synergyc from the konsole in one of the xinerama screens. Doing the exact same thing in the other screen has no effect.


OS: Gentoo
xorg-7.4; xorg-server-1.5.2

The bug was first reported by Eric Stein. See also: https://bugs.gentoo.org/show_bug.cgi?id=243496 for more details.
Comment 1 Peter Hutterer 2008-11-25 20:24:31 UTC
On Sat, Nov 22, 2008 at 06:50:16PM -0800, bugzilla-daemon@freedesktop.org wrote:
> Anywhere between a few minutes and a few hours, the mouse buttons stop
> responding. The mouse itself keeps moving and remains visible.

> The trigger is when moving the mouse between two xinerama screens.

Just a guess: could it be that the button events are delivered to the wrong
screen? If you have a full-screen xev on the other screen when it happens,
does it show the button events?
Comment 2 Henk Spaaij 2008-11-26 00:00:30 UTC
Good suggestion, but that's not it, unfortunately.

I ran xev on both screens. Neither instance reports any events when I move the pointer over the xev window or click the mouse buttons there.
Comment 3 Thomas Jarosch 2008-12-11 14:53:33 UTC
A detailed description of this bug can be found here:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-177/+bug/296167

Don't be misled by the title in the URL: according to the report, it also happens with the included Matrox and ATI drivers. I'm also seeing this on Fedora 9; the production workstation that is hit worst uses three Xinerama screens.

Bug #10797 seems to be the same thing. As a temporary workaround I'll try to switch from Xinerama to TwinView on my coworker's boxes. Is there any way I can help to track this down? Detailed steps to reproduce this are in the Ubuntu bug report.

Here are some more reports of the same problem, I guess:
http://www.nabble.com/Fedora-9-minor-issue--4-td17317014.html
https://bugzilla.redhat.com/show_bug.cgi?id=475945
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/41301
Comment 4 Peter Hutterer 2008-12-11 18:11:54 UTC
Could be https://bugzilla.redhat.com/show_bug.cgi?id=473825
Comment 5 Thomas Jarosch 2008-12-12 02:01:15 UTC
The problem is reproducible here within 2-5 minutes by opening many new windows on different screens. ALT+TAB, switching to the text console or the keyboard mouse don't help.

Running xev shows no events from the mouse, not even the keyboard mouse.

Is there a way to print mouse events on a X server-wide scope?
Comment 6 Thomas Jarosch 2008-12-12 03:11:18 UTC
My coworker and I were able to reproduce it with
just X, xterm and a little script:

---------------------------------------
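#!/bin/bash
# Open 50 xterms at random positions on the desktop.
# $RANDOM and $(( )) are bash features, so plain sh is not enough.

export DISPLAY=:0.0

for COUNT in $(seq 1 50); do
    # Random offsets of up to 4600 x 1000 pixels
    XPOS=$(( RANDOM % 4600 ))
    YPOS=$(( RANDOM % 1000 ))
    xterm -geometry "80x25+${XPOS}+${YPOS}" &
done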
---------------------------------------

The trick is to move the mouse fast between the screen/GPU
borders while the windows are being opened.

We got the root window id via "xdpyinfo" and then
started xev -id ROOT-ID. The mouse events all go
to the root window instead of the applications
when the bug occurs. Huh?
Comment 7 Henk Spaaij 2008-12-12 10:27:17 UTC
Nice work Thomas. 
Peter, it's not related to evdev, as it also happens when evdev is not used (on my box).

The script doesn't reproduce the problem for me, but then again, after switching from kde4 to xfce4 a few weeks ago the problem has disappeared. 

Maybe the slow nvidia performance on kde4 makes it worse. Missing events or something along those lines. After switching to xfce4 I'm running the same xorg, kmail and other software as before, but things are snappy and fast, and the problem no longer appears.
Comment 8 Peter Hutterer 2008-12-13 00:32:16 UTC
I _think_ the cause is a race condition in WindowsRestructured() which calls
CheckMotion(). CheckMotion in turn sets the wrong screen, if the mouse already
moved into a position on the other screen. Not sure yet though, but that's a
path to look at if you have time.
Comment 9 Mark 2008-12-13 15:26:59 UTC
some further information for yall:

setup: 
2 x nvidia 8600gts vid cards
3 x screens, all separate X screens, merged with xinerama
Arranged as GPU0S0 - GPU1S0 - GPU1S1
ubuntu 8.10, gnome, metacity

I've set up an xev process in the same location on each screen (so 3 xev instances). When I trigger the bug using the GPU0S0/GPU1S0 boundary, moving the mouse in the xev window on screen GPU0S0 (left) triggers events in the xev process running on screen GPU1S0 (middle). Moving the mouse within the xev windows on the other 2 screens (middle and right) causes no events to be registered anywhere.

The frequency of the bug's occurrence is definitely tied to system/graphical load. With nvidia cards, a few firefox windows (or anything with heavy xrender use) and/or a few open windowed opengl apps are enough for the bug to happen about 50% of the time when changing screens. It happens much more frequently when crossing the boundary between GPUs than when crossing the boundary between X screens that run on the same GPU.

To "reset" the bug without rebooting the x server you can do the following:

alt-tab to a window (firefox is the one i usually use)
alt-space to bring up the window options menu (metacity)
select move (this binds the cursor to the center of the window)
use the arrow keys to move the window to another X session window
move the mouse
the bug should now be gone

I'm happy to run some other tests if there's anything that would be useful.

cheers,

x

Comment 10 Peter Hutterer 2008-12-17 20:04:05 UTC
My last comment was a wrong trail, I had a debugging session on that today and
it has nothing to do with WindowsRestructured. ATM, it looks like a scaling
issue, with the reporter in the Red Hat bugzilla saying it happens more often
on fast transitions than on slow ones.

Can you verify this?

Also, he has a triple monitor setup with two cards, and cannot reproduce it
by switching between the first two monitors (the ones on the same card).


 
> I've setup a xev process in the same location on each screen (so 3 xev
> instances), when i trigger the bug using the GPU0S0/GPU1S0 boundary moving the
> mouse in the xev session on screen GPU0S0(left) triggers events in the xev
> process running on screen GPU1S0(middle).  moving the mouse within xev
> processes on the other 2 screens (middle and right) causes no events to be
> registered anywhere.

these events are probably sent to the root window of one of the screens. Try
to get the root window id with xdpyinfo | grep root and then run xev -id <root
window id>
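
A minimal sketch of that check, assuming xdpyinfo prints one "root window id:" line per screen. root_window_ids is a hypothetical helper name, not part of any X tool; it reads xdpyinfo output on stdin so it can be tried without a running server.

```sh
# Hypothetical helper: print the root window id of every screen, one per line.
# Expects `xdpyinfo` output on stdin.
root_window_ids() {
    # The id is the last field of a line such as "  root window id:    0x13c"
    awk '/root window id:/ { print $NF }'
}
```

On the affected display it could be used as `xev -id "$(xdpyinfo | root_window_ids | head -n 1)"`. If the events show up there, they are being delivered to the root window rather than to the application windows.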
Comment 11 Peter Hutterer 2008-12-18 02:52:56 UTC
*** Bug 18127 has been marked as a duplicate of this bug. ***
Comment 12 Tobias Kaminsky 2008-12-18 03:39:34 UTC
(In reply to comment #10)

> these events are probably sent to the root window of one of the screens. Try
> to get the root window id with xdpyinfo | grep root and then run xev -id <root
> window id>
> 


I can confirm that xev then shows the mouse events again. Even the button events are displayed in konsole. :)

Thanks!
Tobi
Comment 13 Steve Ash 2008-12-18 05:24:19 UTC
In response to comment 10, I have a three head machine (2 on an mga dual head card and 1 on a usbvga adaptor using the sis chipset) and can confirm that the problem does occur when switching between the two mga heads.

I have not been able to confirm it, but a co-worker with a similar setup says he had the problem happen once without actually switching between heads.  We both have the symptom occur several times a day, but so far the quick fix of moving a window from head to head fixes the issue.

The problem does seem to be limited to xinerama as I have another setup with an ATI card using randr in xorg.conf but otherwise identical software (all machines mentioned are running from an LTSP server with the same image, just different xorg.confs) that does not have the problem.
Comment 14 Peter Hutterer 2008-12-18 16:22:19 UTC
We may have narrowed it down to .... wait for it.... Animated Cursors!

Looks like if an animated cursor is active and the pointer crosses the screen,
at least one of the calls to update the cursor image has the wrong screen
information. This actually results in the cursor jumping back to the original
screen and then immediately back again (only visible if you happen to have a
breakpoint at the right position).
Once that happens, the screen info is out of sync, with the pointer image
having a different target screen than the event delivery.

To verify this, I'd need you to install a cursor theme that does not use
animated cursors (don't ask me which one or how to install it...) and try to
reproduce the bug.
Comment 15 Mark 2008-12-18 18:37:38 UTC
(In reply to comment #14)
> We may have narrowed it down to .... wait for it.... Animated Cursors!
> 
> Looks like if an animated cursor is active and the pointer crosses the screen,
> at least one of the calls to update the cursor image has the wrong screen
> information. This actually results in the cursor jumping back to the original
> screen and then immediately back again (only visible if you happen to have a
> breakpoint at the right position).
> Once that happens, the screen info is out of sync, with the pointer image
> having a different target screen than the event delivery.
> 
> To verify this, I'd need you to install a cursor theme that does not use
> animated cursors (don't ask me which one or how to install it...) and try to
> reproduce the bug.
> 

I can see this, as the bug does relate to graphics load for me. That is, if I move my cursor away from an app with an animated cursor, the likelihood of it not changing before reaching the X session border would increase under load (I would think), thus causing the bug to appear more often. If this is true, I would imagine that the bug appears much more often for those who keep apps full-screened on each session (as I do with VMs at times, and I do see an increase in frequency). I'll test this by keeping windows away from the borders, thus giving the cursor more time to move to a non-animated state before crossing the border.

cheers,

x
Comment 16 Thomas Jarosch 2008-12-19 01:12:15 UTC
(In reply to comment #14)
> We may have narrowed it down to .... wait for it.... Animated Cursors!
> 
> Looks like if an animated cursor is active and the pointer crosses the screen,
> at least one of the calls to update the cursor image has the wrong screen
> information. This actually results in the cursor jumping back to the original
> screen and then immediately back again (only visible if you happen to have a
> breakpoint at the right position).
> Once that happens, the screen info is out of sync, with the pointer image
> having a different target screen than the event delivery.

Thanks for looking into this Peter, it's really appreciated.

I'm not 100% sure if it's related to animated cursors as I found a new way to trigger the bug: There are some posts in various bug trackers that mention if you assign a short-cut to your window manager to move a window via keyboard to another screen, the problem can be worked around. 

For me, the opposite happens: I have a working mouse and press the keyboard short-cut to move the window from one screen to another. As soon as the mouse cursor in the middle of the moving window crosses the screen border, the window stops moving -> the keyboard events go somewhere else, too. Also the short-cut for moving the window stops working. Luckily I can still press CTRL+ALT+DEL for properly shutting down my KDE session :-)

While moving the window, the mouse cursor changes its shape to a cross.
Does that somehow also qualify as animated cursor/execute the same code path?

btw: A coworker just phoned me to report that the problem happened to him while he was working on one screen only. Huh?
Comment 17 Mark 2008-12-19 08:46:21 UTC
(In reply to comment #16)
> 
> I'm not 100% sure if it's related to animated cursors as I found a new way to
> trigger the bug: There are some posts in various bug trackers that mention if
> you assign a short-cut to your window manager to move a window via keyboard to
> another screen, the problem can be worked around. 
> 
> For me, the opposite happens: I have a working mouse and press the keyboard
> short-cut to move the window from one screen to another. As soon as the mouse
> cursor in the middle of the moving window crosses the screen border, the window
> stops moving -> the keyboard events go somewhere else, too. Also the short-cut
> for moving the window stops working. Luckily I can still press CTRL+ALT+DEL for
> properly shutting down my KDE session :-)
> 
> While moving the window, the mouse cursor changes its shape to a cross.
> Does that somehow also qualify as animated cursor/execute the same code path?
> 
> btw: A coworker just phoned me to report that the problem happened to him while
> he was working on one screen only. Huh?
> 

Interesting. I tried this and confirmed basically the same thing. Test cases:

Select a window on the left screen (GPU0,S0), use the move window command, and use the keyboard to move the window to the middle or right screen (GPU1,S0 or S1). This triggers the bug: the mouse no longer registers events properly. Moving the window (or any window) back to the left screen GPU0,S0 fixes the bug. Also, ONLY moving something onto the left GPU0,S0 screen will fix the bug; moving something onto either of the other 2 screens will not. Each of my screens is a separate X session.

I don't get the cross icon. On the left and middle screens I get the normal gnome/metacity closed hand icon (that icon does have an open and a closed hand position in other instances, so it may be using the animated cursor code path). Moving the window to the right screen gives the standard system arrow cursor.

I'm interested in knowing whether the left screen is special because it's on a separate GPU or because it is X session / X screen 0.

cheers,
x

Comment 18 Gareth Bult 2009-01-08 06:58:29 UTC
Hi,

Just to clarify, this issue popped up in Ubuntu 8.10 and is affecting people with multi-screen systems (typically 3+) with Xinerama switched on. It's been ongoing since October 2008.

Bottom line is that the people affected are "power users" so this bug is hitting pro-Linux people and developers VERY hard .. without pointing any fingers, nobody is coming up with a fix.

As this bug was "introduced" into what was commonly thought to be a stable system (and hence relied upon for critical business systems), and given the scale of the problem, it would be rather nice if *someone* backed out of the problem. I've heard reports, for example, from Ubuntu users that selectively downgrading X.org components to 8.04 solves the problem for them.

Please please please can someone either:
a. Produce a fix (ok, this may not be possible ..)
b. Back out whatever changes caused the issue
c. Issue a recommendation to packagers that they back out / downgrade to a known working version

For my part I'm about to go downgrade my X components after the 10th lockup today.
Comment 19 Thomas Jarosch 2009-01-12 05:36:15 UTC
(In reply to comment #14)
> We may have narrowed it down to .... wait for it.... Animated Cursors!

Happy new year everyone! I really appreciate the work on this bug
and wanted to ask if there is any news concerning the issue.
Comment 20 Mark 2009-01-20 19:05:30 UTC
Warning: out of context.

I purchased this: http://www.newegg.com/Product/Product.aspx?Item=N82E16815106011

3D frame rates have jumped ~25% in all apps.

I can use compiz (note that the 25% jump above is with compiz enabled).

It may cost a bit more, but it destroys xinerama performance in every way.

cheers,

x
Comment 21 geekfreak 2009-01-20 22:39:20 UTC
The launchpad thread has directions for a workaround and an easy reproduction for debugging. It is very sad that a bug as severe as this can lie around for such a long time without even an assignee to take point on the issue.

https://bugs.launchpad.net/ubuntu/+bug/296167
Comment 22 Henk Spaaij 2009-01-20 23:23:30 UTC
Don't know if it helps, but I've set severity to critical. 

It's good to see a second workaround, for those for whom it might work.

Since switching to XFCE the frequency has lessened for me, but once every two days it still happens. Unfortunately, lately the keyboard also seems to become unresponsive.

Comment 23 Thomas Jarosch 2009-01-27 01:05:29 UTC
-Any- news, please?
Comment 24 Peter Hutterer 2009-01-28 17:04:30 UTC
Here's a braindump of what I know so far.
I have not been able to reproduce the bug yet and since everything so far indicates a race condition, this is not an easy one to track down. This information is from a (very) remote ssh debugging session.

The server has two screen pointers around the Event Queue (EQ). One is miEventQueue.pEnqueueScreen, the other miEventQueue.pDequeueScreen. Enqueue is used during signal handling to shove new events into the EQ, Dequeue during event processing to take them out and process them further.
Both are modified through mieqSwitchScreen(), with Dequeue being conditional on a parameter.

Other interesting variables are miPointer.pScreen (the screen of the rendered sprite) and sprite.screen (the screen as seen during event processing).

The usual order of updates to these four variables is:
pEnqueueScreen -> miPointer.pScreen -> miEventQueue.pDequeueScreen -> sprite.screen

When the screwup happens, I noticed the order isn't 1,2,3,4 as above, but instead 1,2,1,2,3,4. From then on, the first two always have different values than the other two.

The question is how the screens get out of sync. I looked at the code and I can't explain it.
One remote guess is XineramaCheckMotion(), where the root x/y coordinates are used. I think by then they should be in per-screen coordinates already, so that would give us the wrong screen, possibly triggering that. Although this should happen all the time, not just sometimes.
Without being able to run gdb on a busted server, I can't really say more.
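
The ordering described above can be checked mechanically against a debug trace. The sketch below is only an illustration, not server code: it assumes each update has been logged as a number from 1 to 4 (1 = pEnqueueScreen, 2 = miPointer.pScreen, 3 = pDequeueScreen, 4 = sprite.screen), and check_update_order is a hypothetical helper name.

```sh
# Hypothetical trace checker: succeeds if the logged updates follow the
# usual 1,2,3,4 cycle, otherwise reports the first update that breaks it.
check_update_order() {
    expected=1
    pos=0
    for step in "$@"; do
        pos=$((pos + 1))
        if [ "$step" -ne "$expected" ]; then
            echo "out of order at update $pos: got $step, expected $expected"
            return 1
        fi
        # Advance 1 -> 2 -> 3 -> 4 -> 1
        expected=$((expected % 4 + 1))
    done
    echo "in order"
}
```

With the sequence seen during the failure, `check_update_order 1 2 1 2 3 4` flags the third update, which is the point where the first two variables are updated again before the other two have caught up.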
Comment 25 Henk Spaaij 2009-01-28 17:49:05 UTC
Peter, thanks for picking this up.
Two questions come to mind:
1. what is the trigger for the deQueue to happen? 
2. what happens if an enqueue interrupts a dequeue?

The event order suggests that the dequeue never happens, or does not complete, or the next enqueue doesn't wait for an associated dequeue to complete. This type of problem looks like a missing semaphore or something like it.

I don't know the code, but maybe asking some questions might help find a lead.
Comment 26 Peter Hutterer 2009-01-28 18:00:19 UTC
> 1. what is the trigger for the deQueue to happen? 

dequeuing events is part of the main loop.

> 2. what happens if an enqueue interrupts a dequeue?

nothing, in theory those two should be independent.

> The event order suggests that the dequeue never happens, or does not complete,
> or the next enqueue doesn't wait for an associated dequeue to complete. This
> type of problem looks like a missing semaphore or something like it.

Reports that the bug occurs more frequently under load indicate that too.
The one lead I have is that mi/mipointer.c stuff is being called both during
SIGIO handling and during event processing, but I haven't found any
significant overlap yet.
Comment 27 Thomas Jarosch 2009-01-29 00:41:42 UTC
Thanks for the update Peter, it's highly appreciated!

> Here's a braindump of what I know so far.
> I have not been able to reproduce the bug yet and since everything so far
> indicates a race condition, this is not an easy one to track down. This
> information is from a (very) remote ssh debugging session.

I'm still able to reproduce it 100% of the time by using a keyboard shortcut in KDE to move a window from one screen to another. As soon as the mouse
cursor in the middle of the moving window crosses the screen border,
the window stops moving -> the keyboard events go somewhere else, too.

Would it help if I set you up a box with two graphic cards and full root access to reproduce the thing? We have an IP ready KVM switch so you could access it from the outside like a normal user. Though reconfiguring all this stuff and putting it in a DMZ will require some time...

> Reports that the bug occurs more frequently under load indicate that too.
> The one lead I have is that mi/mipointer.c stuff is being called both during
> SIGIO handling and during event processing, but I haven't found any
> significant overlap yet.

Hmm, this also smells like an -EAGAIN error code issue. Suppose some code is in the middle of a read/write and a signal fires: your read/write call will be interrupted with -EAGAIN. Even without a signal firing, if a box is under load the kernel can issue -EAGAIN for IO operations, e.g. on sockets.
Comment 28 Peter Hutterer 2009-01-29 17:39:05 UTC
Created attachment 22373
0001-mi-don-t-call-UpdateSpriteForScreen-if-we-have-Xine.patch

This patch seems to fix the issue.

mi: don't call UpdateSpriteForScreen if we have Xinerama enabled. #18668

In Xinerama all windows hang off the first root window. Crossing the screens
must not reset the spriteTrace, otherwise picking fails and events are sent to
the root window.
Comment 29 geekfreak 2009-01-29 21:33:22 UTC
I would be happy to test the patch, if you could explain how I go about it.

Comment 30 Peter Hutterer 2009-01-29 22:34:20 UTC
Fedora 10 users can get a scratch build, see
https://bugzilla.redhat.com/show_bug.cgi?id=473825#c35

Otherwise, you need to get the server sources from your respective distribution and apply the patch (or get your distribution to do it for you). Then build and install the patched server.
Alternatively you can check out the git repository, more on 
http://wiki.x.org/wiki/JhBuildInstructions
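
For the manual route, applying the patch to an unpacked source tree might look roughly like the sketch below. apply_server_patch is a hypothetical helper name and both arguments are placeholders; the only real tool it relies on is GNU patch with its -p1 and --dry-run options. Building and installing the patched server depends on the distribution's packaging and is left out.

```sh
# Hypothetical helper: apply a git-format patch to an unpacked server tree.
# $1 = path to the unpacked server sources (placeholder)
# $2 = absolute path to the downloaded patch file (placeholder)
apply_server_patch() {
    src_dir=$1
    patch_file=$2

    if [ ! -d "$src_dir" ]; then
        echo "source tree not found: $src_dir" >&2
        return 1
    fi
    if [ ! -f "$patch_file" ]; then
        echo "patch file not found: $patch_file" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is untouched.
    # The dry run checks that every hunk applies before any file is modified.
    (
        cd "$src_dir" &&
        patch -p1 --dry-run < "$patch_file" > /dev/null &&
        patch -p1 < "$patch_file"
    )
}
```

A call could look like `apply_server_patch ./xorg-server-1.5.2 "$PWD/0001-mi-don-t-call-UpdateSpriteForScreen-if-we-have-Xine.patch"`, where the directory name is an assumption about how the sources were unpacked.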
Comment 31 Thomas Jarosch 2009-01-30 02:04:04 UTC
(In reply to comment #28)
> Created an attachment (id=22373)
> 0001-mi-don-t-call-UpdateSpriteForScreen-if-we-have-Xine.patch
> 
> This patch seems to fix the issue.

Peter, you're my hero! Patch is working fine on Fedora 9. Tested on my box and a triple head one. Congratulations also from my co-workers.

Have a nice weekend!
Comment 32 Henk Spaaij 2009-01-30 08:37:56 UTC
Thank you Peter this is a great find. I've reported your patch on the Gentoo Bug forum as well. (https://bugs.gentoo.org/show_bug.cgi?id=243496)

As soon as I can confirm this works I'll close this bug.

Thanks on behalf of the Gentoo users :)
Comment 33 Peter Hutterer 2009-02-01 16:44:18 UTC
Patch sent to list for final review.
http://lists.freedesktop.org/archives/xorg/2009-February/043171.html
Comment 34 Peter Hutterer 2009-02-03 15:09:31 UTC
Pushed as 9fe9b6e4ef669b192ee349e3290db5d2aeea273c and nominated for 1.6. 
Thanks for testing.
Comment 35 Henk Spaaij 2009-02-03 15:59:46 UTC
Thanks for fixing it Peter
Comment 36 Steven Harms 2009-02-04 20:27:28 UTC
I tested the fix against Ubuntu Intrepid and it works great.  System stable, thanks so much Peter!
Comment 37 Jeremy Rumpf 2010-03-30 08:49:32 UTC
*** Bug 20389 has been marked as a duplicate of this bug. ***

