I think I have traced this to the AddPassiveGrabToList function in
xc/programs/Xserver/dix/grabs.c. This function allows the same client to succeed
with multiple grabs on the same key if the requests are made from the same client.
When using a program that carelessly calls XGrabKey several times, the list of
grabs increases. After a while, this makes the X server use 100% CPU as well as
being unresponsive for several minutes.
Sorry about the phenomenal bug spam, guys. Adding xorg-team@ to the QA contact so bugs don't get lost in future.
I got hit by this issue recently with X.org 7.2 and KDE 3.5.6. Something in KDE does lots of grabs gradually, and when the desktop has a week or two uptime, everything starts to be very sluggish. Keyboard autorepeat can repeat only about 15-20 chars/sec even if it is set to 30 chars/sec, and X takes 60-70% of CPU time while keeping a key pressed.
Seems that it was "kded" that owned those grabs. When I killed kded, the X server hung *totally* (mouse didn't move, keyboard did nothing, display didn't change) for 45 minutes! I attached gdb to X server process, and took backtrace a few times. Almost every time I interrupted the process, it was busy in DeletePassiveGrab():
#0 0x08096a3a in DeletePassiveGrab ()
#1 0x0806fdb0 in FreeClientResources ()
#2 0x08080437 in CloseDownClient ()
#3 0x0808687e in Dispatch ()
#4 0x0806e589 in main ()
Once it got rid of all the grabs, the X server was working normally again and the slowdown was gone.
Of course there is a bug in the X client; it shouldn't do so many grabs. But I think that X server should do something to prevent this from happening. I patched dix/grabs.c to not allow more than one grab for the same key combination, and so far the only thing I have noticed to break has been apps using libXt. For example, xedit dies with the following message:
X Error of failed request: BadAccess (attempt to access private resource denied)
Major opcode of failed request: 28 (X_GrabButton)
Serial number of failed request: 1652
Current serial number in output stream: 1667
Created attachment 9843 [details] [review]
Prevent clients from grabbing the same key combination many times.
Breaks apps using libXt; KDE/Qt and GTK+ apps seem to work fine.
Created attachment 9850 [details] [review]
Silently ignore duplicate grab requests
This patch makes the X server to silently ignore grab requests for key combinations for which a grab already exists. (For comparision, my first patch made X server to return an error.)
Now apps using libXt work (xterm, xedit, ...). It's still possible that some applications break: for example, if an app grabs the same key combination twice and then releases the grab once, there won't be any grab left if this patch has been applied.
I and a few friends have used X.org 7.2 with the patch "Silently ignore duplicate grab requests" applied and everything seems to work fine: X server no longer gets slow with long KDE uptimes and no applications have broken.
Please see <http://bugs.kde.org/show_bug.cgi?id=144993#c1>. KDE side seems to think that this should be fixed in the X server, and that my patch to the X server looks good. I hope that this bug could get fixed before X.org 7.3 is out.
The X11 protocol specs for GrabKey say:
This request establishes a passive grab on the keyboard.
This request overrides all previous passive grabs by the same client
on the same key combinations on the same window.
I agree, the silently ignore patch looks correct to me. It's basically down to specifying whether "override" means "overwrite" or not. If not, the current behaviour is correct. New grabs are prepended to the list, so until a grab is removed it "overrides" previous ones.
Update from various communications on the IRC channel (paraphrased):
daniels: A lot of GrabKeys with the same value are considered bad behaviour. a few are ok, but hundreds are bad.
Seli: The silently ignore patch doesn't override previous grabs, so it's not correct.
whot (me): it should be possible to remove a grab from the list and just prepend the new one. removing the grab should not affect any ongoing grabs on the devices.
I'm not a 100% sure on the last point though, I only spent a few minutes browsing the source. But it seems the passive grab from the list is copied onto the device, we don't use the a reference to the grab.
I see I originally misunderstood the exact semantics of GrabKey/UngrabKey in my response to the KDE bugreport. However I still think this is a problem of X.
What UngrabKey does is that it resets passive grab on its specified key/modifiers, GrabKey sets passive grab on its specified key/modifiers. It can be simply seen as setting and resetting bits (to oversimplify it a bit).
UngrabKey does this by walking the whole list of passive grabs and adjusting all matching ones. GrabKey, on the contrary, simply prepends to the list. Therefore e.g. doing the same grab 100 times and then one ungrab is a no-op in the end.
I don't see any ground for claiming that doing many same grabs is wrong for any reason, just like there's nothing wrong with setting the same bit many times. I consider it simply an implementation inefficiency of the X server that it doesn't remove duplicates from the list and therefore does not handle this gracefully. That would be like claiming that doing repeatedly the same XSelectInput() is wrong.
A sufficient fix IMHO would be, after the prepend is done, walk the list and remove any exact duplicate.
Created attachment 10222 [details] [review]
Remove identical grabs when prepending new.
When a new GrabKey is processed, check for identical grabs and remove them from the passive grab list.
If there is no further comments, I'll push the patch after some additional testing.
I'm unable to confirm yet if the new patch really fixes the issue for me. The patch looks good to me though (I also see what was wrong with my patches), so I would be surprised if it didn't fix this issue.
In any case, I will report when I can confirm that the patch works (or doesn't work). It may take some time though, because it's hard to get enough uptime in the summer.
Thanks to everyone who have helped solving this bug!
Fixed in commit b141b85c254afff3ce2221d899787fab3dc295bd.
I just had 22 days of uptime, and things were still running smoothly. Thus, this bug seems to be fixed now. Thank you.