Bug 17868

Summary: Some Java applications are slow on remote X connections
Product: XCB
Reporter: Juha Erkkilä <Juha.Erkkila>
Component: Library
Assignee: xcb mailing list dummy <xcb>
Status: RESOLVED FIXED
Severity: major
Priority: high
CC: jerickson, wollw
Version: 1.1
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments: tshark-dump of one keypress with XCB linked in
tshark-dump of one keypress without XCB
Patch to disable nagle algorithm on XCB network sockets

Description Juha Erkkilä 2008-10-02 03:59:24 UTC
Created attachment 19336
tshark-dump of one keypress with XCB linked in

Some Java applications, such as the trial version at http://www.typingmaster.com/, run unusably slowly when used over a remote X connection.

I'm using Ubuntu Hardy 8.04.1 with LTSP5 (Linux kernel 2.6.24-19-server), 32-bit Firefox and the 32-bit Java plugin. The relevant package versions are:

firefox32 2.0.0.17
sun-java32 1.6.0.5
sun-java6-jre 6-06-0ubuntu1
ia32-libs 2.2ubuntu11
libxcb1 1.1-1ubuntu1
libxcb-xlib0 1.1-1ubuntu1

The issue does not appear to be related to the Java version (it occurs with both 1.5 and 1.6), although I have not tested the very latest Java versions. I'm reporting this as a possible XCB issue because the problem can be worked around by using an X11 library that does not link to XCB.

In the current Ubuntu release (Hardy), /usr/lib32/libX11.so.6.2.0 links to /usr/lib32/libxcb-xlib.so.0 and /usr/lib32/libxcb.so.1 (XCB version 1.1). In the previous release (Gutsy), the X11 library does not. When the new shared object is replaced with the old one, the problem disappears and the affected Java applications run fine.

There may be other differences between the two X11 libraries, but using XCB as the underlying implementation seems to be the major change (or is it?). No other applications appear to have these problems; Java applications seem to be the only ones affected.

I'm attaching two tshark dumps of the network traffic, one for each case, in case they help analyze the issue. Each captures a single keypress in the Java version of TypingMaster: in the Hardy/XCB case it takes about half a minute to process the keypress and switch screens, while in the Gutsy/no-XCB case it takes about a second.

Juha
Comment 1 Juha Erkkilä 2008-10-02 04:00:21 UTC
Created attachment 19337
tshark-dump of one keypress without XCB
Comment 2 Jordan Erickson 2009-02-17 11:54:07 UTC
FYI, I have triaged this issue with Ubuntu Launchpad. Please see https://bugs.launchpad.net/libxcb/+bug/277069 for additional submitted comments and information.
Comment 4 elupus 2009-05-21 07:45:56 UTC
I think this could be related to what I was experiencing with GLX over the network.

Running any GLX application over the network was horribly slow. I then found that running a GLX application over the loopback interface on the same machine was slow as well. I.e., I compared the following:

DISPLAY=localhost:0.0 LIBGL_ALWAYS_INDIRECT=1 xbmc
vs
DISPLAY=:0.0 LIBGL_ALWAYS_INDIRECT=1 xbmc

where xbmc (XBMC Media Center) is an OpenGL application. On my hardware, the first command rendered at 10 fps and the second at 40 fps. After some pondering, I suspected the Nagle algorithm.

After modifying libxcb to disable the Nagle algorithm, both commands rendered at about the same speed, roughly 40 fps.

I'll attach a diff.
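For illustration only (the attached diff is not reproduced here): on a POSIX system, disabling Nagle comes down to a single `setsockopt()` call with `TCP_NODELAY` at the `IPPROTO_TCP` level. The sketch below assumes an already-created TCP socket descriptor; the helper names `disable_nagle` and `nagle_disabled` are hypothetical and are not libxcb functions. The option only matters for TCP connections such as `DISPLAY=localhost:0.0`; `DISPLAY=:0.0` goes over a Unix-domain socket, where Nagle does not apply, which fits the two measurements above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable Nagle's algorithm on a TCP socket so that
 * small writes are sent immediately instead of being held back while
 * earlier data is still unacknowledged.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int disable_nagle(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Hypothetical helper: report whether TCP_NODELAY is currently set.
 * Returns 1 if set, 0 if not, -1 on error. */
static int nagle_disabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A caller would typically apply this once, right after the TCP connection is established, and skip it for Unix-domain sockets.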
Comment 5 elupus 2009-05-21 07:47:35 UTC
Created attachment 26071
Patch to disable nagle algorithm on XCB network sockets
Comment 6 elupus 2009-05-22 05:18:33 UTC
My patch seems to have solved the issue for the people affected by this bug. There might be an alternative approach that incurs less of the overhead caused by TCP_NODELAY.

Instead of leaving TCP_NODELAY enabled all the time, one could enable it on the socket only during a call to _XFlush() and then disable it again. I'm not sure how the kernel would cope with the setting being toggled that often, though.
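The toggling idea can be sketched as follows, assuming a blocking TCP socket. `set_nodelay`, `write_all` and `flush_nodelay` are hypothetical helpers standing in for the socket write inside `_XFlush()`; they are not Xlib or libxcb code. The approach costs two extra `setsockopt()` system calls per flush, which is the overhead trade-off in question. Per the Linux tcp(7) man page, setting `TCP_NODELAY` also forces an explicit flush of pending output; other kernels may behave differently.

```c
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn TCP_NODELAY on (on != 0) or off (on == 0) for a TCP socket. */
static int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical flush: disable Nagle only while the buffered requests are
 * written, then re-enable it so later small writes are coalesced again. */
static int flush_nodelay(int fd, const char *buf, size_t len)
{
    int rc;

    if (set_nodelay(fd, 1) < 0)
        return -1;
    rc = write_all(fd, buf, len);
    if (set_nodelay(fd, 0) < 0)
        rc = -1;
    return rc;
}
```

One caveat of this design: any tail of the flushed data that the kernel cannot send immediately (for example, because the congestion window is full) may again be subject to Nagle once the option is switched back off.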
Comment 7 Julien Cristau 2009-05-26 01:20:19 UTC
Disabling Nagle sounds pretty reasonable to me.  It's also what Xtrans (and thus traditional Xlib) does.
Comment 8 Julien Danjou 2009-05-26 07:16:57 UTC
commit ee89850e68205a7f8961ace0839b5be86040dade
Author: elupus <elupus@ecce.se>
Date:   Tue May 26 16:14:48 2009 +0200

    Disable Nagle on TCP socket
    
    Signed-off-by: Julien Danjou <julien@danjou.info>
Comment 9 Bart Massey 2009-05-26 09:58:06 UTC
(In reply to comment #8)
> commit ee89850e68205a7f8961ace0839b5be86040dade
> Author: elupus <elupus@ecce.se>
> Date:   Tue May 26 16:14:48 2009 +0200
> 
>     Disable Nagle on TCP socket
> 
>     Signed-off-by: Julien Danjou <julien@danjou.info>
> 

I can't believe we had Nagle on. :-)  Oops.

Thanks much to all for the diagnosis and fix.
Comment 10 Wouter Bolsterlee 2010-12-09 15:44:37 UTC
Wouldn't it be better to use TCP_CORK and "pull the cork" in _XFlush()? See http://baus.net/on-tcp_cork for a more detailed description of this feature.
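A minimal sketch of the cork approach, assuming Linux (`TCP_CORK` is Linux-specific) and a blocking TCP socket; `struct chunk`, `set_cork`, `write_all` and `send_corked` are hypothetical names, not Xlib or libxcb code. While the cork is in, the kernel holds back partial frames so that several small writes can share a segment; pulling it out sends whatever is queued right away.

```c
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* One piece of a batch, e.g. a single small X request (hypothetical). */
struct chunk {
    const char *data;
    size_t len;
};

/* Put the cork in (on != 0) or pull it out (on == 0). Linux only. */
static int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Queue every chunk while corked, then pull the cork so the kernel
 * sends the accumulated data immediately. */
static int send_corked(int fd, const struct chunk *chunks, size_t n)
{
    int rc = 0;

    if (set_cork(fd, 1) < 0)
        return -1;
    for (size_t i = 0; i < n && rc == 0; i++)
        rc = write_all(fd, chunks[i].data, chunks[i].len);
    /* Pulling the cork flushes any partial frame still queued. */
    if (set_cork(fd, 0) < 0)
        rc = -1;
    return rc;
}
```

Compared with leaving TCP_NODELAY on permanently, corking in principle lets many small requests be coalesced while still flushing promptly at each flush. The Linux tcp(7) man page also notes a 200 ms ceiling on how long output stays corked, so a missed uncork delays data rather than stalling it indefinitely.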
