Bug 29561 - _XReply: performance regression in latest version
_XReply: performance regression in latest version
Status: NEW
Product: xorg
Classification: Unclassified
Component: Lib/Xlib
git
Other All
: medium major
Assigned To: Jamey Sharp
Xorg Project Team
2011BRB_Reviewed
: regression
: 51648 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-08-13 05:40 UTC by Rami Ylimaki
Modified: 2012-07-05 12:29 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
minimal test case (875 bytes, text/plain)
2010-08-13 05:41 UTC, Rami Ylimaki
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Rami Ylimaki 2010-08-13 05:40:43 UTC
Compile the attached test program and run it with latest libX11 from FDO git. Use "ltrace -S ./test-xreply" to monitor system calls.

The test program executes a single XInternAtoms request with 100 atoms. As a result, _XReply is calling non-blocking read 200 times. It seems that whenever _XReply reads n replies, it will execute n*2 non-blocking reads that all fail with EAGAIN.

I tested the execution time of XInternAtoms with 100 atoms on a Debian Sid snapshot from today. The libX11 from Sid doesn't suffer from this problem and there is only one failing non-blocking read in _XReply. Then I compiled latest upstream libX11 inside Sid and installed it. Updating libX11 made the problem visible. On my machine, the XInternAtoms call with 100 atoms takes about 8 times longer (1ms vs. 8ms) if latest libX11 is used from FDO. I used ltrace for getting the execution times.

I'm not too concerned about the performance of XInternAtoms itself, but it is the easiest way to make this problem visible. The number of failing non-blocking reads has increased for other requests as well, but there are usually much less replies when handling other requests.
Comment 1 Rami Ylimaki 2010-08-13 05:41:35 UTC
Created attachment 37846 [details]
minimal test case
Comment 2 Jamey Sharp 2010-08-13 11:28:16 UTC
Oh. Every time Xlib calls xcb_wait_for_reply, it has to call xcb_poll_for_event in case there's an event that should be processed first. Unfortunately, unlike xcb_poll_for_reply, xcb_poll_for_event tries to read from the socket if it doesn't find any events to return. I guess that probably seemed like a good idea to me when I wrote it.

We could get Xlib to call xcb_poll_for_event only once per call to _XReply in the common case where there are no events, which would improve things for functions like XInternAtoms that use async replies. It wouldn't help the common case where async replies aren't being used, though: they'd still have an extra read syscall per reply.

Ideally we'd change the semantics of xcb_poll_for_event to never read, which would probably require introducing some sort of xcb_force_read function. Unfortunately there may be code relying on this side effect of xcb_poll_for_event, and it's hard to tell which callers care, making that approach painful. For example, _XEventsQueued in xcb_io.c is relying on this side effect, but _XReadEvents is not... I think?

The other approach is to introduce a response queue API. Xlib would inform XCB, "I want all events and any responses to these requests that I issue," and XCB would feed that set of responses into a queue without separating replies, errors, or events. Such an API would eliminate a lot of the XCB-related code in Xlib; fix the one remaining Xlib/XCB threading bug I know about; fix this excessive-read annoyance; and hopefully turn out to be useful to other callers too, such as cairo. But somebody needs to write the code.
Comment 3 Robert White 2010-12-04 09:05:41 UTC
Possibly super niave question: Why not split xcb_poll_for_event into an internal and a wrapper. The wrapper would contain the whole implementation as-is by being a call to the internal (say xcb_check_for_buffered_events) and then try the file descriptor once if the check returned nada.

Now xcb_wait_for_reply could also call xcb_check_for_buffered_events before going into its poll/wait/whatever; which will naturally already have the path to include the socket in its wait-for-read list. (If it doesn't already include the socket that xcb_poll_for_event does the non-blocking read upon, then this could deadlock anyway, or it is already ready to absorb the hit for unread events that could happen after xcb_poll_for_event EAGAIN(ed) anyway. In either case it has zero repeatable impact to skip the actual/expensive failed read.)

The always-oughta rules for poll safe polling are:
(1) you have to check your pre-read data before it is safe to call poll/epoll, and (2) you always-oughta include all the file descriptors that can feed the pre-read data pool you check before a poll in the poll itself.

Every path that looks like "buffer_check-maybe_return-nb_read_into_buffer-maybe_return-poll-nb_read_into_buffer-maybe_return" is a priori wrong compared to "buffer_check-maybe_return-poll-nb_read_into_buffer-maybe_return".

So split out the buffer check part of xcb_poll_for_event. Now you don't need a whole buffer select API, you just process what you got or you poll-wait if you got nothing buffered. You do this buffer check for all paths, and you decide to read/poll only once on any path.
Comment 4 Rami Ylimaki 2010-12-07 01:37:37 UTC
(In reply to comment #3)
> Possibly super niave question: Why not split xcb_poll_for_event into an
> internal and a wrapper. The wrapper would contain the whole implementation
> as-is by being a call to the internal (say xcb_check_for_buffered_events) and
> then try the file descriptor once if the check returned nada.

This is how xcb_poll_for_event() is working currently. It's calling internal get_event() to read a buffered event before trying to read from the connection once.

I actually implemented a public xcb_poll_for_queued_event() function that never tries to read anything from the connection. Using this in _XReply() and _XReadEvents() solves the excessive system call problem when requests are done and events read in blocking fashion.

> Now xcb_wait_for_reply could also call xcb_check_for_buffered_events before
> going into its poll/wait/whatever; which will naturally already have the path
> to include the socket in its wait-for-read list. (If it doesn't already include
> the socket that xcb_poll_for_event does the non-blocking read upon, then this
> could deadlock anyway, or it is already ready to absorb the hit for unread
> events that could happen after xcb_poll_for_event EAGAIN(ed) anyway. In either
> case it has zero repeatable impact to skip the actual/expensive failed read.)
> 
> The always-oughta rules for poll safe polling are:
> (1) you have to check your pre-read data before it is safe to call poll/epoll,
> and (2) you always-oughta include all the file descriptors that can feed the
> pre-read data pool you check before a poll in the poll itself.
> 
> Every path that looks like
> "buffer_check-maybe_return-nb_read_into_buffer-maybe_return-poll-nb_read_into_buffer-maybe_return"
> is a priori wrong compared to
> "buffer_check-maybe_return-poll-nb_read_into_buffer-maybe_return".
> 
> So split out the buffer check part of xcb_poll_for_event. Now you don't need a
> whole buffer select API, you just process what you got or you poll-wait if you
> got nothing buffered. You do this buffer check for all paths, and you decide to
> read/poll only once on any path.

If I understood your comments correctly, the buffer check part has been already separated in XCB. Read system calls aren't executed unless necessary. The original problem comes from the fact that the current public XCB API doesn't make it easy to implement Xlib so that excessive reading on the connection could be avoided.

Adding something like xcb_poll_for_queued event() to the XCB public interface seems to fix the problem. I haven't posted my patches to the mailing list yet, I will do so when I find time for that. However, the XCB maintainers might have other (more generic) plans regarding this problem so my patches won't be necessarily accepted.
Comment 5 Jeremy Huddleston 2011-10-09 01:47:06 UTC
Jamey, it's been about a year since this was updated... any thoughts on how to 
address this?
Comment 6 Rami Ylimaki 2011-10-10 00:16:40 UTC
(In reply to comment #5)
> Jamey, it's been about a year since this was updated... any thoughts on how to 
> address this?

The excessive reads are fixed by these patches:

XCB:  http://lists.freedesktop.org/archives/xcb/2011-March/006852.html
XLIB: http://lists.freedesktop.org/archives/xcb/2011-March/006864.html

The first one seems to have been applied to libxcb, but the latter one hasn't been applied to libX11 yet.
Comment 7 Rami Ylimaki 2011-10-10 00:25:36 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Jamey, it's been about a year since this was updated... any thoughts on how to 
> > address this?
> 
> The excessive reads are fixed by these patches:
> 
> XCB:  http://lists.freedesktop.org/archives/xcb/2011-March/006852.html
> XLIB: http://lists.freedesktop.org/archives/xcb/2011-March/006864.html
> 
> The first one seems to have been applied to libxcb, but the latter one hasn't
> been applied to libX11 yet.

And now I remember that the libX11 patch hasn't been applied because no tagged version of libxcb has the required fix yet. People don't want libX11 to depend on a fix that hasn't been officially released yet.
Comment 8 Alan Coopersmith 2011-10-10 10:53:09 UTC
(In reply to comment #7)
> And now I remember that the libX11 patch hasn't been applied because no tagged
> version of libxcb has the required fix yet. People don't want libX11 to depend
> on a fix that hasn't been officially released yet.

As a hard dependency, I agree, but if you #ifdef'ed it so that libX11 could
build fine on older libxcb, I don't see why we wouldn't take it in libX11 now,
while waiting for xcb to sort its release out.
Comment 9 Uli Schlachter 2012-07-05 12:29:58 UTC
*** Bug 51648 has been marked as a duplicate of this bug. ***