A measurable improvement in performance results from changes in the generated
code like the attached patch. The idea is to take all the core protocol requests
that have only constant data and bypass any data-dependent processing in the XCB
core. This only works for request functions that take no arguments and do not
have to set an extension-specific major number. There are 13 such requests:
XCBGrabServer, XCBUngrabServer, XCBGetInputFocus, XCBQueryKeymap,
XCBGetFontPath, XCBListExtensions, XCBGetKeyboardControl, XCBGetPointerControl,
XCBGetScreenSaver, XCBListHosts, XCBGetPointerMapping, XCBGetModifierMapping,
Created attachment 5399
Patch to make XCBNoOperation use XCB_REQUEST_RAW flag.
I provided measurements of the effect of fixing both this bug and #10167 in the message "best-case X no-op performance measurement":
Created attachment 29041
Patch implementing usage of XCB_REQUEST_RAW
I've attached a patch implementing this.
I have no idea what it does; I've just followed Jamey's comments. :-)
Awesome, thanks Julien. :-) I don't think this patch is right though.
First, you should be able to programmatically work out when this optimization is OK, rather than listing the functions to apply it to. According to my original notes in this bug, the request a) must not be in an extension and b) must not have any parameters. I think that's more restrictive than necessary but I can't remember why I described it that way, and at least it isn't wrong.
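For illustration only, the two conditions above could be checked with a predicate like the one below instead of a hard-coded list of function names. This is a rough sketch: `struct request_desc` and `can_use_raw_request` are made-up names, not anything in XCB or its code generator, and the real generator would get this information from the protocol descriptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one protocol request, as a code generator
 * might see it. None of these names come from XCB itself. */
struct request_desc {
    const char *name;
    bool in_extension;   /* true if the request belongs to an extension */
    size_t param_count;  /* number of parameters the request function takes */
};

/* Conservative rule from the notes in this bug:
 * a) the request is not in an extension, and b) it has no parameters. */
static bool can_use_raw_request(const struct request_desc *req)
{
    assert(req != NULL);
    return !req->in_extension && req->param_count == 0;
}
```

With a check like this, a parameterless core request such as GetInputFocus would qualify, while any extension request or any request with parameters would not.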
More importantly: when you use XCB_REQUEST_RAW, xcb_send_request won't set the length or major opcode fields for you. It just sends the bag of bytes that you hand it, exactly as-is. So you can't pass it an uninitialized request; you have to set those fields yourself.
In my original illustrative patch, I set those fields by preinitializing a 'static const' request. That means fewer instructions and especially fewer stores, and is probably where the "must not have any parameters" restriction came from.
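A minimal sketch of what a preinitialized request looks like, assuming the usual 4-byte layout of a parameterless core request (opcode, one pad byte, 16-bit length in 4-byte units). The struct and helper names here are invented for the example and this is not the attached patch; in real generated code the iovec would be handed to xcb_send_request with the XCB_REQUEST_RAW flag, which is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumed wire layout of a parameterless core request: 4 bytes in total. */
struct bare_request {
    uint8_t  major_opcode;
    uint8_t  pad0;
    uint16_t length;      /* request length in 4-byte units */
};

static_assert(sizeof(struct bare_request) == 4, "request must be 4 bytes");

/* GetInputFocus is core opcode 43 and is one 4-byte unit long.
 * Everything is filled in at compile time, so nothing is stored per call. */
static const struct bare_request get_input_focus_req = { 43, 0, 1 };

/* Point an iovec at the constant request without touching its contents. */
static struct iovec bare_request_iovec(const struct bare_request *req)
{
    struct iovec v;
    v.iov_base = (void *) req;   /* iovec has no const variant */
    v.iov_len = sizeof *req;
    return v;
}
```

The point is that the opcode and length are already in the bytes being sent, which is exactly what a raw request needs.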
If you don't make the request const, then I think this optimization applies to any core protocol request that does not have variable-length parameters. Ideally you'd use 'static const' where possible as well as non-const raw requests for the others.
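As a sketch of the non-const case, assuming a core request with a single fixed-size parameter: the request can't be 'static const' because the parameter changes per call, but the opcode and length can still be set by the caller before sending it raw. MapWindow (core opcode 8, two 4-byte units) is used as the example; the struct and function names are hypothetical and the actual send through xcb_send_request is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed wire layout of a core request with one 32-bit parameter. */
struct window_request {
    uint8_t  major_opcode;
    uint8_t  pad0;
    uint16_t length;      /* request length in 4-byte units */
    uint32_t window;
};

static_assert(sizeof(struct window_request) == 8, "request must be 8 bytes");

/* Build a MapWindow request with every header field set by hand,
 * since a raw send would not fill in opcode or length for us. */
static struct window_request make_map_window_request(uint32_t window)
{
    struct window_request req;
    req.major_opcode = 8;           /* MapWindow */
    req.pad0 = 0;
    req.length = sizeof req / 4;    /* 8 bytes = 2 units */
    req.window = window;
    return req;
}
```

This costs a few stores per call compared with the constant version, which is why using 'static const' where possible is still preferable.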
If we did a little API work in core XCB, we could extend this to extension requests as well, but I don't want to think that hard yet.
IMHO this whole line of work turns out to be a bad idea. Lots of complexity for small performance gains on things that don't bottleneck anyone anyway AFAIK. Honestly, if the request doesn't involve a round trip and isn't issued by normal applications thousands of times per second, I'm really not interested in its performance.
I'm OK with the patch going in, when it's right, but if it were up to me I'd just revert it, close the bug and call it an interesting experiment.
We can probably stop micro-optimising x11perf round trips.