Follow-up of GNOME bug http://bugzilla.gnome.org/show_bug.cgi?id=314616 and Mandriva bug http://qa.mandriva.com/show_bug.cgi?id=17723

Radeon acceleration is not enabled when XRenderComposite is used to render the pixmap background in Nautilus (with GTK+ 2.8 + Cairo, when the Cairo workaround for old, broken Render implementations is not enabled). Acceleration is back to normal after starting glxinfo (which seems to reset acceleration in the driver). Disabling acceleration in the Radeon driver also fixes the problem.

Card used: Radeon 9200SE
See also bug #4456
This bug can be worked around by running glxinfo (which seems to reset the Radeon acceleration code) or by adding Option "NoAccel" "true" to xorg.conf.
Created attachment 4388 [details] big-fill.c

This test uses the same sequence of Xlib/cairo calls that Nautilus uses when painting its background. It shows how cairo_fill() of a non-alpha rectangle takes over 1 second.

Per Vlad's suggestion, I turned on Option "AccelMethod" "EXA" for my Radeon card. This makes things fast. But still, the non-accelerated compositor should do better, since there is no compositing to be done (it is a pixmap-to-pixmap copy).

KeithP and CWorth say there may be a code path where the RENDER implementation doesn't detect that it can simply use XCopyArea(). This comment shows where this happens in the server: http://bugzilla.gnome.org/show_bug.cgi?id=314616#c9
I've tried Federico's code. In theory his comment is correct, but in practice, it does not work. Without EXA: ~0.72s for fill. With EXA: ~0.03s for fill. Nevertheless, dragging windows around in GNOME becomes painfully slow with EXA on. Composite and Render don't seem to affect these things. However xcompmgr further slows that down to ~1 frame per second when dragging a relatively small window.
I've enabled EXA on my system with X.org 6.9 final and it is still slow :(
I think this email I responded to on the GNOME desktop-devel list narrows things down. It is perhaps some interaction between Nautilus and the X.org acceleration/EXA code:

James Livingston wrote:
> One thing I noticed is that the time is greatly affected by whether
> Nautilus is drawing the desktop or not. I normally don't, but when
> turned on the time was up to around a second. Drawing the icons and text
> might take extra time, but is there something Nautilus is doing that
> causes it to go that much slower?

BINGO. This narrows in on the culprit. Disabling "show_desktop" makes the whole desktop 3-4 times more snappy, especially with EXA. It appears that (at least with radeon), nautilus' desktop drawing breaks very drastically. But even with top-of-the-line nVidia (with closed driver), desktop scaling speed is very much improved without nautilus.

--Pat
This is also discussed in SUSE's bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=117163

What I found out so far: The test case in attachment #4388 [details] only reveals this problem on configurations with a framebuffer width of 1400 or higher. In both cases (fast+slow) fbCopyAreammx (which doesn't use MMX at all BTW) is called. Running glxinfo could also trigger this by allocating more graphics memory and leaving no space for the pixmaps any longer. Though this is only a very rough guess.

For 1280x1024 framebuffers the off-screen pixmap seems to be created in regular memory (src+dest bytes):

fbCopyAreammx: src 0xaf4d3008 0/0 dest 0xaef5f008 0/0 size 1400/1021
fbCopyAreammx: src bytes 0xaf4d3060 stride 15e0 dest bytes 0xaef5f060 stride 15e0 byte_width 15e0
fbCopyAreammx: 0.037907s
aef5f000-afa6f000 rw-p aef5f000 00:00 0
afa6f000-b7a6f000 rw-s d8000000 03:02 26724 /dev/mem

while for 1400x1050 the pixmap is created in card memory:

fbCopyAreammx: src 0x820fcc0 0/1055 dest 0x820fd20 0/2105 size 1400/1021
fbCopyAreammx: src bytes 0xb0048a00 stride 1600 dest bytes 0xb05ec600 stride 1600 byte_width 15e0
fbCopyAreammx: 0.981807s
081c8000-0838a000 rw-p 081c8000 00:00 0 [heap]
afa9e000-b7a9e000 rw-s d8000000 03:02 26724 /dev/mem

This happens because the requested pixmaps are of size 1400x1021 and 1400x1050. AFAIR the current memory manager is stride based, that is, it can only allocate pixmaps in graphics memory up to the framebuffer width.

Card memory is incredibly slow to read, that's what's hitting us here. So what actually should happen is that the XAA CopyArea function should have been called, which does everything in graphics memory with the GPU.

We still have one problem: what shall we do if we have to copy a pixmap from GPU memory to host memory? This will always be slow.
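The numbers in the log above can be checked by hand. The following is only a sketch, assuming 4 bytes per pixel (which is what the byte_width of 0x15e0 for a 1400-pixel-wide copy suggests) and the simplified rule stated above that a pixmap only fits in off-screen memory when it is no wider than the framebuffer. The function names are hypothetical and not part of XAA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit pixels, i.e. 4 bytes each. */
enum { BYTES_PER_PIXEL = 4 };

/* Number of whole pixels that fit in one scanline of the given stride. */
static uint32_t stride_to_pixels(uint32_t stride_bytes)
{
    return stride_bytes / BYTES_PER_PIXEL;
}

/* Simplified model of a stride-based off-screen allocator: a pixmap can
 * only live in graphics memory if it is no wider than the framebuffer.
 * (Hypothetical helper; the real allocator also needs free space.) */
static bool fits_in_offscreen(uint32_t fb_width_px, uint32_t pixmap_width_px)
{
    return pixmap_width_px <= fb_width_px;
}
```

With these assumptions, stride 0x15e0 corresponds to 1400 pixels (the host-memory pixmap) and stride 0x1600 to 1408 pixels (presumably the padded framebuffer pitch), and a 1400-pixel-wide pixmap fits off-screen on a 1400-wide framebuffer but not on a 1280-wide one.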
Created attachment 4545 [details] Updated big-fill.c OK, my example program has hardcoded values for my particular screen resolution (1400x1050) :) Here is an updated version which picks up your screen resolution and uses values based on that. Hopefully that will be more useful in diagnosing the problem.
(In reply to comment #7)
> Card memory is incredibly slow to read, that's what's hitting us here. So what
> actually should happen is that the XAA CopyArea function should have been
> called, which does everything in graphics memory with the GPU.

This makes a lot of sense. As Keith and Carl mentioned once, there is code in the server-side implementation of RENDER that needs to detect that it is copying a non-alpha pixmap to another pixmap, and it can just use CopyArea instead of copying the pixels by hand.

We *always* used XCopyArea() from the client in the past, since we didn't use Cairo. Even then, however, we had no guarantee that both pixmaps would be in graphics memory. Maybe we got lucky in that both pixmaps always lived in the graphics card. Or does CopyArea have some magic that makes copies fast from graphics memory to plain memory?
(In reply to comment #8)
> Updated big-fill.c
>
> OK, my example program has hardcoded values for my particular screen resolution
> (1400x1050) :)

I noticed, and that was actually good, because it pointed me in the right direction (by not exposing the bug to me first ;)

(In reply to comment #9)
> This makes a lot of sense. As Keith and Carl mentioned once, there is code in
> the server-side implementation of RENDER that needs to detect that it is copying
> a non-alpha pixmap to another pixmap, and it can just use CopyArea instead of
> copying the pixels by hand.

Actually, that is (partially) already done in XComposite, Cairo only hit a corner case that wasn't accelerated.

I would call this a bug in Cairo as well, 1st) because it uses PictOpOver with a source without alpha and without a mask (should do PictOpSrc in this case), 2nd) because it enables repeat even for 1:1 copies.

Anyone willing to report this to the Cairo guys? I know David is working on glitz, but this should be detected and worked around earlier in the library.

Anyway, a patch for Xorg is pending which makes this path fast and the code ugly (there is a *long* if() statement).

> Cairo. Even then, however, we had no guarantee that both pixmaps would be in
> graphics memory. Maybe we got lucky in that both pixmaps always lived in the
> graphics card. Or does CopyArea have some magic that makes copies fast from
> graphics memory to plain memory?

As long as the pixmap was not wider than the framebuffer, you would typically get them in gfx memory, using the fully accelerated CopyArea. Typically.
Created attachment 4547 [details] [review] Proposed patch for improving XAAComposite Fastpath

This patch improves the fastpath in XAAComposite so that this corner case (yes, it is a corner case) is accelerated as well.

This patch should be discussed here, as I'm not 100% sure about the meaning of some elements (pDrawable->x & y). As the slow path is definitely broken WRT some others (see bug #5796), this shouldn't create any regressions, though.

Results here (Radeon 7500):
Slow path (Framebuffer width < Pixmap width): 37ms
Fast path (Framebuffer width >= Pixmap width):
  without patch: 900-1200ms
  with patch: 1-1.2ms
Acceleration factor: approx. 1000 =-)
If no one objects, I will submit this to CVS during the next week.
(In reply to comment #10)
> I would call this a bug in Cairo as well, 1st) because it uses PictOpOver with a
> source without alpha and without a mask (should do PictOpSrc in this case), 2nd)
> because it enables repeat even for 1:1 copies.

The first one, yes, cairo could drop to PictOpSrc if the source surface has no alpha even if the user selects CAIRO_OPERATOR_OVER.

As for repeat, I don't think cairo sets that unless the application asks for it. I suppose we could do some fairly complicated check to determine that it won't actually _need_ repeating, but I think this would actually be more robust either above in the application (e.g. nautilus), or below in the X server. In either of those cases there are fewer coordinate systems involved so it's going to be easier to get the will-need-repeat analysis correct.

> Anyone willing to report this to Cairo guys?

I'm following this bug at least. If you'd like to report anything more focused and cairo-specific in cairo's bugzilla, then I'd be happy to track them as well.

-Carl
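The operator reduction discussed here can be sketched in a few lines. OVER computes dst = src + (1 - src_alpha) * dst, so with an opaque source and no mask the result is simply src. This is an illustrative model only: the enum is a local stand-in rather than the real PictOp* or CAIRO_OPERATOR_* constants, and reduce_operator is a hypothetical name.

```c
#include <stdbool.h>

/* Local stand-ins for the two operators of interest (not the real
 * Render or cairo constants). */
typedef enum { OP_SRC, OP_OVER } render_op;

/* OVER with an opaque source and no mask gives the same pixels as SRC,
 * so the cheaper operator can be substituted. Anything else is left
 * untouched. */
static render_op reduce_operator(render_op op, bool src_has_alpha,
                                 bool have_mask)
{
    if (op == OP_OVER && !src_has_alpha && !have_mask)
        return OP_SRC;
    return op;
}
```

A mask has to block the reduction because the mask's coverage then plays the role of alpha, even when the source itself is opaque.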
(In reply to comment #13)
> As for repeat, I don't think cairo sets that unless the application asks for it.

I didn't do an in-depth analysis of big-fill, but in the Xserver I get a Composite request with src_repeat set. Maybe Federico can comment on what his source is doing here.

> I suppose we could do some fairly complicated check to determine that it won't
> actually _need_ repeating, but I think this would actually be more robust either
> above in the application (eg. nautilus), or below in the X server. In either of
> those cases there are fewer coordinate systems involved so it's going to be
> easier to get the will-need-repeat analysis correct.

Right, but the current test in the Xserver will only work as long as no transformations are involved either. I agree it is the application which should *not* use repeat if at all possible.

> > Anyone willing to report this to Cairo guys?
>
> I'm following this bug at least. If you'd like to report anything more focused
> and cairo-specific in cairo's bugzilla, then I'd be happy to track them as well.

I won't have time during the next few weeks, so I can only leave this to you folks. It all depends on how bugzilla-focused your development system is. It's not a time-critical thing, as we will now have a workaround in the Xserver for the most common case.
That patch is very interesting. Thanks a lot for cooking it, Matthias :) Will it also handle the case where we *do* need to repeat the source? You can test this by making the back_pixmap smaller than temp_pixmap in big-fill.c - just make it 300x300 or so. Carl: you asked me to test OPERATOR_SOURCE in big-fill.c, and it made no difference.
(In reply to comment #15)
> That patch is very interesting. Thanks a lot for cooking it, Matthias :)
>
> Will it also handle the case where we *do* need to repeat the source? You can
> test this by making the back_pixmap smaller than temp_pixmap in big-fill.c -
> just make it 300x300 or so.

There should be a similar fix possible for the X server to handle the repeat case (which is to make the server act the same as if doing an XFillRectangle under the influence of XSetFillStyle(..., FillTiled)).

> Carl: you asked me to test OPERATOR_SOURCE in big-fill.c, and it made no difference.

OK.

We already have cairo-side optimizations for both non-repeat and repeating cases. The test for the non-repeating case looks like this:

    if (!have_mask &&
        is_integer_translation &&
        src_attr->extend == CAIRO_EXTEND_NONE &&
        !needs_alpha_composite &&
        _surfaces_compatible(src, dst))
    {
        return DO_XCOPYAREA;
    }

where needs_alpha_composite is set by:

    if (op == CAIRO_OPERATOR_SOURCE ||
        (!surface_has_alpha &&
         (op == CAIRO_OPERATOR_OVER ||
          op == CAIRO_OPERATOR_ATOP ||
          op == CAIRO_OPERATOR_IN)))
        return FALSE;

and hopefully the surface_has_alpha flag is being set properly (your change to use CAIRO_OPERATOR_SOURCE suggests that surface_has_alpha is not the problem).

When the tests above succeed, cairo uses XCopyArea instead of XRenderComposite.
For the repeat case, the test in cairo looks like:

    if (is_integer_translation &&
        src_attr->extend == CAIRO_EXTEND_REPEAT &&
        (src->width != 1 || src->height != 1))
    {
        if (!have_mask &&
            !needs_alpha_composite &&
            _surfaces_compatible (dst, src))
        {
            return DO_XTILE;
        }

        return DO_UNSUPPORTED;
    }

And if this succeeds then instead of XRenderComposite, cairo calls into:

    XSetTSOrigin (dst->dpy, dst->gc,
                  - (itx + src_attr.x_offset), - (ity + src_attr.y_offset));
    XSetTile (dst->dpy, dst->gc, src->drawable);
    XSetFillStyle (dst->dpy, dst->gc, FillTiled);

    XFillRectangle (dst->dpy, dst->drawable, dst->gc,
                    dst_x, dst_y, width, height);

So, if you're getting XRenderComposite called in either of these cases, there is a bug in some of the logic that feeds the tests I mention above.

Beyond that, even if cairo didn't do any of these optimizations, it would be good if the server optimized these cases itself. Matthias has proposed a patch for the XCopyArea case, and something more should be needed for the FillTiled case.
If I turn off CAIRO_EXTEND_REPEAT in big-fill.c, it becomes fast. GTK+ needs REPEAT turned on in the problematic code path in Nautilus for the following reason. When it gets an exposure event, GTK+ creates a temporary pixmap for double-buffering. The desktop window (almost the size of the root window) has a background pixmap with a photo of your dog. GTK+ clears this temporary double-buffer pixmap by filling it with the background pixmap, using a Cairo pattern, cairo_rectangle(), cairo_fill(). You can see where this is going. I'm putting a workaround in GTK+ for this (at least in our Novell package); I'll attach it in a second.
(In reply to comment #16)
>
> For the repeat case, the test in cairo looks like:
> if (is_integer_translation &&
> src_attr->extend == CAIRO_EXTEND_REPEAT &&
> (src->width != 1 || src->height != 1))
> {
> if (!have_mask &&
> !needs_alpha_composite &&
> _surfaces_compatible (dst, src))
> {
> return DO_XTILE;
> }
>
> return DO_UNSUPPORTED;
> }

Actually, there's a subtle issue here. If the server is characterized as not having "buggy_repeat" (that is Xorg > 6.8.2 or XFree86 4.5.0) then after the XCopyArea optimization check cairo currently decides to go immediately to XRenderComposite before even checking to see if FillTile with XFillRectangle might do the trick.

We can certainly change that if we have evidence that it will help.

Oddly enough, I have access to a machine now (with Xorg 7.0.0) for which making cairo use either of XCopyArea, XFillRectangle is never faster than using XRenderComposite---always resulting in more than 2 seconds of time for the operation. This is with the big-fill test case, with CAIRO_EXTEND_REPEAT or CAIRO_EXTEND_NONE.

I'm not sure why the X server would always be so slow, but I also don't know what more cairo could do in a situation like this.

-Carl

I'm surprised to see all of those paths

> And if this succeeds then instead of XRenderComposite, cairo calls into:
>
> XSetTSOrigin (dst->dpy, dst->gc,
> - (itx + src_attr.x_offset), - (ity + src_attr.y_offset));
> XSetTile (dst->dpy, dst->gc, src->drawable);
> XSetFillStyle (dst->dpy, dst->gc, FillTiled);
>
> XFillRectangle (dst->dpy, dst->drawable, dst->gc,
> dst_x, dst_y, width, height);
>
> So, if you're getting XRenderComposite called in either of these cases, there is
> a bug in some of the logic that feeds the tests I mention above.
>
> Beyond that, even if cairo didn't do any of these optimizations, it would be
> good if the server optimized these cases itself. Matthias has proposed a patch
> for the XCopyArea case, and something more should be needed for the FillTiled case.
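The ordering issue described in this comment is easier to see when the three checks are put into one function. The following is a simplified model, not cairo's actual code: the struct, the enum names and choose_path are hypothetical, and the final fall-through to Render is an assumption, since the thread does not show what cairo does when neither test matches.

```c
#include <stdbool.h>

typedef enum { EXTEND_NONE, EXTEND_REPEAT } extend_mode;
typedef enum {
    PATH_XCOPYAREA,         /* core XCopyArea */
    PATH_XTILE,             /* XSetTile + FillTiled + XFillRectangle */
    PATH_RENDER_COMPOSITE,  /* XRenderComposite */
    PATH_UNSUPPORTED        /* caller has to fall back some other way */
} draw_path;

/* Hypothetical bundle of the flags used by the tests quoted above. */
typedef struct {
    bool have_mask;
    bool is_integer_translation;
    bool needs_alpha_composite;
    bool surfaces_compatible;
    extend_mode extend;
    int src_width, src_height;
    bool buggy_repeat;      /* server has the old, broken repeat */
} composite_request;

static draw_path choose_path(const composite_request *r)
{
    /* 1. Non-repeating case: plain copy. */
    if (!r->have_mask && r->is_integer_translation &&
        r->extend == EXTEND_NONE && !r->needs_alpha_composite &&
        r->surfaces_compatible)
        return PATH_XCOPYAREA;

    /* 2. The subtle issue: without buggy_repeat the tiled-fill test
     *    below is never reached. */
    if (!r->buggy_repeat)
        return PATH_RENDER_COMPOSITE;

    /* 3. Repeating case: tiled fill, only tried on buggy servers. */
    if (r->is_integer_translation && r->extend == EXTEND_REPEAT &&
        (r->src_width != 1 || r->src_height != 1)) {
        if (!r->have_mask && !r->needs_alpha_composite &&
            r->surfaces_compatible)
            return PATH_XTILE;
        return PATH_UNSUPPORTED;
    }

    /* Assumption: everything else goes to Render. */
    return PATH_RENDER_COMPOSITE;
}
```

In this model the big-fill request (opaque, unmasked, integer translation, EXTEND_REPEAT) ends up in XRenderComposite on a server without buggy_repeat, and becomes a plain copy as soon as the extend mode is EXTEND_NONE.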
(In reply to comment #16)
> > Will it also handle the case where we *do* need to repeat the source? You can
> > test this by making the back_pixmap smaller than temp_pixmap in big-fill.c -
> > just make it 300x300 or so.
>
> There should be a similar fix possible for the X server to handle the repeat
> case, (which is to make the server act the same as if doing an XFillRectangle
> under the influence of XSetFillStyle(..., FillTiled).

I can look into this. However, AFAIR this is often not accelerated. So perhaps we should check for the size of the source tile and according to some heuristics just call multiple XCopyArea instead (which are likely to be accelerated).

> > Carl: you asked me to test OPERATOR_SOURCE in big-fill.c, and it made no
> difference.

Sure, as repeat is on and that case was not handled in the PictOpSrc case as well.

> We already have cairo-side optimizations for both non-repeat and repeating
> cases. The test for the non-repeating case looks like this:

AFAICS the optimization in X has been exactly the same, so it was actually never to be triggered :-P

> where needs_alpha_composite is set by:
>
> if (op == CAIRO_OPERATOR_SOURCE ||
> (!surface_has_alpha &&
> (op == CAIRO_OPERATOR_OVER ||
> op == CAIRO_OPERATOR_ATOP ||
> op == CAIRO_OPERATOR_IN)))
> return FALSE;

You could actually just set op to CAIRO_OPERATOR_SOURCE if this if-statement is true. That would help for all cases.

> For the repeat case, the test in cairo looks like:
[...]
> And if this succeeds then instead of XRenderComposite, cairo calls into:
>
> XSetTSOrigin (dst->dpy, dst->gc,
> - (itx + src_attr.x_offset), - (ity + src_attr.y_offset));
> XSetTile (dst->dpy, dst->gc, src->drawable);
> XSetFillStyle (dst->dpy, dst->gc, FillTiled);
>
> XFillRectangle (dst->dpy, dst->drawable, dst->gc,
> dst_x, dst_y, width, height);

This can easily be not accelerated in the driver. Checking for potential acceleration is not that trivial (at least I don't know how to do it right now).
I'll discuss this at the Xorg Developer Conference.

> Beyond that, even if cairo didn't do any of these optimizations, it would be
> good if the server optimized these cases itself. Matthias has proposed a patch
> for the XCopyArea case, and something more should be needed for the FillTiled case.

Personally I feel this should always be something for the Xserver, because 1) it would help non-cairo applications as well, and 2) the Xserver can more easily decide whether some acceleration is available and either use it or work around it.

(In reply to comment #18)
> Actually, there's a subtle issue here. If the server is characterized as not
> having "buggy_repeat" (that is Xorg > 6.8.2 or XFree86 4.5.0) then after the
> XCopyArea optimization check cairo currently decides to go immediately to
> XRenderComposite before even checking to see if FillTile with XFillRectangle
> might do the trick.
>
> We can certainly change that if we have evidence that it will help.

I'm not entirely sure. Though it certainly wouldn't do any harm in the current situation, except for the EXA driver maybe.

> Oddly enough, I have access to a machine now (with Xorg 7.0.0) for which making
> cairo use either of XCopyArea, XFillRectangle is never faster than using
> XRenderComposite---always resulting in more than 2 seconds of time for the
> operation. This is with the big-fill test case, with CAIRO_EXTEND_REPEAT or
> CAIRO_EXTEND_NONE.

Can you check whether that machine uses EXA? AFAIK EXA currently *only* accelerates Render.

> I'm not sure why the X server would always be so slow, but I also don't know
> what more cairo could do in a situation like this.

I guess the best thing would be to have the decision in the Xserver and cairo should always do Composite. It should also do some trivial optimizations like OPERATOR_SOURCE and removing REPEAT if not necessary, but the Xserver should do these itself as well.
However, we should first get these optimizations into the Xserver, and remove these acceleration hacks in cairo after the optimizations have been established. Maybe even with something like another version check.
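The idea raised earlier in this comment, replacing an unaccelerated tiled fill with several XCopyArea-style blits, boils down to enumerating one copy rectangle per tile. Below is a rough sketch under the assumption that the tile origin coincides with the destination's top-left corner; copy_rect and tile_as_copies are hypothetical names, and nothing here is taken from the server or from cairo.

```c
#include <stddef.h>

/* One blit: copy width x height pixels from (src_x, src_y) in the tile
 * to (dst_x, dst_y) in the destination. */
typedef struct {
    int src_x, src_y;
    int dst_x, dst_y;
    int width, height;
} copy_rect;

/* Enumerate the copies needed to tile a dst_w x dst_h area with a
 * tile_w x tile_h tile. Writes at most max entries to out (out may be
 * NULL to only count) and returns the total number of copies needed. */
static size_t tile_as_copies(int tile_w, int tile_h, int dst_w, int dst_h,
                             copy_rect *out, size_t max)
{
    size_t n = 0;

    if (tile_w <= 0 || tile_h <= 0)
        return 0;

    for (int y = 0; y < dst_h; y += tile_h) {
        /* Clip the last row of tiles to the destination height. */
        int h = (dst_h - y < tile_h) ? dst_h - y : tile_h;
        for (int x = 0; x < dst_w; x += tile_w) {
            /* Clip the last column of tiles to the destination width. */
            int w = (dst_w - x < tile_w) ? dst_w - x : tile_w;
            if (out != NULL && n < max)
                out[n] = (copy_rect){ 0, 0, x, y, w, h };
            n++;
        }
    }
    return n;
}
```

A heuristic could call this with out set to NULL first and only take the multi-copy route when the count stays small; for the 300x300 tile on a 1400x1050 destination mentioned in the thread, that is 5 x 4 = 20 copies.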
(In reply to comment #19)
>
> I guess the best thing would be to have the decision in the Xserver and cairo
> should always do Composite.

Historically, that's what cairo has done. The difficulty is when Composite is actually slower than alternate, existing code paths in deployed servers. That's the justification for things like cairo's XCopyArea optimization.

Putting stuff like this into cairo is "dangerous" though as it's not future proof, and may end up doing something slower in the future as new Render acceleration is added while the support for old core requests stagnates.

So, as you said, the trick is in how one can characterize the performance of Render.

> However, we should first get these optimizations in the Xserver, and remove
> these acceleration hacks in cairo after the optimizations have been
> established. Maybe even something like another version check.

Makes sense to me. I think it would be legitimate for Render versions to advertise things like "Any version of Render >= X.Y.Z optimizes any OVER compositing with an opaque source and an identity transformation as a simple copy" or whatever. That would at least provide some guidance for making decisions in things like cairo.
Hi, Carl was playing around remotely on my laptop. The X server he was playing on was using XAA in a dual head (merged fb) setup. At the time there was another X server on a different VT, that was also using XAA but had the NoOffscreenPixmaps option enabled. The X server Carl was using had NoOffscreenPixmaps disabled (the default).
(In reply to comment #12)
> If no one objects I will submit this to CVS during the next week.

trivially correct; please commit.
(In reply to comment #19)
> Can you check that machine whether it uses EXA? AFAIK EXA currently *only*
> accelerates Render.

No, it accelerates solid fills and XCopyArea-style blits as well.

- ajax
(In reply to comment #20)
> Makes sense to me. I think it would be legitimate for Render versions to
> advertise things like "Any version of Render >= X.Y.Z optimizes any OVER
> compositing with an opaque source and an identity transformation as a simple
> copy" or whatever. That would at least provide some guidance for making
> decisions in things like cairo.

I go back and forth on this issue and I usually land on the side of not advertising these sorts of details. There are a few reasons:

- We're not the only Render implementation in the world (XiG for example)
- Acceleration architecture has significant impact on what paths are considered optimized
- OpenGL doesn't
- It's completely inappropriate to overload the API or protocol version numbers to indicate performance hints

I might accept advertising something like a build number, that when combined with the server's existing version information would give a fairly good idea of what the performance characteristics are. Although, to get the complete profile you'd have to have a fairly long tuple: server build, Render build, acceleration architecture name and build, driver name and build, and hardware subclass. There's probably some good justification for exporting this sort of information from the server but it seems far beyond our scope here.

For this bug in particular I'm inclined to say "don't use a broken X" and rely on the distro path to get patches backported. This seems a clear candidate for the next stable server release, for example.

And for what it's worth I'm working on translating a slightly more general form of this optimization to EXA, which already has a Composite op reduction stage of a sort.
It looks like there's a minor issue in the diff. Remember that an x8r8g8b8 source must be treated as if alpha was 1.0, so drawing that to an a8r8g8b8 dst using a straight copy for Over or Src would be wrong, as the dest wouldn't end up with a correct alpha channel. Other than that, it looks good to me.
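The alpha issue can be illustrated at the level of a single pixel. In x8r8g8b8 the top byte is padding with no defined value, so a byte-for-byte copy into an a8r8g8b8 destination would carry whatever happens to be there into the alpha channel. A correct conversion has to force alpha to fully opaque. This is a minimal sketch with a hypothetical helper name, not code from the patch:

```c
#include <stdint.h>

/* Convert one x8r8g8b8 pixel to a8r8g8b8: the unused top byte is
 * replaced by 0xff, i.e. alpha = 1.0, whatever it contained before. */
static uint32_t x8r8g8b8_to_a8r8g8b8(uint32_t pixel)
{
    return pixel | 0xff000000u;
}
```

The colour bytes pass through unchanged, which is why the fast copy is fine when source and destination formats match and only goes wrong when the destination actually interprets the top byte.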
*** Bug 5289 has been marked as a duplicate of this bug. ***
Finally updated the patch and committed it.

I also think I can commit the following fastpath corner case improvement (as x is not specified in xrgb), even though it *will* behave differently compared to fb (x is initialized with 0xff in fbStore_x8r8g8b8):

RCS file: /cvs/xorg/xserver/xorg/hw/xfree86/xaa/xaaPict.c,v
retrieving revision 1.12
diff -u -p -r1.12 xaaPict.c
--- hw/xfree86/xaa/xaaPict.c 11 May 2006 10:18:08 -0000 1.12
+++ hw/xfree86/xaa/xaaPict.c 11 May 2006 10:21:57 -0000
@@ -516,7 +516,10 @@ XAAComposite (CARD8 op,
 (!pSrc->repeat || (xSrc >= 0 && ySrc >= 0 &&
 xSrc+width<=pSrc->pDrawable->width &&
 ySrc+height<=pSrc->pDrawable->height)) &&
- ((op == PictOpSrc && pSrc->format == pDst->format) ||
+ ((op == PictOpSrc &&
+ ((pSrc->format==pDst->format) ||
+ (pSrc->format==PICT_a8r8g8b8 && pDst->format==PICT_x8r8g8b8) ||
+ (pSrc->format==PICT_a8b8g8r8 && pDst->format==PICT_x8b8g8r8))) ||
 (op == PictOpOver && !pSrc->alphaMap && !pDst->alphaMap &&
 pSrc->format==pDst->format &&
 (pSrc->format==PICT_x8r8g8b8 || pSrc->format==PICT_x8b8g8r8))))
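For readers who find the long condition in the diff hard to parse, here is a standalone model of the two parts visible in the hunk: the "repeat is harmless" bounds check and the operator/format test as it looks with the proposed change applied. The enums and function names are local stand-ins invented for this sketch, and the real XAAComposite condition has further clauses before line 516 that are not shown in the diff and therefore not modelled here.

```c
#include <stdbool.h>

/* Local stand-ins for PictOp* and PICT_* (not the real constants). */
typedef enum { OP_SRC, OP_OVER, OP_OTHER } render_op;
typedef enum {
    FMT_A8R8G8B8, FMT_X8R8G8B8, FMT_A8B8G8R8, FMT_X8B8G8R8
} pict_format;

/* Repeat does not matter if the requested area lies entirely inside
 * the source drawable. */
static bool source_within_bounds(bool repeat, int x, int y, int w, int h,
                                 int src_w, int src_h)
{
    return !repeat ||
           (x >= 0 && y >= 0 && x + w <= src_w && y + h <= src_h);
}

/* Operator/format part of the fastpath test, patched version. */
static bool formats_allow_copy(render_op op, pict_format src,
                               pict_format dst, bool src_alpha_map,
                               bool dst_alpha_map)
{
    if (op == OP_SRC)
        /* Same format, or alpha simply dropped into an unused x byte. */
        return src == dst ||
               (src == FMT_A8R8G8B8 && dst == FMT_X8R8G8B8) ||
               (src == FMT_A8B8G8R8 && dst == FMT_X8B8G8R8);

    if (op == OP_OVER)
        /* Opaque source only: Over then degenerates to a copy. */
        return !src_alpha_map && !dst_alpha_map && src == dst &&
               (src == FMT_X8R8G8B8 || src == FMT_X8B8G8R8);

    return false;
}
```

Note that the opposite direction (x8r8g8b8 source, a8r8g8b8 destination) is deliberately rejected, in line with the alpha-channel concern raised in the review.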
Ajax, shall I submit to 7.1 just now before RC3?
Committed to git.

(In reply to comment #16)
> Beyond that, even if cairo didn't do any of these optimizations, it would be
> good if the server optimized these cases itself. Matthias has proposed a patch
> for the XCopyArea case, and something more should be needed for the FillTiled case.

Leaving this bug open for the FillTiled case.
What Frederic described sounds related to my infamous "very strange radeon behaviour" problem.

If I start Xorg, 2D is slow. Moving the terminals etc. is slow. But now the strange thing: if I run glxinfo it's fast(er)... for a while.

Some operations (after "speeding it up" with glxinfo) seem to put the r200 back into the "slow mode". I seem to hit these specific operations at least every few minutes or so. Especially Qt apps like Konqueror exhibit this phenomenon. On some webpages the scrolling performance is ultra slow, but after running glxinfo while on the site it's fast again.

The same strange performance bug haunts Composite/xcompmgr. Only there the difference is more extreme. I have the feeling GNOME applications are more immune to this problem but I'm not 100% sure.

I reported the problem a while ago but nobody seemed to understand what I meant (perhaps because of my bad English). My card is a Radeon RV280 (rev 01) on AMD64.

I hope this helps.
I forgot to mention that I of course tested the latest GIT version.
(In reply to comment #30)
> If I start Xorg, 2D is slow. Moving the Terminals etc. is slow. But now the
> strange thing: If I run glxinfo it's fast(er)... for a while..
>
> Some operations (after "speeding it up" with glxinfo) seem to put the r200
> back into the "slow mode". I seem to hit these specific operations at least
> every few minutes or so.

This seems to be related to how full of pixmaps your video memory is. Here's a cumbersome but more or less reliable way to reproduce the slowdown:

1. Start X. See that "moving terminals" is fast.

2. Start Firefox. Load one of those pages with a *ton* of photos. Wink wink, you know what kind of page. Or open many tabs with such pages (different pages, so that they show different images). Since Firefox keeps pixmaps for all the images that are displayed on all its open tabs, it's easy to fill up VRAM like this.

3. See that "moving terminals" is slow again.

I have no idea if running glxinfo will "make it fast" again.
I don't think these last few comments are the same issue.
Sorry about the phenomenal bug spam, guys. Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Now that evince uses cairo, I have a problem very similar to this: when opening a PDF and moving a terminal over it, everything gets stuck and CPU usage becomes very high! The same thing happens when scrolling the
I also have the Firefox scrolling problem on pages with big images.
And I also have big performance issues when using compiz.
Can my problems be related to this bug? (Especially the evince one: being able to view PDFs is vital!)
The original patch appears in git (ea5e0eab), XAA is (hopefully) deprecated for most people, no followup patches have been posted, and "recent" comments appear to be horribly off-track. Closing; if anybody has further patches, send them to the mailing list.