Summary: | Performance of cairo is not good without XRender extension | ||
---|---|---|---|
Product: | cairo | Reporter: | Ginn Chen <ginn.chen> |
Component: | xlib backend | Assignee: | Ginn Chen <ginn.chen> |
Status: | RESOLVED FIXED | QA Contact: | cairo-bugs mailing list <cairo-bugs> |
Severity: | enhancement | ||
Priority: | medium | CC: | brian.cameron, brian.lu, elaine.xiong |
Version: | 1.4.10 | ||
Hardware: | SPARC | ||
OS: | Solaris | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Speedup Frequency Chart, Firefox 3 time / Firefox 2 time * 100%
Using medialib if possible Modified according to Ginn's comments no-medialib improvement patch for xlib-surface patch for xlib-surface (addressing Behdad Esfahbod's comments) |
Description
Ginn Chen
2007-07-10 05:02:45 UTC
This issue is preventing user to use Firefox 3 on machine which doesn't have XRender extension support. (In reply to comment #0) > I can install a specific video adapter on my SPARC box to turn XRender on, and > give more test results. Do you really need to install additional hardware to "turn XRender on"? Doesn't XSun have a pure software implementation of XRender? > I think there's some work can be done in cairo lib to get the performance > better. In the general sense, yes, but in this particular case, I don't think so. The main issue here isn't cairo, although it is involved in the problem. That is, cairo (by design) tries to use the underlying graphics system where possible (win32 on windows, quartz on mac, etc). The problem here is that cairo is using an X server without modern rendering capabilities (XRender is 7 years old now). The poor performance is _primarily_ due to your X server, not cairo. Cairo was designed to match the semantics of (and thus take advantage of) the X Render extension. Compatibility with XRender-less X servers was implemented to ease the transition, but making cairo _fast_ on XRender-less X servers has never been a goal, AFAIK. (In reply to comment #2) > The poor performance is _primarily_ due to your X server, not cairo. > > Cairo was designed to match the semantics of (and thus take advantage of) the X > Render extension. Compatibility with XRender-less X servers was implemented to > ease the transition, but making cairo _fast_ on XRender-less X servers has > never been a goal, AFAIK. Perfectly said, Daniel. We'd gladly accept patches to improve the performance, but it's not something that I'm aware that anyone is planning to work on. Closing as not a bug. -Carl One thing we noticed in looking into this is when Xrender is not used, cairo is really bad about not caching GraphicContexts. When Xrender is present, it seems to nicely cache them, but when it goes into the fallback logic it always creates new graphics contexts over and over. This might be an improvement opportunity that wouldn't be too difficult to fix. Just a comment, even though I see this bug is already closed. We'll happily accept patches! (In reply to comment #4) > One thing we noticed in looking into this is when Xrender is not used, cairo is > really bad about not caching GraphicContexts. Unfortunately, recreating GCs probably accounts for only a small portion of the ~2x slowdown that is reported here. But it would be nice to be wrong. If you're serious about working on performance issues, I highly recommend a profile-based approach over the "hmmm...that code looks slow" approach. I'm surprised that we don't get many dtrace profiles from you Sun folks in these performance bug reports. Dtrace is the bomb! XSun only support XRender for some video adapter, e.g. m64, XVR-100. For modern PCI-E cards, like XVR-300, XVR-2500, they're not supported yet. Elaine Xiong at Sun did some testing with the Sun collect performance analysis tool and also used an X traffic recording tool to identify what is going on when Xrender is not being used. Even though people aren't planning on fixing this problem in the near term, does it make sense to close the bug? Perhaps this bug report should be left open so that people can discuss their findings and perhaps at some point enough understanding will show up that some low hanging fruit could be addressed. At Sun, we currently support the Xorg Xserver on the x86 platform and the Xsun Xserver on the Sparc platform. We are eventually (probably over the next year) going to migrate from Xsun to Xorg and the performance problems that Sparc users see will go away. However, any attention to improving performance would make Solaris Sparc users happier in the meantime. Note that Xsun does support Xrender on some graphics cards, but only a handful. Also it is not on by default, so users need to turn it on by hand to improve GNOME performance. Not quite ideal, but we are slowly making progress in this ara. (In reply to comment #7) > XSun only support XRender for some video adapter, e.g. m64, XVR-100. > > For modern PCI-E cards, like XVR-300, XVR-2500, they're not supported yet. IMHO, that is the real bug here. XRender support should not be so closely tied to the hardware. There are mechanisms and implementations available for supporting XRender purely in software, or am I missing something? (In reply to comment #8) > Even though people aren't planning on fixing this problem in the near term, > does it make sense to close the bug? Perhaps this bug report should be left > open so that people can discuss their findings and perhaps at some point enough > understanding will show up that some low hanging fruit could be addressed. Open or closed, we're still discussing. :-) But sure, if you think there might be more useful attention brought to the issue by leaving it open, here you go. It's definitely accurate that "Performance of cairo is not good without XRender extension", so I'll just note this as an enhancement request. Have fun with cairo, everybody, -Carl I switched to XVR-100 and redo the test. Ultra45, 2*1.6GHz SPARC CPU, 4G Mem, XVR-100 Firefox 2 with XRender off 1083.3ms Firefox 2 with XRender on 1045.4ms Firefox 3 with XRender on 862.4ms Created attachment 10676 [details]
Speedup Frequency Chart, Firefox 3 time / Firefox 2 time * 100%
Speedup Frequency Chart, Firefox 3 time / Firefox 2 time * 100%
X-axis, 100% means no speedup comparing to Firefox 2. <100% means slower, >100% means faster.
Y-axis is the pages in the test suite dropped in this area.
I used collect/analyser tools shipped in SunStudio12 to do some profiling. The most time consuming call is memcpy() and following is the related callstack: memcpy() fbBlt() fbCompositeSrcSrc_nxn() cairo_pixman_composite() cairo_image_surface_composite() cairo_surface_composite() cairo_surface_fallback_composite() cairo_surface_composite() (In reply to comment #13) > I used collect/analyser tools shipped in SunStudio12 to do some profiling. The > most time consuming call is memcpy() and following is the related callstack: That looks about right. Cairo here is filling in where XRender should go. If you're bent on turning cairo into a good implementation of XRender (instead of working on making your X server have a good implementation of XRender), you might want to look into using mediaLib to make that bitblt fast (I'm pretty sure that mediaLib could help make a really smoking version of fbCompositeSrcSrc_nxn). Brian has already looked into mediaLib integration, he just hadn't posted any convincing profiles (e.g. cairo-perf results) to justify it. Perhaps the case discussed here would make a perfect example of how mediaLib makes cairo much faster for a common case. Incidentally, I'm a little disappointed that Sun only open sourced the uninteresting version of mediaLib. I'm still waiting for the useful version to be free'd. It's almost like Sun just wanted to look the part (in the announcement), and hope that people wouldn't notice that they weren't actually open sourcing the real thing. I just hope I'm wrong. Neyond making the blit faster, you'll want to make sure it's doing no more than necessary. Everytime you hit a fallback within cairo it will be doing a sequence of: Allocate temorary image surface XGetImage from X to surface Do software rendering on surface XPutImage from surface to X Discard temporary surface and doing that many times in a loop can be quite damaging to performance, (copying the same data over and over). So an improved version would keep a persistent image surface around and not copy data that hasn't changed. But at that point you might as well just make the application explicitly use cairo's image surface backend directly and explicitly do the XPutImage when results are complete. Or, as mentioned before, just fix the broken X server so that it implements Render itself and cairo never needs to do any fallbacks for this. In response to comment #14 and the related bug #10151. The mediaLib team is planning to also release the "interesting" bits of the project sometime in the near future. There are some issues that they have been working through. I don't know the exact details, but I did ping them on this a few weeks ago and they said it should happen in the next month or so. Just, FYI, it has been my prodding the mediaLib team and the fact that it is used in GTK+ and cairo (at least in our Sun patched versions) that has caused them to recognize the value in moving mediaLib to opensource licensing. I should probably explain the situation about the "fixing the Xserver to use Xrender" situation at Sun. The main problem is that Xrender support requires support in the graphics card driver. On Sparc, Sun currently uses the Xsun Xserver. With this Xserver, the Xrender extension only works with a few graphics cards. Also since Xrender with Xsun is somewhat experimental it is off by default and users need to turn it on manually to enable it. Sun currently uses the Xorg Xserver on x86 machines, and we are migrating to using Xorg on Sparc, but this migration will likely take a long time (probably over a year) since Xorg currently has no drivers to support Sparc graphics cards at all and Sun is currently working to write these drivers. It is expected that we won't have reasonable Xorg graphics card coverage for quite some time. So, until the Xorg migration happens, Sparc will continue to be 5% slower due to lack of Xrender support. I'm not sure yet if Sun will decide to simply deal with low performance until the Xorg migration is done, but if there are low-hanging performance improvements that could be made to the cairo code to make it work better without Xrender, then I'm sure we'd be interested in helping to code and/or test. Your comments about what areas of the code we'd need to look into to fix this problem is much appreciated since we are still thinking about whether we should resource improving this performance or just deal with the low performance on Sparc for the next year (or however long it takes for the driver team to fix the issues). At the moment I don't think Sun has immediate plans to release the next version of Solaris, and if they decide to release a version before the Xorg migration is completed, this might become a more important issue. I know that firefox 3.0 performance is quite bad due to lack of Xrender, for example. Also, I wanted you to understand the situation here at Sun, so that you know that if you see any opportunities to speed up this code, it would make us at Sun really happy. And also so you understand that we are working to actually resolve the underlying problems, it just is going to take a bit of time. (In reply to comment #17) > I should probably explain the situation about the "fixing the Xserver to use > Xrender" situation at Sun. > ... Thanks Brian, this is all very good to know, and it sounds encouraging. It seems that the bottom line is that Sun has waited too long to properly support XRender, and now they're getting bitten by it because apps are depending on XRender for decent performance (for good reason; that was the plan that was laid out 7 years ago). I'm glad there is a plan in action at Sun to fix that. And thanks for all your work pushing FLOSS from the "inside"; I'm sure it's a difficult task. :) I just heard word from the mediaLib team that September is the planned date for releasing the full mediaLib code to the public under the CDDL license. Just an update since I pinged them based on your query. They are excited to hear that people from the opensource community are interested in using their code. Yes, I'd agree that Sun is getting bit by the fact that the Xorg migration is taking so long, but I'm not sure how much faster Sun could move on this issue. The problem is that Sun has a lot of internal dependencies on the existing Xsun Xserver (Sun Ray for example) and getting everyone to migrate to Xorg has been a slower and more painful process than you might imagine. But the light at the end of the tunnel is finally starting to appear, though as I said, I think it will likely take another year for things to really switch over. I think I may have hit this bug. What I notice is that running gtk applications (which are using cairo) over a slowish ssh connection to a Xvnc (4.1.2) server is unbearably slow. An xmon dump shows many XGetImage/XPutImage requests. I do not believe the Xvnc server supports Xrender. It's not listed in the XQueryExtension commands dumped by xmon. Anyone have an opinion if this is the same bug? Is there a better Xvnc server than the one I'm using? Created attachment 12418 [details] [review] Using medialib if possible I think USE_MEDIALIB is a better name than HAVE_MLIB_H. Also the change of indent outside #ifdef MLIB should be avoided. Created attachment 12764 [details] [review] Modified according to Ginn's comments Created attachment 12765 [details] [review] no-medialib improvement The mediaLib team recently released it under CDDL, including all VIS/VIS2/MMX/SSE2 code. So now, people on Linux might consider seeing how well these mediaLib tunings improve things on Linux. I have heard that mediaLib builds and runs well on many Linux distros. The mediaLib 2.5 source code is now available for download from: http://www.opensolaris.org/os/downloads/devpro/ The tarball devpro-mlib-src-20071106.tar.bz2 consists of source code for building all versions of libmlib.so.2 and libmlib_sse2.so.2 under /usr/lib. In other words, all VIS/ VIS2 and MMX/SSE2 code for mediaLib is now open sourced. It has over 4000 files, or 2.4 million lines of code. More information about mediaLib can be found at: http://www.sun.com/processors/vis/mlib.html Please send your questions and comments on mediaLib to: (In reply to comment #25) > The mediaLib team recently released it under CDDL, including all > VIS/VIS2/MMX/SSE2 code. So now, people on Linux might consider seeing how well > these mediaLib tunings improve things on Linux. I have heard that mediaLib > builds and runs well on many Linux distros. That's great news, thanks for the update Brian. Obviously the CDDL poses some license compatibility issues, but all in all this is quite a significant contribution. Thanks for pushing for this. (In reply to comment #23) > Created an attachment (id=12764) [details] > Modified according to Ginn's comments Looking good. How about some cairo-perf results? Note, you should probably run the regression tests and performance tests by running "make check". Note that the scripts src/check-def.sh, src/check-plt.sh and perf/cairo-perf-diff need to be changed so the "#!/bin/sh" at the top is changed to "#!/bin/bash" to work properly on Solaris. Note also, there is an updated medialib patch in bug 10151. The old patch causes regressions. I'll check an updated patch into Solaris spec-files today, so I'd recommend building with that to avoid some of the regression tests failing due to that patch. > Note also, there is an updated medialib patch in bug 10151.
So, to increase your chances of getting all this upstream, I recommend consolidating these two patches (the one here at #11529 and the other at #10151) into a single "stream" of git commits. The first commit in the stream should be the introduction USE_MEDIALIB in the build process (these two patches do so in slightly different ways, BTW). The next commit would be a single application of medialib, with the corresponding speedup. Then another single application with speedup, etc, etc, leaving out any applications that cause regressions.
I think this is all I've got to say about this medialib saga (#11529 and #10151). As long as you're 1) not causing any regressions 2) you have measured significant performance benefits and 3) your patches are in a logical stream of git commits, you should have a good chance of convincing Soeren to accept your contributions. When you're ready, you might want to fire off an email to the list with a subject of something like 'pixman patches, please review' to get Soeren's attention.
I'm done. Hope I've been of some help. Good luck (I have to disappear for the next week or so).
I don't have access to a Sparc machine for testing without the Xrender extension, but if the people who wrote this patch run the regression tests and performance tests and show improvement, then I'd be happy to merge this patch with the patch from the other bug (#10151) and provide a single patch for consideration. I think the configure.ac check from the other patch (where we actually validate a function is in the expected mlib library) is a bit better than this one (just checking for the header file). I also understand pixman is being extracted into a separate library, so is there a bugzilla database to provide patches for that library somewhere? I've made the #!/bin/sh -> #!/bin/bash changes---thanks for pointing them out. As for a pixman bugzilla---this one is just fine. You can change the product on this bug entry from cairo to pixman. Then you'll simply want to make sure your patches apply to the separated pixman and test there. -Carl Some history: the scripts used to run through /bin/bash, but then some weird Unix didn't have bash, but its /bin/sh was well capable of running the scripts, so we switched. I probably just rewrite the bashism... I tested the patch in firefox trunk code. There is about 9% performance improvement on my SunBlade 1500 (sparc machine). (In reply to comment #28) > Note, you should probably run the regression tests and performance tests by > running "make check". Note that the scripts src/check-def.sh, src/check-plt.sh > and perf/cairo-perf-diff need to be changed so the "#!/bin/sh" at the top is > changed to "#!/bin/bash" to work properly on Solaris. > I can't find cairo-pref-diff in latest cairo code base. Can you tell where is it and how to run regression testing? I'm not familiar with cairo code base. Thanks Steps for running regression tests: 1) Go into the test directory of the cairo module 2) Run "make check" - compare the output with and without this patch. The same number of tests should pass with & without the patch. Steps for running performance tests 1) Go into the perf directory of the cairo module 2) Run "make cairo-perf-diff-files" to build the cairo-perf-diff-files utility" 3) Change "#!/bin/sh" on the first line of cairo-perf-diff to "#!/bin/bash" since this is needed to run it on Solaris 4) Run "make check" with and without your patch applied, redirect the output to a file like this "make check > perf-without-patch.txt" and "make check > perf-with-patch.txt" 5) Run "cairo-perf-diff-files" passing it the two output files. The first argument should be the output file without the patch and the second argument should be the output file with the patch. If you reverse these, then it will show a slowdown instead of a speedup. For example: "cairo-perf-diff files perf-without-patch.txt perf-with-patch.txt". This should generate a report showing you the speedup/slowdown of the change. I'd attach the output of #2, and also let us know whether test #1 shows any tests failing. If any tests fail with the patch which work okay without the patch, then we need to figure out why before this can go upstream. Does this help? (In reply to comment #35) > Steps for running regression tests: > > 1) Go into the test directory of the cairo module > 2) Run "make check" > > - compare the output with and without this patch. The same number of tests > should pass with & without the patch. > > Steps for running performance tests > > 1) Go into the perf directory of the cairo module > 2) Run "make cairo-perf-diff-files" to build the cairo-perf-diff-files utility" > 3) Change "#!/bin/sh" on the first line of cairo-perf-diff to "#!/bin/bash" > since this is needed to run it on Solaris The latest cairo-perf-diff has changed to "#!/bin/bash", we don't need to do this any more. > 4) Run "make check" with and without your patch applied, redirect the output > to a file like this "make check > perf-without-patch.txt" and "make check > > perf-with-patch.txt" > I run the command on the latest cairo code base without these patches and got the following error message: X server does not have the Render extension. Error: Failed to create target surface: xlib make: Fatal error: Command failed for target `check-TESTS' Current working directory /export/home/brian/cairo-ws/cairo/perf make: Fatal error: Command failed for target `check-am' I checked the cairo-perf.c and found that cairo-perf can't run on an X system without Render extension. Is there any workaround? Thanks a lot Created attachment 16734 [details] [review] patch for xlib-surface This approach doesn't use medialib, and improves the performance of Firefox 3 greatly. 1) fallback lately in _cairo_xlib_surface_composite, so that we can use XCopyArea when RENDER is not available 2) use XFillRectange for _cairo_xlib_surface_fill_rectangles. It's much faster that the fall back composite code. e.g. if we're drawing widget borders by fill_rectangles, the fall back code will do XGetImage and XPutImage for the whole widget extents. Use this patch along with the patch on https://bugzilla.mozilla.org/show_bug.cgi?id=435739 will improve Firefox 3 by several times. I also have a patch for cairo-surface-fallback.c. The idea is we don't need get the real image data for cairo_surface_acquire_dest_image() in some cases, e.g. OP_CLEAR. Then we can save a call to XGetImage(). But I don't know how to write this patch nicely, cairo_op is not passed through _fallback_init(). Comment on attachment 16734 [details] [review] patch for xlib-surface Looks great. Thanks! Comments below. >diff --git a/src/cairo-xlib-surface.c b/src/cairo-xlib-surface.c >index ec66c4d..a80a265 100644 >--- a/src/cairo-xlib-surface.c >+++ b/src/cairo-xlib-surface.c >@@ -1475,9 +1475,6 @@ _cairo_xlib_surface_composite (cairo_operator_t op, > > _cairo_xlib_display_notify (dst->screen_info->display); > >- if (!CAIRO_SURFACE_RENDER_HAS_COMPOSITE (dst)) >- return CAIRO_INT_STATUS_UNSUPPORTED; >- > operation = _categorize_composite_operation (dst, op, src_pattern, > mask_pattern != NULL); > if (operation == DO_UNSUPPORTED) >@@ -1514,6 +1511,11 @@ _cairo_xlib_surface_composite (cairo_operator_t op, > switch (operation) > { > case DO_RENDER: >+ if (!CAIRO_SURFACE_RENDER_HAS_COMPOSITE (dst)) { >+ status = CAIRO_INT_STATUS_UNSUPPORTED; >+ goto BAIL; >+ } You actually don't need this block. Check the bottom of _recategorize_composite_operation(). It already does that check. So, just remove the check and we're better. > status = _cairo_xlib_surface_set_attributes (src, &src_attr); > if (status) > goto BAIL; >@@ -1629,8 +1631,60 @@ _cairo_xlib_surface_fill_rectangles (void *abstract_surface, > > _cairo_xlib_display_notify (surface->screen_info->display); > >- if (!CAIRO_SURFACE_RENDER_HAS_FILL_RECTANGLE (surface)) >+ if (!CAIRO_SURFACE_RENDER_HAS_FILL_RECTANGLE (surface)) { >+ if (op == CAIRO_OPERATOR_CLEAR || >+ ((op == CAIRO_OPERATOR_SOURCE || op == CAIRO_OPERATOR_OVER) && >+ (color->alpha_short >= 0xff00))) { Use CAIRO_ALPHA_SHORT_IS_OPAQUE here. >+ XGCValues gcv; >+ int a_width=0, r_width=0, g_width=0, b_width=0; >+ int a_shift=0, r_shift=0, g_shift=0, b_shift=0; >+ int a = color->alpha_short >> 8; >+ int r = color->red_short >> 8; >+ int g = color->green_short >> 8; >+ int b = color->blue_short >> 8; >+ >+ if (surface->visual->class == TrueColor) { >+ _characterize_field (surface->a_mask, &a_width, &a_shift); >+ _characterize_field (surface->r_mask, &r_width, &r_shift); >+ _characterize_field (surface->g_mask, &g_width, &g_shift); >+ _characterize_field (surface->b_mask, &b_width, &b_shift); >+ gcv.foreground = (_field_from_8 (a, a_width, a_shift) | >+ _field_from_8 (r, r_width, r_shift) | >+ _field_from_8 (g, g_width, g_shift) | >+ _field_from_8 (b, b_width, b_shift)); >+ } else { >+ cairo_xlib_visual_info_t *visual_info; >+ cairo_int_status_t status; >+ >+ status = _cairo_xlib_screen_get_visual_info (surface->screen_info, >+ surface->visual, >+ &visual_info); >+ if (status) >+ return CAIRO_INT_STATUS_UNSUPPORTED; >+ >+ gcv.foreground = >+ visual_info->rgb333_to_pseudocolor[_field_from_8 (r, 3, 6) | >+ _field_from_8 (g, 3, 3) | >+ _field_from_8 (b, 3, 0)]; >+ } >+ GC xgc = XCreateGC (surface->dpy, surface->drawable, GCForeground, >+ &gcv); >+ if (!xgc) >+ return _cairo_error (CAIRO_STATUS_NO_MEMORY); >+ >+ for (i = 0; i < num_rects; i++) { >+ XFillRectangle (surface->dpy, surface->drawable, xgc, >+ rects[i].x, rects[i].y, >+ rects[i].width, rects[i].height); >+ } >+ >+ XFreeGC(surface->dpy, xgc); >+ return CAIRO_STATUS_SUCCESS; >+ } >+ > return CAIRO_INT_STATUS_UNSUPPORTED; >+ } Please add the new code in a new function. Otherwise looks good. I'll commit it when an updated patch arrives. Created attachment 16756 [details] [review] patch for xlib-surface (addressing Behdad Esfahbod's comments) Thanks for the comments. Committed after fixing. Seems like the extra XRENDER check was indeed needed. My bad. Anyway, thanks. Should this bug be closed now? Ah, I think both src and dst need to be checked. OK, I agree to close it for now. I'll file separate bugs for other enhancements if necessary. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.