ARM, N800, Hard Float Point.
Xlib backend, 16-bit color depth.
Cairo has very big problems with image rendering on devices without hardware compositing and with only TrueColor support.
When I try to render images with an alpha channel (32-bit) onto a 16-bit surface and scroll them, cairo does an ImageComposite for each 1-pixel scroll step.
For each composite operation, cairo does a 32->16 conversion and doesn't cache the results.
This problem can probably be fixed by supporting 16-bit depth (see Bug 4945)...
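To make the per-step cost concrete, here is a minimal sketch of a 32->16 row conversion. It assumes the 16-bit target is RGB565 (the report only says "16 depth") and uses simple truncation; the function names are hypothetical and this is not cairo's or pixman's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one 32-bit pixel (8 bits per channel, alpha or padding in the
 * top byte) into RGB565 by dropping the low bits of each channel.
 * The top byte is discarded. Assumption: the 16-bit format is RGB565. */
uint16_t pack_rgb565(uint32_t argb)
{
    return (uint16_t)(((argb >> 8) & 0xF800) |   /* top 5 bits of red   */
                      ((argb >> 5) & 0x07E0) |   /* top 6 bits of green */
                      ((argb >> 3) & 0x001F));   /* top 5 bits of blue  */
}

/* Convert one row of pixels. Repeating this for every visible row on
 * every 1-pixel scroll step is the kind of work described above. */
void convert_row_8888_to_565(const uint32_t *src, uint16_t *dst, size_t width)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < width; i++)
        dst[i] = pack_rgb565(src[i]);
}
```

For example, `pack_rgb565(0x00FF0000)` (pure red) yields `0xF800`.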
Created attachment 9008
Small printf/log from cairo-xlib-surface.c, _cairo_xlib_surface_composite
Created attachment 9009
OProfile plaintext log.
Hmm... we have 2 different problems:
1) The paint entry of the xlib backend is NULL:
cairo_surface_backend_t cairo_xlib_surface_backend =
NULL, /* paint */
That means cairo does a composite fallback for each paint operation.
2) The composite operation is very slow because it requires a 32/24->16 conversion every time.
Result: the cairo xlib surface is very unhappy on devices like the N800...
I'd like to close this bug, as it's really just a result of several misunderstandings which were hopefully cleared up in this thread:
Here are the general take-home lessons:
1) You shouldn't be compositing with a 16-bit surface in cairo's image backend (your attached oprofile report shows that you are). The cairo image backend is not meant to be used with 16-bit images. You'll want to look into whatever is making this happen. You probably want to convert your images into xlib surfaces and then just deal with compositing xlib surfaces together. This approach also pushes the compositing down into the xserver where it will (should) be faster. As of this moment (cairo git vs. xorg git), it will be significantly faster on the xserver side (see the patches to xorg that I refer to at the bottom of this message).
2) You say that the 32->16 bit color conversion should be cached. That doesn't make sense, at least not the way you seem to mean it. Perhaps what you want is to turn your images into 16-bit xlib surfaces and do compositing on those, as suggested above? Of course, you'll lose your alpha channel, but any 32->16 bit "caching" would have that same disadvantage.
3) As pointed out in the mailing list thread, the "missing paint operation" isn't a cause for slowdown.
4) As pointed out in the thread, if you can composite your surfaces together once and just scroll that final result, do that instead. The "final result" that I'm referring to would be an xlib surface that is the result of compositing your two surfaces.
5) You might want to look into why fbFetchTransformed is being called. Right now, compositing+transformation can be a cause of slowness. If you're just translating the surface, maybe you can just try a set_source_surface with the offset. I'm really just guessing here, as you haven't provided any code for me to look over.
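To illustrate point 2, here is a rough sketch of OVER for a single pixel: a 32-bit premultiplied ARGB source (as in cairo's ARGB32 format) onto a 16-bit destination. It assumes the destination is RGB565, and the function names are hypothetical; this is not the code path cairo or the X server actually uses. The point is that the result depends on the source alpha, so a source pre-converted to 16 bits cannot give the same output.

```c
#include <assert.h>
#include <stdint.h>

/* Expand 5- and 6-bit channels to 8 bits by replicating the high bits. */
uint8_t expand5(unsigned v) { return (uint8_t)((v << 3) | (v >> 2)); }
uint8_t expand6(unsigned v) { return (uint8_t)((v << 2) | (v >> 4)); }

/* Rounded x * a / 255 for x, a in 0..255. */
uint8_t mul_div_255(unsigned x, unsigned a)
{
    unsigned t = x * a + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* OVER: result = src + dst * (1 - src_alpha), per channel.
 * src is premultiplied ARGB32; dst is assumed to be RGB565. */
uint16_t over_8888_on_565(uint32_t src, uint16_t dst)
{
    unsigned a  = src >> 24;
    unsigned sr = (src >> 16) & 0xFF;
    unsigned sg = (src >> 8) & 0xFF;
    unsigned sb = src & 0xFF;
    unsigned inv = 255 - a;

    /* Premultiplied input: no channel may exceed alpha. This also
     * guarantees the sums below stay within 0..255. */
    assert(sr <= a && sg <= a && sb <= a);

    unsigned r = sr + mul_div_255(expand5((dst >> 11) & 0x1F), inv);
    unsigned g = sg + mul_div_255(expand6((dst >> 5) & 0x3F), inv);
    unsigned b = sb + mul_div_255(expand5(dst & 0x1F), inv);

    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

A fully transparent source leaves the destination unchanged and a fully opaque one replaces it; anything in between needs the alpha value at composite time, which is what a 16-bit "cached" copy of the source would have thrown away.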
FYI, I have submitted 2 patches to xorg that speed up 32-bit argb OVER 16-bit rgb that might help you here:
Created attachment 34082
Updated and simpler patch; fixes painting a 16bpp image -> 16bpp xsurface
What about the 2x more data transferred over the memory bus, just because of 32bpp...? Performance is really different between 16bpp and 32bpp.
That was the main reason why the N900 is still using 16bpp depth...
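As a back-of-the-envelope check of the "2x" figure, the sketch below computes the bytes touched by one full-screen pass at each depth. The 800x480 resolution is an assumption based on the N800/N900 display, and `frame_bytes` is a hypothetical helper, not part of cairo.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one full pass over a width x height buffer.
 * Only whole-byte pixel sizes (16, 24, 32 bpp, ...) are handled. */
size_t frame_bytes(size_t width, size_t height, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}
```

With these assumptions, `frame_bytes(800, 480, 16)` is 768000 bytes and `frame_bytes(800, 480, 32)` is 1536000 bytes, so every full-screen read or write moves twice as much data at 32bpp.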
Also, in the latest mozilla e10s (content process separation) we are going to use shared memory buffers and paint them with a cairo-16bpp image surface.
Also, this is about cairo-xlib....
Oh, this is probably the wrong bug.