Created attachment 69773 [details]
cairo-perf results for Onboard starting up with Ambiance theme
I'm investigating why Onboard with the default theme takes upwards of 30s to start up on a Nexus 7 with Ubuntu 12.10. It's barely usable even with the simplest themes, where just a plain filled and stroked rectangle is drawn per key.
A cairo-perf-trace of Onboard starting up shows the Nexus 7 to be ~200x slower than a Sandy Bridge i3 laptop, but single-threaded CPU performance differs only by a factor of roughly 4.5 (see attachment). Also, the cairo image backends are vastly faster than the xcb/xlib ones, which is the reverse of the i3's results.
The Nexus 7 has a quad-core Tegra 3, so I don't think the hardware is necessarily the limiting factor. Is this a driver issue? Xorg takes close to 100% CPU during heavy cairo rendering with (python-based) Onboard.
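To put those two numbers side by side, here is a rough back-of-the-envelope sketch. It only uses the approximate figures quoted above; the function name `unexplained_factor` is made up for illustration.

```python
# Figures quoted in the report above; both are rough estimates.
trace_slowdown = 200.0   # Nexus 7 vs. Sandy Bridge i3 on the cairo-perf-trace
cpu_slowdown = 4.5       # single-threaded CPU performance gap


def unexplained_factor(observed, expected):
    """Return the part of an observed slowdown that the expected one does not cover."""
    if expected <= 0:
        raise ValueError("expected slowdown must be positive")
    return observed / expected


residual = unexplained_factor(trace_slowdown, cpu_slowdown)
print(f"slowdown not explained by CPU speed: ~{residual:.0f}x")
```

So even after allowing for the slower CPU, something like a factor of 40 or more is left over, which is why the driver looks suspicious.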
Here's the original bug report:
gtkperf on the Nexus 7:
GtkPerf 0.40 - Starting testing: Thu Nov 8 18:58:56 2012
GtkEntry - time: 0.77
GtkComboBox - time: 17.09
GtkComboBoxEntry - time: 6.95
GtkSpinButton - time: 2.42
GtkProgressBar - time: 2.91
GtkToggleButton - time: 7.59
GtkCheckButton - time: 2.12
GtkRadioButton - time: 3.66
GtkTextView - Add text - time: 6.94
GtkTextView - Scroll - time: 1.13
GtkDrawingArea - Lines - time: 12.66
GtkDrawingArea - Circles - time: 27.67
GtkDrawingArea - Text - time: 19.19
GtkDrawingArea - Pixbufs - time: 2.31
Total time: 113.47
nvidia-tegra3 binary Xorg driver 16.0-0ubuntu3
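For a quick view of where the gtkperf time goes, here is a small sketch that parses the per-test lines above and ranks them. The regular expression and helper name are my own, and the shares are relative to the sum of the listed lines, which is slightly below the reported total, presumably due to rounding.

```python
import re

GTKPERF_OUTPUT = """\
GtkEntry - time: 0.77
GtkComboBox - time: 17.09
GtkComboBoxEntry - time: 6.95
GtkSpinButton - time: 2.42
GtkProgressBar - time: 2.91
GtkToggleButton - time: 7.59
GtkCheckButton - time: 2.12
GtkRadioButton - time: 3.66
GtkTextView - Add text - time: 6.94
GtkTextView - Scroll - time: 1.13
GtkDrawingArea - Lines - time: 12.66
GtkDrawingArea - Circles - time: 27.67
GtkDrawingArea - Text - time: 19.19
GtkDrawingArea - Pixbufs - time: 2.31
Total time: 113.47
"""

# Matches "<test name> - time: <seconds>"; the "Total time" line is skipped.
LINE_RE = re.compile(r"^(?P<name>.+?) - time:\s*(?P<secs>[\d.]+)$")


def parse_gtkperf(text):
    """Map each test name to its time in seconds."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group("name")] = float(match.group("secs"))
    return results


results = parse_gtkperf(GTKPERF_OUTPUT)
measured = sum(results.values())
drawing = sum(t for name, t in results.items() if name.startswith("GtkDrawingArea"))

# Slowest tests first, with their share of the summed time.
for name, secs in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{secs:6.2f}s  {100 * secs / measured:5.1f}%  {name}")
print(f"GtkDrawingArea tests: {drawing:.2f}s of {measured:.2f}s")
```

The four GtkDrawingArea tests make up a bit over half of the summed time, which fits the suspicion that plain drawing is the slow path.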
Created attachment 69774 [details]
cairo-trace --profile onboard
Are you sure that you want to report this against cairo-xcb? Because I am pretty sure that next to nothing uses this and certainly Ubuntu doesn't by default. So I'm tempted to reassign this to cairo-xlib.
Oh and: It might be helpful to run your trace through cairo-analyse-trace (which is in the perf/ subdirectory). That should help identify the slow operation.
Thanks, I wasn't sure which one to pick. You're right, the main context has a cairo.XlibSurface as target.
I'll attach the results of cairo-analyse-trace. That looks very useful. If I understand the percentages right, I'd need to cut down on mask calls first, and it was perhaps a good idea to cache gradient strokes and fills.
This is with Onboard trunk, btw.
Created attachment 69795 [details]
cairo-analyse-trace onboard.trace >onboard_ambiance_nexus7_analyse_trace.txt
The really interesting aspect of the trace is where the drawing is nested 10s of levels deep (>50) with all the results being kept around in a stack until they get masked onto the output. It's certainly the first time I've seen that pattern in a trace, and it may indeed be the right approach for you, but it looks inefficient.
Also, you have to be aware that you will be limited by driver quality when using cairo-xlib, which is likely to be a contributing factor, as well as the hw differences, when comparing SNB to the Nexus 7.
Hmm, there shouldn't actually be more than five or so nesting levels, the first probably being GTK's double buffering. How can I read the levels from the trace? I assumed it somehow repeats at Observing '/home/ubuntu/onboard.trace'...., are those nesting levels too?
Most of the nesting happens only occasionally, i.e. on startup, resizing, etc., and the cached result is then masked to the main context all other times. Drawing all the keys at once with shadows and gradients turned out too slow even on non-mobile devices, hence all that caching.
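Roughly, the caching idea looks like the sketch below. This is not Onboard's actual code: the class name, the `render` callable and the string stand-ins for rendered images are all made up for illustration, and in practice the cached value would be an offscreen surface.

```python
class KeyRenderCache:
    """Keep rendered key images around until the layout size changes."""

    def __init__(self, render):
        # `render(key_id, size)` is a hypothetical callable that does the slow
        # drawing (shadows, gradients) and returns something cheap to reuse.
        self._render = render
        self._size = None
        self._images = {}

    def get(self, key_id, size):
        if size != self._size:
            # A resize invalidates every cached image.
            self._images.clear()
            self._size = size
        if key_id not in self._images:
            self._images[key_id] = self._render(key_id, size)
        return self._images[key_id]


calls = []


def fake_render(key_id, size):
    """Stand-in for the expensive drawing; records each real render."""
    calls.append((key_id, size))
    return f"{key_id}@{size[0]}x{size[1]}"


cache = KeyRenderCache(fake_render)
cache.get("a", (40, 40))   # first draw: rendered
cache.get("a", (40, 40))   # redraw: served from the cache
cache.get("a", (80, 80))   # resize: rendered again
print(len(calls))
```

Only startup and resizes pay for the expensive rendering; ordinary redraws just reuse what is already there.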
Driver quality is what I'm afraid of. Would that explain why the image backends in the first attachment are so much faster than cairo-xlib?
My guess for nesting levels comes from reading the trace, and in particular the use of '51 index set-source' implies that there are roughly 20 or so surfaces in the stack.
Yes, I use the relative speed of cairo-image to cairo-xlib to judge driver quality. (No driver implementation should be slower than is possible using the cpu...)
Thanks for the hints, that helped. Apparently deep nesting happens when patterns are shared between contexts. I'm not sure if this was a safe thing to do at all. I've switched them to surfaces and now there are no high index values in the trace anymore.
I had hopes, but unfortunately this did nothing for the performance issue; it may even be slightly worse.
Bypassing the driver does make a difference, though:
$ GDK_RENDERING=image ./onboard
That number is the initial caching plus the first redraw. That's probably already good enough for the startup, but not yet for all later interaction. From there on python/python-gi show up as bottlenecks too.
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/cairo/cairo/issues/236.