Bug 56891 - Very slow rendering on ASUS Nexus 7, ARM, Tegra 3
Summary: Very slow rendering on ASUS Nexus 7, ARM, Tegra 3
Status: RESOLVED MOVED
Alias: None
Product: cairo
Classification: Unclassified
Component: xlib backend
Version: 1.12.2
Hardware: ARM Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: cairo-bugs mailing list
 
Reported: 2012-11-08 20:29 UTC by marmuta
Modified: 2018-08-25 13:51 UTC (History)


Attachments
cairo-perf results for Onboard starting up with Ambiance theme (1.25 KB, text/plain)
2012-11-08 20:29 UTC, marmuta
cairo-trace --profile onboard (89.95 KB, application/octet-stream)
2012-11-08 20:35 UTC, marmuta
cairo-analyse-trace onboard.trace >onboard_ambiance_nexus7_analyse_trace.txt (1.33 MB, text/plain)
2012-11-09 08:59 UTC, marmuta

Description marmuta 2012-11-08 20:29:12 UTC
Created attachment 69773
cairo-perf results for Onboard starting up with Ambiance theme

I'm investigating why Onboard with the default theme takes upwards of 30s to start up on a Nexus 7 with Ubuntu 12.10. It's barely usable even with the simplest themes, where just a plain filled and stroked rectangle is drawn per key.

A cairo-perf-trace of Onboard starting up shows the Nexus 7 to be ~200x slower than a Sandy Bridge i3 laptop, but single-threaded CPU performance differs only by a factor of roughly 4.5 (see attachment). Also, the cairo image backends are vastly faster than the xcb/xlib ones, which is the reverse of the i3's results.
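
To put the two figures above side by side, here is a minimal arithmetic sketch. Both inputs are the rough values quoted in this report, not fresh measurements, and the variable names are only illustrative.

```python
# Rough figures quoted in the report above (approximate, not re-measured).
observed_slowdown = 200.0  # Nexus 7 vs. Sandy Bridge i3 in cairo-perf-trace
cpu_ratio = 4.5            # single-threaded CPU performance gap

# Slowdown that the CPU difference alone does not account for.
unexplained = observed_slowdown / cpu_ratio
print(f"slowdown not explained by CPU speed: ~{unexplained:.0f}x")  # ~44x
```

If the CPU gap were the whole story, this ratio would be close to 1.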

The Nexus 7 has a quad-core Tegra 3, so I think the hardware isn't necessarily the limiting factor. Is this a driver issue? Xorg takes close to 100% CPU during heavy cairo rendering with (Python-based) Onboard.

Here's the original bug report:
https://bugs.launchpad.net/ubuntu-nexus7/+bug/1070760

gtkperf on the Nexus 7:
GtkPerf 0.40 - Starting testing: Thu Nov  8 18:58:56 2012

GtkEntry - time:  0.77
GtkComboBox - time: 17.09
GtkComboBoxEntry - time:  6.95
GtkSpinButton - time:  2.42
GtkProgressBar - time:  2.91
GtkToggleButton - time:  7.59
GtkCheckButton - time:  2.12
GtkRadioButton - time:  3.66
GtkTextView - Add text - time:  6.94
GtkTextView - Scroll - time:  1.13
GtkDrawingArea - Lines - time: 12.66
GtkDrawingArea - Circles - time: 27.67
GtkDrawingArea - Text - time: 19.19
GtkDrawingArea - Pixbufs - time:  2.31
 --- 
Total time: 113.47

libcairo2 1.12.2-1ubuntu2
nvidia-tegra3 binary Xorg driver 16.0-0ubuntu3
Comment 1 marmuta 2012-11-08 20:35:40 UTC
Created attachment 69774
cairo-trace --profile onboard
Comment 2 Uli Schlachter 2012-11-08 22:37:41 UTC
Are you sure that you want to report this against cairo-xcb? Because I am pretty sure that next to nothing uses this and certainly Ubuntu doesn't by default. So I'm tempted to reassign this to cairo-xlib.
Comment 3 Uli Schlachter 2012-11-08 22:50:16 UTC
Oh and: It might be helpful to run your trace through cairo-analyse-trace (which is in the perf/ subdirectory). That should help identify the slow operation.
Comment 4 marmuta 2012-11-09 08:44:14 UTC
Thanks, I wasn't sure which one to pick. You're right, the main context has a cairo.XlibSurface as target.

I'll attach the results of cairo-analyse-trace. That looks very useful. If I understand the percentages right, I'd need to cut down on mask calls first, and it was perhaps a good idea to cache gradient strokes and fills.
This is with Onboard trunk, btw.
Comment 5 marmuta 2012-11-09 08:59:00 UTC
Created attachment 69795
cairo-analyse-trace onboard.trace >onboard_ambiance_nexus7_analyse_trace.txt
Comment 6 Chris Wilson 2012-11-09 09:12:55 UTC
The really interesting aspect of the trace is where the drawing is nested tens of levels deep (>50), with all the results being kept around in a stack until they get masked onto the output. It's certainly the first time I've seen that pattern in a trace, and it may indeed be the right approach for you, but it looks inefficient.

Also you have to be aware that you will be limited by driver quality when using cairo-xlib, which is likely to be a contributing factor as well as the hw differences for comparing SNB to the Nexus7.
Comment 7 marmuta 2012-11-09 09:54:50 UTC
Hmm, there shouldn't actually be more than five or so nesting levels, the first probably being GTK's double buffering. How can I read the levels from the trace? I assumed it somehow repeats at Observing '/home/ubuntu/onboard.trace'...., are those nesting levels too?
Most of the nesting happens only occasionally, i.e. on startup, resizing, etc., and the result is then masked onto the main context at all other times. Drawing all the keys at once with shadows and gradients turned out too slow even on non-mobile devices, hence all that caching.

Driver quality is what I'm afraid of. Would that explain why the image backends in the first attachment are so much faster than cairo-xlib?
Comment 8 Chris Wilson 2012-11-09 10:04:03 UTC
My guess for nesting levels comes from reading the trace, and in particular the use of '51 index set-source' implies that there are roughly 20 or so surfaces in the stack.
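
A minimal sketch of how one might scan a trace for such high `index` operands. It assumes the trace is a whitespace-separated, PostScript-like token stream in which `<n> index` refers to an entry n positions down the operand stack. The helper names `index_depths` and `max_index_depth` are hypothetical and not part of cairo. A large value only hints at a deep stack; it is not an exact count of surfaces.

```python
import re

# Assumption: tokens are whitespace-separated and "<n> index" is an integer
# token followed by the bare operator token "index".
INDEX_RE = re.compile(r"(?<!\S)(\d+)\s+index(?!\S)")

def index_depths(trace_text):
    """Return every integer operand that directly precedes 'index'."""
    return [int(m.group(1)) for m in INDEX_RE.finditer(trace_text)]

def max_index_depth(trace_text):
    """Return the largest 'index' operand, or None if there is none."""
    depths = index_depths(trace_text)
    return max(depths) if depths else None

# Tiny hand-written sample, not taken from the attached trace.
sample = "1 index set-source\n51 index set-source\n"
print(max_index_depth(sample))  # 51
```

For a real trace, read the file with `open(path, errors="replace")`, since traces may embed non-text data, and pass its contents to `max_index_depth`.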

Yes, I use the relative speed of cairo-image to cairo-xlib to judge driver quality. (No driver implementation should be slower than is possible using the cpu...)
Comment 9 marmuta 2012-11-09 16:16:55 UTC
Thanks for the hints, that helped. Apparently deep nesting happens when patterns are shared between contexts. I'm not sure if this was a safe thing to do at all. I've switched them to surfaces and now there are no high index values in the trace anymore.
I had hopes, but unfortunately this did nothing for the performance issue; it may even be slightly worse.

Bypassing the driver does make a difference, though:
$ ./onboard
draw             13635.882ms

$ GDK_RENDERING=image ./onboard
draw              1371.348ms

That number is the initial caching plus the first redraw. That's probably already good enough for the startup, but not yet for all later interaction. From there on python/python-gi show up as bottlenecks too.
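
As a rough check of the two draw timings above, the sketch below parses the `draw ...ms` lines and computes how much faster the image path is. The line format is taken from the output pasted in this comment; the helper name `draw_ms` is hypothetical.

```python
import re

# Matches lines such as "draw             13635.882ms".
DRAW_RE = re.compile(r"^draw\s+([\d.]+)ms$", re.MULTILINE)

def draw_ms(log_text):
    """Extract the first 'draw' timing (in milliseconds) from log output."""
    match = DRAW_RE.search(log_text)
    if match is None:
        raise ValueError("no 'draw ...ms' line found")
    return float(match.group(1))

xlib_log = "draw             13635.882ms"   # ./onboard
image_log = "draw              1371.348ms"  # GDK_RENDERING=image ./onboard

speedup = draw_ms(xlib_log) / draw_ms(image_log)
print(f"image rendering is about {speedup:.1f}x faster")  # about 9.9x
```

By the image-versus-xlib yardstick from comment 8, a ratio this far above 1 points at the driver path rather than at the CPU.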
Comment 10 GitLab Migration User 2018-08-25 13:51:52 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/cairo/cairo/issues/236.

