Bug 28067 - Performance regression in synthetic micro-benchmark [due to global stroke tessellation]
Status: RESOLVED MOVED
Alias: None
Product: cairo
Classification: Unclassified
Component: image backend
Version: 1.9.6
Hardware: Other All
Importance: medium normal
Assignee: Carl Worth
QA Contact: cairo-bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 31589
Depends on:
Blocks:
 
Reported: 2010-05-11 08:46 UTC by Clemens Eisserer
Modified: 2018-08-25 13:54 UTC (History)
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
synthetic micro-benchmark (635 bytes, text/plain)
2010-05-11 11:39 UTC, Clemens Eisserer

Description Clemens Eisserer 2010-05-11 08:46:34 UTC
Hi,

Compared to cairo-1.8 I get quite a serious performance regression running the stupid micro-benchmark attached.

time ./cbench_cairo1_10 
real    0m34.819s
user    0m32.556s
sys     0m1.567s

time ./cbench_cairo1_8
real    0m18.965s
user    0m17.765s
sys     0m0.938s

I guess it's caused by the scan rasterizer.

- Clemens
Comment 1 Chris Wilson 2010-05-11 09:15:27 UTC
I'm holding my breath waiting for the benchmark...

I think I can guess which path is underperforming... pixman_shader_t, ftw... ;-)
Comment 2 Clemens Eisserer 2010-05-11 11:39:30 UTC
Created attachment 35570
synthetic micro-benchmark
Comment 3 Clemens Eisserer 2010-05-11 11:41:11 UTC
Sorry, somehow I managed to forget the attachment ;)

I modified the benchmark a bit, now I get:

cairo-1.8    0m8.448s
cairo-1.9.6  0m24.570s

It seems the new approach has problems with complex paths or probably many intersections.
Comment 4 Clemens Eisserer 2010-05-11 11:42:40 UTC
@Chris: By the way, do you see any chance of extending XRender with RLE-encoded masks? I guess they should give better performance than the trapezoid approach even there.
Comment 5 Chris Wilson 2010-05-11 11:47:24 UTC
Oddly enough, with lots of intersections like that, the Tor scan rasteriser should be much faster than Bentley-Ottmann. It looks like something has gone very wrong. Clemens, can I use that benchmark under a liberal licence like MIT? Then I can include it in the synthetic tests.

Passing RLE masks to RENDER is definitely a task to be done, but enhancing RENDER itself is a very low priority compared with making direct rendering (i.e. mesa) fast (and just getting the 2D drivers to work would be a good start!).
Comment 6 Clemens Eisserer 2010-05-11 11:52:49 UTC
sure, MIT licensing is fine for me.
strange, glad the report contains some value after all ;)
Comment 7 Chris Wilson 2010-05-12 13:25:05 UTC
Joonas reminded me that the bigger change between 1.8 and 1.10 affecting this benchmark is self-intersection removal. When stroking in 1.8, we would generate traps for one segment at a time, causing incorrect results on overlapping segments and joins. In 1.10, we generate the mask for all the edges in a single pass, which is visually much more pleasing but much slower, as the algorithms scale with the number of edges n and intersections k as O((n+k) log n) [best case; we suspect that our implementation scales nowhere near that well!].

Short answer: we might not be able to fix the regression, because we have chosen correctness over performance here.

Longer answer: changing the stroker has a significant impact on the number of edges and intersections fed into the rasteriser, and may recover the lost performance...
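
[Editor's illustration, not part of the original comment.] A minimal sketch of why compositing a stroke one segment at a time gives wrong results where segments overlap, while a single mask for the whole stroke does not. It is a 1-D toy model, not cairo code: the function names, the pixel spans and the 0.5 alpha are all assumptions made for the example.

```python
def over(dst_alpha, src_alpha):
    """Porter-Duff OVER, alpha channel only."""
    return src_alpha + dst_alpha * (1.0 - src_alpha)


def stroke_per_segment(width, segments, alpha):
    """Composite each segment on its own (simplified 1.8-style behaviour)."""
    canvas = [0.0] * width
    for start, end in segments:
        for x in range(start, end):
            canvas[x] = over(canvas[x], alpha)  # overlap is painted twice
    return canvas


def stroke_global(width, segments, alpha):
    """Build one union mask for all segments, then composite once."""
    mask = [False] * width
    for start, end in segments:
        for x in range(start, end):
            mask[x] = True
    return [over(0.0, alpha) if covered else 0.0 for covered in mask]


# Two hypothetical spans that overlap on pixels 4 and 5.
segments = [(0, 6), (4, 10)]
per_segment = stroke_per_segment(10, segments, 0.5)
global_mask = stroke_global(10, segments, 0.5)
print(per_segment[4], global_mask[4])  # prints "0.75 0.5"
```

With a translucent source, the per-segment version darkens the overlap (alpha 0.75 instead of 0.5). The global version avoids this, at the price of processing every edge of the stroke together.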
Comment 8 Andrea Canciani 2010-08-10 06:14:10 UTC
Replacing the insertion sort with mergesort alleviates this problem (in particular it guarantees that times scale about linearly when increasing the lines in the path to be stroked), but doesn't catch up with 1.8.
See http://cgit.freedesktop.org/cairo/commit/?id=56ea51fdcc273531b5e86b921aad19237a1c9415
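
[Editor's illustration, not part of the original comment.] A rough sketch of why swapping insertion sort for mergesort changes the scaling. It is not the code from the commit above. It counts comparisons on a reverse-sorted list, which is the worst case for insertion sort. The function names and the input size of 1024 are assumptions made for the example.

```python
def insertion_sort(values):
    """Return (sorted copy, number of comparisons); O(n^2) in the worst case."""
    out = []
    comparisons = 0
    for v in values:
        i = len(out)
        # Walk left until an element <= v is found.
        while i > 0:
            comparisons += 1
            if out[i - 1] <= v:
                break
            i -= 1
        out.insert(i, v)
    return out, comparisons


def merge_sort(values):
    """Return (sorted copy, number of comparisons); O(n log n)."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, c_left = merge_sort(values[:mid])
    right, c_right = merge_sort(values[mid:])
    merged = []
    comparisons = c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


edges = list(range(1024, 0, -1))  # hypothetical edge keys, reverse-sorted
sorted_a, insertion_cmp = insertion_sort(edges)
sorted_b, merge_cmp = merge_sort(edges)
print(insertion_cmp, merge_cmp)
```

On this input, insertion sort needs n(n-1)/2 comparisons, while mergesort needs at most about n log2 n. That matches the observation that run time grows roughly in proportion to path size once the quadratic sort is gone.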
Comment 9 M Joonas Pihlaja 2010-11-15 18:27:17 UTC
*** Bug 31589 has been marked as a duplicate of this bug. ***
Comment 10 Chris Wilson 2012-05-01 11:48:14 UTC
(In reply to comment #8)
> Replacing the insertion sort with mergesort alleviates this problem (in
> particular it guarantees that times scale about linearly when increasing the
> lines in the path to be stroked), but doesn't catch up with 1.8.
> See
> http://cgit.freedesktop.org/cairo/commit/?id=56ea51fdcc273531b5e86b921aad19237a1c9415

Andrea, can you port that mergesort to 1.12, and let's see how it performs on the traces?
Comment 11 Chris Wilson 2012-05-01 12:00:15 UTC
cairo-1.8:   4.98user 0.00system 0:04.99elapsed 99%CPU
cairo-1.10: 11.48user 0.00system 0:11.50elapsed 99%CPU
cairo-1.12:  9.91user 0.00system 0:09.92elapsed 99%CPU
Comment 12 Chris Wilson 2012-05-01 15:10:00 UTC
Andrea committed his patch 18 months ago, no wonder I had a strange conflict when trying to apply it!
Comment 13 GitLab Migration User 2018-08-25 13:54:47 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/cairo/cairo/issues/261.

