Bug 43064

Summary: stroke performance is super-linear in number of points in the path
Product: cairo Reporter: Oleg Bulatov <oleg>
Component: generalAssignee: Carl Worth <cworth>
Status: RESOLVED MOVED QA Contact: cairo-bugs mailing list <cairo-bugs>
Severity: normal    
Priority: medium    
Version: 1.10.2   
Hardware: Other   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: benchmark
gdk benchmark

Description Oleg Bulatov 2011-11-18 05:42:27 UTC
Attached example measure time to connect two points.

On my machine results grow (but should be const):

N_INIT=100; N_STEP=100; N_MAX=1000; REPEATS=40
100 dots: 39840.1923 ns
200 dots: 48038.0220 ns
300 dots: 57573.4119 ns
400 dots: 64636.9643 ns
500 dots: 68067.0950 ns
600 dots: 69748.7817 ns
700 dots: 70620.0463 ns
800 dots: 70686.8542 ns
900 dots: 71738.0177 ns
1000 dots: 72309.4676 ns

N_INIT=1000; N_STEP=1000; N_MAX=10000; REPEATS=40
1000 dots: 72496.1516 ns
2000 dots: 69100.6237 ns
3000 dots: 69073.2320 ns
4000 dots: 69369.9200 ns
5000 dots: 70577.7072 ns
6000 dots: 71680.0731 ns
7000 dots: 73351.9655 ns
8000 dots: 74332.7942 ns
9000 dots: 75775.3052 ns
10000 dots: 77600.2078 ns
Comment 1 Oleg Bulatov 2011-11-18 05:44:23 UTC
Created attachment 53653 [details]
benchmark
Comment 2 Uli Schlachter 2011-11-18 06:37:17 UTC
I might be missing something, but why exactly would you expect the results to be constant? You are doing more function calls, cairo has to handle more points and the resulting path becomes more complex (many self 'intersections').

Also, the visible result is different with drawing the line only once vs drawing it many times due to antialiasing.
Comment 3 Oleg Bulatov 2011-11-18 07:25:08 UTC
Drawing straight line between two points shouldn't depend on its background. I I suspect drawing time for broken line O(N), where N = number of points. If it O(N*sqrt(N)) or something like that, calling stroke and new_path after each point will give O(N) if sublines approximently equal.

For range 1000..6000 dots it is true.

cairo_set_antialias(cr, CAIRO_ANTIALIAS_NONE); do not solve this problem.

This loop works faster for large N:
for (i = 1; i < n; ++i)
{
    cairo_new_path(cr);
    cairo_move_to(cr, (double)(i - 1) / n * WIDTH, data[i - 1]);
    cairo_line_to(cr, (double)i / n * WIDTH, data[i]);
    cairo_stroke(cr);
}

What do you mean by 'intersections'? If line has intersection, then it should be darker here.
Comment 4 Andrea Canciani 2011-11-18 07:54:08 UTC
(In reply to comment #3)
> Drawing straight line between two points shouldn't depend on its background. I
> I suspect drawing time for broken line O(N), where N = number of points. If it
> O(N*sqrt(N)) or something like that, calling stroke and new_path after each

I would wish something like O(N*lg(n)), which IIRC is optimal.
Anything better than that would just mean that you're hitting a special case that's "easier" than the average one.
(The optimal tessellation and segment intersection algorithms are superlinear, hence you should expect superlinear performance when stroking)

Can you please provide references which indicate why you expect stroking to be linear?
(Also, can you perform your test in the range 10000-1000000 step 10000? The second run of results looks quite stable, which might mean that you're actually in one of the special "lucky" cases and the first few runs had different timings because most of the path could fit in faster caches)

> point will give O(N) if sublines approximently equal.
> 
> For range 1000..6000 dots it is true.
> 
> cairo_set_antialias(cr, CAIRO_ANTIALIAS_NONE); do not solve this problem.

Cairo quality and performance when doing non-antialiased drawing can and should improve, but it is mostly unrelated to your problem (drawing an opaque source OVER a destination through a non-antialiased stroke using a non-intersecting/non-antialiased clipping might probably be optimized for your use case... but it is worth it? how often are such conditions going to be met in general software?).

> 
> This loop works faster for large N:
> for (i = 1; i < n; ++i)
> {
>     cairo_new_path(cr);
>     cairo_move_to(cr, (double)(i - 1) / n * WIDTH, data[i - 1]);
>     cairo_line_to(cr, (double)i / n * WIDTH, data[i]);
>     cairo_stroke(cr);
> }

Yes, but it computes a different output.
Example: when using a half opaque color, you draw multiple times where the strokes overlap.
(When using a fully-opaque color, you have the same issue because of antialiasing, but it is much less visible).

> 
> What do you mean by 'intersections'? If line has intersection, then it should
> be darker here.

No, that's not the definition of stroking in Cairo.
When you perform a stroke in Cairo, the library computes which region is affected by the stroke, then draws the source through it.
Comment 5 Oleg Bulatov 2011-11-18 09:06:15 UTC
> Yes, but it computes a different output.
Oh.. Now I understand why it should be O(N*lg(N)) :)

> Also, can you perform your test in the range 10000-1000000 step 10000?
I perform this test in the range 10000-300000 (on 300000 dots it takes 208 seconds, so I stop it).

N_INIT=10000; N_STEP=10000; N_MAX=1000000; REPEATS=1
10000 dots: 77473.6787 ns
20000 dots: 85283.5266 ns
30000 dots: 90020.6350 ns
40000 dots: 94101.4462 ns
50000 dots: 134415.7382 ns
60000 dots: 219809.3580 ns
70000 dots: 295649.2472 ns
80000 dots: 380468.2879 ns
90000 dots: 452315.9048 ns
100000 dots: 523139.9335 ns
110000 dots: 560467.5491 ns
120000 dots: 590674.1382 ns
130000 dots: 601031.4354 ns
140000 dots: 610107.1283 ns
150000 dots: 621255.9912 ns
160000 dots: 636620.0321 ns
170000 dots: 639096.1297 ns
180000 dots: 640634.2400 ns
190000 dots: 649307.6680 ns
200000 dots: 656461.8706 ns
210000 dots: 659141.9941 ns
220000 dots: 658025.4864 ns
230000 dots: 659949.7492 ns
240000 dots: 661589.7778 ns
250000 dots: 663293.6351 ns
260000 dots: 667306.9699 ns
270000 dots: 678863.1729 ns
280000 dots: 676629.7568 ns
290000 dots: 692568.7273 ns
300000 dots: 696615.8439 ns
^C

I can rerun it and get all data if required. But timings grows faster N*lg(N) (memory reallocation?).
Comment 6 Andrea Canciani 2011-11-18 10:32:44 UTC
(In reply to comment #5)
> > Yes, but it computes a different output.
> Oh.. Now I understand why it should be O(N*lg(N)) :)
> 
> > Also, can you perform your test in the range 10000-1000000 step 10000?
> I perform this test in the range 10000-300000 (on 300000 dots it takes 208
> seconds, so I stop it).

Thanks a lot (oops, I didn't mean to take so much compute-time).
It definitely increases more than I expected, but the overall behavior looks ok.

> 
> N_INIT=10000; N_STEP=10000; N_MAX=1000000; REPEATS=1
> 10000 dots: 77473.6787 ns
> 20000 dots: 85283.5266 ns
> 30000 dots: 90020.6350 ns
> 40000 dots: 94101.4462 ns
> 50000 dots: 134415.7382 ns
> 60000 dots: 219809.3580 ns
> 70000 dots: 295649.2472 ns
> 80000 dots: 380468.2879 ns
> 90000 dots: 452315.9048 ns
> 100000 dots: 523139.9335 ns
> 110000 dots: 560467.5491 ns
> 120000 dots: 590674.1382 ns
> 130000 dots: 601031.4354 ns
> 140000 dots: 610107.1283 ns
> 150000 dots: 621255.9912 ns
> 160000 dots: 636620.0321 ns
> 170000 dots: 639096.1297 ns
> 180000 dots: 640634.2400 ns
> 190000 dots: 649307.6680 ns
> 200000 dots: 656461.8706 ns
> 210000 dots: 659141.9941 ns
> 220000 dots: 658025.4864 ns
> 230000 dots: 659949.7492 ns
> 240000 dots: 661589.7778 ns
> 250000 dots: 663293.6351 ns
> 260000 dots: 667306.9699 ns
> 270000 dots: 678863.1729 ns
> 280000 dots: 676629.7568 ns
> 290000 dots: 692568.7273 ns
> 300000 dots: 696615.8439 ns
> ^C
> 
> I can rerun it and get all data if required. But timings grows faster N*lg(N)
> (memory reallocation?).

Yes, it might be something like that. Plotting your data shows 2 almost-horizontal parts, with a sudden change in between. Maybe there is room for improvement in the preallocation size estimation and/or in the reallocation policy... but, again, in that test you're stroking a huge amount of segments, so tuning the allocation for such a unusual case might be a bad idea.

(I'm not sure about this, but another possibility is that the change is caused by a sudden increase in the number of intersections and degeneracies. This is very possible, because segments are probably almost vertical under those conditions).

To me, it looks like the bug could already be closed...
Does anybody has a better explanation of the weird performance curve?
Would this kind of benchmark be appropriate for the micro-benchmark suite? (or do we already have something like this? I remember a couple of similar benchmarks by Joonas, but I cannot find them in cairo-perf-micro)
Comment 7 Chris Wilson 2011-11-18 12:55:15 UTC
That particular benchmark is indeed close to one type of pathological case for the scan-line rasteriser: a block of nearly vertical lines. There is a micro-benchmark for that in the shape of vlines. The obvious optimisation in this case is to render using a vertical sweep line, which can be easily done in the application by rotating the lines by 90 degrees rendering into a temporary buffer, and then rotating back through 90 degrees on compositing onto the output.

I've considered doing so automatically if sum_dx << sum_dy, but I've seen no real world cases where there was not a better alternative method of rendering or at a last resort could optimise its own rendering far better than cairo could.
Comment 8 Oleg Bulatov 2011-11-19 04:56:31 UTC
Created attachment 53684 [details]
gdk benchmark

Different benchmark, but that related to my problem (slow canvas in Firefox).

N_INIT=100; N_STEP=200; N_MAX=3000; REPEATS=3
100 dots: 4109,7300 ns
300 dots: 10954,9300 ns
500 dots: 21467,7667 ns
700 dots: 22729,7843 ns
900 dots: 22069,1122 ns
1100 dots: 26346,5330 ns
1300 dots: 31056,7331 ns
1500 dots: 36175,2776 ns
1700 dots: 41937,3943 ns
1900 dots: 46618,5988 ns
2100 dots: 52099,1421 ns
2300 dots: 58593,5009 ns
2500 dots: 63088,4559 ns
2700 dots: 66985,8751 ns
2900 dots: 70339,9824 ns

Is it expected results?
Comment 9 Oleg Bulatov 2011-11-19 05:01:42 UTC
N_INIT=1000; N_STEP=1000; N_MAX=10000; REPEATS=3
1000 dots: 26231,9583 ns
2000 dots: 52911,4275 ns
3000 dots: 73177,9663 ns
4000 dots: 89960,2191 ns
5000 dots: 103140,6220 ns
6000 dots: 114414,5051 ns
7000 dots: 127430,1330 ns
8000 dots: 138734,9332 ns
9000 dots: 150994,8329 ns
10000 dots: 162519,6564 ns

3000 dots totally takes 0.2 seconds (only 5 lines per second)
10000 dots totally takes 1.6 seconds
Comment 10 Oleg Bulatov 2011-11-19 05:15:00 UTC
Oops... Sorry, last tests from another machine.

Old test (wthout gdk) on new machine also slowdowns on range 1000...10000:
N_INIT=1000; N_STEP=1000; N_MAX=10000; REPEATS=3
1000 dots: 58929.7287 ns
2000 dots: 54751.3917 ns
3000 dots: 68980.1011 ns
4000 dots: 78669.5042 ns
5000 dots: 84290.0571 ns
6000 dots: 88731.6039 ns
7000 dots: 93703.8769 ns
8000 dots: 93936.6567 ns
9000 dots: 96050.4159 ns
10000 dots: 98065.9577 ns
Comment 11 Chris Wilson 2011-11-19 08:12:53 UTC
My plan for improving firefox is to pass paths via Render and do the union of the path inside the xserver using spans rather than trapezoidalisation which is the major bottleneck there.
Comment 12 GitLab Migration User 2018-08-25 14:01:58 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/cairo/cairo/issues/318.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.