Attached example measure time to connect two points. On my machine results grow (but should be const): N_INIT=100; N_STEP=100; N_MAX=1000; REPEATS=40 100 dots: 39840.1923 ns 200 dots: 48038.0220 ns 300 dots: 57573.4119 ns 400 dots: 64636.9643 ns 500 dots: 68067.0950 ns 600 dots: 69748.7817 ns 700 dots: 70620.0463 ns 800 dots: 70686.8542 ns 900 dots: 71738.0177 ns 1000 dots: 72309.4676 ns N_INIT=1000; N_STEP=1000; N_MAX=10000; REPEATS=40 1000 dots: 72496.1516 ns 2000 dots: 69100.6237 ns 3000 dots: 69073.2320 ns 4000 dots: 69369.9200 ns 5000 dots: 70577.7072 ns 6000 dots: 71680.0731 ns 7000 dots: 73351.9655 ns 8000 dots: 74332.7942 ns 9000 dots: 75775.3052 ns 10000 dots: 77600.2078 ns
Created attachment 53653 [details] benchmark
I might be missing something, but why exactly would you expect the results to be constant? You are doing more function calls, cairo has to handle more points and the resulting path becomes more complex (many self 'intersections'). Also, the visible result is different with drawing the line only once vs drawing it many times due to antialiasing.
Drawing straight line between two points shouldn't depend on its background. I I suspect drawing time for broken line O(N), where N = number of points. If it O(N*sqrt(N)) or something like that, calling stroke and new_path after each point will give O(N) if sublines approximently equal. For range 1000..6000 dots it is true. cairo_set_antialias(cr, CAIRO_ANTIALIAS_NONE); do not solve this problem. This loop works faster for large N: for (i = 1; i < n; ++i) { cairo_new_path(cr); cairo_move_to(cr, (double)(i - 1) / n * WIDTH, data[i - 1]); cairo_line_to(cr, (double)i / n * WIDTH, data[i]); cairo_stroke(cr); } What do you mean by 'intersections'? If line has intersection, then it should be darker here.
(In reply to comment #3) > Drawing straight line between two points shouldn't depend on its background. I > I suspect drawing time for broken line O(N), where N = number of points. If it > O(N*sqrt(N)) or something like that, calling stroke and new_path after each I would wish something like O(N*lg(n)), which IIRC is optimal. Anything better than that would just mean that you're hitting a special case that's "easier" than the average one. (The optimal tessellation and segment intersection algorithms are superlinear, hence you should expect superlinear performance when stroking) Can you please provide references which indicate why you expect stroking to be linear? (Also, can you perform your test in the range 10000-1000000 step 10000? The second run of results looks quite stable, which might mean that you're actually in one of the special "lucky" cases and the first few runs had different timings because most of the path could fit in faster caches) > point will give O(N) if sublines approximently equal. > > For range 1000..6000 dots it is true. > > cairo_set_antialias(cr, CAIRO_ANTIALIAS_NONE); do not solve this problem. Cairo quality and performance when doing non-antialiased drawing can and should improve, but it is mostly unrelated to your problem (drawing an opaque source OVER a destination through a non-antialiased stroke using a non-intersecting/non-antialiased clipping might probably be optimized for your use case... but it is worth it? how often are such conditions going to be met in general software?). > > This loop works faster for large N: > for (i = 1; i < n; ++i) > { > cairo_new_path(cr); > cairo_move_to(cr, (double)(i - 1) / n * WIDTH, data[i - 1]); > cairo_line_to(cr, (double)i / n * WIDTH, data[i]); > cairo_stroke(cr); > } Yes, but it computes a different output. Example: when using a half opaque color, you draw multiple times where the strokes overlap. (When using a fully-opaque color, you have the same issue because of antialiasing, but it is much less visible). > > What do you mean by 'intersections'? If line has intersection, then it should > be darker here. No, that's not the definition of stroking in Cairo. When you perform a stroke in Cairo, the library computes which region is affected by the stroke, then draws the source through it.
> Yes, but it computes a different output. Oh.. Now I understand why it should be O(N*lg(N)) :) > Also, can you perform your test in the range 10000-1000000 step 10000? I perform this test in the range 10000-300000 (on 300000 dots it takes 208 seconds, so I stop it). N_INIT=10000; N_STEP=10000; N_MAX=1000000; REPEATS=1 10000 dots: 77473.6787 ns 20000 dots: 85283.5266 ns 30000 dots: 90020.6350 ns 40000 dots: 94101.4462 ns 50000 dots: 134415.7382 ns 60000 dots: 219809.3580 ns 70000 dots: 295649.2472 ns 80000 dots: 380468.2879 ns 90000 dots: 452315.9048 ns 100000 dots: 523139.9335 ns 110000 dots: 560467.5491 ns 120000 dots: 590674.1382 ns 130000 dots: 601031.4354 ns 140000 dots: 610107.1283 ns 150000 dots: 621255.9912 ns 160000 dots: 636620.0321 ns 170000 dots: 639096.1297 ns 180000 dots: 640634.2400 ns 190000 dots: 649307.6680 ns 200000 dots: 656461.8706 ns 210000 dots: 659141.9941 ns 220000 dots: 658025.4864 ns 230000 dots: 659949.7492 ns 240000 dots: 661589.7778 ns 250000 dots: 663293.6351 ns 260000 dots: 667306.9699 ns 270000 dots: 678863.1729 ns 280000 dots: 676629.7568 ns 290000 dots: 692568.7273 ns 300000 dots: 696615.8439 ns ^C I can rerun it and get all data if required. But timings grows faster N*lg(N) (memory reallocation?).
(In reply to comment #5) > > Yes, but it computes a different output. > Oh.. Now I understand why it should be O(N*lg(N)) :) > > > Also, can you perform your test in the range 10000-1000000 step 10000? > I perform this test in the range 10000-300000 (on 300000 dots it takes 208 > seconds, so I stop it). Thanks a lot (oops, I didn't mean to take so much compute-time). It definitely increases more than I expected, but the overall behavior looks ok. > > N_INIT=10000; N_STEP=10000; N_MAX=1000000; REPEATS=1 > 10000 dots: 77473.6787 ns > 20000 dots: 85283.5266 ns > 30000 dots: 90020.6350 ns > 40000 dots: 94101.4462 ns > 50000 dots: 134415.7382 ns > 60000 dots: 219809.3580 ns > 70000 dots: 295649.2472 ns > 80000 dots: 380468.2879 ns > 90000 dots: 452315.9048 ns > 100000 dots: 523139.9335 ns > 110000 dots: 560467.5491 ns > 120000 dots: 590674.1382 ns > 130000 dots: 601031.4354 ns > 140000 dots: 610107.1283 ns > 150000 dots: 621255.9912 ns > 160000 dots: 636620.0321 ns > 170000 dots: 639096.1297 ns > 180000 dots: 640634.2400 ns > 190000 dots: 649307.6680 ns > 200000 dots: 656461.8706 ns > 210000 dots: 659141.9941 ns > 220000 dots: 658025.4864 ns > 230000 dots: 659949.7492 ns > 240000 dots: 661589.7778 ns > 250000 dots: 663293.6351 ns > 260000 dots: 667306.9699 ns > 270000 dots: 678863.1729 ns > 280000 dots: 676629.7568 ns > 290000 dots: 692568.7273 ns > 300000 dots: 696615.8439 ns > ^C > > I can rerun it and get all data if required. But timings grows faster N*lg(N) > (memory reallocation?). Yes, it might be something like that. Plotting your data shows 2 almost-horizontal parts, with a sudden change in between. Maybe there is room for improvement in the preallocation size estimation and/or in the reallocation policy... but, again, in that test you're stroking a huge amount of segments, so tuning the allocation for such a unusual case might be a bad idea. (I'm not sure about this, but another possibility is that the change is caused by a sudden increase in the number of intersections and degeneracies. This is very possible, because segments are probably almost vertical under those conditions). To me, it looks like the bug could already be closed... Does anybody has a better explanation of the weird performance curve? Would this kind of benchmark be appropriate for the micro-benchmark suite? (or do we already have something like this? I remember a couple of similar benchmarks by Joonas, but I cannot find them in cairo-perf-micro)
That particular benchmark is indeed close to one type of pathological case for the scan-line rasteriser: a block of nearly vertical lines. There is a micro-benchmark for that in the shape of vlines. The obvious optimisation in this case is to render using a vertical sweep line, which can be easily done in the application by rotating the lines by 90 degrees rendering into a temporary buffer, and then rotating back through 90 degrees on compositing onto the output. I've considered doing so automatically if sum_dx << sum_dy, but I've seen no real world cases where there was not a better alternative method of rendering or at a last resort could optimise its own rendering far better than cairo could.
Created attachment 53684 [details] gdk benchmark Different benchmark, but that related to my problem (slow canvas in Firefox). N_INIT=100; N_STEP=200; N_MAX=3000; REPEATS=3 100 dots: 4109,7300 ns 300 dots: 10954,9300 ns 500 dots: 21467,7667 ns 700 dots: 22729,7843 ns 900 dots: 22069,1122 ns 1100 dots: 26346,5330 ns 1300 dots: 31056,7331 ns 1500 dots: 36175,2776 ns 1700 dots: 41937,3943 ns 1900 dots: 46618,5988 ns 2100 dots: 52099,1421 ns 2300 dots: 58593,5009 ns 2500 dots: 63088,4559 ns 2700 dots: 66985,8751 ns 2900 dots: 70339,9824 ns Is it expected results?
N_INIT=1000; N_STEP=1000; N_MAX=10000; REPEATS=3 1000 dots: 26231,9583 ns 2000 dots: 52911,4275 ns 3000 dots: 73177,9663 ns 4000 dots: 89960,2191 ns 5000 dots: 103140,6220 ns 6000 dots: 114414,5051 ns 7000 dots: 127430,1330 ns 8000 dots: 138734,9332 ns 9000 dots: 150994,8329 ns 10000 dots: 162519,6564 ns 3000 dots totally takes 0.2 seconds (only 5 lines per second) 10000 dots totally takes 1.6 seconds
Oops... Sorry, last tests from another machine. Old test (wthout gdk) on new machine also slowdowns on range 1000...10000: N_INIT=1000; N_STEP=1000; N_MAX=10000; REPEATS=3 1000 dots: 58929.7287 ns 2000 dots: 54751.3917 ns 3000 dots: 68980.1011 ns 4000 dots: 78669.5042 ns 5000 dots: 84290.0571 ns 6000 dots: 88731.6039 ns 7000 dots: 93703.8769 ns 8000 dots: 93936.6567 ns 9000 dots: 96050.4159 ns 10000 dots: 98065.9577 ns
My plan for improving firefox is to pass paths via Render and do the union of the path inside the xserver using spans rather than trapezoidalisation which is the major bottleneck there.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/cairo/cairo/issues/318.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.