Created attachment 15964 [details] [review]
Patch skipping mask coordinates

For the last week or so I've been chasing curiously bad glyph drawing (and other compositing) performance on R300. The observed behavior was that we could draw ~72,000 vertices / second (and thus 18,000 boxes / second) regardless of the size of the triangles or how much setup we were doing around them.

Some investigation revealed that the problem was apparently calculating vertex coordinates in the VAP for texture 1 without having a texture 1.

For glyph drawing, before and after performance is:

                 before       after
String length  glyphs/sec   glyphs/sec
-------------  ------------------------
 1               17266        32109
 5               18083        87673
10               18187       126210
20               18224       151805
50               18248       173471

I also constructed a simpler benchmark that composited a batch of N boxes of size MxM. (Using a clip region to get all the boxes in a batch drawn in a single go.)

                  before           after              GL
Size   Count   box/s  Mpix/s   box/s   Mpix/s   box/s    Mpix/s
-----  -----   -----  ------   ------  ------   -------  ------
10x10      5   18581     1.9   304433      31   1110150     111
10x10     20   18583     1.9   421277      42   1105830     111
10x10     50   18568     1.9   404612      40   1110340     111
20x20     20   18335     7.3   276540     106    872726     349
50x50     20   16844    42.1    46949     117    121075     302

(I hope bugzilla will not mangle the above tables)

The third column group (GL) gives an idea of how much we can improve further, since it shows the same thing being done by the r300 3d driver, which avoids the unnecessary repeated texture setup and cache flushing we are doing in the 2D driver. We should be able to come close to the 1 million glyphs/sec mark for longer glyph strings of small characters.

The patch I'm attaching:

- May not quite apply cleanly without the patches from bug 15371, but is independent.
- Has only been tested on R300, not older or newer cards. (I wanted to keep the coordinates we emitted the same everywhere, though I suspect performance gains will be minimal elsewhere.)
- Has only been tested for CP, not MMIO.

I don't think there will be any major problems on other cards or MMIO, but there might be typos or some register that I forgot to adjust.
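
For anyone wanting to sanity-check the numbers above, here is a minimal Python sketch of the arithmetic behind them. It is not part of the patch or the attached benchmarks. It assumes each box is submitted as one 4-vertex quad (inferred from the 72,000 vertices / 18,000 boxes ratio quoted above) and that Mpix/s is simply box rate times box area; that second assumption reproduces most rows of the table but not every reported value exactly. The function names are made up for illustration.

```python
# Back-of-the-envelope checks for the figures quoted in this report.
# Assumption (not stated explicitly in the report): each composited box
# is drawn as a single quad with 4 vertices.
VERTICES_PER_BOX = 4


def boxes_per_second(vertices_per_second):
    """Box rate implied by a fixed vertex throughput."""
    return vertices_per_second / VERTICES_PER_BOX


def mpix_per_second(box_rate, side):
    """Fill rate in megapixels/s for square boxes of side x side pixels.

    Assumption: Mpix/s = boxes/s * box area / 1e6.
    """
    return box_rate * side * side / 1e6


def speedup(before, after):
    """Ratio of the patched rate to the unpatched rate."""
    return after / before


if __name__ == "__main__":
    # ~72,000 vertices/s corresponds to ~18,000 boxes/s.
    print(boxes_per_second(72000))
    # 10x10 boxes at 18581 box/s (first "before" row of the box table).
    print(round(mpix_per_second(18581, 10), 1))
    # Glyph rate for 50-character strings, before vs. after.
    print(round(speedup(18248, 173471), 1))
```

Run as a script, this prints 18000.0, 1.9 and 9.5, that is, the quoted box rate, the first "before" Mpix/s entry, and a roughly 9.5x improvement for 50-glyph strings.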
Created attachment 15965 [details] pycairo program used to measure glyph performance
Created attachment 15966 [details] pycairo program used to measure box performance
Created attachment 15967 [details] GL program used to measure box performance

Compile with:

  gcc -g -Wall -o gl-box-bench gl-box-bench.c `pkg-config --cflags --libs gl glu` -lglut
I had just written an almost identical patch after you found out the cause, so I've gone ahead and committed it: 99435b7c18d931ea620044d0fdb4cc93dfcc6331. It also fixes a few regs you missed on older chips.