Bug 15546

Summary: Radeon: Omit mask coordinates
Product: xorg Reporter: Owen Taylor <otaylor>
Component: Driver/Radeon Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments:
Description                                          Flags
Patch skipping mask coordinates                      none
pycairo program used to measure glyph performance    none
pycairo program used to measure box performance      none
GL program used to measure box performance           none

Description Owen Taylor 2008-04-16 23:12:24 UTC
Created attachment 15964 [details] [review]
Patch skipping mask coordinates

For the last week or so I've been chasing curiously bad glyph
drawing (and other compositing) performance on R300. The observed 
behavior was that we could draw ~72,000 vertices / second (and
thus 18,000 boxes / second) without regard to the size of the
triangles or how much setup we were doing around them. Some 
investigation revealed that the problem was apparently calculating 
vertex coordinates in the VAP for texture 1 without having a texture 1.

For glyph drawing, before and after performance is:

                 before         after
String length    glyphs/sec     glyphs/sec
-------------    ----------     ----------
1                17266           32109
5                18083           87673
10               18187          126210
20               18224          151805
50               18248          173471
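
To express the glyph table as a speedup factor, here is a minimal sketch that divides the "after" rate by the "before" rate for each string length. The numbers are copied from the table above; the variable names are only illustrative.

```python
# Glyph rates copied from the table above: (string length, before, after),
# both rates in glyphs/sec.
glyph_rates = [
    (1, 17266, 32109),
    (5, 18083, 87673),
    (10, 18187, 126210),
    (20, 18224, 151805),
    (50, 18248, 173471),
]

# Speedup = after / before for each string length.
speedups = {length: after / before for length, before, after in glyph_rates}

for length, ratio in speedups.items():
    print(f"string length {length:>2}: {ratio:.1f}x")
```

The ratio grows with string length because the "before" rate stays pinned near 18,000 glyphs/sec while the "after" rate keeps rising.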

I also constructed a simpler benchmark that composited a
batch of N boxes of size MxM. (Using a clip region to 
get all the boxes in a batch drawn in a single go.)

              before          after            GL
Size   Count  box/s  Mpix/s   box/s   Mpix/s   box/s    Mpix/s
-----  -----  -----  ------   ------  ------   -------  ------
10x10      5  18581     1.9   304433      31   1110150     111
10x10     20  18583     1.9   421277      42   1105830     111
10x10     50  18568     1.9   404612      40   1110340     111
20x20     20  18335     7.3   276540     106    872726     349
50x50     20  16844    42.1    46949     117    121075     302

(I hope bugzilla will not mangle the above tables)
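
As a note on units, the Mpix/s figures appear to be the box rate multiplied by the box area. A minimal sketch, assuming that relation (it is not stated explicitly in this report), checked here against the "before" columns only:

```python
# (box side in pixels, boxes/sec, reported Mpix/s), copied from the
# "before" columns of the table above.
before_rows = [
    (10, 18581, 1.9),
    (10, 18583, 1.9),
    (10, 18568, 1.9),
    (20, 18335, 7.3),
    (50, 16844, 42.1),
]

def mpix_per_sec(side, boxes_per_sec):
    """Assumed relation: megapixels/sec = boxes/sec * side * side / 1e6."""
    return boxes_per_sec * side * side / 1e6

for side, rate, reported in before_rows:
    computed = mpix_per_sec(side, rate)
    print(f"{side}x{side}: computed {computed:.1f} Mpix/s, reported {reported}")
```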

The third pair of columns (GL) gives an idea of how much we can
improve further, since it shows the same thing being done by the
r300 3D driver, which avoids the unnecessary repeated
texture setup and cache flushing we are doing in the 2D
driver. We should be able to come close to the 1 million
glyphs/sec mark for longer glyph strings of small characters.

The patch I'm attaching:

 - May not quite apply cleanly without the patches from
   bug 15371, but is independent.
 - Has only been tested on R300, not older or newer 
   cards. (I wanted to keep the coordinates we emitted
   the same everywhere though I suspect performance
   gains will be minimal elsewhere.)
 - Has only been tested with CP, not MMIO.

I don't think there will be any major problems on other
cards or MMIO, but there might be typos or some register
that I forgot to adjust.
Comment 1 Owen Taylor 2008-04-16 23:14:50 UTC
Created attachment 15965 [details]
pycairo program used to measure glyph performance
Comment 2 Owen Taylor 2008-04-16 23:15:13 UTC
Created attachment 15966 [details]
pycairo program used to measure box performance
Comment 3 Owen Taylor 2008-04-16 23:16:10 UTC
Created attachment 15967 [details]
GL program used to measure box performance

Compile with:
gcc -g -Wall -o gl-box-bench gl-box-bench.c `pkg-config --cflags --libs gl glu` -lglut
Comment 4 Alex Deucher 2008-04-16 23:48:11 UTC
I had just written an almost identical patch after you found out the cause, so I've gone ahead and committed it:
99435b7c18d931ea620044d0fdb4cc93dfcc6331
it also fixes a few regs you missed on older chips.
