Bug 4668 - EXA subpixel glyph rendering terribly slow
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Server/Acceleration/EXA
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: high normal
Assignee: Eric Anholt
Blocks: xorg-7.2
Reported: 2005-10-03 01:52 UTC by Pierre Ossman
Modified: 2006-04-27 22:04 UTC

Description Pierre Ossman 2005-10-03 01:52:09 UTC
The EXA "acceleration" of sub-pixel antialiased glyphs is so terribly slow that
software rendering runs circles around it.

If this problem is difficult to solve it would be nice to have EXA avoid
accelerating these glyphs until a solution can be found.

xorg CVS from 2005-09-30. Card is Radeon Mobility 9200.
Comment 1 Eric Anholt 2005-10-04 00:32:43 UTC
If you edit exapict.c to change the line in exaGlyphs():
    if (!pExaScr->info->accel.PrepareComposite) {
to
    if (!pExaScr->info->accel.PrepareComposite ||
        (maskFormat && NeedsComponent(maskFormat->format))) {

does that make the performance more like you'd expect?
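
For readers unfamiliar with the check being proposed, here is a minimal, self-contained sketch of what a "needs component alpha" test on a mask format amounts to. The `SK_` macros and `sk_` functions are hypothetical stand-ins, modeled on the Render picture-format bit layout as I understand it (alpha width in bits 12-15, RGB widths in the low 12 bits); they are not copied from exapict.c and the real macro may differ in detail.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Render format macros (assumed layout:
 * bpp:8 | type:8 | a:4 | r:4 | g:4 | b:4). */
#define SK_FORMAT(bpp, type, a, r, g, b)                      \
    (((uint32_t)(bpp) << 24) | ((uint32_t)(type) << 16) |     \
     ((uint32_t)(a) << 12) | ((uint32_t)(r) << 8) |           \
     ((uint32_t)(g) << 4) | (uint32_t)(b))
#define SK_FORMAT_A(f)   (((f) >> 12) & 0x0f)   /* alpha channel width */
#define SK_FORMAT_RGB(f) ((f) & 0xfff)          /* packed r/g/b widths */

#define SK_TYPE_A    1
#define SK_TYPE_ARGB 2

/* Grayscale AA glyph mask: alpha only. */
#define SK_a8       SK_FORMAT(8, SK_TYPE_A, 8, 0, 0, 0)
/* Subpixel glyph mask: alpha plus per-channel coverage. */
#define SK_a8r8g8b8 SK_FORMAT(32, SK_TYPE_ARGB, 8, 8, 8, 8)

/* A mask needs component alpha when it carries both alpha and RGB bits. */
static int sk_needs_component(uint32_t format)
{
    return SK_FORMAT_A(format) != 0 && SK_FORMAT_RGB(format) != 0;
}

/* Mirrors the shape of the condition suggested above: fall back to
 * software if the driver has no composite hook, or if the glyph mask
 * is a component-alpha (subpixel) format. */
static int sk_should_fall_back(int have_prepare_composite,
                               int have_mask_format, uint32_t mask_format)
{
    return !have_prepare_composite ||
           (have_mask_format && sk_needs_component(mask_format));
}
```

With these definitions an a8 mask stays on the accelerated path while an a8r8g8b8 mask is sent to the fallback, which is the behaviour the suggested edit is meant to test.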
Comment 2 Pierre Ossman 2005-10-04 04:27:43 UTC
Afraid not. It's still terribly slow.
Comment 3 Eric Anholt 2005-10-08 19:03:56 UTC
I committed the patch I suggested, which improved runtimes of ls -lR in my
gnome-terminal by 62%.  Could you test with CVS and see if you're still having
the problem?  If you are, could you please tell me what exactly you're running
to see this issue, and how it compares to without using EXA? (numbers, please)
Comment 4 Pierre Ossman 2005-10-11 05:53:49 UTC
Testing this proved difficult since there is no way to get 'time' to catch
rendering time in terminals. x11perf fortunately had some undocumented tests so I
ran the following:

$ x11perf --aa24text --rgb24text
        -> -aa24text    Char in 30-char aa line (Charter 24)
        -> -rgb24text   Char in 30-char rgb line (Charter 24)
        -> -rgb24text   Char in 30-char rgb core line (Charter 24)

Results:

EXA:

6400000 trep @   0.0047 msec (214000.0/sec): Char in 30-char aa line (Charter 24)
  64000 trep @   0.4997 msec (  2000.0/sec): Char in 30-char rgb line (Charter 24)
 128000 trep @   0.2240 msec (  4460.0/sec): Char in 30-char rgb core line (Charter 24)

EXA, no RenderAccel:

 320000 trep @   0.0756 msec ( 13200.0/sec): Char in 30-char aa line (Charter 24)
 480000 trep @   0.0641 msec ( 15600.0/sec): Char in 30-char rgb line (Charter 24)
 128000 trep @   0.2240 msec (  4460.0/sec): Char in 30-char rgb core line (Charter 24)

XAA:

8000000 trep @   0.0034 msec (295000.0/sec): Char in 30-char aa line (Charter 24)
 480000 trep @   0.0642 msec ( 15600.0/sec): Char in 30-char rgb line (Charter 24)
 128000 trep @   0.2215 msec (  4520.0/sec): Char in 30-char rgb core line (Charter 24)

XAA, no RenderAccel:

 480000 trep @   0.0738 msec ( 13600.0/sec): Char in 30-char aa line (Charter 24)
 480000 trep @   0.0643 msec ( 15600.0/sec): Char in 30-char rgb line (Charter 24)
 128000 trep @   0.2222 msec (  4500.0/sec): Char in 30-char rgb core line (Charter 24)


The results here are consistent with the perceived performance from day-to-day
usage.

xorg CVS from 2005-10-11.
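
To compare runs like the ones above without reading the columns by eye, the result lines can be parsed and turned into ratios. The following is only a sketch: it assumes lines shaped exactly like the x11perf output quoted in this comment, and the struct and function names (`sk_x11perf_result`, `sk_parse_x11perf_line`, `sk_slowdown`) are made up for illustration.

```c
#include <stdio.h>

/* One parsed x11perf result line (hypothetical helper type). */
struct sk_x11perf_result {
    long   reps;          /* total repetitions */
    double msec_per_rep;  /* time per repetition, in milliseconds */
    double reps_per_sec;  /* rate printed in parentheses */
    char   name[128];     /* test description */
};

/* Parses a line such as
 *   "  64000 trep @   0.4997 msec (  2000.0/sec): Char in ..."
 * Returns 1 on success, 0 if the line does not match that shape. */
static int sk_parse_x11perf_line(const char *line,
                                 struct sk_x11perf_result *out)
{
    int n = sscanf(line, " %ld trep @ %lf msec ( %lf/sec): %127[^\n]",
                   &out->reps, &out->msec_per_rep,
                   &out->reps_per_sec, out->name);
    return n == 4;
}

/* How many times slower the measured rate is than the baseline rate. */
static double sk_slowdown(double baseline_rate, double measured_rate)
{
    return baseline_rate / measured_rate;
}
```

Applied to the rgb line results above, `sk_slowdown(15600.0, 2000.0)` gives 7.8, i.e. accelerated EXA renders subpixel text roughly eight times slower than the unaccelerated path in this run.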
Comment 5 Adam Jackson 2005-10-20 08:22:14 UTC
the way you time ls -lR is by saying

time ls -lR

since ls will block until output (and therefore rendering) completes this is a
pretty effective way of measuring rendering time.
Comment 6 Pierre Ossman 2005-10-20 08:37:44 UTC
ls will only block until gnome-terminal accepts all data, not until rendering is
complete. time reports the same value with the different font settings, but
there is a significant difference if you measure it externally. E.g. time claims
0.5s but you can easily measure the time to several seconds.

That said, if you give it enough data the buffers do not screw up the results
too much. One run gave 6 seconds vs. 42 seconds for grayscale vs. subpixel with
EXA on. With software rendering subpixel requires 8 seconds.

x11perf tests X operations directly so it is a much better tool here.
Comment 7 Adam Jackson 2005-11-19 04:33:13 UTC
(In reply to comment #6)
> ls will only block until gnome-terminal accepts all data, not until rendering is
> complete.

except that exit() calls fflush(stdout), so yes, it does wait.

at any rate exa is experimental in 7.0 anyway, so this is not a 7.0 blocker,
though i'll certainly take fixes for it if they arise.
Comment 8 Adam Jackson 2006-04-25 05:47:32 UTC
(In reply to comment #7)
> at any rate exa is experimental in 7.0 anyway, so this is not a 7.0 blocker,
> though i'll certainly take fixes for it if they arise.

Utter lack of motion on this since 7.0 means it's either no longer an issue or
it's not enough of one to block 7.1.  Moving out to 7.2.
Comment 9 Pierre Ossman 2006-04-25 22:07:08 UTC
Did a test of the current version and got:

 320000 trep @   0.0829 msec ( 12100.0/sec): Char in 30-char aa line (Charter 24)
 320000 trep @   0.0823 msec ( 12100.0/sec): Char in 30-char rgb line (Charter 24)
 320000 trep @   0.0825 msec ( 12100.0/sec): Char in 30-char rgb core line (Charter 24)

Very strange... I can see that it uses grayscale for the first and sub-pixel for
the other test, so it isn't doing the same thing three times.

So sub-pixel rendering has gotten a lot better, but "normal" aa doesn't seem to
be accelerated anymore.
Comment 10 Michel Dänzer 2006-04-25 22:32:59 UTC
Is this on a 'naked' X server? Running xserver HEAD? If so, can you try with
Option "MigrationHeuristics" "greedy" and "always"?
Comment 11 Pierre Ossman 2006-04-25 22:42:31 UTC
This is fedora rawhide, which I believe is at least partly taken from CVS.

xorg-x11-drv-ati-6.6.0-1
xorg-x11-server-Xorg-1.0.99.901-5

The driver doesn't seem to contain the strings you mention, so I'll have to
build it from CVS. I'll get back to you. :)
Comment 12 Michel Dänzer 2006-04-25 22:46:34 UTC
(In reply to comment #11)
> The driver doesn't seem to contain the strings you mention, [...]

It's an option in EXA, not the driver.
Comment 13 Pierre Ossman 2006-04-25 23:00:20 UTC
So I discovered. And it is also included in Red Hat's package. I'll commence the
testing then. :)
Comment 14 Pierre Ossman 2006-04-25 23:12:39 UTC
Ehm... where do I specify this option? If I put it in ServerFlags it doesn't say
anything anywhere, and if I put it in Device then it says that it ignores it.
Comment 15 Michel Dänzer 2006-04-25 23:22:19 UTC
Ah, looks like RC1 didn't support several migration schemes yet.
Comment 16 Pierre Ossman 2006-04-25 23:51:15 UTC
Meaning I should do what? Compile the server from CVS? And where should the
option be in that case?
Comment 17 Michel Dänzer 2006-04-25 23:56:07 UTC
(In reply to comment #16)
> Meaning I should do what? Compile the server from CVS? 

If you want to try different migration schemes, yes; at least EXA.

> And where should the option be in that case?

I have it in the device section.
Comment 18 Pierre Ossman 2006-04-26 06:16:14 UTC
I tried replacing libexa.so from a current CVS build and added
MigrationHeuristics to the device section, but all I got was:

(WW) RADEON(0): Option "MigrationHeuristics" is not used

I also get some funky glitches here and there so I'm guessing replacing just
libexa.so wasn't completely safe. :)
Comment 19 Eric Anholt 2006-04-26 09:58:22 UTC
OK, I've committed an update which improves the situation, but doesn't quite
fix it.  Man, I hate Glyphs.  Here are some comparisons on my laptop between
XAA, EXA right before my patch, and EXA right after my patch:

1: /home/anholt/text-xaa
2: /home/anholt/text-exa-before
3: /home/anholt/text-exa-after

    1              2                   3           Operation
--------   -----------------   -----------------   -----------------
178000.0   110000.0 (  0.62)   183000.0 (  1.03)   Char in 30-char aa line (Charter 24)
 17200.0     3150.0 (  0.18)    16200.0 (  0.94)   Char in 30-char rgb line (Charter 24)

1: /home/anholt/text-compmgr-xaa
2: /home/anholt/text-compmgr-exa-before
3: /home/anholt/text-compmgr-exa-after

    1              2                   3           Operation
--------   -----------------   -----------------   -----------------
167000.0   163000.0 (  0.98)   103000.0 (  0.62)   Char in 30-char aa line (Charter 24)
 16100.0    16100.0 (  1.00)    53700.0 (  3.34)   Char in 30-char rgb line (Charter 24)

And ls -lR times without a compmgr, with aa and rgb (smaller is better, ouch!):

x /home/anholt/time-xaa
+ /home/anholt/time-exa-before
* /home/anholt/time-exa-after
+--------------------------------------------------------------------------+
|    x xx  x                       x  + +             * **        *       *|
||______M____A____________|        |____M__A_______| |___M___A________|    |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5          6.03          6.61          6.08         6.184    0.24151605
+   5          6.67          7.04          6.72         6.764    0.15630099
Difference at 95.0% confidence
        0.58 +/- 0.296677
        9.37904% +/- 4.7975%
        (Student's t, pooled s = 0.203421)
*   5          6.99          7.38          7.05         7.132    0.16483325
Difference at 95.0% confidence
        0.948 +/- 0.301549
        15.3299% +/- 4.87627%
        (Student's t, pooled s = 0.206761)

x /home/anholt/time-xaa-rgb
+ /home/anholt/time-exa-rgb-before
* /home/anholt/time-exa-rgb-after
+--------------------------------------------------------------------------+
|x                               *                                      +  |
|x                               *                                      +  |
|x                              ***                                     +++|
|A                              |A|                                     MA||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5          12.5         12.89         12.58        12.648     0.1512283
+   5         78.84         80.74         79.33        79.588    0.76809505
Difference at 95.0% confidence
        66.94 +/- 0.807324
        529.254% +/- 6.38302%
        (Student's t, pooled s = 0.553552)
*   5          41.9         43.44         42.65         42.55    0.62373873
Difference at 95.0% confidence
        29.902 +/- 0.661882
        236.417% +/- 5.2331%
        (Student's t, pooled s = 0.453828)

So the current patch makes things better except for aa24text with a compmgr,
which I would guess to be because of the additional damage computation.  Beating
XAA on composited rgb24text makes me pretty happy.  But it doesn't catch us up
to XAA for gnome-terminal with subpixel, which I suspect is because the
intersection test in the patch isn't conservative enough.

Leaving this open until I can figure out what's up with gnome-terminal.
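
The "pooled s" and "+/-" figures in the timing summaries above can be reproduced from the N, Avg and Stddev columns alone. The sketch below assumes the usual two-sample Student's t formulas with a pooled standard deviation; the caller supplies the critical t value (2.306 for 8 degrees of freedom at 95%, two-sided). All names are hypothetical, and the small Newton-iteration square root is there only so the example builds without linking libm.

```c
/* Summary of one set of timing runs (hypothetical helper type). */
struct sk_sample {
    int    n;       /* number of runs */
    double avg;     /* mean time in seconds */
    double stddev;  /* sample standard deviation */
};

/* Square root by Newton's method; adequate for the small positive
 * values used here. */
static double sk_sqrt(double v)
{
    double x = v > 1.0 ? v : 1.0;
    int i;

    if (v <= 0.0)
        return 0.0;
    for (i = 0; i < 64; i++)
        x = 0.5 * (x + v / x);
    return x;
}

/* Pooled standard deviation of two samples. */
static double sk_pooled_stddev(struct sk_sample a, struct sk_sample b)
{
    double num = (a.n - 1) * a.stddev * a.stddev +
                 (b.n - 1) * b.stddev * b.stddev;
    return sk_sqrt(num / (a.n + b.n - 2));
}

/* Half-width of the confidence interval on the difference of means,
 * given the critical t value for (a.n + b.n - 2) degrees of freedom. */
static double sk_ci_half_width(struct sk_sample a, struct sk_sample b,
                               double t_crit)
{
    return t_crit * sk_pooled_stddev(a, b) * sk_sqrt(1.0 / a.n + 1.0 / b.n);
}
```

Feeding in the first comparison (XAA: N=5, Avg 6.184, Stddev 0.24151605; EXA before: N=5, Avg 6.764, Stddev 0.15630099) yields a pooled s of about 0.2034 and a half-width of about 0.2967 around the 0.58 s difference, in line with the numbers reported.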
Comment 20 Pierre Ossman 2006-04-26 21:24:44 UTC
I'm not getting those numbers. Here things got worse with the latest CVS
(compared to RC1):

8000000 trep @   0.0034 msec (296000.0/sec): Char in 30-char aa line (Charter 24)
 128000 trep @   0.2123 msec (  4710.0/sec): Char in 30-char rgb line (Charter 24)
 128000 trep @   0.2132 msec (  4690.0/sec): Char in 30-char rgb core line (Charter 24)

X also eats silly amounts of CPU. The machine isn't really usable right now.
Comment 21 Eric Anholt 2006-04-27 09:04:15 UTC
With my commit today, the subpixel glyph rendering with Radeon driver CVS and
Xorg CVS is now 1/2 the speed of AA text rendering (96000 glyphs/sec for
rgb24text).  This makes sense, since it's done in two passes.  Gnome-terminal is
also correspondingly faster.

Marking it fixed, even though I hope to merge to stable branch.
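
The comment above says only that subpixel rendering is done in two passes; it does not say which ones. As an illustration, and purely as an assumption on my part, the sketch below uses a common decomposition of component-alpha Over: a first pass that scales the destination by (1 - source alpha * mask), then a second pass that adds source * mask. Colours are premultiplied values in [0, 1], and the `sk_` names are hypothetical.

```c
/* Premultiplied colour, each channel in [0, 1] (hypothetical type). */
struct sk_rgba { double r, g, b, a; };

/* Single-pass reference: component-alpha Over, per channel
 *   d' = s*m + d*(1 - s.a*m) */
static struct sk_rgba sk_over_ca(struct sk_rgba s, struct sk_rgba m,
                                 struct sk_rgba d)
{
    struct sk_rgba o;
    o.r = s.r * m.r + d.r * (1.0 - s.a * m.r);
    o.g = s.g * m.g + d.g * (1.0 - s.a * m.g);
    o.b = s.b * m.b + d.b * (1.0 - s.a * m.b);
    o.a = s.a * m.a + d.a * (1.0 - s.a * m.a);
    return o;
}

/* Pass 1 (assumed): knock out the destination, d' = d*(1 - s.a*m). */
static struct sk_rgba sk_pass_out_reverse(struct sk_rgba s, struct sk_rgba m,
                                          struct sk_rgba d)
{
    struct sk_rgba o;
    o.r = d.r * (1.0 - s.a * m.r);
    o.g = d.g * (1.0 - s.a * m.g);
    o.b = d.b * (1.0 - s.a * m.b);
    o.a = d.a * (1.0 - s.a * m.a);
    return o;
}

/* Pass 2 (assumed): add the masked source, d' = d + s*m.
 * No clamping: valid premultiplied inputs keep the sum within [0, 1]. */
static struct sk_rgba sk_pass_add(struct sk_rgba s, struct sk_rgba m,
                                  struct sk_rgba d)
{
    struct sk_rgba o;
    o.r = d.r + s.r * m.r;
    o.g = d.g + s.g * m.g;
    o.b = d.b + s.b * m.b;
    o.a = d.a + s.a * m.a;
    return o;
}
```

Running `sk_pass_add(s, m, sk_pass_out_reverse(s, m, d))` gives the same pixel as `sk_over_ca(s, m, d)`. Each glyph is then drawn twice, which fits the observation that rgb24text lands at about half the aa24text rate.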
Comment 22 Pierre Ossman 2006-04-27 19:29:30 UTC
Current CVS of ATI driver and server:

XAA:

8000000 trep @   0.0032 msec (313000.0/sec): Char in 30-char aa line (Charter 24)
 320000 trep @   0.0810 msec ( 12300.0/sec): Char in 30-char rgb line (Charter 24)
 320000 trep @   0.0810 msec ( 12300.0/sec): Char in 30-char rgb core line (Charter 24)

EXA:

8000000 trep @   0.0033 msec (303000.0/sec): Char in 30-char aa line (Charter 24)
4800000 trep @   0.0063 msec (159000.0/sec): Char in 30-char rgb line (Charter 24)
4800000 trep @   0.0063 msec (159000.0/sec): Char in 30-char rgb core line (Charter 24)

Anholt, where do I sign up for your fan club? ;)


There is something else broken with the current EXA though (compared to RC1).
The machine is very sluggish, particularly dragging windows around and scrolling
in firefox. I don't know much about EXA operations, but perhaps there is a "move
region" operation that has been tweaked? Should I open a new bug for this?
Comment 23 Eric Anholt 2006-04-28 04:43:31 UTC
Yes, I've also seen some other performance issues recently, and I'm working on
tracking them down.  If you wanted to bug-track it, a new bug would absolutely
be the place.
Comment 24 Pierre Ossman 2006-04-28 15:04:12 UTC
Bug 6773 has been opened. Things occasionally tend to drag out a bit so a
tracker bug is always nice. :)

