Bug 28670

Summary: some 2D apps way too slow, fast with noaccel (regression)
Product: xorg Reporter: Martin Renold <martinxyz>
Component: Driver/Radeon Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: bugs.freedesktop, louiz
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   

Description Martin Renold 2010-06-22 09:32:38 UTC
Original bug report: https://gna.org/bugs/?16122 (Summary: zooming in MyPaint is horribly slow; the same happens with fading effects in Firefox)

Disabling acceleration (noaccel) solves the problem.
Chipset: "ATI Radeon X300 (RV370) 5B60 (PCIE)"
Bug 26225 looks very similar to this.

The ati/radeon module version 1.7.7 does not have this problem. The problem exists in version 6.12.192 (Debian testing package) and in current git.

I have tried to "git bisect" this but failed to build the older versions. The problem was introduced somewhere between the 1.7.7 tag and commit 6990f2ac647 (dated 2009-09-08).

I'm using kernel 2.6.32-5-amd64. Please let me know if you need more information.
Comment 1 Alex Deucher 2010-06-22 09:42:27 UTC
There is no 1.7.7 tag or branch.  I think this is a Firefox or MyPaint bug.  I suspect Firefox/MyPaint is using a transform with RepeatNone when they probably really want RepeatPad.  RepeatNone with a transform can't be accelerated by the hardware, as sampling outside the source picture would result in non-conformant behaviour.
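To illustrate the difference between the two repeat modes, here is a minimal Python sketch of what each one returns for a texel that a transform places outside the source picture. It is not code from the driver, pixman or cairo; the function names and the (a, r, g, b) tuple representation are assumptions made for the example.

```python
# Pixels are (a, r, g, b) tuples; rows are indexed by y, columns by x.
TRANSPARENT = (0, 0, 0, 0)


def sample_repeat_none(pixels, x, y):
    """RepeatNone: any texel outside the source picture is transparent black."""
    height, width = len(pixels), len(pixels[0])
    if 0 <= x < width and 0 <= y < height:
        return pixels[y][x]
    return TRANSPARENT


def sample_repeat_pad(pixels, x, y):
    """RepeatPad: coordinates are clamped, so the edge pixels extend outward."""
    height, width = len(pixels), len(pixels[0])
    return pixels[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


if __name__ == "__main__":
    img = [[(255, 10, 20, 30), (255, 40, 50, 60)],
           [(255, 70, 80, 90), (255, 100, 110, 120)]]
    print(sample_repeat_none(img, -1, 0))  # (0, 0, 0, 0)
    print(sample_repeat_pad(img, -1, 0))   # (255, 10, 20, 30)
```

With RepeatPad the edge pixels are smeared outward instead of fading to transparent, which makes no visible difference when the border region never ends up on screen.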
Comment 2 Martin Renold 2010-06-22 09:43:44 UTC
Sorry, I meant module version 6.12.7 (not 1.7.7).
Comment 3 Martin Renold 2010-06-22 09:47:05 UTC
For MyPaint I can say what we are doing: we are telling cairo to render a rotated and/or zoomed pixmap. Oprofile shows that all time is being spent in libpixman, symbol bits_image_fetch_transformed.
Comment 4 Alex Deucher 2010-06-22 09:54:10 UTC
(In reply to comment #3)
> For MyPaint I can say what we are doing, we are telling cairo to render a
> rotated and/or zoomed pixmap. Oprofile shows that all time is being spent in
> libpixman, symbol bits_image_fetch_transformed.

libpixman means you are hitting a software fallback.  The hw doesn't support RepeatNone with a transformed xRGB source, as its behaviour is not compliant with the render spec.  As I said, you probably want RepeatPad rather than RepeatNone.  See the comment at the end of R300CheckCompositeTexture():
http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/tree/src/radeon_exa_render.c
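A simplified Python model of the condition described above. One way hardware can give RepeatNone its required alpha=0 outside the picture is to sample a transparent border color, and that only works if the texture format stores alpha. Without a transform, the sample points are assumed to stay inside the (clipped) picture, so the border is never reached. The class and function are hypothetical and only approximate the driver's check; R300CheckCompositeTexture() also tests other conditions.

```python
from dataclasses import dataclass


@dataclass
class SourcePicture:
    """Hypothetical summary of a Render source picture as a driver sees it."""
    has_transform: bool
    repeat: str             # "none", "normal", "pad" or "reflect"
    format_alpha_bits: int  # 0 for xRGB formats such as x8r8g8b8


def needs_software_fallback(src: SourcePicture) -> bool:
    """Approximate the transformed-xRGB + RepeatNone fallback condition.

    RepeatNone must yield alpha = 0 outside the picture. A transparent border
    color can provide that only if the format stores alpha; without a
    transform the samples are assumed to stay inside the clipped picture,
    so the border is never sampled.
    """
    return (src.has_transform
            and src.repeat == "none"
            and src.format_alpha_bits == 0)
```

In this model, both workarounds discussed in this bug leave the fallback path: CAIRO_EXTEND_PAD changes the repeat mode, and rendering into an RGBA (ARGB32) source gives the format an alpha channel.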
Comment 5 Martin Renold 2010-06-22 11:59:06 UTC
Thanks for the reference. I would never have figured that out. When setting CAIRO_EXTEND_PAD, things are fast even with the latest git driver.

I don't actually care about the behaviour at the border, since it always is outside the screen. Would you recommend to render the image to RGBA instead of RGB, to increase the chance to get an accelerated transformation also from other hardware?

We really don't want to hit such a slower-than-software fallback. It makes MyPaint unusable, while pure software rendering would still be very fast.
Comment 6 Alex Deucher 2010-06-22 13:20:50 UTC
(In reply to comment #5)
> Thanks for the reference. I would never have figured that out. When setting
> CAIRO_EXTEND_PAD, things are fast even with the latest git driver.
> 

Firefox should use the same fix.

> I don't actually care about the behaviour at the border, since it always is
> outside the screen. Would you recommend to render the image to RGBA instead of
> RGB, to increase the chance to get an accelerated transformation also from
> other hardware?

Most 3D hardware has the same limitation; however, I'm not sure all hw drivers check for the case properly.  Ideally the render spec would not have required alpha=0 when sampling outside the source region.

> 
> We really don't want to hit such a slower-than-software fallback. It makes
> MyPaint unusable, while pure software rendering would still be very fast.

Software fallbacks are almost always slower than pure software rendering, since you end up ping-ponging between hw and sw rendering (moving pixmaps back and forth between video and system memory).  To be fast you need either all hw or all sw.
Comment 7 Søren Sandmann Pedersen 2010-06-23 18:19:55 UTC
Wouldn't it be possible to have a simple shader that checked whether the sample point was outside the bounds and simply return 0 in that case instead of sampling?
Comment 8 Søren Sandmann Pedersen 2010-06-23 19:00:52 UTC
Even without non-uniform control flow, it seems it could still do something like this:

    c = (x < 0) || (y < 0) || (x >= width) || (y >= height);

    f = texture ...

    f = f * !c;
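A CPU-side Python sketch of the same idea: compute the "outside" flag without branching on it, then multiply the fetched color by its complement. The clamped lookup stands in for the texture fetch; the function name and the (a, r, g, b) tuple representation are illustrative assumptions.

```python
def masked_fetch(pixels, x, y):
    """Branch-free version of: return 0 if (x, y) is outside, else sample."""
    height, width = len(pixels), len(pixels[0])
    # c = (x < 0) || (y < 0) || (x >= width) || (y >= height), evaluated with
    # non-short-circuit | so every comparison is always computed
    c = int((x < 0) | (y < 0) | (x >= width) | (y >= height))
    # Fetch from a clamped coordinate so the lookup itself is always valid
    f = pixels[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]
    # f = f * !c: every channel is multiplied by 0 outside and by 1 inside
    return tuple(ch * (1 - c) for ch in f)
```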
Comment 9 Clemens Eisserer 2010-06-24 01:28:32 UTC
The transformed RGB24 + RepeatNone case is quite a common fallback situation.
If it can't be done with shaders, would a temporary ARGB32 pixmap help?
Comment 10 Michel Dänzer 2010-06-24 02:41:08 UTC
(In reply to comment #7)
> Wouldn't it be possible to have a simple shader that checked whether the sample
> point was outside the bounds and simply return 0 in that case instead of
> sampling?

One problem with that is the required filtering between samples inside and outside of the picture; it would require doing the filtering in the shader as well. Less efficient than using the hardware texture sampler filtering capabilities, though certainly doable.
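The filtering problem can be seen in a small Python sketch, assuming bilinear filtering with texel centers at integer + 0.5 and (a, r, g, b) float pixels. shader_filtered() applies RepeatNone to each of the four taps before blending, so the picture fades out at its edge. hw_filtered_then_masked() models clamp-to-edge hardware filtering followed by the bounds mask from comment 8, which yields a hard edge instead. Both function names are hypothetical.

```python
import math


def _fetch_none(pixels, x, y):
    # RepeatNone tap: transparent outside the picture
    h, w = len(pixels), len(pixels[0])
    if 0 <= x < w and 0 <= y < h:
        return pixels[y][x]
    return (0.0, 0.0, 0.0, 0.0)


def _fetch_clamp(pixels, x, y):
    # Clamp-to-edge tap, as a fixed-function sampler would provide
    h, w = len(pixels), len(pixels[0])
    return pixels[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def bilinear(pixels, u, v, fetch):
    # Blend the four texels around (u, v); texel centers are at integer + 0.5
    x0, y0 = math.floor(u - 0.5), math.floor(v - 0.5)
    fx, fy = (u - 0.5) - x0, (v - 0.5) - y0
    taps = [(fetch(pixels, x0, y0), (1 - fx) * (1 - fy)),
            (fetch(pixels, x0 + 1, y0), fx * (1 - fy)),
            (fetch(pixels, x0, y0 + 1), (1 - fx) * fy),
            (fetch(pixels, x0 + 1, y0 + 1), fx * fy)]
    return tuple(sum(p[i] * wt for p, wt in taps) for i in range(4))


def shader_filtered(pixels, u, v):
    # Render-conformant RepeatNone: mask each tap *before* blending
    return bilinear(pixels, u, v, _fetch_none)


def hw_filtered_then_masked(pixels, u, v):
    # Hardware clamp-to-edge filtering, then zero the result if (u, v) is outside
    h, w = len(pixels), len(pixels[0])
    outside = (u < 0) | (v < 0) | (u >= w) | (v >= h)
    return tuple(c * (1 - int(outside)) for c in bilinear(pixels, u, v, _fetch_clamp))
```

For an opaque 2x2 picture sampled at v = 1.0, the conformant result has alpha 0.5 exactly on the left edge (u = 0.0) and 0.25 a quarter texel outside it (u = -0.25); the masked hardware result jumps from 1.0 to 0.0 between those two points.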


(In reply to comment #9)
> the transformed RGB24+RepeatNone case is quite a common fallback situation.
> If it can't be done with shaders, would a temporary ARGB32 pixmap help?

That's probably one of many possible solutions.
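A minimal sketch of the pixel conversion such a temporary ARGB32 copy would need, assuming pixels are packed 32-bit words in x8r8g8b8 / a8r8g8b8 layout. The top byte of an xRGB pixel is padding with no defined value, so it is overwritten with 0xff rather than copied. In the X server the copy would more likely be done as a Composite operation into an ARGB32 pixmap; the helper below is hypothetical and only shows the data transformation. Comment 11 applies the same idea on the client side by rendering to RGBA.

```python
def xrgb_to_argb(rows):
    """Copy x8r8g8b8 words into a8r8g8b8 words with alpha set to opaque.

    The top byte of an xRGB pixel is undefined padding, so it is replaced
    by 0xff instead of being copied.
    """
    return [[(p & 0x00FFFFFF) | 0xFF000000 for p in row] for row in rows]
```

Once the copy has an alpha channel, a transparent border is representable, so a transformed RepeatNone source no longer hits the xRGB restriction; the price is one extra copy per operation.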

Something else that might help would be for exaComposite() to try RepeatPad if the driver can't handle RepeatNone and the operation doesn't sample outside of the picture.
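A sketch of the test such a change would need, under these assumptions: the 3x3 transform maps destination pixel centers to source coordinates, as a Render picture transform does; the homogeneous w stays positive over the rectangle, so the transformed corners bound the sampled area; the nearest filter reads texel floor(s); and bilinear taps reach half a texel beyond the sample point. When the function returns True, RepeatNone and RepeatPad produce identical results, so substituting RepeatPad would be safe. samples_stay_inside() is a hypothetical helper, not an EXA function.

```python
def samples_stay_inside(transform, dst_rect, src_w, src_h, bilinear):
    """Return True if no sample for dst_rect reads outside a src_w x src_h picture.

    transform: 3x3 row-major matrix mapping destination coordinates to source
    coordinates (hypothetical representation: floats instead of fixed point).
    dst_rect: (x, y, width, height) of the destination area being composited.
    """
    x, y, w, h = dst_rect
    m = transform
    xs, ys = [], []
    # The extreme sample points are the centers of the corner destination pixels.
    for px, py in ((x + 0.5, y + 0.5), (x + w - 0.5, y + 0.5),
                   (x + 0.5, y + h - 0.5), (x + w - 0.5, y + h - 0.5)):
        wq = m[2][0] * px + m[2][1] * py + m[2][2]
        if wq <= 0:
            # Degenerate projective case: be conservative and keep RepeatNone.
            return False
        xs.append((m[0][0] * px + m[0][1] * py + m[0][2]) / wq)
        ys.append((m[1][0] * px + m[1][1] * py + m[1][2]) / wq)
    if bilinear:
        # Taps reach half a texel past the sample point; a tap exactly on the
        # far boundary gets weight 0, so the bounds are inclusive.
        return (0.5 <= min(xs) and max(xs) <= src_w - 0.5 and
                0.5 <= min(ys) and max(ys) <= src_h - 0.5)
    # Nearest (assumed): the sample reads texel floor(s), which must be in [0, size).
    return 0 <= min(xs) and max(xs) < src_w and 0 <= min(ys) and max(ys) < src_h
```

For example, zooming a 4x4 picture 2x into an 8x8 destination stays inside the picture with nearest filtering but not with bilinear filtering, because the outermost samples land a quarter texel from the edge.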
Comment 11 Martin Renold 2010-06-24 11:54:27 UTC
I have changed the code in MyPaint now to render to RGBA instead of RGB. After also removing cairo.EXTEND_PAD again, the result is now 3 times faster than anything I have seen before.

Looks like I'm seeing full hardware acceleration for the first time. With the cairo.EXTEND_PAD fix the speed was roughly the same as software-only rendering.

There is a penalty for this when doing software-only (noaccel) rendering: I measured a 5% slowdown. This is acceptable, so I'm leaving it like that.
Comment 12 Michel Dänzer 2010-06-24 23:49:24 UTC
(In reply to comment #11)
> Looks like I'm seeing full hardware acceleration for the first time. With the
> cairo.EXTEND_PAD fix the speed was roughly the same as software-only rendering.

Weird - maybe Cairo is falling back to client side software rendering with EXTEND_PAD?
Comment 13 Alex Deucher 2010-06-25 06:02:55 UTC
(In reply to comment #10)
> (In reply to comment #7)
> > Wouldn't it be possible to have a simple shader that checked whether the sample
> > point was outside the bounds and simply return 0 in that case instead of
> > sampling?
> 
> One problem with that is the required filtering between samples inside and
> outside of the picture; it would require doing the filtering in the shader as
> well. Less efficient than using the hardware texture sampler filtering
> capabilities, though certainly doable.

Also, it can't be accelerated on ASICs with limited shader capabilities.  It'd be nice to handle it in a way that's friendly to more limited hw as well.
Comment 14 Alex Deucher 2010-10-19 19:48:50 UTC
Should be improved with Karl's patches in git.
