Summary: | some 2D apps way too slow, fast with noaccel (regression) | ||
---|---|---|---|
Product: | xorg | Reporter: | Martin Renold <martinxyz> |
Component: | Driver/Radeon | Assignee: | xf86-video-ati maintainers <xorg-driver-ati> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | normal | ||
Priority: | medium | CC: | bugs.freedesktop, louiz |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Description
Martin Renold
2010-06-22 09:32:38 UTC
There is no 1.7.7 tag or branch. I think this is a firefox or mypaint bug. I suspect firefox/mypaint is using a transform with RepeatNone when they probably really want RepeatPad. RepeatNone with a transform can't be accelerated by the hardware as sampling outside the source picture will result in non-conformant behaviour.

Sorry, I meant module version 6.12.7 (not 1.7.7).

For MyPaint I can say what we are doing: we are telling cairo to render a rotated and/or zoomed pixmap. Oprofile shows that all time is being spent in libpixman, symbol bits_image_fetch_transformed.

(In reply to comment #3)
> For MyPaint I can say what we are doing, we are telling cairo to render a
> rotated and/or zoomed pixmap. Oprofile shows that all time is being spent in
> libpixman, symbol bits_image_fetch_transformed.

libpixman means you are hitting a software fallback. The hw doesn't support RepeatNone with a transformed xRGB source, as its behaviour is not compliant with the render spec. As I said, you probably want RepeatPad rather than RepeatNone. See the comment at the end of R300CheckCompositeTexture():
http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/tree/src/radeon_exa_render.c

Thanks for the reference. I would never have figured that out. When setting CAIRO_EXTEND_PAD, things are fast even with the latest git driver.

I don't actually care about the behaviour at the border, since it is always outside the screen. Would you recommend rendering the image to RGBA instead of RGB, to increase the chance of getting an accelerated transformation from other hardware as well?

We really don't want to hit such a slower-than-software fallback. It makes MyPaint unusable, while pure software rendering would still be very fast.

(In reply to comment #5)
> Thanks for the reference. I would never have figured that out. When setting
> CAIRO_EXTEND_PAD, things are fast even with the latest git driver.

Firefox should use the same fix.
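
For readers unfamiliar with the two repeat modes discussed above, here is a minimal Python sketch of what each one returns when a transformed sample lands outside the source picture. It is an illustrative model only, not code from pixman, cairo, or the radeon driver; the `fetch` function, the mode names `"none"` and `"pad"`, and the tiny test picture are all assumptions made for the example.

```python
# Illustrative model only; not code from pixman, cairo, or the radeon driver.

TRANSPARENT = (0.0, 0.0, 0.0, 0.0)  # RGBA with alpha last


def fetch(image, x, y, repeat):
    """Return the RGBA pixel at integer (x, y) under the given repeat mode.

    `image` is a list of rows of RGBA tuples; `repeat` is "none" or "pad".
    Both names are hypothetical stand-ins for RepeatNone and RepeatPad.
    """
    height = len(image)
    width = len(image[0])
    if 0 <= x < width and 0 <= y < height:
        return image[y][x]
    if repeat == "none":
        # RepeatNone: everything outside the picture is transparent (alpha 0).
        return TRANSPARENT
    if repeat == "pad":
        # RepeatPad: clamp the coordinate to the nearest edge pixel.
        cx = min(max(x, 0), width - 1)
        cy = min(max(y, 0), height - 1)
        return image[cy][cx]
    raise ValueError("unknown repeat mode: %r" % (repeat,))


# A 2x1 opaque picture, the way an xRGB source reads (alpha is always 1).
RED = (1.0, 0.0, 0.0, 1.0)
BLUE = (0.0, 0.0, 1.0, 1.0)
picture = [[RED, BLUE]]

outside_none = fetch(picture, 5, 0, "none")  # transparent
outside_pad = fetch(picture, 5, 0, "pad")    # nearest edge pixel (BLUE)
```

The relevant point for this bug is the `"none"` branch: the render spec asks for alpha 0 outside the source, which is the part the thread says the hardware cannot deliver for a transformed xRGB source.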
> I don't actually care about the behaviour at the border, since it always is
> outside the screen. Would you recommend to render the image to RGBA instead of
> RGB, to increase the chance to get an accelerated transformation also from
> other hardware?

Most 3D hardware has the same limitation; however, I'm not sure all hw drivers check for the case properly. Ideally the render spec would not have required alpha=0 when sampling outside the source region.

> We really don't want to hit such a slower-than-software fallback. It makes
> MyPaint unusable, while pure software rendering would still be very fast.

Software fallbacks are almost always slower than pure software rendering since you end up ping-ponging between hw and sw rendering. To be fast you need either all hw or all sw.

Wouldn't it be possible to have a simple shader that checked whether the sample point was outside the bounds and simply returned 0 in that case instead of sampling?

Even without non-uniform control flow, it seems it could still do something like this:

    c = (x < 0) || (y < 0) || (x >= width) || (y >= height);
    f = texture ...
    f = f * !c;

The transformed RGB24+RepeatNone case is quite a common fallback situation. If it can't be done with shaders, would a temporary ARGB32 pixmap help?

(In reply to comment #7)
> Wouldn't it be possible to have a simple shader that checked whether the sample
> point was outside the bounds and simply return 0 in that case instead of
> sampling?

One problem with that is the required filtering between samples inside and outside of the picture; it would require doing the filtering in the shader as well. Less efficient than using the hardware texture sampler filtering capabilities, though certainly doable.

(In reply to comment #9)
> the transformed RGB24+RepeatNone case is quite a common fallback situation.
> If it can't be done with shaders, would a temporary ARGB32 pixmap help?

That's probably one of many possible solutions.
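
To make the filtering objection concrete, here is a small Python model (not shader code, and not taken from any driver) comparing the two approaches on a single-channel image. It assumes bilinear filtering with texel centres at integer coordinates; the helper names `mask_after_filter` and `filter_in_shader` are hypothetical. The first mirrors the `f = f * !c` idea applied to a hardware-filtered, clamped fetch; the second zeroes each tap individually before blending, which is what doing the filtering in the shader would amount to.

```python
import math


def tap(image, x, y):
    """One texel, or 0.0 when (x, y) lies outside the image (RepeatNone)."""
    height, width = len(image), len(image[0])
    if x < 0 or y < 0 or x >= width or y >= height:
        return 0.0
    return image[y][x]


def tap_clamped(image, x, y):
    """One texel with coordinates clamped to the edge (RepeatPad-like)."""
    height, width = len(image), len(image[0])
    return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def bilinear(fetch, image, u, v):
    """Blend the four texels around (u, v) using the given tap function."""
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    top = (1 - fx) * fetch(image, x0, y0) + fx * fetch(image, x0 + 1, y0)
    bottom = (1 - fx) * fetch(image, x0, y0 + 1) + fx * fetch(image, x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def mask_after_filter(image, u, v):
    """Filter with clamped taps, then zero the result if (u, v) is outside."""
    height, width = len(image), len(image[0])
    outside = u < 0 or v < 0 or u >= width or v >= height
    return 0.0 if outside else bilinear(tap_clamped, image, u, v)


def filter_in_shader(image, u, v):
    """Zero each out-of-bounds tap first, then blend."""
    return bilinear(tap, image, u, v)


white = [[1.0, 1.0], [1.0, 1.0]]
# Sample half a texel past the last column: one tap inside, one outside.
edge_masked = mask_after_filter(white, 1.5, 0.0)
edge_filtered = filter_in_shader(white, 1.5, 0.0)
```

In this model the masked version returns full intensity right up to the boundary, while the per-tap version fades to half at the edge, so the two disagree exactly where samples straddle the picture border.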
Something else that might help would be for exaComposite() to try RepeatPad if the driver can't handle RepeatNone and the operation doesn't sample outside of the picture.

I have changed the code in MyPaint now to render to RGBA instead of RGB. After also removing the cairo.EXTEND_PAD again, the result is now 3 times faster than anything I have seen before. Looks like I'm seeing full hardware acceleration for the first time. With the cairo.EXTEND_PAD fix the speed was roughly the same as software-only rendering.

There is a penalty for this when doing software-only (noaccel) rendering. I have measured 5% slowdown. This is acceptable so I'm leaving it like that.

(In reply to comment #11)
> Looks like I'm seeing full hardware acceleration for the first time. With the
> cairo.EXTEND_PAD fix the speed was roughly the same as software-only rendering.

Weird - maybe Cairo is falling back to client side software rendering with EXTEND_PAD?

(In reply to comment #10)
> (In reply to comment #7)
> > Wouldn't it be possible to have a simple shader that checked whether the sample
> > point was outside the bounds and simply return 0 in that case instead of
> > sampling?
>
> One problem with that is the required filtering between samples inside and
> outside of the picture; it would require doing the filtering in the shader as
> well. Less efficient than using the hardware texture sampler filtering
> capabilities, though certainly doable.

Also, it can't be accelerated on asics with limited shader capabilities. It'd be nice to handle it in a way that's friendly to more limited hw as well.

Should be improved with Karl's patches in git.
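
The exaComposite() idea raised in this thread (try RepeatPad when the driver rejects RepeatNone and the operation never samples outside the picture) can be sketched roughly as follows. This is a hypothetical Python model, not EXA code: it assumes an affine destination-to-source transform given as two rows `(a, b, c)` and `(d, e, f)`, and the names `samples_stay_inside` and `choose_repeat` are invented for illustration. Because an affine map sends a rectangle to a parallelogram, checking the four corners is enough in this simplified setting; the `margin` parameter is an assumed allowance for filter taps near the edge.

```python
def transform_point(matrix, x, y):
    """Apply a 2x3 affine matrix ((a, b, c), (d, e, f)) to the point (x, y)."""
    (a, b, c), (d, e, f) = matrix
    return (a * x + b * y + c, d * x + e * y + f)


def samples_stay_inside(matrix, dst_rect, src_width, src_height, margin=0.0):
    """True if every corner of dst_rect maps inside the source picture."""
    x, y, w, h = dst_rect
    corners = [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
    for cx, cy in corners:
        sx, sy = transform_point(matrix, cx, cy)
        if sx < margin or sy < margin:
            return False
        if sx > src_width - margin or sy > src_height - margin:
            return False
    return True


def choose_repeat(driver_handles_none, matrix, dst_rect, src_width, src_height):
    """Pick a repeat mode, or report that a software fallback is needed."""
    if driver_handles_none:
        return "none"
    if samples_stay_inside(matrix, dst_rect, src_width, src_height):
        # No sample leaves the picture, so pad and none give the same pixels.
        return "pad"
    return "software-fallback"


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
shifted = ((1.0, 0.0, -2.0), (0.0, 1.0, -2.0))

inside_choice = choose_repeat(False, identity, (0, 0, 4, 4), 8, 8)
outside_choice = choose_repeat(False, shifted, (0, 0, 4, 4), 8, 8)
```

With the identity transform the 4x4 destination stays within the 8x8 source, so the model picks `"pad"`; with the shifted transform one corner maps to (-2, -2) and the model reports a fallback.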