Summary: | 3D blitting is orders of magnitude slower than equivalent 2D blitting. | ||
---|---|---|---|
Product: | Mesa | Reporter: | Neil Roberts <nroberts> |
Component: | Drivers/DRI/i965 | Assignee: | Chris Wilson <chris> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | enhancement | ||
Priority: | medium | CC: | liquid.acid |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
meta: Try using glCopyTexSubImage2D in _mesa_meta_BlitFramebuffer
Test case showing the performance difference
Move variations for blitting |
Description
Neil Roberts
2011-02-05 05:50:10 UTC
Created attachment 42961: meta: Try using glCopyTexSubImage2D in _mesa_meta_BlitFramebuffer

In the case where glBlitFramebuffer is being used to copy to a texture without scaling, it is faster if we can use the hardware to do a blit rather than having to do a texture render. In most of the drivers glCopyTexSubImage2D will use a blit, so this patch makes it check for when glBlitFramebuffer is doing a simple copy and then divert to glCopyTexSubImage2D.

Both r300g and r600g use a 3D blit (by drawing a textured quad), with or without scaling; it doesn't matter. AFAIK, r600g has no other way to do a blit. Your commit would not change anything on this hardware. Since the BlitFramebuffer function in st/mesa is simpler than CopyTexSubImage, I don't consider this an enhancement.

Created attachment 43096
Test case showing the performance difference
Well, at least on the Intel driver there is a faster path for blitting that glCopyTexSubImage2D uses. If it's not also beneficial for Radeon, then maybe we should move the patch to be specific to the Intel drivers.
Attached is a test case to get some timing for the two functions.
Without patch:
time for glBlitFramebuffer = 122285
time for glCopyTexSubImage2D = 6097
So glCopyTexSubImage2D is 1906% faster than glBlitFramebuffer.
With the patch I get:
time for glBlitFramebuffer = 25740
time for glCopyTexSubImage2D = 6900
The patch improves the speed of glBlitFramebuffer by 375%, but it's still pretty slow compared to glCopyTexSubImage2D. Maybe the cost of glBlitFramebuffer is mostly in preserving the GL state across the Mesa meta calls, and the patch still does a bit of this. Maybe we should make a proper Intel-specific fast path for glBlitFramebuffer that calls intelEmitCopyBlit directly, as do_copy_texsubimage does, so that it can avoid affecting the GL state.
There is almost no performance difference on Radeons. I guess the patch should be made Intel-only.

Another perspective. Running without the patch:

    =0 sandybridge:~$ DISPLAY=:0.0 ./copy-tex-subimage # gt1
    time for glBlitFramebuffer = 119331
    time for glCopyTexSubImage2D = 1518
    =0 sandybridge:~$ DISPLAY=:0.1 ./copy-tex-subimage # radeon 5770, r600g
    time for glBlitFramebuffer = 15952
    time for glCopyTexSubImage2D = 16237

And after applying the patch:

    =0 sandybridge:~$ DISPLAY=:0.0 jhbuild run ./copy-tex-subimage
    time for glBlitFramebuffer = 4706
    time for glCopyTexSubImage2D = 1519
    =0 sandybridge:~$ DISPLAY=:0.1 jhbuild run ./copy-tex-subimage
    time for glBlitFramebuffer = 16318
    time for glCopyTexSubImage2D = 16649

This makes less of a case that we need to implement a fast path for glBlitFramebuffer, and more of a case that i965 needs to seriously fix the bottleneck uncovered by the current code.

Marek, this appears to be an intel issue, agreed?

(In reply to comment #6)
> Marek, this appears to be an intel issue, agreed?

Yes, I agree.
This is where I'm up to (reducing meta-op overhead vs. the patch):

glBlitFramebuffer | q35, meta-op | q35, patch | snb, meta-op | snb, patch
---|---|---|---|---
1x1 | 29336 | 8833 | 22606 | 3990
2x2 | 7360 | 2235 | 5684 | 1006
4x4 | 1869 | 1440 | 587 | 269
8x8 | 493 | 174 | 377 | 83
16x16 | 151 | 70 | 112 | 38
32x32 | 65 | 45 | 44 | 36
64x64 | 43 | 38 | 28 | 24
128x128 | 38 | 36 | 23 | 22
256x256 | 36 | 36 | 23 | 23
512x512 | 36 | 36 | 22 | 22
1024x1024 | 36 | 35 | 22 | 23

glCopyTexSubImage2D | q35 | snb
---|---|---
1x1 | 3861 | 1350
2x2 | 990 | 355
4x4 | 274 | 106
8x8 | 96 | 43
16x16 | 51 | 27
32x32 | 39 | 23
64x64 | 37 | 22
128x128 | 36 | 22
256x256 | 36 | 22
512x512 | 36 | 22
1024x1024 | 36 | 22

To put the speed into perspective, I tried a couple of other variations, essentially unrolling the meta-op blit in the test itself. On the q35:

Size | Blit | Quads | Tristrip | Copy
---|---|---|---|---
1x1 | 29393 | 638 | 212 | 3786
2x2 | 7339 | 167 | 110 | 955
4x4 | 1852 | 48 | 20 | 245
8x8 | 469 | 18 | 15 | 68
16x16 | 123 | 12 | 9 | 23
32x32 | 38 | 9 | 9 | 14
64x64 | 16 | 9 | 9 | 9
128x128 | 10 | 8 | 9 | 8
256x256 | 9 | 9 | 9 | 9
512x512 | 9 | 9 | 9 | 9
1024x1024 | 9 | 9 | 8 | 9

Created attachment 43167
Move variations for blitting

Having spent a couple of weeks tackling the underlying problem of why snb was so slow, I've finally ported the meta-op to intel and pushed. On applying:

Variant | 1x1 | 2x2 | 4x4 | 8x8 | 16x16 | 32x32 | 128x128 | 256x256 | 512x512
---|---|---|---|---|---|---|---|---|---
Blit | 2113 | 532 | 134 | 34 | 8 | 3 | 1 | 1 | 0
Quads | 141 | 265 | 66 | 3 | 1 | 0 | 0 | 0 | 0
Tri | 90 | 133 | 35 | 1 | 1 | 0 | 1 | 0 | 0
Copy | 1749 | 437 | 109 | 28 | 7 | 2 | 0 | 1 | 0

So glBlitFramebuffer is now only marginally slower than glCopyTexSubImage2D and hopefully adequate.
commit c0ad70ae31ee5501281b434d56e389fc92b13a3a
Author: Neil Roberts <neil@linux.intel.com>
Date:   Sat Feb 5 10:21:11 2011 +0000

    intel: Try using glCopyTexSubImage2D in _mesa_meta_BlitFramebuffer

    In the case where glBlitFramebuffer is being used to copy to a texture
    without scaling it is faster if we can use the hardware to do a blit
    rather than having to do a texture render. In most of the drivers
    glCopyTexSubImage2D will use a blit so this patch makes it check for
    when glBlitFramebuffer is doing a simple copy and then divert to
    glCopyTexSubImage2D.

    This was originally proposed as an extension to the common meta-ops.
    However, it was rejected as using the BLT is only advantageous for Intel
    hardware.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33934
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>