The Mesa meta implementation of glBlitFramebuffer (which seems to be used by most of the drivers) performs a render when the source and destination framebuffers are textures. However, I think this function is also commonly used to copy a region between two textures without scaling. In that case it would be good if it could boil down to a hardware blit rather than having to submit geometry. One way to do this could be to use glCopyTexSubImage2D in the meta code, which seems to end up being a blit on at least the i965 and Radeon drivers.

This came about because in Clutter we were getting some pressure to use glBlitFramebuffer instead of glCopyTexSubImage to migrate images between atlas textures, because it is faster on some (non-Mesa) drivers. However, I noticed that the opposite is true for Mesa.
Created attachment 42961 [details] [review]
meta: Try using glCopyTexSubImage2D in _mesa_meta_BlitFramebuffer

In the case where glBlitFramebuffer is being used to copy to a texture without scaling it is faster if we can use the hardware to do a blit rather than having to do a texture render. In most of the drivers glCopyTexSubImage2D will use a blit so this patch makes it check for when glBlitFramebuffer is doing a simple copy and then divert to glCopyTexSubImage2D.
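For readers who have not opened the attachment, the "simple copy" check described above can be sketched roughly as follows. This is only an illustration under assumptions: blit_is_simple_copy is a hypothetical helper name, not a function from the patch or from Mesa, and the real check also has to look at things this sketch ignores (the buffer mask, whether the draw buffer is backed by a texture, scissoring and so on).

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not taken from the attached patch): decide whether
 * the rectangles passed to glBlitFramebuffer describe a plain 1:1 copy.
 */
bool
blit_is_simple_copy(int srcX0, int srcY0, int srcX1, int srcY1,
                    int dstX0, int dstY0, int dstX1, int dstY1)
{
   /* An empty or mirrored source rectangle is not a plain copy. */
   if (srcX1 <= srcX0 || srcY1 <= srcY0)
      return false;

   /* Equal signed extents mean no scaling and no flip on the destination. */
   return (srcX1 - srcX0 == dstX1 - dstX0) &&
          (srcY1 - srcY0 == dstY1 - dstY0);
}
```

When a check like this passes, the blit could in principle be expressed as glCopyTexSubImage2D with dstX0/dstY0 as the texture offsets, srcX0/srcY0 as the read origin and the common width and height as the size.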
Both r300g and r600g use a 3D blit (by drawing a textured quad) whether or not scaling is involved. AFAIK, r600g has no other way to do a blit, so your commit would not change anything on this hardware. Since the BlitFramebuffer function in st/mesa is simpler than CopyTexSubImage, I don't consider this an enhancement.
Created attachment 43096 [details]
Test case showing the performance difference

Well at least on the Intel driver there is a faster path for blitting that glCopyTexSubImage2D uses. If it's not also beneficial for Radeon then maybe we should move the patch to be specific to the Intel drivers.

Attached is a test case to get some timing for the two functions. Without patch:

time for glBlitFramebuffer = 122285
time for glCopyTexSubImage2D = 6097

So glCopyTexSubImage2D is 1906% faster than glBlitFramebuffer. With the patch I get:

time for glBlitFramebuffer = 25740
time for glCopyTexSubImage2D = 6900

The patch improves the speed of glBlitFramebuffer by 375% but it's still pretty slow compared to glCopyTexSubImage2D. Maybe the cost of glBlitFramebuffer is mostly in preserving the GL state across the Mesa meta calls and the patch still does a bit of this. Maybe we should make a proper Intel-specific fast path for glBlitFramebuffer that directly calls intelEmitCopyBlit like do_copy_texsubimage does so that it can avoid affecting the GL state.
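The percentages quoted above appear to be computed as (slow - fast) / fast, rounded to the nearest whole percent. A minimal sketch of that arithmetic, assuming both timings are positive and in the same (unspecified) unit; percent_faster is a hypothetical name used only for this illustration:

```c
/*
 * Hypothetical helper: how much faster is the run that took fast_time
 * than the run that took slow_time, as a whole percentage.
 * Assumes slow_time >= fast_time > 0.
 */
long long
percent_faster(long long slow_time, long long fast_time)
{
   long long diff = slow_time - fast_time;

   /* Adding fast_time / 2 before dividing rounds to the nearest integer. */
   return (100 * diff + fast_time / 2) / fast_time;
}
```

With the numbers from this comment, percent_faster(122285, 6097) gives 1906 and percent_faster(122285, 25740) gives 375, matching the figures quoted.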
There is almost no performance difference on Radeons. I guess the patch should be made Intel-only.
Another perspective. Running without the patch:

=0 sandybridge:~$ DISPLAY=:0.0 ./copy-tex-subimage # gt1
time for glBlitFramebuffer = 119331
time for glCopyTexSubImage2D = 1518

=0 sandybridge:~$ DISPLAY=:0.1 ./copy-tex-subimage # radeon 5770, r600g
time for glBlitFramebuffer = 15952
time for glCopyTexSubImage2D = 16237

And after applying the patch:

=0 sandybridge:~$ DISPLAY=:0.0 jhbuild run ./copy-tex-subimage
time for glBlitFramebuffer = 4706
time for glCopyTexSubImage2D = 1519

=0 sandybridge:~$ DISPLAY=:0.1 jhbuild run ./copy-tex-subimage
time for glBlitFramebuffer = 16318
time for glCopyTexSubImage2D = 16649

This makes less of a case for implementing a fast path for glBlitFramebuffer, and more of a case that i965 needs to seriously fix the bottleneck uncovered by the current code.
Marek, this appears to be an intel issue, agreed?
(In reply to comment #6)
> Marek, this appears to be an intel issue, agreed?

Yes, I agree.
This is where I'm up to (reducing meta-op overhead vs patch):

                                   q35             snb
glBlitFramebuffer 1x1          29336   8833    22606   3990
glBlitFramebuffer 2x2           7360   2235     5684   1006
glBlitFramebuffer 4x4           1869   1440      587    269
glBlitFramebuffer 8x8            493    174      377     83
glBlitFramebuffer 16x16          151     70      112     38
glBlitFramebuffer 32x32           65     45       44     36
glBlitFramebuffer 64x64           43     38       28     24
glBlitFramebuffer 128x128         38     36       23     22
glBlitFramebuffer 256x256         36     36       23     23
glBlitFramebuffer 512x512         36     36       22     22
glBlitFramebuffer 1024x1024       36     35       22     23
glCopyTexSubImage2D 1x1         3861            1350
glCopyTexSubImage2D 2x2          990             355
glCopyTexSubImage2D 4x4          274             106
glCopyTexSubImage2D 8x8           96              43
glCopyTexSubImage2D 16x16         51              27
glCopyTexSubImage2D 32x32         39              23
glCopyTexSubImage2D 64x64         37              22
glCopyTexSubImage2D 128x128       36              22
glCopyTexSubImage2D 256x256       36              22
glCopyTexSubImage2D 512x512       36              22
glCopyTexSubImage2D 1024x1024     36              22
To put the speed into perspective, I tried a couple of other variations, essentially unrolling the meta-op blit in the test itself. On the q35:

               Blit    Quads Tristrip     Copy
1x1           29393      638      212     3786
2x2            7339      167      110      955
4x4            1852       48       20      245
8x8             469       18       15       68
16x16           123       12        9       23
32x32            38        9        9       14
64x64            16        9        9        9
128x128          10        8        9        8
256x256           9        9        9        9
512x512           9        9        9        9
1024x1024         9        9        8        9
Created attachment 43167 [details] [review] Move variations for blitting
Having spent a couple of weeks tackling the underlying problem of why snb was so slow, I've finally ported the meta-op to intel and pushed. On applying:

            1x1     2x2     4x4     8x8   16x16   32x32 128x128 256x256 512x512
Blit:      2113     532     134      34       8       3       1       1       0
Quads:      141     265      66       3       1       0       0       0       0
Tri:         90     133      35       1       1       0       1       0       0
Copy:      1749     437     109      28       7       2       0       1       0

So glBlitFramebuffer is now only marginally slower than glCopyTexSubImage2D and hopefully adequate.

commit c0ad70ae31ee5501281b434d56e389fc92b13a3a
Author: Neil Roberts <neil@linux.intel.com>
Date:   Sat Feb 5 10:21:11 2011 +0000

    intel: Try using glCopyTexSubImage2D in _mesa_meta_BlitFramebuffer

    In the case where glBlitFramebuffer is being used to copy to a texture without scaling it is faster if we can use the hardware to do a blit rather than having to do a texture render. In most of the drivers glCopyTexSubImage2D will use a blit so this patch makes it check for when glBlitFramebuffer is doing a simple copy and then divert to glCopyTexSubImage2D.

    This was originally proposed as an extension to the common meta-ops. However, it was rejected as using the BLT is only advantageous for Intel hardware.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33934
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>