Created attachment 97323
i965 blit fastpath for PBO glDrawPixels does not work with MSAA, even though INTEL_DEBUG=pix indicates success. The attached testcase demonstrates the problem.
On master this is "fixed" by:
Author: Kenneth Graunke <firstname.lastname@example.org>
Date: Fri Feb 21 19:15:51 2014 -0800
i965: Don't try to use the hardware blitter for multisampled miptrees.
Interestingly, 10.1 has this commit that "fixes" MSAA CopyPixels:
Author: Paul Berry <email@example.com>
Date: Tue Dec 3 15:41:14 2013 -0800
i965: Don't try to use HW blitter for glCopyPixels() when multisampled.
1. Code duplication between intel_pixel* is quite unfortunate; I'm guilty of increasing the duplication a bit, but it wasn't well-factored before me either;
2. The first commit mentions that blorp would handle msaa; I see that blorp is used for FBO blits, but apparently not in intel_pixel_*; any reason for that?
3. Even without blorp, I find it quite strange that the meta fallback would be that slow and go through float pack-unpack (even in the non-MSAA case).
Comments appreciated :)
The hardware blitter doesn't understand MSAA, so we really shouldn't use it. Our other options are Meta or BLORP. We're generally trying to move away from BLORP, so my preferred solution would be to improve the Meta path.
(As an aside, it looks like _mesa_meta_DrawPixels doesn't think about MSAA either, and maybe it needs to...)
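The guard implied by the two commits quoted earlier can be sketched roughly as follows. This is only an illustration: the struct and function names are hypothetical stand-ins, not the real i965 miptree API. The one assumption taken from the thread is that a surface with more than one sample per pixel must never be handed to the blitter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a driver surface; not the real i965 miptree. */
struct sketch_surface {
   unsigned num_samples;   /* 0 or 1 means single-sampled */
};

/* Returns true only if the blitter may touch this surface. */
static bool
sketch_can_use_hw_blitter(const struct sketch_surface *surf)
{
   /* The blitter has no notion of per-sample storage, so refuse MSAA. */
   return surf->num_samples <= 1;
}

/* Both source and destination must pass before a blit is attempted. */
static bool
sketch_blit_allowed(const struct sketch_surface *src,
                    const struct sketch_surface *dst)
{
   return sketch_can_use_hw_blitter(src) && sketch_can_use_hw_blitter(dst);
}
```

When such a check fails, the caller would fall through to the next path (Meta, in this discussion) instead of reporting success.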
It looks like _mesa_meta_DrawPixels is interpreting ClampFragmentColor == GL_FIXED_ONLY stupidly, and thinking it needs to go through a float buffer when it doesn't. Fixing that would speed this up by around 20%.
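For reference, here is a minimal sketch of how GL_FIXED_ONLY could be resolved before deciding on a float temporary. The enum and function names are hypothetical and this is not the actual meta code. It assumes that a float temporary is only wanted when fragment color clamping is effectively off, and that GL_FIXED_ONLY means "clamp only when the draw buffer is fixed-point".

```c
#include <stdbool.h>

/* Hypothetical mirror of the three clamp settings (FALSE, TRUE, FIXED_ONLY). */
enum sketch_clamp_mode {
   SKETCH_CLAMP_FALSE,
   SKETCH_CLAMP_TRUE,
   SKETCH_CLAMP_FIXED_ONLY
};

/* Resolve the mode to an effective on/off value for the current draw buffer. */
static bool
sketch_fragment_clamp_enabled(enum sketch_clamp_mode mode,
                              bool draw_buffer_is_float)
{
   switch (mode) {
   case SKETCH_CLAMP_TRUE:
      return true;
   case SKETCH_CLAMP_FALSE:
      return false;
   case SKETCH_CLAMP_FIXED_ONLY:
      /* Clamping applies only to fixed-point draw buffers. */
      return !draw_buffer_is_float;
   }
   return false;
}

/* Assumption: a float temporary only matters if unclamped values survive. */
static bool
sketch_needs_float_temp(enum sketch_clamp_mode mode, bool draw_buffer_is_float)
{
   return !sketch_fragment_clamp_enabled(mode, draw_buffer_is_float);
}
```

Under these assumptions, the common case (GL_FIXED_ONLY with a fixed-point framebuffer) resolves to "clamped", so no float buffer would be needed.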
On what hardware did you get just 20%? I'm getting about a 3x overall speedup on IvyBridge (roughly 10s -> 3s run time), plus another 10% if I change GL_BGRA to GL_RGBA in the testcase source; otherwise it performs a channel swizzle for the temporary texture.
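To illustrate the extra work a GL_BGRA source implies, the swizzle amounts to a per-pixel swap of the blue and red bytes. The function below is a hypothetical CPU-side sketch for 8-bit, 4-channel pixels only; the thread does not say where or how the driver actually performs the swizzle.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the B and R bytes of each 4-byte pixel in place (BGRA -> RGBA). */
static void
sketch_bgra_to_rgba(uint8_t *pixels, size_t pixel_count)
{
   for (size_t i = 0; i < pixel_count; i++) {
      uint8_t *p = pixels + 4 * i;
      uint8_t tmp = p[0];
      p[0] = p[2];   /* red moves to the first byte */
      p[2] = tmp;    /* blue moves to the third byte */
   }
}
```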
As I understand it, to make Meta viable for this you'd need to be able to interpret a PBO (linear) as a texture (normally tiled) without intermediate copying. From what I know, such a capability existed in Mesa some time ago but no longer does.
Oh, great! I might have had my laptop on battery or something - it doesn't clock up properly unless it's on AC. 3x sounds believable.
Although we normally allocate textures as tiled (since it's faster), there's no hardware requirement that we do so - untiled textures work just fine. I don't see why we couldn't use a PBO as a texture source. It'd probably just require creating a gl_texture_object wrapper object.
We have a BindRenderbufferAsTexture driver hook for turning a gl_renderbuffer into a gl_texture_object...we might need a BindPBOAsTexture hook or something.
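To treat a PBO as an untiled texture source, a wrapper would at least need to know where the image starts in the buffer and how many bytes separate consecutive rows. The sketch below derives both from the GL unpack state. All names are hypothetical, it ignores skip-pixels and skip-rows, and it says nothing about any extra pitch or offset alignment the hardware may require.

```c
#include <stddef.h>

/* Hypothetical description of a 2D image stored linearly inside a buffer. */
struct sketch_linear_image {
   size_t offset;   /* byte offset of the first pixel in the buffer */
   size_t pitch;    /* bytes from the start of one row to the next */
};

/* Round value up to a multiple of align (align must be nonzero). */
static size_t
sketch_align_up(size_t value, size_t align)
{
   return (value + align - 1) / align * align;
}

/* row_length == 0 means "use width", as with GL_UNPACK_ROW_LENGTH. */
static struct sketch_linear_image
sketch_describe_pbo_image(size_t pbo_offset, size_t width, size_t row_length,
                          size_t bytes_per_pixel, size_t unpack_alignment)
{
   struct sketch_linear_image img;
   size_t pixels_per_row = row_length ? row_length : width;

   img.offset = pbo_offset;
   /* Rows are padded up to the unpack alignment (1, 2, 4 or 8 bytes). */
   img.pitch = sketch_align_up(pixels_per_row * bytes_per_pixel,
                               unpack_alignment);
   return img;
}
```

For example, a 3-pixel-wide GL_RGB/GL_UNSIGNED_BYTE image with the default alignment of 4 has 9 bytes of data per row but a 12-byte pitch, which a linear texture view would have to honor.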
(In reply to Alexander Monakov from comment #3)
> On what hardware did you get just 20%? I'm getting about 3x overall speedup
> on IvyBridge (roughly 10s -> 3s run time), plus 10% faster if I change
> GL_BGRA to GL_RGBA in the testcase source, otherwise it performs channel
> swizzle for temporary texture.
> As I understand to make Meta viable for this you'd need to be able to
> interpret PBO (linear) as a texture (normally tiled) without intermediate
> copying. From what I know, such capability existed in Mesa some time ago
> but not anymore.
We can totally do this on Intel. In fact, we already do it for PBO uploads today. We just don't do it for DrawPixels. It should be easy enough to use the exact same path for DrawPixels; I just didn't think it was an issue.
(In reply to Jason Ekstrand from comment #5)
> We can totally do this on Intel. In fact, we do for doing PBO uploads
> today. We just don't do it for DrawPixels. It should be easy enough to use
> the exact same path for DrawPixels, I just didn't think it was an issue.
Nice. Can you point me to relevant code or commits, please?