Bug 99247 - glReadPixels with RGB is terribly slow
Summary: glReadPixels with RGB is terribly slow
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium enhancement
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-02 11:36 UTC by Pierre Proske
Modified: 2019-09-25 18:59 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments

Description Pierre Proske 2017-01-02 11:36:37 UTC
I recently wrote an application that was ported to both the Raspberry Pi and an Intel Braswell NUC (NUC5PPYH, N3700 CPU). The NUC's glReadPixels performance was worse than the Raspberry Pi's (10 fps vs. 15 fps overall app performance, with glReadPixels as the bottleneck). On my Intel HD 4000 I can get 60 fps under Linux.

I tried to get PBOs working on the NUC (they aren't supported under GLES) for faster pixel transfer, but saw no performance difference.

I then installed Windows 10 on the NUC and my performance went up to 36fps (~29fps without PBOs). This application is driving a large scale public light installation so I'm very sad to have to resort to Windows 10 as I find Linux more reliable for this kind of work.

Are PBOs not working on Braswell? Why is there such a huge difference between the open and closed drivers for a GPU -> CPU pixel transfer?

Performance on the NUC under Linux seemed fine otherwise (although I saw some glitches in the Ubuntu compositor with the latest git Mesa); shader performance was decent.

I tried a number of Mesa versions, from the stable Ubuntu 16.04 version to latest git via a PPA. No performance difference was noted across these versions.
Comment 1 Matt Turner 2017-01-02 17:25:53 UTC
What formats are you calling glReadPixels on? We have some fast paths, so I'm wondering if it's just a format we don't support.

Capturing an apitrace might also be helpful for us to investigate.
Comment 2 Pierre Proske 2017-01-03 01:22:30 UTC
This is the code I'm using. I'm pretty sure the format is RGB888, which I'm guessing is slow. However, on Windows it's running the exact same code.

void ofFbo::readToPixels(ofPixels & pixels, int attachmentPoint) const{
	if(!bIsAllocated) return;
#ifndef TARGET_OPENGLES
	getTexture(attachmentPoint).readToPixels(pixels);
#else
	pixels.allocate(settings.width,settings.height,ofGetImageTypeFromGLType(settings.internalformat));
	bind();
	int format = ofGetGLFormatFromInternal(settings.internalformat);
	glReadPixels(0,0,settings.width, settings.height, format, GL_UNSIGNED_BYTE, pixels.getData());
	unbind();
#endif
}

I'm using the openFrameworks library, grabbing pixels from an FBO:
https://github.com/openframeworks/openFrameworks/blob/0.9.8/libs/openFrameworks/gl/ofFbo.cpp

I'll look into getting an API trace uploaded. This is the tool, right?
https://github.com/apitrace/apitrace
Comment 3 Matt Turner 2017-01-03 15:21:06 UTC
(In reply to Pierre Proske from comment #2)
> I'll look into getting an API trace uploaded. This is the tool, right?
> https://github.com/apitrace/apitrace

Yes, exactly right.
Comment 4 Pierre Proske 2017-01-06 02:16:53 UTC
Here's the API trace:
https://www.dropbox.com/s/3sczvrwxho1nxge/CallAndResponse.trace?dl=0

Turns out I'm not using glReadPixels(); the trace shows I'm using PBOs. The alternative to PBOs is in fact glGetTexImage(). Neither produces acceptable results (I get around 18 fps on Linux and 30-40 fps on Windows).

I'm not drawing anything to the screen, just drawing to an offscreen FBO and grabbing the pixels from the attached RGB texture. That's the slow bit.
Comment 5 Kenneth Graunke 2017-01-08 09:15:30 UTC
In this case, _mesa_meta_pbo_GetTexSubImage is failing to handle this because it doesn't believe it can create a texture of format MESA_FORMAT_RGB_UNORM8. That means it falls back to CPU mapping, which stalls.

We don't advertise texturing support for MESA_FORMAT_RGB_UNORM8 (or any other 3-component 8/16-bit formats) because we can't render to them. Not supporting those makes Mesa fall back to 4-component XRGB, which is renderable. Normally, this allows us to do more on the GPU.
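The format fallback described above can be pictured with a toy model. This is a hypothetical sketch, not Mesa's format-selection code: `choose_storage_format` and the format labels are made up for illustration. It assumes the driver has a set of formats it can both sample from and render to, and that it picks the first supported entry from an ordered fallback list when the requested format is missing.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <set>
#include <string>

// Toy model only (hypothetical, not Mesa's code). `supported` stands for the
// formats a driver can both sample from and render to; names are plain labels.
std::optional<std::string> choose_storage_format(
        const std::string& requested,
        std::initializer_list<std::string> fallbacks,
        const std::set<std::string>& supported) {
    assert(!requested.empty());
    // Use the requested format directly when the driver advertises it.
    if (supported.count(requested) != 0) return requested;
    // Otherwise take the first advertised format from the fallback list,
    // e.g. a 4-component XRGB layout for a 3-component RGB request.
    for (const std::string& candidate : fallbacks) {
        if (supported.count(candidate) != 0) return candidate;
    }
    return std::nullopt;  // nothing usable
}
```

With an "RGB8" request, a supported set holding only 4-component formats, and a fallback list starting with "XRGB8", the function returns "XRGB8": the texture ends up stored with a fourth, unused byte per pixel, which is why reading it back as RGB later needs a repack.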

I think Jason has some hacks for dealing with 3-component formats we might be able to use.  (Vulkan doesn't play well with CPU fallbacks so he had to implement a bunch of stuff.)  We might also be able to use typed surface messages.

Jason, any thoughts on getting competent GPU PBO GetTexSubImage here?
Comment 6 Jason Ekstrand 2017-01-08 15:51:02 UTC
Pierre,
Generally, I would recommend against using RGB formats and expecting them to be fast. Very little modern hardware has actual RGB format support. Many drivers will have gone out of their way to try to make it fast for legacy applications, but every time you use one, you're rolling the dice and hoping you hit the optimized path. It's almost always better to use RGBA.

Ken,
Yes, we can accelerate it. Assuming Topi has landed his blorp upload patches, it shouldn't be hard. As of recently, blorp has full support for RGB destinations. It won't be as fast as RGBA because we execute the fragment shader three times per pixel, but it should be faster than the CPU fallback.
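The remark about running the fragment shader three times per pixel can be illustrated with a CPU model. The assumption here (ours, not taken from the Mesa source) is that the W x H RGB destination is treated as a (3*W) x H single-channel surface, so each "invocation" writes one byte and works out from its x coordinate which source pixel and which component it owns. `Rgba8` and `blit_to_rgb_as_r8` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 4-byte source texel (hypothetical type for this sketch).
struct Rgba8 {
    std::uint8_t r, g, b, a;
};

// CPU model of writing an RGB destination as a 3x-wide single-channel
// surface: each inner-loop iteration plays the role of one fragment shader
// invocation and produces exactly one byte of the packed RGB output.
std::vector<std::uint8_t> blit_to_rgb_as_r8(const std::vector<Rgba8>& src,
                                            std::size_t width,
                                            std::size_t height) {
    assert(src.size() == width * height);
    const std::size_t r8_width = 3 * width;  // widened destination
    std::vector<std::uint8_t> dst(r8_width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < r8_width; ++x) {
            // Which source pixel and which component this "invocation" owns.
            const Rgba8& texel = src[y * width + x / 3];
            const std::size_t component = x % 3;  // 0 = R, 1 = G, 2 = B
            const std::uint8_t value = component == 0   ? texel.r
                                       : component == 1 ? texel.g
                                                        : texel.b;
            dst[y * r8_width + x] = value;
        }
    }
    return dst;
}
```

The model also shows why such a path would stay slower than RGBA even on the GPU: the widened surface has three times as many pixels to shade.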
Comment 7 Kenneth Graunke 2017-01-08 21:06:23 UTC
Oh right, Topi even has patches to replace that path already.  Great!
Comment 8 Jason Ekstrand 2017-01-08 21:08:53 UTC
(In reply to Kenneth Graunke from comment #7)
> Oh right, Topi even has patches to replace that path already.  Great!

Yes, but it won't automatically fix this bug. Once it lands, we'll have to hook up RGB specially: we can't render to it, so today's code won't even try.
Comment 9 Pierre Proske 2017-01-08 22:57:15 UTC
Thanks for investigating!

I'm happy to use RGBA if that is the fastest path, I had no idea RGB was so troublesome.

It would still be great to see this fixed but if I can immediately get better fps using RGBA then I will do that :)
Comment 10 Jason Ekstrand 2017-01-08 23:04:18 UTC
(In reply to Pierre Proske from comment #9)
> Thanks for investigating!
> 
> I'm happy to use RGBA if that is the fastest path, I had no idea RGB was so
> troublesome.

Yeah... Turns out that hardware really likes pixels to have nicely aligned addresses.
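A bit of arithmetic shows what "nicely aligned" means here. With tightly packed 3-byte RGB pixels, pixel x starts at byte 3x, so only every fourth pixel begins on a 4-byte boundary, whereas every 4-byte RGBA pixel does. Rows are affected too: glReadPixels rounds each row of GL_UNSIGNED_BYTE data up to a multiple of GL_PACK_ALIGNMENT (default 4), which changes nothing for RGBA but adds padding for RGB whenever 3*width is not a multiple of 4. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of pixel x within a tightly packed row.
std::size_t pixel_offset(std::size_t x, std::size_t bytes_per_pixel) {
    return x * bytes_per_pixel;
}

// How many of the first n pixels start on a 4-byte boundary, i.e. could be
// loaded or stored as one aligned 32-bit word.
std::size_t count_word_aligned(std::size_t n, std::size_t bytes_per_pixel) {
    assert(bytes_per_pixel > 0);
    std::size_t aligned = 0;
    for (std::size_t x = 0; x < n; ++x) {
        if (pixel_offset(x, bytes_per_pixel) % 4 == 0) ++aligned;
    }
    return aligned;
}

// Row stride for GL_UNSIGNED_BYTE pixel data read back with
// GL_PACK_ROW_LENGTH left at 0: the row is rounded up to a multiple of
// GL_PACK_ALIGNMENT, whose default value is 4.
std::size_t packed_row_stride(std::size_t width, std::size_t bytes_per_pixel,
                              std::size_t pack_alignment = 4) {
    assert(pack_alignment > 0);
    const std::size_t row_bytes = width * bytes_per_pixel;
    return (row_bytes + pack_alignment - 1) / pack_alignment * pack_alignment;
}
```

For example, only 2 of the first 8 RGB pixels start word-aligned (offsets 0 and 12), while all 8 RGBA pixels do; and a 5-pixel-wide row is 15 bytes of RGB data but occupies a 16-byte stride at the default alignment, while the RGBA row is exactly 20 bytes.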

> It would still be great to see this fixed but if I can immediately get
> better fps using RGBA then I will do that :)

Even if we get faster RGB paths implemented, they won't be as fast as RGBA.  So you're probably better off doing RGBA anyway.  Specifically, you probably want to use GL_UNSIGNED_BYTE and GL_RGBA in your glReadPixels call.  If you ask for the data to be returned in RGB then we have to pack it and that's slow.
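If an application reads back GL_RGBA as suggested but still needs tightly packed RGB for some consumer, it has to drop the fourth byte itself. The loop below is a minimal sketch of that per-pixel repack, roughly the kind of extra pass a GL_RGB readback implies (the driver's actual path, and the byte order of its internal storage, may differ). `pack_rgba_to_rgb` is a hypothetical helper and assumes R, G, B, A byte order, as returned by glReadPixels with GL_RGBA and GL_UNSIGNED_BYTE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Repack tightly packed RGBA8 pixels into tightly packed RGB8 by dropping
// every fourth byte. Assumes R, G, B, A byte order.
std::vector<std::uint8_t> pack_rgba_to_rgb(const std::vector<std::uint8_t>& rgba) {
    assert(rgba.size() % 4 == 0);
    const std::size_t pixel_count = rgba.size() / 4;
    std::vector<std::uint8_t> rgb(pixel_count * 3);
    for (std::size_t i = 0; i < pixel_count; ++i) {
        rgb[3 * i + 0] = rgba[4 * i + 0];  // R
        rgb[3 * i + 1] = rgba[4 * i + 1];  // G
        rgb[3 * i + 2] = rgba[4 * i + 2];  // B; the alpha byte is dropped
    }
    return rgb;
}
```

Doing this on the application side keeps the GPU-to-CPU transfer itself on the RGBA path; whether the repack is needed at all depends on what consumes the pixels.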
Comment 11 Pierre Proske 2017-01-09 01:37:20 UTC
Wow! 

So without PBOs the app works slightly better but is still pretty slow (using glGetTexImage() and RGBA); however, when I enable PBOs using RGBA the result is blistering! I'm now getting much faster rates than Windows :D :D

If it's useful, I could send another trace without the PBO optimisation.

Otherwise I'm back in the land of the living :)
Comment 12 Jason Ekstrand 2017-01-09 03:28:35 UTC
(In reply to Pierre Proske from comment #11)
> Wow! 
> 
> So without PBOs the app works slightly better but is still pretty slow
> (using glGetTexImage() and RGBA); however, when I enable PBOs using RGBA the
> result is blistering! I'm now getting much faster rates than Windows :D :D
> 
> If it's useful, I could send another trace without the PBO optimisation.

That might be interesting.  RGBA should, apart from stalling the GPU, be reasonably fast even without the PBO.  It'd be good to know why it isn't.
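The PBO result in comment 11 fits the usual explanation: a glReadPixels call into a bound GL_PIXEL_PACK_BUFFER can return without waiting for the GPU, and the data is mapped later, typically one frame behind. The sketch below models only the bookkeeping of a common two-buffer ("ping-pong") scheme; it makes no GL calls, assumes frame numbers advance by one per call, and `PboRing` and `advance` are hypothetical names. In real code the write step would be glReadPixels into one PBO and the read step would map the other one with glMapBufferRange.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Bookkeeping-only model of two pixel pack buffers used in turn. Each slot
// records which frame's pixels it would hold; no OpenGL calls are made.
class PboRing {
public:
    // Queue a readback of `frame` into one slot and hand back the frame that
    // was queued into the other slot on the previous call, if any.
    std::optional<int> advance(int frame) {
        assert(frame >= 0);
        // Stands in for glReadPixels into this frame's PBO.
        const std::size_t write_slot = static_cast<std::size_t>(frame % 2);
        // Stands in for mapping the PBO filled on the previous frame.
        const std::size_t read_slot = 1 - write_slot;
        slots_[write_slot] = frame;
        std::optional<int> ready = slots_[read_slot];
        slots_[read_slot].reset();  // data consumed
        return ready;
    }

private:
    std::array<std::optional<int>, 2> slots_{};
};
```

Frame 0 yields nothing, frame 1 yields frame 0's pixels, and so on: one frame of latency is traded for not stalling on every read.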

> Otherwise I'm back in the land of the living :)

Glad you're happy!  Are you still interested in faster RGB?  If not, I'd like to leave the bug open but mark it as an enhancement so we don't forget RGB is slow.
Comment 13 Pierre Proske 2017-01-09 22:03:35 UTC
No, I don't need faster RGB. I'm happy to leave this in your hands (I'm sure you've got plenty of other things to do). Keep up the great work...
Comment 14 GitLab Migration User 2019-09-25 18:59:43 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1559.

