60866 – GLSL performance issues for uniform buffer objects

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Eric Anholt

Reported:	2013-02-15 01:05 UTC by Markus Wick
Modified:	2013-03-11 19:37 UTC
CC List:	0 users

Description Markus Wick 2013-02-15 01:05:11 UTC

My game needs to stream everything to the GPU, so all three buffers (ARRAY_BUFFER, ELEMENT_ARRAY_BUFFER, UNIFORM_BUFFER) are updated before drawing.
To get usable performance, I map each buffer with GL_MAP_UNSYNCHRONIZED_BIT and update only small parts in a ring-buffer manner. If a buffer is full, I orphan it by calling glBufferData with a NULL data pointer and GL_STREAM_DRAW.
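
For readers unfamiliar with this streaming pattern, here is a minimal sketch of the offset bookkeeping it implies. It is not the code from my game: the names stream_ring, align_up and stream_ring_alloc are hypothetical, and the GL calls appear only as comments so the snippet compiles without a GL context. The alignment parameter is there because offsets passed to glBindBufferRange for a UNIFORM_BUFFER have to be multiples of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one streamed buffer object. */
typedef struct {
    size_t capacity; /* size of the GL buffer store in bytes */
    size_t head;     /* first byte that has not been handed out yet */
} stream_ring;

/* Round offset up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Reserve size bytes and return the offset to write at.
 * *orphan is set to true when the request did not fit behind the
 * current head, in which case writing restarts at offset 0 and the
 * caller is expected to orphan the old storage first.
 */
static size_t stream_ring_alloc(stream_ring *ring, size_t size,
                                size_t align, bool *orphan)
{
    assert(size <= ring->capacity);

    size_t offset = align_up(ring->head, align);
    *orphan = offset + size > ring->capacity;
    if (*orphan) {
        /* Caller: glBufferData(target, capacity, NULL, GL_STREAM_DRAW); */
        offset = 0;
    }
    /* Caller: glMapBufferRange(target, offset, size,
     *             GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT); */
    ring->head = offset + size;
    return offset;
}
```

The unsynchronized map is only safe because every range is written at most once per storage; orphaning gives the driver a chance to hand back fresh storage while the GPU still reads the old one.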

But my GLSL shaders seem to be much slower when using uniform buffer objects.
I have two code paths for uniforms: the first uses uniform buffers, the second updates all uniforms with glUniform. All of the following steps were done once with uniform buffers and once with glUniform.

All of the files mentioned below are available at: http://markus.members.selfnet.de/i965-ubo/

For profiling, I've made apitrace dumps.
The qapitrace profiler shows that I am GPU-bottlenecked almost all the time.
I've also dumped the INTEL_DEBUG=wm,shader_time output of both "glretrace -b" runs.
For completeness, there is also the intel_gpu_top output.

I think the qapitrace output says that all shaders are slower, so it shouldn't be an issue with just one of them. Maybe the optimization fails for UBO uniforms?

My test environment:
Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
HD4000 GPU
3.0 Mesa 9.2-devel (git-8cabe26)

Comment 1 Eric Anholt 2013-02-16 06:40:18 UTC

The patch series I just sent out (also available as the "ubo" branch of git://people.freedesktop.org/~anholt/mesa) fixes some rendering failures with your trace on my ivb while improving performance 20%.  Unfortunately, your non-ubo trace spewed endless errors about uniform updates (have you checked for GL errors from your app?  Did the replay play back cleanly for you?), so I couldn't compare the two side by side.

Comment 2 Markus Wick 2013-03-07 07:59:55 UTC

It is fixed by these patches:
http://lists.freedesktop.org/archives/mesa-dev/2013-March/035804.html

Comment 3 Eric Anholt 2013-03-11 19:37:12 UTC

commit 4c1fdae0a01b3f92ec03b61aac1d3df500d51fc6
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Mar 6 14:47:22 2013 -0800

    i965/fs: Switch to using sampler LD messages for uniform pull constants.
    
    When forcing the compiler to always generate pull constants instead of
    push constants (in order to have an easy to use testcase), improves
    performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7).
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
