Bug 37142 - Too much vertex buffers uploads
Summary: Too much vertex buffers uploads
Status: RESOLVED WONTFIX
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-12 08:11 UTC by Pierre-Eric Pelloux-Prayer
Modified: 2014-04-13 11:20 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
openarena + verbose vertex buffer upload logs (1.93 KB, text/plain)
2011-05-12 08:11 UTC, Pierre-Eric Pelloux-Prayer
Details
vertex buffer upload caching (8.90 KB, patch)
2011-05-17 05:33 UTC, Pierre-Eric Pelloux-Prayer
Details | Splinter Review

Description Pierre-Eric Pelloux-Prayer 2011-05-12 08:11:11 UTC
Created attachment 46638 [details]
openarena + verbose vertex buffer upload logs

Test case : openarena + anholt benchmark + r600g

I added several fprintf debug trace to mesa code around Vertex buffer uploading.
Basically, openarena render things using :
glVertexPointeer()
glLockArraysEXT()
several glDrawElements()
glUnlockArraysEXT()

It seems that each call to glDrawElements implies a reupload of the vertex buffers, even if it has not changed (as glLock/Unlock calls tell, at least for part of the buffer).

Another related bug (it seems) is in : cso_set_vertex_buffers() which does a test before calling : util_copy_vertex_buffers and pipe->set_vertex_buffers
This test always return true, thus the 2 above functions are always called. The test is always true because it memcmp all pipe_vertex_buffer, which contains a 'buffer' pointer, which changes at each frame (see st_draw.c:349).

I'm trying to build a patch which fix the cso_set_vertex_buffers problem and then, taking advantage of glLock/Unlock calls to fix the upload issue.

What do you think ?
Comment 1 Ian Romanick 2011-05-12 10:55:30 UTC
(In reply to comment #0)
> Created an attachment (id=46638) [details]
> openarena + verbose vertex buffer upload logs
> 
> Test case : openarena + anholt benchmark + r600g
> 
> I added several fprintf debug trace to mesa code around Vertex buffer
> uploading.
> Basically, openarena render things using :
> glVertexPointeer()
> glLockArraysEXT()
> several glDrawElements()
> glUnlockArraysEXT()
> 
> It seems that each call to glDrawElements implies a reupload of the vertex
> buffers, even if it has not changed (as glLock/Unlock calls tell, at least for
> part of the buffer).

EXT_compiled_vertex_array is dead.  It has been dead for almost 10 years.  No new applications should use it, and old applications that use it should be fast enough on modern hardware.  I really don't think we should add any code to Mesa to make this case fast.  We'd be better off submitting patches to openarena to use VBOs instead.

> Another related bug (it seems) is in : cso_set_vertex_buffers() which does a
> test before calling : util_copy_vertex_buffers and pipe->set_vertex_buffers
> This test always return true, thus the 2 above functions are always called. The
> test is always true because it memcmp all pipe_vertex_buffer, which contains a
> 'buffer' pointer, which changes at each frame (see st_draw.c:349).
> 
> I'm trying to build a patch which fix the cso_set_vertex_buffers problem and
> then, taking advantage of glLock/Unlock calls to fix the upload issue.
> 
> What do you think ?
Comment 2 Marek Olšák 2011-05-12 12:55:38 UTC
(In reply to comment #0)
> What do you think ?

You can't optimize the uploads because the sizes of all buffers are not known when glLockArraysEXT is called. The only things you've got are pointers to the buffers.
Comment 3 Pierre-Eric Pelloux-Prayer 2011-05-12 13:19:13 UTC
> EXT_compiled_vertex_array is dead.  It has been dead for almost 10 years.  No
> new applications should use it, and old applications that use it should be fast
> enough on modern hardware.  I really don't think we should add any code to Mesa
> to make this case fast.  We'd be better off submitting patches to openarena to
> use VBOs instead.

That's why I asked for external opinions. I still feel that IF it can brings significant enough performance improvement, it could be useful - mainly because openarena and all games using quake3 based engine are used in every open-source drivers benchmarks.
Comment 4 Pierre-Eric Pelloux-Prayer 2011-05-12 13:27:09 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > What do you think ?
> 
> You can't optimize the uploads because the sizes of all buffers are not known
> when glLockArraysEXT is called. The only things you've got are pointers to the
> buffers.

Sure, but I was thinking to something more like : 
- glLockArraysEXT does nothing more than curently
- uploading a vertex buffer is done as before except in one case : when trying to upload the same buffer AND glLock is enabled => the uploading is skipped as data is already in GPU memory (which is the case shown in the attached trace file)
Comment 5 Fabio Pedretti 2011-05-13 00:11:50 UTC
Note that ioquake3 will probably support VBOs soon:
https://bugzilla.icculus.org/show_bug.cgi?id=4358
Comment 6 Sven Arvidsson 2011-05-13 09:22:41 UTC
(In reply to comment #3)
> That's why I asked for external opinions. I still feel that IF it can brings
> significant enough performance improvement, it could be useful - mainly because
> openarena and all games using quake3 based engine are used in every open-source
> drivers benchmarks.

Hey! A bug that can be solved by ignoring Phoronix!
Comment 7 Pierre-Eric Pelloux-Prayer 2011-05-17 05:33:51 UTC
Created attachment 46804 [details] [review]
vertex buffer upload caching

Here's a patch adding vertex buffer upload caching. 
It's not extensively tested (only with custom app and openarena), and is a bit hackish (mainly regarding : ctx->Array.NewState field, but it seems unused elsewhere).
Anyway, it allowed me to do some testing and confirm the initial guess : it brings no performance improvements to openarena :-/ (but that was expected, as profiling the game tells that u_upload_data < 1% cpu time)

So I guess the handling of EXT_compiled_vertex_array is useless - maybe with 1 exception : lower end video cards with lower bandwidth ? I can't test this scenario here, as I only have AMD HD4850.
Comment 8 Marek Olšák 2014-04-13 11:20:20 UTC
Closing. Ian's comment explains it:

> EXT_compiled_vertex_array is dead.  It has been dead for almost 10 years. 
> No new applications should use it, and old applications that use it should
> be fast enough on modern hardware.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.