Bug 105256

Summary: Slow performance using glDrawElements calls with GL_UNSIGNED_BYTE indices
Product: Mesa
Component: Drivers/Gallium/r600
Version: 17.3
Hardware: Other
OS: Linux (All)
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Reporter: Logan McNaughton <logan>
Assignee: Default DRI bug account <dri-devel>
QA Contact: Default DRI bug account <dri-devel>
CC: fdsfgs, mirh

Description Logan McNaughton 2018-02-26 16:46:01 UTC
Original application bug report here:

https://github.com/gonetz/GLideN64/issues/1561

This bug happens using mupen64plus (N64 emulator) and the GLideN64 graphics plugin.

The application uses VBO streaming (https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming) using ARB_buffer_storage to upload the VBO data.
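For reference, that streaming pattern typically looks something like the sketch below (a minimal illustration assuming a persistently mapped buffer; BUF_SIZE, offset, vertices, and vertex_bytes are placeholders, not GLideN64's actual code):

  /* create an immutable buffer and keep it mapped for streaming writes */
  GLuint vbo;
  glGenBuffers(1, &vbo);
  glBindBuffer(GL_ARRAY_BUFFER, vbo);

  GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                     GL_MAP_COHERENT_BIT;
  glBufferStorage(GL_ARRAY_BUFFER, BUF_SIZE, NULL, flags);
  char *base = glMapBufferRange(GL_ARRAY_BUFFER, 0, BUF_SIZE, flags);

  /* per frame: append vertex data at a moving offset, then draw;
     fencing and wraparound handling omitted for brevity */
  memcpy(base + offset, vertices, vertex_bytes);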

Multiple users reported that when using Mesa with the r600 driver, performance was abysmal. It turned out that converting the glDrawElementsBaseVertex calls to glDrawArrays fixed the problem.

You can see the commit that fixed the problem here:

https://github.com/loganmc10/GLideN64/commit/9bcfa67d9550c7f1cd4ba72f657facd66a4d27e4
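The gist of the workaround (illustrative, not the literal diff): drop the byte index buffer and draw the sequentially written vertices directly.

  /* before: indexed draw with byte indices (emulated on r600) */
  glDrawElementsBaseVertex(GL_TRIANGLES, count, GL_UNSIGNED_BYTE,
                           indices, basevertex);

  /* after: de-indexed draw over the streamed vertex range */
  glDrawArrays(GL_TRIANGLES, first, count);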
Comment 1 Roland Scheidegger 2018-02-26 18:47:24 UTC
At a quick glance, I'd suspect the problem isn't really drawElementsBaseVertex per se, but the use of GL_UNSIGNED_BYTE indices, which the code seemed to use.
The hw doesn't support ubyte indices, so this has to be emulated by converting the elements to ushort (and if you don't start from zero, IIRC this can cause huge buffers to be allocated, especially if start is negative, but I don't remember the details).
I don't know if the driver could do better, but in general you're advised to not use ubyte indices (this is a gl only feature, d3d doesn't support it, so it's unsurprising some hw doesn't support it natively).
(Note that even early GCN does not support ubyte indices; they are only supported starting with Volcanic Islands.)
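For illustration, the emulation described here amounts to something like the following CPU-side translation (a rough sketch, not the actual r600 code):

  #include <stdint.h>

  /* widen GL_UNSIGNED_BYTE indices to GL_UNSIGNED_SHORT, since the hw
     only understands 16- and 32-bit index types */
  static void widen_ubyte_indices(const uint8_t *src, uint16_t *dst,
                                  int count)
  {
      for (int i = 0; i < count; i++)
          dst[i] = src[i];
  }

Doing a translation like this, plus the upload of the temporary index buffer, on every draw call would explain the slowdown.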
Comment 2 Logan McNaughton 2018-02-26 19:05:13 UTC
Interesting, is this documented somewhere? How do you know what data formats the HW supports? Is there any other hardware besides these cards that doesn't support it?

I'll have the users test using GL_UNSIGNED_SHORT
Comment 3 Logan McNaughton 2018-02-26 19:55:17 UTC
The user confirmed that using GL_UNSIGNED_SHORT fixed the issue, thanks. I would still be curious if there is some easy way to figure this out via documentation or querying the device somehow.
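The application-side fix is just wider index storage and the matching type, e.g. (illustrative):

  GLushort indices[] = { 0, 1, 2, 2, 1, 3 };   /* was GLubyte */
  glDrawElementsBaseVertex(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT,
                           indices, basevertex);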
Comment 4 Roland Scheidegger 2018-02-26 20:05:25 UTC
(In reply to Logan McNaughton from comment #2)
> Interesting, is this documented somewhere? How do you know what data formats
> the HW supports? Is there any other hardware besides these cards that doesn't
> support it?
> 
> I'll have the users test using GL_UNSIGNED_SHORT

I'm not sure if AMD (or ATI) had some perf guides published where this was stated. You can, however, see this from the published register guides - https://developer.amd.com/resources/developer-guides-manuals/
The guide lists VGT_INDEX_16 and VGT_INDEX_32 as the valid values for VGT_DMA_INDEX_TYPE (for Evergreen; other chips should be similar); interestingly, however, it's already a 2-bit field there.
(Actually the register guides only seem to go up to Sea Islands, so you can't even see that it is supported starting with VI.)
A look at the mesa driver is probably easier though :-). This should affect all amd/ati hw up to Sea Islands. Of course, if you use old-school non-vbo element data, it's probably no big deal.
At a very quick glance at the driver code, it looks like newer nvidia chips can handle it (everything from g80). nv30/nv40 cannot, but it looks like nv30 emulates index buffers anyway, so the only chips where this might make a difference are the nv40 family.
Looks like all intel chips can handle it fine.

That said, even if the slowdown is due to this, I'm not convinced the driver couldn't do better (though it does not look trivial to do better). But it's one of those features which no one really expects to get used, so it's considered good enough as long as it works correctly...

You can't query this AFAIK; you just have to know... Apparently using GL_UNSIGNED_BYTE indices was simply broken at some point with Windows drivers too - it's possible the blob implements a more sophisticated emulation strategy: https://community.amd.com/thread/159040
Comment 5 Roland Scheidegger 2018-02-27 00:41:23 UTC
FWIW, from a driver perspective this isn't really RESOLVED FIXED; I'd consider it a bug. Using GL_UNSIGNED_BYTE indices will never be the optimal path, but clearly the driver could do better, as evidenced by the driver for another OS. (I don't know what that driver does, but it could, for instance, maintain a shadow copy or translate the indices with a compute shader; the problem is likely that the driver currently translates the indices on the CPU for each draw call.)
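To make the shadow-copy idea concrete, a rough sketch (hypothetical names and structure; nothing like this is claimed to exist in the r600 driver, and allocation/lifetime management is omitted):

  #include <stdint.h>

  /* translate the byte indices once and reuse the ushort copy until
     the application modifies the source buffer */
  struct index_shadow {
      uint16_t *ushort_copy;   /* widened indices */
      unsigned  serial;        /* buffer content version at translation */
  };

  static const uint16_t *get_ushort_indices(struct index_shadow *s,
                                            const uint8_t *src, int count,
                                            unsigned buf_serial)
  {
      if (s->serial != buf_serial) {   /* stale, or never translated */
          for (int i = 0; i < count; i++)
              s->ushort_copy[i] = (uint16_t)src[i];
          s->serial = buf_serial;
      }
      return s->ushort_copy;
  }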
Comment 6 Logan McNaughton 2018-02-27 03:29:58 UTC
Fair enough. Even a warning when MESA_DEBUG is set would be good, just something to notify the developer of what is going wrong.
Comment 7 H4nN1baL 2018-03-11 07:56:55 UTC
Just for the record, so this does not fall into oblivion: this is related to another bug: https://bugs.freedesktop.org/show_bug.cgi?id=102204
Comment 8 GitLab Migration User 2019-09-18 19:25:18 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/631.
