Bug 99319

Summary: godot engine poor performance
Product: Mesa Reporter: Bronson <bronsonmathews>
Component: Drivers/Gallium/radeonsiAssignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: 17.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Bronson 2017-01-08 06:04:09 UTC
Ubuntu about:
Gallium 0.4 on AMD TAHITI (DRM 2.43.0 / 4.4.0-57-generic, LLVM 3.9.1)

lspci:
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]

grab godot engine & samples:
https://godotengine.org/download

run one of the 3d scenes... Notice the extremely bad performance.

There is a thread open on the godot github, link here, and ill link back to this bug.
https://github.com/godotengine/godot/issues/4799
Comment 1 Thomas Helland 2017-01-08 12:57:32 UTC
I did a small profiling session with my mesa-git RX460 setup. It appears we are spending almost all of our time in amdgpu_cs_context_cleanup. Haven't invested more time into debuging this yet, but thought I'd mention it in case somebody starts working on it.
Comment 2 Thomas Helland 2017-01-08 13:11:34 UTC
A small update: It appears we are spending all of the time walking in a for-loop at the bottom of context_cleanup:

for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) {
   cs->buffer_indices_hashlist[i] = -1;
}
Comment 3 Bas Nieuwenhuizen 2017-01-09 02:23:11 UTC
It seems a VBO with a stride of 14 gets used, the u_vbuf determines that is not 4 byte aligned and then translates it on the CPU to be 4-byte aligned. For the 3 vertex attributes each it maps the VBO read-only, for which radeonsi uses a staging buffer. As the copy to staging buffer needs to finish during the map we do an IB flush.

In the end we end up with 3 GFX IB flushes per draw call, which costs a lot of performance.

In the end I don't think the translation is even necessary, as all attributes are vectors of GL_BYTE and GL_HALF_FLOAT.

Two things to improve:
- don't use a staging buffer for small maps (or if we decide to do it to serve future maps, make sure we actually share the staging buffer)
- make u_vbuf smarter so that it doesn't translate ion this case.
Comment 4 Vladislav Egorov 2017-01-09 12:23:31 UTC
(In reply to Thomas Helland from comment #2)
> A small update: It appears we are spending all of the time walking in a
> for-loop at the bottom of context_cleanup:
> 
> for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) {
>    cs->buffer_indices_hashlist[i] = -1;
> }

Interesting fact that GCC with -O2 doesn't optimize loops like that to memcpy or memset, even if they are large. It needs additional flag -ftree-loop-distribute-patterns from -O3 or even -mtune options to do it. Proof: https://godbolt.org/g/VwbSqK
Comment 5 Bronson 2017-02-18 00:56:58 UTC
Ive also thrown up a demo showing the situation with some stuff from my current project.
There is a build here:
https://drive.google.com/open?id=0B_nQZvJoqbFmRUtYT3pnUm14Szg
It will load straight into a map, press "p" to bring up there performance dialog.

On my various amd card in the house i get between 2-12 fps.
On my integrated intel I get about 50-60fps.
So there is definitely some issue here with the amd cards.
Comment 6 i9i7soft 2017-02-25 03:13:31 UTC
Posting to say I have this problem on my RX 460. Mesa 17, I get good performance in other games but I get <10fps in the Godot platformer demo. Is there anything I can do to help test or debug this?
Comment 7 Bas Nieuwenhuizen 2017-02-25 11:01:17 UTC
I think this is actually fixed in Mesa git. I get 60 fps for the platformer demo now.
Comment 8 i9i7soft 2017-02-27 16:18:04 UTC
(In reply to Bas Nieuwenhuizen from comment #7)
> I think this is actually fixed in Mesa git. I get 60 fps for the platformer
> demo now.

If it's really fixed, that's good. It would be interesting to know what the change was.
Comment 9 Emil Velikov 2017-02-27 16:39:59 UTC
(In reply to i9i7soft from comment #8)

> It would be interesting to know what the change was.
git bisect is your friend ;-)

If we have good performance numbers and reasonably well isolated patch I won't object to having it in 17.0. See [1], and include the perf numbers as part of the nomination.

Thanks



[1] https://www.mesa3d.org/submittingpatches.html#nominations
Comment 10 Bas Nieuwenhuizen 2017-09-09 12:56:50 UTC
I think this was actually the rework Marek did to not use the gallium shared vbo format preprocessing, which caused this. IIRC that is not really small enough for backporting, besides 17.0 not getting releases anymore.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.