Bug 98239

Summary: saints row 3: performance is limited by flushes
Product: Mesa
Reporter: almos <aaalmosss>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED WORKSFORME
QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 77449    
Attachments: sr3 flush.png

Description almos 2016-10-13 21:59:44 UTC
Created attachment 127280: sr3 flush.png

Performance is quite poor while the GPU load is very low. The attached screenshot shows that GPU load is proportional to fps, and that both move inversely to the number of flushes. In typical scenes fps can barely reach 30 while GPU load is around 40% and CPU load is at 60% all the time. I turned off all special effects for this screenshot to show that this depends only on scene complexity. In fact, turning on all the effects doesn't affect performance much, because the GPU and the CPU are barely loaded.
Comment 1 Marek Olšák 2016-11-25 11:14:15 UTC
What's your GPU and kernel driver?
Comment 2 almos 2016-11-25 20:06:03 UTC
(In reply to Marek Olšák from comment #1)
> What's your GPU and kernel driver?

I have an R9 270x (Curaçao XT), the kernel driver is radeon. BTW the game runs noticeably smoother with Mesa 13 than with 12.
Comment 3 Marek Olšák 2016-11-26 13:51:39 UTC
Things to try:

1) You can try increasing the IB size in radeon_drm_cs.h:
struct radeon_cs_context {
    uint32_t                    buf[16 * 1024]; // HERE

2) Add buffer-wait-time to the HUD and see if it corresponds with the flushes.
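
For a sense of scale, the array in 1) holds 16 * 1024 32-bit words, i.e. 64 KiB. Below is a minimal sketch of that arithmetic, assuming the array length is simply multiplied by a factor. `IB_DWORDS` and `ib_buf_bytes` are hypothetical names for illustration only, not Mesa identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element count copied from the buf[] declaration quoted above. */
#define IB_DWORDS (16 * 1024)

/* Hypothetical helper, not part of Mesa: size in bytes of a uint32_t
 * array holding IB_DWORDS * scale elements. scale == 1 is the stock size. */
size_t ib_buf_bytes(unsigned scale)
{
    assert(scale >= 1); /* a zero-length buffer makes no sense here */
    return (size_t)IB_DWORDS * scale * sizeof(uint32_t);
}
```

With these assumptions, `ib_buf_bytes(1)` is 65536 bytes (64 KiB) and `ib_buf_bytes(4)` is 262144 bytes (256 KiB).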
Comment 4 almos 2016-12-09 19:08:12 UTC
(In reply to Marek Olšák from comment #3)
> Things to try:
> 
> 1) You can try increasing the IB size in radeon_drm_cs.h:
> struct radeon_cs_context {
>     uint32_t                    buf[16 * 1024]; // HERE
> 
> 2) Add buffer-wait-time to the HUD and see if it corresponds with the
> flushes.

I tried it with 4x larger buf. The flushes are reduced to 4-5 instead of around 15, but the performance and gpu-load remained mostly the same. It feels a bit smoother, and the fps seems more consistent, but I didn't compare it thoroughly. It seems the flush count is not a cause, but a symptom.

With 8x larger buf all textures are missing.

I also tried to monitor other data sources (e.g. dma), but nothing seems to be as correlated with the fps as the gpu-load is. The buffer-wait-time graph looks somewhat similar, but not always.

BTW sr3 produces other interesting things, for example when starting up I get 15fps in the main menu, but after loading a game, and exiting to the main menu I get 120fps.

I also checked other games for gpu-load, and here are the results (none of them are cpu-bound):
- furmark: 100% load, fps 105-115 sinusoid (its period is different from the rotation of the doughnut, might be worth checking this out)
- amnesia: solid 60fps regardless of the vsync setting, load is 20-30%
- heaven 4: the load is 80-100% perfectly correlated to fps
- quake wars: base fps is 30, jumping to higher numbers with high frequency (on windows I get rock solid 60fps), load is 20-30%
- doom 3: fps is usually 55-60, in some areas it drops to 40 (should be rock solid 60), while the load is 8-11% uncorrelated to fps
- tf2: fps has huge variance between 40-140, load is 30-60% correlated to fps
Comment 5 Bas Nieuwenhuizen 2016-12-09 19:11:54 UTC
With 15 flushes/frame, I think it is very unlikely you are flush limited. The flushes probably just correlate with number of draw commands (because we need more buffers to store more draws) and the draws themselves probably are the bottleneck.
Comment 6 Marek Olšák 2016-12-09 19:24:06 UTC
Doom 3 is limited by the CPU. OpenGL multithreading should help with that. With stock Mesa, there is no way to make Doom 3 faster.

When the GPU load is low, it means the app is CPU-bound. Multithreading is the only thing that can help with that.
Comment 7 almos 2016-12-10 00:44:34 UTC
(In reply to Marek Olšák from comment #6)
> Doom 3 is limited by the CPU. OpenGL multithreading should help with that.
> With stock Mesa, there is no way to make Doom 3 faster.
> 
> When the GPU load is low, it means the app is CPU-bound. Multithreading is
> the only thing that can help with that.

Hmm, you're right. By default all cores are used 30%, but if I start doom3 with taskset -c 0, it uses 100% of core 0, and I get 35-50fps at 6-8% gpu load. I didn't know tasks were being moved between CPU cores this much. And it seems that my CPU is slower than I thought.
Comment 8 Samuel Pitoiset 2017-07-26 20:45:20 UTC
After playing SR3 a bit on RX480 with mesa 17.1.5 and LLVM 4.0.1, it appears to perform quite well. I have 70-80 FPS almost everywhere in ultra and it's GPU-bound there. Okay it's a different configuration but I think this ticket should be closed.
Comment 9 almos 2017-07-26 21:31:37 UTC
Right, I forgot to close this.

The game is CPU-bound on my machine, because it's an eON wrapped version of a terrible PC port. I've read reports that it has weird performance problems on windows too.

I've also tried glthread, but it didn't help.
