Bug 95474

Summary: Bioshock Infinite and DiRT Showdown perform very poorly on any GPU with GCN >=1.1
Product: Mesa
Reporter: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: 0xe2.0x9a.0x9b
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: kcachegrind screenshot: _mesa_FenceSync

Description Jan Ziak (http://atom-symbol.net) 2016-05-18 19:19:38 UTC
See http://www.phoronix.com/scan.php?page=article&item=nv-amd-23ppw
Comment 1 Jan Ziak (http://atom-symbol.net) 2016-05-18 19:26:27 UTC
Does anybody know what the primary cause of the poor performance is?
Comment 2 Alex Deucher 2016-05-18 19:39:45 UTC
Can you identify what component caused the regression?  mesa?  llvm?  kernel?
Comment 3 Jan Ziak (http://atom-symbol.net) 2016-05-18 21:09:52 UTC
(In reply to Alex Deucher from comment #2)
> Can you identify what component caused the regression?  mesa?  llvm?  kernel?

That is a good question, but I do not know the answer.

Data:

Kernel: 4.6.0
Kernel module: radeon.ko

Resolution: 1920x1080
Game quality setting: Ultra

CPU: A10-7850K
CPU utilization: 120% (kernel-space is about 10% CPU)

GPU: R9 390
GPU utilization (radeontop): >>>> 15% <<<<
GPU performance level: forced max clocks

Based on this, it seems that the user-space component (mesa + llvm) is the bottleneck.
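
The arithmetic behind that conclusion can be sketched as follows. This is only an illustration: the function names and the 50% "GPU mostly idle" threshold are assumptions made for the example, and "about 10%" kernel-space is read here as 10 percentage points of the 120% total.

```python
def userspace_cpu_percent(total_cpu_percent, kernel_cpu_percent):
    """CPU time spent outside the kernel, in percent of one core."""
    return total_cpu_percent - kernel_cpu_percent


def likely_cpu_bound(gpu_busy_percent, gpu_idle_threshold=50.0):
    """Heuristic: a mostly idle GPU suggests the CPU side is the limit.

    The 50% threshold is an arbitrary assumption for illustration only.
    """
    return gpu_busy_percent < gpu_idle_threshold


if __name__ == "__main__":
    # Figures reported above: 120% CPU, about 10% in kernel-space, 15% GPU.
    user = userspace_cpu_percent(120.0, 10.0)
    print(f"user-space CPU: {user:.0f}% of one core")
    print(f"likely CPU-bound: {likely_cpu_bound(15.0)}")
```

With more than one full core busy in user-space and the GPU idle most of the time, the driver stack in user-space is the first place to look.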
Comment 4 Alex Deucher 2016-05-18 21:12:31 UTC
Can you try a different kernel or mesa version?
Comment 5 Jan Ziak (http://atom-symbol.net) 2016-05-18 21:13:05 UTC
(In reply to Alex Deucher from comment #2)
> Can you identify what component caused the regression?  mesa?  llvm?  kernel?

Output from "perf report":

# Samples: 660K of event 'cycles'
# Event count (approx.): 568704142352
#
# Overhead  Command        Shared Object      Symbol
     9.94%  G.26           radeonsi_dri.so    [.] pb_cache_is_buffer_compat
     2.51%  G.26           libgcc_s.so.1      [.] __umoddi3
     1.66%  G.26           libc-2.22.so       [.] _int_malloc
     1.52%  G.26           libc-2.22.so       [.] _int_free
     1.42%  G.26           radeonsi_dri.so    [.] radeon_drm_cs_add_buffer
     1.22%  bioshock.i386  radeonsi_dri.so    [.] radeon_cs_context_cleanup
     1.14%  bioshock.i386  [kernel.vmlinux]   [k] reservation_object_add_shared_fence
     1.02%  G.26           libc-2.22.so       [.] __libc_calloc
     0.93%  G.26           libpthread-2.22.so [.] pthread_mutex_lock
     0.81%  bioshock.i386  [kernel.vmlinux]   [k] __ww_mutex_lock_interruptible
     0.79%  G.26           bioshock.i386      [.] 0x00000000001a8853
     0.79%  G.26           radeonsi_dri.so    [.] pb_cache_reclaim_buffer
     0.78%  G.26           libpthread-2.22.so [.] __pthread_mutex_unlock_usercnt
     0.74%  G.26           libc-2.22.so       [.] malloc
     0.72%  bioshock.i386  radeonsi_dri.so    [.] radeon_drm_cs_emit_ioctl_oneshot
     0.66%  bioshock.i386  [radeon]           [k] radeon_bo_list_validate
     0.60%  G.26           radeonsi_dri.so    [.] __x86.get_pc_thunk.bx
     0.56%  G.26           bioshock.i386      [.] 0x00000000001a8e53
     0.56%  G.26           radeonsi_dri.so    [.] set_add
     0.55%  G.26           radeonsi_dri.so    [.] ir_expression::accept
     0.54%  bioshock.i386  [ttm]              [k] ttm_bo_list_ref_sub
     0.54%  G.26           radeonsi_dri.so    [.] _mesa_glsl_parse
     0.53%  G.26           radeonsi_dri.so    [.] radeon_lookup_buffer
     0.52%  G.26           libc-2.22.so       [.] __memcmp_sse4_2
     0.48%  G.26           radeonsi_dri.so    [.] visit_list_elements
     0.47%  bioshock.i386  [kernel.vmlinux]   [k] reservation_object_reserve_shared
     0.46%  G.26           radeonsi_dri.so    [.] si_reset_buffer_resources
     0.45%  G.26           radeonsi_dri.so    [.] hash_table_search
     0.45%  bioshock.i386  [drm]              [k] drm_gem_object_lookup
     0.42%  G.26           libc-2.22.so       [.] malloc_consolidate
     0.41%  bioshock.i386  [radeon]           [k] radeon_sync_fence
     0.41%  G.26           radeonsi_dri.so    [.] set_search
     0.40%  G.26           radeonsi_dri.so    [.] u_default_transfer_inline_write
     0.40%  bioshock.i386  [ttm]              [k] ttm_bo_add_to_lru
     0.38%  G.26           radeonsi_dri.so    [.] st_validate_state
     0.36%  bioshock.i386  [ttm]              [k] ttm_bo_del_from_lru
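
To see which component dominates, the per-symbol rows can be summed per shared object. The following is a minimal sketch, assuming the report has been saved as plain text in the column layout shown above; the function name is hypothetical and the embedded sample contains only the first five rows of the report.

```python
from collections import defaultdict


def overhead_by_object(report_text):
    """Sum the 'Overhead' column of `perf report` text per shared object."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        fields = line.split()
        # Data rows look like: "9.94%  G.26  radeonsi_dri.so  [.] symbol"
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue  # skip '#' header lines and blank lines
        totals[fields[2]] += float(fields[0].rstrip("%"))
    # Largest total first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# First five data rows of the report above, used as sample input.
SAMPLE = """\
# Overhead  Command        Shared Object      Symbol
     9.94%  G.26           radeonsi_dri.so    [.] pb_cache_is_buffer_compat
     2.51%  G.26           libgcc_s.so.1      [.] __umoddi3
     1.66%  G.26           libc-2.22.so       [.] _int_malloc
     1.52%  G.26           libc-2.22.so       [.] _int_free
     1.42%  G.26           radeonsi_dri.so    [.] radeon_drm_cs_add_buffer
"""

if __name__ == "__main__":
    for shared_object, percent in overhead_by_object(SAMPLE):
        print(f"{percent:6.2f}%  {shared_object}")
```

Running the same function over the full report would give a per-component view (radeonsi_dri.so, libc, kernel modules) instead of a per-symbol one.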
Comment 6 Jan Ziak (http://atom-symbol.net) 2016-05-18 21:14:57 UTC
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?

I will try tomorrow.
Comment 7 Jan Ziak (http://atom-symbol.net) 2016-05-19 17:36:27 UTC
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?

LLVM 3.8 + Mesa 11.2.2 -> same result
Comment 8 Jan Ziak (http://atom-symbol.net) 2016-05-20 13:22:16 UTC
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?

Kernel 4.4.6 radeon.ko + LLVM 3.8 + Mesa 11.2.2 -> same result
Comment 9 Jan Ziak (http://atom-symbol.net) 2016-05-24 20:21:36 UTC
Created attachment 124063
kcachegrind screenshot: _mesa_FenceSync

I ran callgrind on Bioshock with mesa-git.

Callgrind instrumentation was enabled only when the Bioshock benchmark was rendering frames from the game. The benchmark graphics quality was set to Medium.

The screenshot I am sending indicates that Bioshock expects a faster _mesa_FenceSync implementation.
Comment 10 Jan Ziak (http://atom-symbol.net) 2016-05-24 21:06:32 UTC
Also, the first part of the Bioshock benchmark causes Mesa to compile some shaders almost every frame.
Comment 11 Marek Olšák 2016-05-24 21:24:04 UTC
CPU profiling is usable only if you've built Mesa and LLVM with -fno-omit-frame-pointer.

This article suggests that the regression happened between 11.2 and master. Do you disagree with that?
http://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Padoka-May-Ubuntu-16&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Phoronix+%28Phoronix%29
Comment 12 Marek Olšák 2016-05-25 12:09:01 UTC
(In reply to Jan Ziak from comment #9)
> Created attachment 124063
> kcachegrind screenshot: _mesa_FenceSync
> 
> I ran callgrind on Bioshock with mesa-git.
> 
> Callgrind instrumentation was enabled only when the Bioshock benchmark was
> rendering frames from the game. The benchmark graphics quality was set to
> Medium.
> 
> The screenshot I am sending indicates that Bioshock expects a faster
> _mesa_FenceSync implementation.

I wouldn't trust callgrind, because it runs on a CPU emulator.

The proper way to profile this is to build Mesa with -fno-omit-frame-pointer and use sysprof, which is very easy to use. Sysprof can also save the results to disk.
Comment 13 Jan Ziak (http://atom-symbol.net) 2016-05-25 14:14:07 UTC
(In reply to Marek Olšák from comment #11)
> This article suggests that the regression happened between 11.2 and master.
> Do you disagree with that?
> http://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Padoka-May-Ubuntu-16&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Phoronix+%28Phoronix%29

I am unable to reproduce the results from the article on my machine with Mesa 11.2.0. Mesa 11.2.0 and mesa-git have similar performance on my machine.

The 37.35 FPS (Ultra quality setting) in the Phoronix article for Mesa 11.2.0 is highly unlikely. 

The Bioshock benchmark ignores the command-line option -ForceCompatLevel=N if "$HOME/.local/share/irrationalgames/bioshockinfinite" exists.
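
A quick check before benchmarking can be sketched like this. It only tests whether the path mentioned above exists; the function name and the optional `home` parameter are hypothetical, and whether the game reads anything else from that location is not covered here.

```python
import os


def compat_level_will_be_ignored(home=None):
    """Return True if the Bioshock Infinite settings path exists under `home`.

    Per the observation above, -ForceCompatLevel=N is ignored in that case.
    `home` defaults to the current user's home directory.
    """
    home = home or os.path.expanduser("~")
    path = os.path.join(
        home, ".local", "share", "irrationalgames", "bioshockinfinite"
    )
    return os.path.exists(path)


if __name__ == "__main__":
    if compat_level_will_be_ignored():
        print("settings path exists: -ForceCompatLevel=N will be ignored")
    else:
        print("settings path not found: -ForceCompatLevel=N should apply")
```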
Comment 14 Marek Olšák 2016-08-01 17:33:47 UTC
I've done some profiling.

Bioshock Infinite:
- the game is CPU-bound most of the time
- some small performance enhancements have landed already
- the FenceSync optimization is a work in progress; expect a 30% improvement
- most of the scratch buffer usage is for private memory, not VGPR spilling (this may be a defect in our indirect indexing)
- even without counting private memory usage, it's still in the top 2 of the worst VGPR spilling apps

DiRT Showdown:
- the game is GPU-bound
- there are a bunch of very slow pixel shaders using while loops; it's unclear how to make them faster
- most of the scratch buffer usage is for VGPR spilling
- it's in the top 2 of the worst VGPR spilling apps
Comment 15 Jan Ziak (http://atom-symbol.net) 2016-08-14 21:27:37 UTC
I am closing this issue and marking it as fixed.

Bioshock Infinite benchmark @1080p runs at 33 FPS on Ultra settings on A10-7850K+R9-390. R9-390 is a GCN 1.1 GPU.

Phoronix claims to have reached about 80 FPS on Ultra settings with GCN 1.2+ GPUs and a fast 4-core (8-thread) Skylake CPU:

http://phoronix.com/scan.php?page=news_item&px=RadeonSI-Mesa-Git-BioShock-Test

----

DiRT Showdown: It is better to have a separate freedesktop.org bug tracking DiRT Showdown performance issues than to postpone closing this bug.
