See http://www.phoronix.com/scan.php?page=article&item=nv-amd-23ppw
Does anybody know what the primary cause of the poor performance is?
Can you identify what component caused the regression? mesa? llvm? kernel?
(In reply to Alex Deucher from comment #2)
> Can you identify what component caused the regression? mesa? llvm? kernel?

That is a good question, but I do not know the answer.

Data:

Kernel: 4.6.0
Kernel module: radeon.ko
Resolution: 1920x1080
Game quality setting: Ultra
CPU: A10-7850K
CPU utilization: 120% (kernel-space is about 10% CPU)
GPU: R9 390
GPU utilization (radeontop): >>>> 15% <<<<
GPU performance level: forced max clocks

Based on this, it seems that the user-space component (mesa + llvm) is the bottleneck.
Can you try a different kernel or mesa version?
(In reply to Alex Deucher from comment #2)
> Can you identify what component caused the regression? mesa? llvm? kernel?

Output from "perf report":

# Samples: 660K of event 'cycles'
# Event count (approx.): 568704142352
#
# Overhead  Command        Shared Object       Symbol
     9.94%  G.26           radeonsi_dri.so     [.] pb_cache_is_buffer_compat
     2.51%  G.26           libgcc_s.so.1       [.] __umoddi3
     1.66%  G.26           libc-2.22.so        [.] _int_malloc
     1.52%  G.26           libc-2.22.so        [.] _int_free
     1.42%  G.26           radeonsi_dri.so     [.] radeon_drm_cs_add_buffer
     1.22%  bioshock.i386  radeonsi_dri.so     [.] radeon_cs_context_cleanup
     1.14%  bioshock.i386  [kernel.vmlinux]    [k] reservation_object_add_shared_fence
     1.02%  G.26           libc-2.22.so        [.] __libc_calloc
     0.93%  G.26           libpthread-2.22.so  [.] pthread_mutex_lock
     0.81%  bioshock.i386  [kernel.vmlinux]    [k] __ww_mutex_lock_interruptible
     0.79%  G.26           bioshock.i386       [.] 0x00000000001a8853
     0.79%  G.26           radeonsi_dri.so     [.] pb_cache_reclaim_buffer
     0.78%  G.26           libpthread-2.22.so  [.] __pthread_mutex_unlock_usercnt
     0.74%  G.26           libc-2.22.so        [.] malloc
     0.72%  bioshock.i386  radeonsi_dri.so     [.] radeon_drm_cs_emit_ioctl_oneshot
     0.66%  bioshock.i386  [radeon]            [k] radeon_bo_list_validate
     0.60%  G.26           radeonsi_dri.so     [.] __x86.get_pc_thunk.bx
     0.56%  G.26           bioshock.i386       [.] 0x00000000001a8e53
     0.56%  G.26           radeonsi_dri.so     [.] set_add
     0.55%  G.26           radeonsi_dri.so     [.] ir_expression::accept
     0.54%  bioshock.i386  [ttm]               [k] ttm_bo_list_ref_sub
     0.54%  G.26           radeonsi_dri.so     [.] _mesa_glsl_parse
     0.53%  G.26           radeonsi_dri.so     [.] radeon_lookup_buffer
     0.52%  G.26           libc-2.22.so        [.] __memcmp_sse4_2
     0.48%  G.26           radeonsi_dri.so     [.] visit_list_elements
     0.47%  bioshock.i386  [kernel.vmlinux]    [k] reservation_object_reserve_shared
     0.46%  G.26           radeonsi_dri.so     [.] si_reset_buffer_resources
     0.45%  G.26           radeonsi_dri.so     [.] hash_table_search
     0.45%  bioshock.i386  [drm]               [k] drm_gem_object_lookup
     0.42%  G.26           libc-2.22.so        [.] malloc_consolidate
     0.41%  bioshock.i386  [radeon]            [k] radeon_sync_fence
     0.41%  G.26           radeonsi_dri.so     [.] set_search
     0.40%  G.26           radeonsi_dri.so     [.] u_default_transfer_inline_write
     0.40%  bioshock.i386  [ttm]               [k] ttm_bo_add_to_lru
     0.38%  G.26           radeonsi_dri.so     [.] st_validate_state
     0.36%  bioshock.i386  [ttm]               [k] ttm_bo_del_from_lru
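A per-symbol listing like the perf report above can be easier to read once it is summed per shared object and per user/kernel space. The following is only a minimal sketch of that aggregation: the function name `summarize` and the `SAMPLE` constant are hypothetical, and `SAMPLE` holds just five rows copied from the report, not the full listing. It assumes the four-column layout shown above (overhead, command, shared object, `[.]` or `[k]`, symbol).

```python
from collections import defaultdict

# Five rows copied from the perf report above (not the complete listing).
SAMPLE = """\
# Overhead  Command        Shared Object       Symbol
     9.94%  G.26           radeonsi_dri.so     [.] pb_cache_is_buffer_compat
     2.51%  G.26           libgcc_s.so.1       [.] __umoddi3
     1.66%  G.26           libc-2.22.so        [.] _int_malloc
     1.14%  bioshock.i386  [kernel.vmlinux]    [k] reservation_object_add_shared_fence
     0.66%  bioshock.i386  [radeon]            [k] radeon_bo_list_validate
"""


def summarize(report):
    """Sum overhead percentages per shared object and per user/kernel space."""
    by_object = defaultdict(float)
    by_space = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Skip comment/header lines and anything that is not a data row.
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        pct = float(fields[0].rstrip("%"))
        by_object[fields[2]] += pct
        # perf marks kernel symbols with [k] and user-space symbols with [.].
        by_space["kernel" if fields[3] == "[k]" else "user"] += pct
    return dict(by_object), dict(by_space)


if __name__ == "__main__":
    objects, spaces = summarize(SAMPLE)
    for name, pct in sorted(objects.items(), key=lambda kv: -kv[1]):
        print(f"{pct:6.2f}%  {name}")
    print(spaces)
```

Feeding the full report text to `summarize` instead of `SAMPLE` would give totals for every listed symbol. Symbols below the cut-off of the listing are not included, so the sums are lower bounds.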
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?

I will try tomorrow.
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?

LLVM 3.8 + Mesa 11.2.2 -> same result
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?

Kernel 4.4.6 radeon.ko + LLVM 3.8 + Mesa 11.2.2 -> same result
Created attachment 124063
kcachegrind screenshot: _mesa_FenceSync

I ran callgrind on Bioshock with mesa-git.

Callgrind instrumentation was enabled only when the Bioshock benchmark was rendering frames from the game. The benchmark graphics quality was set to Medium.

The screenshot I am sending indicates that Bioshock expects a faster _mesa_FenceSync implementation.
Also, the first part of the Bioshock benchmark causes Mesa to compile some shaders almost every frame.
CPU profiling is usable only if you've built Mesa and LLVM with -fno-omit-frame-pointer.

This article suggests that the regression happened between 11.2 and master. Do you disagree with that?
http://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Padoka-May-Ubuntu-16&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Phoronix+%28Phoronix%29
(In reply to Jan Ziak from comment #9)
> Created attachment 124063
> kcachegrind screenshot: _mesa_FenceSync
>
> I ran callgrind on Bioshock with mesa-git.
>
> Callgrind instrumentation was enabled only when the Bioshock benchmark was
> rendering frames from the game. The benchmark graphics quality was set to
> Medium.
>
> The screenshot I am sending indicates that Bioshock expects a faster
> _mesa_FenceSync implementation.

I wouldn't trust callgrind, because it runs on a CPU emulator. The proper way to profile this is to build Mesa with -fno-omit-frame-pointer and use sysprof, which is very easy to use. Sysprof can also save the results to disk.
(In reply to Marek Olšák from comment #11)
> This article suggests that the regression happened between 11.2 and master.
> Do you disagree with that?
> http://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Padoka-May-Ubuntu-16&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Phoronix+%28Phoronix%29

I am unable to reproduce the results from the article on my machine with Mesa 11.2.0. Mesa 11.2.0 and mesa-git have similar performance on my machine.

The 37.35 FPS (Ultra quality setting) reported in the Phoronix article for Mesa 11.2.0 is highly unlikely.

The Bioshock benchmark ignores the command-line option -ForceCompatLevel=N if "$HOME/.local/share/irrationalgames/bioshockinfinite" exists.
I've done some profiling.

Bioshock Infinite:
- the game is CPU-bound most of the time
- some small performance enhancements have landed already
- the FenceSync optimization is a work in progress, expect a 30% improvement
- most of the scratch buffer usage is for private memory, not VGPR spilling (this may be a defect in our indirect indexing)
- if I'm not taking private memory usage into account, it's still in top 2 of the worst VGPR spilling apps

DiRT Showdown:
- the game is GPU-bound
- there are a bunch of very slow pixel shaders using while loops, it's unclear how to make them faster
- most of the scratch buffer usage is for VGPR spilling
- it's in top 2 of the worst VGPR spilling apps
I am closing this issue and marking it as fixed.

The Bioshock Infinite benchmark at 1080p runs at 33 FPS on Ultra settings on A10-7850K + R9 390. The R9 390 is a GCN 1.1 GPU.

Phoronix claims to have been able to reach about 80 FPS on Ultra settings with GCN 1.2+ GPUs and a fast 4-core (8-thread) Skylake CPU:
http://phoronix.com/scan.php?page=news_item&px=RadeonSI-Mesa-Git-BioShock-Test

----

DiRT Showdown: It is better to have a separate freedesktop.org bug tracking DiRT Showdown performance issues than to postpone closing this bug.