109055 – ~10% perf drop in Sascha Willems Vulkan Multithreading demo

Summary: ~10% perf drop in Sascha Willems Vulkan Multithreading demo

Status:	VERIFIED WONTFIX

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Vulkan/intel
Version:	git
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel 3D Bugs Mailing List
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:	regression

Depends on:
Blocks:	mesa-19.0

Reported:	2018-12-13 18:06 UTC by Eero Tamminen
Modified:	2019-03-01 08:47 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Description Eero Tamminen 2018-12-13 18:06:58 UTC

Setup:
* SKL or KBL device (don't have data from others)
* Ubuntu 18.04 / Unity
* drm-tip v4.19 or newer kernel
* git version of X with modifiers (dmabuf capable) enabled
* Mesa git version

Test-case:
* multithreading --fullscreen --benchmark --benchwarmup 3 --benchruntime 20

Result:
* FPS drops by 5-10%

Between the following Mesa commits:
41c8f991379d1a 2018-11-12 18:28:04 util: Fix warning in u_cpu_detect on non-x86
25b48e3df93dee 2018-11-14 12:12:09 st/xa: Bump minor

Along with the perf drop, both CPU and GPU power usage drop (according to RAPL), and the GPU spends 30-40% of the time in RC6 instead of 0%.

I.e. there's a Mesa change that makes this GPU test CPU bound.
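The RC6 and power figures above can be reproduced by sampling kernel counters while the benchmark runs. The following is only a minimal sketch, not the tooling used for this report: the sysfs paths (card0, the intel-rapl:0 package zone) and all function names are assumptions, they vary between kernels and machines, and reading energy_uj may require root.

```python
import time
from pathlib import Path

# Assumed sysfs locations; card index and RAPL zone differ between systems.
RC6_MS = Path("/sys/class/drm/card0/power/rc6_residency_ms")
RAPL_UJ = Path("/sys/class/powercap/intel-rapl:0/energy_uj")


def rc6_percent(rc6_ms_start, rc6_ms_end, elapsed_s):
    """Share of the interval the GPU spent in RC6, in percent."""
    return 100.0 * (rc6_ms_end - rc6_ms_start) / (elapsed_s * 1000.0)


def average_watts(energy_uj_start, energy_uj_end, elapsed_s):
    """Average power over the interval (counter wraparound is ignored)."""
    return (energy_uj_end - energy_uj_start) / 1e6 / elapsed_s


def read_counter(path):
    """Read a single integer counter from a sysfs file."""
    return int(path.read_text().strip())


def sample(duration_s=20.0):
    """Sample both counters around a sleep; call this while the test is running."""
    t0 = time.monotonic()
    rc6_0, uj_0 = read_counter(RC6_MS), read_counter(RAPL_UJ)
    time.sleep(duration_s)
    rc6_1, uj_1 = read_counter(RC6_MS), read_counter(RAPL_UJ)
    elapsed = time.monotonic() - t0
    return rc6_percent(rc6_0, rc6_1, elapsed), average_watts(uj_0, uj_1, elapsed)
```

A GPU-bound run should report RC6 residency close to 0%. A clearly non-zero value during the benchmark means the GPU is idling while it waits for the CPU.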

I didn't see any significant perf changes in other benchmarks (Vulkan or GL).

Comment 1 Eero Tamminen 2019-01-11 15:32:20 UTC

Something changed between the following commits:
* 2018-12-31 19:52:08 8c93ef5de9: radv: Do a cache flush if needed before reading predicates.
* 2019-01-02 18:09:04 7d6babf995: nir: add a way to print the deref chain

That changed the multithreading test perf:
* Fixed half of the KBL-i7 GT2 regression
* Fixed half of the combined perf regression on SKL-i5 GT2, i.e. the regression in the interval indicated above plus another (equally large) regression at the end of November
* Improved perf a lot on BDW-i3 GT2, much more than the small regression

GPU still spends significant time in RC6 (~20% on BDW-i3, 35-40% on SKL-i5, 45% on KBL-i7).  CPU vs GPU power usage changes differ between platforms, but they're fairly small.  All in all, pretty nice with clearly increased perf.

Comment 2 Eero Tamminen 2019-02-06 11:58:02 UTC

In more detail:
* BDW-i3 GT2 (256KB, 3MB LLC), BSW and BXT didn't regress originally, and clearly improved between the commits indicated in the comment above
* There was no improvement on HSW-i7 GT2 (1024KB, 8MB LLC), SKL-i7 GT4e (1024KB, 6MB LLC) or KBL-i7 GT3e (512KB, 4MB LLC), and I don't have data on whether they regressed originally
* The original regression was large on SKL-i5 GT2 (1024KB, 6MB LLC), SKL-i5 GT3e (512KB, 4MB LLC) and KBL-i7 GT2 (512KB, 4MB LLC).  SKL GT3e didn't improve, the others did

Current status for the platforms on which I still have data showing that they had clearly regressed originally:
* SKL & KBL GT2: 10-15% behind original perf
* SKL GT3e: ~20% behind original perf

There's some possible confusion here, because our build server and other things changed at the same time as the v4.20-rc kernel STIBP mitigations landed.  If somebody can confirm the regressions, that would be nice.

I'm fine with a WONTFIX or WORKSFORME resolution though, Vulkan multithreading isn't a very interesting use case (at least not yet).

Comment 3 Mark Janes 2019-02-28 17:15:28 UTC

The i915 team has agreed that this is WONTFIX

Comment 4 Eero Tamminen 2019-03-01 08:47:16 UTC

verified.
