Created attachment 107100
dmesg.log

Environment:
-----------------------------------
Platform: IVB/BYT/HSW
Libdrm: (master)libdrm-2.4.58
Mesa: (master)40aabc0e809fa7523606f15c053e0d6ac01d9b9e
Xserver: (master)xorg-server-1.16.0-335-gcc59be38b7eff52a1d003b390f2994c73ee0b3e9
Xf86_video_intel: (master)2.99.916-70-gac492b9af99919d7c579ee4dd636ef6aab90c945
Cairo: (master)fbb0a260b707cb5f02a14cc368c6f2f0d63564c3
Libva: (master)5faa5f50382af6d2f58ba07bbc64d2e9e63abad9
Libva_intel_driver: (master)925c98afcd381e52b37eb3870c3c80ff9c59a069
Kernel: (drm-intel-nightly)7101d84020f63f1da7e0c5d021cdd6be4d515de5

Bug detailed description:
---------------------------------------------
Synmark2_DrvCtx performance is reduced by 50%~80% on IVB/BYT-M/HSW. The problem exists both on gnome-session and on raw X. It is a kernel (drm-intel-next-queued) regression; the bisect result shows that the first bad commit is:

commit 1ed26b0b84dab119de93723ad646229db748842d
Author:     Michel Thierry <michel.thierry@intel.com>
AuthorDate: Fri Sep 5 14:13:16 2014 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Mon Sep 8 09:42:19 2014 +0200

    drm/i915: Enable full PPGTT on gen7

BTW, this issue also exists on the drm-intel-nightly branch. Please see dmesg.log for details.

Reproduce steps:
---------------------------------------------
1. xinit&
2. ./synmark2 OglDrvCtx
To clarify why QA reported this bug late: another bug was already open, Bug 82753 - [IVB/BYT-M/HSW Bisected]Synmark2_V6.0(OglDrvCtx) performance reduced ~15%, which masked this regression, so QA did not bisect the bad commit in a timely manner.
At first glance, it would just be the cost of allocating so many contexts and ppgtt that we are causing eviction storms.
So what is happening is that we are effectively leaking contexts, forcing us to evict and stall. git://people.freedesktop.org/~ickle/linux-2.6 requests should contain the fix (in http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=requests&id=4a9a7642673ec3365ce6de7fb91238ab996ac1f0) which should get full-ppgtt to within about 10% of aliasing-pgtt in OglDrvCtx. (Since we allocate and clear the entire PDE upon context creation, there will be measurable extra overhead.)
(In reply to Chris Wilson from comment #3)
> So what is happening is that we are effectively leaking contexts, forcing us
> to evict and stall. git://people.freedesktop.org/~ickle/linux-2.6 requests
> should contain the fix (in
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=requests&id=4a9a7642673ec3365ce6de7fb91238ab996ac1f0)
> which should get full-ppgtt to within about 10% of aliasing-pgtt in
> OglDrvCtx. (Since we allocate and clear the entire PDE upon context
> creation, there will be measurable extra overhead.)

This issue has been fixed with this patch.
I checked this on BYT & HSW GT3e with yesterday's 3D stack (while tracking another issue related to context re-creation). While there was some drop on BYT, the drop on GT3e was huge: at worst, performance was >100x worse than with the older kernel version.
Looking at the numbers, this seems to have been fixed on the 23rd? If this is the case, please mark the bug fixed, so that it can be verified.
(In reply to Eero Tamminen from comment #6)
> Looking at the numbers, this seems to have been fixed on 23rd? If this is
> the case, please mark bug fixed, so that it can be verified.

Nope. The bug is a transient leak of the vm space and remains unfixed. Given sufficient stress, all that is required is for a client to open/close many contexts, and the aperture will be filled with inactive contexts until we are forced to evict all the inactive contexts to make space for a new one. Whether or not it impacts synmark, we can and should write an igt to try and demonstrate the DoS.
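The open/close pattern described above can be illustrated with a toy model. This is only a sketch and not i915 code: `ToyAperture`, `churn`, the slot count and the "evict everything inactive" policy are all hypothetical stand-ins chosen to mirror the description, and real address-space accounting is far more involved.

```python
from collections import OrderedDict


class ToyAperture:
    """Hypothetical model: a fixed number of slots shared by all contexts."""

    def __init__(self, slots):
        self.slots = slots
        self.resident = OrderedDict()  # ctx_id -> True if still active
        self.stalls = 0      # creations that had to evict before proceeding
        self.evictions = 0   # total contexts thrown out

    def create(self, ctx_id):
        if len(self.resident) >= self.slots:
            # No room left: evict every inactive context in one go.
            inactive = [c for c, active in self.resident.items() if not active]
            if not inactive:
                raise MemoryError("aperture full of active contexts")
            for c in inactive:
                del self.resident[c]
            self.evictions += len(inactive)
            self.stalls += 1
        self.resident[ctx_id] = True

    def close(self, ctx_id, leak):
        if leak:
            # Assumed buggy behaviour: closed context stays resident, inactive.
            self.resident[ctx_id] = False
        else:
            # Assumed ideal behaviour: space is released at close time.
            del self.resident[ctx_id]


def churn(n, slots, leak):
    """Open and immediately close n contexts; return (stalls, evictions)."""
    ap = ToyAperture(slots)
    for i in range(n):
        ap.create(i)
        ap.close(i, leak)
    return ap.stalls, ap.evictions
```

In this model, churning contexts with `leak=True` periodically hits a full aperture and has to evict in bulk, while `leak=False` never stalls. An igt test along the lines suggested above would exercise the same pattern against the real driver.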
By "transient", do you mean that the leakage has some trigger condition when it starts, and that it doesn't happen before that? I don't think there's been any change in how things are tested, and performance numbers both in our own testing and at QA have been at about [1] the same level as before for a few days.

[1] This test has very large variance, like other CPU / scheduling dependent tests, and can be affected by "unrelated" changes elsewhere than in 3D code, so it's hard to say whether it's exactly at the former level.
(In reply to Eero Tamminen from comment #8)
> Do you by "transient" mean that leakage has some trigger condition when it
> starts, and that it doesn't happen before that?

Yes, we [the kernel] acquire an extra reference to the ppgtt when it is used on the GPU, and under the test conditions this reference is not expected to be automatically released until the process exits. There are a few external factors that may trigger the release earlier, but the test demonstrates that we do hold onto a reference to an inactive and closed ppgtt, causing additional resource pressure.
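The reference-counting behaviour described above can be sketched as follows. This is a hypothetical illustration rather than the kernel's data structures: `ToyPpgtt`, `run_once_then_close` and the `release_on_idle` switch are invented names, and the "fixed" path simply assumes the extra reference is dropped once the GPU is done with the address space.

```python
class ToyPpgtt:
    """Hypothetical refcounted address space; freed when refs reach zero."""

    def __init__(self):
        self.refs = 1       # reference owned by the context that created it
        self.freed = False

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


def run_once_then_close(release_on_idle):
    """Use a ppgtt on the 'GPU' once, then close the owning context."""
    vm = ToyPpgtt()
    vm.get()                 # extra reference taken when the vm is used on the GPU
    # ... work completes, the vm is now inactive ...
    if release_on_idle:
        vm.put()             # assumed fix: drop the extra reference when idle
    vm.put()                 # client closes the context
    return vm


leaky = run_once_then_close(release_on_idle=False)
fixed = run_once_then_close(release_on_idle=True)
```

Here `leaky` ends up closed and inactive but still holding one reference, so it is never freed, whereas `fixed` is freed as soon as the context is closed.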
(In reply to Eero Tamminen from comment #8)
> I don't think there's been any change in how things are tested, and
> performance numbers both in our own testing and at QA have been for few days
> about [1] at same level as before.

What's changed is that full-ppgtt should be disabled again by default.
(In reply to Chris Wilson from comment #10)
> (In reply to Eero Tamminen from comment #8)
> > I don't think there's been any change in how things are tested, and
> > performance numbers both in our own testing and at QA have been for few days
> > about [1] at same level as before.

QA bisected it to this commit:
-----------------------------------------------
commit c5cb5e3bf6f015e38b454c9f7a0db7fd8e9def56
Author:     Daniel Vetter <daniel.vetter@ffwll.ch>
AuthorDate: Wed Oct 22 11:18:51 2014 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Oct 22 11:18:51 2014 +0200

    Revert "drm/i915: Enable full PPGTT on gen7"
-----------------------------------------------

> What's changed is that full-ppgtt should be disabled again by default.

Ok, so while the feature itself is still buggy, this bug is fixed (by disabling the buggy feature). Marking it as such.
Closing this bug, as Synmark2_DrvCtx performance increased again after full PPGTT was disabled on Gen7.
Closing old verified.