Bug 84501 - [IVB/BYT/HSW Bisected]Synmark2_DrvCtx performance reduced 50%~80% (PPGTT)
Summary: [IVB/BYT/HSW Bisected]Synmark2_DrvCtx performance reduced 50%~80% (PPGTT)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 82753
Reported: 2014-09-30 07:12 UTC by lili
Modified: 2017-10-06 14:35 UTC
CC List: 5 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg.log (54.54 KB, text/plain)
2014-09-30 07:12 UTC, lili

Description lili 2014-09-30 07:12:26 UTC
Created attachment 107100
dmesg.log

Environment:
-----------------------------------
Platform:IVB/BYT/HSW
Libdrm:(master)libdrm-2.4.58
Mesa:(master)40aabc0e809fa7523606f15c053e0d6ac01d9b9e
Xserver:(master)xorg-server-1.16.0-335-gcc59be38b7eff52a1d003b390f2994c73ee0b3e9
Xf86_video_intel:(master)2.99.916-70-gac492b9af99919d7c579ee4dd636ef6aab90c945
Cairo:(master)fbb0a260b707cb5f02a14cc368c6f2f0d63564c3
Libva:(master)5faa5f50382af6d2f58ba07bbc64d2e9e63abad9
Libva_intel_driver:(master)925c98afcd381e52b37eb3870c3c80ff9c59a069
Kernel:(drm-intel-nightly)7101d84020f63f1da7e0c5d021cdd6be4d515de5

Bug detailed description:
---------------------------------------------
Synmark2_DrvCtx performance dropped by 50%~80% on IVB/BYT-M/HSW.
This problem exists both under gnome-session and under raw X.
It is a kernel (drm-intel-next-queued) regression; bisection shows the first bad commit is:
commit 1ed26b0b84dab119de93723ad646229db748842d
Author:     Michel Thierry <michel.thierry@intel.com>
AuthorDate: Fri Sep 5 14:13:16 2014 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Mon Sep 8 09:42:19 2014 +0200

    drm/i915: Enable full PPGTT on gen7

BTW, this issue also exists on the drm-intel-nightly branch.
Please see dmesg.log for details.

Reproduce steps:
---------------------------------------------
1.	xinit&
2.	./synmark2 OglDrvCtx
Comment 1 wendy.wang 2014-09-30 09:41:18 UTC
To clarify why QA reported this bug late:
another bug was already open, Bug 82753 - [IVB/BYT-M/HSW Bisected]Synmark2_V6.0(OglDrvCtx) performance reduced ~15%,
which confused QA, so we did not bisect the bad commit in time.
Comment 2 Chris Wilson 2014-09-30 20:31:25 UTC
At first glance, it would just be the cost of allocating so many contexts and ppgtts that we are causing eviction storms.
Comment 3 Chris Wilson 2014-10-01 08:17:45 UTC
So what is happening is that we are effectively leaking contexts, forcing us to evict and stall.

git://people.freedesktop.org/~ickle/linux-2.6 requests

should contain the fix (in http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=requests&id=4a9a7642673ec3365ce6de7fb91238ab996ac1f0) which should get full-ppgtt to within about 10% of aliasing-pgtt in OglDrvCtx. (Since we allocate and clear the entire PDE upon context creation, there will be measurable extra overhead.)
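As a rough illustration of the per-context creation cost mentioned above, the sketch below computes how much page-table memory one gen7 full-PPGTT context would have to allocate and clear. The constants (512 page-directory entries, 1024 four-byte PTEs per page table, 4 KiB pages) are assumptions based on the usual gen6/gen7 PPGTT layout, not figures stated in this report, and `ppgtt_creation_cost` is a hypothetical helper.

```python
# Assumed gen6/gen7 PPGTT layout (not stated in this bug report).
PAGE_SIZE = 4096        # bytes per GPU page
PDE_COUNT = 512         # page-directory entries per PPGTT
PTES_PER_TABLE = 1024   # PTEs in each page table
PTE_SIZE = 4            # bytes per PTE


def ppgtt_creation_cost(pde_count: int = PDE_COUNT) -> tuple[int, int, int]:
    """Return (address_space_bytes, page_table_bytes, ptes_to_clear) for one context."""
    ptes = pde_count * PTES_PER_TABLE
    address_space = ptes * PAGE_SIZE      # bytes of GPU VA covered by the PPGTT
    page_table_bytes = ptes * PTE_SIZE    # page-table memory allocated up front
    return address_space, page_table_bytes, ptes


if __name__ == "__main__":
    space, tables, ptes = ppgtt_creation_cost()
    print(f"address space: {space // 2**30} GiB")
    print(f"page tables allocated and cleared: {tables // 2**20} MiB ({ptes} PTEs)")
```

Under these assumptions every context creation touches about 2 MiB of page tables (524288 PTEs), so a benchmark that creates and destroys contexts in a loop, like OglDrvCtx, would still pay a measurable cost even once the leak is fixed.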
Comment 4 lili 2014-10-09 07:01:05 UTC
(In reply to Chris Wilson from comment #3)
> So what is happening is that we are effectively leaking contexts, forcing us
> to evict and stall.
>
> git://people.freedesktop.org/~ickle/linux-2.6 requests
>
> should contain the fix (in
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/
> ?h=requests&id=4a9a7642673ec3365ce6de7fb91238ab996ac1f0) which should get
> full-ppgtt to within about 10% of aliasing-pgtt in OglDrvCtx. (Since we
> allocate and clear the entire PDE upon context creation, there will be
> measurable extra overhead.)

This issue has been fixed with this patch.
Comment 5 Eero Tamminen 2014-10-15 12:37:54 UTC
I checked this on BYT & HSW GT3e with yesterday's 3D stack (while tracking another issue related to context re-creation). While there was some drop on BYT, the drop on GT3e was huge: at worst, performance was more than 100x worse than with an older kernel version.
Comment 6 Eero Tamminen 2014-10-27 09:26:11 UTC
Looking at the numbers, this seems to have been fixed on the 23rd? If that is the case, please mark the bug fixed so that it can be verified.
Comment 7 Chris Wilson 2014-10-27 20:54:57 UTC
(In reply to Eero Tamminen from comment #6)
> Looking at the numbers, this seems to have been fixed on 23rd?  If this is
> the case, please mark bug fixed, so that it can be verified.

Nope. The bug is a transient leak of the vm space and remains unfixed. Given sufficient stress, all that is required is for a client to open/close many contexts, and the aperture will be filled with inactive contexts until we are forced to evict all the inactive contexts to make space for a new one. Whether or not it impacts synmark, we can and should write an igt to try and demonstrate the DoS.
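To make the DoS scenario concrete, the toy model below simulates a client that repeatedly creates and destroys contexts against a fixed-size aperture. It is a hypothetical sketch, not i915 code: `Aperture` and `churn` are invented names, and the one-slot-per-context accounting and the "evict every inactive context when full" policy are simplifying assumptions that mirror the description above. A real IGT test would instead drive the i915 context create/destroy ioctls on hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Aperture:
    """Toy aperture where each bound context occupies one slot (hypothetical model)."""
    capacity: int
    leak_closed: bool                           # True: closed contexts stay bound (the leak)
    bound: dict = field(default_factory=dict)   # ctx_id -> is_open
    evict_all_events: int = 0

    def create(self, ctx_id: int) -> None:
        if len(self.bound) >= self.capacity:
            # No free slot: stall and evict every inactive (closed) context at once.
            self.evict_all_events += 1
            self.bound = {c: is_open for c, is_open in self.bound.items() if is_open}
            if len(self.bound) >= self.capacity:
                raise MemoryError("aperture full of open contexts")
        self.bound[ctx_id] = True

    def destroy(self, ctx_id: int) -> None:
        if self.leak_closed:
            self.bound[ctx_id] = False   # closed, but its slot is still occupied
        else:
            del self.bound[ctx_id]       # slot released immediately on close


def churn(capacity: int, iterations: int, leak_closed: bool) -> int:
    """Create and destroy `iterations` contexts; return the number of evict-all stalls."""
    ap = Aperture(capacity, leak_closed)
    for ctx_id in range(iterations):
        ap.create(ctx_id)
        ap.destroy(ctx_id)
    return ap.evict_all_events


if __name__ == "__main__":
    for leak in (True, False):
        print(f"leak_closed={leak}: {churn(64, 1000, leak)} evict-all stalls")
```

With a 64-slot aperture and 1000 create/destroy cycles, this prints 15 stalls when closed contexts stay bound and 0 when they are released on close. The all-or-nothing eviction is chosen to match "forced to evict all the inactive contexts to make space for a new one": with the leak, the stall count grows with iterations / capacity regardless of how few contexts are actually open.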
Comment 8 Eero Tamminen 2014-10-28 14:16:43 UTC
By "transient", do you mean that the leak has some trigger condition that starts it, and that it doesn't happen before that?

I don't think there's been any change in how things are tested, and for the past few days performance numbers, both in our own testing and at QA, have been at about [1] the same level as before.

[1] This test has very large variance, like other CPU/scheduling-dependent tests, and can be affected by "unrelated" changes outside the 3D code, so it's hard to say whether it's exactly at the former level.
Comment 9 Chris Wilson 2014-10-28 16:42:44 UTC
(In reply to Eero Tamminen from comment #8)
> Do you by "transient" mean that leakage has some trigger condition when it
> starts, and that it doesn't happen before that?

Yes, we [the kernel] acquire an extra reference to the ppgtt when it is used on the GPU, and under the test conditions this reference is not expected to be released automatically until the process exits. There are a few external factors that may trigger the release earlier, but the test demonstrates that we do hold onto a reference to an inactive and closed ppgtt, causing additional resource pressure.
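The reference lifetime described here can be sketched as a small refcounting model. This is a hypothetical illustration, not the i915 implementation: `Ppgtt`, `Process`, and `lifecycle` are invented names, and the `release_on_retire=True` variant (dropping the GPU-use reference once the work retires) is an assumed alternative for comparison, not necessarily how the fix in the `requests` branch works.

```python
class Ppgtt:
    """Hypothetical refcounted PPGTT; freed when the last reference is dropped."""

    def __init__(self) -> None:
        self.refs = 1        # reference owned by the user's context handle
        self.freed = False

    def get(self) -> None:
        self.refs += 1

    def put(self) -> None:
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


class Process:
    """Hypothetical client that takes an extra VM reference for each GPU submission."""

    def __init__(self, release_on_retire: bool) -> None:
        self.release_on_retire = release_on_retire
        self.held: list[Ppgtt] = []   # references kept alive until process exit

    def submit(self, vm: Ppgtt) -> None:
        vm.get()                      # extra reference taken when the VM is used on the GPU
        if self.release_on_retire:
            vm.put()                  # assumed alternative: dropped once the work retires
        else:
            self.held.append(vm)      # leaky behaviour: only dropped at exit

    def exit(self) -> None:
        for vm in self.held:
            vm.put()
        self.held.clear()


def lifecycle(release_on_retire: bool) -> tuple[bool, bool]:
    """Return (freed right after the context is closed, freed after process exit)."""
    proc = Process(release_on_retire)
    vm = Ppgtt()
    proc.submit(vm)
    vm.put()                          # client closes the context
    freed_after_close = vm.freed
    proc.exit()
    return freed_after_close, vm.freed


if __name__ == "__main__":
    print("leaky:", lifecycle(False))   # (False, True)
    print("fixed:", lifecycle(True))    # (True, True)
```

In the leaky variant the closed PPGTT survives until process exit, which is exactly the inactive-but-closed VM that keeps occupying resources while the client continues to create new contexts.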
Comment 10 Chris Wilson 2014-10-28 17:02:39 UTC
(In reply to Eero Tamminen from comment #8)
> I don't think there's been any change in how things are tested, and
> performance numbers both in our own testing and at QA have been for few days
> about [1] at same level as before.

What's changed is that full-ppgtt should be disabled again by default.
Comment 11 Eero Tamminen 2014-10-29 13:12:16 UTC
(In reply to Chris Wilson from comment #10)
> (In reply to Eero Tamminen from comment #8)
> > I don't think there's been any change in how things are tested, and
> > performance numbers both in our own testing and at QA have been for few days
> > about [1] at same level as before.

QA bisected it to this commit:
-----------------------------------------------
commit c5cb5e3bf6f015e38b454c9f7a0db7fd8e9def56
Author:     Daniel Vetter <daniel.vetter@ffwll.ch>
AuthorDate: Wed Oct 22 11:18:51 2014 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Oct 22 11:18:51 2014 +0200

    Revert "drm/i915: Enable full PPGTT on gen7"
-----------------------------------------------


> What's changed is that full-ppgtt should be disabled again by default.

OK, so while the feature itself is still buggy, this bug is fixed (by disabling the buggy feature). Marking it as such.
Comment 12 wendy.wang 2014-11-17 01:21:14 UTC
Closing this bug, as Synmark2_DrvCtx performance increased after Full PPGTT was disabled on Gen7.
Comment 13 Elizabeth 2017-10-06 14:35:07 UTC
Closing old verified.

