67062 – [HSW ULT bisected] Glbenchmark2.7.0 EgyptHDoffscreen performance reduce by 10%

Summary: [HSW ULT bisected] Glbenchmark2.7.0 EgyptHDoffscreen performance reduce by 10%

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	All Linux (All)

Importance:	high major
Assignee:	Ben Widawsky
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-07-19 05:47 UTC by ye.tian
Modified:	2015-05-20 01:27 UTC
CC List:	2 users

See Also:	69870
i915 platform:
i915 features:

Attachments
dmesg info (9.16 KB, text/plain) 2013-07-19 05:47 UTC, ye.tian
Xorg.0.log (22.48 KB, text/plain) 2013-07-19 05:48 UTC, ye.tian
try eLLC default age of 3 (1.29 KB, patch) 2013-07-26 21:28 UTC, Ben Widawsky
try LLC default age of 3 (1.38 KB, patch) 2013-07-26 21:28 UTC, Ben Widawsky

Description ye.tian 2013-07-19 05:47:03 UTC

System Environment:       
----------------------------------------------
Platform: HSW ULT 
Libdrm:	(master)libdrm-2.4.46-2-gfea5408098c3c3057958e85ea9d7146f0b08749e
Mesa:	(master)148f0deb065d8b64e15f951063fac40650ac257a
Xserver:	(master)xorg-server-1.14.99.1-137-g74469895e39fa38337f59edd64c4031ab9bb51d8
Xf86_video_intel:(master)2.21.12-12-g6c8b15d321044d4a81cb187cc5e1ac094eb82367
Cairo:	(master)03c81d414d4edb710c91f96ddb7dbf73e5432583
Libva:	(staging)6ba83cd306629e7579912627edab7a86d8c9ae1c
Libva_intel_driver: (staging)1caf179b1425b13cacaa421c688c6df8369668c6
Kernel:	(drm-intel-next-queued)8f588cfc349bbbd8ae62a13679b9efba41645064

Bug detailed description:
----------------------------------------------
Glbenchmark2.7.0 EgyptHDoffscreen performance drops by about 10% on HSW ULT.
The problem doesn't exist on HSW desktop or IVB. Please see the attached dmesg and Xorg.0.log.
It's a kernel (drm-intel-next-queued) regression. Bisecting shows that 0d8ff15e9 is the first bad commit.

commit 0d8ff15e9a15f2b393e53337a107b7a1e5919b6d
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Thu Jul 4 11:02:03 2013 -0700

drm/i915/hsw: Set correct Haswell PTE encodings.

Performance
----------------------------------------------
                                             50b44a4   0d8ff1
EgyptHD_C24Z16_FixedTime_Offscreen              96       86 
TRex_C24Z16_FixedTimeStep_Offscreen             56       50
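
As a quick check of the "10%" figure, the relative change between the two kernel commits can be worked out from the scores in the table above. This is only a minimal sketch: the names `pct_change` and `print_regressions` are hypothetical and do not come from GLBenchmark or any tool used in this report.

```c
#include <stdio.h>

/* Hypothetical helper: percentage change from a baseline score to a new score.
 * A negative result means the new score is lower (a regression). */
double pct_change(double baseline, double current)
{
    return (current - baseline) / baseline * 100.0;
}

/* Print the change for each benchmark, using the scores reported above
 * (50b44a4 as the baseline, 0d8ff1 as the new kernel). */
void print_regressions(void)
{
    printf("EgyptHD: %.1f%%\n", pct_change(96.0, 86.0));
    printf("TRex:    %.1f%%\n", pct_change(56.0, 50.0));
}
```

Calling `print_regressions()` reports roughly -10.4% for EgyptHD and -10.7% for TRex, which matches the "about 10%" in the summary.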

Reproduce steps:
---------------------------------------------
1, xinit&
2, vblank_mode=0 ./GLBenchmark -data ../../data/  -w 1920 -h 1080 -t 2701101

Comment 1 ye.tian 2013-07-19 05:47:56 UTC

Created attachment 82652
dmesg info

Comment 2 ye.tian 2013-07-19 05:48:24 UTC

Created attachment 82653
Xorg.0.log

Comment 3 Chris Wilson 2013-07-19 09:04:56 UTC

I presume this regression will be recovered once mesa is setting the correct mocs.

Comment 4 Chris Wilson 2013-07-22 07:59:35 UTC

Did you retest with latest mesa and its mocs?

Comment 5 ye.tian 2013-07-23 06:24:47 UTC

With latest mesa, the performance improved.

EgyptHD_FixedTime_Offscreen   Performance
-----------------------------------------------------
                           queued_50b44a4   queued_0d8ff1   queued_86281e (latest)
(master)00d32cd5                 97               86               86
(master)19031294 (latest)       103               97               98

Comment 6 Daniel Vetter 2013-07-23 06:52:47 UTC

But still a little regression, albeit less so. Afaik MOCS on ivb/hsw is rather limited and can't set the age control stuff. So I guess we still need to correct this in the kernel.

Comment 7 Chris Wilson 2013-07-23 07:06:37 UTC

Right, but iirc, the mocs should be setting the cache age to 0 - which is why I thought it would recover the regression (and possibly even improve on it). Maybe there is still a missing mocs? Or some other secondary effect.

Comment 8 Chris Wilson 2013-07-26 21:10:16 UTC

Just to clarify, mocs sets age3. 3 is the youngest age and stays in longest. 0 is the oldest and first to evict.

So my understanding is totally backwards, and the PTE should set 0 with userspace using mocs to mark high priority textures.
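
The age ordering described here (3 is youngest and stays longest, 0 is oldest and is evicted first) can be pictured with a toy victim-selection routine. This is purely an illustrative sketch and an assumption on my part: the real LLC/eLLC replacement policy is not described in this thread, and `struct cache_line` and `pick_victim` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical toy cache line. Age follows the convention from this comment:
 * 3 = youngest (kept longest), 0 = oldest (first to evict). */
struct cache_line {
    unsigned int tag;
    int age;
};

/* Return the index of the line to evict: the one with the lowest age.
 * Ties are broken in favour of the lowest index. Assumes n >= 1. */
size_t pick_victim(const struct cache_line *set, size_t n)
{
    size_t victim = 0;

    for (size_t i = 1; i < n; i++) {
        if (set[i].age < set[victim].age)
            victim = i;
    }
    return victim;
}
```

In this model, a set holding lines with ages 3, 0, 3 and 1 would evict the second line, so data inserted with age 0 by default is the first to be pushed out, while anything marked age 3 survives longer.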

Comment 9 Ben Widawsky 2013-07-26 21:28:03 UTC

Created attachment 83060
try eLLC default age of 3

Comment 10 Ben Widawsky 2013-07-26 21:28:29 UTC

Created attachment 83061
try LLC default age of 3

Comment 11 Ben Widawsky 2013-07-26 21:29:07 UTC

Ye Tian,

Please try both patches in all 3 configurations:
with one patch
with the second patch
with both patches.

Thanks.

Comment 12 Kenneth Graunke 2013-07-28 23:24:38 UTC

To clarify: Mesa currently doesn't set any MOCS overrides for LLC/eLLC.  It sets them to "Use the PTE values."  (The recent MOCS additions only set L3 cacheability, which is different and separate.)

We're relying on the kernel to set everything to be LLC+eLLC WB-cacheable (or WT cacheable where appropriate).  In the future, we may try some heuristics to play with ages, but it's hard to know what the right settings are, and there are no concrete plans currently.

Comment 13 Ben Widawsky 2013-07-29 23:13:28 UTC

(In reply to comment #12)
> To clarify: Mesa currently doesn't set any MOCS overrides for LLC/eLLC.  It
> sets them to "Use the PTE values."  (The recent MOCS additions only set L3
> cacheability, which is different and separate.)
> 
> We're relying on the kernel to set everything to be LLC+eLLC WB-cacheable
> (or WT cacheable where appropriate).  In the future, we may try some
> heuristics to play with ages, but it's hard to know what the right settings
> are, and there are no concrete plans currently.

I was wondering about this. I can't find any mention of L3 + MOCS. Where did you guys find this? The docs make me believe 01b is uncached.

Comment 14 ye.tian 2013-07-30 02:34:47 UTC

(In reply to comment #11)

Performance for the two patches:
First patch:  100 fps
Second patch: 105 fps (LLC default)
Both patches: 105 fps

But I don't know how to try both patches:
eLLC default 
 	if (level != I915_CACHE_NONE)
-		pte |= HSW_WB_ELLC_LLC_AGE0;
+		pte |= HSW_ELLC;
 
LLC default 
  	if (level != I915_CACHE_NONE)
-		pte |= HSW_WB_LLC_AGE0;
+		pte |= HSW_LLC;

Does it mean:
 	if (level != I915_CACHE_NONE)
-		pte |= HSW_WB_ELLC_LLC_AGE0;
+		pte |= HSW_ELLC;
+		pte |= HSW_LLC;

Comment 15 Ben Widawsky 2013-07-31 23:08:26 UTC

(In reply to comment #14)
> (In reply to comment #11)
> 
> Performance for the two patches:
> First patch:  100 fps
> second patch: 105 fps (LLC default )
> Both patches: 105 fps 
> 
> But I don't know how to try both patches:

Don't worry about it... I'll get this merged to -next-queued/-nightly ASAP

Comment 16 Ben Widawsky 2013-08-03 21:29:41 UTC

http://cgit.freedesktop.org/~bwidawsk/drm-intel/commit/?h=drm-intel-next-queued&id=b59f98153cd27b72b25abb9a3d5d50e1cd68b2a4

Comment 17 Daniel Vetter 2013-08-06 11:45:38 UTC

Should be fixed with:

commit 87a6b688ccc78b2c54bee56879c6d195d2457ebe
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Sun Aug 4 23:47:29 2013 -0700

    drm/i915/hsw: Change default LLC age to 3

Comment 18 ye.tian 2013-08-07 01:18:25 UTC

verified it.

Comment 19 Gordon Jin 2014-01-02 01:47:13 UTC

(give the credit to Ben)

Comment 20 Chris Wilson 2014-01-02 10:30:53 UTC

You mean the patch that you later decided caused a 20% regression in the very same benchmark...
