System Environment:
----------------------------------------------
Platform: HSW ULT
Libdrm: (master)libdrm-2.4.46-2-gfea5408098c3c3057958e85ea9d7146f0b08749e
Mesa: (master)148f0deb065d8b64e15f951063fac40650ac257a
Xserver: (master)xorg-server-1.14.99.1-137-g74469895e39fa38337f59edd64c4031ab9bb51d8
Xf86_video_intel: (master)2.21.12-12-g6c8b15d321044d4a81cb187cc5e1ac094eb82367
Cairo: (master)03c81d414d4edb710c91f96ddb7dbf73e5432583
Libva: (staging)6ba83cd306629e7579912627edab7a86d8c9ae1c
Libva_intel_driver: (staging)1caf179b1425b13cacaa421c688c6df8369668c6
Kernel: (drm-intel-next-queued)8f588cfc349bbbd8ae62a13679b9efba41645064

Bug detailed description:
----------------------------------------------
Glbenchmark2.7.0 EgyptHD offscreen performance is reduced by about 10% on HSW ULT. The problem does not exist on HSW desktop or IVB. Please see the attached dmesg and Xorg.0.log.

It is a kernel (drm-intel-next-queued) regression. Bisecting shows that 0d8ff15e9 is the first bad commit.

commit 0d8ff15e9a15f2b393e53337a107b7a1e5919b6d
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Thu Jul 4 11:02:03 2013 -0700

    drm/i915/hsw: Set correct Haswell PTE encodings.

Performance
----------------------------------------------
                                       50b44a4   0d8ff1
EgyptHD_C24Z16_FixedTime_Offscreen     96        86
TRex_C24Z16_FixedTimeStep_Offscreen    56        50

Reproduce steps:
---------------------------------------------
1, xinit&
2, vblank_mode=0 ./GLBenchmark -data ../../data/ -w 1920 -h 1080 -t 2701101
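For reference, the "about 10%" figure can be checked directly from the table above: 96 to 86 is a drop of roughly 10.4%, and 56 to 50 is roughly 10.7%. The helper below is only an illustrative sketch; the name `regression_percent` is made up for this note and is not part of GLBenchmark or the kernel.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from any real tool): percentage drop from a
 * baseline score to a new score. A positive result means the new score
 * is lower than the baseline.
 */
double regression_percent(double baseline, double current)
{
    assert(baseline > 0.0); /* a zero baseline would divide by zero */
    return (baseline - current) / baseline * 100.0;
}
```

Calling `regression_percent(96.0, 86.0)` covers the EgyptHD row and `regression_percent(56.0, 50.0)` covers the TRex row.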
Created attachment 82652: dmesg info
Created attachment 82653: Xorg.0.log
I presume this regression will be recovered once mesa is setting the correct mocs.
Did you retest with latest mesa and its mocs?
With latest mesa, the performance improved.

EgyptHD_FixedTime_Offscreen Performance
-----------------------------------------------------
Mesa / Kernel                (queued_50b44a4)   (queued_0d8ff1)   (queued_86281e) latest
(master)00d32cd5             97                 86                86
(master)19031294 (latest)    103                97                98
But still a little regression, albeit less so. Afaik MOCS on ivb/hsw is rather limited and can't set the age control stuff. So I guess we still need to correct this in the kernel.
Right, but iirc, the mocs should be setting the cache age to 0, which is why I thought it would recover the regression (and possibly even improve on it). Maybe there is still a missing mocs? Or some other secondary effect.
Just to clarify, mocs sets age3. 3 is the youngest age and stays in longest. 0 is the oldest and first to evict. So my understanding is totally backwards, and the PTE should set 0 with userspace using mocs to mark high priority textures.
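To make the ordering above concrete, here is a minimal sketch of age-based victim selection. It is a toy model only: `struct cache_line` and `pick_victim` are hypothetical names, and nothing here claims to match how the LLC/eLLC hardware actually replaces lines. It only encodes the rule stated above, that age 0 is evicted first and age 3 stays longest.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (assumption): each line carries an age, 3 = youngest, 0 = oldest. */
struct cache_line {
    unsigned tag; /* identifies the cached data, unused by the policy */
    unsigned age; /* 0..3; lower means evicted sooner */
};

/*
 * Return the index of the line to evict: the one with the lowest age.
 * Ties are broken in favour of the lowest index.
 */
size_t pick_victim(const struct cache_line *lines, size_t n)
{
    size_t victim = 0;

    assert(n > 0); /* an empty set has no victim */
    for (size_t i = 1; i < n; i++) {
        if (lines[i].age < lines[victim].age)
            victim = i;
    }
    return victim;
}
```

Under this model, a buffer whose PTE defaults to age 0 is always the first candidate for eviction, while anything marked age 3 outlives it.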
Created attachment 83060: try eLLC default age of 3
Created attachment 83061: try LLC default age of 3
Ye Tian,

Please try both patches in all 3 configurations:
1, with the first patch
2, with the second patch
3, with both patches

Thanks.
To clarify: Mesa currently doesn't set any MOCS overrides for LLC/eLLC. It sets them to "Use the PTE values." (The recent MOCS additions only set L3 cacheability, which is different and separate.) We're relying on the kernel to set everything to be LLC+eLLC WB-cacheable (or WT cacheable where appropriate). In the future, we may try some heuristics to play with ages, but it's hard to know what the right settings are, and there are no concrete plans currently.
(In reply to comment #12)
> To clarify: Mesa currently doesn't set any MOCS overrides for LLC/eLLC. It
> sets them to "Use the PTE values." (The recent MOCS additions only set L3
> cacheability, which is different and separate.)
>
> We're relying on the kernel to set everything to be LLC+eLLC WB-cacheable
> (or WT cacheable where appropriate). In the future, we may try some
> heuristics to play with ages, but it's hard to know what the right settings
> are, and there are no concrete plans currently.

I was wondering about this. I can't find any mention of L3 + MOCS. Where did you guys find this? Docs make me believe 01b is uncached.
(In reply to comment #11)

Performance for the two patches:
First patch: 100 fps
Second patch: 105 fps (LLC default)
Both patches: 105 fps

But I don't know how to try both patches:

eLLC default
 if (level != I915_CACHE_NONE)
-	pte |= HSW_WB_ELLC_LLC_AGE0;
+	pte |= HSW_ELLC;

LLC default
 if (level != I915_CACHE_NONE)
-	pte |= HSW_WB_LLC_AGE0;
+	pte |= HSW_LLC;

Does it mean:
 if (level != I915_CACHE_NONE)
-	pte |= HSW_WB_ELLC_LLC_AGE0;
+	pte |= HSW_ELLC;
+	pte |= HSW_LLC;
(In reply to comment #14)
> (In reply to comment #11)
>
> Performance for the two patches:
> First patch: 100 fps
> second patch: 105 fps (LLC default )
> Both patches: 105 fps
>
> But I don't know how to try both patches:

Don't worry about it... I'll get this merged to -next-queued/-nightly ASAP
http://cgit.freedesktop.org/~bwidawsk/drm-intel/commit/?h=drm-intel-next-queued&id=b59f98153cd27b72b25abb9a3d5d50e1cd68b2a4
Should be fixed with:

commit 87a6b688ccc78b2c54bee56879c6d195d2457ebe
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Sun Aug 4 23:47:29 2013 -0700

    drm/i915/hsw: Change default LLC age to 3
verified it.
(give the credit to Ben)
You mean the patch that you later decided caused a 20% regression in the very same benchmark...