Summary: | [PNV regression] igt/gem_exec_lut_handle takes more than 10 minutes | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | lu hua <huax.lu> | ||||
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Severity: | critical | ||||||
Priority: | high | CC: | christophe.prigent, intel-gfx-bugs | ||||
Version: | unspecified | Keywords: | regression | ||||
Hardware: | All | ||||||
OS: | Linux (All) | ||||||
Whiteboard: | PatchMerged | ||||||
i915 platform: | ALL | i915 features: | GEM/PPGTT | ||||
Attachments: |
|
Description
lu hua
2014-12-09 03:06:30 UTC
It is worth checking if the reloc time regressed. What is the output on drm-intel-fixes? Test on latest drm-intel-fixes kernel, it takes 9+ minutest. output: [root@x-pnv2 tests]# time ./gem_exec_lut_handle IGT-Version: 1.8-g819e68f (i686) (Linux: 3.18.0-rc7_drm-intel-fixes_b0616c_20141209+ i686) relocation: buffers= 1: old= 8081 + 512.2*reloc, lut= 8002 + 499.5*reloc (ns) relocation: buffers= 2: old= 5679 + 524.5*reloc, lut= 8733 + 499.8*reloc (ns) relocation: buffers= 4: old= 10113 + 512.1*reloc, lut= 9883 + 499.5*reloc (ns) relocation: buffers= 8: old= 11974 + 510.9*reloc, lut= 12842 + 502.0*reloc (ns) relocation: buffers= 16: old= 16232 + 516.0*reloc, lut= 15976 + 499.5*reloc (ns) relocation: buffers= 32: old= 25303 + 517.1*reloc, lut= 24376 + 507.0*reloc (ns) relocation: buffers= 64: old= 50120 + 525.1*reloc, lut= 47786 + 512.3*reloc (ns) relocation: buffers= 128: old= 104995 + 538.5*reloc, lut= 101171 + 523.2*reloc (ns) relocation: buffers= 256: old= 220469 + 549.6*reloc, lut= 208159 + 533.4*reloc (ns) relocation: buffers= 512: old= 430050 + 567.2*reloc, lut= 417591 + 548.2*reloc (ns) relocation: buffers=1024: old=1143818 + 623.7*reloc, lut=1097252 + 593.9*reloc (ns) relocation: buffers=2048: old=3667335 + 692.3*reloc, lut=3575493 + 641.8*reloc (ns) skip-relocs: buffers= 1: old= 7056 + 119.1*reloc, lut= 6876 + 109.7*reloc (ns) skip-relocs: buffers= 2: old= 7893 + 119.2*reloc, lut= 7745 + 109.7*reloc (ns) skip-relocs: buffers= 4: old= 8921 + 119.2*reloc, lut= 8727 + 109.7*reloc (ns) skip-relocs: buffers= 8: old= 10384 + 121.8*reloc, lut= 10706 + 109.8*reloc (ns) skip-relocs: buffers= 16: old= 15339 + 119.5*reloc, lut= 15719 + 113.2*reloc (ns) skip-relocs: buffers= 32: old= 24206 + 119.9*reloc, lut= 23253 + 110.3*reloc (ns) skip-relocs: buffers= 64: old= 49104 + 124.3*reloc, lut= 46761 + 114.9*reloc (ns) skip-relocs: buffers= 128: old= 103890 + 137.0*reloc, lut= 99817 + 126.9*reloc (ns) skip-relocs: buffers= 256: old= 213578 + 149.1*reloc, lut= 205201 + 138.5*reloc (ns) skip-relocs: buffers= 512: old= 424448 + 166.9*reloc, lut= 412640 + 155.1*reloc (ns) skip-relocs: buffers=1024: old=1131231 + 224.9*reloc, lut=1089359 + 203.7*reloc (ns) skip-relocs: buffers=2048: old=3661083 + 287.5*reloc, lut=3570173 + 252.8*reloc (ns) no-relocs: buffers= 1: old= 6874 + 51.5*reloc, lut= 6895 + 44.1*reloc (ns) no-relocs: buffers= 2: old= 7634 + 51.5*reloc, lut= 7636 + 44.1*reloc (ns) no-relocs: buffers= 4: old= 8678 + 51.5*reloc, lut= 8646 + 44.2*reloc (ns) no-relocs: buffers= 8: old= 10647 + 51.6*reloc, lut= 10483 + 44.2*reloc (ns) no-relocs: buffers= 16: old= 14610 + 51.6*reloc, lut= 14261 + 44.3*reloc (ns) no-relocs: buffers= 32: old= 23059 + 51.6*reloc, lut= 22257 + 44.4*reloc (ns) no-relocs: buffers= 64: old= 46940 + 51.5*reloc, lut= 44469 + 44.3*reloc (ns) no-relocs: buffers= 128: old= 100053 + 51.7*reloc, lut= 96143 + 44.3*reloc (ns) no-relocs: buffers= 256: old= 204337 + 53.3*reloc, lut= 196056 + 45.1*reloc (ns) no-relocs: buffers= 512: old= 414929 + 62.0*reloc, lut= 402343 + 70.6*reloc (ns) no-relocs: buffers=1024: old=1107129 + 23.3*reloc, lut=1046326 + 28.8*reloc (ns) no-relocs: buffers=2048: old=3537830 + 78.3*reloc, lut=3478675 + 47.5*reloc (ns) real 9m51.192s user 0m31.725s sys 9m4.004s So handling of execbuffer buffer list is slower by about 10-20% and the relocation list by about 5-10%. A second sore point is the i915_gem_obj_to_vma(). I have a cunning plan, all I need is a turnip. On my pnv, I am down from 770s to 419s. So let's look forward to some improvements here. Tested on the latest fixed kernel, it takes 13+ minutes. output: ------------------- root@x-pnv1:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_exec_lut_handle IGT-Version: 1.10-g1f6a64e (x86_64) (Linux: 4.0.0-rc6_drm-intel-fixes_840a1c_20150403+ x86_64) relocation: buffers= 1: old= 12972 + 215.4*reloc, lut= 12751 + 215.1*reloc (ns) relocation: buffers= 2: old= 13087 + 217.3*reloc, lut= 13860 + 215.0*reloc (ns) relocation: buffers= 4: old= 16865 + 216.3*reloc, lut= 17707 + 214.9*reloc (ns) relocation: buffers= 8: old= 19819 + 217.5*reloc, lut= 19344 + 217.1*reloc (ns) relocation: buffers= 16: old= 26013 + 225.4*reloc, lut= 26422 + 220.2*reloc (ns) relocation: buffers= 32: old= 46689 + 237.7*reloc, lut= 45972 + 233.7*reloc (ns) relocation: buffers= 64: old= 87659 + 257.6*reloc, lut= 84603 + 252.0*reloc (ns) relocation: buffers= 128: old= 168278 + 276.7*reloc, lut= 163484 + 272.5*reloc (ns) relocation: buffers= 256: old= 377065 + 316.1*reloc, lut= 366626 + 300.5*reloc (ns) relocation: buffers= 512: old=1194388 + 137.7*reloc, lut=1164111 + 189.0*reloc (ns) relocation: buffers=1024: old=2440147 + 375.8*reloc, lut=2417108 + 306.7*reloc (ns) relocation: buffers=2048: old=5664382 + 614.0*reloc, lut=5624147 + 406.1*reloc (ns) skip-relocs: buffers= 1: old= 7842 + 49.6*reloc, lut= 7816 + 46.1*reloc (ns) skip-relocs: buffers= 2: old= 8569 + 49.6*reloc, lut= 8663 + 46.0*reloc (ns) skip-relocs: buffers= 4: old= 9556 + 49.6*reloc, lut= 9575 + 46.1*reloc (ns) skip-relocs: buffers= 8: old= 11805 + 49.6*reloc, lut= 11644 + 46.2*reloc (ns) skip-relocs: buffers= 16: old= 16352 + 49.7*reloc, lut= 16308 + 46.2*reloc (ns) skip-relocs: buffers= 32: old= 29442 + 52.1*reloc, lut= 28865 + 47.8*reloc (ns) skip-relocs: buffers= 64: old= 61269 + 65.5*reloc, lut= 60698 + 58.3*reloc (ns) skip-relocs: buffers= 128: old= 119237 + 80.7*reloc, lut= 117896 + 73.0*reloc (ns) skip-relocs: buffers= 256: old= 236430 + 98.5*reloc, lut= 233710 + 88.2*reloc (ns) skip-relocs: buffers= 512: old= 594453 + 144.7*reloc, lut= 586675 + 116.5*reloc (ns) skip-relocs: buffers=1024: old=2416857 + 256.1*reloc, lut=2387079 + 188.5*reloc (ns) skip-relocs: buffers=2048: old=5653596 + 466.5*reloc, lut=5610695 + 287.8*reloc (ns) no-relocs: buffers= 1: old= 8168 + 0.2*reloc, lut= 8378 + 0.1*reloc (ns) no-relocs: buffers= 2: old= 8252 + 0.2*reloc, lut= 8338 + 0.2*reloc (ns) no-relocs: buffers= 4: old= 9286 + 0.2*reloc, lut= 9216 + 0.2*reloc (ns) no-relocs: buffers= 8: old= 11238 + 0.4*reloc, lut= 11268 + 0.1*reloc (ns) no-relocs: buffers= 16: old= 15609 + 0.1*reloc, lut= 15421 + 0.1*reloc (ns) no-relocs: buffers= 32: old= 27898 + 0.2*reloc, lut= 27345 + 0.2*reloc (ns) no-relocs: buffers= 64: old= 58729 + 0.1*reloc, lut= 58143 + 0.1*reloc (ns) no-relocs: buffers= 128: old= 114477 + 0.1*reloc, lut= 112812 + 0.3*reloc (ns) no-relocs: buffers= 256: old= 227536 + 0.2*reloc, lut= 224279 + 0.3*reloc (ns) no-relocs: buffers= 512: old= 569191 + 0.9*reloc, lut= 559735 + 1.2*reloc (ns) no-relocs: buffers=1024: old=2362485 + 3.7*reloc, lut=2327493 + 4.8*reloc (ns) no-relocs: buffers=2048: old=5496344 + -2.3*reloc, lut=5443557 + -2.0*reloc (ns) SUCCESS (801.260s) real 13m21.287s user 0m3.221s sys 13m16.822s still happen on SKL-Y with the latest kernel : Kernel commit log: commit b4c4542ba1abfb0d3d6913504502573bf2c62b12 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Fri Aug 28 15:51:30 2015 +0200 drm-intel-nightly: 2015y-08m-28d-13h-50m-34s UTC integration manifest Bug scrub: Test is long but result is Pass, priority updated to medium/normal. This is also reproduced on KBL with the next configuration Gfx stack information =============================================== --> Component : drm tag : libdrm-2.4.68-4-g7aab852 commit : 7aab852 --> Component : mesa tag : mesa-11.1.2 commit : 7bcd827 --> Component : cairo tag : 1.15.2 commit : db8a7f1 --> Component : intel-gpu-tools tag : intel-gpu-tools-1.14-346-gcce2ff0 commit : cce2ff0 Software information ============================ Kernel version : 4.6.0-nightly+ Linux distribution : Ubuntu 15.10 Architecture : 64-bit Bios revision : 28.1 KSC revision : 1.15 Hardware information ============================= Platform : SKL-Y to KBL (RVP3) Motherboard model : Kabylake Client platform Motherboard type : Skylake Y LPDDR3 RVP3 Laptop Motherboard manufacturer : Intel Corporation CPU family : Other CPU information : Genuine Intel(R) CPU 0000 @ 0.90GHz GPU Card : Intel Corporation Device 591e (prog-if 00 [VGA controller]) |=== kernel information ===| commit 2ec823981d62c56d1511bda42b8295e31ece800f Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sun May 22 18:23:13 2016 +0200 ========== test cases failed =============== igt@gem_exec_lut_handle Is this is still an issue? I am noticing that the original bug was reported against PNV with some regression data. Is it still occurring on PNV? However on SKL & KBL, we are only considering the overall time and no real regression data. So removing for now SKL & KBL from the current platforms until we have concrete data here. Yes, the regression is still present. And that regression impacts all machines, though there is a separate regression for Baytrail that nobody else appears to have noticed. No need for bisecting, the original cause was the double lookup of handle to vma. Patches are still pending (though some are already in place). Chris - any progress/news on patches still pending? Adding tag into "Whiteboard" field - ReadyForDev The bug still active *Status is correct *Platform is included *Feature is included *Priority and Severity correctly set *Logs included Two major improvements landed: commit 4ff4b44cbb70c269259958cbcc48d7b8a2cb9ec8 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 16 15:05:16 2017 +0100 drm/i915: Store a direct lookup from object handle to vma commit 2889caa9232109afc8881f29a2205abeb5709d0c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 16 15:05:19 2017 +0100 drm/i915: Eliminate lots of iterations over the execobjects array which used to be enough to not only fix the original regression but make a substantial improvement. However, since the regression was noticed, I added two more iterations to the test (to exercise different paths through reloc) and we now interact with dma-fence and reservation_objects which are a huge step backwards. Resolving as the original regression is fixed. I've some patches to improve scalabiltiy of reservation_objects, but the main cost of them are the atomics we so painstakingly avoided previously (because of this bug highlighting how expensive they were along this path). In order to be able to close this bug, please test and share results. Thank you. Will close this issue as we do not have PNV to verify (In reply to Ricardo from comment #17) > Will close this issue as we do not have PNV to verify It was never PNV specific. (In reply to Elizabeth from comment #16) > In order to be able to close this bug, please test and share results. Thank > you. That is your job. I have given you the tools.... Works fine with the following configuration on BXT, GLK, KBL, SKL, BDW, BSW, HSW IVB and BYT. This is my Configuration: ====================================== Kernel ====================================== commit df0182c2c95385492772c6e4ace76b463298b8ca Author: Jani Nikula <jani.nikula@intel.com> AuthorDate: Mon Jul 3 16:20:53 2017 +0300 Commit: Jani Nikula <jani.nikula@intel.com> CommitDate: Mon Jul 3 16:20:53 2017 +0300 drm-tip: 2017y-07m-03d-13h-20m-24s UTC integration manifest ====================================== Graphic drivers ====================================== Component: drm tag: libdrm-2.4.81-24-g3095cc8 commit: 3095cc8eaba1aa87ad38c04ae2b1eabe30f7e16c Component: cairo tag: 1.15.6-2-g57b4050 commit: 57b40507dda3f58dfc8635548d606b86dc7bcf51 Component: intel-gpu-tools tag: intel-gpu-tools-1.19-57-g6fcc8e8 commit: 6fcc8e8b247661c7950b998e0b95141ffbd6b833 Component: piglit tag: piglit-v1 commit: c8f4fd9eeb298a2ef0855927f22634f794ef3eff ========================================== |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.