Bug 81735 - [BSW]igt/gem_exec_lut_handle cost long time to execute sporadically
Summary: [BSW]igt/gem_exec_lut_handle cost long time to execute sporadically
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-25 08:42 UTC by Guo Jinxian
Modified: 2017-10-06 14:36 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
dmesg (125.03 KB, text/plain)
2014-07-25 08:42 UTC, Guo Jinxian
no flags Details
dmesg on -next-queued branch (126.40 KB, text/plain)
2014-07-25 08:59 UTC, Guo Jinxian
no flags Details

Description Guo Jinxian 2014-07-25 08:42:24 UTC
Created attachment 103430 [details]
dmesg

==System Environment==
--------------------------
Regression: No
The fail rate about 1 out of 5

Non-working platforms: BSW

==kernel==
--------------------------
origin/drm-intel-nightly: 089bea2664988de48fccc81407ebe12115156731(fails)
    drm-intel-nightly: 2014y-07m-24d-09h-55m-38s integration manifest
origin/drm-intel-next-queued: 2b8ee5710915a244b807fbe3427299d007d19e89(fails)
    drm/i915: Allow overlapping userptr objects  
origin/drm-intel-fixes: eedd10f45bdcb2a5b2afa35f845e080c3bc984f2(fails)
    drm/i915: Simplify i915_gem_release_all_mmaps()

==Bug detailed description==
-----------------------------
igt/gem_exec_lut_handle cost long time to execute sporadically

Output:
[root@x-bsw01 tests]# time ./gem_exec_lut_handle
IGT-Version: 1.7-g4d4f4b2 (x86_64) (Linux: 3.16.0-rc6_drm-intel-nightly_089bea_20140725_debug+ x86_64)
relocation: buffers=   1: old= 105904 + 180.9*reloc, lut= 102425 + 173.7*reloc (ns)
relocation: buffers=   2: old= 104746 + 183.0*reloc, lut= 104779 + 173.9*reloc (ns)
relocation: buffers=   4: old= 107092 + 182.0*reloc, lut= 107067 + 173.6*reloc (ns)
relocation: buffers=   8: old= 113138 + 181.9*reloc, lut= 111362 + 173.7*reloc (ns)
relocation: buffers=  16: old= 122958 + 182.4*reloc, lut= 122497 + 173.9*reloc (ns)
relocation: buffers=  32: old= 144296 + 182.9*reloc, lut= 143878 + 174.1*reloc (ns)
relocation: buffers=  64: old= 188723 + 185.5*reloc, lut= 187134 + 176.3*reloc (ns)
relocation: buffers= 128: old= 277146 + 190.2*reloc, lut= 275321 + 179.9*reloc (ns)
relocation: buffers= 256: old= 458424 + 199.5*reloc, lut= 454968 + 186.3*reloc (ns)
relocation: buffers= 512: old= 803739 + 226.0*reloc, lut= 798157 + 195.7*reloc (ns)
relocation: buffers=1024: old=1865623 + 323.2*reloc, lut=1904592 + 260.9*reloc (ns)
relocation: buffers=2048: old=7074389 + 436.8*reloc, lut=7076460 + 315.6*reloc (ns)
skip-relocs: buffers=   1: old= 100994 + 91.6*reloc, lut= 100750 + 85.5*reloc (ns)
skip-relocs: buffers=   2: old= 102906 + 91.5*reloc, lut= 101958 + 85.5*reloc (ns)
skip-relocs: buffers=   4: old= 105166 + 91.3*reloc, lut= 104126 + 85.5*reloc (ns)
skip-relocs: buffers=   8: old= 111004 + 91.3*reloc, lut= 109197 + 85.3*reloc (ns)
skip-relocs: buffers=  16: old= 121325 + 93.1*reloc, lut= 121063 + 85.2*reloc (ns)
skip-relocs: buffers=  32: old= 142916 + 91.7*reloc, lut= 142387 + 85.2*reloc (ns)
skip-relocs: buffers=  64: old= 186514 + 93.1*reloc, lut= 185571 + 85.6*reloc (ns)
skip-relocs: buffers= 128: old= 274873 + 97.1*reloc, lut= 273076 + 88.7*reloc (ns)
skip-relocs: buffers= 256: old= 456788 + 106.4*reloc, lut= 453750 + 94.1*reloc (ns)
skip-relocs: buffers= 512: old= 806428 + 130.0*reloc, lut= 802482 + 103.1*reloc (ns)
skip-relocs: buffers=1024: old=1895140 + 212.7*reloc, lut=1929344 + 159.2*reloc (ns)
skip-relocs: buffers=2048: old=7058420 + 321.6*reloc, lut=7064467 + 214.6*reloc (ns)
no-relocs: buffers=   1: old= 100038 + 30.8*reloc, lut= 100159 + 23.7*reloc (ns)
no-relocs: buffers=   2: old= 102166 + 30.7*reloc, lut= 101549 + 23.9*reloc (ns)
no-relocs: buffers=   4: old= 104208 + 30.7*reloc, lut= 104186 + 23.9*reloc (ns)
no-relocs: buffers=   8: old= 109841 + 30.8*reloc, lut= 108595 + 23.9*reloc (ns)
no-relocs: buffers=  16: old= 121203 + 30.4*reloc, lut= 119996 + 23.8*reloc (ns)
no-relocs: buffers=  32: old= 141550 + 30.6*reloc, lut= 140940 + 23.7*reloc (ns)
no-relocs: buffers=  64: old= 184514 + 30.6*reloc, lut= 183211 + 23.6*reloc (ns)
no-relocs: buffers= 128: old= 270394 + 30.4*reloc, lut= 268552 + 23.4*reloc (ns)
no-relocs: buffers= 256: old= 448667 + 30.4*reloc, lut= 445651 + 22.6*reloc (ns)
no-relocs: buffers= 512: old= 785921 + 33.1*reloc, lut= 779181 + 24.3*reloc (ns)
no-relocs: buffers=1024: old=1826164 + 59.9*reloc, lut=1847018 + 49.0*reloc (ns)
no-relocs: buffers=2048: old=6972076 + 41.9*reloc, lut=6975994 + 30.0*reloc (ns)

real    15m53.222s
user    0m20.729s
sys     15m29.287s


Reproduce steps:
-------------------------
1. ./gem_exec_lut_handle
Comment 1 Chris Wilson 2014-07-25 08:55:40 UTC
How does the pass compare? (i.e. please attach the good output and dmesg)
Comment 2 Guo Jinxian 2014-07-25 08:57:38 UTC
(In reply to comment #1)
> How does the pass compare? (i.e. please attach the good output and dmesg)

Here is the output on latest -next-queued which passed.

[root@x-bsw01 tests]# time ./gem_exec_lut_handle
IGT-Version: 1.7-g4d4f4b2 (x86_64) (Linux: 3.16.0-rc4_drm-intel-next-queued_2b8ee5_20140725+ x86_64)
relocation: buffers=   1: old=  20213 + 160.6*reloc, lut=  21170 + 150.9*reloc (ns)
relocation: buffers=   2: old=  23548 + 160.3*reloc, lut=  26650 + 149.6*reloc (ns)
relocation: buffers=   4: old=  23450 + 160.0*reloc, lut=  24112 + 150.5*reloc (ns)
relocation: buffers=   8: old=  28316 + 159.0*reloc, lut=  30131 + 149.1*reloc (ns)
relocation: buffers=  16: old=  29774 + 160.0*reloc, lut=  31269 + 150.0*reloc (ns)
relocation: buffers=  32: old=  35678 + 161.4*reloc, lut=  35133 + 151.7*reloc (ns)
relocation: buffers=  64: old=  60752 + 162.4*reloc, lut=  62951 + 151.1*reloc (ns)
relocation: buffers= 128: old=  95711 + 172.6*reloc, lut=  94049 + 160.7*reloc (ns)
relocation: buffers= 256: old= 188186 + 178.5*reloc, lut= 184888 + 165.0*reloc (ns)
relocation: buffers= 512: old= 362717 + 195.3*reloc, lut= 356609 + 168.5*reloc (ns)
relocation: buffers=1024: old= 752565 + 258.9*reloc, lut= 737432 + 204.8*reloc (ns)

relocation: buffers=2048: old=4130880 + 375.5*reloc, lut=4117357 + 279.1*reloc (ns)
skip-relocs: buffers=   1: old=   7075 + 87.0*reloc, lut=   6909 + 80.3*reloc (ns)
skip-relocs: buffers=   2: old=   7721 + 87.0*reloc, lut=   7554 + 80.3*reloc (ns)
skip-relocs: buffers=   4: old=   8532 + 87.0*reloc, lut=   8321 + 80.3*reloc (ns)
skip-relocs: buffers=   8: old=  10332 + 87.2*reloc, lut=  10062 + 80.5*reloc (ns)
skip-relocs: buffers=  16: old=  14190 + 87.4*reloc, lut=  13790 + 80.7*reloc (ns)
skip-relocs: buffers=  32: old=  22911 + 87.4*reloc, lut=  22027 + 80.8*reloc (ns)
skip-relocs: buffers=  64: old=  43343 + 89.4*reloc, lut=  41712 + 82.4*reloc (ns)
skip-relocs: buffers= 128: old=  94919 + 92.8*reloc, lut=  93175 + 83.8*reloc (ns)
skip-relocs: buffers= 256: old= 186820 + 99.9*reloc, lut= 183693 + 88.6*reloc (ns)
skip-relocs: buffers= 512: old= 361709 + 115.8*reloc, lut= 355697 + 92.4*reloc (ns)
skip-relocs: buffers=1024: old= 743773 + 174.1*reloc, lut= 735623 + 126.8*reloc (ns)
skip-relocs: buffers=2048: old=4132826 + 283.8*reloc, lut=4114821 + 194.7*reloc (ns)
no-relocs: buffers=   1: old=   6812 + 30.2*reloc, lut=   6704 + 23.5*reloc (ns)
no-relocs: buffers=   2: old=   7248 + 30.3*reloc, lut=   7071 + 23.6*reloc (ns)
no-relocs: buffers=   4: old=   8168 + 30.3*reloc, lut=   7906 + 23.7*reloc (ns)
no-relocs: buffers=   8: old=   9970 + 30.3*reloc, lut=   9437 + 23.7*reloc (ns)
no-relocs: buffers=  16: old=  13394 + 30.6*reloc, lut=  13017 + 23.9*reloc (ns)
no-relocs: buffers=  32: old=  21729 + 30.6*reloc, lut=  21105 + 24.0*reloc (ns)
no-relocs: buffers=  64: old=  40860 + 31.5*reloc, lut=  39069 + 25.0*reloc (ns)
no-relocs: buffers= 128: old=  90798 + 29.8*reloc, lut=  88936 + 22.9*reloc (ns)
no-relocs: buffers= 256: old= 178076 + 29.7*reloc, lut= 174850 + 22.4*reloc (ns)
no-relocs: buffers= 512: old= 343969 + 31.0*reloc, lut= 337410 + 22.4*reloc (ns)
no-relocs: buffers=1024: old= 714549 + 46.5*reloc, lut= 704009 + 37.8*reloc (ns)
no-relocs: buffers=2048: old=4040396 + 49.3*reloc, lut=4017948 + 37.6*reloc (ns)

real    8m19.540s
user    0m18.975s
sys     7m59.965s
Comment 3 Guo Jinxian 2014-07-25 08:59:48 UTC
Created attachment 103431 [details]
dmesg on -next-queued branch

(In reply to comment #2)
> (In reply to comment #1)
> > How does the pass compare? (i.e. please attach the good output and dmesg)
> 
> Here is the output on latest -next-queued which passed.
> 
> [root@x-bsw01 tests]# time ./gem_exec_lut_handle
> IGT-Version: 1.7-g4d4f4b2 (x86_64) (Linux:
> 3.16.0-rc4_drm-intel-next-queued_2b8ee5_20140725+ x86_64)
> relocation: buffers=   1: old=  20213 + 160.6*reloc, lut=  21170 +
> 150.9*reloc (ns)
> relocation: buffers=   2: old=  23548 + 160.3*reloc, lut=  26650 +
> 149.6*reloc (ns)
> relocation: buffers=   4: old=  23450 + 160.0*reloc, lut=  24112 +
> 150.5*reloc (ns)
> relocation: buffers=   8: old=  28316 + 159.0*reloc, lut=  30131 +
> 149.1*reloc (ns)
> relocation: buffers=  16: old=  29774 + 160.0*reloc, lut=  31269 +
> 150.0*reloc (ns)
> relocation: buffers=  32: old=  35678 + 161.4*reloc, lut=  35133 +
> 151.7*reloc (ns)
> relocation: buffers=  64: old=  60752 + 162.4*reloc, lut=  62951 +
> 151.1*reloc (ns)
> relocation: buffers= 128: old=  95711 + 172.6*reloc, lut=  94049 +
> 160.7*reloc (ns)
> relocation: buffers= 256: old= 188186 + 178.5*reloc, lut= 184888 +
> 165.0*reloc (ns)
> relocation: buffers= 512: old= 362717 + 195.3*reloc, lut= 356609 +
> 168.5*reloc (ns)
> relocation: buffers=1024: old= 752565 + 258.9*reloc, lut= 737432 +
> 204.8*reloc (ns)
> 
> relocation: buffers=2048: old=4130880 + 375.5*reloc, lut=4117357 +
> 279.1*reloc (ns)
> skip-relocs: buffers=   1: old=   7075 + 87.0*reloc, lut=   6909 +
> 80.3*reloc (ns)
> skip-relocs: buffers=   2: old=   7721 + 87.0*reloc, lut=   7554 +
> 80.3*reloc (ns)
> skip-relocs: buffers=   4: old=   8532 + 87.0*reloc, lut=   8321 +
> 80.3*reloc (ns)
> skip-relocs: buffers=   8: old=  10332 + 87.2*reloc, lut=  10062 +
> 80.5*reloc (ns)
> skip-relocs: buffers=  16: old=  14190 + 87.4*reloc, lut=  13790 +
> 80.7*reloc (ns)
> skip-relocs: buffers=  32: old=  22911 + 87.4*reloc, lut=  22027 +
> 80.8*reloc (ns)
> skip-relocs: buffers=  64: old=  43343 + 89.4*reloc, lut=  41712 +
> 82.4*reloc (ns)
> skip-relocs: buffers= 128: old=  94919 + 92.8*reloc, lut=  93175 +
> 83.8*reloc (ns)
> skip-relocs: buffers= 256: old= 186820 + 99.9*reloc, lut= 183693 +
> 88.6*reloc (ns)
> skip-relocs: buffers= 512: old= 361709 + 115.8*reloc, lut= 355697 +
> 92.4*reloc (ns)
> skip-relocs: buffers=1024: old= 743773 + 174.1*reloc, lut= 735623 +
> 126.8*reloc (ns)
> skip-relocs: buffers=2048: old=4132826 + 283.8*reloc, lut=4114821 +
> 194.7*reloc (ns)
> no-relocs: buffers=   1: old=   6812 + 30.2*reloc, lut=   6704 + 23.5*reloc
> (ns)
> no-relocs: buffers=   2: old=   7248 + 30.3*reloc, lut=   7071 + 23.6*reloc
> (ns)
> no-relocs: buffers=   4: old=   8168 + 30.3*reloc, lut=   7906 + 23.7*reloc
> (ns)
> no-relocs: buffers=   8: old=   9970 + 30.3*reloc, lut=   9437 + 23.7*reloc
> (ns)
> no-relocs: buffers=  16: old=  13394 + 30.6*reloc, lut=  13017 + 23.9*reloc
> (ns)
> no-relocs: buffers=  32: old=  21729 + 30.6*reloc, lut=  21105 + 24.0*reloc
> (ns)
> no-relocs: buffers=  64: old=  40860 + 31.5*reloc, lut=  39069 + 25.0*reloc
> (ns)
> no-relocs: buffers= 128: old=  90798 + 29.8*reloc, lut=  88936 + 22.9*reloc
> (ns)
> no-relocs: buffers= 256: old= 178076 + 29.7*reloc, lut= 174850 + 22.4*reloc
> (ns)
> no-relocs: buffers= 512: old= 343969 + 31.0*reloc, lut= 337410 + 22.4*reloc
> (ns)
> no-relocs: buffers=1024: old= 714549 + 46.5*reloc, lut= 704009 + 37.8*reloc
> (ns)
> no-relocs: buffers=2048: old=4040396 + 49.3*reloc, lut=4017948 + 37.6*reloc
> (ns)
> 
> real    8m19.540s
> user    0m18.975s
> sys     7m59.965s
Comment 4 Chris Wilson 2014-09-10 08:17:22 UTC
So looks like there is fixed overhead for each ioctl in the failure case (from 6us to 100us).

Could this just be related to kernel debug options? But then it would not be so sporadic.

If you can find a way to reproduce it consistently, the overhead should show up in a perf profile.
Comment 5 Jesse Barnes 2014-12-05 19:50:40 UTC
Jinxian, can you collect a perf profile when this happens?  That should narrow things down pretty quickly.
Comment 6 Guo Jinxian 2014-12-08 08:44:29 UTC
(In reply to Jesse Barnes from comment #5)
> Jinxian, can you collect a perf profile when this happens?  That should
> narrow things down pretty quickly.

The failure unable to reproduce on latest -nightly(bfdd01aa1825aa0068f9236b21362b550f6d630f)

root@x-bsw01:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_exec_lut_handle
IGT-Version: 1.8-g819e68f (x86_64) (Linux: 3.18.0-rc7_drm-intel-nightly_bfdd01_20141208+ x86_64)

real    7m57.688s
user    0m19.124s
sys     7m37.680s

real    7m44.798s
user    0m18.992s
sys     7m24.948s

real    7m58.919s
user    0m19.353s
sys     7m38.629s

real    7m55.662s
user    0m18.616s
sys     7m36.181s

real    7m49.988s
user    0m19.228s
sys     7m29.939s
Comment 7 Elizabeth 2017-10-06 14:36:57 UTC
Closing old verified.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.