Bug 78023 - [PNV/HSW/BDW bisected] igt/gem_exec_lut_handle is slow
Summary: [PNV/HSW/BDW bisected] igt/gem_exec_lut_handle is slow
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other
OS: All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 78246
Depends on:
Blocks:
 
Reported: 2014-04-28 07:47 UTC by Guo Jinxian
Modified: 2016-10-25 11:30 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
dmesg (95.41 KB, text/plain)
2014-04-28 07:47 UTC, Guo Jinxian

Description Guo Jinxian 2014-04-28 07:47:00 UTC
Created attachment 98105 [details]
dmesg

System Environment:
--------------------------
Platform: BDW
Kernel: (drm-intel-nightly) 11ddb598492d1f97f894495eea398febb41a9eb0

Bug detailed description:
-----------------------------
igt/gem_exec_lut_handle is slow on -nightly (11ddb598492d1f97f894495eea398febb41a9eb0) and -next-queued (b7c0d9df97c10ec5693a838df2fd53058f8e9e96); it passes on -fixes (7f1950fbb989e8fc5463b307e062b4529d51c862).
The run takes about 15 minutes.

output:
 date;./gem_exec_lut_handle;date
Mon Apr 28 14:30:05 EDT 2014
IGT-Version: 1.6-ga595a40 (x86_64) (Linux: 3.15.0-rc2_drm-intel-nightly_11ddb5_20140428+ x86_64)
relocation: buffers=   1: old= 125316 + 117.9*reloc, lut= 146850 + 112.0*reloc (ns)
relocation: buffers=   2: old= 406548 + 141.1*reloc, lut= 419875 + 140.6*reloc (ns)
relocation: buffers=   4: old= 800835 + 139.2*reloc, lut= 816025 + 137.0*reloc (ns)
relocation: buffers=   8: old=1166923 + 135.5*reloc, lut=1232500 + -75.5*reloc (ns)
relocation: buffers=  16: old= 192457 + 81.2*reloc, lut= 205708 + 65.6*reloc (ns)
relocation: buffers=  32: old= 192791 + 122.6*reloc, lut= 209907 + 119.2*reloc (ns)
relocation: buffers=  64: old= 506904 + 148.8*reloc, lut= 542884 + 60.0*reloc (ns)
relocation: buffers= 128: old= 191845 + 132.5*reloc, lut= 202870 + 130.7*reloc (ns)
relocation: buffers= 256: old= 526010 + 175.1*reloc, lut= 550133 + 139.0*reloc (ns)
relocation: buffers= 512: old= 359939 + 120.2*reloc, lut= 364441 + 101.7*reloc (ns)
relocation: buffers=1024: old= 666040 + 193.0*reloc, lut= 659301 + 164.3*reloc (ns)
relocation: buffers=2048: old=1392698 + 90.6*reloc, lut=1326707 + 31.1*reloc (ns)
skip-relocs: buffers=   1: old=  99454 + 77.9*reloc, lut= 109764 + 76.7*reloc (ns)
skip-relocs: buffers=   2: old= 404212 + 101.5*reloc, lut= 418515 + 100.4*reloc (ns)
skip-relocs: buffers=   4: old= 824913 + -9.5*reloc, lut= 864003 + -114.4*reloc (ns)
skip-relocs: buffers=   8: old= 100407 + 78.6*reloc, lut= 110857 + 77.6*reloc (ns)
skip-relocs: buffers=  16: old= 411945 + 102.9*reloc, lut= 427001 + 101.5*reloc (ns)
skip-relocs: buffers=  32: old= 815200 + 95.8*reloc, lut= 830975 + 92.6*reloc (ns)
skip-relocs: buffers=  64: old=1165378 + 89.7*reloc, lut=1178587 + 87.6*reloc (ns)
skip-relocs: buffers= 128: old=1524780 + 90.8*reloc, lut=1537187 + 89.3*reloc (ns)
skip-relocs: buffers= 256: old=1906707 + 92.3*reloc, lut=1918720 + 89.7*reloc (ns)
skip-relocs: buffers= 512: old=2337270 + 105.8*reloc, lut=2347068 + 91.7*reloc (ns)
skip-relocs: buffers=1024: old=2053583 + -544.2*reloc, lut=1952384 + -534.9*reloc (ns)
skip-relocs: buffers=2048: old= 680265 + 113.4*reloc, lut= 652852 + 68.2*reloc (ns)
no-relocs: buffers=   1: old= 304519 + 32.3*reloc, lut= 331179 + -24.8*reloc (ns)
no-relocs: buffers=   2: old=  97862 + 64.8*reloc, lut= 108347 + 65.1*reloc (ns)
no-relocs: buffers=   4: old= 414029 + 94.3*reloc, lut= 430559 + 93.7*reloc (ns)
no-relocs: buffers=   8: old= 838338 + 75.1*reloc, lut= 852344 + 74.6*reloc (ns)
no-relocs: buffers=  16: old=1185574 + 70.9*reloc, lut=1198192 + 70.3*reloc (ns)
no-relocs: buffers=  32: old=1516555 + 70.3*reloc, lut=1528722 + 69.6*reloc (ns)
no-relocs: buffers=  64: old=1847106 + 69.1*reloc, lut=1859239 + 69.6*reloc (ns)
no-relocs: buffers= 128: old=2190399 + 75.2*reloc, lut=2205177 + 71.6*reloc (ns)
no-relocs: buffers= 256: old=2566325 + 71.3*reloc, lut=2578291 + 72.2*reloc (ns)
no-relocs: buffers= 512: old= 730893 + -143.7*reloc, lut= 671261 + -119.0*reloc (ns)
no-relocs: buffers=1024: old= 660928 + -76.5*reloc, lut= 667962 + -86.1*reloc (ns)
no-relocs: buffers=2048: old= 689709 + 79.9*reloc, lut= 683225 + 77.5*reloc (ns)
Mon Apr 28 14:44:33 EDT 2014


Steps to reproduce:
----------------------------
1. ./gem_exec_lut_handle
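
Note: each line of the output above is a linear fit of execbuf cost against relocation count (time ≈ intercept + slope*reloc, in ns); "old" is the classic buffer-handle lookup and "lut" is the I915_EXEC_HANDLE_LUT path. For illustration only — the helpers below are invented stand-ins, not the actual IGT source — the shape of the timed loop is roughly:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Value from i915_drm.h. */
#define I915_EXEC_HANDLE_LUT (1 << 12)

/* Stub standing in for the DRM_IOCTL_I915_GEM_EXECBUFFER2 ioctl the
 * real test issues. Note the test never waits on or retires its
 * requests explicitly -- it relies on the kernel retiring implicitly. */
static void submit_execbuf(int fd, int nbuf, int nreloc, unsigned flags)
{
        (void)fd; (void)nbuf; (void)nreloc; (void)flags;
}

static uint64_t ns_since(const struct timespec *t0)
{
        struct timespec t1;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0->tv_sec) * UINT64_C(1000000000)
             + (t1.tv_nsec - t0->tv_nsec);
}

/* Average cost of one execbuf for a given buffer/reloc count. */
static uint64_t measure(int fd, int nbuf, int nreloc, unsigned flags,
                        int reps)
{
        struct timespec t0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < reps; i++)
                submit_execbuf(fd, nbuf, nreloc, flags);
        return ns_since(&t0) / reps;
}

int main(void)
{
        uint64_t ns = measure(-1, 2, 128, I915_EXEC_HANDLE_LUT, 1000);
        printf("%llu ns/execbuf\n", (unsigned long long)ns);
        return 0;
}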
Comment 1 Chris Wilson 2014-04-28 08:31:08 UTC
You should be able to bisect this based on the first line (rather than wait 15 minutes).
Comment 2 Guo Jinxian 2014-04-30 07:31:56 UTC
9403eb1064168ea7b47c5ccd04ec17b98ca9a0de is the first bad commit
commit 9403eb1064168ea7b47c5ccd04ec17b98ca9a0de
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Mon Mar 17 12:21:55 2014 +0000
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Fri Apr 25 16:18:01 2014 +0200

    drm/i915: Do not call retire_requests from wait_for_rendering

    A common issue we have is that retiring requests causes recursion
    through GTT manipulation or page table manipulation which we can only
    handle at very specific points. However, to maintain internal
    consistency (enforced through our sanity checks on write_domain at
    various points in the GEM object lifecycle) we do need to retire the
    object prior to marking it with a new write_domain, and also clear the
    write_domain for the implicit flush following a batch.

    Note that this then allows the unbound objects to still be on the active
    lists, and so care must be taken when removing objects from unbound lists
    (similar to the caveats we face processing the bound lists).

    v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules,
    by refactoring it to call into __i915_gem_shrink().

    v3: Missed an object-retire prior to changing cache domains in
    i915_gem_object_set_cache_level()

    v4: Rebase

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


:040000 040000 469c92ff9c22309ca81fa43a75c249c1112b69da a036ee8486228ba2ddc646c607595011a31cdde7 M      drivers
Comment 3 Chris Wilson 2014-04-30 07:33:22 UTC
You only see this on BDW right? As it doesn't seem to affect the platforms here.
Comment 4 Guo Jinxian 2014-05-04 05:53:47 UTC
(In reply to comment #3)
> You only see this on BDW right? As it doesn't seem to affect the platforms
> here.

This bug can still be reproduced on PNV and HSW, thanks.
Comment 5 Chris Wilson 2014-05-05 08:02:15 UTC
The root cause is ppgtt:

commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 27 22:43:07 2014 +0000

    drm/i915: Prevent recursion by retiring requests when the ring is full
    
    As the VM do not track activity of objects and instead use a large
    hammer to forcibly idle and evict all of their associated objects when
    one is released, it is possible for that to cause a recursion when we
    need to wait for free space on a ring and call retire requests.
    (intel_ring_begin -> intel_ring_wait_request ->
    i915_gem_retire_requests_ring -> i915_gem_context_free ->
    i915_gem_evict_vm -> i915_gpu_idle -> intel_ring_begin etc)
    
    In order to remove the requirement for calling retire-requests from
    intel_ring_wait_request, we have to inline a couple of steps from
    retiring requests, notably we have to record the position of the request
    we wait for and use that to update the available ring space.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

removes the i915_gem_retire_requests_ring() call when we run out of ring space; this test does no explicit request retiring and so depends upon that implicit retire.
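
To make the recursion concrete, here is a toy C model — invented, compilable, emphatically not the driver code — of the cycle the commit message describes and of the implicit retire that 1f70999f9052 removed:

/* Toy model, not the i915 driver: retiring the last request can free
 * its context; freeing the context evicts its VM; eviction idles the
 * GPU; idling emits commands, which re-enters the wait for ring space. */
#include <stdio.h>

struct ring { int space; };

static void intel_ring_begin(struct ring *ring, int bytes);

static void i915_gpu_idle(struct ring *ring)
{
        intel_ring_begin(ring, 16);     /* emit a flush: re-entry */
}

static void i915_gem_retire_requests_ring(struct ring *ring)
{
        ring->space += 64;              /* completed requests free space */
        /* ...last request retired: context freed, VM evicted... */
        i915_gpu_idle(ring);            /* -> intel_ring_begin(): recursion */
}

static void intel_ring_begin(struct ring *ring, int bytes)
{
        while (ring->space < bytes) {
                /* Before 1f70999f9052, waiting for space retired
                 * requests right here, closing the recursive loop
                 * above. Dropping the retire fixed the recursion but
                 * left tests that never retire on their own, such as
                 * gem_exec_lut_handle, accumulating requests. */
                i915_gem_retire_requests_ring(ring);
        }
        ring->space -= bytes;
}

int main(void)
{
        struct ring r = { .space = 0 };
        intel_ring_begin(&r, 16);
        printf("space left: %d\n", r.space);
        return 0;
}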
Comment 6 Chris Wilson 2014-05-06 05:31:34 UTC
*** Bug 78246 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2014-05-08 07:43:56 UTC
commit 1cf0ba14740d96fbf6f58a201f000a34b74f4725
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon May 5 09:07:33 2014 +0100

    drm/i915: Flush request queue when waiting for ring space
    
    During the review of
    
    commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Mon Jan 27 22:43:07 2014 +0000
    
        drm/i915: Prevent recursion by retiring requests when the ring is full
    
    Ville raised the point that our interaction with request->tail was
    likely to foul up other uses elsewhere (such as hang check comparing
    ACTHD against requests).
    
    However, we also need to restore the implicit retire requests that certain
    test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the
    spectre that the ppgtt will randomly call i915_gpu_idle() and recurse
    back into intel_ring_begin().
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023
    Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
    [danvet: Remove now unused 'tail' variable as spotted by Brad.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
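
For reference, a sketch of the shape of that fix as the commit message describes it — an assumed paraphrase, not the literal diff; __request_freeing_space() is invented here, while i915_wait_seqno() and i915_gem_retire_requests_ring() were real i915 functions at the time:

/* Sketch only (assumed shape, not the literal diff). */
static int intel_ring_wait_request(struct intel_ring_buffer *ring, int n)
{
        struct drm_i915_gem_request *rq;
        int ret;

        /* oldest request whose completion releases >= n bytes of ring */
        rq = __request_freeing_space(ring, n);  /* invented helper */
        if (rq == NULL)
                return -ENOSPC;

        ret = i915_wait_seqno(ring, rq->seqno);
        if (ret)
                return ret;

        /*
         * Restore the implicit retire that 1f70999f9052 removed:
         * retiring recomputes ring->space from the requests that
         * actually completed, so the cached request->tail (which could
         * foul hangcheck's ACTHD-vs-request comparison) goes away.
         */
        i915_gem_retire_requests_ring(ring);
        return 0;
}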
Comment 8 Guo Jinxian 2014-05-15 06:01:32 UTC
Fixed on latest -nightly (2be456541ea41728002ccca2de5235f48d14326e), thanks.
Comment 9 Jari Tahvanainen 2016-10-25 11:30:28 UTC
Closing verified+fixed.

