Bug 83647 - [all Bisected]igt/gem_persistent_relocs and igt/gem_reloc_vs_gpu some subcases timeout
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-09 03:39 UTC by Guo Jinxian
Modified: 2017-07-03 13:57 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg (80.19 KB, text/plain)
2014-09-09 03:39 UTC, Guo Jinxian

Description Guo Jinxian 2014-09-09 03:39:21 UTC
Created attachment 105948
dmesg

==System Environment==
--------------------------
Regression: Yes.
Bisected

Non-working platforms: PNV/BYT

==kernel==
--------------------------
origin/drm-intel-nightly: 4a3d32734bdcef6813b31f06a58430436e98711e (fails)
    drm-intel-nightly: 2014y-09m-08d-18h-33m-01s integration manifest
origin/drm-intel-next-queued: a2ca46441decdcdf4010f1db8a7041c8851327b3 (fails)
    drm/i915: split intel_primary_plane_setplane() into check() and commit()
origin/drm-intel-fixes: 7a98948f3b536ca9a077e84966ddc0e9f53726df (fails)
    drm/i915: Wait for vblank before enabling the TV encoder

==Bug detailed description==
-----------------------------
Some subtests of igt/gem_persistent_relocs and igt/gem_reloc_vs_gpu hang and time out.

Case list:
igt/gem_persistent_relocs/forked-faulting-reloc-thrash-inactive
igt/gem_persistent_relocs/forked-interruptible-faulting-reloc-thrash-inactive
igt/gem_persistent_relocs/forked-interruptible-thrash-inactive
igt/gem_persistent_relocs/forked-thrash-inactive

igt/gem_reloc_vs_gpu/forked-faulting-reloc-thrash-inactive
igt/gem_reloc_vs_gpu/forked-interruptible-faulting-reloc-thrash-inactive
igt/gem_reloc_vs_gpu/forked-interruptible-thrash-inactive
igt/gem_reloc_vs_gpu/forked-thrash-inactive


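Output (the test hangs, does not respond to Ctrl-C, and had to be stopped with Ctrl-Z after almost 11 minutes):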
root@x-bytm02:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_persistent_relocs --run-subtest forked-interruptible-faulting-reloc-thrash-inactive
IGT-Version: 1.7-gac3d060 (x86_64) (Linux: 3.17.0-rc4_drm-intel-nightly_4a3d32_20140909+ x86_64)
^C^C^C

^C^C
^C
^Z
[1]+  Stopped                 ./gem_persistent_relocs --run-subtest forked-interruptible-faulting-reloc-thrash-inactive

real    10m53.109s
user    0m0.000s
sys     0m0.000s


==Reproduce steps==
---------------------------- 
1. time ./gem_persistent_relocs --run-subtest forked-interruptible-faulting-reloc-thrash-inactive

==Bisect results==
----------------------------
Bisect shows: 4ad72b7fadd285f849439cdbc408f8b847cef704 is the first bad commit
commit 4ad72b7fadd285f849439cdbc408f8b847cef704
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Wed Sep 3 19:23:37 2014 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Thu Sep 4 09:56:07 2014 +0200

    drm/i915: Fix unsafe vma iteration in i915_drop_caches
    
    When unbinding, there is a possibility that we drop the active reference
    on the object, thereby freeing it. If that happens, we may destroy the
    vm link as well as the object and vma. So iterate carefully.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
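
The "iterate carefully" fix described above follows the standard rule for walking a list whose elements may be freed by the loop body: read the successor before acting on the current element. The sketch below illustrates that rule in plain userspace C under stated assumptions: `struct obj`, `unbind()` and `unbind_all()` are hypothetical stand-ins, not i915 code, and an inactive object is simply freed to model "dropping the active reference, thereby freeing it". In the kernel the same idea is packaged as `list_for_each_entry_safe()` from `<linux/list.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object on a singly linked list (not the i915 structures). */
struct obj {
	struct obj *next;
	int id;
	bool active;   /* stands in for "the GPU still holds a reference" */
};

/* Hypothetical unbind: frees the object when nothing keeps it alive.
 * Returns true if the object was freed and must not be touched again. */
static bool unbind(struct obj *o)
{
	if (!o->active) {
		free(o);
		return true;
	}
	return false;
}

/* Unbind every element. The successor is read *before* unbind() runs,
 * so freeing the current node cannot break the walk. */
static int unbind_all(struct obj **head)
{
	struct obj **link = head;   /* the pointer that points at o */
	struct obj *o, *next;
	int freed = 0;

	for (o = *head; o; o = next) {
		next = o->next;             /* save before o may be freed */
		if (unbind(o)) {
			*link = next;       /* unlink the freed node */
			freed++;
		} else {
			link = &o->next;    /* node survives: advance the link */
		}
	}
	return freed;
}
```

Saving the successor only protects against the *current* element being freed. If the loop body can also free other elements (for example the saved successor), the walk is still unsafe, and holding a caller-owned reference, as discussed in the later comments, is the more robust tool.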
Comment 1 lu hua 2014-09-09 03:41:44 UTC
It affects all platforms.
Comment 2 Chris Wilson 2014-09-09 06:03:23 UTC
The object disappears during unbind. The only question is why this has only just started affecting you.
Comment 3 Chris Wilson 2014-09-10 08:10:41 UTC
I still have no idea why this is the first time it has shown up, but

commit ab4a7b96c7c9980f306730eee7667639d6221ef2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 9 11:16:08 2014 +0100

    drm/i915: Objects on the unbound list may still have an active reference
    
    Due to the lazy retirement semantics, even though we have unbound an
    object, it may still hold onto an active reference. So in the debug code,
    play safe.
    
    v2: Export i915_gem_shrink() rather than opencoding it.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

should be the last fix we ever need for drop_caches (tm)!

And I think

commit ace110dfad1b9ac2c724e1c1251c0faa8a408fa1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 9 07:02:43 2014 +0100

    drm/i915: Drop any active reference before unbinding
    
    Before we process the final unbind on an object and move it to the
    unbound list, it is semantically cleaner if there are no more active
    references to the object. (An active reference would imply that it was
    still being accessed by the GPU after it became inaccessible.) The
    caveat is that all callsites must be prepared for the object to
    disappear during the unbind, i.e. they must hold their own reference.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

is the root cause.
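
The caveat in ace110df (callsites "must hold their own reference") can be sketched with a simple reference count. This is an illustration under assumptions, not the i915 implementation: `struct obj`, `obj_get()`, `obj_put()`, `unbind()`, `unbind_and_check()` and the `objects_freed` counter are all hypothetical. `unbind()` drops the active reference first, so without the extra reference taken by the caller the object could be freed inside `unbind()`, and any later access would be a use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object; a stand-in for a GEM object, not i915 code. */
struct obj {
	int refcount;
	bool active;   /* true while the (simulated) GPU holds an active reference */
};

static int objects_freed;   /* counts frees so the behaviour can be checked */

static void obj_get(struct obj *o)
{
	o->refcount++;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		free(o);
		objects_freed++;
	}
}

/* Hypothetical unbind that drops any active reference first.
 * If that was the last reference, o is freed right here. */
static void unbind(struct obj *o)
{
	if (o->active) {
		o->active = false;
		obj_put(o);   /* may free o */
	}
}

/* Safe caller: take a private reference so o outlives unbind(),
 * inspect it afterwards, then release the reference. */
static bool unbind_and_check(struct obj *o)
{
	bool still_active;

	obj_get(o);
	unbind(o);
	still_active = o->active;   /* safe: our reference keeps o alive */
	obj_put(o);                 /* may free o; do not touch it after this */
	return still_active;
}
```

When the active reference is the only one (refcount 1), `unbind_and_check()` returns false and the object is freed by the caller's final `obj_put()` rather than inside `unbind()`.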
Comment 4 Guo Jinxian 2014-09-11 01:57:20 UTC
Verified.

root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_persistent_relocs --run-subtest forked-interruptible-faulting-reloc-thrash-inactive
IGT-Version: 1.8-g107151c (x86_64) (Linux: 3.17.0-rc4_drm-intel-nightly_99f444_20140910+ x86_64)
Subtest forked-interruptible-faulting-reloc-thrash-inactive: SUCCESS (4.306s)

root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrash-inactive
IGT-Version: 1.8-g107151c (x86_64) (Linux: 3.17.0-rc4_drm-intel-nightly_99f444_20140910+ x86_64)
Subtest forked-faulting-reloc-thrash-inactive: SUCCESS (4.387s)
Comment 5 Jari Tahvanainen 2017-07-03 13:57:00 UTC
Closing old verified+fixed.

