Bug 72557

Summary: [IVB/HSW/BYT] igt/gem_ctx_exec/eviction causes system hang with -queued kernel
Product: DRI Reporter: lu hua <huax.lu>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major    
Priority: high CC: intel-gfx-bugs
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg none

Description lu hua 2013-12-10 07:42:54 UTC
System Environment:
--------------------------
Arch:             x86_64
Platform:         Ivybridge/Haswell
Kernel:      (drm-intel-next-queued)798183c54799fbe1e5a5bfabb3a8c0505ffd2149

Bug detailed description:
---------------------------
It hangs the system on the -queued kernel. On the -fixes and -nightly kernels it fails (bug 72507).

Bisecting on the -fixes kernel shows that commit a415d355645ca5e8797235a76026ca2622ceefdb fixed it.
commit a415d355645ca5e8797235a76026ca2622ceefdb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 26 11:23:15 2013 +0000

    drm/i915: Pin relocations for the duration of constructing the execbuffer

    As the execbuffer dispatch grows ever more complex and involves multiple
    stages of moving objects into the aperture, we need to take greater care
    that we do not evict our execbuffer objects prior to dispatch. This is
    relatively simple as we can just keep the objects pinned for not just
    the relocation but until we are finished.

    One such example is the possibility of the context switch causing an
    eviction or hitting the shrinker in order to fit its object into the
    aperture.

    Link: http://lists.freedesktop.org/archives/intel-gfx/2013-November/036166.html
    Reported-by: "Siluvery, Arun" <arun.siluvery@intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ben Widawsky <benjamin.widawsky@intel.com>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: stable@vger.kernel.org
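
The idea in the commit message above (keep the execbuffer objects pinned until dispatch, so that a later eviction cannot pick them as victims) can be illustrated with a toy model. This is only a sketch and not the i915 code: the `Aperture` class, the `execbuffer` function, the slot-based capacity, and the oldest-first eviction order are all assumptions made for illustration.

```python
class Aperture:
    """Toy aperture: a fixed number of slots, each object has a pin count."""

    def __init__(self, slots):
        self.slots = slots
        self.pins = {}  # object name -> pin count, kept in binding order

    def bind(self, name, pin=False):
        if name not in self.pins:
            if len(self.pins) >= self.slots:
                self.evict_one()
            self.pins[name] = 0
        if pin:
            self.pins[name] += 1

    def unpin(self, name):
        self.pins[name] -= 1

    def evict_one(self):
        # Evict the oldest unpinned object; pinned objects are never victims.
        for name, count in list(self.pins.items()):
            if count == 0:
                del self.pins[name]
                return
        raise MemoryError("no unpinned object left to evict")


def execbuffer(aperture, objects, context, hold_pins):
    """Return True if every execbuffer object is still bound at dispatch."""
    for obj in objects:
        aperture.bind(obj, pin=True)
        if not hold_pins:
            aperture.unpin(obj)  # pin dropped right after "relocation"
    try:
        aperture.bind(context)  # the context switch may need to evict
        return all(obj in aperture.pins for obj in objects)
    finally:
        if hold_pins:
            for obj in objects:
                aperture.unpin(obj)


if __name__ == "__main__":
    # Pins dropped early: binding the context evicts execbuffer object "a".
    print(execbuffer(Aperture(2), ["a", "b"], "ctx", hold_pins=False))
    # Pins held until dispatch: the unrelated object "old" is evicted instead.
    ap = Aperture(3)
    ap.bind("old")
    print(execbuffer(ap, ["a", "b"], "ctx", hold_pins=True))
```

In this model, holding the pins means the context bind either evicts something unrelated or fails with an explicit error when nothing unpinned is left. It never silently removes an object the batch still depends on.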


output:
trying buffer count 521599
trying buffer count 521598
trying buffer count 521597
trying buffer count 521596

Reproduce steps:
-------------------------
1. ./gem_ctx_exec --run-subtest eviction
Comment 1 lu hua 2013-12-10 07:47:39 UTC
Created attachment 90556 [details]
dmesg
Comment 2 Daniel Vetter 2013-12-10 09:48:12 UTC
Can you please double-check that latest -nightly really doesn't work? It now contains the fix ...
Comment 3 Daniel Vetter 2013-12-10 09:49:15 UTC
*** Bug 72506 has been marked as a duplicate of this bug. ***
Comment 4 lu hua 2013-12-11 03:29:41 UTC
It works well on latest -nightly branch.
This issue only happens on -queued branch.
Comment 5 lu hua 2013-12-30 06:22:06 UTC
(In reply to comment #2)
> Can you please double-check that latest -nightly really doesn't work? It now
> contains the fix ...


It still fails with OOM killer on BYT.
Comment 6 Daniel Vetter 2014-01-08 19:25:27 UTC
Oops, the RAM condition to skip the check on machines with too little memory was botched. Please re-run, the test should now skip on BYT instead of OOM, but on all other platforms with sufficient RAM (4G or more) it should still work.

Please make sure we don't have any unexpected skips now when validating:

commit 7775fca2df815dfee18b181de6fe13df27bb9867
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jan 8 20:24:36 2014 +0100

    tests/gem_ctx_exec: fix ram requirement fumble
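
The RAM condition described in this comment can be sketched as follows. This is not the igt implementation (igt is written in C). The function names are hypothetical, and the 4 GiB threshold is an assumption taken from the "4G or more" wording above.

```python
import os

# Assumed threshold, taken from the "4G or more" wording in comment 6.
REQUIRED_RAM_BYTES = 4 * 1024**3


def total_ram_bytes():
    """Return physical RAM in bytes using POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def should_skip(ram_bytes, required_bytes=REQUIRED_RAM_BYTES):
    """Skip when the machine has less RAM than the subtest is assumed to need."""
    return ram_bytes < required_bytes


if __name__ == "__main__":
    ram = total_ram_bytes()
    verdict = "skip" if should_skip(ram) else "run"
    print(f"{ram / 1024**3:.1f} GiB RAM: {verdict} eviction subtest")
```

A machine sold with 4 GB usually reports slightly less than 4 GiB of usable physical memory, so a real check may need some slack in the threshold to avoid exactly the kind of unexpected skip this comment warns about.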
Comment 7 lu hua 2014-01-14 05:54:22 UTC
Verified. Fixed.
Comment 8 Elizabeth 2017-10-06 14:41:26 UTC
Closing old verified.
