Bug 88908

Summary: [SNB/BYT bisected] igt/gem_reloc_vs_gpu/forked-thrashing-hang causes system hang.
Product: DRI
Reporter: Ding Heng <hengx.ding>
Component: DRM/Intel
Assignee: Mika Kuoppala <mika.kuoppala>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: high
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments: dmesg after run forked-thrashing-hang (flags: none)

Description Ding Heng 2015-02-02 05:56:35 UTC
Created attachment 113031: dmesg after run forked-thrashing-hang

==System Environment==
--------------------------
Regression: Yes.
Non-working platforms:  BYT

==kernel==
--------------------------
drm-intel-nightly/8b4216f91c7bf8d3459cadf9480116220bd6545e(2015-02-02)

==Bug detailed description==
-----------------------------
The system hangs a while after running any of the cases below. A lot of abnormal output appears in dmesg before the hang.
igt/gem_reloc_vs_gpu/forked-faulting-reloc-thrashing-hang
igt/gem_reloc_vs_gpu/forked-faulting-reloc-thrash-inactive-hang 
igt/gem_reloc_vs_gpu/forked-thrash-inactive-hang
igt/gem_reloc_vs_gpu/forked-thrashing-hang


[root@x-bdw01 tests]# time ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrash-inactive-hang
IGT-Version: 1.9-g3214a27 (x86_64) (Linux: 3.19.0-rc4_drm-intel-nightly_95cce4_20150115+ x86_64)
^C^

==Reproduce steps==
---------------------------- 
1. ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrash-inactive-hang
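
A minimal sketch for running the four affected subtests under a time limit, so that a subtest that never exits is reported instead of waited on. It assumes GNU coreutils `timeout` is available; `IGT_BIN`, `LIMIT`, and `run_subtest` are illustrative names and not part of IGT. A time limit only helps when the test process is stuck but killable; it cannot recover a full system hang.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduce step above.
# IGT_BIN and LIMIT are assumed names; adjust them to the local setup.
IGT_BIN="${IGT_BIN:-./gem_reloc_vs_gpu}"
LIMIT="${LIMIT:-600}"   # seconds; an arbitrary cap, not taken from the report

run_subtest() {
    # $1 = subtest name. Prints one status line for the subtest.
    rc=0
    timeout "$LIMIT" "$IGT_BIN" --run-subtest "$1" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit was reached
        echo "$1: TIMEOUT after ${LIMIT}s"
    else
        echo "$1: exit status $rc"
    fi
}

# The four subtests listed in the description.
for t in forked-faulting-reloc-thrashing-hang \
         forked-faulting-reloc-thrash-inactive-hang \
         forked-thrash-inactive-hang \
         forked-thrashing-hang; do
    run_subtest "$t"
done
```

Saving dmesg after each subtest (for example `dmesg > "$t.dmesg"`) would preserve the kernel output if a later subtest brings the machine down.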
Comment 1 Ding Heng 2015-02-02 05:58:18 UTC
b8d24a06568368076ebd5a858a011699a97bfa42 is the first bad commit.
commit b8d24a06568368076ebd5a858a011699a97bfa42
Author:     Mika Kuoppala <mika.kuoppala@linux.intel.com>
AuthorDate: Wed Jan 28 17:03:14 2015 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Thu Jan 29 18:03:07 2015 +0100

    drm/i915: Remove nested work in gpu error handling
    
    Now when we declare gpu errors only through our own dedicated
    hangcheck workqueue there is no need to have a separate workqueue
    for handling the resetting and waking up the clients as the deadlock
    concerns are no more.
    
    The only exception is i915_debugfs::i915_set_wedged, which triggers
    error handling through process context. However as this is only used through
    test harness it is responsibility for test harness not to introduce hangs
    through both debug interface and through hangcheck mechanism at the same time.
    
    Remove gpu_error.work and let the hangcheck work do the tasks it used to.
    
    v2: Add a big warning sign into i915_debugfs::i915_set_wedged (Chris)
    
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 2 lu hua 2015-02-02 08:20:42 UTC
Running the gem_reloc_vs_gpu *hang cases on SNB (HNR) takes more than 10 minutes and the test does not exit. It bisects to the same commit.

Run time ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrash-inactive-hang 
output(commit b8d24a):
IGT-Version: 1.9-g51d87b8 (x86_64) (Linux: 3.19.0-rc5_kcloud_b8d24a_20150202+ x86_64)

^C^C
real    11m17.503s
user    0m0.006s
sys     0m0.162s

output(commit 397f6f):
IGT-Version: 1.9-g51d87b8 (x86_64) (Linux: 3.19.0-rc5_kcloud_397f6f_20150202+ x86_64)
(gem_reloc_vs_gpu:4121) CRITICAL: Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
(gem_reloc_vs_gpu:4121) CRITICAL: Failed assertion: test == 0xdeadbeef
(gem_reloc_vs_gpu:4121) CRITICAL: mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
(gem_reloc_vs_gpu:4109) CRITICAL: Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
(gem_reloc_vs_gpu:4109) CRITICAL: Failed assertion: test == 0xdeadbeef
(gem_reloc_vs_gpu:4109) CRITICAL: mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
child 12 failed with exit status 99
Subtest forked-faulting-reloc-thrash-inactive-hang: FAIL (223.664s)

real    3m44.081s
user    0m0.030s
sys     0m1.857s
Comment 3 Mika Kuoppala 2015-02-03 13:52:15 UTC

*** This bug has been marked as a duplicate of bug 88928 ***
Comment 4 Mika Kuoppala 2015-02-03 13:54:27 UTC

*** This bug has been marked as a duplicate of bug 88933 ***
Comment 5 Elizabeth 2017-10-06 14:31:42 UTC
Closing old verified.
