Bug 88933 - [all Bisected]igt/gem_reset_stats doesn't exit testing
Summary: [all Bisected]igt/gem_reset_stats doesn't exit testing
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assignee: Mika Kuoppala
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 88874 88908 88915 88916 88928
Depends on:
Blocks:
 
Reported: 2015-02-03 06:38 UTC by lu hua
Modified: 2017-08-14 08:36 UTC (History)

See Also:
i915 platform:
i915 features:


Attachments
dmesg (120.59 KB, text/plain)
2015-02-03 06:38 UTC, lu hua
drm/i915: Don't bail out early on i915_handle_error (1.21 KB, patch)
2015-02-03 12:26 UTC, Mika Kuoppala
drm/i915: Remove bogus locking check in the hangcheck code (1.69 KB, patch)
2015-02-03 13:25 UTC, Mika Kuoppala
dmesg_new (18.76 KB, text/plain)
2015-02-11 03:04 UTC, Ding Heng

Description lu hua 2015-02-03 06:38:25 UTC
Created attachment 113086 [details]
dmesg

==System Environment==
--------------------------
Regression: Yes

Non-working platforms: all

==kernel==
--------------------------
drm-intel-nightly/8b4216f91c7bf8d3459cadf9480116220bd6545e
commit 8b4216f91c7bf8d3459cadf9480116220bd6545e
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Jan 31 17:46:32 2015 +0100

    drm-intel-nightly: 2015y-01m-31d-16h-46m-12s UTC integration manifest

==Bug detailed description==
-----------------------------
The test runs for more than 10 minutes without exiting on all platforms with the drm-intel-nightly and drm-intel-next-queued kernels. It works well on the drm-intel-fixes kernel.

Output of ./gem_reset_stats --run-subtest ban-blt:
IGT-Version: 1.9-g51d87b8 (x86_64) (Linux: 3.19.0-rc5_kcloud_b8d24a_20150202+ x86_64)
^C(gem_reset_stats:4109) drmtest-WARNING: Warning on condition flags != 0 in fucntion check_stop_rings, file drmtest.c:112
(gem_reset_stats:4109) drmtest-WARNING: i915_ring_stop flags on exit 0x80000004, can't quiescent gpu cleanly

real    11m57.087s
user    0m0.005s
sys     0m0.012s

dmesg:
[   94.782567] WARNING: CPU: 3 PID: 1057 at drivers/gpu/drm/i915/i915_irq.c:2615 i915_handle_error+0x54/0x5b0 [i915]()
[   94.782606] WARN_ON(mutex_is_locked(&dev_priv->dev->struct_mutex))
[   94.782630] Modules linked in:
[   94.782647]  dm_mod snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic iTCO_wdt iTCO_vendor_support ppdev pcspkr serio_raw uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm firewire_ohci firewire_core crc_itu_t lpc_ich mfd_core snd_timer snd soundcore wmi parport_pc parport tpm_infineon tpm_tis tpm battery ac acpi_cpufreq joydev i915 button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea
[   94.782916] CPU: 3 PID: 1057 Comm: kworker/u16:5 Not tainted 3.19.0-rc5_kcloud_b8d24a_20150202+ #26
[   94.782950] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.22 12/22/2011
[   94.782994] Workqueue: i915-hangcheck i915_hangcheck_elapsed [i915]
[   94.783021]  0000000000000000 0000000000000009 ffffffff8178902f ffff880137c83cc8
[   94.783059]  ffffffff8103bc4a 0000000000000246 ffffffffa00b429f 00000000000006f6
[   94.783097]  ffff880133465300 ffff880002908ad0 ffff880002eae800 ffff880002c2da00
[   94.783135] Call Trace:
[   94.783150]  [<ffffffff8178902f>] ? dump_stack+0x40/0x50
[   94.783174]  [<ffffffff8103bc4a>] ? warn_slowpath_common+0x98/0xb0
[   94.783210]  [<ffffffffa00b429f>] ? i915_handle_error+0x54/0x5b0 [i915]
[   94.783257]  [<ffffffff8103bca7>] ? warn_slowpath_fmt+0x45/0x4a
[   94.783312]  [<ffffffffa00b429f>] ? i915_handle_error+0x54/0x5b0 [i915]
[   94.783361]  [<ffffffff81785e3c>] ? printk+0x48/0x4d
[   94.783405]  [<ffffffffa00b4b5e>] ? i915_hangcheck_elapsed+0x325/0x3bf [i915]
[   94.783435]  [<ffffffff8104c923>] ? process_one_work+0x1b2/0x314
[   94.783461]  [<ffffffff8104d07e>] ? worker_thread+0x24d/0x339
[   94.783485]  [<ffffffff8104ce31>] ? cancel_delayed_work_sync+0xa/0xa
[   94.783511]  [<ffffffff81050901>] ? kthread+0xce/0xd6
[   94.783532]  [<ffffffff81050833>] ? kthread_create_on_node+0x162/0x162
[   94.783560]  [<ffffffff8178e8ac>] ? ret_from_fork+0x7c/0xb0
[   94.783584]  [<ffffffff81050833>] ? kthread_create_on_node+0x162/0x162
[   94.783610] ---[ end trace c20f2077c4395952 ]---
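
When triaging logs like the one above, it can help to pull the WARNING header fields (source file, line, function, module) out of dmesg automatically. The following is a minimal sketch, not part of the original report; the regular expression is an assumption based only on the header format shown in this log, and `find_warnings` is a hypothetical helper name.

```python
import re

# Assumed header format, taken from the WARNING line in this report:
# "WARNING: CPU: <n> PID: <n> at <file>:<line> <func>+0x<off>/0x<size> [<module>]()"
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_warnings(dmesg_text):
    """Return one dict per kernel WARNING header found in dmesg_text."""
    hits = []
    for match in WARN_RE.finditer(dmesg_text):
        info = match.groupdict()
        # Convert the numeric fields so callers can compare them directly.
        for key in ("cpu", "pid", "line"):
            info[key] = int(info[key])
        hits.append(info)
    return hits


# The WARNING header from the dmesg in this report.
SAMPLE = (
    "[   94.782567] WARNING: CPU: 3 PID: 1057 at "
    "drivers/gpu/drm/i915/i915_irq.c:2615 "
    "i915_handle_error+0x54/0x5b0 [i915]()"
)

for hit in find_warnings(SAMPLE):
    print(hit["file"], hit["line"], hit["func"], hit["module"])
```

Feeding the full attached dmesg through `find_warnings` would list every such header, which makes it easier to compare logs across the duplicate reports.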


Bisect shows: b8d24a06568368076ebd5a858a011699a97bfa42 is the first bad commit.
commit b8d24a06568368076ebd5a858a011699a97bfa42
Author:     Mika Kuoppala <mika.kuoppala@linux.intel.com>
AuthorDate: Wed Jan 28 17:03:14 2015 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Thu Jan 29 18:03:07 2015 +0100

    drm/i915: Remove nested work in gpu error handling

    Now when we declare gpu errors only through our own dedicated
    hangcheck workqueue there is no need to have a separate workqueue
    for handling the resetting and waking up the clients as the deadlock
    concerns are no more.

    The only exception is i915_debugfs::i915_set_wedged, which triggers
    error handling through process context. However as this is only used through
    test harness it is responsibility for test harness not to introduce hangs
    through both debug interface and through hangcheck mechanism at the same time.

    Remove gpu_error.work and let the hangcheck work do the tasks it used to.

    v2: Add a big warning sign into i915_debugfs::i915_set_wedged (Chris)

    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

==Reproduce steps==
---------------------------- 
1. time ./gem_reset_stats --run-subtest ban-blt
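
Because the failing run only ended after roughly 12 minutes and a manual ^C, the reproduce step can be wrapped in a time limit. This is a minimal sketch under stated assumptions: `run_with_timeout` is a hypothetical helper, and any time budget passed to it is an arbitrary choice, not a value from this report.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, elapsed_seconds).

    status is the process exit code, or the string "timeout" if the
    command did not finish within timeout_s seconds.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child when the timeout expires and
        # then raises TimeoutExpired.
        proc = subprocess.run(cmd, timeout=timeout_s)
        status = proc.returncode
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start
```

For the step above, a call such as `run_with_timeout(["./gem_reset_stats", "--run-subtest", "ban-blt"], 120)` would report "timeout" instead of waiting indefinitely. If the test process is stuck in an uninterruptible kernel wait, the kill may not take effect promptly, so the limit is best-effort.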
Comment 1 Mika Kuoppala 2015-02-03 12:26:48 UTC
Created attachment 113105 [details] [review]
drm/i915: Don't bail out early on i915_handle_error
Comment 2 Mika Kuoppala 2015-02-03 13:25:00 UTC
Created attachment 113109 [details] [review]
drm/i915: Remove bogus locking check in the hangcheck code
Comment 3 Mika Kuoppala 2015-02-03 13:54:14 UTC
*** Bug 88915 has been marked as a duplicate of this bug. ***
Comment 4 Mika Kuoppala 2015-02-03 13:54:27 UTC
*** Bug 88908 has been marked as a duplicate of this bug. ***
Comment 5 Mika Kuoppala 2015-02-03 13:55:52 UTC
*** Bug 88928 has been marked as a duplicate of this bug. ***
Comment 6 Mika Kuoppala 2015-02-03 18:23:40 UTC
*** Bug 88874 has been marked as a duplicate of this bug. ***
Comment 7 lu hua 2015-02-04 03:30:31 UTC
(In reply to Mika Kuoppala from comment #2)
> Created attachment 113109 [details] [review]
> drm/i915: Remove bogus locking check in the hangcheck code

Fixed by this patch.
Comment 8 Jani Nikula 2015-02-04 12:54:39 UTC
*** Bug 88916 has been marked as a duplicate of this bug. ***
Comment 9 Mika Kuoppala 2015-02-09 08:29:51 UTC
commit b838cbee0d6f0234406e435032b2304f3d05515d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Feb 3 11:45:40 2015 +0100

    drm/i915: Remove bogus locking check in the hangcheck code
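
The check named in this commit title corresponds to the WARN_ON(mutex_is_locked(&dev_priv->dev->struct_mutex)) in the dmesg from the description. In the kernel, mutex_is_locked() reports whether anyone holds the mutex, not whether the caller holds it. The sketch below is a user-space analogy only, not kernel code: Python's `threading.Lock.locked()` has the same "held by anyone" meaning, and the thread roles are assumptions made for illustration.

```python
import threading


def checker_sees_lock_held():
    """Return what lock.locked() reports in a thread that never took the lock."""
    lock = threading.Lock()            # stands in for struct_mutex
    holder_ready = threading.Event()
    release = threading.Event()

    def other_thread():
        # Stands in for some other context that legitimately holds the lock.
        with lock:
            holder_ready.set()
            release.wait()

    worker = threading.Thread(target=other_thread)
    worker.start()
    holder_ready.wait()

    # This thread (standing in for the hangcheck worker) holds nothing,
    # yet the "is it locked?" query is true because another thread holds it.
    seen = lock.locked()

    release.set()
    worker.join()
    return seen


print(checker_sees_lock_held())
```

An assertion of the form "the lock must not be locked here" can therefore fire whenever any other thread happens to hold the lock, even though the checking thread has done nothing wrong.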
Comment 10 Ding Heng 2015-02-11 03:03:46 UTC
(In reply to Mika Kuoppala from comment #9)
> commit b838cbee0d6f0234406e435032b2304f3d05515d
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Tue Feb 3 11:45:40 2015 +0100
>
>     drm/i915: Remove bogus locking check in the hangcheck code

I tested this commit on BDW. The test case causes a system hang with this commit. Please refer to the attached dmesg_new file.
Comment 11 Ding Heng 2015-02-11 03:04:14 UTC
Created attachment 113335 [details]
dmesg_new
Comment 12 Chris Wilson 2015-03-10 09:02:14 UTC
That's not this bug. That's an execlists specific bug.
Comment 13 lu hua 2015-03-11 05:44:39 UTC
root@x-bsw01:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_reset_stats --run-subtest ban-blt
IGT-Version: 1.9-g07be8fe (x86_64) (Linux: 4.0.0-rc3_drm-intel-nightly_c09a3b_20150310+ x86_64)
Subtest ban-blt: SUCCESS (11.567s)
Comment 14 Jari Tahvanainen 2017-08-14 08:36:40 UTC
Moving old bug from Verified to Closed.

