Bug 87429 - [SNB/IVB bisected]igt/gem_reset_stats/ban-ctx-render causes system hang
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Imre Deak
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2014-12-18 03:33 UTC by Guo Jinxian
Modified: 2016-12-13 08:51 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
fix rps interrupt disabling (3.27 KB, patch)
2014-12-18 17:05 UTC, Imre Deak

Description Guo Jinxian 2014-12-18 03:33:46 UTC
==System Environment==
--------------------------
Regression: No (only fails on the nightly branch).

Non-working platforms: SNB

==kernel==
--------------------------
origin/drm-intel-nightly: 2014_12_17(fails)
origin/drm-intel-next-queued:e2c719b75c8c186deb86570d8466df9e9eff919b(works)
    drm/i915: tame the chattermouth (v2)
origin/drm-intel-fixes: b0616c5306b342ceca07044dbc4f917d95c4f825(works)
    drm/i915: Unlock panel even when LVDS is disabled
origin/drm-next: 4e0cd68115620bc3236ff4e58e4c073948629b41(works)
    drm: sti: fix module compilation issue
origin/drm-fixes: 3e3282c0a23d8eb9438dcf4ac908a5eb48c7038b(works)
    Merge tag 'drm-intel-fixes-2014-12-04' of git://anongit.freedesktop.org/drm-intel into drm-fixes

==Bug detailed description==
-----------------------------
igt/gem_reset_stats/ban-ctx-render causes a system hang. The failure only occurs on the nightly branch. Because the system hangs, dmesg could not be captured.

==Reproduce steps==
---------------------------- 
1. ./gem_reset_stats --run-subtest ban-render
Comment 1 Ander Conselvan de Oliveira 2014-12-18 12:23:36 UTC
I bisected this to

commit dbea3cea69508e9d548ed4a6be13de35492e5d15
Author: Imre Deak <imre.deak@intel.com>
Date:   Mon Dec 15 18:59:28 2014 +0200

    drm/i915: sanitize RPS resetting during GPU reset

Reverting seems to prevent the system hang.
Comment 2 Jani Nikula 2014-12-18 12:42:25 UTC
Imre, any ideas before I queue a revert?
Comment 3 Mika Kuoppala 2014-12-18 16:51:24 UTC
This test hangs the GPU twice in quick succession, so it also resets the GPU twice in quick succession. Perhaps there is an ordering issue in the delayed enabling of RPS.
Comment 4 Imre Deak 2014-12-18 17:05:42 UTC
Created attachment 110995
fix rps interrupt disabling

Could you try the attached patch?
Comment 5 Chris Wilson 2014-12-18 17:09:38 UTC
The test is also prone to any locking issue between hangcheck/capture and render.
Comment 6 lu hua 2014-12-19 07:15:00 UTC
(In reply to Imre Deak from comment #4)
> Created attachment 110995
> fix rps interrupt disabling
> 
> Could you try the attached patch?

Fixed by this patch.
Comment 7 lu hua 2014-12-22 01:36:48 UTC
Running ./gem_reset_stats --run-subtest ban-render also hangs the system.
Output:
IGT-Version: 1.9-gc537cdb (x86_64) (Linux: 3.18.0_drm-intel-nightly_4fa231_20141221+ x86_64)

dmesg:
[  100.747004] console [netcon0] enabled
[  100.748232] netconsole: network logging started
[  100.755058] console [netcon0] disabled
[  100.773377] console [netcon0] enabled
[  100.774640] netconsole: network logging started
[  103.062178] NET: Registered protocol family 10
[  120.331211] gem_reset_stats: executing
[  120.332608] [drm:i915_gem_open]
[  120.334171] [drm:i915_gem_open]
[  120.335400] [drm:i915_gem_context_create_ioctl] HW context 1 created
[  120.336645] [drm:i915_gem_context_destroy_ioctl] HW context 1 destroyed
[  120.337881] [drm:i915_gem_open]
[  120.339167] gem_reset_stats: starting subtest ban-render
[  120.340412] [drm:i915_gem_open]
[  120.341937] [drm:i915_gem_open]
[  120.343143] [drm:i915_gem_open]
[  120.344621] [drm:i915_ring_stop_set] Stopping rings 0x80000001
[  130.802652] [drm] stuck on render ring
[  130.804371] [drm] GPU HANG: ecode 6:0:0xe77fffff, in gem_reset_stats [4232], reason: Ring hung, action: reset
[  130.805614] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  130.806883] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  130.808172] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  130.809527] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  130.810894] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  130.812300] [drm:i915_error_work_func] resetting chip
[  130.815362] [drm] Simulated gpu hang, resetting stop_rings
[  130.816704] drm/i915: Resetting chip after gpu hang
[  130.818088] [drm:ironlake_update_primary_plane] Writing base 001F3000 00000000 0 0 6400
[  130.819566] [drm:i915_ring_stop_set] Stopping rings 0x80000001
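
To make the sequence in the log above easier to see, here is a minimal Python sketch that pulls the ring stops, the hang report and the reset out of dmesg text. It is only an illustration: the helper names (parse_dmesg, summarize) and the SAMPLE constant are hypothetical and not part of igt or the kernel, and SAMPLE is a subset of the lines quoted above.

```python
import re

# Matches a kernel log line such as "[  130.802652] [drm] stuck on render ring".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Return (timestamp_seconds, message) pairs for well-formed lines."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def summarize(entries):
    """Collect timestamps of ring stops, hang reports and chip resets."""
    stops = [t for t, msg in entries if "i915_ring_stop_set" in msg]
    hangs = [t for t, msg in entries if "GPU HANG" in msg]
    resets = [t for t, msg in entries if "resetting chip" in msg]
    last = entries[-1] if entries else None
    return {"stops": stops, "hangs": hangs, "resets": resets, "last": last}


# Hypothetical sample: a subset of the lines from the log in this comment.
SAMPLE = """\
[  120.344621] [drm:i915_ring_stop_set] Stopping rings 0x80000001
[  130.802652] [drm] stuck on render ring
[  130.804371] [drm] GPU HANG: ecode 6:0:0xe77fffff, in gem_reset_stats [4232], reason: Ring hung, action: reset
[  130.812300] [drm:i915_error_work_func] resetting chip
[  130.819566] [drm:i915_ring_stop_set] Stopping rings 0x80000001
"""

if __name__ == "__main__":
    print(summarize(parse_dmesg(SAMPLE)))
```

Applied to the excerpt, this finds two ring stops but only one hang report and one reset, and the last captured line is the second ring stop. That matches what the log shows: output ends right after the test stops the rings for the second time.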
Comment 8 lu hua 2014-12-22 06:09:10 UTC
Running ./gem_reset_stats --run-subtest ban-blt on IVB also causes a system hang.
Comment 9 Jani Nikula 2014-12-30 13:04:05 UTC
Fixed by

commit 917d45309f0ee13da6eb9ea215f2c7f19ac3817f
Author: Imre Deak <imre.deak@intel.com>
Date:   Fri Dec 19 19:33:26 2014 +0200

    drm/i915: fix HW lockup due to missing RPS IRQ workaround on GEN6

in drm-intel-fixes.
Comment 10 lu hua 2015-01-04 03:03:43 UTC
Verified. Fixed.
Comment 11 Jari Tahvanainen 2016-12-13 08:51:39 UTC
Closing verified+fixed. See commit 479459a8

