Bug 83341

Summary: [HSW]igt/gem_reset_stats sporadically causes poweroff
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: christophe.prigent, intel-gfx-bugs
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: HSW
i915 features: GEM/Other
Attachments:
Description  Flags
dmesg        none
dmesg2       none

Description lu hua 2014-09-01 08:55:59 UTC
Created attachment 105541 [details]
dmesg

==System Environment==
--------------------------
Regression: not sure, unstable
Non-working platforms: HSW

==kernel==
--------------------------
drm-intel-nightly/6e9c5b9d428bb075293ec865ba58f90931187a48
drm-intel-fixes/bbe1c2740d3a25aa1dbe5d842d2ff09cddcdde0a
drm-intel-next-queued/c101c5b635bee54e43d0732473d2f80b2a0e00f4


==Bug detailed description==
It only happens on one HSW machine, with the -nightly, -queued or -fixes kernel.
It happens in a different subtest each time when the test is run for multiple rounds.

[root@x-hsw27 ~]# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Haswell DRAM Controller [8086:0c00] (rev 06)
00:01.0 PCI bridge [0604]: Intel Corporation Haswell PCI Express x16 Controller [8086:0c01] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell Integrated Graphics Controller [8086:0412] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Haswell HD Audio Controller [8086:0c0c] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation Lynx Point USB xHCI Host Controller [8086:8c31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation Lynx Point MEI Controller #1 [8086:8c3a] (rev 04)
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-V [8086:153b] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation Lynx Point USB Enhanced Host Controller #2 [8086:8c2d] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation Lynx Point High Definition Audio Controller [8086:8c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #1 [8086:8c10] (rev d4)
00:1c.1 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #2 [8086:8c12] (rev d4)
00:1c.3 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #4 [8086:8c16] (rev d4)
00:1c.4 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #5 [8086:8c18] (rev d4)
00:1d.0 USB controller [0c03]: Intel Corporation Lynx Point USB Enhanced Host Controller #1 [8086:8c26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation Lynx Point LPC Controller [8086:8c44] (rev 04)
00:1f.2 SATA controller [0106]: Intel Corporation Lynx Point 6-port SATA Controller 1 [AHCI mode] [8086:8c02] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation Lynx Point SMBus Controller [8086:8c22] (rev 04)
03:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 01)
04:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev aa)
05:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev aa)
05:02.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev aa)
05:03.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev aa)
08:00.0 Network controller [0280]: Atheros Communications Inc. AR9462 Wireless Network Adapter [168c:0034] (rev 01)

output:
IGT-Version: 1.7-gd6af004 (x86_64) (Linux: 3.17.0-rc2_drm-intel-fixes_bbe1c2_20140901+ x86_64)
Subtest params: SUCCESS
Subtest params-ctx-render: SUCCESS
Subtest reset-stats-render: SUCCESS
Subtest reset-stats-ctx-render: SUCCESS
Subtest ban-render: SUCCESS
Subtest ban-ctx-render: SUCCESS
Subtest reset-count-render: SUCCESS
Subtest reset-count-ctx-render: SUCCESS
Subtest unrelated-ctx-render: SUCCESS
Subtest close-pending-render: SUCCESS
Subtest close-pending-ctx-render: SUCCESS
Subtest close-pending-fork-render: SUCCESS
Subtest close-pending-fork-reverse-render: SUCCESS
Test requirement not met in function __real_main1088, file gem_reset_stats.c:1128:
Test requirement: !(RING_HAS_CONTEXTS == false)
Subtest params-ctx-blt: SKIP


Reproduce steps:
---------------------------- 
1. ./gem_reset_stats
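
Since the failing subtest differs from round to round, it may help to run the test in a loop and record the last subtest reported in each round. The following is a minimal sketch, not part of the original report: the helper names (last_subtest, run_rounds), the log file name and the round count are all hypothetical. It only assumes the "Subtest <name>: <result>" line format shown in the output above.

```python
import os
import re
import subprocess

# Matches IGT result lines such as "Subtest ban-render: SUCCESS (16.001s)".
SUBTEST_RE = re.compile(r"^Subtest (\S+): (\w+)")


def last_subtest(output):
    """Return (name, result) of the last subtest reported, or None."""
    last = None
    for line in output.splitlines():
        match = SUBTEST_RE.match(line.strip())
        if match:
            last = (match.group(1), match.group(2))
    return last


def run_rounds(binary="./gem_reset_stats", rounds=10, log_path="rounds.log"):
    """Run the test repeatedly, syncing each output line to disk.

    Hypothetical helper: the fsync after every line is there so that the
    log has a chance of surviving a poweroff in the middle of a round.
    If the test buffers its output when writing to a pipe, the log may
    still lag behind what the test has actually done.
    """
    with open(log_path, "a") as log:
        for i in range(1, rounds + 1):
            log.write(f"=== round {i} ===\n")
            proc = subprocess.Popen([binary], stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT, text=True)
            for line in proc.stdout:
                log.write(line)
                log.flush()
                os.fsync(log.fileno())
            proc.wait()
```

After a poweroff, passing the contents of the log file to last_subtest() gives the last subtest that reported a result before the machine went down.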
Comment 1 Mika Kuoppala 2014-11-06 12:30:09 UTC
Is the last line in the log always:
'Stopping rings 0xc0000004' ?
Comment 2 lu hua 2014-11-07 08:05:18 UTC
(In reply to Mika Kuoppala from comment #1)
> Is the last line in the log always:
> 'Stopping rings 0xc0000004' ?


I reproduced the hang twice. Once, the last line was as below:
[  136.904770] [drm:i915_ring_stop_set] Stopping rings 0xc0000004

The other time, it hung as below:
output:
IGT-Version: 1.8-ge34240d (x86_64) (Linux: 3.18.0-rc3_drm-intel-nightly_e6b3eb_20141107+ x86_64)
Subtest params: SUCCESS (0.002s)
Subtest params-ctx-render: SUCCESS (0.001s)
Subtest reset-stats-render: SUCCESS (6.140s)
Subtest reset-stats-ctx-render: SUCCESS (5.996s)
Subtest ban-render: SUCCESS (16.001s)
Comment 3 lu hua 2014-11-07 08:07:14 UTC
Created attachment 109077 [details]
dmesg2

This cycle's log does not show "Stopping rings 0xc0000004":
[   58.788774] gem_reset_stats: starting subtest ban-ctx-render
[   58.789340] [drm:i915_gem_open]
[   58.789919] [drm:i915_gem_open]
[   58.790456] [drm:i915_gem_context_create_ioctl] HW context 1 created
[   58.791006] [drm:i915_gem_context_create_ioctl] HW context 2 created
[   58.791660] [drm:i915_ring_stop_set] Stopping rings 0x80000001
[   60.781089] [drm:intel_print_rc6_info] Enabling RC6 states: RC6 on
[   60.785104] [drm:gen6_enable_rps] Overclocking supported. Max: 1250MHz, Overclock max: 1250MHz
[   64.776617] [drm] stuck on render ring
[   64.777852] [drm] GPU HANG: ecode 0:0xe757ffff, in gem_reset_stats [4130], reason: Ring hung, action: reset
[   64.778547] [drm:i915_error_work_func] resetting chip
[   64.780645] [drm] Simulated gpu hang, resetting stop_rings
[   64.781337] drm/i915: Resetting chip after gpu hang
[   64.782045] [drm:init_status_page] render ring hws offset: 0x001a1000
[   64.784654] [drm:init_status_page] bsd ring hws offset: 0x001c3000
[   64.785417] [drm:init_status_page] blitter ring hws offset: 0x001e4000
[   64.786194] [drm:init_status_page] video enhancement ring hws offset: 0x00205000
[   64.787052] [drm:i915_ring_stop_set] Stopping rings 0x80000001
[   66.778421] [drm:intel_print_rc6_info] Enabling RC6 states: RC6 on
[   66.781311] [drm:gen6_enable_rps] Overclocking supported. Max: 1250MHz, Overclock max: 1250MHz
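
Whether a given dmesg capture ends with a particular "Stopping rings" value can be checked mechanically. The following is a rough sketch under stated assumptions: the function names are hypothetical, and the pattern relies only on the "[drm:i915_ring_stop_set] Stopping rings 0x..." line format seen in the logs in this report. It does not interpret the individual bits of the mask.

```python
import re

# Matches e.g. "[  136.904770] [drm:i915_ring_stop_set] Stopping rings 0xc0000004".
STOP_RE = re.compile(
    r"\[drm:i915_ring_stop_set\] Stopping rings (0x[0-9a-fA-F]+)")


def stop_ring_masks(dmesg):
    """Return every 'Stopping rings' mask in the log, in order, as ints."""
    return [int(m.group(1), 16) for m in STOP_RE.finditer(dmesg)]


def ends_with_stop_rings(dmesg, mask):
    """True if the last non-empty log line is 'Stopping rings <mask>'."""
    lines = [line for line in dmesg.splitlines() if line.strip()]
    if not lines:
        return False
    match = STOP_RE.search(lines[-1])
    return match is not None and int(match.group(1), 16) == mask
```

For the excerpt above, stop_ring_masks() would return 0x80000001 twice, and ends_with_stop_rings(log, 0xc0000004) would be false because the final line is the gen6_enable_rps message.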
Comment 4 Jani Nikula 2015-10-23 10:00:55 UTC
Timeout, closing. Please reopen if the problem persists with latest kernels.
