Bug 53626

Summary: [SNB regression rc6]Oglc depth-stencil(basic.read.ds) causes system hang on X mode
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: ben, chris, daniel, jbarnes, mikhail.v.gavrilov, xunx.fang
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
bisect log none

Description lu hua 2012-08-17 06:32:16 UTC
System Environment:
--------------------------
Arch:           i386
Platform:       Sandybridge
Libdrm:	(master)libdrm-2.4.38-3-g3163cfe4db925429760407e77140e2d595338bc2
Mesa:	(master)1597176f7090eea73f41b3114ae2a02a50ac7a12
Xserver:(master)xorg-server-1.12.99.904-7-gad5fe2d9614959b68bf71e23abf7e5abac9c2734
Xf86_video_intel:(master)2.20.3-45-g94871944a0e1351273d6029df7bf0300f31a8571
Libva:	(staging)f12f80371fb534e6bbf248586b3c17c298a31f4e
Libva_intel_driver:(staging)82fa52510a37ab645daaa3bb7091ff5096a20d0b
Kernel:	(drm-intel-next-queued) 3d21b86ca4bd5350c9e095db7d874c5e499f76d6

Bug detailed description:
-------------------------
It happens on the -queued kernel; it does not happen on the -fixes kernel.

The last known good commit: ab3951eb74e7c33a2f5b7b64d72e82f1eea61571
The last known bad commit: 3d21b86ca4bd5350c9e095db7d874c5e499f76d6

netconsole:
[  908.117758] console [netcon0] enabled
[  908.117775] netconsole: network logging started
[  940.199761] [drm:__gen6_gt_force_wake_get] *ERROR* Force wake wait timed out
[  940.243207] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out

Reproduce steps:
----------------
1. start X
2. ./oglconform -z -suite all -v 2 -test depth-stencil basic.read.ds
Comment 1 Daniel Vetter 2012-08-22 08:59:37 UTC
Is this still an issue on latest -fixes/-queued?
Comment 2 lu hua 2012-08-23 07:15:27 UTC
It still happens on -queued kernel cee4ab0284fac1c6da5997802cf2d826898da316.
Comment 3 Daniel Vetter 2012-08-28 09:14:54 UTC
I guess we need the bisect for this one, no idea what blows up here. But since this is a hang, please make sure first that the baseline really is good.
Comment 4 lu hua 2012-08-31 07:48:41 UTC
Bisecting: a merge base must be tested
[6b16351acbd415e66ba16bf7d473ece1574cf0bc] Linux 3.5-rc4

The merge base 6b16351acbd415e66ba16bf7d473ece1574cf0bc is bad.
This means the bug has been fixed between 6b16351acbd415e66ba16bf7d473ece1574cf0bc and [20d5a540e55a29daeef12706f9ee73baf5641c16].
Comment 5 lu hua 2012-08-31 07:49:21 UTC
Created attachment 66386 [details]
bisect log
Comment 6 Chris Wilson 2012-09-15 09:30:21 UTC
Worksforme on dinq with the i965_dri.so blorp fixes.
Comment 7 lu hua 2012-09-17 07:15:16 UTC
It still happens on -queued kernel (commit:a0db295dcd040cf842567bc).
Comment 8 Chris Wilson 2012-10-17 21:49:54 UTC
This is another bug that should be retested with drm-intel-testing in case it is the GT_MODE WiZ w/a.
Comment 9 lu hua 2012-11-13 08:49:40 UTC
It still happens on the latest -fixes branch (commit: 4a8dece21eea0ad6aca442272673d48693cd93b4) and the -queued branch (commit: b343cd948b193b414c541755b2fdd40c7f01cdd0).
It happens in plain X mode, but works well under gnome-session.

Many Oglconform_31 cases also have this issue.

I ran the following cases; they also have this issue:
primbuff(advanced.trianglestrip.hivtx)
fbo(mrt.renderToRbFragShaderRbSizes)
snorm-textures(advanced.mipmap.manual.getTex)

netconsole:
[  230.226514] netconsole: network logging started
[  253.370594] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake to ack request.
[  253.372039] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out
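
The netconsole excerpts in this report use two different wordings for the forcewake timeout, so a grep for one exact message can miss the other. A minimal sketch, assuming the log has been saved as plain text lines in the format shown above, that picks out every [drm:...] *ERROR* line regardless of wording (the function name find_drm_errors and the sample list are illustrative, not part of any tool):

```python
import re

# Matches kernel log lines of the form seen in this report, e.g.
# [  940.199761] [drm:__gen6_gt_force_wake_get] *ERROR* Force wake wait timed out
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+)\]\s+\*ERROR\*\s+(?P<msg>.*)$"
)


def find_drm_errors(lines):
    """Return (timestamp, function, message) for each [drm:...] *ERROR* line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            hits.append(
                (float(match.group("ts")), match.group("func"), match.group("msg"))
            )
    return hits


# Sample lines copied from the netconsole output above.
sample = [
    "[  230.226514] netconsole: network logging started",
    "[  253.370594] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake to ack request.",
    "[  253.372039] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out",
]

for ts, func, msg in find_drm_errors(sample):
    print(f"{ts:.6f} {func}: {msg}")
```

Matching on the function name in the [drm:...] tag rather than on the message text keeps the filter stable across kernels that reword the error.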
Comment 10 Daniel Vetter 2012-11-13 10:14:12 UTC
Hm, looks like we've never checked the usual snb suspects:
- Does disabling rc6 work around the issue?

- Does disabling hw contexts work around the issue? The simplest way is with the below patch snippet:

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0e510df..f04d90f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -244,7 +244,7 @@ void i915_gem_context_init(struct drm_device *dev)
        struct drm_i915_private *dev_priv = dev->dev_private;
        uint32_t ctx_size;
 
-       if (!HAS_HW_CONTEXTS(dev)) {
+       if (1) {
                dev_priv->hw_contexts_disabled = true;
                return;
        }
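
When testing the rc6 question above, it helps to confirm that rc6 is really off on the running kernel before drawing conclusions from the test result. A rough sketch, assuming the i915 driver exposes an rc6_enable file under the card's sysfs power directory and that a nonzero value means some rc6 state is enabled; the path, the card index, and the hexadecimal parsing are assumptions and may not hold on every kernel:

```python
from pathlib import Path

# Assumed sysfs location; the card index and the presence of this file
# depend on the kernel version and the system.
RC6_ENABLE_PATH = "/sys/class/drm/card0/power/rc6_enable"


def rc6_enabled(path=RC6_ENABLE_PATH):
    """Return True if the file holds a nonzero value, False if zero,
    and None if the file is missing, unreadable, or not a number."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None
    try:
        # Assumption: the value is a small bitmask printed in hexadecimal.
        return int(text, 16) != 0
    except ValueError:
        return None


if __name__ == "__main__":
    state = rc6_enabled()
    if state is None:
        print("rc6 state unknown (sysfs file not found or unreadable)")
    else:
        print("rc6 enabled" if state else "rc6 disabled")
```

Returning None instead of raising keeps the check usable on machines where the file does not exist, so a missing file is not mistaken for "rc6 disabled".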
Comment 11 lu hua 2012-11-14 07:01:42 UTC
With RC6 disabled, it works well.
With hw contexts disabled, the issue still exists.
Comment 12 mikhail.v.gavrilov 2012-12-09 11:07:00 UTC
I also see this in my dmesg output:

[60458.774119] [drm:__gen6_gt_force_wake_get] *ERROR* Force wake wait timed out
[61929.498381] [drm:__gen6_gt_force_wake_get] *ERROR* Force wake wait timed out


$ lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 01)
03:01.0 Multimedia audio controller: VIA Technologies Inc. VT1720/24 [Envy24PT/HT] PCI Multi-Channel Audio Controller (rev 01)
04:00.0 Ethernet controller: Atheros Communications Inc. AR8151 v2.0 Gigabit Ethernet (rev c0)
05:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
06:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
Comment 13 Chris Wilson 2012-12-09 14:03:20 UTC
If the conclusion is that it is not one of the many bugs mesa introduced with depth-stencils, then we must try it as a duplicate of the rc6 dropped mmio writes.

*** This bug has been marked as a duplicate of bug 50545 ***
Comment 14 Elizabeth 2017-10-06 14:48:36 UTC
Closing old verified.
