Bug 90384 - [drm] [SNB] Render ring stuck on waiting vblank
Summary: [drm] [SNB] Render ring stuck on waiting vblank
Status: CLOSED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2015-05-09 23:59 UTC by Martin Mokrejs
Modified: 2017-03-03 16:43 UTC
i915 platform: SNB
i915 features: GPU hang


Attachments
dmesg-3.19.0 (95.49 KB, text/plain)
2015-05-10 00:02 UTC, Martin Mokrejs
/sys/class/drm/card0/error (2.04 MB, text/plain)
2015-05-10 00:04 UTC, Martin Mokrejs
Xorg.0.log (25.54 KB, text/plain)
2015-05-10 00:06 UTC, Martin Mokrejs
.config (96.17 KB, text/plain)
2015-05-10 00:06 UTC, Martin Mokrejs
.config-3.10.12 (no GPU HANGs) (88.56 KB, text/plain)
2015-05-11 09:01 UTC, Martin Mokrejs

Description Martin Mokrejs 2015-05-09 23:59:24 UTC
[ 9255.606859] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 9255.606923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9255.606925] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 9255.606927] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 9255.606928] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 9255.606930] [drm] GPU crash dump saved to /sys/class/drm/card0/error


$ uname -a
Linux vostro 3.19.0-default-pciehp #2 SMP Fri Feb 20 14:36:49 CET 2015 x86_64 Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz GenuineIntel GNU/Linux
$
Comment 1 Martin Mokrejs 2015-05-10 00:02:25 UTC
Created attachment 115662 [details]
dmesg-3.19.0

With the CPU under load, the external LCD screen (connected via HDMI) now blinks. I did not have this issue with 3.10.72. ;-)

I think the eSATA errors are also related to this kernel. The eSATA port is combined with a USB 2.0 port and handled by the same Intel chip. This is a Dell Vostro 3550 with the A12 BIOS (latest).
Comment 2 Martin Mokrejs 2015-05-10 00:04:00 UTC
Created attachment 115663 [details]
/sys/class/drm/card0/error
Comment 3 Martin Mokrejs 2015-05-10 00:06:02 UTC
Created attachment 115664 [details]
Xorg.0.log

Just to provide more details (probably irrelevant to the kernel issue).
Comment 4 Martin Mokrejs 2015-05-10 00:06:57 UTC
Created attachment 115665 [details]
.config
Comment 5 Mika Kuoppala 2015-05-11 08:07:50 UTC
0x01214024:      0x01800100: MI_WAIT_FOR_EVENT, plane B scan line wait
0x01214028:      0x11000001: MI_LOAD_REGISTER_IMM
0x0121402c:      0x00002050:    dword 1
0x01214030:      0x00010000:    dword 2
0x01214034:      0x11000001: MI_LOAD_REGISTER_IMM
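
For readers who want to check the decode above by hand, here is a minimal sketch that splits an MI command header dword into its fields. It assumes the layout implied by the i915 `MI_INSTR(opcode, flags)` macro (client in bits 31:29, opcode in bits 28:23) and only knows the two opcodes that appear in this dump; the function name `decode_mi_header` is made up for illustration and is not part of any tool. The meaning of the individual low bits (for example which bit selects the plane B scan line wait) is deliberately not interpreted here.

```python
# Sketch: split an MI command header dword into client, opcode and low bits.
# Assumed layout (i915 MI_INSTR macro): bits 31:29 client, bits 28:23 opcode.
MI_OPCODES = {
    0x03: "MI_WAIT_FOR_EVENT",
    0x22: "MI_LOAD_REGISTER_IMM",
}


def decode_mi_header(dword):
    """Return (name, opcode, low_bits) for a 32-bit command header."""
    client = (dword >> 29) & 0x7      # 0 means an MI (memory interface) command
    opcode = (dword >> 23) & 0x3F
    low_bits = dword & 0x7FFFFF       # flags or length, depending on the command
    if client != 0:
        name = "non-MI"
    else:
        name = MI_OPCODES.get(opcode, "unknown")
    return name, opcode, low_bits


# The two header dwords taken from the dump above.
for header in (0x01800100, 0x11000001):
    name, opcode, low_bits = decode_mi_header(header)
    print(f"0x{header:08x}: {name} (opcode 0x{opcode:02x}, low bits 0x{low_bits:x})")
```

In the dump, the two dwords after the first MI_LOAD_REGISTER_IMM header (0x00002050 and 0x00010000) are shown as its operands.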

Missed vblank was kicked.

Is this happening often, or was it a one-time occurrence?
Comment 6 Martin Mokrejs 2015-05-11 08:54:02 UTC
For various reasons I have so far mostly used 3.10.72. I have now tried 3.19.0, but it could be that I did NOT use all of its features in the past, especially as the DRI/DRM code keeps changing and I did not pay attention to all the new features, since this laptop has only the built-in Intel graphics. So maybe I did not have some feature enabled in the 3.10.x kernels.

I suspect an issue in the 3.19.0 kernel. In the distant past I did observe that the external LCD (via HDMI) blinks when the CPU is overheated. Also, the keyboard backlight turns on and off on its own (the ACPI implementation in Linux is not ideal). Closing the laptop lid also makes the externally connected LCD (via HDMI) blink (this does not happen with MS Win7). I conclude that some ACPI events are not interpreted well in Linux.

I use the eSATA port a lot (it is actually a combined eSATA/USB 2.0 socket), which is hooked to the same SandyBridge chip as the HDMI socket and another USB 2.0 socket. I continually run CPU-intensive tasks while accessing the eSATA drive. That has worked quite well so far in 3.10.x.

The weird thing with 3.19.0 is that with an external eSATA drive connected and the CPU fully loaded (2 threads, HT disabled on my physical 2-core CPU) I observe RESETS of the SATA connection (every few seconds, so that I think it killed the drive, as its heads were shaking horribly). Each SATA reset coincides with a blink of the HDMI LCD screen and with a reset of the USB 2.0 connection (the same device is removed and re-enumerated; it is actually a mouse connected to the socket). So with 3.19.0 I just cannot use my eSATA port if the CPU is loaded (it works fine until I heat up the CPU).

 
# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
           +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
           +-16.0  Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
           +-1a.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
           +-1b.0  Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
           +-1c.0-[03-04]--
           +-1c.1-[05-06]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-1c.3-[09-0a]----00.0  Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak]
           +-1c.4-[0b-0c]----00.0  Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller
           +-1c.7-[11-16]----00.0  Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller
           +-1d.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
           +-1f.0  Intel Corporation HM67 Express Chipset Family LPC Controller
           +-1f.2  Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller
           \-1f.3  Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller
#


I vaguely remember observing with past kernels that the eSATA resets trigger a USB 2.0 link reset (based on my report, at about kernel 3.4 the SATA reset timeout was increased, because 3.5" drives do not spin up quickly enough and the kernel was too eager to decrease the SATA speed and then even reset), but I never investigated that, and the SATA resets simply did not happen with 3.10.12 and 3.10.72. Here is the original report: http://comments.gmane.org/gmane.linux.usb.general/61393 You can see that the USB 2.0 port was really being reset in the past as well, in conjunction with eSATA resets. At that time it seemed to be an external USB hub issue, but ... it happens even with just a mouse.



However, the 'GPU HANG' happened right after bootup, and there was one more in the following days:

# dmesg | grep GPU
[ 9255.606859] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 9255.606923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9255.606930] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[76352.762263] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
#
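
To make the two hang events easier to compare, here is a small sketch that pulls the `GPU HANG` lines out of dmesg text and converts the kernel timestamps (seconds since boot) into hours of uptime. The regular expression is written only against the message format shown in this report and is an assumption about other kernels; `parse_gpu_hangs` is a hypothetical helper name. The `ecode` field is kept as an opaque string; its leading 6 appears to match the hardware generation (SNB), but that reading is not confirmed in this thread.

```python
import re

# Pattern written against the "[drm] GPU HANG" lines quoted in this report.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[^,\s]+), reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in the given dmesg text."""
    hangs = []
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append({
                "uptime_hours": float(match.group("ts")) / 3600.0,
                "ecode": match.group("ecode"),
                "reason": match.group("reason"),
                "action": match.group("action"),
            })
    return hangs


# The grep output from this comment, used as sample input.
SAMPLE = """\
[ 9255.606859] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 9255.606923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9255.606930] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[76352.762263] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
"""

for hang in parse_gpu_hangs(SAMPLE):
    print(f"{hang['uptime_hours']:.2f} h uptime: {hang['reason']} ({hang['action']})")
```

On the sample above this finds two hangs with the same ecode, reason and action, roughly 2.6 and 21.2 hours after boot.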


The screen blinking itself does not cause the GPU hang. When the CPU is loaded and the eSATA drive is connected, I sometimes get maybe 10 screen blinks in a minute. But there were only two GPU HANGs, as you can see in the dmesg output above. There were no further messages related to the second 'GPU HANG', and the card0/error file still contains the same dump (compared by md5sum).
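
The md5sum comparison mentioned above can be sketched as follows, for anyone who wants to check whether a later hang replaced the saved error state. This is only an illustration: `md5_of` and `same_dump` are hypothetical helper names, and the file paths in the usage comment are placeholders, not files from this report.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dump(path_a, path_b):
    """True if both files have identical MD5 digests."""
    return md5_of(path_a) == md5_of(path_b)


# Usage (placeholder paths): compare a copy saved after the first hang
# with the current contents of the error file.
#   same_dump("error-after-first-hang.txt", "/sys/class/drm/card0/error")
```

If the two digests match, as they did here, the second hang did not produce a new dump.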




BTW: This is my 4th mainboard in the laptop and my second CPU. Dell tried to "fix" my issues by replacing the mainboard every time, but I really had a bad CPU (HDMI hardly ever gave a signal). There is some bad design in the SandyBridge. But from my past reading of https://blogs.intel.com/technology/2011/01/chipset_design_flaw/ it seemed to me that I have the eSATA port on a different PCIe port, so it should not suffer from the 'degrade over time' issue.
Comment 7 Martin Mokrejs 2015-05-11 09:01:16 UTC
Created attachment 115695 [details]
.config-3.10.12 (no GPU HANGs)

Just in case I did not have some DRI/DRM code enabled in the past kernels.
Comment 8 yann 2017-02-24 08:14:10 UTC
We seem to have neglected the bug a bit, apologies.

Martin Mokrejs, improvements have since been pushed to the kernel that should benefit your system, so please re-test with the latest kernel and mark the bug as REOPENED if you can reproduce the issue (and attach a fresh GPU error dump & kernel log), or as RESOLVED/* if you cannot.
Comment 9 yann 2017-03-03 16:43:29 UTC
(In reply to yann from comment #8)
> We seem to have neglected the bug a bit, apologies.
> 
> Martin Mokrejs, since There were improvements pushed in kernel that will
> benefit to your system, so please re-test with latest kernel and mark as
> REOPENED if you can reproduce (and attach fresh gpu error dump & kernel log)
> and RESOLVED/* if you cannot reproduce.

Timeout. Assuming that this is not occurring anymore. If this issue happens again, re-test with the latest kernel and REOPEN if you can reproduce it (and attach a fresh GPU error dump & kernel log).

