Bug 93672 - [HSW] GPU HANG: ecode 7:0:0x85d77c1b, in Xorg [23231], reason: Ring hung, action: reset
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Ian Romanick
 
Reported: 2016-01-11 22:42 UTC by Lonni J Friedman
Modified: 2017-02-10 22:38 UTC

i915 platform: HSW
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (3.01 MB, text/plain)
2016-01-11 22:42 UTC, Lonni J Friedman
/var/log/Xorg.0.log with backtrace (26.63 KB, text/plain)
2016-01-11 22:43 UTC, Lonni J Friedman
dmesg (89.52 KB, text/plain)
2016-01-11 22:45 UTC, Lonni J Friedman
Additional crash dump (1.49 MB, text/plain)
2016-02-23 13:04 UTC, johannes.wienke

Description Lonni J Friedman 2016-01-11 22:42:50 UTC
Created attachment 120967
/sys/class/drm/card0/error

Xorg seems to crash randomly.  In dmesg, I see:
[26286.989369] [drm] stuck on render ring
[26286.991670] [drm] GPU HANG: ecode 7:0:0x85d77c1b, in Xorg [23231], reason: Ring hung, action: reset
[26286.991675] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[26286.991677] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[26286.991679] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[26286.991680] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[26286.991682] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[26286.993372] drm/i915: Resetting chip after gpu hang
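
For anyone collecting several of these reports, the hang line above can be split into its fields with a short script. This is only a sketch: the regular expression is written against the single line quoted here, and the field names (ecode, process, pid, reason, action) are labels chosen for illustration, not names used by the kernel.

```python
import re

# Pattern written against the "GPU HANG" line quoted in this report.
# The group names are illustrative labels, not kernel terminology.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a 'GPU HANG' dmesg line, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


line = (
    "[26286.991670] [drm] GPU HANG: ecode 7:0:0x85d77c1b, in Xorg [23231], "
    "reason: Ring hung, action: reset"
)
info = parse_gpu_hang(line)
print(info)
```

Lines that do not contain a hang message return None, so the function can be run over a whole dmesg log to pick out only the hang entries.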


[netllama@netllama ~]$ rpm -qa | grep xorg
xorg-x11-server-Xorg-1.17.4-1.fc22.x86_64
xorg-x11-drv-vesa-2.3.2-20.fc22.x86_64
xorg-x11-font-utils-7.5-28.fc22.x86_64
xorg-x11-drv-intel-2.99.917-15.20150729.fc22.x86_64
xorg-x11-drv-synaptics-1.8.2-2.fc22.x86_64
xorg-x11-fonts-ISO8859-1-75dpi-7.5-14.fc22.noarch
xorg-x11-xauth-1.0.9-2.fc22.x86_64
xorg-x11-drv-vmware-13.0.2-8.20150211git8f0cf7c.fc22.x86_64
xorg-x11-fonts-Type1-7.5-14.fc22.noarch
xorg-x11-drv-ati-7.5.0-3.fc22.x86_64
xorg-x11-drv-wacom-0.29.0-2.fc22.x86_64
xorg-x11-server-common-1.17.4-1.fc22.x86_64
xorg-x11-drv-qxl-0.1.3-2.fc22.x86_64
xorg-x11-proto-devel-7.7-12.fc21.noarch
xorg-x11-drv-fbdev-0.4.3-20.fc22.x86_64
xorg-x11-xinit-1.3.4-8.fc22.x86_64
xorg-x11-fonts-ISO8859-1-100dpi-7.5-14.fc22.noarch
xorg-x11-utils-7.5-19.fc22.x86_64
xorg-x11-drv-nouveau-1.0.11-2.fc22.x86_64
abrt-addon-xorg-2.6.1-6.fc22.x86_64
xorg-x11-server-utils-7.7-17.fc22.x86_64
xorg-x11-xkb-utils-7.7-13.fc22.x86_64
xorg-x11-drv-vmmouse-13.0.99-1.fc22.x86_64
xorg-x11-drv-openchrome-0.3.3-14.fc22.x86_64
xorg-x11-drv-evdev-2.9.2-1.fc22.x86_64
[netllama@netllama ~]$ uname -a
Linux netllama 4.2.8-200.fc22.x86_64 #1 SMP Tue Dec 15 16:50:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Comment 1 Lonni J Friedman 2016-01-11 22:43:48 UTC
Created attachment 120968
/var/log/Xorg.0.log with backtrace
Comment 2 Lonni J Friedman 2016-01-11 22:45:04 UTC
$ lspci
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 09)
00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 1 (rev e4)
00:1c.1 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 2 (rev e4)
00:1c.2 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 3 (rev e4)
00:1c.4 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 5 (rev e4)
00:1c.5 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 6 (rev e4)
00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04)
02:00.0 Multimedia controller: Broadcom Corporation 720p FaceTime HD Camera
03:00.0 Network controller: Broadcom Corporation BCM4360 802.11ac Wireless Network Adapter (rev 03)
04:00.0 SATA controller: Samsung Electronics Co Ltd Apple PCIe SSD (rev 01)
05:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge]
06:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge]
06:03.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge]
06:04.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge]
06:05.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge]
06:06.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge]
07:00.0 System peripheral: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge]
Comment 3 Lonni J Friedman 2016-01-11 22:45:42 UTC
Created attachment 120969
dmesg
Comment 4 johannes.wienke 2016-02-23 13:03:17 UTC
I was hit by the same issue today. I will attach my debug info. Maybe it helps.

Maybe this is also related to https://bugs.freedesktop.org/show_bug.cgi?id=83704
Comment 5 johannes.wienke 2016-02-23 13:04:38 UTC
Created attachment 121917
Additional crash dump
Comment 6 kurt 2016-05-24 02:15:14 UTC
I am also having this problem.  There is nothing in my /sys/class/drm/card0/error file.

abby-HP-Compaq-dc7700-Small-Form-Factor log # lspci
00:00.0 Host bridge: Intel Corporation 82Q963/Q965 Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82Q963/Q965 Integrated Graphics Controller (rev 02)
00:03.0 Communication controller: Intel Corporation 82Q963/Q965 HECI Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network Connection (rev 02)
00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HO (ICH8DO) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801H (ICH8 Family) 4 port SATA Controller [IDE mode] (rev 02)
abby-HP-Compaq-dc7700-Small-Form-Factor log # rpm -qa | grep xorg
abby-HP-Compaq-dc7700-Small-Form-Factor log # rpm -qa | grep -i xorg
abby-HP-Compaq-dc7700-Small-Form-Factor log # 


Not sure what else I'm supposed to add here.
Comment 7 yann 2016-09-15 12:10:21 UTC
(In reply to johannes.wienke from comment #4)
> I was hit by the same issue today. I will attach my debug info. Maybe it
> helps.
> 
> Maybe this is also related to
> https://bugs.freedesktop.org/show_bug.cgi?id=83704

Johannes, your issue is different from both Lonni's and the one in bug 83704. It may be fixed by commit f55ded764ce60f87463e33bfa3a32e2c44715581, which shipped as part of Mesa 10.6.0 in June 2015, so please update your Mesa driver. If the hang still occurs afterwards, please open a new bug.

The GPU crash dump shows that the hang occurred in a render ring batch, with the active head at 0x03a3818c and 0x60020100 (CONSTANT_BUFFER) as IPEHR.

Batch extract (around 0x03a3818c):

0x03a38168:      0x78090005: 3DSTATE_VERTEX_ELEMENTS
0x03a3816c:      0x04400000:    buffer 0: valid, type 0x0040, src offset 0x0000 bytes
0x03a38170:      0x11130000:    (X, Y, Z, 1.0), dst offset 0x00 bytes
0x03a38174:      0x0485000c:    buffer 0: valid, type 0x0085, src offset 0x000c bytes
0x03a38178:      0x11230004:    (X, Y, 0.0, 1.0), dst offset 0x10 bytes
0x03a3817c:      0x04d80014:    buffer 0: valid, type 0x00d8, src offset 0x0014 bytes
0x03a38180:      0x12230008:    (X, 0.0, 0.0, 1.0), dst offset 0x20 bytes
0x03a38184:      0x60020100: CONSTANT_BUFFER: valid
0x03a38188:      0x025b0142:    offset: 0x025b0140, length: 192 bytes
0x03a3818c:      0x7b009004: 3DPRIMITIVE: tri list random
0x03a38190:      0x00000006:    vertex count
0x03a38194:      0x00000000:    start vertex
0x03a38198:      0x00000001:    instance count
0x03a3819c:      0x00000000:    start instance
0x03a381a0:      0x00000000:    index bias
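
A decoded extract like the one above can be searched mechanically for the two values quoted from the dump. The following is a rough sketch, assuming the "address: dword: text" layout shown in this comment; the helper names parse_extract and locate are made up for illustration and are not part of any i915 tool.

```python
import re

# Each extract line looks like:
#   0x03a38184:      0x60020100: CONSTANT_BUFFER: valid
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+):\s+(0x[0-9a-fA-F]+):\s*(.*)$")


def parse_extract(text):
    """Map batch address -> (dword value, decoded text) for each extract line."""
    entries = {}
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            address = int(match.group(1), 16)
            entries[address] = (int(match.group(2), 16), match.group(3))
    return entries


def locate(entries, head, ipehr):
    """Return the decoded text at the active head and every address holding IPEHR."""
    head_text = entries[head][1]
    ipehr_addrs = [addr for addr, (value, _) in sorted(entries.items()) if value == ipehr]
    return head_text, ipehr_addrs


extract = """\
0x03a38184:      0x60020100: CONSTANT_BUFFER: valid
0x03a38188:      0x025b0142:    offset: 0x025b0140, length: 192 bytes
0x03a3818c:      0x7b009004: 3DPRIMITIVE: tri list random
"""
entries = parse_extract(extract)
head_text, ipehr_addrs = locate(entries, 0x03A3818C, 0x60020100)
print(head_text, [hex(addr) for addr in ipehr_addrs])
```

In this extract the dword matching IPEHR sits at 0x03a38184, two dwords before the active head, which itself points at the 3DPRIMITIVE command.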
Comment 8 yann 2016-09-15 12:19:21 UTC
(In reply to Lonni J Friedman from comment #0)
> Created attachment 120967
> /sys/class/drm/card0/error
> 

Lonni, I recommend updating your system (kernel as well as Mesa driver) to benefit from all the fixes and enhancements made since then. Please re-test and confirm whether this issue still occurs.

In parallel, I am assigning this to the Mesa product (please let me know if I am mistaken about this GPU hang).

The error dump shows that the hang occurred in a render ring batch, with the active head at 0x63831560 and 0x7a000002 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0x63831560):

0x63831534:      0x7b000005: 3DPRIMITIVE:
0x63831538:      0x0000000f:    rect list sequential
0x6383153c:      0x00003498:    vertex count
0x63831540:      0x0000497d:    start vertex
0x63831544:      0x00000001:    instance count
0x63831548:      0x00000000:    start instance
0x6383154c:      0x00000000:    index bias
0x63831550:      0x7a000002: PIPE_CONTROL
0x63831554:      0x00100002:    no write, cs stall, stall at scoreboard,
0x63831558:      0x00000000:
0x6383155c:      0x00000000:
0x63831560:      0x782f0000: 3DSTATE_SAMPLER_STATE_POINTERS_PS
0x63831564:      0x000018c0:    dword 1
0x63831568:      0x782a0000: 3DSTATE_BINDING_TABLE_POINTERS_PS
0x6383156c:      0x0000fde0:    dword 1

The dump also reports an error:
ERROR: 0x00000105
    TLB page fault error (GTT entry not valid)
    Invalid page directory entry error
    Cacheline containing a PD was marked as invalid
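
As a quick sanity check on the value above, the set bits of 0x00000105 can be listed with a few lines of Python. This sketch only enumerates bit positions; which bit corresponds to which of the three messages is not stated in this report and is not assumed here.

```python
def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


error = 0x00000105
print(set_bits(error))  # [0, 2, 8]
```

Three bits are set, which is consistent with the three messages printed under the ERROR line.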
Comment 9 Lonni J Friedman 2016-09-15 14:00:04 UTC
Yann: sorry, but I'm unclear what you're asking me to do.  What does "update your system (kernel as well as Mesa driver)" actually mean?  Is there a specific version that you're asking me to try?
Comment 10 yann 2016-09-15 15:27:11 UTC
(In reply to Lonni J Friedman from comment #9)
> Yann: sorry, but I'm unclear what you're asking me to do.  What does "update
> your system (kernel as well as Mesa driver)" actually mean?  Is there a
> specific version that you're asking me to try?

From your kernel version string, it looks like you are using Fedora 22, so the easiest way to start is to update your Fedora distribution to the latest release.

Another way is to update your kernel and Mesa driver yourself, but you may want to try the regular Fedora update first (especially if you are not comfortable building them).

- Kernel repo: https://cgit.freedesktop.org/drm-intel/ (clone drm-intel-nightly, use your current kernel .config (available in the /boot folder) to build, and then install your kernel and its modules)
- Mesa repo: https://cgit.freedesktop.org/mesa/mesa/ (clone mesa-12.0.3; you may follow the guide at http://www.mesa3d.org/install.html to build and install; note that you may need to install some additional dependencies)
Comment 11 Lonni J Friedman 2016-09-15 16:07:59 UTC
In the 8 months since I submitted this bug, I've updated to the 'latest Fedora' many times, and this bug has never gone away.  Currently I'm running the following:

$ uname -a
Linux netllama 4.6.4-201.fc23.x86_64 #1 SMP Tue Jul 12 11:43:59 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ rpm -qa | grep xorg
xorg-x11-utils-7.5-20.fc23.x86_64
xorg-x11-drv-ati-7.6.1-3.20160215gitd41fccc.fc23.x86_64
xorg-x11-drv-openchrome-0.5.0-1.fc23.x86_64
xorg-x11-server-utils-7.7-17.fc23.x86_64
xorg-x11-proto-devel-7.7-18.fc23.noarch
xorg-x11-xauth-1.0.9-4.fc23.x86_64
xorg-x11-drv-vesa-2.3.2-23.fc23.x86_64
xorg-x11-fonts-ISO8859-1-75dpi-7.5-15.fc23.noarch
xorg-x11-drv-nouveau-1.0.12-3.fc23.x86_64
xorg-x11-xkb-utils-7.7-16.fc23.x86_64
xorg-x11-drv-vmmouse-13.1.0-2.fc23.x86_64
xorg-x11-drv-vmware-13.0.2-10.20150211git8f0cf7c.fc23.x86_64
xorg-x11-fonts-Type1-7.5-15.fc23.noarch
xorg-x11-drv-fbdev-0.4.3-23.fc23.x86_64
xorg-x11-xinit-1.3.4-10.fc23.x86_64
xorg-x11-server-Xorg-1.18.3-3.fc23.x86_64
xorg-x11-drv-qxl-0.1.4-6.fc23.x86_64
xorg-x11-drv-wacom-0.30.0-4.fc23.x86_64
xorg-x11-font-utils-7.5-29.fc23.x86_64
xorg-x11-fonts-ISO8859-1-100dpi-7.5-15.fc23.noarch
xorg-x11-drv-evdev-2.9.99-2.20150807git66c997886.fc23.x86_64
xorg-x11-drv-intel-2.99.917-19.20151206.fc23.x86_64
xorg-x11-drv-synaptics-1.8.3-1.fc23.x86_64
xorg-x11-server-common-1.18.3-3.fc23.x86_64
Comment 12 yann 2016-09-15 16:16:05 UTC
Can you also run "rpm -qa | grep mesa"? It will be helpful for the Mesa team.
Comment 13 Lonni J Friedman 2016-09-15 16:19:11 UTC
$ rpm -qa | grep mesa
mesa-libEGL-11.1.0-4.20151218.fc23.i686
mesa-vdpau-drivers-11.1.0-4.20151218.fc23.x86_64
mesa-libGLES-devel-11.1.0-4.20151218.fc23.x86_64
mesa-libxatracker-11.1.0-4.20151218.fc23.x86_64
mesa-libGLU-9.0.0-9.fc23.x86_64
mesa-libgbm-devel-11.1.0-4.20151218.fc23.x86_64
mesa-libglapi-11.1.0-4.20151218.fc23.i686
mesa-libglapi-11.1.0-4.20151218.fc23.x86_64
mesa-libGL-devel-11.1.0-4.20151218.fc23.x86_64
mesa-libgbm-11.1.0-4.20151218.fc23.i686
mesa-libGL-11.1.0-4.20151218.fc23.x86_64
mesa-libGLU-devel-9.0.0-9.fc23.x86_64
mesa-libEGL-devel-11.1.0-4.20151218.fc23.x86_64
mesa-libGLES-11.1.0-4.20151218.fc23.x86_64
mesa-libGLU-9.0.0-9.fc23.i686
mesa-dri-drivers-11.1.0-4.20151218.fc23.x86_64
mesa-libgbm-11.1.0-4.20151218.fc23.x86_64
mesa-libwayland-egl-devel-11.1.0-4.20151218.fc23.x86_64
mesa-libwayland-egl-11.1.0-4.20151218.fc23.x86_64
mesa-libEGL-11.1.0-4.20151218.fc23.x86_64
mesa-libGL-11.1.0-4.20151218.fc23.i686
mesa-filesystem-11.1.0-4.20151218.fc23.x86_64
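
To compare a listing like this against a suggested release (mesa-12.0.3 was named in comment 10), the version can be pulled out of the package name. The sketch below assumes the name-version-release.arch layout seen in this listing and is not a full RPM name parser; package_version is a hypothetical helper.

```python
import re

# Matches names such as "mesa-dri-drivers-11.1.0-4.20151218.fc23.x86_64".
# Assumes name-version-release.arch as seen in the listing; not a full RPM parser.
PKG_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*)-(?P<rest>[^-]+)$")


def package_version(line):
    """Return (package name, version tuple) or None if the line does not match."""
    match = PKG_RE.match(line.strip())
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version


name, installed = package_version("mesa-dri-drivers-11.1.0-4.20151218.fc23.x86_64")
suggested = (12, 0, 3)  # release named in comment 10
print(name, installed, installed >= suggested)
```

Note that mesa-libGLU carries its own version (9.0.0 in this listing), so a check like this is better applied to mesa-dri-drivers than to every package matching "mesa".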
Comment 14 yann 2016-09-15 16:32:34 UTC
When you hit this hang again, please attach the new GPU crash dump as well as the kernel log (i.e. dmesg).
Comment 15 yann 2016-11-04 15:37:06 UTC
Please test a new version of Mesa (12 or 13) and mark as REOPENED
if you can reproduce and RESOLVED/* if you cannot reproduce.
Comment 16 Annie 2017-02-10 22:38:47 UTC
Dear Reporter,

This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested.

Thank you.

