Bug 104423 - [kbl] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [737]
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-31 14:28 UTC by Petr Manek
Modified: 2018-04-11 16:07 UTC

See Also:
i915 platform: I915G
i915 features: display/eDP


Attachments
Appropriate part of dmesg (926 bytes, text/plain)
2017-12-31 14:28 UTC, Petr Manek
Contents of /sys/class/drm/card0/error (48.88 KB, text/plain)
2017-12-31 14:29 UTC, Petr Manek
Contents of /sys/class/drm/card0/error for the 2nd hang (63.76 KB, text/plain)
2018-01-06 12:57 UTC, Petr Manek
Appropriate part of dmesg for the 2nd hang (926 bytes, text/plain)
2018-01-06 12:58 UTC, Petr Manek
Contents of /sys/class/drm/card0/error for the 3rd hang (62.55 KB, text/plain)
2018-01-10 14:26 UTC, Petr Manek
Appropriate part of dmesg for the 3rd hang (915 bytes, text/plain)
2018-01-10 14:27 UTC, Petr Manek
Contents of /sys/class/drm/card0/error for the 4th hang (47.07 KB, text/plain)
2018-01-17 22:34 UTC, Petr Manek
Appropriate part of dmesg for the 4th hang (915 bytes, text/plain)
2018-01-17 22:35 UTC, Petr Manek
Contents of /sys/class/drm/card0/error for the 5th hang (47.08 KB, text/plain)
2018-01-18 00:54 UTC, Petr Manek
Appropriate part of dmesg for the 5th hang (321 bytes, text/plain)
2018-01-18 00:55 UTC, Petr Manek
Appropriate part of dmesg for Rainbert's 1st hang (1.23 KB, text/plain)
2018-01-20 00:25 UTC, Rainbert
Contents of /sys/class/drm/card0/error for Rainbert's 1st hang (48.68 KB, text/plain)
2018-01-20 00:26 UTC, Rainbert

Description Petr Manek 2017-12-31 14:28:40 UTC
Created attachment 136460
Appropriate part of dmesg

I get this hang once or twice a day on a brand-new Arch Linux install. When it happens, the Xorg display freezes until I am kicked out of my session after about 30 seconds.

I don't know yet how to reproduce it. The hang usually happens after several hours of usage with several X windows open.

Some info about my system:

:: Kernel: 4.14.8-1-ARCH
:: Distro: archlinux
:: Model: LENOVO ThinkPad T570 20H90052MC
:: BIOS: N1VET37W (1.27)

The crash report and the relevant dmesg output are attached.
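
Note: the hang signature quoted in the summary can be pulled out of dmesg text mechanically, which helps when comparing reports. The following is only a minimal sketch. The function names are hypothetical, and reading the three ecode fields as hardware generation, engine id, and an error hash is an assumption about the i915 message format, not something this report states.

```python
import re

# Matches the hang line as it appears in this report, for example:
#   GPU HANG: ecode 9:0:0x85dffffb, in Xorg [737]
# Assumption: the three ecode fields are generation, engine id, error hash.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>\d+):(?P<ecode>0x[0-9a-fA-F]+)"
    r"(?:, in (?P<process>\S+) \[(?P<pid>\d+)\])?"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' line as a dict, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    pid = match.group("pid")
    return {
        "gen": int(match.group("gen")),
        "engine": int(match.group("engine")),
        "ecode": int(match.group("ecode"), 16),
        "process": match.group("process"),  # None when the suffix is missing
        "pid": int(pid) if pid is not None else None,
    }


def find_gpu_hangs(dmesg_text):
    """Parse every hang line found in a block of dmesg output."""
    parsed = (parse_gpu_hang(line) for line in dmesg_text.splitlines())
    return [hang for hang in parsed if hang is not None]
```

Two reports share a signature when their `gen`, `engine`, and `ecode` values all match; whether that also means a shared root cause is a separate question.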
Comment 1 Petr Manek 2017-12-31 14:29:18 UTC
Created attachment 136461
Contents of /sys/class/drm/card0/error
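
Note: since a new error state is attached after every hang, saving a timestamped copy of /sys/class/drm/card0/error can be scripted. This is a rough sketch only. The function name and the output file naming are hypothetical, the "No error state" placeholder check is an assumption about what the kernel writes when nothing has been captured, and reading the file typically requires root.

```python
import time
from pathlib import Path

# Path taken from this report; other systems may use a different card index.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(dest_dir, source=ERROR_STATE):
    """Copy the current error state into dest_dir.

    Returns the path of the new file, or None when there is nothing to save.
    """
    try:
        data = Path(source).read_bytes()
    except OSError:
        # Missing file or insufficient permissions.
        return None
    # Assumption: with no hang recorded, the file is empty or holds a short
    # placeholder beginning with "No error state".
    if not data.strip() or data.startswith(b"No error state"):
        return None
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / time.strftime("gpu-error-%Y%m%d-%H%M%S.txt")
    target.write_bytes(data)
    return target
```

Calling `save_error_state("/tmp/gpu-hangs")` right after a hang would leave a file that can be attached to the bug as is.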
Comment 2 Elizabeth 2018-01-05 22:34:23 UTC
Hello Petr, could you please share your Mesa version? Thanks.
Comment 3 Petr Manek 2018-01-06 10:05:12 UTC
Hi Elizabeth, my Mesa version is 17.3.1.
Comment 4 Petr Manek 2018-01-06 12:57:40 UTC
Created attachment 136581
Contents of /sys/class/drm/card0/error for the 2nd hang
Comment 5 Petr Manek 2018-01-06 12:58:51 UTC
Created attachment 136582
Appropriate part of dmesg for the 2nd hang
Comment 6 Petr Manek 2018-01-06 13:01:15 UTC
I just had another hang. This time, I had executed a MATLAB command that printed a high volume of ASCII text to my shell just before the screen froze. The dmesg output and crash dump are attached.

Petr.
Comment 7 Petr Manek 2018-01-10 14:26:46 UTC
Created attachment 136650
Contents of /sys/class/drm/card0/error for the 3rd hang
Comment 8 Petr Manek 2018-01-10 14:27:12 UTC
Created attachment 136651
Appropriate part of dmesg for the 3rd hang
Comment 9 Petr Manek 2018-01-10 14:28:32 UTC
I have added info about yet another hang. It happened when approximately 64K lines were printed in xterm by a faulty Python script.
Comment 10 Petr Manek 2018-01-17 22:34:56 UTC
Created attachment 136816
Contents of /sys/class/drm/card0/error for the 4th hang
Comment 11 Petr Manek 2018-01-17 22:35:17 UTC
Created attachment 136817
Appropriate part of dmesg for the 4th hang
Comment 12 Petr Manek 2018-01-17 22:36:11 UTC
Another hang happened, this time for no apparent reason. The crash dump and dmesg output are attached. P.
Comment 13 Petr Manek 2018-01-18 00:54:50 UTC
Created attachment 136822
Contents of /sys/class/drm/card0/error for the 5th hang
Comment 14 Petr Manek 2018-01-18 00:55:06 UTC
Created attachment 136823
Appropriate part of dmesg for the 5th hang
Comment 15 Petr Manek 2018-01-18 00:55:49 UTC
One more hang, this time during memory-intensive calculations.
Comment 16 Rainbert 2018-01-20 00:23:07 UTC
I have the very same problem. 
Instead of filing my own bug report, I am appending my information here, as I also have the identifier "GPU HANG: ecode 9:0:0x85dffffb" in my dmesg.

The hang occurs randomly 2-5 times a day, usually while I'm using LibreOffice in KDE Plasma on my up-to-date Arch Linux system.
Otherwise the description is the same as in Petr's case.

Kernel: 4.14.11-1-ARCH
Mesa: 17.3.1-2
Machine: Lenovo ThinkPad X1 Carbon, 2016 generation.

I did *not* install the package xf86-video-intel. Should I? 
I do not use any specific xorg configuration files. 

I already reported this bug a year ago here: https://bugs.freedesktop.org/show_bug.cgi?id=99325
That bug showed the same behavior but was identified differently (maybe the logging has changed and is now more explicit).

I am also attaching the dmesg output and the /sys/class/drm/card0/error file.
Comment 17 Rainbert 2018-01-20 00:25:11 UTC
Created attachment 136861
Appropriate part of dmesg for Rainbert's 1st hang
Comment 18 Rainbert 2018-01-20 00:26:11 UTC
Created attachment 136862
Contents of /sys/class/drm/card0/error for Rainbert's 1st hang
Comment 19 Elizabeth 2018-01-23 22:57:34 UTC
Hello everybody, 
Thanks for keeping track. Could you investigate whether the random hangs can be narrowed down to a specific case, such as the MATLAB command that Petr mentioned or any other specific process? A reliable way to reproduce this would be really useful. Also, have any of you tried a desktop other than KDE? It would be helpful to know whether this is actually desktop-dependent. And since you mention bug 99325, could you attach the Xorg log from a hung session, as requested by Mark? Also, dmesg messages for the hangs are no longer needed, since we have the error states attached.
Thanks again.
Comment 20 Petr Manek 2018-01-23 23:13:38 UTC
Hello Elizabeth,

Thanks for the update.

In my case it seems that the hangs never happen immediately after booting. Instead, most of them happen a few hours later. That suggests to me that the problem might somehow be related to memory usage (accessing invalid buffers? not enough memory to continue?). In two of the four documented cases (so far), the hang occurred while an excessive amount of output was being printed to xterm (infinite loops, faulty scripts). That could be linked either to the memory problem or to another, similar root cause.

Also, please note that I use the i3 WM instead of KDE in my setup. Sometimes I use two displays (via HDMI) in extended mode (not mirrored). However, in all instances so far, the hang occurred in single-display mode.

Best,
Petr
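
Note: the heavy-output scenario described above (roughly 64K lines printed in xterm) can be approximated with a short script when looking for a reproducer. This is a sketch under assumptions. The function name, line width, and character pattern are made up for illustration, and nothing in this thread confirms that such a flood triggers the hang on its own.

```python
import sys


def flood(lines=64 * 1024, width=80, out=sys.stdout):
    """Write `lines` rows of printable ASCII to `out` and return the row count.

    The row content rotates so that consecutive rows differ. Rows are at most
    94 characters long because only the 94 printable non-space ASCII
    characters are used.
    """
    alphabet = "".join(chr(code) for code in range(33, 127))
    for index in range(lines):
        start = index % len(alphabet)
        row = (alphabet[start:] + alphabet[:start])[:width]
        out.write(row + "\n")
    out.flush()
    return lines
```

Running `flood()` inside xterm prints 65536 lines; running it in a loop after several hours of uptime would be closer to the conditions reported here.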
Comment 21 Elizabeth 2018-03-14 23:33:04 UTC
Hello again everyone, new mesa 17.3.6 release includes important fixes for gpu hangs reported on games and DEs, could any of you try it and report back? If the issue still is happening with that version, a way to reliably reproduce this still will be the best approach.
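
Note: whether an installed Mesa is at least 17.3.6 can be checked with a small comparison. This is a minimal sketch. The function names are hypothetical, and treating every later version as containing the fixes is an assumption that releases carry them forward.

```python
import re

# Release named in this comment as containing GPU hang fixes.
FIXED_IN = (17, 3, 6)


def parse_mesa_version(text):
    """Extract (major, minor, patch) from strings such as '17.3.1-2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Mesa version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_reported_fix(version_text):
    """True when the version is 17.3.6 or newer (tuple comparison)."""
    return parse_mesa_version(version_text) >= FIXED_IN
```

With the versions mentioned in this thread, `has_reported_fix("17.3.1")` and `has_reported_fix("17.3.1-2")` are false, while `has_reported_fix("18.0.0")` is true.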
Comment 22 Petr Manek 2018-04-11 14:45:15 UTC
(In reply to Elizabeth from comment #21)
> Hello again everyone, new mesa 17.3.6 release includes important fixes for
> gpu hangs reported on games and DEs, could any of you try it and report
> back? If the issue still is happening with that version, a way to reliably
> reproduce this still will be the best approach.

Hello there,

It has been almost a month since your last message. I'm currently running Mesa 18.0.0, and no GPU hang has occurred since the update to Mesa 17.3.6.

I think it's safe to assume that the bug has been fixed in my case.

Thank you and cheers!


Petr

