105184 – [skl] GPU HANG: ecode 9:0:0x85dffffb, in Xorg - random and REALLY annoying.

Status:	RESOLVED DUPLICATE of bug 104411

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel 3D Bugs Mailing List
QA Contact:	Intel 3D Bugs Mailing List

Reported:	2018-02-21 04:24 UTC by Wes Will
Modified:	2018-03-01 08:17 UTC
CC List:	2 users

Attachments
/sys/class/drm/card0/error (42.23 KB, text/plain) 2018-02-21 14:30 UTC, Kamil Jońca
gpu error text (852.62 KB, text/plain) 2018-02-21 17:14 UTC, Wes Will

Description Wes Will 2018-02-21 04:24:44 UTC

I have searched the bugzilla archives and found tons of similar reports, but nothing that makes any difference or progress toward correcting this.  It started occurring when OpenSuSE "updated" LEAP 43.2 to the 4.4.114-42 kernel.  Before, with the 4.4 versions below .100, I had no such problems whatsoever.  There was an intermediate update to the 4.4.104 kernel, and power management went out the window.  No graphics issues then, and power management was restored / fixed with this .114 increment.

This GPU hang happens about a dozen times a day.  Too random to pin down, can happen in any open window and any running application.  Usual tasks are one web browser (PaleMoon - version 27.6.2 (64-bit)) with no more than five tabs open at once; one email client (SeaMonkey Mail - User agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.1); and sometimes an e-book reader (FBReader - version of the RPM package shows fbreader-0.12.10-19.23.x86_64).

I really don't use much of anything else on a regular basis, don't 'game,' and don't do much of anything more graphics-intensive than the odd video clip.  

Clipboard manager, wireless networking, and a battery monitor are about it.  KDE window manager running, and thinking seriously about dumping EVERYTHING 'systemd' related and switching distros entirely to something WITHOUT systemd or any of that foolishness.  (Lennart can STUFF his idiotic systemd crap.  Give me my text file configurations back!!)

More info, just what I can guess is pertinent, ask for anything else you require if it comes to your mind:

output of uname -a 
Linux toughbook.farreaches.org 4.4.114-42-default #1 SMP Tue Feb 6 10:58:10 UTC 2018 (b6ee9ae) x86_64 x86_64 x86_64 GNU/Linux

Hardware is an ex-Illinois-State-Police Panasonic CF-30 ToughBook laptop I rescued from a recycler.  Nothing at all added or special in the hardware, everything is bog-standard, just like straight out of the factory as far as the memory and CPU specs go.  Other than the <expletive deleted> multiplexed USB glidepad and touch screen which still gives me grief now and again, it does yeoman service and is just as rugged as advertised.

The pertinent part of dmesg gives:
 
[drm] GPU HANG: ecode 4:0:0x828fffff, in X [2213], reason: Hang on render ring, action: reset
[  266.808827] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  266.808828] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  266.808829] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  266.808829] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  266.808830] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  266.808894] drm/i915: Resetting chip after gpu hang

So here, as asked in the dmesg text, is a new bug for your perusal and hopefully my edification.  Crash dump file /sys/class/drm/card0/error is ZERO BYTES and cannot be deleted even as root.  If it had anything in it I would attach it....

Comment 1 Chris Wilson 2018-02-21 08:13:42 UTC

(In reply to Wes Will from comment #0)
> So here, as asked in the dmesg text, is a new bug for your perusal and
> hopefully my edification.  Crash dump file /sys/class/drm/card0/error is
> ZERO BYTES and cannot be deleted even as root.  If it had anything in it I
> would attach it....

It is a virtual file. It is never 0 bytes, ls lies. Just cat it and attach it.
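
For anyone who prefers a script over cat, a minimal Python sketch of the same idea follows. It reads the file until EOF instead of trusting the size that ls or stat() reports. The function name copy_virtual_file and the output name gpu-error.txt are made up for illustration, and reading the error file may need root.

```python
import os
import shutil


def copy_virtual_file(src, dst):
    """Copy src to dst by reading until EOF, ignoring the reported file size.

    Virtual files under /sys may report a size that does not match what a
    read actually returns, so the copy is driven by read() and not by stat().
    Returns the number of bytes that ended up in dst.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # chunked read/write loop until EOF
    return os.path.getsize(dst)


# Example call (path taken from the dmesg output; output name is hypothetical):
# copy_virtual_file("/sys/class/drm/card0/error", "gpu-error.txt")
```

The resulting regular file can then be attached to the bug like any other text file.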

Comment 2 Kamil Jońca 2018-02-21 14:28:36 UTC

I have similar problems: 
I tried to edit file with emacs and everything hangs, and then X server restarts 
syslog shows:
Feb 21 08:49:57 perkoz kernel: [  346.798404] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [684], reason: Hang on rcs0, action: reset
Feb 21 08:49:57 perkoz kernel: [  346.798405] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 21 08:49:57 perkoz kernel: [  346.798406] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 21 08:49:57 perkoz kernel: [  346.798406] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 21 08:49:57 perkoz kernel: [  346.798406] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 21 08:49:57 perkoz kernel: [  346.798407] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 21 08:49:57 perkoz kernel: [  346.798412] i915 0000:00:02.0: Resetting rcs0 after gpu hang
Feb 21 08:50:05 perkoz kernel: [  354.791038] i915 0000:00:02.0: Resetting rcs0 after gpu hang
Feb 21 08:50:13 perkoz kernel: [  362.792063] i915 0000:00:02.0: Resetting rcs0 after gpu hang
Feb 21 08:50:21 perkoz kernel: [  370.792973] i915 0000:00:02.0: Resetting rcs0 after gpu hang
Feb 21 08:50:29 perkoz kernel: [  378.793790] i915 0000:00:02.0: Resetting rcs0 after gpu hang
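
When comparing hangs across reports, it can help to pull the ecode, process, and reason out of the "GPU HANG" line. The following is a rough Python sketch that assumes the line format seen in the two logs in this bug; the regular expression and the function name parse_gpu_hang are illustrative and not part of any i915 tooling, and other kernel versions may format the line differently.

```python
import re

# Pattern modelled on the "GPU HANG" lines quoted in this bug (an assumption,
# not an official format description).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fA-F]+:[0-9a-fA-F]+:0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.*?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict with ecode, process, pid, reason and action, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])  # pid is the number in square brackets
    return fields
```

Run against the line from comment 0 and the line above, this shows that the two hangs carry different ecodes (4:0:0x828fffff versus 9:0:0x85dffffb), which is worth keeping in mind when judging whether they are the same problem.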

Comment 3 Kamil Jońca 2018-02-21 14:30:58 UTC

Created attachment 137502
/sys/class/drm/card0/error

Comment 4 Elizabeth 2018-02-21 15:53:39 UTC

Hello Kamil, please try Mesa 18, this could be related to bug 104578.

Comment 5 Wes Will 2018-02-21 17:14:07 UTC

Created attachment 137506
gpu error text

gpu error file (virtual) converted to (real) text in a (real) file, attached.

Comment 6 Elizabeth 2018-02-22 17:29:44 UTC

Hello Wes, what mesa version are you using? Is it possible for you to reproduce this with at least kernel 4.13, to get more information in the error state? Thanks.

Comment 7 Kamil Jońca 2018-02-22 17:38:01 UTC

Kernel version is 4.15.x (debian), mesa is (IIRC) 17.3.3 (this version is packaged in debian sid)

Comment 8 Elizabeth 2018-02-22 17:47:42 UTC

(In reply to Kamil Jońca from comment #7)
> Kernel version is 4.15.x (debian), mesa is (IIRC) 17.3.3 (this version is
> packaged in debian sid)
Unfortunately, mesa 18.0.0.rc4 is not available in debian sid yet. If you have the time you could try to build your own version, try precompiled packages or wait for it to be available. 
https://www.mesa3d.org/

Comment 9 Wes Will 2018-02-22 23:24:47 UTC

(In reply to Elizabeth from comment #6)
> Hello Wes, what mesa version are you using? Is it possible for you to
> reproduce this with at least kernel 4.13, to get more information in the
> error state? Thanks.

Highest version (including libgl1 and development) available from the OpenSuSE repositories is only 17.0.5.

I am just about done with SuSE linux as a reasonable distro.  Maybe on top-end and new gear, it might be okay, but it isn't handling the older hardware I have to use nearly well enough any more, and that used to be one of the high points of OpenSuSE.  Add in the systemd idiocy, and I'm about to give up on this, wipe the OS off of here and re-image with a non-systemd distro.  Maybe I'll try a from-scratch compile, for the ability to pick and choose which of the myriad modules wind up in the kernel.

Getting Mesa (version 18-plus) will be one of the goals I'll insist on meeting.  If the i915 / DRI issues re-appear with that version, I will report them here.  Give me a few hours (maybe a couple of days if I get emergency calls late; I do restaurant maintenance and never know when the idiots will break something important) to get something ready to go on this laptop.

Comment 10 Kamil Jońca 2018-03-01 08:06:08 UTC

(In reply to Elizabeth from comment #8)
> (In reply to Kamil Jońca from comment #7)
> > Kernel version is 4.15.x (debian), mesa is (IIRC) 17.3.3 (this version is
> > packaged in debian sid)
> Unfortunately, mesa 18.0.0.rc4 is not available in debian sid yet. If you
> have the time you could try to build your own version, try precompiled
> packages or wait for it to be available. 
> https://www.mesa3d.org/

After installing version 17.3.6-1 of the debian mesa packages, emacs stopped hanging my card.

Comment 11 Mark Janes 2018-03-01 08:17:30 UTC


*** This bug has been marked as a duplicate of bug 104411 ***
