Bug 105182 - [skl] GPU HANG: ecode 9:0:0x85dffffb, in X
Summary: [skl] GPU HANG: ecode 9:0:0x85dffffb, in X
Status: NEEDINFO
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-21 00:15 UTC by robinsbd
Modified: 2018-03-06 05:26 UTC

See Also:
i915 platform:
i915 features:


Attachments
Intel GPU Crash Dump (27.45 KB, text/plain)
2018-02-21 00:15 UTC, robinsbd
Updated GPU crash dump running kernel 4.15 (53.60 KB, text/plain)
2018-03-05 20:35 UTC, robinsbd
Intel GPU Crash Dump with only kernel 4.15 data (26.14 KB, text/plain)
2018-03-05 20:40 UTC, robinsbd

Description robinsbd 2018-02-21 00:15:28 UTC
Created attachment 137489 [details]
Intel GPU Crash Dump

When this issue happens, the X server and gdm processes terminate and you end up back at the login screen. The system itself is not hung, but you lose your desktop session.

Kernel:  3.10.0-693.11.1.el7.x86_64
Linux Distro:  RHEL 7.4
Hardware:  Dell XPS 13 9350
Display Connector:  Just the laptop display

I have attached a GPU crash dump.

In addition, here are the relevant log entries from /var/log/messages for the last two times the GPU hung.

=====================
Feb 20 16:35:12 xecho kernel: [kern.info][250168.455912] [drm] GPU HANG: ecode 9:0:0x85dffffb, in X [1636], reason: Hang on render ring, action: reset
Feb 20 16:35:12 xecho kernel: [kern.info][250168.455916] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 20 16:35:12 xecho kernel: [kern.info][250168.455917] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 20 16:35:12 xecho kernel: [kern.info][250168.455918] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 20 16:35:12 xecho kernel: [kern.info][250168.455918] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 20 16:35:12 xecho kernel: [kern.info][250168.455919] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 20 16:35:12 xecho kernel: [kern.notice][250168.455963] drm/i915: Resetting chip after gpu hang
Feb 20 16:35:12 xecho kernel: [kern.info][250168.456024] [drm] RC6 on
Feb 20 16:35:12 xecho kernel: [kern.info][250168.467483] [drm] GuC firmware load skipped
Feb 20 16:35:24 xecho kernel: [kern.notice][250180.401159] drm/i915: Resetting chip after gpu hang
Feb 20 16:35:24 xecho kernel: [kern.info][250180.401251] [drm] RC6 on
Feb 20 16:35:24 xecho kernel: [kern.info][250180.417580] [drm] GuC firmware load skipped
Feb 20 16:35:25 xecho gdm: [user.notice] Child process 1636 was already dead.
=====================
Feb 20 17:04:01 xecho kernel: [kern.notice][251897.385910] drm/i915: Resetting chip after gpu hang
Feb 20 17:04:01 xecho kernel: [kern.info][251897.386001] [drm] RC6 on
Feb 20 17:04:01 xecho kernel: [kern.info][251897.402075] [drm] GuC firmware load skipped
Feb 20 17:04:17 xecho kernel: [kern.notice][251913.380806] drm/i915: Resetting chip after gpu hang
Feb 20 17:04:17 xecho kernel: [kern.info][251913.380882] [drm] RC6 on
Feb 20 17:04:17 xecho kernel: [kern.info][251913.396346] [drm] GuC firmware load skipped
Feb 20 17:04:29 xecho kernel: [kern.notice][251925.384639] drm/i915: Resetting chip after gpu hang
Feb 20 17:04:29 xecho kernel: [kern.info][251925.384717] [drm] RC6 on
Feb 20 17:04:29 xecho kernel: [kern.info][251925.400945] [drm] GuC firmware load skipped
Feb 20 17:04:31 xecho gdm: [user.notice] Child process 16301 was already dead.
=====================
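
To compare hangs across log files, the error codes can be pulled out of the kernel messages. This is a minimal sketch, assuming the "GPU HANG: ecode ..." message format shown in the excerpts above; the function name `gpu_hang_ecodes` is hypothetical and not part of any tool.

```shell
# Print the ecode of every "GPU HANG" kernel message read from stdin.
# Assumes the "[drm] GPU HANG: ecode <code>, in <process> ..." format
# seen in the log excerpts above; other lines are silently skipped.
gpu_hang_ecodes() {
    sed -n 's/.*GPU HANG: ecode \([^,]*\),.*/\1/p'
}
```

For example, `gpu_hang_ecodes < /var/log/messages | sort | uniq -c` would count how often each error code appears, which may help tell whether two hangs are the same issue.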
Comment 1 robinsbd 2018-02-21 00:16:41 UTC
The user was running KDE and using LibreOffice at the time.
Comment 2 Elizabeth 2018-02-22 17:36:47 UTC
Hello, what mesa version do you have? A related issue was recently fixed in mesa 17.3, so it could be helpful to test with 17.3 or later. Also, is it possible for you to test with at least kernel 4.13 to get more information in the crash dump? Thanks.
Comment 3 robinsbd 2018-02-22 18:04:56 UTC
I am running mesa 17.0.1-6.20170307.
Comment 4 robinsbd 2018-03-05 20:27:31 UTC
I updated the kernel to 4.15.7-1.el7.elrepo.x86_64.

Still getting a GPU HANG.

Mar  5 13:00:23 xecho kernel: [kern.info][ 7824.673564] [drm] GPU HANG: ecode 9:0:0x86dffffd, in X [1975], reason: Hang on rcs0, action: reset
Mar  5 13:00:23 xecho kernel: [kern.info][ 7824.673566] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar  5 13:00:23 xecho kernel: [kern.info][ 7824.673566] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mar  5 13:00:23 xecho kernel: [kern.info][ 7824.673566] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar  5 13:00:23 xecho kernel: [kern.info][ 7824.673567] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Mar  5 13:00:23 xecho kernel: [kern.info][ 7824.673567] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Mar  5 13:00:23 xecho kernel: [kern.notice][ 7824.673579] i915 0000:00:02.0: Resetting rcs0 after gpu hang

Mesa is still at version 17.0.1-6.20170307.
Comment 5 robinsbd 2018-03-05 20:35:35 UTC
Created attachment 137798 [details]
Updated GPU crash dump running kernel 4.15

I uploaded the latest GPU crash dump running kernel 4.15.
Comment 6 robinsbd 2018-03-05 20:40:49 UTC
Created attachment 137799 [details]
Intel GPU Crash Dump with only kernel 4.15 data

I just uploaded a crash dump that contains only the dump relevant to kernel 4.15.
Comment 7 Mark Janes 2018-03-05 21:50:53 UTC
Please update to Mesa 17.3.6, which will fix this issue.
Comment 8 robinsbd 2018-03-05 21:53:46 UTC
Can you point me to any instructions or tutorial for updating Mesa to 17.3.6?

I am running RHEL 7.4 and of course I can only get RPM packages for 17.0.1.

Thanks!
Comment 9 Elizabeth 2018-03-05 23:41:18 UTC
(In reply to robinsbd from comment #8)
> Can you point me to any instructions or tutorial for updating Mesa to 17.3.6?
> 
> I am running RHEL 7.4 and of course I can only get RPM packages for 17.0.1.
> 
> Thanks!
https://www.mesa3d.org
Comment 10 robinsbd 2018-03-06 01:57:23 UTC
I just found today that mesa-17.2.3 has now been made available for RHEL 7.4. That would be an upgrade from 17.0.1 to 17.2.3. The full package version is 17.2.3-7.20171019.el7.

Can you tell me if this version is late enough to resolve this issue? 
Do you still think I absolutely need the latest at 17.3.6?

I'm game for building 17.3.6, but if I could stay within my distro, that would be preferred.
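
The version question can be checked mechanically. This is a minimal sketch, assuming GNU `sort -V` is available and taking the 17.3.6 recommended in comment 7 as the threshold; the helper name `version_at_least` is hypothetical. It compares upstream version numbers only and cannot tell whether a distro package carries a backported fix.

```shell
# Succeed if version $1 is at least version $2.
# Relies on "sort -V" (version sort), a GNU coreutils extension.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the packaged version against the recommended one.
if version_at_least 17.2.3 17.3.6; then
    echo "17.2.3 is at or above 17.3.6"
else
    echo "17.2.3 predates 17.3.6"
fi
```

By version number alone, 17.2.3 sorts before 17.3.6, so the packaged release predates the recommended one.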
Comment 11 robinsbd 2018-03-06 02:04:25 UTC
In my previous comment, I said RHEL 7.4. That is incorrect. The mesa 17.2.3 package is from the RHEL 7.5 beta, and it includes support for Wayland. I may still be able to install this mesa version if needed.
Comment 12 robinsbd 2018-03-06 02:46:04 UTC
I am compiling mesa 17.3.6 now.

Quick question: once this is installed in /usr/local, is there an easy way to check that this mesa version is actually being used rather than the stock installation in /usr?

Thanks.
Comment 13 Mark Janes 2018-03-06 05:26:44 UTC
Export LD_LIBRARY_PATH and LIBGL_DRIVERS_PATH to point at your install location.  I think in your case it will be:

LD_LIBRARY_PATH=/usr/local/lib
LIBGL_DRIVERS_PATH=/usr/local/lib/dri


To verify, run `glxinfo | grep version`.  You should see the mesa version in the OpenGL version string.

Thanks for taking the time to verify this for us.
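
The check described above can be wrapped in a small script. This is a minimal sketch, assuming the build was installed under /usr/local with libraries in lib and DRI drivers in lib/dri (adjust if your build used a different library directory); the function names are hypothetical.

```shell
# Extract the Mesa version from glxinfo-style output read on stdin.
# Looks only at the "OpenGL version string:" line.
mesa_version_from_glxinfo() {
    sed -n 's/^OpenGL version string:.* Mesa \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Run glxinfo against a Mesa build under the given prefix (default
# /usr/local) and print the Mesa version it reports. The variables are
# set only for this one glxinfo invocation, not for the whole shell.
check_local_mesa() {
    prefix="${1:-/usr/local}"
    LD_LIBRARY_PATH="$prefix/lib" \
    LIBGL_DRIVERS_PATH="$prefix/lib/dri" \
        glxinfo | mesa_version_from_glxinfo
}
```

Running `check_local_mesa` inside the X session should print 17.3.6 if the new build is being picked up, and the stock version otherwise.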

