Bug 107968 - [kb] GPU HANG: ecode 9:0:0x85dffffb, in zoom
Summary: [kb] GPU HANG: ecode 9:0:0x85dffffb, in zoom
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-18 08:23 UTC by Marcin Owsiany
Modified: 2018-10-26 08:57 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
/sys/class/drm/card0/error (763.94 KB, text/plain)
2018-09-18 08:23 UTC, Marcin Owsiany
xrandr --verbose (17.48 KB, text/plain)
2018-09-18 08:27 UTC, Marcin Owsiany
glxinfo (26.73 KB, text/plain)
2018-09-18 10:25 UTC, Marcin Owsiany

Description Marcin Owsiany 2018-09-18 08:23:50 UTC
Created attachment 141620
/sys/class/drm/card0/error

My case sounds similar to https://bugs.freedesktop.org/show_bug.cgi?id=101203 but dmesg told me to file a new bug, so here it is.

Using Debian stable with kernel 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) on Lenovo T580 laptop with the following device:

00:02.0 VGA compatible controller: Intel Corporation Device 5917 (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 225a
	Flags: bus master, fast devsel, latency 0, IRQ 143
	Memory at e7000000 (64-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at e000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915


I use Zoom for conferences about once a day on average, and in roughly one out of three sessions it crashes due to this hang, at seemingly random times during the conference. I do not know how to reproduce it at will.

In my case the message was:
[275903.554471] [drm] GPU HANG: ecode 9:0:0x85dffffb, in zoom [29759], reason: Hang on render ring, action: reset
[275903.554474] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[275903.554475] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[275903.554476] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[275903.554477] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[275903.554478] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[275903.554576] drm/i915: Resetting chip after gpu hang
[275903.554650] [drm] RC6 on
[275903.574354] [drm] GuC firmware load skipped
[275914.529457] drm/i915: Resetting chip after gpu hang
[275914.529542] [drm] RC6 on
[275914.547604] [drm] GuC firmware load skipped

Attaching the crash dump.
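Since the hang recurs at random (roughly one session in three), it may help to collect the key fields of each occurrence from the kernel log. Below is a minimal Python sketch, assuming the message keeps exactly the layout quoted above (it can differ between kernel versions); `parse_hang`, `find_hangs` and `GpuHang` are hypothetical names. It only summarizes the log; the crash dump in /sys/class/drm/card0/error is still what developers need.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout of the i915 message, as quoted in the report:
# "[drm] GPU HANG: ecode 9:0:0x85dffffb, in zoom [29759], reason: Hang on render ring, action: reset"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<proc>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    ecode: str
    process: str
    pid: int
    reason: str
    action: str


def parse_hang(line: str) -> Optional[GpuHang]:
    """Return the fields of one 'GPU HANG' log line, or None for any other line."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(m["ecode"], m["proc"], int(m["pid"]), m["reason"], m["action"])


def find_hangs(lines: Iterable[str]) -> List[GpuHang]:
    """Collect every hang found in, e.g., the lines of `dmesg` output."""
    return [h for h in map(parse_hang, lines) if h is not None]
```

For example, pass it `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`; depending on the system's settings, reading the kernel log may require root.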

I wonder whether someone would be able to figure out a specific fix that could be backported to Debian stable's kernel?
Comment 1 Marcin Owsiany 2018-09-18 08:27:28 UTC
Created attachment 141622
xrandr --verbose

Also adding xrandr output in case it matters (I have a dual-monitor setup).
Comment 2 Chris Wilson 2018-09-18 08:30:47 UTC
If your mesa (libGL) is as ancient as the kernel, you will be best served by updating both.

(In reply to Marcin Owsiany from comment #0)
> I wonder whether someone would be able to figure out a specific fix that
> could be backported to Debian stable's kernel?

You are missing a few years of bug fixes. Where to start?
Comment 3 Illia Iorin 2018-09-18 10:05:21 UTC
Could you please do several things:
- update the kernel to 4.16 or later
- provide your current mesa version (glxinfo)
- install a custom mesa built from git master
Then try to reproduce this bug.
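For the second item, the Mesa version can be read off the `OpenGL version string` line that `glxinfo` prints. Below is a minimal sketch, assuming that line ends in `Mesa <version>` as it does for Mesa drivers, and that `glxinfo` (from Debian's mesa-utils package) is installed and run inside the X session; `mesa_version` and `current_mesa_version` are hypothetical names.

```python
import re
import subprocess
from typing import Optional

# e.g. "OpenGL version string: 3.0 Mesa 18.1.0" or "... Mesa 18.3.0-devel (git-1234abc)"
MESA_RE = re.compile(r"^\s*OpenGL version string:.*\bMesa (\d+(?:\.\d+)*)", re.MULTILINE)


def mesa_version(glxinfo_output: str) -> Optional[str]:
    """Extract the Mesa version from glxinfo output, or None if the driver is not Mesa."""
    m = MESA_RE.search(glxinfo_output)
    return m.group(1) if m else None


def current_mesa_version() -> Optional[str]:
    """Run `glxinfo -B` (brief output) and parse it; needs a running X session."""
    out = subprocess.run(["glxinfo", "-B"], capture_output=True, text=True, check=True).stdout
    return mesa_version(out)
```

Only the pure parsing function depends on the output format; keeping it separate from the `glxinfo` call makes it easy to run against an attached glxinfo log instead of a live session.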
Comment 4 Marcin Owsiany 2018-09-18 10:25:27 UTC
Created attachment 141634
glxinfo

Attaching glxinfo output.
Comment 5 Marcin Owsiany 2018-09-18 10:30:54 UTC
I just built the kernel from drm-tip.

Trying to build mesa following the instructions in the "BUILDING 3D-MESA" section of https://01.org/linuxgraphics/documentation/build-guide-0 failed with:

configure: error: unrecognized option: `--enble-dri3'
Comment 6 Illia Iorin 2018-09-18 10:53:28 UTC
Try configuring mesa with this command:
./autogen.sh --with-gallium-drivers="" --with-dri-drivers=i965 --prefix=<path to bins>

more information at https://mesa3d.org/install.html
Comment 7 Sergii Romantsov 2018-09-18 12:15:40 UTC
<< configure: error: unrecognized option: `--enble-dri3'
There is a typo here: it should be '--enable-dri3'.
However, that may require additional dependencies, so you can try --disable-dri3 instead.
Comment 8 Marcin Owsiany 2018-09-24 13:06:21 UTC
I managed to build mesa with your help.

However, after rebooting into the kernel built in Comment 5, I found that it does not know how to use my LVM group. It looks like "make defconfig" does not start from the config of the kernel I'm currently running, and perhaps some disk encryption support is missing...

I remember back from the days of linux-2.4.x that there used to be something like "make oldconfig" but it was not a completely flawless experience either. And I don't really have time this week to dive into how one configures the kernel these days :-(
Comment 9 Denis 2018-09-24 13:26:42 UTC
There is one more way to check this: simply add the "testing" repository and install the 4.18 kernel from it (if you want to leave the system as is, don't update anything except the kernel and mesa).

https://serverfault.com/questions/550855/how-to-add-debian-testing-repository-to-apt-get


As for the problem with configs, I did it another way: I copied the current config into the kernel source folder, ran "make config", selected "load config" (or similar) and re-saved it. That worked for me.
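The same idea can also be done non-interactively. Debian installs the configuration of each kernel as /boot/config-<release>, and kbuild's `olddefconfig` target takes an existing .config and gives options that are new in the tree their default values without prompting (a quieter relative of the `make oldconfig` remembered in Comment 8). Below is a minimal sketch with hypothetical function names, assuming it runs on the machine whose kernel config should be reused.

```python
import os
import shutil
import subprocess
from pathlib import Path


def running_kernel_config() -> Path:
    """Debian's copy of the running kernel's config: /boot/config-$(uname -r)."""
    return Path("/boot") / f"config-{os.uname().release}"


def configure_from_running(src_dir: str) -> None:
    """Seed src_dir/.config from the running kernel, then update it without prompts."""
    shutil.copy(running_kernel_config(), Path(src_dir) / ".config")
    # Options already set in the old config (e.g. LVM and dm-crypt support) are kept
    # if they still exist; options that only exist in the newer tree get defaults.
    subprocess.run(["make", "olddefconfig"], cwd=src_dir, check=True)
```

Usage would be `configure_from_running("/path/to/drm-tip")` (placeholder path) followed by the usual kernel build.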
Comment 10 Marcin Owsiany 2018-09-26 07:26:18 UTC
Thank you for the suggestion, Denis!

While installing packages directly from the testing suite into a stable system is risky, your suggestion made me realize that there might be a more recent kernel in the "backports" suite. And indeed, there is linux-image-4.17.0-0.bpo.3-amd64 which is built in a way that should work flawlessly in my system.

There are also some more recent mesa packages available, but there are quite a few of them, and I'm wondering which ones I really need? Do I need to look at the libraries which zoom is linked against? Or is it the X server which needs the updated libraries? Sorry if this question seems silly, but I hope that I can test this with a little bit of your help...
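One way to see which GL libraries zoom actually uses is to inspect the shared objects mapped into its running process. Assuming zoom uses direct rendering (the usual case), the driver that submits its GPU commands (for this hardware, i965_dri.so) is loaded into zoom's own process, and the hang message above names that process. Below is a minimal sketch that lists GL-related mappings from /proc/<pid>/maps; the prefix list is an assumption and `gl_libs_from_maps` / `gl_libs_of_pid` are hypothetical names.

```python
import os
from typing import Set

# Assumed selection of GL-stack library name prefixes (libGL covers libGLX, libGLdispatch).
GL_PREFIXES = ("libGL", "libEGL", "libglapi", "libdrm")


def is_gl_lib(path: str) -> bool:
    name = os.path.basename(path)
    return name.startswith(GL_PREFIXES) or name.endswith("_dri.so")


def gl_libs_from_maps(maps_text: str) -> Set[str]:
    """Return GL-related shared-object paths from /proc/<pid>/maps content."""
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; anonymous mappings lack a path.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and is_gl_lib(fields[5]):
            libs.add(fields[5])
    return libs


def gl_libs_of_pid(pid: int) -> Set[str]:
    """Read the maps of a live process (same user or root required)."""
    with open(f"/proc/{pid}/maps") as f:
        return gl_libs_from_maps(f.read())
```

Running `dpkg -S` on each reported path then shows which Debian package ships it.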
Comment 11 Denis 2018-09-26 08:12:22 UTC
No worries, Marcin.
4.17 is good too. As for mesa, I think any version higher than 17.3.x will not have the bug (in general, the higher the better, since it will be fresher :) ).
As for the X server, the latest "stable" version in Ubuntu 16.04 is 1.19.6, but I am not sure it matters for the current issue.
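When checking an installed version against a threshold like the 17.3 mentioned here, comparing the strings directly can go wrong ("9.2.5" sorts after "17.3" as text). Below is a minimal sketch that compares numeric components instead; it treats 17.3 itself as meeting the threshold, which is only one reading of the comment, and `version_tuple` / `at_least` are hypothetical names.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """'18.3.0-devel' -> (18, 3, 0): drop any '-suffix', then split on dots."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def at_least(version: str, minimum: str = "17.3") -> bool:
    """True if version is minimum or newer; (17, 3, 0) >= (17, 3) holds for tuples."""
    return version_tuple(version) >= version_tuple(minimum)
```

Tuple comparison is lexicographic over integers, so a shorter minimum such as "17.3" compares cleanly against full release numbers like "17.3.9".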
Comment 12 Marcin Owsiany 2018-10-08 07:57:47 UTC
FWIW, upgrading to linux-image-4.17.0-0.bpo.3-amd64 (4.17.17-1~bpo9+1) seems to have helped with the crashes. I did not need to touch mesa (I actually tried upgrading it to the version in backports, but it just made the X server crash, so I reverted).
Comment 13 Illia Iorin 2018-10-09 10:32:26 UTC
That is good news. If this bug does not reappear within two weeks, I think we can close it along with related ones such as https://bugs.freedesktop.org/show_bug.cgi?id=101203
Comment 14 Denis 2018-10-26 08:57:19 UTC
Closing the issue because we didn't get any further comments from the reporter, so I suspect that the kernel update helped.

Marcin, please reopen this issue if it is still relevant for you.

