Created attachment 141620 [details]
My case sounds similar to https://bugs.freedesktop.org/show_bug.cgi?id=101203 but dmesg told me to file a new bug, so here it is.
Using Debian stable with kernel 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) on Lenovo T580 laptop with the following device:
00:02.0 VGA compatible controller: Intel Corporation Device 5917 (rev 07) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 225a
Flags: bus master, fast devsel, latency 0, IRQ 143
Memory at e7000000 (64-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
I/O ports at e000 [size=64]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
I use Zoom for conference calls about once a day on average, and it crashes because of this hang in roughly one out of three sessions, at seemingly random times while the conference is running. I do not know how to reproduce this at will.
In my case the message was:
[275903.554471] [drm] GPU HANG: ecode 9:0:0x85dffffb, in zoom , reason: Hang on render ring, action: reset
[275903.554474] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[275903.554475] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[275903.554476] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[275903.554477] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[275903.554478] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[275903.554576] drm/i915: Resetting chip after gpu hang
[275903.554650] [drm] RC6 on
[275903.574354] [drm] GuC firmware load skipped
[275914.529457] drm/i915: Resetting chip after gpu hang
[275914.529542] [drm] RC6 on
[275914.547604] [drm] GuC firmware load skipped
Attaching the crash dump.
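For anyone triaging similar reports, here is a minimal sketch of pulling the fields out of the "GPU HANG" line quoted above. The regular expression is modelled only on the one line in this report, and the name parse_gpu_hang is made up for illustration; other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the single "GPU HANG" line quoted in this report.
# This is an assumption: other kernels may format the message differently.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<process>.*?)\s*, reason: (?P<reason>.*?), action: (?P<action>\S+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from an i915 'GPU HANG' dmesg line, or None."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["ts"] = float(fields["ts"])  # seconds since boot, as printed by dmesg
    return fields


sample = ("[275903.554471] [drm] GPU HANG: ecode 9:0:0x85dffffb, in zoom , "
          "reason: Hang on render ring, action: reset")
hang = parse_gpu_hang(sample)
print(hang["ecode"], hang["process"], hang["reason"], hang["action"])
```

Lines that are not hang reports (for example the "RC6 on" line) simply return None, so the function can be run over a whole dmesg capture.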
I wonder whether someone would be able to figure out a specific fix that could be backported to Debian stable's kernel?
Created attachment 141622 [details]
Also adding xrandr output in case it matters (I have a dual-monitor setup).
If your mesa (libGL) is as ancient as the kernel, you will be best served by updating both.
(In reply to Marcin Owsiany from comment #0)
> I wonder whether someone would be able to figure out a specific fix that
> could be backported to Debian stable's kernel?
You are missing a few years of bug fixes. Where to start?
Could you please do several things:
- update kernel to 4.16 or later.
- provide your current mesa version (glxinfo)
- install custom mesa from git-master
and try to reproduce this bug after you do those things.
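A quick way to check the first item is to compare the running kernel release against the 4.16 threshold numerically. The sketch below is only an illustration; the helper names kernel_tuple and is_new_enough are invented here. It compares (major, minor) as integers, because a plain string comparison would wrongly rank "4.9" above "4.16".

```python
import platform
import re

MIN_KERNEL = (4, 16)  # threshold suggested in this thread


def kernel_tuple(release):
    """Extract (major, minor) from a release string such as '4.9.0-8-amd64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def is_new_enough(release, minimum=MIN_KERNEL):
    """True if the release is at least the given (major, minor) version."""
    return kernel_tuple(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    print(release, "OK" if is_new_enough(release) else "older than 4.16")
```

With the release strings mentioned in this report, "4.9.0-8-amd64" is rejected and "4.17.0-0.bpo.3-amd64" is accepted.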
Created attachment 141634 [details]
Attaching glxinfo output.
I just built the kernel from drm-tip.
Trying to build mesa following the instructions in the "BUILDING 3D-MESA" section of https://01.org/linuxgraphics/documentation/build-guide-0 failed with:
configure: error: unrecognized option: `--enble-dri3'
Try configuring mesa with this command:
./autogen.sh --with-gallium-drivers="" --with-dri-drivers=i965 --prefix=<path to bins>
More information is available at https://mesa3d.org/install.html
<< configure: error: unrecognized option: `--enble-dri3'
That is a typo: the option should be '--enable-dri3'.
However, it may require additional dependencies, so you can try --disable-dri3 instead.
I managed to build mesa with your help.
However after rebooting to the kernel built in Comment 5 I found out that it does not know how to use my LVM group. Looks like "make defconfig" does not attempt to use the config I'm running currently and perhaps some disk encryption stuff is missing...
I remember back from the days of linux-2.4.x that there used to be something like "make oldconfig" but it was not a completely flawless experience either. And I don't really have time this week to dive into how one configures the kernel these days :-(
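One way to narrow down what a defconfig build is missing is to check the generated .config for the options the root filesystem needs. The sketch below is a guess at that check, not a verified diagnosis: the list WANTED (device-mapper and dm-crypt support) is an assumption based on the LVM and disk-encryption symptoms described above, and the helper names are made up for illustration.

```python
def enabled_options(config_text):
    """Return {option: value} for options set to y or m in a kernel .config."""
    enabled = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled[name] = value
    return enabled


def missing_options(config_text, wanted):
    """List the wanted options that are neither built in nor modules."""
    enabled = enabled_options(config_text)
    return [opt for opt in wanted if opt not in enabled]


# Assumption: these are the options relevant to LVM on an encrypted disk.
WANTED = ["CONFIG_BLK_DEV_DM", "CONFIG_DM_CRYPT"]

# Made-up fragment standing in for a real .config file.
sample = """\
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_CRYPT is not set
CONFIG_EXT4_FS=y
"""
print(missing_options(sample, WANTED))
```

To use it on a real tree, read the text of the .config file in the kernel source directory and pass it in place of the sample. Options built as modules (=m) also need to end up in the initramfs, which this sketch does not check.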
There is one more way to check this: add the "test" repository and install the 4.18 kernel from it (if you want to leave the system as is, don't update anything except the kernel and mesa).
As for the config problem, I did it another way: I copied the current config into the kernel source folder, ran "make config", selected "load config" (or similar) and re-saved it. That worked for me.
Thank you for the suggestion, Denis!
While installing packages directly from the testing suite into a stable system is risky, your suggestion made me realize that there might be a more recent kernel in the "backports" suite. And indeed, there is linux-image-4.17.0-0.bpo.3-amd64 which is built in a way that should work flawlessly in my system.
There are also some more recent mesa packages available, but there are quite a few of them, and I'm wondering which ones I really need? Do I need to look at the libraries which zoom is linked against? Or is it the X server which needs the updated libraries? Sorry if this question seems silly, but I hope that I can test this with a little bit of your help...
No worries, Marcin.
4.17 is fine too. As for mesa, I think anything higher than 17.3.x will not have the bug (in general, the newer the better :) )
As for the X server, the latest "stable" in Ubuntu 16.04 is 1.19.6, but I am not sure it matters for this issue.
FWIW, upgrading to linux-image-4.17.0-0.bpo.3-amd64 (4.17.17-1~bpo9+1) seems to have helped with the crashes. I did not need to touch mesa (I actually tried upgrading it to the version in backports, but it just made the X server crash, so I reverted).
That is good news. I think if this bug does not appear again within two weeks, we can close it along with related ones such as https://bugs.freedesktop.org/show_bug.cgi?id=101203
Closing the issue because we didn't get any further comments from the reporter, so I suspect that the kernel update helped.
Marcin, please reopen this issue if it is still relevant for you.