Steps to reproduce:
1. Enable OpenGL layers in about:config and restart Firefox
2. Navigate to this page: https://www.thebalance.com/obama-tax-cuts-3306330
Actual results:
Firefox hangs completely, and eventually the Firefox window turns all black.
If I look at dmesg in the terminal I see:
[ 2442.865994] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Compositor , reason: Hang on rcs0, action: reset
[ 2442.865995] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2442.865995] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2442.865996] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2442.865996] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2442.865997] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2442.866002] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 2450.853937] i915 0000:00:02.0: Resetting rcs0 after gpu hang
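For anyone collecting these reports, the fields of the "GPU HANG" line can be pulled out with a small script. This is only a sketch: the regular expression is an assumption based on the single line format shown in the log above (other kernel versions may print it differently), and `parse_gpu_hang` is a hypothetical helper name, not part of any i915 tooling.

```python
import re

# Pattern modelled on the one "GPU HANG" line in the dmesg output above.
# Assumption: other kernels may format this line differently.
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\]\s+GPU HANG:\s+"
    r"ecode\s+(?P<ecode>\S+),\s+in\s+(?P<process>.+?)\s*,\s+"
    r"reason:\s+(?P<reason>.+?),\s+action:\s+(?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The bracketed number is the kernel timestamp in seconds since boot.
    fields["time"] = float(fields["time"])
    return fields
```

Feeding it each line of `dmesg` output and keeping the non-None results gives the error code, the process that was rendering, and the engine named in the reason field.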
Expected results:
The GPU should not hang.
- Firefox Version: 57.0.2 64-bit
- Hardware: Dell XPS 13 9360 FHD
- Distro: Solus
- Graphics info:
Display Server: x11 (X.Org 1.19.5 ) drivers: modesetting (unloaded: fbdev,vesa)
OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
version: 4.5 Mesa 17.3.1 Direct Render: Yes
Created attachment 136386
crash dump from /sys/class/drm/card0/error attached
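In case it helps others hitting the same hang, the dump can be copied out of sysfs before a reboot clears it. The snippet below is a minimal sketch, not an official procedure: `save_gpu_error_state` is a hypothetical helper, the default path is the one printed in the dmesg output above, and reading that file usually requires root privileges.

```python
from pathlib import Path


def save_gpu_error_state(dest, source="/sys/class/drm/card0/error"):
    """Copy the GPU error state to dest and return the number of bytes written.

    A plain read/write is used instead of a file-copy helper because sysfs
    files do not report a meaningful size.
    """
    data = Path(source).read_bytes()
    Path(dest).write_bytes(data)
    return len(data)
```

Calling `save_gpu_error_state("gpu-crash-dump.txt")` as root right after the hang should produce a file that can be attached to the report.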
Hello Brandon, did this work fine with older Mesa versions? Is there a chance you could test with this branch (https://cgit.freedesktop.org/mesa/mesa/)? Does Firefox give any trace when the issue happens?
I booted my Solus live USB (which has older versions: xorg 1.18, mesa 17.1.6, kernel 4.12) and I could not reproduce the issue there.
If I wait a while after the GPU hang, Firefox eventually crashes. This is the Firefox crash information: https://crash-stats.mozilla.com/report/index/14af5f17-396e-4348-8af1-0296f0180106
Alright, so on my existing fully updated mesa install (kernel 4.14, xorg 1.19) I downgraded ONLY the mesalib package from version 17.3.1 to 17.2.6 (the next newest version in solus's package cache) and then rebooted.
With 17.2.6 I cannot re-create the issue, so it seems like the problem appeared somewhere between 17.2.6 and 17.3.1.
I hope that helps!
(In reply to Brandon Watkins from comment #5)
> Alright, so on my existing fully updated mesa install (kernel 4.14, xorg
> 1.19) I downgraded ONLY the mesalib package from version 17.3.1 to 17.2.6
> (the next newest version in solus's package cache) and then rebooted.
> With 17.2.6 I cannot re-create the issue, so it seems like the problem
> appeared somewhere between 17.2.6 and 17.3.1
> I hope that helps!
That's really helpful. If you get the chance, could you try to bisect for the initial bad commit?
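For anyone unfamiliar with the idea, bisecting is a binary search over the commit history between a known-good and a known-bad version. The sketch below only illustrates the search itself, under the assumptions that history is linear and that every commit after the first bad one is also bad. The commit list and the `is_bad` callback are hypothetical stand-ins for "build this Mesa commit, load the page, check for a hang"; in practice `git bisect` does this bookkeeping.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad all later commits are bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]
```

Each test halves the remaining range, so even a range of a few thousand commits needs only around a dozen builds.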
I'm not very knowledgeable about bisecting, building packages, etc. (and my attempts so far weren't fruitful, lol). However, I was able to narrow it down a bit further, to between 17.2.6 and 17.3.0.
Still haven't had any luck getting bisect to work (I can build mesa-git fine, but get errors when building the first commit suggested). I did build and install the latest mesa-git, though, and can confirm that I could still re-create the issue there.
Hello again, quick question: what desktop environment are you using right now?
I was able to re-create the issue with all of GNOME 3.26, Budgie 10.4 and KDE Plasma 5.11.
Also, to clarify the steps to re-create: the Firefox setting to enable is layers.acceleration.force-enabled.
Thanks for that piece of information! I can reproduce and I'm bisecting now.
ea0d2e98ecb369ab84e78c84709c0930ea8c293a is the first bad commit
Author: Kenneth Graunke <email@example.com>
Date: Thu Oct 5 20:31:01 2017 -0700
i965: Disable auxiliary buffers when there are self-dependencies.
Jason and I investigated several OpenGL CTS failures where the tests
bind the same texture for rendering and texturing, at the same time.
This has defined results as long as the reads happen before writes,
or the regions are non-overlapping. Normally, this just works out.
However, CCS can cause problems. If the shader is reading one set of
pixels, and writing to different pixels that are adjacent, they may end
up being covered by the same CCS block. So rendering may be writing a
CCS block, while the sampler is trying to read it. Corruption ensues.
Disabling CCS is unfortunate, but safe.
Fixes several KHR-GL45.texture_barrier.* subtests.
Reviewed-by: Nanley Chery <firstname.lastname@example.org>
Reviewed-by: Jason Ekstrand <email@example.com>
Created attachment 136930
trace reproducing gpu hang
Looks like we will have a patch momentarily to fix this.
I verified that it is fixed by the branch:
*** This bug has been marked as a duplicate of bug 104411 ***