Bug 104383 - [KBL] Intel GPU hang with firefox
Status: RESOLVED DUPLICATE of bug 104411
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Kenneth Graunke
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-25 18:04 UTC by Brandon Watkins
Modified: 2018-01-24 20:28 UTC
CC List: 1 user

See Also:
i915 platform: KBL
i915 features:


Attachments
crash dump (42.82 KB, text/plain)
2017-12-25 18:08 UTC, Brandon Watkins
trace reproducing gpu hang (142.82 MB, application/x-compressed-tar)
2018-01-24 07:10 UTC, Mark Janes

Description Brandon Watkins 2017-12-25 18:04:06 UTC
Steps to reproduce:

1. Enable OpenGL layers in about:config and restart firefox
2. Navigate to this page: https://www.thebalance.com/obama-tax-cuts-3306330



Actual results:

Firefox hangs completely, and eventually the Firefox window turns entirely black.

If I look at dmesg in the terminal I see:

[ 2442.865994] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Compositor [4679], reason: Hang on rcs0, action: reset
[ 2442.865995] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2442.865995] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2442.865996] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2442.865996] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2442.865997] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2442.866002] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 2450.853937] i915 0000:00:02.0: Resetting rcs0 after gpu hang
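
For anyone triaging similar reports, the hang line above can be picked apart mechanically. The following is a minimal sketch in Python, assuming the line format shown in this report; the field names (ecode, process, pid, reason, action) are labels chosen for the sketch, not official i915 terminology, and other kernel versions may print the line differently.

```python
import re

# Pattern for the "[drm] GPU HANG" line format quoted in this report.
# The group names are labels chosen for this sketch, not i915 terminology.
HANG_RE = re.compile(
    r"\[\s*(?P<timestamp>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[^,]+), in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\S+)"
)


def parse_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in the given dmesg text."""
    hangs = []
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            record = match.groupdict()
            record["timestamp"] = float(record["timestamp"])
            record["pid"] = int(record["pid"])
            hangs.append(record)
    return hangs


# Two lines copied from the dmesg output in this report.
SAMPLE = (
    "[ 2442.865994] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Compositor [4679], "
    "reason: Hang on rcs0, action: reset\n"
    "[ 2442.866002] i915 0000:00:02.0: Resetting rcs0 after gpu hang\n"
)

if __name__ == "__main__":
    for hang in parse_gpu_hangs(SAMPLE):
        print(hang)
```

The parser only looks at the text it is given, so it can be fed a saved dmesg log as well as live output.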



Expected results:

GPU should not hang


NOTES: 

- Firefox Version: 57.0.2 64-bit
- Hardware: Dell XPS 13 9360 FHD
- Distro: Solus
- Graphics info:

 Display Server: x11 (X.Org 1.19.5 ) drivers: modesetting (unloaded: fbdev,vesa)
           Resolution: 1920x1080@59.93hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
           version: 4.5 Mesa 17.3.1 Direct Render: Yes
Comment 1 Brandon Watkins 2017-12-25 18:08:02 UTC
Created attachment 136386
crash dump

crash dump from /sys/class/drm/card0/error attached
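
Since the kernel message asks for the dump to always be attached, here is a minimal Python sketch of saving a copy of it right after a hang. The function name save_error_state and the destination file name are hypothetical; reading the sysfs file typically needs root, and the whole file is read at once because sysfs files generally do not report a meaningful size.

```python
from pathlib import Path

# Path reported by the kernel in the dmesg output of this report.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(destination, source=ERROR_STATE):
    """Copy the i915 error state to `destination` and return the byte count.

    `source` is a parameter so the function can be tried on an ordinary
    file; by default it points at the sysfs path named by the kernel.
    """
    data = Path(source).read_bytes()  # read until EOF, ignoring reported size
    Path(destination).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    # Hypothetical destination name; typically needs to run as root.
    size = save_error_state("gpu-crash-dump.txt")
    print(f"saved {size} bytes")
```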
Comment 2 Elizabeth 2018-01-05 22:03:06 UTC
Hello Brandon, did this work fine with older Mesa versions? Is there a chance that you could test with this branch: https://cgit.freedesktop.org/mesa/mesa/? Does Firefox give any trace when the issue happens?
Thanks.
Comment 3 Brandon Watkins 2018-01-06 01:14:49 UTC
I booted my Solus live USB (which has older versions: Xorg 1.18, Mesa 17.1.6, kernel 4.12) and could not reproduce the issue there.
Comment 4 Brandon Watkins 2018-01-06 01:24:14 UTC
If I wait a while after the GPU hang, Firefox eventually crashes. This is the Firefox crash information: https://crash-stats.mozilla.com/report/index/14af5f17-396e-4348-8af1-0296f0180106
Comment 5 Brandon Watkins 2018-01-06 02:13:20 UTC
Alright, so on my existing fully updated mesa install (kernel 4.14, xorg 1.19) I downgraded ONLY the mesalib package from version 17.3.1 to 17.2.6 (the next newest version in solus's package cache) and then rebooted.

With 17.2.6 I cannot re-create the issue, so it seems the problem appeared somewhere between 17.2.6 and 17.3.1.

I hope that helps!
Comment 6 Elizabeth 2018-01-08 18:06:55 UTC
(In reply to Brandon Watkins from comment #5)
> Alright, so on my existing fully updated mesa install (kernel 4.14, xorg
> 1.19) I downgraded ONLY the mesalib package from version 17.3.1 to 17.2.6
> (the next newest version in solus's package cache) and then rebooted.
> 
> With 17.2.6 I cannot re-create the issue, so it seems like the problem
> appear somewhere between 17.2.6 and 17.3.1
> 
> I hope that helps!
That's really helpful. If you get the chance, could you try to bisect for the first bad commit?
Comment 7 Brandon Watkins 2018-01-14 00:32:27 UTC
I'm not very knowledgeable about bisecting, building packages, etc. (and my attempts so far weren't fruitful lol). However, I was able to narrow it down a bit further, to between 17.2.6 and 17.3.0.
Comment 8 Brandon Watkins 2018-01-20 19:44:34 UTC
Still haven't had any luck getting bisect to work (I can build mesa-git fine, but I get errors when building the first commit that bisect suggests). I did build and install the latest mesa-git, though, and can confirm that I can still re-create the issue there.
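
For readers unfamiliar with bisecting, the idea behind it is a binary search over the commit history between a known-good and a known-bad version. Below is a minimal Python sketch of that search only, not of git itself; the commit names, the culprit, and the hangs() check are all made up for illustration, and the sketch assumes that every commit after the first bad one is also bad.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes `commits` is ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first bad
    one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad]


# Hypothetical history: 200 commits between a good and a bad release.
history = [f"commit-{i:03d}" for i in range(200)]
culprit = "commit-137"
tested = []


def hangs(commit):
    """Stand-in for 'build this commit and check whether the GPU hangs'."""
    tested.append(commit)
    return commit >= culprit  # zero-padded names compare in history order


if __name__ == "__main__":
    print(first_bad(history, hangs), "found after", len(tested), "builds")
```

Each step halves the remaining range, which is why a few builds are enough even when two releases are hundreds of commits apart.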
Comment 9 Elizabeth 2018-01-23 23:12:24 UTC
Hello again, quick question: what desktop environment are you using right now?
Comment 10 Brandon Watkins 2018-01-23 23:58:21 UTC
I was able to re-create the issue with GNOME 3.26, Budgie 10.4, and KDE Plasma 5.11.
Comment 11 Brandon Watkins 2018-01-24 01:18:28 UTC
Also, to clarify the steps to reproduce: the Firefox setting to enable is layers.acceleration.force-enabled.
Comment 12 Mark Janes 2018-01-24 06:20:03 UTC
Thanks for that piece of information!  I can reproduce and I'm bisecting now.
Comment 13 Mark Janes 2018-01-24 06:57:44 UTC
bisected to


ea0d2e98ecb369ab84e78c84709c0930ea8c293a is the first bad commit
commit ea0d2e98ecb369ab84e78c84709c0930ea8c293a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu Oct 5 20:31:01 2017 -0700

    i965: Disable auxiliary buffers when there are self-dependencies.
    
    Jason and I investigated several OpenGL CTS failures where the tests
    bind the same texture for rendering and texturing, at the same time.
    This has defined results as long as the reads happen before writes,
    or the regions are non-overlapping.  Normally, this just works out.
    
    However, CCS can cause problems.  If the shader is reading one set of
    pixels, and writing to different pixels that are adjacent, they may end
    up being covered by the same CCS block.  So rendering may be writing a
    CCS block, while the sampler is trying to read it.  Corruption ensues.
    
    Disabling CCS is unfortunate, but safe.
    
    Fixes several KHR-GL45.texture_barrier.* subtests.
    
    Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
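
To illustrate the situation the commit message describes, here is a toy Python model of two non-overlapping pixel regions that still land in the same CCS block. The block dimensions are hypothetical values picked for the example, not the real hardware layout, and the function names are made up for this sketch.

```python
# Hypothetical block size in pixels, for illustration only. The real CCS
# layout depends on the hardware generation and surface format.
BLOCK_W, BLOCK_H = 16, 4


def blocks_touched(x, y, width, height):
    """Return the set of (column, row) block indices covering a pixel rectangle."""
    cols = range(x // BLOCK_W, (x + width - 1) // BLOCK_W + 1)
    rows = range(y // BLOCK_H, (y + height - 1) // BLOCK_H + 1)
    return {(col, row) for col in cols for row in rows}


def shares_block(read_rect, write_rect):
    """True if the two (x, y, width, height) rectangles touch a common block."""
    return bool(blocks_touched(*read_rect) & blocks_touched(*write_rect))


if __name__ == "__main__":
    read_rect = (0, 0, 8, 4)    # pixels x = 0..7
    write_rect = (8, 0, 8, 4)   # pixels x = 8..15, adjacent but not overlapping
    far_rect = (16, 0, 8, 4)    # pixels x = 16..23, in the next block over
    print(shares_block(read_rect, write_rect))  # adjacent regions, same block
    print(shares_block(read_rect, far_rect))    # regions in different blocks
```

In this model the adjacent regions never share a pixel, yet both map to block (0, 0), which is the kind of read/write collision the commit message attributes to CCS.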
Comment 14 Mark Janes 2018-01-24 07:10:41 UTC
Created attachment 136930
trace reproducing gpu hang
Comment 15 Mark Janes 2018-01-24 07:20:54 UTC
It looks like we will have a patch to fix this shortly.

I verified that it is fixed by the branch:

git://people.freedesktop.org/~jekstrand/mesa  wip/bug-104411

*** This bug has been marked as a duplicate of bug 104411 ***

