Bug 96878 - [Bisected: cc2d0e6][HSW] "GPU HANG" msg after autologin to gnome-session
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Topi Pohjolainen
QA Contact: Intel 3D Bugs Mailing List
Duplicates: 96899
Reported: 2016-07-10 09:58 UTC by Arek Ruśniak
Modified: 2016-07-13 16:16 UTC

Attachments
gpu crash dump (3.06 MB, text/plain), 2016-07-10 09:58 UTC, Arek Ruśniak
dmesg log (66.36 KB, text/plain), 2016-07-10 10:02 UTC, Arek Ruśniak
Xorg.0.log (41.35 KB, text/x-log), 2016-07-10 12:42 UTC, Arek Ruśniak
dmesg from ubuntu (73.03 KB, text/plain), 2016-07-12 10:29 UTC, Arek Ruśniak
dmesg from ubuntu v2 (72.36 KB, text/plain), 2016-07-12 10:31 UTC, Arek Ruśniak
Proposed fix (4.83 KB, patch), 2016-07-13 07:12 UTC, Topi Pohjolainen

Description Arek Ruśniak 2016-07-10 09:58:39 UTC
Created attachment 124984
gpu crash dump

Hi, dmesg shows this:
[   28.353961] [drm] stuck on render ring
[   28.354632] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell [597], reason: Engine(s) hung, action: reset
[   28.354633] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   28.354634] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   28.354635] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   28.354636] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   28.354637] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   28.355972] drm/i915: Resetting chip after gpu hang

It only appears when I try to autologin into gnome-session (or budgie.desktop) right after boot (the DM is started automatically by systemd). When I start gdm.service or sddm.service manually from a tty, autologin works.
When I tried autologin to another WM such as fluxbox or xfce4, it worked normally without a hang.

After this hang, gdm (or sddm) starts again, I can log in to gnome-session, and everything works fine. I've found no other trigger that makes this happen again.
Comment 1 Arek Ruśniak 2016-07-10 10:01:11 UTC
And it all started happening after this commit:

cc2d0e64c0b10884bc12d80018b622911e8b152f is the first bad commit
commit cc2d0e64c0b10884bc12d80018b622911e8b152f
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Fri May 20 11:15:35 2016 +0300

    i965/blorp/gen7+: Stop trashing push constant allocation
    
    Packet 3DSTATE_CONSTANT_PS is still emitted explicitly as ps stage
    itself is enabled and hardware may try to prefetch constants from
    the buffer. From the BSpec: 3D Pipeline - Windower -
    3DSTATE_PUSH_CONSTANT_ALLOC_PS
    
      "Specifies the size of the PS constant buffer. This value will
       determine the amount of data the command stream can pre-fetch
       before the buffer is full."
    
    This is not possible on gen6. From the BSpec about 3DSTATE_CONSTANT_PS:
    
    "This packet must be followed by WM_STATE."
    
    Binding table emissions for stages other than PS can be now dropped,
    they were only needed for the 3DSTATE_CONSTANT_XS to be effective:
    
    From the BSpec:
    
      "The 3DSTATE_CONSTANT_* command is not committed to the shader unit
       until the corresponding (same shader) 3DSTATE_BINDING_TABLE_POINTER_*
       command is parsed."
    
    Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Comment 2 Arek Ruśniak 2016-07-10 10:02:36 UTC
Created attachment 124985
dmesg log
Comment 3 Kenneth Graunke 2016-07-10 10:48:34 UTC
Thanks so much for bisecting this, it's very helpful!

Topi, can you take a look?
Comment 4 Arek Ruśniak 2016-07-10 12:29:25 UTC
I was wrong: autologin is not the trigger. I believe the problem is gnome-shell only.
I've removed the radeon card (the radeonsi module causes an X crash, and this is why autologin failed).
I disabled the gdm service entirely and started gnome-session from a tty via an xinit script.

dmesg output for the Intel-only system:

[   32.537198] [drm] stuck on render ring
[   32.537922] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   32.537923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   32.537923] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   32.537924] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   32.537924] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   32.537925] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   32.539993] drm/i915: Resetting chip after gpu hang
[   42.570515] [drm] stuck on render ring
[   42.571591] [drm] GPU HANG: ecode 7:0:0x86d2bff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   42.573774] drm/i915: Resetting chip after gpu hang
[   52.570434] [drm] stuck on render ring
[   52.571521] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in Xorg [517], reason: Engine(s) hung, action: reset
[   52.573678] drm/i915: Resetting chip after gpu hang

I found a funny thing: when I run and quit a weston session before startx, everything looks normal, with no hang and no crash.

If a new crash dump is needed, just tell me.
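
For anyone comparing several hang reports, the GPU HANG lines above can be pulled out of a saved dmesg log with a small script. The following is only a sketch, not part of any i915 tooling: the function name parse_gpu_hangs and the field names are made up for illustration, and the regular expression is based solely on the line format seen in this report, so other kernel versions may print it differently.

```python
import re
from collections import Counter

# Pattern derived from the dmesg lines quoted in this bug report only, e.g.:
# [   32.537922] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[^,\s]+), in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hangs(dmesg_text):
    """Return one dict per 'GPU HANG' line found in dmesg_text (hypothetical helper)."""
    hangs = []
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            entry = match.groupdict()
            entry["time"] = float(entry["time"])  # seconds since boot
            entry["pid"] = int(entry["pid"])
            hangs.append(entry)
    return hangs


# Sample input copied from the dmesg output in this comment.
SAMPLE = """\
[   32.537198] [drm] stuck on render ring
[   32.537922] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   32.539993] drm/i915: Resetting chip after gpu hang
[   42.571591] [drm] GPU HANG: ecode 7:0:0x86d2bff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   52.571521] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in Xorg [517], reason: Engine(s) hung, action: reset
"""

if __name__ == "__main__":
    hangs = parse_gpu_hangs(SAMPLE)
    for h in hangs:
        print(f"{h['time']:9.3f}s  {h['process']:<12} pid={h['pid']}  ecode={h['ecode']}")
    # Count how often each error code occurs.
    print(Counter(h["ecode"] for h in hangs))
```

Run on the three hangs quoted in this comment, it makes it easy to see that the first and third share the ecode 7:0:0x85dfbff9 while the second reports 7:0:0x86d2bff9.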
Comment 5 Arek Ruśniak 2016-07-10 12:42:24 UTC
Created attachment 124988
Xorg.0.log

Ah, I forgot: this is only true if I use the intel DDX. With the generic Xorg driver (modesetting), everything works fine as always.
Comment 6 Topi Pohjolainen 2016-07-11 08:21:09 UTC
I tried HSW with Ubuntu 16.04. I disabled lightdm and kicked off Xorg manually from the command line:

sudo -E /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch

Then I simply ran "gnome-session" from the command line as well. It looks fine, no hang. Can you check whether that works for you, or whether I need to do something differently?

I'm using an older kernel than you (4.5.0-rc1+). In xorg.log I have:

[   190.200] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20160124
[   190.200] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1 (Timo Aaltonen <tjaalton@debian.org>)
Comment 7 Arek Ruśniak 2016-07-11 09:37:28 UTC
Topi, thanks for the reply. I'll install Ubuntu and try it. Maybe it's an Arch issue, we'll see.
Comment 8 Arek Ruśniak 2016-07-12 10:29:35 UTC
Created attachment 125021
dmesg from ubuntu

I disabled display-manager.service and booted Ubuntu with only this kernel command line: "root=... ro quiet"

from tty1
"sudo -E /usr/bin/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch"

from tty2
"DISPLAY=:0 gnome-session-classic" or "DISPLAY=:0 gnome-shell" (gnome-session from a tty doesn't start gnome-shell for me, and I don't have time to figure out why)

As I wrote, when I run weston first and then gnome-shell, the GPU doesn't hang. The same goes for Unity!
Comment 9 Arek Ruśniak 2016-07-12 10:31:44 UTC
Created attachment 125022
dmesg from ubuntu v2
Comment 10 Arek Ruśniak 2016-07-12 11:38:57 UTC
some system info:
clean ubuntu + gnome3 + oibaf ppa
kernel - 4.4.0-28-generic 
xorg - 1.18.3 
mesa - 12.1~git1607120730.e30069~gd~x 
ddx - 2:2.99.917+git1607041932.26f8ab~gd~x
Comment 11 Topi Pohjolainen 2016-07-12 15:03:51 UTC
I tried that but still couldn't get a hang. Your machine is a laptop, I take it? I tried with a desktop machine.
Comment 12 Aaron Watry 2016-07-12 15:07:50 UTC
I just spent last night bisecting this same issue on my machine (haswell mobile), but I found the following results:

75e095: good (commit before cc2d0e6, no artifacts)
cc2d0e6: good-ish (no hangs, but some artifacts that weren't observed previously when at the login screen)
39fdee6: bad (First bad commit that hangs GPU, reverting fixes the hang)

cc2d0e6 doesn't hang my GDM login screen, but there are some odd visual artifacts that didn't appear before this commit.

In order to trigger the hang, I have to do a cold boot following the mesa build/install. A warm restart of gdm doesn't seem to trigger the issue.

Note: 39fdee6 can be cleanly reverted, and after doing that my hangs go away, but the visual artifacts caused by cc2d0e6 remain.

We may want to mark 96899 (which I just opened a few minutes ago) as a duplicate of this...
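
The different bisect results in this bug can be easier to reason about with a toy model of what a bisection does. The sketch below is a plain binary search, not git's actual implementation; the history list, the placeholder entries and both predicates are hypothetical, and only the three commit IDs come from this comment. It assumes a linear history in which every commit after the first bad one is also bad.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical linear history, oldest first. Only 75e095, cc2d0e6 and 39fdee6
# come from the bug report; the placeholders stand in for unknown commits.
HISTORY = ["75e095", "cc2d0e6", "placeholder-1", "39fdee6", "placeholder-2"]

# Two illustrative definitions of "bad", mirroring the observations above.
HANGS = {"39fdee6", "placeholder-2"}             # GPU hang after a cold boot
ARTIFACTS_OR_HANGS = HANGS | {"cc2d0e6", "placeholder-1"}  # artifacts count too

if __name__ == "__main__":
    print(first_bad(HISTORY, lambda c: c in HANGS))
    print(first_bad(HISTORY, lambda c: c in ARTIFACTS_OR_HANGS))
```

The same history gives 39fdee6 when only a GPU hang counts as bad and cc2d0e6 when the visual artifacts count too. The search is also only as reliable as each test: per the note above, a warm restart of gdm did not trigger the hang, so each step would need a cold boot to avoid marking a bad commit as good.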
Comment 13 Aaron Watry 2016-07-12 15:13:36 UTC
*** Bug 96899 has been marked as a duplicate of this bug. ***
Comment 14 Topi Pohjolainen 2016-07-12 15:17:03 UTC
I was just about to comment that I got the hang after all, but I actually have it with cc2d0e6.
Comment 15 Arek Ruśniak 2016-07-12 15:21:10 UTC
I have a desktop CPU:
Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz

I just wanted to see how PRIME works today, and here I am :)
Comment 16 Topi Pohjolainen 2016-07-12 19:07:40 UTC
It looks like it is enough to program the _3DSTATE_CONSTANT_XS packets. The CONSTANT_ALLOC_XS packets can be omitted (in the patch, that is the call to gen7_emit_push_constant_state()).
I'm now discussing with others how we want to fix this.
Comment 17 Topi Pohjolainen 2016-07-13 07:12:49 UTC
Created attachment 125045
Proposed fix

The attached patch fixes the hang at least for me. Could you give it a spin as well?
Comment 18 Arek Ruśniak 2016-07-13 08:00:20 UTC
Yes, it works. GNOME now starts without the "GPU HANG" message.
Should I close it immediately or wait for it to land in master?
Comment 19 Topi Pohjolainen 2016-07-13 08:02:49 UTC
Let's wait for master; it's cleaner that way. I suspect review won't take long.
Comment 20 Topi Pohjolainen 2016-07-13 12:15:37 UTC
Pushed:

commit 26778da5716b2f3ad1f2ca5881b4ed500306b035
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Tue Jul 12 22:09:42 2016 +0300
Comment 21 Arek Ruśniak 2016-07-13 12:39:32 UTC
Thanks for the help.
Comment 22 Aaron Watry 2016-07-13 16:16:41 UTC
Yup, fixes the issue for me as well. Thanks Topi!

