96878 – [Bisected: cc2d0e6][HSW] "GPU HANG" msg after autologin to gnome-session

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Topi Pohjolainen
QA Contact:	Intel 3D Bugs Mailing List

Duplicates (1):	96899

Reported:	2016-07-10 09:58 UTC by Arek Ruśniak
Modified:	2016-07-13 16:16 UTC

Attachments
gpu crash dump (3.06 MB, text/plain) 2016-07-10 09:58 UTC, Arek Ruśniak
dmesg log (66.36 KB, text/plain) 2016-07-10 10:02 UTC, Arek Ruśniak
Xorg.0.log (41.35 KB, text/x-log) 2016-07-10 12:42 UTC, Arek Ruśniak
dmesg from ubuntu (73.03 KB, text/plain) 2016-07-12 10:29 UTC, Arek Ruśniak
dmesg from ubuntu v2 (72.36 KB, text/plain) 2016-07-12 10:31 UTC, Arek Ruśniak
Proposed fix (4.83 KB, patch) 2016-07-13 07:12 UTC, Topi Pohjolainen

Description Arek Ruśniak 2016-07-10 09:58:39 UTC

Created attachment 124984
gpu crash dump

Hi, dmesg shows this:
[   28.353961] [drm] stuck on render ring
[   28.354632] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell [597], reason: Engine(s) hung, action: reset
[   28.354633] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   28.354634] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   28.354635] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   28.354636] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   28.354637] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   28.355972] drm/i915: Resetting chip after gpu hang

It only appears when I autologin into gnome-session (or budgie.desktop) right after boot (the DM is started automatically by systemd). When I start gdm.service or sddm.service manually from a tty, autologin works.
When I tried autologin to another WM like fluxbox or xfce4, it worked normally without a hang.

After this hang, gdm (or sddm) starts again, I can log in to gnome-session, and everything works fine. I've found no other trigger that makes this happen again.
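
The kernel message above asks for the crash dump to be attached. A minimal Python sketch of copying it out of sysfs follows; the helper name `save_crash_dump` and the destination filename are illustrative assumptions, only the source path comes from the dmesg output, and reading it may require root.

```python
# Path quoted in the kernel message above; reading it may require root.
DEFAULT_ERROR_PATH = "/sys/class/drm/card0/error"


def save_crash_dump(dest, src=DEFAULT_ERROR_PATH, chunk_size=1 << 16):
    """Copy the crash dump at src to dest; return the number of bytes written.

    The file is read in chunks until EOF instead of trusting the reported
    file size, since sysfs entries do not always report a meaningful size.
    """
    written = 0
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            written += len(chunk)
    return written


# Example (hypothetical destination name):
#   save_crash_dump("gpu-crash-dump.txt")
```

The resulting file can then be attached to the report in the same way as the "gpu crash dump" attachment here.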

Comment 1 Arek Ruśniak 2016-07-10 10:01:11 UTC

And it all started happening after this commit:

cc2d0e64c0b10884bc12d80018b622911e8b152f is the first bad commit
commit cc2d0e64c0b10884bc12d80018b622911e8b152f
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Fri May 20 11:15:35 2016 +0300

    i965/blorp/gen7+: Stop trashing push constant allocation
    
    Packet 3DSTATE_CONSTANT_PS is still emitted explicitly as ps stage
    itself is enabled and hardware may try to prefetch constants from
    the buffer. From the BSpec: 3D Pipeline - Windower -
    3DSTATE_PUSH_CONSTANT_ALLOC_PS
    
      "Specifies the size of the PS constant buffer. This value will
       determine the amount of data the command stream can pre-fetch
       before the buffer is full."
    
    This is not possible on gen6. From the BSpec about 3DSTATE_CONSTANT_PS:
    
    "This packet must be followed by WM_STATE."
    
    Binding table emissions for stages other than PS can be now dropped,
    they were only needed for the 3DSTATE_CONSTANT_XS to be effective:
    
    From the BSpec:
    
      "The 3DSTATE_CONSTANT_* command is not committed to the shader unit
       until the corresponding (same shader) 3DSTATE_BINDING_TABLE_POINTER_*
       command is parsed."
    
    Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Comment 2 Arek Ruśniak 2016-07-10 10:02:36 UTC

Created attachment 124985
dmesg log

Comment 3 Kenneth Graunke 2016-07-10 10:48:34 UTC

Thanks so much for bisecting this, it's very helpful!

Topi, can you take a look?

Comment 4 Arek Ruśniak 2016-07-10 12:29:25 UTC

I was wrong: autologin is not the trigger. I believe the problem is gnome-shell only.
I've removed the radeon card (the radeonsi module causes an X crash, and this is why autologin fails).
I disabled the gdm service entirely and started gnome-session from a tty via an xinit script.

dmesg output for the Intel-only system:

[   32.537198] [drm] stuck on render ring
[   32.537922] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   32.537923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   32.537923] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   32.537924] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   32.537924] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   32.537925] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   32.539993] drm/i915: Resetting chip after gpu hang
[   42.570515] [drm] stuck on render ring
[   42.571591] [drm] GPU HANG: ecode 7:0:0x86d2bff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   42.573774] drm/i915: Resetting chip after gpu hang
[   52.570434] [drm] stuck on render ring
[   52.571521] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in Xorg [517], reason: Engine(s) hung, action: reset
[   52.573678] drm/i915: Resetting chip after gpu hang

I found a funny thing: when I run and quit a weston session before startx, everything looks normal, no hang, no crash.

If a new crash dump is needed, just tell me.
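
To compare hangs across logs like the one above, the "GPU HANG" lines can be parsed mechanically. The following is a rough Python sketch, assuming the line format is exactly as it appears in these excerpts; the names `HANG_RE` and `parse_hangs` are illustrative and not part of any tool.

```python
import re
from collections import Counter

# Matches the i915 "GPU HANG" line format seen in the dmesg excerpts in this
# report (an assumption: other kernel versions may format it differently).
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[0-9a-fx:]+), in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)


def parse_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in dmesg_text."""
    hangs = []
    for line in dmesg_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            hangs.append({
                "time": float(m.group("time")),
                "ecode": m.group("ecode"),
                "process": m.group("proc"),
                "pid": int(m.group("pid")),
            })
    return hangs


# Lines copied from the dmesg output in this comment.
SAMPLE = """\
[   32.537198] [drm] stuck on render ring
[   32.537922] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   42.571591] [drm] GPU HANG: ecode 7:0:0x86d2bff9, in gnome-shell [562], reason: Engine(s) hung, action: reset
[   52.571521] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in Xorg [517], reason: Engine(s) hung, action: reset
"""

hangs = parse_hangs(SAMPLE)
by_ecode = Counter(h["ecode"] for h in hangs)
for h in hangs:
    print(h["time"], h["ecode"], h["process"], h["pid"])
```

Grouping by ecode shows that the three hangs in this log carry two distinct codes, and that the third one is attributed to Xorg rather than gnome-shell.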

Comment 5 Arek Ruśniak 2016-07-10 12:42:24 UTC

Created attachment 124988
Xorg.0.log

Ah, I forgot: this is only true if I use the intel DDX. With the generic Xorg driver (modesetting), everything works fine as always.

Comment 6 Topi Pohjolainen 2016-07-11 08:21:09 UTC

I tried HSW with Ubuntu 16.04. I disabled lightdm and kicked off Xorg manually from the command line:

sudo -E /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch

And then simply "gnome-session" from the command line as well. Looks fine, no hang. Can you check whether that works for you, or whether I need to do something differently?


I'm using an older kernel than you (4.5.0-rc1+). In Xorg.log I have:

[   190.200] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20160124
[   190.200] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1 (Timo Aaltonen <tjaalton@debian.org>)

Comment 7 Arek Ruśniak 2016-07-11 09:37:28 UTC

Topi, thanks for the reply. I'll install Ubuntu and try it. Maybe it's an Arch issue, we'll see.

Comment 8 Arek Ruśniak 2016-07-12 10:29:35 UTC

Created attachment 125021
dmesg from ubuntu

I disabled display-manager.service and booted Ubuntu with only this kernel command line: "root=... ro quiet"

from tty1
"sudo -E /usr/bin/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch"

from tty2
"DISPLAY=:0 gnome-session-classic" or "DISPLAY=:0 gnome-shell" (gnome-session from a tty doesn't start gnome-shell for me and I don't have time to figure out why)

As I wrote, when I run weston first and then gnome-shell, the GPU doesn't hang. The same goes for Unity!

Comment 9 Arek Ruśniak 2016-07-12 10:31:44 UTC

Created attachment 125022
dmesg from ubuntu v2

Comment 10 Arek Ruśniak 2016-07-12 11:38:57 UTC

some system info:
clean ubuntu + gnome3 + oibaf ppa
kernel - 4.4.0-28-generic 
xorg - 1.18.3 
mesa - 12.1~git1607120730.e30069~gd~x 
ddx - 2:2.99.917+git1607041932.26f8ab~gd~x

Comment 11 Topi Pohjolainen 2016-07-12 15:03:51 UTC

I tried that but still couldn't get a hang. Your machine is a laptop, I take it? I tried with a desktop machine.

Comment 12 Aaron Watry 2016-07-12 15:07:50 UTC

I just spent last night bisecting this same issue on my machine (Haswell mobile), but I found the following results:

75e095: good (commit before cc2d0e6, no artifacts)
cc2d0e6: good-ish (no hangs, but some artifacts that weren't observed previously when at the login screen)
39fdee6: bad (First bad commit that hangs GPU, reverting fixes the hang)

cc2d0e6 doesn't hang my GDM login screen, but there are some odd visual artifacts that didn't appear before this commit.

In order to trigger the hang, I have to do a cold boot following the mesa build/install. A warm restart of gdm doesn't seem to trigger the issue.

Note: 39fdee6 can be cleanly reverted, and after doing that my hangs go away, but the visual artifacts caused by cc2d0e6 remain.

We may want to mark 96899 (which I just opened a few minutes ago) as a duplicate of this...
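
The results in this comment show that the "first bad commit" depends on what counts as bad. As a rough Python sketch of the binary search behind bisection, assuming a simplified history made of only the three commits named above (any commits in between are omitted), the same search returns a different commit for each criterion. The `first_bad` helper and the `OBSERVED` table are illustrative and encode only what this comment reports.

```python
def first_bad(commits, is_bad):
    """Binary-search for the first bad commit.

    commits is ordered oldest to newest. Assumes commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Simplified history: only the commits named in this comment (assumption).
COMMITS = ["75e095", "cc2d0e6", "39fdee6"]

# Observations as reported in this comment for this one machine.
OBSERVED = {
    "75e095": {"hang": False, "artifacts": False},
    "cc2d0e6": {"hang": False, "artifacts": True},
    "39fdee6": {"hang": True, "artifacts": True},
}

by_hang = first_bad(COMMITS, lambda c: OBSERVED[c]["hang"])
by_artifacts = first_bad(COMMITS, lambda c: OBSERVED[c]["artifacts"])
print("first commit that hangs:", by_hang)
print("first commit with artifacts:", by_artifacts)
```

With these observations, testing for the hang points at 39fdee6, while testing for artifacts points at cc2d0e6.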

Comment 13 Aaron Watry 2016-07-12 15:13:36 UTC

*** Bug 96899 has been marked as a duplicate of this bug. ***

Comment 14 Topi Pohjolainen 2016-07-12 15:17:03 UTC

I was just about to comment that I got the hang after all, but I actually have it with cc2d0e6.

Comment 15 Arek Ruśniak 2016-07-12 15:21:10 UTC

I have a desktop CPU:
Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz

I just wanted to see how PRIME works today, and here I am :)

Comment 16 Topi Pohjolainen 2016-07-12 19:07:40 UTC

It looks like it is enough to program the _3DSTATE_CONSTANT_XS packets. The CONSTANT_ALLOC_XS packets can be omitted (in the patch, the call to gen7_emit_push_constant_state()).
Discussing now with others how we want to fix this.

Comment 17 Topi Pohjolainen 2016-07-13 07:12:49 UTC

Created attachment 125045
Proposed fix

The attached patch fixes the hang at least for me. Could you give it a spin as well?

Comment 18 Arek Ruśniak 2016-07-13 08:00:20 UTC

Yes, it works. Gnome now starts without the "gpu hang" msg.
Should we close this immediately or wait for the fix to land in master?

Comment 19 Topi Pohjolainen 2016-07-13 08:02:49 UTC

Let's wait for master, it's cleaner that way. I suspect review won't take long.

Comment 20 Topi Pohjolainen 2016-07-13 12:15:37 UTC

Pushed:

commit 26778da5716b2f3ad1f2ca5881b4ed500306b035
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Tue Jul 12 22:09:42 2016 +0300

Comment 21 Arek Ruśniak 2016-07-13 12:39:32 UTC

Thanks for the help.

Comment 22 Aaron Watry 2016-07-13 16:16:41 UTC

Yup, fixes the issue for me as well. Thanks Topi!
