Created attachment 124984 [details]
gpu crash dump
Hi, dmesg shows me this:
[ 28.353961] [drm] stuck on render ring
[ 28.354632] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell , reason: Engine(s) hung, action: reset
[ 28.354633] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 28.354634] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 28.354635] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 28.354636] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 28.354637] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 28.355972] drm/i915: Resetting chip after gpu hang
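For anyone reproducing this: the log above asks for the crash dump to be attached. A minimal sketch of saving it to a regular file, assuming the dump is still at the path printed by the kernel. The function name and the output file name are made up for illustration, and reading the sysfs file may require root.

```python
from pathlib import Path

# Path taken from the dmesg line above; the card index may differ on
# systems with more than one GPU (assumption).
DEFAULT_ERROR_PATH = Path("/sys/class/drm/card0/error")


def save_crash_dump(dest, src=DEFAULT_ERROR_PATH):
    """Copy the i915 error state from src to dest.

    Returns the number of bytes written, so an empty dump is easy to spot.
    """
    data = Path(src).read_bytes()
    Path(dest).write_bytes(data)
    return len(data)
```

Typical use would be something like `save_crash_dump("i915-error.txt")` right after the hang, before the next reboot, since the error state does not survive a restart.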
It only appears when I try to autologin into gnome-session (or budgie.desktop) right after boot (the DM is started automatically by systemd). When I start gdm.service or sddm.service manually from a tty, autologin works.
When I tried autologin into another WM like fluxbox or xfce4, it worked normally without a hang.
After this hang, gdm (or sddm) starts again, I can log in to gnome.session, and everything works fine. I've found no other trigger that makes this happen again.
All of this started happening after this commit:
cc2d0e64c0b10884bc12d80018b622911e8b152f is the first bad commit
Author: Topi Pohjolainen <email@example.com>
Date: Fri May 20 11:15:35 2016 +0300
i965/blorp/gen7+: Stop trashing push constant allocation
Packet 3DSTATE_CONSTANT_PS is still emitted explicitly as ps stage
itself is enabled and hardware may try to prefetch constants from
the buffer. From the BSpec: 3D Pipeline - Windower -
"Specifies the size of the PS constant buffer. This value will
determine the amount of data the command stream can pre-fetch
before the buffer is full."
This is not possible on gen6. From the BSpec about 3DSTATE_CONSTANT_PS:
"This packet must be followed by WM_STATE."
Binding table emissions for stages other than PS can be now dropped,
they were only needed for the 3DSTATE_CONSTANT_XS to be effective:
From the BSpec:
"The 3DSTATE_CONSTANT_* command is not committed to the shader unit
until the corresponding (same shader) 3DSTATE_BINDING_TABLE_POINTER_*
command is parsed."
Signed-off-by: Topi Pohjolainen <firstname.lastname@example.org>
Reviewed-by: Jason Ekstrand <email@example.com>
Created attachment 124985 [details]
Thanks so much for bisecting this, it's very helpful!
Topi, can you take a look?
I was wrong: autologin is not the trigger. I believe the problem is in gnome-shell only.
I've removed the radeon card (the radeonsi module causes an X crash, and this is why autologin fails).
I disabled the gdm service completely and started gnome-session from a tty with an xinit script.
dmesg output for intel only system:
[ 32.537198] [drm] stuck on render ring
[ 32.537922] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell , reason: Engine(s) hung, action: reset
[ 32.537923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 32.537923] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 32.537924] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 32.537924] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 32.537925] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 32.539993] drm/i915: Resetting chip after gpu hang
[ 42.570515] [drm] stuck on render ring
[ 42.571591] [drm] GPU HANG: ecode 7:0:0x86d2bff9, in gnome-shell , reason: Engine(s) hung, action: reset
[ 42.573774] drm/i915: Resetting chip after gpu hang
[ 52.570434] [drm] stuck on render ring
[ 52.571521] [drm] GPU HANG: ecode 7:0:0x85dfbff9, in Xorg , reason: Engine(s) hung, action: reset
[ 52.573678] drm/i915: Resetting chip after gpu hang
I found a funny thing: when I run and quit a weston session before startx, everything looks normal, no hang, no crash.
If a new crash dump is needed, just tell me.
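Side note for anyone comparing hangs across boots: the ecode and the process name differ between the hang lines above, so it can help to pull them out of dmesg mechanically. A rough sketch, assuming the hang lines keep the format shown in this report. The regular expression and the function name are illustrative and are not part of any i915 tooling.

```python
import re

# Matches i915 hang lines in the format seen in this report, e.g.
# "[drm] GPU HANG: ecode 7:0:0x85dfbff9, in gnome-shell , reason: Engine(s) hung, action: reset"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<process>.+?)\s*, "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def find_gpu_hangs(dmesg_text):
    """Return one dict (ecode, process, reason, action) per GPU HANG line."""
    return [m.groupdict() for m in HANG_RE.finditer(dmesg_text)]
```

Feeding it the output of `dmesg` gives one entry per hang, which makes it easy to see that the third hang here was attributed to Xorg rather than gnome-shell.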
Created attachment 124988 [details]
Ah, I forgot: this is only true if I use the intel DDX. With the generic Xorg driver (modesetting) everything works fine, as always.
I tried HSW with Ubuntu 16.04. I disabled lightdm and kicked off Xorg manually from the command line:
sudo -E /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
And then a simple "gnome-session", also from the command line. It looks fine, no hang. Can you check whether that works for you, and tell me if I need to do something differently?
I'm using an older kernel than you (4.5.0-rc1+). In xorg.log I have:
[ 190.200] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20160124
[ 190.200] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1 (Timo Aaltonen <firstname.lastname@example.org>)
Topi, thanks for the reply. I'll install Ubuntu and try it. Maybe it's an Arch issue, we'll see.
Created attachment 125021 [details]
dmesg from ubuntu
I disabled display-manager.service and ran Ubuntu with only this kernel command line: "root=... ro quiet"
"sudo -E /usr/bin/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch"
"DISPLAY=:0 gnome-session-classic" or "DISPLAY=:0 gnome-shell" (gnome-session from a tty doesn't start gnome-shell for me, and I don't have time to figure out why)
As I wrote, when I run weston first and then gnome-shell, the GPU doesn't hang. The same goes for Unity!
Created attachment 125022 [details]
dmesg from ubuntu v2
some system info:
clean ubuntu + gnome3 + oibaf ppa
kernel - 4.4.0-28-generic
xorg - 1.18.3
mesa - 12.1~git1607120730.e30069~gd~x
ddx - 2:2.99.917+git1607041932.26f8ab~gd~x
I tried that but still couldn't get a hang. Your machine is a laptop, I take it? I tried with a desktop machine.
I just spent last night bisecting this same issue on my machine (haswell mobile), but I found the following results:
75e095: good (commit before cc2d0e6, no artifacts)
cc2d0e6: good-ish (no hangs, but some artifacts that weren't observed previously when at the login screen)
39fdee6: bad (First bad commit that hangs GPU, reverting fixes the hang)
cc2d0e6 doesn't hang my GDM login screen, but there are some odd visual artifacts that didn't appear before this commit.
In order to trigger the hang, I have to do a cold boot following the mesa build/install. A warm restart of gdm doesn't seem to trigger the issue.
Note: 39fdee6 can be cleanly reverted, and after doing that my hangs go away, but the visual artifacts caused by cc2d0e6 remain.
We may want to mark 96899 (which I just opened a few minutes ago) as a duplicate of this...
*** Bug 96899 has been marked as a duplicate of this bug. ***
I was just about to comment that I got the hang after all, but I actually have it with cc2d0e6.
I have desktop cpu:
Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
I just wanted to see how PRIME works today, and here I am :)
It looks like it is enough to program the _3DSTATE_CONSTANT_XS packets. The CONSTANT_ALLOC_XS packets can be omitted (in the patch, that is the call to gen7_emit_push_constant_state()).
Discussing now with others how we want to fix this.
Created attachment 125045 [details] [review]
The attached patch fixes the hang at least for me. Could you give it a spin as well?
Yes, it works. Gnome starts without "gpu hang" msg now.
Should I close it immediately or wait for the fix to land in master?
Let's wait for master, it's cleaner that way. I suspect review won't take long.
Author: Topi Pohjolainen <email@example.com>
Date: Tue Jul 12 22:09:42 2016 +0300
Thanks for the help.
Yup, fixes the issue for me as well. Thanks Topi!