Bugzilla – Bug 62141
[SNB] hang on gnome-shell start with Fedora 19/rawhide + rc6
Last modified: 2013-05-05 11:31:52 UTC
Okay, weird one: recently gnome-shell started hard-hanging the whole machine on boot on F19. It wasn't due to a kernel, DDX, or Mesa update, so something in cogl/clutter/mutter/gnome-shell is triggering a latent problem on Sandy Bridge.
Disabling RC6 makes it go away.
8086:0126 is the PCI ID.
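When comparing reports it helps to know whether a given boot actually had RC6 disabled. The sketch below is a minimal, hypothetical helper (`rc6_disabled` is not part of any existing tool) that inspects a kernel command line string such as the contents of /proc/cmdline. It assumes the 3.8/3.9-era parameter name `i915.i915_enable_rc6` and also accepts the later name `i915.enable_rc6`; treat both names, and the last-occurrence-wins handling, as assumptions to check against the kernel you are running.

```python
import shlex

# Parameter names assumed here: the 3.8/3.9-era name and the later rename.
RC6_PARAMS = ("i915.i915_enable_rc6", "i915.enable_rc6")


def rc6_disabled(cmdline: str) -> bool:
    """Return True if the command line sets an i915 RC6 parameter to 0.

    If the parameter appears more than once, the last occurrence decides
    (an assumption about how repeated parameters are resolved).
    """
    disabled = False
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if sep and key in RC6_PARAMS:
            # 0 disables RC6; other values (-1 default, positive masks) do not.
            disabled = value == "0"
    return disabled


if __name__ == "__main__":
    # Example: the boot line from a later comment, with and without the option.
    base = "root=/dev/sda2 ro vga=current splash quiet BOOT_IMAGE=../vmlinuz"
    print(rc6_disabled(base))
    print(rc6_disabled(base + " i915.i915_enable_rc6=0"))
```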
So I talked to ickle, and he said he was seeing RC6 issues that didn't happen with Mesa 8.0, so I built 8.0, installed it, and things worked.
So I bisected 8.0 -> 9.0 (9.0 also hung when I downgraded to it) and landed at:
bce58e155db7202a98642c10e6132dee4e08162b intel: Convert to using private depth/stencil buffers (v2)
This seems quite consistent: I can go to the commit before and reboot lots of times with no problem, but with the later commit it's all death. I assume this just means some flush changed the timing, and now it hits the hardware and dies.
The only way to reliably reproduce it is to boot the machine: when gdm starts gnome-shell, it hangs hard, with nothing in netconsole or anywhere else I can find.
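The bisection above is a binary search for the first bad commit between a known-good (8.0) and a known-bad (9.0) point. A minimal sketch of that search, assuming a linear list of commits in chronological order and a hypothetical `is_bad` predicate standing in for "build, install, reboot, and see whether it hangs" (real `git bisect` also copes with merges and skipped commits, which this ignores):

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is known good and commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # e.g. the machine hangs when gdm starts
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history where the hang starts at c6.
    history = [f"c{i}" for i in range(10)]
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))
```

Each step halves the range, so a few hundred commits need only about eight or nine test boots, which matters when every "bad" test means a hard hang and a reset.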
cc'ing the crap out of this.
I can confirm that commit triggers a hard lockup on the first glxgears frame after booting on this i5-2500 (SNB GT1).
Just to check, have you tested this with v3.8 kernels already? Specifically:
Author: Daniel Vetter <email@example.com>
Date: Fri Dec 14 23:38:29 2012 +0100
drm/i915: Implement WaSetupGtModeTdRowDispatch
That's the only SNB GT1 thing I remember offhand that might help.
IIRC there's more SNB GT1 stuff for the userspace part of the driver in the workaround database, but I'd need to check (travelling at the moment).
Presuming, of course, that it's a hardware failure we now trigger thanks to slightly different code.
What's a 3.8 kernel? There is only one kernel to test things with, Linus's :-)
And yes, it hangs with every kernel I threw at it, 3.8.x and 3.9-*.
this marcheu hack also works around the issue for me here.
(In reply to comment #5)
> this marcheu hack also works around the issue for me here.
Adding my own +1: the above patch fixes the problem for me as well.
I've no clue what it DOES, as this is well outside my domain of expertise, but I can tell you that when I build Mesa 9.1.1 with the above patch applied, and with nothing added to the boot options to disable RC6, I no longer see this bug.
I've been testing and using this setup for more than a week.
Here are the gross specs for my setup:
Boot: root=/dev/sda2 ro vga=current splash quiet BOOT_IMAGE=../vmlinuz
*** Bug 52424 has been marked as a duplicate of this bug. ***
Arch Linux: our users report that activating SNA seems to be a workaround as well.
I can also confirm that the bug goes away when SNA is enabled. In my driver build it was enabled by default (via a configure switch) up to some point, but after I rebuilt the driver without the default-accel switch, GDM hung the system on boot and I had to hard-reset the machine.
I'm not 100% sure this is the same issue, but it might be, as I have the exact same symptoms and PCI ID.
I opened Bug 63899 before I knew about this report.
I managed to get the i915_error_state and a kernel trace by limiting myself to just X and glxgears and waiting for a watchdog to unhang my system. That output is attached to the other bug report; I'm not sure whether it is of any help.
So... Daniel seems to think this is caused by blorp not setting something up quite correctly.
Daniel: can you give Paul some details about this?
Paul: can you look into this more?
This seems to be causing a lot of pain for anyone trying to update GNOME Shell, so it should be considered high priority.
After all the debugging I've done, we're pretty sure this is the same as bug 56416. There's no clear fix yet. The next step in userland is to try forcing a non-blorp blit as the first operation on context creation (a color blorp blit was insufficient). But we're concerned that this problem might also show up on resume/thaw, in which case a userland-only solution isn't good enough, since userland doesn't know when resume/thaw happens.
*** This bug has been marked as a duplicate of bug 56416 ***