Bug 62141

Summary: [SNB] hang on gnome-shell start with Fedora 19/rawhide + rc6
Product: Mesa Reporter: Dave Airlie <airlied>
Component: Drivers/DRI/i965 Assignee: Paul Berry <stereotype441>
Status: RESOLVED DUPLICATE QA Contact:
Severity: normal    
Priority: high CC: amiadb, austin.lund, ben, bjorn.lie, byron, cancan.feng, chris, chrisf, daniel, ealloc, eric, fan4326, garyvdm, huax.lu, idr, jan.steffens, kenneth, krejzi, pachoramos1, shane.bryan
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Dave Airlie 2013-03-11 05:08:44 UTC
Okay, weird one. Recently gnome-shell started hard-hanging the whole machine on boot on F19, and it wasn't due to a kernel, DDX or Mesa update, so something in cogl/clutter/mutter/gnome-shell is triggering something latent on Sandybridge.

Disabling RC6 makes it go away.

8086:0126 is the PCI ID.

So I talked to ickle, and he said he was seeing RC6 issues that didn't happen with Mesa 8.0, so I built 8.0, installed it, and things worked.

So I bisected 8.0 -> 9.0 (which also hung when I downgraded to it), and landed at
bce58e155db7202a98642c10e6132dee4e08162b intel: Convert to using private depth/stencil buffers (v2)

This seems quite consistent: I can go to the commit before and reboot lots without trouble, but with the commit itself it's all death. I assume this just means some flush now happens with different timing, it hits the hardware, and the hardware dies.

The only way to reproduce it reliably is to boot the machine: when gdm starts gnome-shell, it hangs hard, with nothing in netconsole or anywhere else I can find.
Comment 1 Dave Airlie 2013-03-11 05:13:25 UTC
cc'ing the crap out of this.
Comment 2 Chris Wilson 2013-03-11 09:18:56 UTC
I can confirm that commit triggers a hard lockup on the first glxgears frame after booting on this i5-2500 (snb gt1).
Comment 3 Daniel Vetter 2013-03-11 17:46:17 UTC
Just to check, have you tested this with v3.8 kernels already? Specifically

commit 6547fbdbfff62c99e4f7b4f985ff8b3454f33b0f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Dec 14 23:38:29 2012 +0100

    drm/i915: Implement WaSetupGtModeTdRowDispatch
 

That's the only SNB GT1 thing I remember offhand that might help.

IIRC there's more SNB GT1 stuff for the userspace part of the driver in the w/a db, but I would need to check (travelling atm).

This presumes, of course, that it's a hardware failure we now trigger thanks to slightly different code.
Comment 4 Dave Airlie 2013-03-11 20:05:28 UTC
What's a 3.8 kernel? There is only one kernel to test things with: Linus's :-)

And yes, it hangs with every kernel I threw at it: 3.8.x and 3.9-*.
Comment 5 Dave Airlie 2013-03-19 00:29:09 UTC
http://git.chromium.org/gitweb/?p=chromiumos/overlays/chromiumos-overlay.git;a=blob;f=media-libs/mesa/files/9.0-i965-Make-sure-we-do-render-between-two-hiz-flushes.patch;h=590b13b4c3da13be6ad8a8115576e6320cef7825;hb=HEAD

This hack from marcheu also works around the issue for me here.
Comment 6 Shane Bryan 2013-04-08 16:44:06 UTC
(In reply to comment #5)
> http://git.chromium.org/gitweb/?p=chromiumos/overlays/chromiumos-overlay.git;a=blob;f=media-libs/mesa/files/9.0-i965-Make-sure-we-do-render-between-two-hiz-flushes.patch;h=590b13b4c3da13be6ad8a8115576e6320cef7825;hb=HEAD
> 
> this marcheu hack also works around the issue for me here.

Adding my own +1: the above patch fixes the problem for me as well.

I've no clue what it DOES, as this is well outside my domain of expertise, but I can tell you that when I build Mesa 9.1.1 with the above patch applied, and nothing added to the boot options to disable RC6, I no longer see this bug.

I've been testing and using this setup for more than a week.

Here are the gross specs for my setup:

Kernel: 3.8.2
GPU:    8086:0106
Rev:    09
Driver: i915
Boot:   root=/dev/sda2 ro vga=current  splash quiet BOOT_IMAGE=../vmlinuz
Mesa:   9.1.1
Comment 7 Chris Wilson 2013-04-17 18:02:00 UTC
*** Bug 52424 has been marked as a duplicate of this bug. ***
Comment 8 Jan Alexander Steffens (heftig) 2013-04-23 14:12:43 UTC
Arch Linux: Our users report that activating SNA seems to be a workaround as well.
Comment 9 Armin K 2013-04-24 09:07:27 UTC
I can also confirm that the bug goes away when SNA is enabled. In my driver build it was enabled by default (via a configure switch) up to some point, but after I rebuilt the driver without the default-accel switch, GDM hung the system on boot and I had to hard reset the machine.
Comment 10 Austin Lund 2013-04-29 00:38:48 UTC
I'm not 100% sure this is the same issue, but it might be, as I have exactly the same symptoms and PCI ID.

I opened Bug 63899 not knowing about this report.

I managed to get the i915_error_state and some kernel traces by limiting myself to just X and glxgears and waiting for a watchdog to unhang my system. That output is attached to the other bug report; I'm not sure whether it is of any assistance.
Comment 11 Ian Romanick 2013-04-29 21:21:48 UTC
So... Daniel seems to think this is caused by blorp not setting something up quite correctly.

Daniel: Can you give Paul some details about this?

Paul: Can you look into this more?

This seems to be causing quite a lot of pain for anyone trying to update gnome shell, so this should be considered pretty high priority.
Comment 12 Eric Anholt 2013-04-29 21:54:20 UTC
After all the debugging I've done, we're pretty sure this is the same as bug 56416. There's no clear fix yet. The next step in userland is to try forcing a non-blorp blit as the first operation on context creation (a color blorp blit was insufficient). But we're concerned that this problem might also show up on resume/thaw, in which case an in-userland solution isn't good enough, since userland doesn't know when resume/thaw happen.

*** This bug has been marked as a duplicate of bug 56416 ***
