Bug 35820 - [bisected SNB] System hangs when Gnome with compiz start up
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: All Linux (All)
Importance: high critical
Assigned To: Kenneth Graunke
Duplicates: 35853
Reported: 2011-03-30 22:58 UTC by fangxun
Modified: 2011-04-06 02:49 UTC

Description fangxun 2011-03-30 22:58:34 UTC
System Environment:
--------------------------
Arch:           i386
Platform:       SNB(Huronriver)
Libdrm: (master)2.4.24-9-g5cb554a0d6e986f2d7300a91d95983fa09b17f65
Mesa:   (master)5eb9f687087a4bc71775a32efcd848fc6cd67694
Xserver:(master)xorg-server-1.10.0-143-g327e1d88012102af6aca6c6840aa0ed3c7041a77
Xf86_video_intel:(master)2.14.902-2-g630d77bf10ba6234bb9c04538636f7d8aa319aea
Kernel: (drm-intel-next)f0c860246472248a534656d6cdbed5a36d1feb2e

Bug detailed description:
-------------------------
With the latest Mesa (master) code, starting the GNOME desktop with compiz hangs the GPU, and then the whole system hangs (hard hang).
Error in output: 
(EE) intel(0): Detected a hung GPU, disabling acceleration.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: X (xorg_backtrace+0x3b) [0x80a600b]
1: X (mieqEnqueue+0x1ab) [0x809dd6b]
2: X (xf86PostMotionEventM+0xbf) [0x80ba3bf]
3: /opt/X11R7/lib/xorg/modules/input/evdev_drv.so (0xb7302000+0x444f) [0xb730644f]
4: /opt/X11R7/lib/xorg/modules/input/evdev_drv.so (0xb7302000+0x48b5) [0xb73068b5]
5: X (0x8048000+0x6f3cf) [0x80b73cf]
6: X (0x8048000+0x10d844) [0x8155844]
7: (vdso) (__kernel_sigreturn+0x0) [0xb7876400]
8: /opt/X11R7/lib/libpixman-1.so.0 (pixman_image_composite32+0x3c1) [0xb7814a71]
9: /opt/X11R7/lib/libpixman-1.so.0 (pixman_image_composite+0x8c) [0xb78151bc]
10: /opt/X11R7/lib/xorg/modules/libfb.so (fbComposite+0x1bb) [0xb76cb43b]
11: /opt/X11R7/lib/xorg/modules/drivers/intel_drv.so (0xb76de000+0x2e5eb) [0xb770c5eb]
12: /opt/X11R7/lib/xorg/modules/drivers/intel_drv.so (0xb76de000+0x2cca8) [0xb770aca8]
13: /opt/X11R7/lib/xorg/modules/drivers/intel_drv.so (0xb76de000+0x285d6) [0xb77065d6]
14: X (0x8048000+0xca589) [0x8112589]
15: X (CompositeGlyphs+0xb0) [0x810f100]
16: X (0x8048000+0xc253b) [0x810a53b]
17: X (0x8048000+0xbe2b3) [0x81062b3]
18: X (0x8048000+0x3e297) [0x8086297]
19: X (0x8048000+0x1a2ca) [0x80622ca]
20: /lib/libc.so.6 (__libc_start_main+0xe6) [0x48aa2cc6]
21: X (0x8048000+0x19e91) [0x8061e91]

Bisect shows 9a21bc640188e4078075b9f8e6701853a4f0bbe4 is the first bad commit
commit 9a21bc640188e4078075b9f8e6701853a4f0bbe4
Author:     Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Wed Mar 16 14:09:17 2011 -0700
Commit:     Kenneth Graunke <kenneth@whitecape.org>
CommitDate: Tue Mar 29 05:29:06 2011 -0700

    i965: Refactor Sandybridge implied move handling.

    This was open-coded in three different places, and more are necessary.
    Extract this into a function so it can be reused.

    Unfortunately, not all variations were the same: in particular, one set
    compression control and checked that the source register was not
    ARF_NULL.  This seemed like a good idea, so all cases now do so.


Bug reproduce steps:
---------------------------
1. Start gnome-session with compiz enabled.
Comment 1 Gordon Jin 2011-03-31 22:06:35 UTC
*** Bug 35853 has been marked as a duplicate of this bug. ***
Comment 2 Ian Romanick 2011-04-01 11:41:45 UTC
Ken:

Can you see if you can reproduce this?  It is bisected to one of your refactoring commits.

Fang:

Is this Huron River GT1 or GT2?  Is it reproducible on other SNB platforms or just that one?
Comment 3 fangxun 2011-04-01 20:18:26 UTC
It happens on all our SNB testing machines: Huron River(id=0x0116, rev 09, GT2), Huron River(id=0x0126, rev 08, GT2+), Sugarbay(id=0x0112, rev 09, GT2), Sugarbay(id=0x0102, rev 09, GT1). 

BTW, we have no Huron River GT1 for testing.
Comment 4 Kenneth Graunke 2011-04-02 17:49:53 UTC
I can reproduce this (checking with gnome shell at the moment; haven't tried compiz yet).  It looks like the wm_emit backend is generating obviously broken code now.  I should have a fix soon.
Comment 5 Kenneth Graunke 2011-04-02 18:27:43 UTC
I tested GNOME Shell and Compiz.  Both work with the following commit:

commit a019dd0d6e5bba00e8ee7818e004ee42ca507102
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sun Apr 3 00:57:30 2011 -0700

    i965: Fix null register use in Sandybridge implied move resolution.
    
    Fixes regressions caused by commit 9a21bc6401, namely GPU hangs when
    running gnome-shell or compiz (Mesa bugs #35820 and #35853).
    
    I incorrectly refactored the case that dealt with ARF_NULL; even in that
    case, the source register needs to be changed to the MRF.
    
    NOTE: This is a candidate for the 7.10 branch (if 9a21bc6401 is
    cherry-picked, take this one too).
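
For readers following the two commit messages, here is a toy model of the regression and the fix. It is only a sketch based on the wording of the commit messages, not the Mesa source: the names `Reg`, `resolve_buggy` and `resolve_fixed`, the string register files, and the instruction tuples are all hypothetical. The assumption is that the helper copies the source into a message register (MRF) with an explicit MOV and then hands back the register the send should use.

```python
from dataclasses import dataclass

# Hypothetical register files for this toy model; not Mesa's real enums.
ARF = "arf"   # architecture register file
GRF = "grf"   # general register file
MRF = "mrf"   # message register file
ARF_NULL = 0  # register number standing in for the null register


@dataclass(frozen=True)
class Reg:
    file: str
    nr: int


def is_null(reg: Reg) -> bool:
    """True when reg is the (modelled) ARF null register."""
    return reg.file == ARF and reg.nr == ARF_NULL


def resolve_buggy(insns: list, src: Reg, msg_reg_nr: int) -> Reg:
    """Regressed behaviour: a null source is handed back unchanged."""
    if is_null(src):
        return src  # bug: the caller keeps using the null register
    mrf = Reg(MRF, msg_reg_nr)
    insns.append(("MOV", mrf, src))  # explicit move into the MRF
    return mrf


def resolve_fixed(insns: list, src: Reg, msg_reg_nr: int) -> Reg:
    """Fixed behaviour: skip the MOV for null, but always return the MRF."""
    mrf = Reg(MRF, msg_reg_nr)
    if not is_null(src):
        insns.append(("MOV", mrf, src))
    return mrf


if __name__ == "__main__":
    insns = []
    null_src = Reg(ARF, ARF_NULL)
    print("buggy:", resolve_buggy(insns, null_src, 1))
    print("fixed:", resolve_fixed(insns, null_src, 1))
    print("instructions emitted:", insns)
```

In this model both versions skip the MOV when the source is null. The only difference is which register the caller ends up with, which matches the commit message's point that the source must be changed to the MRF even in the ARF_NULL case.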
Comment 6 fangxun 2011-04-06 02:49:20 UTC
Verified with Mesa master commit: 6caac3ecb8bc32d92c35fdb1f0a67541ffa8af29.