Bug 37348 - [SNB bisected] GPU hang when running 3DMMES/taiji on Huron River(0126 rev08)(with GPU semaphores enabled)
Status: CLOSED WONTFIX
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: All Linux (All)
Importance: medium major
Assignee: Eric Anholt
 
Reported: 2011-05-19 02:33 UTC by meng
Modified: 2011-07-07 20:35 UTC
CC List: 2 users



Attachments
Xorg.0.log (59.90 KB, text/plain)
2011-05-19 02:33 UTC, meng
i915_error_state.tar.bz2 (258.28 KB, text/plain)
2011-05-19 02:34 UTC, meng

Description meng 2011-05-19 02:33:23 UTC
System Environment:
--------------------------
Platform:  Sugarbay(GT1)
Libdrm:    (master)2.4.25-1-g61be94018ae9c403517d53f69357719224fa6ff3
Mesa:      (master)3e0bb02358d627e784a2b7041d6e2e23e3dfd2c5
Xserver:   (master)xorg-server-1.10.0-376-g0e7f61d72c4a929319e57c9b5b777e9413c23051
Xf86_video_intel: (master)2.15.0-17-g9d6e02a135efdea1d169d1938359ab2b553e941c
Cairo:      (master)4d96859ba5eb6018ae257ef6bfead228583908cf
Libva:      (master)88931373a169a30f73837fde2739fdf1042f4820
Kernel:(drm-intel-next) daab1470018f025e0b1c8731dfb825ff421ffd9b

Bug detailed description:
-------------------------
On the desktop, the GPU hangs when running 3DMMES/taiji on a Sugarbay (GT1) if
the resolution is not set to the maximum. Please see the attached Xorg.0.log and i915_error_state.
Specifically:
1. It's OK when running at the maximum resolution.
2. It works fine on Sugarbay (GT2) and Huron River.
3. It's a Mesa regression; fa7a051c251552c4581caadce772a29c64f6a850 is a good commit.


Reproduce steps:
-------------------------
1. xinit &
2. gnome-session
3. LD_LIBRARY_PATH=lib:$LD_LIBRARY_PATH ../src/common/fm_oes_player/fm_oesplayer
(Make sure the resolution is lower than the maximum resolution.)
Comment 1 meng 2011-05-19 02:33:53 UTC
Created attachment 46892 [details]
Xorg.0.log
Comment 2 meng 2011-05-19 02:34:26 UTC
Created attachment 46893 [details]
i915_error_state.tar.bz2
Comment 3 meng 2011-05-19 02:45:07 UTC
While bisecting, only 'skipped' commits were left to test.
The first bad commit could be any of:
5c742ea1ee0cea031cb99651155d0c7521f42b4e (skip)
855f56ca13c1003396a81da1a110357d624a2101 (skip)
a82a43e8d99e1715dd11c9c091b5ab734079b6a6 (bad commit)
I cannot bisect any further.
Comment 4 Gordon Jin 2011-05-20 00:07:21 UTC
This is with semaphores on (by default in the kernel), and regression in mesa.

bug#37070 is to track the issue when semaphores off.
Comment 5 Gordon Jin 2011-05-20 00:18:12 UTC
(In reply to comment #4)
> This is with semaphores on (by default in the kernel), and regression in mesa.
> bug#37070 is to track the issue when semaphores off.

Sorry, it should be bug#37090.
Comment 6 Ian Romanick 2011-05-20 00:27:50 UTC
All of those potential bisects are Eric's, so I'm reassigning to him.

What is meant by "It's Ok when running on the Maximum resolution."?  Does this mean running taiji in a window (vs full screen)?  Does this mean using xrandr to set a lower resolution (but still running full screen)?  Does this mean something else?
Comment 7 meng 2011-05-20 00:35:42 UTC
(In reply to comment #6)
It means using xrandr to set a lower resolution and running in a window, not full screen.
Comment 8 Eric Anholt 2011-05-20 13:32:47 UTC
Are you reliably reproducing this?  I've seen intermittent GPU hangs (1 in 10-20 runs) in 3DMMES since I first started running it on Sandybridge.
Comment 9 meng 2011-05-21 00:06:50 UTC
(In reply to comment #8)
When running 3DMMES/taiji, intermittent GPU hangs occur soon; the game is then forcibly quit, but its frozen image stays on the screen. The console shows:
(EE) intel(0): Detected a hung GPU, disabling acceleration.
(EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
It's easily reproduced on our Sugarbay (GT1).

System Environment:
--------------------------
Libdrm:  (master)2.4.25-1-g61be94018ae9c403517d53f69357719224fa6ff3
Xserver:	 (master)xorg-server-1.10.0-376-g0e7f61d72c4a929319e57c9b5b777e9413c23051
Xf86_video_intel: (master)2.15.0-17-g9d6e02a135efdea1d169d1938359ab2b553e941c
Cairo:  (master)bdfd860ae7a4e5fd7157748f90b0d8c6cc04e5ca
Libva:  (master)88931373a169a30f73837fde2739fdf1042f4820
Kernel:(drm-intel-next) daab1470018f025e0b1c8731dfb825ff421ffd9b
Comment 10 meng 2011-05-22 19:36:57 UTC
(In reply to comment #9)
> When running 3DMMES/taiji, soon intermittent GPU hangs, then the game is
> quitted forcedly, but game freeze still on the screen. 
Sorry, "soon intermittent GPU hangs" isn't correct; it should be "soon the screen stutters". This bug reproduces every time the game is run.
Comment 11 meng 2011-05-22 19:42:11 UTC
Sorry, the problem is not limited to Sugarbay (GT1); it also occurs on Huron River (0126 rev08).
In particular, the whole system hangs when running 3DMMES/taiji on Huron River (0126 rev08).
Comment 12 Gordon Jin 2011-05-22 19:47:08 UTC
Please explain on the bug why you are removing GT1.
You need to make the audience understand, and keep the content consistent.

> -----Original Message-----
> From: bugzilla-daemon@freedesktop.org
> [mailto:bugzilla-daemon@freedesktop.org]
> Sent: Monday, May 23, 2011 10:26 AM
> To: Jin, Gordon
> Subject: [Bug 37348] [SNB regression] GPU hang when running 3DMMES/taiji (with GPU semaphores enabled)
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=37348
> 
> meng <mengmeng.meng@intel.com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>             Summary|[SNB GT1 regression] GPU    |[SNB regression] GPU hang
>                    |hang when running           |when running 3DMMES/taiji
>                    |3DMMES/taiji (with GPU      |(with GPU semaphores
>                    |semaphores enabled)         |enabled)
> 
Comment 13 Gordon Jin 2011-05-22 19:47:10 UTC
OK, I see your further explanation now.

But it's a system hang? Then it might not be the same issue?

> -----Original Message-----
> From: bugzilla-daemon@freedesktop.org
> [mailto:bugzilla-daemon@freedesktop.org]
> Sent: Monday, May 23, 2011 10:42 AM
> To: Jin, Gordon
> Subject: [Bug 37348] [SNB regression] GPU hang when running 3DMMES/taiji (with GPU semaphores enabled)
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=37348
> 
> --- Comment #11 from meng <mengmeng.meng@intel.com> 2011-05-22 19:42:11 PDT ---
> Sorry, the problem isn't only Sugarbay(GT1),but also Huron River(0126 rev08).
> Especially, system hung when running 3DMMES/taiji on Huron River(0126 rev08).
> 
Comment 14 meng 2011-05-22 20:18:35 UTC
(In reply to comment #13)
1. Same reproduction steps:
The problem occurs at a lower resolution in a window, and both machines work fine at the maximum resolution in full screen.
2. Same good and bad commits:
bd661a933b18fccd7102d05932774ee61a90ec9e bad
fa7a051c251552c4581caadce772a29c64f6a850 good
3. On Huron River (0126 rev08), " *ERROR* Hangcheck timer elapsed... GPU hung " may appear in dmesg before the system hangs, the same message as on Sugarbay (GT1), where the system does not hang.
Comment 15 Eric Anholt 2011-05-27 08:41:35 UTC
Please quantify "Soon".  Does that mean "on the first run of the application, every time, within a few seconds" or does it mean "After about n runs of the application, it happens"?  Specific numbers, please.

Also, why are rev08 machines still in the testing lab?  Those need to be removed -- we won't debug issues that appear on them.
Comment 16 meng 2011-05-27 22:39:36 UTC
(In reply to comment #15)
It means "on the first run of the application, every time, within a few seconds".
Comment 17 Eric Anholt 2011-06-07 00:14:05 UTC
Please retest as of:

commit ef59049c5242a1be7fa59a182d342191185dd62b
Author: Eric Anholt <eric@anholt.net>
Date:   Sun Jun 5 23:20:57 2011 -0700

    i965: Fix flipped GT1 vs GT2 URB VS entry count limits.
Comment 18 meng 2011-06-08 21:51:54 UTC
(In reply to comment #17)
It can no longer be reproduced on our Sugarbay (GT1) because its hard disk was replaced.
So I tested on Huron River (0126 rev08). The problem still exists with commit ef59049c5.
Bisecting on Huron River (0126 rev08) shows that a82a43e8d99e1715dd11c9c091b5ab734079b6a6 is the first bad commit.
Comment 19 meng 2011-06-08 21:55:48 UTC
a82a43e8d99e1715dd11c9c091b5ab734079b6a6 is the first bad commit
commit a82a43e8d99e1715dd11c9c091b5ab734079b6a6
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Apr 22 16:00:14 2011 -0700

    i965/gen6: Use the dynamic state base address to reduce relocations.

    Now that all the dynamic state is streamed through the top of the
    batchbuffer, we can cut out many of our relocations to that state by
    using the base address.

    Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).

    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 20 Eric Anholt 2011-06-21 13:20:03 UTC
Please retest as of:

commit f6e5230b2614cc91e4c849c07781b2230878d274
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Jun 17 18:44:26 2011 -0700

    i965/gen6: Apply documented workaround for nonpipelined state packets.
    
    Fixes a 100% reproducible GPU hang in topogun-1.06-orc-84k.trace.
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 21 meng 2011-06-22 00:37:16 UTC
(In reply to comment #20)

Tested with commit f6e5230b26; the problem still exists.
Comment 22 Eric Anholt 2011-06-29 10:48:31 UTC
I haven't seen the reported behavior (hanging on every execution of 3DMMES) on any of my 3 SNB machines, all GT2.
Comment 23 meng 2011-06-29 17:33:32 UTC
Previously the problem only occurred on Sugarbay (GT1) and Huron River (0126 rev08), but it can no longer be reproduced on our Sugarbay (GT1) because its hard disk was replaced.
So I changed the title.
Comment 24 Gordon Jin 2011-06-29 19:00:30 UTC
Eric, do you still want to investigate it? If so, maybe you can remote login.

Note that we are going to move this rev08 machine to another team soon.
Comment 25 Eric Anholt 2011-07-07 15:24:37 UTC
On further review of the comments, it's all about rev08 hardware.

