Bug 48508

Summary: [snb] GPU hangs when playing Red Alert 3
Product: Mesa
Reporter: Nikita Tsukanov <keks9n>
Component: Drivers/DRI/i965
Assignee: Kenneth Graunke <kenneth>
Status: RESOLVED FIXED
QA Contact:
Severity: major
Priority: medium
CC: ben, chris, daniel, idr, jbarnes
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
i915 platform:
i915 features:
Attachments: /debug/dri/0/i915_error_state

Description Nikita Tsukanov 2012-04-10 07:53:21 UTC
Created attachment 59734

It always hangs when I reach a certain point of the first mission of Yuriko's campaign.

Ubuntu 12.04
uname -r: 3.2.0-20-generic
Intel(R) Core(TM) i5-2430M

Console (MESA_DEBUG=1):
intel_do_flush_locked failed: Input/output error

[  392.316357] [drm] Changing LVDS panel from (-hsync, -vsync) to (-hsync, +vsync)
[  520.052825] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  520.052835] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  520.063530] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77202 at 77199, next 77203)
[  526.643310] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  526.643330] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77239 at 77235, next 77240)
[  533.549912] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  533.549933] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77346 at 77343, next 77347)
[  540.564560] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  540.564587] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77407 at 77402, next 77408)
[  547.047000] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  547.047039] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77429 at 77426, next 77430)
[  553.437409] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  553.437458] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77439 at 77435, next 77440)
[  559.815814] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  559.815843] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77445 at 77442, next 77446)
[  566.584785] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  566.584835] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77471 at 77467, next 77472)

/sys/kernel/debug/dri/0/i915_error_state attached
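
The repeated i915_wait_request lines above can be summarised with a short script. This is only a sketch: it assumes that "awaiting" is the sequence number being waited for, "at" is the last sequence number the ring reported, and "next" is the next one to be handed out. The helper name parse_wait_requests and the SAMPLE string are made up for illustration; the two sample lines are copied from the log above.

```python
import re

# Matches the i915_wait_request error lines quoted in the log above.
WAIT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:i915_wait_request\].*"
    r"awaiting (?P<awaiting>\d+) at (?P<at>\d+), next (?P<next>\d+)"
)


def parse_wait_requests(log_text):
    """Return (timestamp, awaiting, at, next) tuples, one per matching line."""
    rows = []
    for line in log_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            rows.append(
                (float(m["ts"]), int(m["awaiting"]), int(m["at"]), int(m["next"]))
            )
    return rows


# Two lines taken verbatim from the dmesg output in this report.
SAMPLE = """\
[  520.063530] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77202 at 77199, next 77203)
[  526.643330] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 77239 at 77235, next 77240)
"""

for ts, awaiting, at, nxt in parse_wait_requests(SAMPLE):
    # Assumed meaning: the ring is "awaiting - at" requests short of the target.
    print(f"{ts:.6f}s: awaiting {awaiting}, at {at}, behind by {awaiting - at}")
```

Under that reading, each hang leaves the ring a few requests short of the one being waited for, which fits a batch that never completes rather than a slow one.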
Comment 1 Nikita Tsukanov 2012-04-12 05:24:16 UTC
It seems that the GPU hangs while trying to render the aura around healing points. I tried it with the software renderer: it got past the first one and then hit another hang (of the _software_ renderer) at the second one. It looks like an infinite loop inside a shader or something like that.
Comment 2 Daniel Vetter 2012-04-13 05:47:41 UTC
SNB uses the i965 mesa driver.
Comment 3 Nikita Tsukanov 2012-04-23 04:54:47 UTC
It seems to be a regression, since RA3 works like a charm with Mesa 7.11.2 on the same system.
Comment 4 Kenneth Graunke 2012-05-04 08:58:50 UTC

Can you try it with the following environment variables set?

export hiz=false
export INTEL_HIZ=0

Does that fix the problem?
Comment 5 Nikita Tsukanov 2012-05-04 09:16:54 UTC
Setting INTEL_SEPARATE_STENCIL=0 fixes the problem, even without setting INTEL_HIZ or hiz.
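
For anyone repeating this kind of test, trying one variable at a time shows which feature is responsible. The sketch below is an assumption-laden helper, not part of Mesa: env_with, run_with and CANDIDATES are hypothetical names, and the game command is left as a placeholder. Only the variable names and values come from the comments above.

```python
import os
import subprocess

# Variables and "disabled" values mentioned in the comments above.
CANDIDATES = {
    "hiz": "false",
    "INTEL_HIZ": "0",
    "INTEL_SEPARATE_STENCIL": "0",
}


def env_with(overrides, base=None):
    """Return a copy of the environment (or of `base`) with overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def run_with(command, overrides):
    """Run `command` (a list of arguments) with the overrides set.

    Returns the process exit code. `command` is whatever starts the game;
    it is not specified in this report.
    """
    return subprocess.run(command, env=env_with(overrides)).returncode


# Print the test plan: one run per variable, each changed in isolation.
for name, value in CANDIDATES.items():
    print(f"run with {name}={value}")
```

Changing a single variable per run is what made the result in this comment informative: the hang went away with separate stencil disabled, independent of the HiZ settings.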
Comment 6 Kenneth Graunke 2012-05-05 02:02:41 UTC
Thanks! Based on that, I believe this should be fixed by the following kernel commit (available in linus/master or drm-intel-fixes; it should be landing in 3.3 stable soon):

commit 3a69ddd6f872180b6f61fda87152b37202118fbc
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Fri Apr 27 12:44:41 2012 -0700

    drm/i915: Set the Stencil Cache eviction policy to non-LRA mode.

    Clearing bit 5 of CACHE_MODE_0 is necessary to prevent GPU hangs in
    OpenGL programs such as Google MapsGL, Google Earth, and gzdoom when
    using separate stencil buffers.  Without it, the GPU tries to use the
    LRA eviction policy, which isn't supported.  This was supposed to be off
    by default, but seems to be on for many machines.

    This cannot be done in gen6_init_clock_gating with most of the other
    workaround bits; the render ring needs to exist.  Otherwise, the
    register write gets dropped on the floor (one printk will show it
    changed, but a second printk immediately following shows the value
    reverts to the old one).

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47535
    Cc: stable@vger.kernel.org
    Cc: Rob Castle <futuredub@gmail.com>
    Cc: Eric Appleman <erappleman@gmail.com>
    Cc: aaron667@gmx.net
    Cc: Keith Packard <keithp@keithp.com>
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

Alternatively, if upgrading kernels is inconvenient, you can also verify with:

$ sudo intel_reg_write 0x2120 0x1206800

(The intel_reg_write utility comes from the intel-gpu-tools package).  This does the same thing as the kernel patch, but isn't automatic.
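
To see how the value 0x1206800 relates to "clearing bit 5", here is a small decoding sketch. It assumes CACHE_MODE_0 (0x2120) is a masked register on this hardware, meaning the upper 16 bits of the written value select which of the lower 16 bits are actually updated. That convention is not stated in this report, and decode_masked_write is a hypothetical helper name.

```python
def decode_masked_write(value):
    """Split a 32-bit masked-register write into (set_bits, cleared_bits).

    Assumption: bits 31..16 are a write-enable mask for bits 15..0, so a
    data bit only takes effect when the matching mask bit is 1.
    """
    mask = (value >> 16) & 0xFFFF
    data = value & 0xFFFF
    set_bits = [b for b in range(16) if (mask >> b) & 1 and (data >> b) & 1]
    cleared_bits = [b for b in range(16) if (mask >> b) & 1 and not (data >> b) & 1]
    return set_bits, cleared_bits


# Value from the intel_reg_write command above.
print(decode_masked_write(0x1206800))   # ([], [5, 8])

# A write that clears only bit 5 under the same convention.
print(decode_masked_write(1 << (5 + 16)))  # ([], [5])
```

Under that assumption, the command clears bit 5 (the one named in the commit message) and also bit 8, while the remaining low bits in 0x6800 are ignored because their mask bits are not set.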

If neither of those fixes it, feel free to reopen the bug and I'll take a more detailed look.
