Bug 50271

Summary: [SNB]Piglit texturing/depthstencil-render-miplevels_292_s=z24_s8_d=z24 fails
Product: Mesa Reporter: lu hua <huax.lu>
Component: Drivers/DRI/i965 Assignee: Paul Berry <stereotype441>
Status: VERIFIED FIXED QA Contact:
Severity: major    
Priority: low CC: ben, chris, daniel, eric, jbarnes, xunx.fang, yi.sun
Version: 8.0   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: i915_error_state
full——dmesg

Description lu hua 2012-05-23 02:30:17 UTC
System Environment:
--------------------------
Arch:             i386
Platform:         Sandybridge
Libdrm:          (master)2.4.31-10-gbe30d350b64c1a83473a9ffbedf8e2c680a65fcd
Mesa:            (8.0)fa68a8bae3961808288cfd84d5a7843f6fc0f317
Xserver:         (server-1.12-branch)xorg-server-1.12.1
Xf86_video_intel: (master)2.18.0-27-gaaed9e9722aa30a3d6dc9a3f07309655de65b6bd
Libva:           (master)bdbc9675fb2529b276bc3e8f720709e75beeae10
Libva_intel_driver:(master)506205da0b04024415d38a451a595f8075c49073
Kernel:          (drm-intel-fixes) 65e818660275ecda3702a4245f308923e3813a85

Bug detailed description:
-----------------------------
Under a compositor, it happens on Sandybridge with the drm-intel-fixes kernel. It doesn't happen on the drm-intel-next-queued kernel.
Under X mode, this case causes a system hang.

Following cases also have this issue:
texturing_depthstencil-render-miplevels_146_d=z24_s=z24_s8
texturing_depthstencil-render-miplevels_146_d=z32f_s8_s=z24_s8
texturing_depthstencil-render-miplevels_146_d=z32f_s=z24_s8
texturing_depthstencil-render-miplevels_146_s=z24_s8_d=z24
texturing_depthstencil-render-miplevels_146_s=z24_s8_d=z24_s8
texturing_depthstencil-render-miplevels_146_s=z24_s8_d=z32f
texturing_depthstencil-render-miplevels_146_d=z24_s8_s=z24_s8
texturing_depthstencil-render-miplevels_292_ds=z24_s8
texturing_depthstencil-render-miplevels_292_ds=z32f_s8
texturing_depthstencil-render-miplevels_292_d=z24_s8_s=z24_s8
texturing_depthstencil-render-miplevels_292_d=z24_s=z24_s8
texturing_depthstencil-render-miplevels_292_d=z32f_s8_s=z24_s8
texturing_depthstencil-render-miplevels_292_d=z32f_s=z24_s8
texturing_depthstencil-render-miplevels_292_s=z24_s8_d=z24
texturing_depthstencil-render-miplevels_292_s=z24_s8_d=z24_s8
texturing_depthstencil-render-miplevels_292_s=z24_s8_d=z32f
texturing_depthstencil-render-miplevels_292_s=z24_s8_d=z32f_s8
texturing_depthstencil-render-miplevels_585_ds=z24_s8
texturing_depthstencil-render-miplevels_585_ds=z32f_s8
texturing_depthstencil-render-miplevels_585_d=z24_s8_s=z24_s8
texturing_depthstencil-render-miplevels_585_d=z24_s=z24_s8
texturing_depthstencil-render-miplevels_585_d=z32f_s8_s=z24_s8
texturing_depthstencil-render-miplevels_585_d=z32f_s=z24_s8
texturing_depthstencil-render-miplevels_585_s=z24_s8_d=z24
texturing_depthstencil-render-miplevels_585_s=z24_s8_d=z24_s8
texturing_depthstencil-render-miplevels_585_s=z24_s8_d=z32f
texturing_depthstencil-render-miplevels_585_s=z24_s8_d=z32f_s8

dmesg:
[  155.156387] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  155.156393] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  155.172594] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[  161.206535] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  161.206909] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[  167.244559] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  167.245058] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[  208.427423] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  208.427732] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[  214.465528] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung


Reproduce steps:
----------------------------
1.    start  x
2.    start  gnome-session
3.    ./bin/depthstencil-render-miplevels 292 s=z24_s8_d=z24 -auto
Comment 1 Daniel Vetter 2012-05-23 02:36:52 UTC
Hm, the -fixes sha1 you're citing contains pretty much all of -next-queued, save for a few patches that I've merged in the last few days. But these should not affect snb in any way. Can you please double-check your kernel versions, both the broken commit on -fixes and the working commit in -next-queued?
Comment 2 lu hua 2012-05-24 18:10:12 UTC
We used kernel version 3.4.0rc6.
This issue happens on drm-intel-fixes kernel and drm-intel-next-queued kernel.

dmesg on -queued kernel(commit:6f13b7b5be500178d5541b69ab911af2a77ec488):
[   73.313550] [drm:gen6_sanitize_pm] *ERROR* Power management discrepancy: GEN6_RP_INTERRUPT_LIMITS expected 16110000, was 16000000
[  101.724586] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  101.724591] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  101.728110] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
[  107.752785] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  107.753324] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
[  113.772993] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  113.773508] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
[  117.388101] [drm:gen6_sanitize_pm] *ERROR* Power management discrepancy: GEN6_RP_INTERRUPT_LIMITS expected 16000000, was 12000000


Testing on the drm-intel-fixes kernel (commit: c6a389f123b9f68d605bb7e0f9b32ec1e3e14132, version: 3.1.0rc4), the system hangs.
dmesg:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 252 at 250, next 254)
Comment 3 Chris Wilson 2012-05-25 00:12:00 UTC
You've mentioned GPU hang a couple of times now... Are you going to share the i915_error_state and full dmesg?
Comment 4 lu hua 2012-05-25 00:53:31 UTC
Created attachment 62089 [details]
i915_error_state
Comment 5 lu hua 2012-05-25 00:54:19 UTC
Created attachment 62090 [details]
full——dmesg
Comment 6 Daniel Vetter 2012-05-25 01:01:59 UTC
Your fixes tree is 3.1-rc4-ish. This is known to blow up totally all over the place. You should use -fixes from my drm-intel repository (the same where the -queued branch is). The current tip of that is

commit 89ba829e38bd500f438bc08af4229204c8ed7f35
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue May 22 09:30:33 2012 -0700

    drm/i915: always use RPNSWREQ for turbo change requests

Closing as already fixed (there's really not much difference between -fixes and -queued atm that would even remotely explain this).
Comment 7 Daniel Vetter 2012-05-25 01:42:19 UTC
Ok, reopening, the 3.1-rc4 was just to hunt for a known-good kernel. And despite what the original report says, it seems to happen on both -fixes and -queued.

Iirc mesa guys recently merged some fixes and new piglit tests to check whether depth/stencil mipmaps work correctly for separate stencil. Have you checked whether this is a mesa problem/regression?
Comment 8 Chris Wilson 2012-05-25 01:46:09 UTC
The error-state points towards HiZ as the unit that stalled. I'd focus on the chicken bits patches for a potential fox, but without a known-good kernel I'm probably yapping up the wrong tree.
Comment 9 Chris Wilson 2012-06-12 04:23:57 UTC
We're waiting on confirmation that this is a regression in the kernel or, by cross-checking, an issue with mesa/i965.
Comment 10 lu hua 2012-06-12 22:28:09 UTC
It is a mesa issue.
It happens on the mesa 8.0 branch. It doesn't happen on the mesa master branch.
Comment 11 Kenneth Graunke 2012-08-05 07:21:25 UTC
I suspect it's because the 8.0 branch is missing Paul's workarounds.

commit 714b4f6184db84a738cf2d063980f0e19ab03b4b
Author: Paul Berry <stereotype441@gmail.com>
Date:   Thu Apr 26 06:35:56 2012 -0700

    i965/Gen7: Work around GPU hangs due to misaligned depth coordinate offsets.
    
    In i965 Gen7, Mesa has for a long time used the "depth coordinate
    offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
    render to miplevels other than 0.  Unfortunately, this doesn't work,
    because these offsets must be aligned to multiples of 8, and miplevels
    in the depth buffer are only guaranteed to be aligned to multiples of
    4.  When the offsets aren't aligned to a multiple of 8, the GPU
    sometimes hangs.
    
    As a temporary measure, to avoid GPU hangs, this patch smashes the 3
    LSB's of "depth coordinate offset X/Y" to 0.  This results in
    incorrect rendering to mipmapped depth textures, but that seems like a
    reasonable stopgap while we figure out a better solution.
    
    Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
    texture sizes that are not powers of 2.
    
    Reviewed-by: Chad Verace <chad.versace@linux.intel.com>

as well as ~1 of that for the equivalent Gen6 fix.  Assigning to Paul to confirm and decide whether to backport the fixes to 8.0.  (I would just do it myself, but I think some infrastructure patches are necessary.)
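
The masking described in the commit message above can be illustrated with a minimal sketch. This is not the actual Mesa code: the function and macro names below are hypothetical, and the only facts taken from the commit message are that the offsets must be multiples of 8, that miplevels are only guaranteed to be aligned to multiples of 4, and that the workaround clears the 3 least significant bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the value 8 is the alignment the commit message
 * says the "depth coordinate offset X/Y" fields require. */
#define DEPTH_OFFSET_ALIGN 8u

/* True if the offset already satisfies the 8-pixel alignment. */
static bool depth_offset_is_aligned(uint32_t offset)
{
    return (offset & (DEPTH_OFFSET_ALIGN - 1u)) == 0;
}

/* Clear the 3 least significant bits, i.e. round down to a multiple of 8.
 * This mirrors the stopgap described in the commit message. */
static uint32_t depth_offset_round_down(uint32_t offset)
{
    uint32_t aligned = offset & ~(DEPTH_OFFSET_ALIGN - 1u);
    assert(depth_offset_is_aligned(aligned));
    return aligned;
}

/* Number of pixels by which rendering lands in the wrong place
 * after the offset has been rounded down. */
static uint32_t depth_offset_error(uint32_t offset)
{
    return offset - depth_offset_round_down(offset);
}
```

For example, a miplevel whose X offset is 12 (a multiple of 4 but not of 8) would be programmed as 8, so rendering lands 4 pixels away from where it should. That is why the hang goes away while the rendering result is still incorrect.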
Comment 12 Paul Berry 2012-08-05 14:43:37 UTC
(In reply to comment #11)
> I suspect it's because the 8.0 branch is missing Paul's workarounds.

(snip)

> Assigning to Paul to
> confirm and decide whether to backport the fixes to 8.0.  (I would just do it
> myself, but I think some infrastructure patches are necessary.)

Ken is correct.  This is a bug which probably has been around since Gen6/Gen7 support was initially added to Mesa.  The patches weren't backported to the 8.0 branch because to apply them without merge conflicts would require an infrastructure patch that probably is too invasive to cherry-pick to 8.0 (3ec0e55: i965: Fix mipmap offsets for HiZ and separate stencil buffers).

Fortunately the merge conflicts aren't very bad--I'll fix them up and send them to the Mesa-dev list for review.

Leaving at "high" priority, since IIRC this GPU hang affects Google Earth.
Comment 13 Paul Berry 2012-08-05 15:39:04 UTC
Patches sent to Mesa-dev list for review ("[PATCH 0/2] i965: Back-port GPU hang workarounds (bug 50271)").
Comment 14 Paul Berry 2012-08-13 20:38:14 UTC
The GPU hang is fixed in the 8.0 branch by the following commits:

commit 889cc4d9225084e15b9e8d010e30b31a87dbfd2d
Author: Paul Berry <stereotype441@gmail.com>
Date:   Thu Apr 26 06:35:56 2012 -0700

    i965/Gen7: Work around GPU hangs due to misaligned depth coordinate offsets.
    
    In i965 Gen7, Mesa has for a long time used the "depth coordinate
    offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
    render to miplevels other than 0.  Unfortunately, this doesn't work,
    because these offsets must be aligned to multiples of 8, and miplevels
    in the depth buffer are only guaranteed to be aligned to multiples of
    4.  When the offsets aren't aligned to a multiple of 8, the GPU
    sometimes hangs.
    
    As a temporary measure, to avoid GPU hangs, this patch smashes the 3
    LSB's of "depth coordinate offset X/Y" to 0.  This results in
    incorrect rendering to mipmapped depth textures, but that seems like a
    reasonable stopgap while we figure out a better solution.
    
    Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
    texture sizes that are not powers of 2.
    
    Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
    
    Cherry-picked from 714b4f6184db84a738cf2d063980f0e19ab03b4b
    Conflicts:
    
        src/mesa/drivers/dri/i965/gen7_misc_state.c
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50271

commit 24db6d63dab7cc8d3e5d4482b95d58e46113ba44
Author: Paul Berry <stereotype441@gmail.com>
Date:   Thu Apr 26 06:35:56 2012 -0700

    i965/Gen6: Work around GPU hangs due to misaligned depth coordinate offsets.
    
    In i965 Gen6, Mesa has for a long time used the "depth coordinate
    offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
    render to miplevels other than 0.  Unfortunately, this doesn't work,
    because these offsets must be aligned to multiples of 8, and miplevels
    in the depth buffer are only guaranteed to be aligned to multiples of
    4.  When the offsets aren't aligned to a multiple of 8, the GPU
    sometimes hangs.
    
    As a temporary measure, to avoid GPU hangs, this patch smashes the 3
    LSB's of "depth coordinate offset X/Y" to 0.  This results in
    incorrect rendering to mipmapped depth textures, but that seems like a
    reasonable stopgap while we figure out a better solution.
    
    (Note that we have only ever observed this GPU hang on Gen6 when HiZ
    is enabled, so another possible stopgap would be to disable HiZ).
    
    Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
    texture sizes that are not powers of 2.
    
    Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
    
    Cherry-picked from a683012a80a3408b3b71f22b2a97d9eaaac11a46
    Conflicts:
    
        src/mesa/drivers/dri/i965/brw_misc_state.c
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50271


I'm leaving the bug open, since the above commits do not produce correct rendering; they merely avoid a GPU hang.  So all of the hanging Piglit tests should now produce a FAIL result.  I'm changing the bug to low priority, since the incorrect rendering happens in a rare enough corner case that it's unlikely to ever be a problem for any real-world apps.  If in the future we discover a real-world app (or a conformance test) that produces incorrect rendering due to this bug, we can always raise the priority again.
Comment 15 Paul Berry 2012-08-15 13:24:49 UTC
As mentioned above, working around the GPU hang results in incorrect rendering.  The following piglit test failures are now expected:

- fbo/fbo-depth-array
- spec/ARB_depth_buffer_float/fbo-clear-formats
- spec/ARB_depth_texture/fbo-clear-formats
- spec/EXT_packed_depth_stencil/fbo-clear-formats
- All texturing/depthstencil-render-miplevels tests

These tests have failed on the master branch for some time.  Now that the fix has been cherry-picked back to the 8.0 branch, they fail on 8.0 as well.

Eric Anholt has started investigating a technique to fix this bug for good, but it is not yet ready for use (branch "depthstencil" of git://people.freedesktop.org/~anholt/mesa).
Comment 16 Paul Berry 2012-08-15 13:25:35 UTC
*** Bug 53522 has been marked as a duplicate of this bug. ***
Comment 17 Eric Anholt 2012-11-18 19:30:47 UTC
The original hang is fixed, the remaining failure is a non-regression, and it can continue to be tracked by looking at piglit results.
Comment 18 lu hua 2012-11-20 03:08:52 UTC
Verified. Fixed.
