Bug 59700

Summary: [ILK/SNB/IVB Bisected] Oglc vertexshader(advanced.TestLightsTwoSided) causes GPU hang
Product: Mesa
Component: Drivers/DRI/i965
Status: VERIFIED FIXED
Severity: major
Priority: high
Version: 9.0
Hardware: All
OS: Linux (All)
Reporter: lu hua <huax.lu>
Assignee: Kenneth Graunke <kenneth>
QA Contact:
CC: idr, xunx.fang
Whiteboard:
i915 platform:
i915 features:
Attachments: i915_error_state

Description lu hua 2013-01-22 07:54:25 UTC
Created attachment 73432: i915_error_state

System Environment:
--------------------------
Arch:           x86_64
Platform:       Ivybridge
Libdrm:		(master)libdrm-2.4.41-3-g303ca37e722e68900cb7eb43ddbef8069b0c711b
Mesa:		(9.0)cd0e19a749951c0d7e88e3cce5cf71de54681d11
Xserver:	(server-1.13-branch)xorg-server-1.13.1.901
Xf86_video_intel:(master)2.20.19-9-g208ca91a31182e8ddad36e6a735c725362cbd071
Cairo:		(master)ed2fa6b16b03fccc3e21598cdb9157cbcebd1d37
Libva:		(master)9f4dedc4de014cc665c32dfbac1c017f9396b563
Libva_intel_driver:(master)88b21b9aba9be13e08109fe5d213973447f38558
Kernel:	(drm-intel-fixes) b514407547890686572606c9dfa4b7f832db9958

Bug detailed description:
-------------------------
This happens on Ironlake, Sandybridge, and Ivybridge with the Mesa 9.0 branch. It works on the Mesa master branch.
The following cases also have this issue:
vertexshader(advanced.TestLights)
vertexshader(advanced.TestMaterials)
glsl-arrayobject(operator.equal.structure.function)
glsl-bif-tex-proj(advanced.equality.sampler1D)

The last known good commit: 3703e9920c2a4d1a022871624bd0d7bd16073867
The last known bad commit: cd0e19a749951c0d7e88e3cce5cf71de54681d11

dmesg:
[15428.286502] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15428.286505] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[15436.272070] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15444.273620] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15446.264019] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15446.264293] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[15446.264294] [drm:i915_reset] *ERROR* Failed to reset chip.
[15446.275206] oglconform[11012]: segfault at 230 ip 00007fc13a3d282b sp 00007fff43437640 error 4 in i965_dri.so[7fc13a375000+b6000]
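
The spacing of the hang reports in the log above can be checked mechanically. The following is a minimal Python sketch that reuses the four hangcheck lines verbatim; the helper name `hang_gaps` is purely illustrative and not part of any kernel or Mesa tooling.

```python
import re

# The four hangcheck lines from the dmesg excerpt in this report.
DMESG = """\
[15428.286502] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15436.272070] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15444.273620] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15446.264019] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
"""

# Matches the kernel timestamp of a hangcheck report.
HANG_RE = re.compile(r"^\[\s*(\d+\.\d+)\] \[drm:i915_hangcheck_hung\]")


def hang_gaps(log):
    """Return the seconds elapsed between consecutive hangcheck reports."""
    stamps = [float(m.group(1)) for line in log.splitlines()
              if (m := HANG_RE.match(line))]
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]


print(hang_gaps(DMESG))
```

The first two gaps come out at roughly 8 seconds and the last at roughly 2 seconds, which lines up with the "GPU hanging too fast, declaring wedged!" message that follows the fourth report.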

Reproduce steps:
----------------
1. xinit
2. ./oglconform -z -suite all -v 2  -test vertexshader \
advanced.TestLightsTwoSided
Comment 1 lu hua 2013-01-23 03:52:13 UTC
Bisect shows: b22de71c1bc2530e139d75d934e203f4eee89f41 is the first bad commit.
commit b22de71c1bc2530e139d75d934e203f4eee89f41
Author:     Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Mon Oct 1 15:28:56 2012 -0700
Commit:     Andreas Boll <andreas.boll.dev@gmail.com>
CommitDate: Sun Jan 20 15:08:26 2013 +0100

    i965/vs: Implement register spilling.

    To validate this code, I ran piglit -t vs quick.tests with the "go spill
    everything" debugging code enabled.  There was only one regression:
    glsl-vs-unroll-explosion simply ran out of registers.  This should be
    fine in the real world, since no one actually spills every single
    register.

    NOTE: This is a candidate for the 9.0 branch. Even if it proves to have
    bugs, it's likely better than simply failing to compile.
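
For context on how a bisect narrows a good/bad pair down to a single commit, here is a minimal Python sketch of the underlying binary search. The commit list and the `is_bad` predicate are hypothetical placeholders: a real run would rebuild Mesa at each step and rerun the oglconform case, and the two filler entries in `history` do not correspond to real commits.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical linear history. Only the endpoints and the culprit are
# abbreviations of hashes from this report; the others are made up.
history = ["3703e99", "aaaaaaa", "b22de71", "bbbbbbb", "cd0e19a"]
bad_from = history.index("b22de71")
print(first_bad(history, lambda c: history.index(c) >= bad_from))
```

Each step halves the remaining range, so the number of builds grows only with the logarithm of the number of commits between the good and bad endpoints.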
Comment 2 Kenneth Graunke 2013-01-29 20:52:54 UTC
Was this in a release build?  I get an assertion failure here, not a GPU hang.  I suppose if you compile without assertions, it might GPU hang though...
Comment 3 lu hua 2013-01-30 02:30:08 UTC
It isn't a release build.

output:
CLI options echo:
oglconform -suite all -v 2 -test vertexshader advanced.TestLightsTwoSided

Window will be recreated 32 times.
  Window 0 will run 1 testcases on config with id 156.
  Window 1 will run 1 testcases on config with id 151.
  Window 2 will run 1 testcases on config with id 150.
  Window 3 will run 1 testcases on config with id 132.
  Window 4 will run 1 testcases on config with id 127.
  Window 5 will run 1 testcases on config with id 126.
  Window 6 will run 1 testcases on config with id 157.
  Window 7 will run 1 testcases on config with id 133.
  Window 8 will run 1 testcases on config with id 148.
  Window 9 will run 1 testcases on config with id 147.
  Window 10 will run 1 testcases on config with id 124.
  Window 11 will run 1 testcases on config with id 123.
  Window 12 will run 1 testcases on config with id 154.
  Window 13 will run 1 testcases on config with id 145.
  Window 14 will run 1 testcases on config with id 144.
  Window 15 will run 1 testcases on config with id 130.
  Window 16 will run 1 testcases on config with id 121.
  Window 17 will run 1 testcases on config with id 120.
  Window 18 will run 1 testcases on config with id 155.
  Window 19 will run 1 testcases on config with id 131.
  Window 20 will run 1 testcases on config with id 142.
  Window 21 will run 1 testcases on config with id 141.
  Window 22 will run 1 testcases on config with id 118.
  Window 23 will run 1 testcases on config with id 117.
  Window 24 will run 1 testcases on config with id 149.
  Window 25 will run 1 testcases on config with id 125.
  Window 26 will run 1 testcases on config with id 146.
  Window 27 will run 1 testcases on config with id 122.
  Window 28 will run 1 testcases on config with id 143.
  Window 29 will run 1 testcases on config with id 119.
  Window 30 will run 1 testcases on config with id 140.
  Window 31 will run 1 testcases on config with id 116.
Total of 32 testcases will be executed.


Setup Report.
    Verbose level = 2.
    Path inactive.

Visual Report for ID 156 (32 bits).
ID      |ACCELERA|DB      |REND_T  |SURF_T  |C_BUF_T |BUF_S   |RED_S   |
     156|       1|       1|      gl|  wipbpx|    rgba|      32|       8|

GREEN_S |BLUE_S  |ALPHA_S |DEPTH_S |STENC_S |ACCUM_S |SPL_BUF |SAMPLES |
       8|       8|       8|      24|       8|       0|       0|       0|

SRGB    |TEX_RGB |TEX_RGBA|CAVEAT  |SWAP    |M_PBUF_W|M_PBUF_H|M_PBUF_P
       0|       0|       0|    none|   undef|       0|       0|       0

OpenGL Report.
    Vendor - 'Intel Open Source Technology Center'
    Renderer - 'Mesa DRI Intel(R) Ironlake Desktop '
    Version - '2.1 Mesa 9.0.2 (git-0f687f8)'
    GLSL Version - '1.20'

>> Vertex Shader (vertexshader)  test:
--> 2.7 - advanced.TestLightsTwoSided subcase:
    File - /GFX/build/testsuite/Oglconform_31/oglconform_31/src/OGLconform/vertexshader.c, line - 578.
        Expected to get a color of (1.000000, 1.000000, 0.389013, 1.000000)
        but found (0.600000, 0.298039, 0.800000, 1.000000}
        Error: Vertex Shader Tests: Two Sided Lighting.
    File - /GFX/build/testsuite/Oglconform_31/oglconform_31/src/OGLconform/vertexshader.c, line - 578.
        Expected to get a color of (0.804806, 1.000000, 1.000000, 1.000000)
        but found (0.600000, 0.298039, 0.800000, 1.000000}
        Error: Vertex Shader Tests: Two Sided Lighting.
intel_do_flush_locked failed: Input/output error

dmesg:
[   58.169943] [drm:i915_driver_open],
[   66.784448] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[   66.784455] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[   66.787494] [drm:i915_error_work_func], resetting chip
[   66.787607] [drm:init_ring_common], render ring head not reset to zero ctl 00000000 head 00000734 tail 00000000 start 00003000
[   66.787619] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 00000734 tail 00000000 start 00003000
[   66.839437] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00000734 tail 00000000 start 00003000
[   66.839451] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[   68.771598] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[   68.771700] [drm:i915_error_work_func], resetting chip
[   68.771777] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[   68.771782] [drm:i915_reset] *ERROR* Failed to reset chip.
Comment 4 Gordon Jin 2013-02-18 04:36:53 UTC
Do we still care about 9.0, and do we want to keep this at high priority?
Comment 5 Ian Romanick 2013-02-20 00:09:51 UTC
The GPU hang should be fixed on 9.0 by the following commit.  At the very least, vertexshader advanced.TestLightsTwoSided fails, but I believe it failed before.  As long as nothing hangs and no other tests regress, I'd call that good.  The patches required to make the tests pass are not suitable for the stable branch.

commit 4e35ffa762d763820b7defc14af564b2a02c61c8
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Oct 3 10:03:22 2012 -0700

    i965/vs: Try again when we've successfully spilled a reg.
    
    Before, we'd spill one reg, then continue on without actually register
    allocating, then assertion fail when we tried to use a vgrf number as a
    register number.
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    (cherry picked from commit d4bcc6591812ebe72a363cf98371de5e5016f481)
    
    This should have been picked when 9237f0e was picked.
    
    Bugzill: https://bugs.freedesktop.org/show_bug.cgi?id=59700
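
The control-flow issue described in the commit message can be illustrated with a toy allocator. This is a sketch in Python, not the i965 code: the names `try_color`, `spill_one`, and `allocate`, the live-range model, and the spill heuristic are all assumptions made for illustration.

```python
def try_color(live_ranges, num_regs):
    """Greedy allocation: map each vgrf to a hardware register number,
    or return None if some vgrf cannot be placed."""
    assignment = {}
    for vgrf, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1]):
        # Registers held by already-placed vgrfs whose ranges overlap ours.
        busy = {assignment[o] for o, (s, e) in live_ranges.items()
                if o in assignment and s <= end and start <= e}
        free = [r for r in range(num_regs) if r not in busy]
        if not free:
            return None
        assignment[vgrf] = free[0]
    return assignment


def spill_one(live_ranges):
    """Toy spill: drop the vgrf with the longest live range from the set
    that needs a register. A real compiler would instead rewrite it into
    scratch reads and writes through short-lived temporaries."""
    victim = max(live_ranges, key=lambda v: live_ranges[v][1] - live_ranges[v][0])
    remaining = {v: r for v, r in live_ranges.items() if v != victim}
    return victim, remaining


def allocate(live_ranges, num_regs):
    """Try again after every successful spill until allocation succeeds."""
    spilled = []
    while True:
        assignment = try_color(live_ranges, num_regs)
        if assignment is not None:
            return assignment, spilled
        victim, live_ranges = spill_one(live_ranges)
        spilled.append(victim)


# Hypothetical shader with four virtual registers and two hardware registers.
ranges = {"v0": (0, 9), "v1": (1, 3), "v2": (2, 5), "v3": (4, 8)}
print(allocate(ranges, num_regs=2))
```

In this toy model, the behavior the commit message describes corresponds to returning right after `spill_one` without calling `try_color` again, so later stages would see virtual register numbers where hardware register numbers are expected. The `while True` loop is the "try again" step.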
Comment 6 lu hua 2013-02-26 02:43:23 UTC
Verified. Fixed on the latest Mesa 9.1 branch (commit: 5b19631f7cffc0fb48de9).
