Bug 81203 - [BSW]Most of Piglit/ogles3conform/ogles2conform/ogles1conform/webglc cases cause GPU HANG
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: All / OS: Linux (All)
Importance: high critical
Assignee: Ville Syrjala
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-11 03:30 UTC by lu hua
Modified: 2017-10-06 14:37 UTC
CC: 3 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg (101.81 KB, text/plain)
2014-08-18 05:40 UTC, lu hua
output(piglit) (116.51 KB, text/plain)
2014-08-18 05:40 UTC, lu hua
i915_error_state(zip) (251.33 KB, application/octet-stream)
2014-08-18 05:42 UTC, lu hua

Description lu hua 2014-07-11 03:30:47 UTC
System Environment:
--------------------------
Platform: BSW
Libdrm:		(master)libdrm-2.4.54-17-ge8c3c1358ecaf4e90f7d43762357ae6f8e2022b6
Mesa:		(master)50bbe49c330095ba451d0f48c56759d148a609c2
Xserver:	(master)xorg-server-1.15.99.902-121-g2f5cf9ff9a0f713b7e038636484c77f113a5f10a
Xf86_video_intel:(master)2.99.912-227-g8587b2fff218537c6ff568ac3ef561f0d39f03ff
Libva:		(master)c61d8c6ce9ffc27320e9e177c1e1123d5f1b5014
Libva_intel_driver:(master)c5cb17ea86f0065a939d3636dd26651c93d497c8
Kernel: drm-intel-nightly/16025dad8e9964a5810385f755d43f1c48d6fdcc

Bug detailed description:
------------------------- 
Many Piglit cases report "<3>[  146.267151] [drm:i915_reset] *ERROR* Failed to reset chip: -110" in dmesg (-110 is -ETIMEDOUT, i.e. the reset timed out). It happens on the -queued and -nightly kernels. Running it for 20 cycles on the -fixes kernel works well.

Once this error occurs, it blocks the remaining Piglit cases.

[root@x-bsw01 piglit]# bin/shader_runner /home/GFX/Test/Piglit/piglit/generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto
PIGLIT: {"result": "pass" }
[root@x-bsw01 piglit]# bin/shader_runner /home/GFX/Test/Piglit/piglit/generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto
PIGLIT: {"result": "pass" }
[root@x-bsw01 piglit]# bin/shader_runner /home/GFX/Test/Piglit/piglit/generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto
PIGLIT: {"result": "pass" }
[root@x-bsw01 piglit]# bin/shader_runner /home/GFX/Test/Piglit/piglit/generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto
Probe color at (0,0)
  Expected: 0.000000 1.000000 0.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 0.000000
Probe color at (1,0)
  Expected: 0.000000 1.000000 0.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 0.000000
Probe color at (2,0)
  Expected: 0.000000 1.000000 0.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 0.000000
intel_do_flush_locked failed: Input/output error
[root@x-bsw01 piglit]# bin/shader_runner /home/GFX/Test/Piglit/piglit/generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto
intel_do_flush_locked failed: Input/output error

Reproduce steps:
-------------------------
1. xinit
2. gnome-session
3. bin/shader_runner generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto
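The report checks reproducibility by running the same case over and over (20 cycles on -fixes above, 50 cycles in comment 1) until it fails. A minimal POSIX sh sketch of that loop follows. `run_cycles` is a hypothetical helper, not part of Piglit, and it assumes that `shader_runner` exits non-zero on failure (the harness output in this report shows returncode 1 for the failing runs).

```shell
# Hypothetical helper (not part of Piglit): run a command COUNT times and
# stop at the first non-zero exit status.
# Usage: run_cycles COUNT CMD [ARGS...]
run_cycles() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            # Report which cycle failed on stderr and signal failure.
            echo "cycle $i failed: $*" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count cycles passed"
    return 0
}

# Example, using the test from step 3 (run from the Piglit directory):
# run_cycles 50 bin/shader_runner generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto
```

Stopping at the first failure keeps the machine in the state right after the hang, which is when the dmesg output and crash dump are most useful.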
Comment 1 lu hua 2014-08-08 08:58:21 UTC
Tested on the drm-intel-nightly kernel (79e44bfa100) and the latest Mesa master branch.
Running "bin/shader_runner generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-lshift-ivec2-uvec2.shader_test -auto" for 50 cycles no longer shows this issue, but the following cases still cause a GPU hang.

Output:
 bin/shader_runner /home/GFX/Test/Piglit/piglit/generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-mod-ivec2-int.shader_test -auto
PIGLIT: {"result": "pass" }


Piglit
returncode: 0
result: pass
summary: Piglit/spec_glsl-1.30_execution_built-in-functions_vs-op-mod-ivec2-int    PASS

!
@test: Piglit/spec_glsl-1.30_execution_built-in-functions_vs-op-mod-ivec2-ivec2
info: @@@Returncode: 1


test case start at: Thu Dec 12 00:05:48 2013
test case end at:   Thu Dec 12 00:06:07 2013

Errors:
intel_do_flush_locked failed: Input/output error


Dmesg:
<6>[  370.763992] [drm] stuck on render ring
<6>[  370.779238] [drm] GPU HANG: ecode 0:0xf3cffffe, in X [3741], reason: Ring hung, action: reset
<6>[  376.764291] [drm] stuck on render ring
<6>[  376.779584] [drm] GPU HANG: ecode 0:0xf3cffffe, in X [3741], reason: Ring hung, action: reset


Output:
 bin/shader_runner /home/GFX/Test/Piglit/piglit/generated_tests/spec/glsl-1.30/execution/built-in-functions/vs-op-mod-ivec2-ivec2.shader_test -auto
Probe color at (0,0)
  Expected: 0.000000 1.000000 0.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 0.000000
Probe color at (1,0)
  Expected: 0.000000 1.000000 0.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 0.000000
Probe color at (2,0)
  Expected: 0.000000 1.000000 0.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 0.000000


Piglit
errors!
 intel_do_flush_locked failed: Input/output error
!
returncode: 1
result: fail
summary: Piglit/spec_glsl-1.30_execution_built-in-functions_vs-op-mod-ivec2-ivec2    FAIL
Comment 2 Jesse Barnes 2014-08-13 15:49:56 UTC
Oh, I didn't see that -fixes worked but -nightly failed. Maybe this is already fixed by one of Ville's patches, or maybe by the workaround change that's still pending.
Comment 3 lu hua 2014-08-15 08:14:41 UTC
Tested on the latest -nightly kernel and Mesa master branch: running all webglc cases, it doesn't happen.
I will double-check it.
Comment 4 lu hua 2014-08-18 05:40:29 UTC
Created attachment 104786
dmesg

Running a Piglit case on the -nightly kernel (2b6e6b9c29dbd) and Mesa master branch (f08d7b8fe1e6689beb), the GPU hang still occurs.
Command: bin/glean -o -v -v -v -t +blendFunc --quick

dmesg:
[   64.739865] [drm] stuck on render ring
[   64.755811] [drm] GPU HANG: ecode 0:0x85dffffb, in X [3768], reason: Ring hung, action: reset
[   64.755834] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   64.755840] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   64.755845] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   64.755850] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   64.755859] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   64.755980] [drm:i915_error_work_func] resetting chip
[   64.759545] [drm:init_status_page] render ring hws offset: 0x00013000
[   64.759578] [drm:init_status_page] bsd ring hws offset: 0x00037000
[   64.759596] [drm:init_status_page] blitter ring hws offset: 0x0005a000
[   64.759614] [drm:init_status_page] video enhancement ring hws offset: 0x0007d000
[   64.759693] [drm:i9xx_update_primary_plane] Writing base 00889000 00000000 0 0 7680
[   66.739909] [drm:cherryview_enable_rps] GT fifo had a previous error 1080000
[   66.739948] [drm:cherryview_enable_rps] PCBR offset : 0x7eef8001
[   66.741896] [drm:cherryview_enable_rps] GPLL enabled? yes
[   66.741902] [drm:cherryview_enable_rps] GPU status: 0x00203010
[   66.741907] [drm:cherryview_enable_rps] current GPU freq: 480 MHz (48)
[   66.741911] [drm:cherryview_enable_rps] setting GPU freq to 320 MHz (32)
[   66.741917] [drm:valleyview_set_rps] GPU freq request from 480 MHz (48) to 320 MHz (32)
[   70.752138] [drm] stuck on render ring
[   70.768104] [drm] GPU HANG: ecode 0:0x85dffffb, in X [3768], reason: Ring hung, action: reset
[   70.768282] [drm:i915_error_work_func] resetting chip
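The kernel messages above ask for the GPU crash dump to be attached and say it is saved to /sys/class/drm/card0/error. A rough sketch for counting hang messages in a kernel log and copying the dump to a file follows. `count_gpu_hangs` and `save_error_state` are hypothetical helper names; reading the sysfs file normally requires root, and the default path is simply the one printed in the log above.

```shell
# Hypothetical helper: read kernel log text on stdin and print how many
# lines contain "GPU HANG" (the i915 message shown above).
count_gpu_hangs() {
    # grep -c prints 0 and exits 1 when nothing matches; keep status 0.
    grep -c 'GPU HANG' || true
}

# Hypothetical helper: copy the i915 crash dump to a regular file.
# Usage: save_error_state [SRC] [DEST]
save_error_state() {
    src=${1:-/sys/class/drm/card0/error}
    dest=${2:-i915_error_state}
    cat "$src" > "$dest"
}

# Example (as root):
# dmesg | count_gpu_hangs
# save_error_state
```

The copied file can then be compressed before attaching it, as was done with the i915_error_state(zip) attachment on this bug.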
Comment 5 lu hua 2014-08-18 05:40:52 UTC
Created attachment 104787
output(piglit)
Comment 6 lu hua 2014-08-18 05:42:50 UTC
Created attachment 104788
i915_error_state(zip)
Comment 7 lu hua 2014-08-22 08:42:29 UTC
Modified the summary: as shown in comment 4, "*ERROR* Failed to reset chip" has gone away and only "GPU HANG" is reported in dmesg.
Added Kenneth.
Comment 8 Ville Syrjala 2014-08-22 11:29:35 UTC
Can you re-test with this branch, as it has a (hacky) patch to init workarounds earlier:

git://gitorious.org/vsyrjala/linux.git chv_stuff_9_small
Comment 9 lu hua 2014-08-25 05:38:07 UTC
(In reply to comment #8)
> Can you re-test with this brach as it has a (hacky) patch to init
> workarounds earlier:
> 
> git://gitorious.org/vsyrjala/linux.git chv_stuff_9_small

Tested this branch at commit 41f899b16642b5a026241d994e02f6387c41758b.
Ran all ogles1conform, ogles2conform, ogles3conform and webglc cases; the GPU hang doesn't occur.
I will run all Piglit cases.
Comment 10 lu hua 2014-08-26 06:44:25 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Can you re-test with this brach as it has a (hacky) patch to init
> > workarounds earlier:
> > 
> > git://gitorious.org/vsyrjala/linux.git chv_stuff_9_small
> 
> Test this branch, commit 41f899b16642b5a026241d994e02f6387c41758b
> Run all ogles1conform, ogles2conform, ogles3conform and webglc cases, GPU
> hang doesn't occur.
> I will run all piglit cases.

Ran the Piglit cases; the GPU hang doesn't occur.
Comment 11 lu hua 2014-09-05 06:59:05 UTC
(In reply to comment #8)
> Can you re-test with this brach as it has a (hacky) patch to init
> workarounds earlier:
> 
> git://gitorious.org/vsyrjala/linux.git chv_stuff_9_small

Has your patch been merged?
Tested one cycle on the latest -nightly kernel; the GPU hang goes away.
I will double-check it.
Comment 12 Chris Wilson 2014-09-06 12:04:40 UTC
commit 00e1e623e62cd8452e28633182b91ddcbb70cc7c
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Aug 27 17:33:12 2014 +0300

    drm/i915: Init some CHV workarounds via LRIs in ring->init_context()
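Comment 11 asks whether the patch has been merged. One way to check whether a local kernel tree already contains the commit quoted above is `git merge-base --is-ancestor`. The `has_commit` wrapper below is a hypothetical helper, and the example assumes a kernel checkout at ./linux.

```shell
# Hypothetical helper: succeed if COMMIT is an ancestor of HEAD in the
# git tree at DIR. Fails if it is not, or if git does not know the commit.
# Usage: has_commit DIR COMMIT
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}

# Example:
# if has_commit linux 00e1e623e62cd8452e28633182b91ddcbb70cc7c; then
#     echo "CHV LRI workaround commit is present"
# fi
```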
Comment 13 lu hua 2014-09-09 06:06:51 UTC
Verified. Fixed.
Comment 14 Elizabeth 2017-10-06 14:37:17 UTC
Closing old verified.

