Bug 84212 - [BSW]ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex fails and causes GPU hang
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: All Linux (All)
Importance: high critical
Assignee: Ben Widawsky
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-23 05:13 UTC by lu hua
Modified: 2014-11-25 07:26 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
Disable DDChk (937 bytes, patch)
2014-11-21 01:12 UTC, Ben Widawsky

Description lu hua 2014-09-23 05:13:42 UTC
System Environment:
--------------------------
Platform: BSW
Libdrm: (master)libdrm-2.4.56-29-g666788a6062de62aa0b3560760fbb0903167a319
Mesa: (master)d69faf851fff5d41086c9940b2fcf2aa72c40e60
Xserver: (master)xorg-server-1.16.0-317-geaee6572beefca240c42791f9a3a6e547bedd410
Xf86_video_intel: (master)2.99.916-52-g376037e6336dfc3b32c51b774ab8a80f64390e02
Libva: (master)e0d25ece01e7aba819c910e98c4fb4706cdab055
Libva_intel_driver: (master)bc2e06ef0f89b264fe968fbff4f06e425385c3d8
Kernel: (drm-intel-nightly)c5660b4ad395f1e34eacc22cf81c687edfc9c83c

Bug detailed description:
---------------------------
It fails on BSW with the Mesa master branch but works on BDW.
The following cases also fail:
ES3-CTS.shaders.loops.for_dynamic_iterations.vector_counter_vertex
ES3-CTS.shaders.loops.while_dynamic_iterations.vector_counter_vertex

output:
dEQP Core GL-CTS-2.0 (0x0052484b) starting..
  target implementation = 'X11'

Test case 'ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex'..
Vertex compile time = 1.925000 ms
Fragment compile time = 0.714000 ms
Link time = 2.871000 ms
  Fail (Fail)

DONE!

Test run totals:
  Passed:        0/1 (0.00%)
  Failed:        1/1 (100.00%)
  Not supported: 0/1 (0.00%)
  Warnings:      0/1 (0.00%)

Reproduce steps:
-------------------------
1. xinit
2. ./glcts --deqp-case=ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex
Comment 1 lu hua 2014-09-23 05:21:39 UTC
It also causes GPU hang.

dmesg:
[  179.747534] [drm:i915_gem_open]
[  179.805336] [drm:i915_gem_context_create_ioctl] HW context 1 created
[  179.909310] [drm:valleyview_set_rps] GPU freq request from 160 MHz (16) to 480 MHz (48)
[  185.752305] [drm] stuck on render ring
[  185.767345] [drm] GPU HANG: ecode 0:0x84dffffc, in glcts [3993], reason: Ring hung, action: reset
[  185.767363] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  185.767369] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  185.767374] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  185.767382] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  185.767387] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  185.767497] [drm:i915_error_work_func] resetting chip
[  185.769291] [drm:init_status_page] render ring hws offset: 0x00013000
[  185.769322] [drm:init_status_page] bsd ring hws offset: 0x00037000
[  185.769342] [drm:init_status_page] blitter ring hws offset: 0x0005a000
[  185.769361] [drm:init_status_page] video enhancement ring hws offset: 0x0007d000
[  185.769416] [drm:i9xx_update_primary_plane] Writing base 00889000 00000000 0 0 7680
[  187.633614] [drm:valleyview_set_rps] GPU freq request from 480 MHz (48) to 320 MHz (32)
[  187.637454] [drm:i915_gem_context_destroy_ioctl] HW context 1 destroyed
[  187.752369] [drm:cherryview_enable_rps] GT fifo had a previous error 1080000
[  187.752444] [drm:cherryview_enable_rps] PCBR offset : 0x7eef8001
[  187.754347] [drm:cherryview_enable_rps] GPLL enabled? yes
[  187.754354] [drm:cherryview_enable_rps] GPU status: 0x00202010
[  187.754359] [drm:cherryview_enable_rps] current GPU freq: 320 MHz (32)
[  187.754363] [drm:cherryview_enable_rps] setting GPU freq to 320 MHz (32)
[  187.754369] [drm:valleyview_set_rps] GPU freq request from 320 MHz (32) to 320 MHz (32)
[  187.856359] [drm:valleyview_set_rps] GPU freq request from 320 MHz (32) to 160 MHz (16)
Comment 2 lu hua 2014-09-26 06:40:30 UTC
Increasing priority: it causes a GPU hang.
Comment 3 Gavin Hindman 2014-11-12 23:00:51 UTC
Ben - are you actively looking at this issue?
Comment 4 Ben Widawsky 2014-11-12 23:06:19 UTC
I spent over a week trying to debug it. It works in simulation, and as far as we can tell there is no software bug. I filed an HSD yesterday. No response yet.
Comment 5 Ben Widawsky 2014-11-12 23:08:07 UTC
Sorry, I need to correct that. I've filed an HSD on a different issue which we think is related. I will try to look at this one specifically.
Comment 6 Ben Widawsky 2014-11-14 18:36:10 UTC
Okay. I'm not convinced this is the same bug, and I am investigating. Nothing appears to be wrong from the hardware perspective, so I need to investigate the generated shaders.
Comment 7 Ben Widawsky 2014-11-15 23:56:35 UTC
Full command:

./glcts --deqp-surface-width=64 --deqp-surface-height=64 --deqp-base-seed=1 --deqp-surface-type=window --deqp-gl-config-id=14 --deqp-case=ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex
Comment 8 Ben Widawsky 2014-11-18 05:11:45 UTC
It looks like a more general problem with control flow instructions on CHV.

Can you confirm that ./bin/shader_runner tests/shaders/glsl-vs-loop-300.shader_test fails for you as well?
Comment 9 Ben Widawsky 2014-11-18 21:54:00 UTC
Disregard the test request. I am not able to reproduce the shader_runner hang after a reboot.
Comment 10 Ben Widawsky 2014-11-20 03:12:52 UTC
Please test my Mesa branch:
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=b84212

(It should be at SHA 7684eaf239cab0f54ee35b385b6c9205f513a082.)

This fixes the issue for me.
Comment 11 Gordon Jin 2014-11-20 03:32:40 UTC
Hua is on leave.
Shuo, could you follow up with Ben?
Comment 12 Ben Widawsky 2014-11-20 05:33:15 UTC
Apparently my push didn't actually work, and my machine at work has locked up. 

Looking at the docs more closely, I don't believe what I did is correct anyway. I think the docs are telling me that you cannot program 3 (the value the patch below writes into the src1 register type), as opposed to telling me that you must program 3. We do not have a check for this, but we're not programming 3 in the failing case. Let's at least make sure this makes the test pass on your system (it does on mine). Then I'll come up with some more experiments.

Here is the patch from memory:
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index c475393..d3b1b32 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -517,6 +517,7 @@ brw_set_src1(struct brw_compile *p, brw_inst *inst, struct brw_reg reg)
         else
             brw_inst_set_src1_vstride(brw, inst, reg.vstride);
       }
+      brw_inst_set_src1_reg_type(brw, inst, 0x3);
    }
 }
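The paragraph above points out that nothing in the emitter prevents the value 3 from being programmed. As a minimal sketch of what such a guard could look like, the C below rejects that value at the point where the field is set. It assumes a hypothetical, simplified instruction struct (fake_inst) and hypothetical helpers (src1_reg_type_allowed, set_src1_reg_type_checked); none of these are Mesa's real brw_inst accessors, and treating 3 as forbidden follows Ben's reading of the docs in this comment, not a confirmed hardware rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the encoding that, per the reading of the docs in
 * Comment 12, must not be programmed into the src1 register type. */
#define FORBIDDEN_SRC1_REG_TYPE 0x3u

/* Hypothetical, simplified instruction; NOT Mesa's brw_inst layout. */
struct fake_inst {
   uint32_t src1_reg_type;
};

/* Returns true if the given encoding may be used for src1. */
static bool src1_reg_type_allowed(uint32_t type)
{
   return type != FORBIDDEN_SRC1_REG_TYPE;
}

/* Sets the src1 register type only if the encoding is allowed.
 * On rejection the instruction is left unchanged and false is returned. */
static bool set_src1_reg_type_checked(struct fake_inst *inst, uint32_t type)
{
   if (!src1_reg_type_allowed(type))
      return false;
   inst->src1_reg_type = type;
   return true;
}
```

Checking at the setter, rather than auditing emitted instructions afterwards, keeps the disallowed encoding from ever reaching the instruction word: set_src1_reg_type_checked(&inst, 0x3) returns false and leaves inst untouched, which a debug build could turn into an assert() failure.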
Comment 13 Ben Widawsky 2014-11-21 01:12:40 UTC
Created attachment 109786 [details] [review]
Disable DDChk

Please test this instead of the diff I posted yesterday.
Comment 14 shuo.wang 2014-11-21 04:37:26 UTC
Thanks Ben, verified: the case passes.

I tested with Mesa 10.4-rc1 and kernel 3.18.0-rc5_drm-intel-nightly_3cb89f_20141119, and the bug reproduced.
Then I applied Ben's patch to Mesa 10.4-rc1, and the case passes.
After the patch is merged into master, I will close the bug.
Comment 15 Ben Widawsky 2014-11-21 18:53:50 UTC
The patches that I'd like to upstream are now on the mailing list:
http://patchwork.freedesktop.org/patch/37276/
http://patchwork.freedesktop.org/patch/37277/

Please verify they still fix the issue.
Comment 16 Ben Widawsky 2014-11-21 20:10:39 UTC
Never mind. I pushed it to master.
Comment 17 lu hua 2014-11-25 07:26:04 UTC
Verified. Fixed.
[root@x-bsw01 cts]# ./glcts --deqp-case=ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex
dEQP Core GL-CTS-2.0 (0x0052484b) starting..
  target implementation = 'X11'

Test case 'ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex'..
Vertex compile time = 1.886000 ms
Fragment compile time = 0.710000 ms
Link time = 2.645000 ms
  Pass (Pass)

DONE!

Test run totals:
  Passed:        1/1 (100.00%)
  Failed:        0/1 (0.00%)
  Not supported: 0/1 (0.00%)
  Warnings:      0/1 (0.00%)

