Bug 95462 - [BXT,BSW] arb_gpu_shader_fp64 causes gpu hang
Summary: [BXT,BSW] arb_gpu_shader_fp64 causes gpu hang
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Mark Janes
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 96253
Reported: 2016-05-17 20:56 UTC by Mark Janes
Modified: 2016-06-21 21:52 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Description Mark Janes 2016-05-17 20:56:31 UTC
When fp64 was enabled, the corresponding piglit tests caused a GPU hang on BXT and BSW:

2016-05-17T11:37:58,456330-0700 [drm] GPU HANG: ecode 8:0:0x85dffffb, in shader_runner [23184], reason: Ring hung, action: reset
Hanging Test:
piglit.spec.arb_gpu_shader_fp64.uniform_buffers.fs-dvec4-uniform-array-direct-indirect
Comment 1 Iago Toral 2016-06-06 06:18:25 UTC
(In reply to Mark Janes from comment #0)
> When fp64 was enabled, corresponding piglit tests cause gpu hang on bxt and
> bsw:
> 
> 2016-05-17T11:37:58,456330-0700 [drm] GPU HANG: ecode 8:0:0x85dffffb, in
> shader_runner [23184], reason: Ring hung, action: reset
> Hanging Test:
> piglit.spec.arb_gpu_shader_fp64.uniform_buffers.fs-dvec4-uniform-array-
> direct-indirect

Hi Mark, I don't think we have BXT or BSW hardware here, so I am afraid we will need some help from Intel :(

Also, do I understand from your report that multiple tests produce hangs like this on these two platforms?
Comment 2 Iago Toral 2016-06-06 06:25:27 UTC
(In reply to Iago Toral from comment #1)

Would it be possible to get a list of the tests that cause a hang? If they are related to a specific fp64 feature, that would help us narrow down the problem.
Comment 3 Mark Janes 2016-06-06 21:10:10 UTC
Two tests related to this bug fail on BSW but are not among the tests with arb_gpu_shader_fp64 in their names:

   - piglit.spec.arb_tessellation_shader.execution.dmat-vs-gs-tcs-tes
   - piglit.spec.arb_tessellation_shader.execution.dvec3-vs-tcs-tes
Comment 4 Mark Janes 2016-06-08 19:26:20 UTC
I re-enabled fp64 and encountered a GPU hang on:

piglit.spec.arb_gpu_shader_fp64.uniform_buffers.fs-dvec4-uniform-array-direct-indirect

In this run, only one GPU hang occurred.
Comment 5 Kenneth Graunke 2016-06-08 23:50:16 UTC
The 'chv-fixes' branch of my tree may fix this.

https://cgit.freedesktop.org/~kwg/mesa/commit/?h=chv-fixes

I don't have a Braswell or Broxton system with me, so I haven't tested these patches at all. They do make one of the hanging tests happier in the simulator.
Comment 6 Iago Toral 2016-06-09 06:16:27 UTC
(In reply to Kenneth Graunke from comment #5)
> The 'chv-fixes' branch of my tree may fix this.
> 
> https://cgit.freedesktop.org/~kwg/mesa/commit/?h=chv-fixes
> 
> I don't have a Braswell or Broxton system to test with me, so I haven't
> tested these at all.  They do make one of the hanging tests happier in the
> simulator.

Thanks, Ken! We finally got a couple of BSW systems today, so we will be able to look into this. We'll test your branch too and let you know the results.
Comment 7 Samuel Iglesias 2016-06-09 14:32:07 UTC
I wrote a patch that fixes the GPU hang in BSW:

https://github.com/samuelig/mesa/commit/1b0350a566f5f7d23f1d96226b9dc8f85aff0d30

It is included in my wip/siglesias/chv-fixes branch, which you can clone by running this command:

$ git clone -b wip/siglesias/chv-fixes https://github.com/samuelig/mesa.git

This branch includes Ken's patches.

Mark, can you run it in CI to see whether it fixes the GPU hang on BXT without adding regressions on other generations? I ran piglit on BSW and saw no GPU hangs, and on BDW there are no regressions compared to current master.
Comment 8 Mark Janes 2016-06-09 18:20:55 UTC
Samuel's branch resolves the BSW/BXT GPU hangs and doesn't introduce other regressions.
Comment 9 Mark Janes 2016-06-09 19:00:13 UTC
Beyond the hang, there are nearly 60 fp64 piglit tests that do not pass on BSW or BXT.  

For example:

piglit.spec.arb_gpu_shader_fp64.execution.conversion.vert-conversion-explicit-bvec2-dvec2

Standard Output

bin/shader_runner /tmp/build_root/m64/lib/piglit/generated_tests/spec/arb_gpu_shader_fp64/execution/conversion/vert-conversion-explicit-bvec2-dvec2.shader_test -auto
piglit: debug: Requested an OpenGL 3.2 Core Context, and received a matching 4.3 context

Probe color at (0,2)
  Expected: 0 255 0 255
  Observed: 3 252 0 255


I need guidance as to whether these test failures should gate the 12.0 release or should be written up in a separate bug.
Comment 10 Mark Janes 2016-06-09 19:02:51 UTC
Full list of failing tests:

piglit.spec.arb_gpu_shader_fp64.execution.conversion.frag-conversion-explicit-bool-double
piglit.spec.arb_gpu_shader_fp64.execution.conversion.frag-conversion-explicit-bvec2-dvec2
piglit.spec.arb_gpu_shader_fp64.execution.conversion.frag-conversion-explicit-bvec3-dvec3
piglit.spec.arb_gpu_shader_fp64.execution.conversion.frag-conversion-explicit-bvec4-dvec4
piglit.spec.arb_gpu_shader_fp64.execution.conversion.geom-conversion-explicit-bool-double
piglit.spec.arb_gpu_shader_fp64.execution.conversion.geom-conversion-explicit-bvec2-dvec2
piglit.spec.arb_gpu_shader_fp64.execution.conversion.geom-conversion-explicit-bvec3-dvec3
piglit.spec.arb_gpu_shader_fp64.execution.conversion.geom-conversion-explicit-bvec4-dvec4
piglit.spec.arb_gpu_shader_fp64.execution.conversion.vert-conversion-explicit-bool-double
piglit.spec.arb_gpu_shader_fp64.execution.conversion.vert-conversion-explicit-bvec2-dvec2
piglit.spec.arb_gpu_shader_fp64.execution.conversion.vert-conversion-explicit-bvec3-dvec3
piglit.spec.arb_gpu_shader_fp64.execution.conversion.vert-conversion-explicit-bvec4-dvec4
piglit.spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader
piglit.spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
piglit.spec.arb_gpu_shader_fp64.shader_storage.layout-std430-fp64-mixed-shader
piglit.spec.arb_gpu_shader_fp64.shader_storage.layout-std430-fp64-shader
piglit.spec.arb_gpu_shader_fp64.uniform_buffers.fs-double-uniform-array-direct-indirect
piglit.spec.arb_gpu_shader_fp64.uniform_buffers.gs-double-uniform-array-direct-indirect
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2x3 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2x3 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2x3 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2x4 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2x4 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat2x4 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3x2 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3x2 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3x2 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3x4 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3x4 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat3x4 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4x2 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4x2 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4x2 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4x3 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4x3 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dmat4x3 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple double array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple double arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple double separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec2 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec2 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec2 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec3 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec3 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec3 separate
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec4 array
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec4 arrays_of_arrays
piglit.spec.arb_gpu_shader_fp64.varying-packing.simple dvec4 separate
Comment 11 Samuel Iglesias 2016-06-10 05:18:18 UTC
(In reply to Mark Janes from comment #9)
> Beyond the hang, there are nearly 60 fp64 piglit tests that do not pass on
> BSW or BXT.  
> 
[...]
> I need guidance as to whether these test failures should gate the 12.0
> release or should be written up in a separate bug.

OK, I am going to take a look at them and let you know how to handle them.

Thanks!
Comment 12 Samuel Iglesias 2016-06-15 06:47:47 UTC
(In reply to Samuel Iglesias from comment #11)

We have written a couple of patches to fix these tests:

$ git clone -b wip/siglesias/chv-fixes https://github.com/samuelig/mesa.git

The fixes are specific to Cherryview, so these tests might still fail on BXT. There is one patch from Kenneth that has not landed in master, but it is still in my branch. With these patches I got zero piglit regressions on BSW and BDW, and the remaining failing fp64 tests on BSW are fixed.

Mark, would you mind testing them in the CI system? Our plan is to land them in master before the fourth release candidate (planned for Friday, AFAIK), so they will be part of the final 12.0 release.

Thanks!
Comment 13 Samuel Iglesias 2016-06-15 07:48:14 UTC
(In reply to Samuel Iglesias from comment #12)

If they are still failing on BXT, you can test this branch instead:

$ git clone -b wip/siglesias/chv-bxt-fixes https://github.com/samuelig/mesa.git

If the fp64 tests pass on BXT with the second branch, I will update the patches before pushing them to master (possibly sending a v2 first), since I have already sent them for review to save time.
Comment 14 Mark Janes 2016-06-16 00:04:13 UTC
All fp64 tests on BXT/BSW are fixed by the branch:

wip/siglesias/chv-bxt-fixes 

thanks!
Comment 15 Samuel Iglesias 2016-06-17 09:40:16 UTC
(In reply to Mark Janes from comment #14)
> All fp64 tests on bxt/bsw are fixed by the branch:
> 
> wip/siglesias/chv-bxt-fixes 
> 
> thanks!

The following two patches have landed in master:

bdab572 i965/fs: indirect addressing with doubles is not supported in CHV/BSW/BXT
0177dbb i965/fs: Fix single-precision to double-precision conversions for CHV/BSW/BXT

There is still one patch from Kenneth that has not landed in master:
   "i965: Fix multiplication of immediates on Cherryview/Broxton."
Comment 16 Mark Janes 2016-06-21 16:18:27 UTC
The final patch has been merged.

