Bug 83962

Summary: [HSW/BYT]Piglit spec_ARB_gpu_shader5_arb_gpu_shader5-emitstreamvertex_nodraw fails
Product: Mesa
Reporter: lu hua <huax.lu>
Component: Drivers/DRI/i965
Assignee: Ian Romanick <idr>
Status: RESOLVED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: christophe.prigent, itoral
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description lu hua 2014-09-17 05:37:22 UTC
System Environment:
--------------------------
Platform: HSW
Libdrm:		(master)libdrm-2.4.56-25-g86b37c61c78edd1353a3f76f678c39e2ec168771
Mesa:		(master)7f6872d012e66b11b64179cd7c214d10d4ae55cd
Xserver:(master)xorg-server-1.16.0-176-gd3427717f2c6a473dc3d20631dff653e4e37228e
Xf86_video_intel:(master)2.99.916-43-gd470f0f520f6bd160ae4acef2b4b3c86afd8dbbd
Libva:		(master)e0d25ece01e7aba819c910e98c4fb4706cdab055
Libva_intel_driver:(master)bc2e06ef0f89b264fe968fbff4f06e425385c3d8
Kernel:   (drm-intel-nightly)43df30da20447e2856b2761215ff274886a9f931

Bug detailed description:
---------------------------
The new test case spec_ARB_gpu_shader5_arb_gpu_shader5-emitstreamvertex_nodraw fails on HSW and BYT with Mesa master and the 10.3 branch.
It passes on SNB and is skipped on BDW/ILK/PNV.

output:
Probe color at (50,50)
  Expected: 0.000000 0.000000 0.000000
  Observed: 1.000000 0.000000 0.000000
PIGLIT: {"result": "fail" }

Reproduce steps:
-------------------------
1. xinit
2. bin/arb_gpu_shader5-emitstreamvertex_nodraw -fbo -auto
Comment 1 Iago Toral 2015-03-09 10:56:05 UTC
I am looking into this.
Comment 2 Iago Toral 2015-03-09 11:46:40 UTC
The test renders 4 points with a geometry shader, 3 points go to stream 0 and one point goes to stream 1.

The test expects that only the 3 points assigned to stream 0 get rendered, since according to ARB_gpu_shader5 only these should be passed down the rendering pipeline for rasterization, while the point assigned to stream 1 should be discarded after stream output.

The Haswell PRM says:

"Rendering Disable
Independent of SOL function enable, if rendering (i.e, 3D pipeline functions past the SOL stage) is enabled (via clearing the Rendering Disable bit), the SOL stage will pass topologies for a specific input stream (as selected by Render Stream Select) down the pipeline, with the exception of PATCHLIST_n topologies which are never passed downstream."

This is the same language that exists, for example, for IVB, where this test passes. Unfortunately, Haswell hardware does not behave this way: when the SOL stage is not active, it ignores the Render Stream Select bits from the 3DSTATE_STREAMOUT state packet. As a result it renders primitives coming from all streams, which causes the piglit failure.
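
To make the difference concrete, here is a toy model of the documented behavior versus what I am describing for Haswell. This is only a sketch, not driver code: the function name and the hsw_quirk flag are made up for illustration, and it simply counts how many emitted points would reach the rasterizer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only (hypothetical names, not part of Mesa).
 * stream_of_point[i] is the stream the GS assigned to point i. */
static size_t
points_rasterized(const int *stream_of_point, size_t num_points,
                  int render_stream_select, bool sol_enabled,
                  bool hsw_quirk)
{
   /* Behavior described above for Haswell: with SOL disabled the
    * Render Stream Select value is ignored. */
   bool select_ignored = hsw_quirk && !sol_enabled;
   size_t count = 0;

   for (size_t i = 0; i < num_points; i++) {
      if (select_ignored || stream_of_point[i] == render_stream_select)
         count++;
   }
   return count;
}
```

With the test's four points (three on stream 0, one on stream 1) and Render Stream Select set to 0, the documented model lets 3 points through, while the quirk model with SOL disabled lets all 4 through.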

Always enabling the SOL unit on Haswell like this:

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 7e9b285..ea8319e 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -223,6 +223,9 @@ upload_3dstate_streamout(struct brw_context *brw, bool active,
    uint32_t dw1 = 0, dw2 = 0;
    int i;
 
+   if (brw->is_haswell)
+      dw1 |= SO_FUNCTION_ENABLE;
+
    if (active) {
       int urb_entry_read_offset = 0;
       int urb_entry_read_length = (vue_map->num_slots + 1) / 2 -

fixes the piglit test, but as we know from past experiences (see 99f8ea295f and 
f976b4c1bf22) this can lead to significant performance degradation, so we probably don't want to do something like this.

I can't think of another way to fix this though. After SOL it is too late to discard geometry assigned to non-zero streams, because the stream information is gone by then and we can't tell which primitives originated from stream 0 and which didn't.
Comment 3 Iago Toral 2015-03-09 14:26:06 UTC
Ok, found a way to work around the hardware bug in Haswell and sent a patch for review to the mailing list:
http://lists.freedesktop.org/archives/mesa-dev/2015-March/078920.html

I fixed this by ignoring vertices bound to non-zero streams in the GS when TF is disabled. In this scenario these vertices are useless (they won't be rendered and they won't be captured by TF) so this should be valid.

I was tempted to enable the patch for all platforms (gen >= 6), at least that seemed to work fine for SNB, IVB and HSW according to piglit, but since the patch might introduce a minor change in behavior when a shader emits more vertices than the maximum it declares I decided to keep it specific to Haswell in the end.
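
The idea behind the workaround can be sketched roughly as the predicate below. This is an illustration of the logic only, not the code from the patch: the helper name and its parameters are hypothetical, and the real change lives in the GS compiler backend rather than in a standalone function.

```c
#include <stdbool.h>

/* Hypothetical helper (not the actual patch): decide whether a vertex
 * emitted by the GS should be kept.  On Haswell, with transform
 * feedback disabled, vertices on non-zero streams would be neither
 * captured nor (per the spec) rendered, so they are dropped. */
static bool
gs_should_emit_vertex(unsigned stream_id, bool transform_feedback_active,
                      bool is_haswell)
{
   if (is_haswell && !transform_feedback_active && stream_id != 0)
      return false;
   return true;
}
```

Stream 0 vertices are always kept, and nothing changes on other platforms or when transform feedback is active.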
Comment 4 Emil Velikov 2015-07-06 12:40:53 UTC
Hi all,

Iago's patch has been merged for quite some time now, so I'm inclined to set this as resolved. Feel free to reopen/verify as appropriate.
Comment 5 lu hua 2015-07-15 05:51:40 UTC
Christophe, 
Does this issue still happen on your machines?
