Bug 107579 - [SNB] The graphic corruption when we reuse the GS compiled and used for TFB when statebuffer contain magic trash in the unused space
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 18.2
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-15 12:08 UTC by asimiklit
Modified: 2018-09-04 10:23 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
log with options INTEL_DEBUG=bat,buf (885.90 KB, text/plain)
2018-08-15 12:08 UTC, asimiklit
a bit more description about "magic trash" in state buffer (64.66 KB, image/png)
2018-08-15 12:11 UTC, asimiklit
The graphical corruption example 1 (714.26 KB, image/png)
2018-08-15 12:13 UTC, asimiklit
The graphical corruption example 2 (random dots on the screen) (818.75 KB, image/png)
2018-08-15 12:15 UTC, asimiklit
The apitrace which causes the graphical corruption (4.21 MB, application/octet-stream)
2018-08-15 13:37 UTC, asimiklit
simple program and makefile (3.94 KB, application/zip)
2018-08-16 10:16 UTC, asimiklit

Description asimiklit 2018-08-15 12:08:18 UTC
Created attachment 141105 [details]
log with options INTEL_DEBUG=bat,buf

Graphical corruption occurs when we reuse a geometry shader (GS) that was compiled and used at least once for transform feedback (TFB)
while the "statebuffer" contains magic trash in its unused space.

The apitrace, a simple reproducer, and screenshots will be attached shortly.
The log captured with "INTEL_DEBUG=bat,buf" is attached.

After a long investigation of this issue, the following details were found:

1. Sometimes this bug leads to a GPU hang.

2. The bug appears on the first glFlush (in the "execbuffer" function, when the validation list is sent to drmIoctl)
    after a draw call that follows glEndTransformFeedback.
    One more point: the draw must use the same shader that was used for TFB.

3. The intel_sanitize_gpu utility detects a "buffer out-of-bounds write" in almost all BOs.

4. The bug is reproduced if and only if:
    
    a. We use a custom GS, even if this shader is implemented as a passthrough
       (it outputs all input data as is, without any changes).
    
    b. It is not necessary to call "glDrawArrays" between
       "glBeginTransformFeedback" and "glEndTransformFeedback" to reproduce this issue.

    c. The "statebuffer" contains some magic trash in the third dword.
        The 0xFFFFFFFF value is enough to reproduce.
        There are a few legal ways to put this trash into the "statebuffer", given the "brw_bo_alloc" implementation, for example:
            1. Allocate several 16KB BOs filled with 0xFF using regular GL calls.
            2. Use very big shaders to increase the size of the "program cache". This produces a freed 16KB buffer containing some trash.

    d. We use the same "Kernel Start Pointer" in 3DSTATE_GS for drawing with and without transform feedback.


It looks like the GS continues to write TFB output after glEndTransformFeedback is called, for some reason.


This bug is based on bug https://bugs.freedesktop.org/show_bug.cgi?id=91827.
Comment 1 asimiklit 2018-08-15 12:11:20 UTC
Created attachment 141106 [details]
a bit more description about "magic trash" in state buffer
Comment 2 asimiklit 2018-08-15 12:13:44 UTC
Created attachment 141107 [details]
The graphical corruption example 1
Comment 3 asimiklit 2018-08-15 12:15:46 UTC
Created attachment 141108 [details]
The graphical corruption example 2 (random dots on the screen)
Comment 4 asimiklit 2018-08-15 12:52:28 UTC
At the beginning of the investigation we found that the "MESA_GLSL_CACHE_DISABLE=1" option fixes this issue, but later
we found that this option only avoids a mandatory condition of this bug,
the "magic trash" in the "statebuffer", because the default 16KB "program cache" was released too early when we used the disk shader cache.
Comment 5 asimiklit 2018-08-15 13:00:14 UTC
Actually, adding

>if(batch->state.map) { memset(batch->state.map, 0, STATE_SZ); }  

immediately after

>recreate_growing_buffer(brw, &batch->state, "statebuffer", STATE_SZ,
>                           BRW_MEMZONE_DYNAMIC);

helps to fix this problem, but unfortunately it only hides the issue and does not address the root cause.
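
To illustrate why zeroing the buffer hides the symptom, here is a minimal toy model in plain C. It is not Mesa code: the one-slot cache and the names toy_bo_alloc, toy_bo_free and toy_read_dword are hypothetical, and the only assumption taken from this report is that a freed 16KB buffer can be handed back to a later allocation with its old contents intact.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_BO_SIZE = 16384 };   /* same 16KB size mentioned in the report */

/* Hypothetical one-slot cache: holds the most recently freed buffer. */
static void *toy_cached_bo = NULL;

/* Return a buffer, preferring the cached one. A reused buffer is NOT cleared. */
static void *toy_bo_alloc(void)
{
   if (toy_cached_bo) {
      void *bo = toy_cached_bo;
      toy_cached_bo = NULL;
      return bo;
   }
   return calloc(1, TOY_BO_SIZE);
}

/* Put a buffer back into the cache, releasing any buffer already held there. */
static void toy_bo_free(void *bo)
{
   if (toy_cached_bo)
      free(toy_cached_bo);
   toy_cached_bo = bo;
}

/* Read dword number `index` from a buffer without aliasing issues. */
static uint32_t toy_read_dword(const void *bo, size_t index)
{
   uint32_t value;
   assert((index + 1) * sizeof(value) <= TOY_BO_SIZE);
   memcpy(&value, (const uint8_t *)bo + index * sizeof(value), sizeof(value));
   return value;
}
```

In this model, filling a buffer with 0xFF, freeing it and allocating again returns the same memory, so the third dword still reads 0xFFFFFFFF until the new owner clears it with memset. That matches the effect of the memset quoted above, and it also shows why the memset alone says nothing about who consumes the stale dwords.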
Comment 6 asimiklit 2018-08-15 13:37:35 UTC
Created attachment 141111 [details]
The apitrace which causes the graphical corruption

The following patch dumps the state buffer before it is submitted (up to the last 3 bytes may be omitted, but I think they do not matter):

--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -750,6 +750,19 @@ submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
       memcpy(bo_map, batch->state.map, batch->state_used);
    }
 
+      fprintf(stderr, "=================================================\n");
+      if(batch->state.map)
+      {
+        uint32_t * data = (uint32_t *)batch->state.map;
+        const size_t s = batch->state_used / 4;
+        for(size_t i = 0u; i < s; ++i)
+        {
+           fprintf(stderr, "0x%08x : 0x%08x\n", (uint32_t)(i*4), data[i]);
+        }
+      }
+      fprintf(stderr, "=================================================\n");
+
+
    brw_bo_unmap(batch->batch.bo);
    brw_bo_unmap(batch->state.bo);
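
The same dump can be sketched as a standalone helper that also prints a trailing partial dword, so no bytes are dropped when the size is not a multiple of 4. This is only an illustration: dump_state_dwords is a hypothetical name, it is not part of Mesa, and it takes a plain pointer and size instead of the batch structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print `size` bytes of `map` as "offset : dword" lines.
 * Returns the number of lines written (0 if `map` is NULL). */
static size_t dump_state_dwords(FILE *out, const void *map, size_t size)
{
   const uint8_t *bytes = map;
   size_t lines = 0;

   assert(out != NULL);
   if (!map)
      return 0;

   for (size_t off = 0; off < size; off += 4) {
      uint32_t dword = 0;
      size_t n = (size - off < 4) ? (size - off) : 4;

      /* A trailing partial dword is zero-padded before printing. */
      memcpy(&dword, bytes + off, n);
      fprintf(out, "0x%08x : 0x%08x\n", (unsigned)off, (unsigned)dword);
      lines++;
   }
   return lines;
}
```

With the patch above, the equivalent call would be dump_state_dwords(stderr, batch->state.map, batch->state_used), assuming those fields are accessible at that point as they are in the diff.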
Comment 7 asimiklit 2018-08-16 10:16:17 UTC
Created attachment 141137 [details]
simple program and makefile

Added a simple reproducer program.

Note: I was a bit wrong. The magic value 0xFFFFFFFF is enough only for the hang; for the graphical corruption we need the following values:

((uint32_t*)vbuffer0Ptr)[0] =  0x2003625aU;
((uint32_t*)vbuffer0Ptr)[1] =  0x02040110U;
((uint32_t*)vbuffer0Ptr)[2] =  0x2027625aU;
((uint32_t*)vbuffer0Ptr)[3] =  0x02040210U;

They are used in the following function:
>bool allocBufferWithMagicTrash(GLuint idx)
>{
>   bool oval = false;
>   enum { kSize = 16384 };
>   glBindBuffer(GL_ARRAY_BUFFER, idx);
>   glBufferData(GL_ARRAY_BUFFER, kSize, NULL, GL_STATIC_DRAW);
>   void * vbuffer0Ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);
>   assert(vbuffer0Ptr);
>   if(vbuffer0Ptr)
>   {
>      memset(vbuffer0Ptr, 0xFF, kSize);
>      ((uint32_t*)vbuffer0Ptr)[0] =  0x2003625aU;
>      ((uint32_t*)vbuffer0Ptr)[1] =  0x02040110U;
>      ((uint32_t*)vbuffer0Ptr)[2] =  0x2027625aU;
>      ((uint32_t*)vbuffer0Ptr)[3] =  0x02040210U;
>      /* Report success only if the buffer was unmapped cleanly. */
>      oval = (glUnmapBuffer(GL_ARRAY_BUFFER) == GL_TRUE);
>      glBindBuffer(GL_ARRAY_BUFFER, 0);
>   }
>   return oval;
>}
Comment 8 asimiklit 2018-09-04 10:23:46 UTC
This issue should be fixed by commit:

1b0df8a46020cc88afeaa4decb42a782ab168afb

i965/gen6/xfb: handle case where transform feedback is not active

