Summary: | [SKL bisected] texsubimage pbo intermittent failures | ||
---|---|---|---|
Product: | Mesa | Reporter: | Ben Widawsky <ben> |
Component: | Drivers/DRI/i965 | Assignee: | Topi Pohjolainen <topi.pohjolainen> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | major | ||
Priority: | high | CC: | ben, chadversary, james.ausmus, jani.saarinen, kenneth, mark.a.janes |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
Ben Widawsky
2015-09-08 23:15:59 UTC
Reproducible on IVB with fast clears, and blorp disabled too. Takes around 100 runs to hit it on IVB. It's more frequent on SKL. (Leaving SKL only in the title because the above is not in any released version of mesa). And furthermore, on IVB I can also get failure with a revert. texsubimage failed target: GL_TEXTURE_2D internal format: GL_LUMINANCE8 region: 26, 4 101 x 59 (Note that there are two types of error messages, one that doesn't cause the piglit to return failure and one that really fails. This is the latter.) Also to note that I'm experimenting with roughly 3 weeks old master - current master contains another patch removing infrastructure needed by the revert. Among other things I had simplified the test and altered it report every faulty pixel (instead of bailing out on first error). That way I had noticed that the failing area is always a 4x4 block (16 pixels). And with the RGBA8 format I've been concentrating on thus corresponding to one cache line of 64 bytes. Moreover, the erroneous values are not just random garbage but exactly the values expected to see for the 4x4 block in position 0,0 - 3,3. Hence I've been suspecting the cache simply containing "old data". Now, I also noticed that brw_meta_fast_clear() unconditionally issues a cache flush (by calling brw_emit_mi_flush()). This doesn't happen when glsl-based meta clears are used. On IVB I can specifically reproduce the error if fast clears are _disabled_ but not with. So I added a call for brw_emit_mi_flush() also after glsl-based clear in (in the end of brw_clear()). That makes a difference on both IVB and SKL - I don't seem to be able to reproduce the error anymore. This is heavy flush and it maybe that the real bug is still hidden due to change in runtime dynamics. In case of fast clears the spec specifically says that the flush has to be made but with glsl based clears we are talking about normal rendering and therefore the flush is not readily justified. I need to read the specs some more. (In reply to Topi Pohjolainen from comment #4) > Moreover, the erroneous values are not just random garbage but > exactly the values expected to see for the 4x4 block in position 0,0 - 3,3. > Hence I've been suspecting the cache simply containing "old data". Just wanted to amend the statement to be that contains data from another cacheline, which might be "old" but we really don't know. Out of curiosity I also tried the flush (brw_emit_mi_flush()) after the texture data upload in intelTexSubImage() - I put it just after _mesa_meta_pbo_TexSubImage(). That doesn't help. What about before _mesa_meta_pbo_TexSubImage()? That doesn't help either - and in my mind it shouldn't. It is the data left in the cache by the uploading, i.e., by _mesa_meta_pbo_TexSubImage(), that seems to be the problem. Or this is how I see it at the moment. I've been hopefully narrowing this down some more. What seems to make a difference between a flush just after upload and a flush just after clear is that in the latter case it is preceded by a _mesa_update_state() call. I made brw_clear() to skip the entire clear but to just issue brw_emit_mi_flush() (note that _mesa_clear() calls _mesa_update_state() before it calls driver to do its thing). In the test there is specifically a glClear() call between the data upload and the draw call sampling the loaded data. This prevents the error from happening. If in turn I remove this specific glClear() call in the test (that now doesn't no anything else than state update and flush) than the error will re-appear. Recall that simple brw_emit_mi_flush() after the upload BUT before _mesa_update_state() didn't help. I'll now start slicing _mesa_update_state(). FYI I tried this test with fast clear's enabled (http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=skl-clear-redux) and it still fails intermittently. This was also the case previously when it was only the single revert. (In reply to Topi Pohjolainen from comment #4) > Among other things I had simplified the test and altered it report every > faulty pixel (instead of bailing out on first error). That way I had noticed > that the failing area is always a 4x4 block (16 pixels). And with the RGBA8 > format I've been concentrating on thus corresponding to one cache line of 64 > bytes. Moreover, the erroneous values are not just random garbage but > exactly the values expected to see for the 4x4 block in position 0,0 - 3,3. > Hence I've been suspecting the cache simply containing "old data". Are you seeing any pattern to the location of the erroneous data? I haven't, although it doesn't necessarily mean there isn't any. I was able to narrow down the combination of _mesa_update_state() and brw_emit_mi_flush(). I got it simplified to a call _mesa_update_texture() followed by part of the logic in brw_context.c:intel_update_state() - namely calling brw_render_cache_set_check_flush() at least once. That call in turn is simple check for existence of the underlying buffer object of the texture and then a call to brw_emit_mi_flush(). So, after more simplification, I found out that if I'm calling brw_emit_mi_flush() _twice_ after the call for _mesa_meta_pbo_TexSubImage() in intelTexSubImage() I'm not able to reproduce the error. Recall that calling brw_emit_mi_flush() isn't enough. This is turn means that I'm mostly likely in square one - I'm simple changing the dynamics enough to hide the real bug. Topi, can you push your fix rebased on top of master? We want to see if it helps intermittent Braswell failures too. Sure, you mean hacks I think. I have two alternatives but both without any rational. Which one would you prefer, one brw_emit_mi_flush() call in the end of brw_clear() or two subsequent brw_emit_mi_flush() calls just after the texture data uploading in intelTexSubImage()? I pushed both fixes in git://people.freedesktop.org/~tpohjola/mesa:skl_hacks. They are the two top most commits and can be used without the other. This branch also contains my blorp + fast clear disabling patches for IVB. It is based on top of roughly three weeks old master but the fixes themselves are very small and should rebase on top of current without trouble. There looks to be at least the following pattern in the error. The test creates a source texture and uploads initial data for it using glTexImage2D(). This is followed by a creation of pixel buffer object that is used to overwrite part of the created source texture using glTexSubImage2D(). Finally the source texture is sampled and the pixel values are written to the framebuffer to be examined using glReadPixels(). I noticed that the failing values are always outside the updated the region. I modified the test to skip checking the pixels corresponding to the updated region allowing me to skip also the partial update of the source texture (i.e., to skip calling glTexSubImage2D() in the test). With the glTexSubImage2D() in place there are errors _outside_ the area addressed by that call but without it I can't see them anymore. Therefor it is possible that there is some cache flushing issue between the initial glTexImage2D() and the following glTexSubImage2D() call. I'm trying to look into this next. I found out that the initial uploading using glTexImage2D() falls to intel_texsubimage_tiled_memcpy() path. I tried disabling special streaming instructions in intel_tiled_memcpy.c and now I seem to get results from different cache line(s) than before. Previously it was consistently the same, now the pixel values correspond to different location in the texture and vary from run to another. And forcing the initial upload to use the generic implementation in mesa core (_mesa_store_teximage(), also cpu based) results into yet different pixels values to be received. In the second upload round (glTexSubImage2D()) one can disable the update using GPU (i.e., disable _mesa_meta_pbo_TexSubImage() which a glsl-based blit) and force it to cpu path. This prevents the error from happening as well. This is another random observation. I noticed that _mesa_meta_pbo_TexSubImage() itself stores the current gl-state and resets the state for new meta operation (calls _mesa_meta_begin()). This is followed by a call _mesa_meta_BlitFramebuffer() which in turn does the same - stores the current gl-state and resets it again (by calling _mesa_meta_begin() also). This should be fine as the meta-state mechanism is designed for nested operations. This is just used quite rarely and therefore possible (though unlikely) source for a lurking bug. Hi Thank you for your email. I am Out of Office and returning on Monday 19th of Oct 20th 2015. If you have an urgent issue, which must be addressed during this time please SMS me +358503819465. Br, Jani I further hacked _mesa_meta_pbo_TexSubImage() to use driver blit callback allowing me to experiment with blorp or blitter engine handling the blit. I can't reproduce the error either with blorp or blitter engine. (In reply to Topi Pohjolainen from comment #23) > I further hacked _mesa_meta_pbo_TexSubImage() to use driver blit callback > allowing me to experiment with blorp or blitter engine handling the blit. I > can't reproduce the error either with blorp or blitter engine. This confirms my suspicion. This data point is SKL, or the hacked IVB? This is hacked IVB still. Note also that non-failing blorp-path is essentially similar to the failing glsl-path - both do the blit using fragment shader. I tried the same treatment on SKL, there intel_blit_framebuffer_with_blitter() fails and it falls to cpu (_swrast_BlitFramebuffer()). This causes a myriad of pixel errors. Using INTEL_DEBUG=perf I get on SKL: intel_miptree_blit: Can't use hardware blitter from MESA_FORMAT_R8G8B8A8_UNORM to MESA_FORMAT_RGBA_UNORM16, falling back. glBlitFramebuffer(): unknown blit failure. Falling back to software rendering. fail Flushing before mapping a referenced bo. CPU mapping a busy miptree BO stalled and took 0.314 ms. On IVB I get only: Flushing before mapping a referenced bo. CPU mapping a busy miptree BO stalled and took 0.118 ms. (In reply to Topi Pohjolainen from comment #27) > Using INTEL_DEBUG=perf I get on SKL: > > intel_miptree_blit: Can't use hardware blitter from > MESA_FORMAT_R8G8B8A8_UNORM to MESA_FORMAT_RGBA_UNORM16, falling back. > glBlitFramebuffer(): unknown blit failure. Falling back to software > rendering. > fail This line is my own debug printing, ignore that. > Flushing before mapping a referenced bo. > CPU mapping a busy miptree BO stalled and took 0.314 ms. > > > On IVB I get only: > > Flushing before mapping a referenced bo. > CPU mapping a busy miptree BO stalled and took 0.118 ms. My mistake, I'm seeing the same fallback error in IVB case as well. It just doesn't happen for all formats and I missed it when checking the output. And to clarify: whereas on IVB intel_blit_framebuffer_with_blitter() fails only in some cases and falls to cpu, one SKL it fails always and therefore does the blit with cpu. I wonder why the blit is failing always on SKL. I can't think of anything IVB can do that SKL cannot do. (In reply to Ben Widawsky from comment #31) > I wonder why the blit is failing always on SKL. I can't think of anything > IVB can do that SKL cannot do. And you are correct. Neither actually fail the blit when doing the texture upload, the fail comes from elsewhere. I made the same mistake I did before when interpreting my logs - only this time in both IVB and SKL. I didn't bother creating special driver hook for only the texture upload path to use but simply used the general. I assumed the test is simple enough not require blitting elsewhere - this is not true unfortunately. When I disabled the glsl-blit option in the driver hook some of these non-tex-upload blits started to fall to the software path because the blitter couldn't do a format conversion. Furthermore, the use of blitter engine helps the special of the test I've been studying on IVB. Full test (original version of the piglit test) fails on both IVB and SKL with blitter engine. In fact, it looks to fail in more cases than the glsl blit. I need to look into this some more. Reversing the mutual order of intel_blit_framebuffer_with_blitter() and _mesa_meta_BlitFramebuffer() in i965-driver callback for blits fixes the said newly introduced errors. This setup uses the blitter whenever possible and then in cases not handled by the blitter prefers glsl-based blit over the software path which looks to introduce its own set of failures. With this setup on SKL the original failure of interest manifests itself after a few testing rounds. On IVB in turn I cannot see failures in the original piglit test or in my reduced version of it. (In reply to Topi Pohjolainen from comment #13) > I was able to narrow down the combination of _mesa_update_state() and > brw_emit_mi_flush(). I got it simplified to a call _mesa_update_texture() > followed by part of the logic in brw_context.c:intel_update_state() - namely > calling brw_render_cache_set_check_flush() at least once. That call in turn > is simple check for existence of the underlying buffer object of the texture > and then a call to brw_emit_mi_flush(). > > So, after more simplification, I found out that if I'm calling > brw_emit_mi_flush() _twice_ after the call for _mesa_meta_pbo_TexSubImage() > in intelTexSubImage() I'm not able to reproduce the error. Recall that > calling brw_emit_mi_flush() isn't enough. > > This is turn means that I'm mostly likely in square one - I'm simple > changing the dynamics enough to hide the real bug. I have a little more to add here. I started reading bspec again regarding flushing and related pipe control options. I added another brw_emit_pipe_control_flush() into brw_emit_mi_flush() in order to experiment with individual bits of the pipe control command. (All that brw_emit_mi_flush() does is to emit pipe-control commands). Neither on IVB or on SKL I cannot see any failures if I just submit another pipe-control command with only a single bit set, namely PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE. Any other bit didn't seem to make any difference. (In reply to Topi Pohjolainen from comment #34) > (In reply to Topi Pohjolainen from comment #13) > > I was able to narrow down the combination of _mesa_update_state() and > > brw_emit_mi_flush(). I got it simplified to a call _mesa_update_texture() > > followed by part of the logic in brw_context.c:intel_update_state() - namely > > calling brw_render_cache_set_check_flush() at least once. That call in turn > > is simple check for existence of the underlying buffer object of the texture > > and then a call to brw_emit_mi_flush(). > > > > So, after more simplification, I found out that if I'm calling > > brw_emit_mi_flush() _twice_ after the call for _mesa_meta_pbo_TexSubImage() > > in intelTexSubImage() I'm not able to reproduce the error. Recall that > > calling brw_emit_mi_flush() isn't enough. > > > > This is turn means that I'm mostly likely in square one - I'm simple > > changing the dynamics enough to hide the real bug. > > I have a little more to add here. I started reading bspec again regarding > flushing and related pipe control options. I added another > brw_emit_pipe_control_flush() into brw_emit_mi_flush() in order to > experiment with individual bits of the pipe control command. (All that > brw_emit_mi_flush() does is to emit pipe-control commands). > > Neither on IVB or on SKL I cannot see any failures if I just submit another > pipe-control command with only a single bit set, namely > PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE. Any other bit didn't seem to make any > difference. And here it should be noted that the preceding (and already existing pipe-control command) in brw_emit_mi_flush() contains this bit among others. Also I noticed bspec telling me that on SKL TEXTURE_CACHE_INVALIDATE must always be accompanied with CS_STALL bit. I didn't have it, but the command still made a difference. On Wed, Oct 21, 2015 at 01:21:12PM +0000, bugzilla-daemon@freedesktop.org wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=91926 > > --- Comment #35 from Topi Pohjolainen <topi.pohjolainen@intel.com> --- > (In reply to Topi Pohjolainen from comment #34) > > (In reply to Topi Pohjolainen from comment #13) > > > I was able to narrow down the combination of _mesa_update_state() and > > > brw_emit_mi_flush(). I got it simplified to a call _mesa_update_texture() > > > followed by part of the logic in brw_context.c:intel_update_state() - namely > > > calling brw_render_cache_set_check_flush() at least once. That call in turn > > > is simple check for existence of the underlying buffer object of the texture > > > and then a call to brw_emit_mi_flush(). > > > > > > So, after more simplification, I found out that if I'm calling > > > brw_emit_mi_flush() _twice_ after the call for _mesa_meta_pbo_TexSubImage() > > > in intelTexSubImage() I'm not able to reproduce the error. Recall that > > > calling brw_emit_mi_flush() isn't enough. > > > > > > This is turn means that I'm mostly likely in square one - I'm simple > > > changing the dynamics enough to hide the real bug. > > > > I have a little more to add here. I started reading bspec again regarding > > flushing and related pipe control options. I added another > > brw_emit_pipe_control_flush() into brw_emit_mi_flush() in order to > > experiment with individual bits of the pipe control command. (All that > > brw_emit_mi_flush() does is to emit pipe-control commands). > > > > Neither on IVB or on SKL I cannot see any failures if I just submit another > > pipe-control command with only a single bit set, namely > > PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE. Any other bit didn't seem to make any > > difference. > > And here it should be noted that the preceding (and already existing > pipe-control command) in brw_emit_mi_flush() contains this bit among others. > > Also I noticed bspec telling me that on SKL TEXTURE_CACHE_INVALIDATE must > always be accompanied with CS_STALL bit. I didn't have it, but the command > still made a difference. > Invalidating the texture cache is a pretty large hammer, particularly if you're only updating a small region of your texture. (CS stalls are used all over and probably won't make too much difference). Have you already tried throwing in a bunch of clflushes or a wbinvd before the upload? If I read Topi's comment right, it says that doing a TC flush immediately after an existing pipe control that *already* did a TC flush solved the problem...that's very strange... (In reply to Kenneth Graunke from comment #37) > If I read Topi's comment right, it says that doing a TC flush immediately > after an existing pipe control that *already* did a TC flush solved the > problem...that's very strange... Exactly. I went through half the workarounds in bspec yesterday, I'll walk through the rest today and see if there is anything. In workarounds I found the following: "For Gen6 and Gen7, any time a pipeline flush is used it needs to be followed by eight (8) separate storeDW commands. This is done as a workaround to a known issue where the pipeline flush does not push all cycles to the "GO" point. The storeDWs will not accomplish this process but will create enough stall time to ensure all cycles are at GO prior to execution of the next buffer or part of the buffer. This is required for producer/consumer model." I don't exactly know to what "storeDW" refers here and I added in brw_emit_mi_flush() after the flush: for (unsigned i = 0; i < 8; ++i) brw_emit_pipe_control_write(brw, PIPE_CONTROL_WRITE_IMMEDIATE, brw->workaround_bo, 0, 0, 0); (which writes in fact 8 times qwords). This at least doesn't make any difference either on IVB or SKL. When I started looking at the bug I was uneasy with a another error message coming from mesa although this didn't cause the test to fail. Namely: Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access) This comes from the second update round when the pixel buffer object is used as the source for the rectangle to be update in the texture image. The test is written to try 10 different regions with random width, height and depth. The error message comes from mesa when one tries to upload with texture depth zero. This results into no upload as mesa bails out. The reason there is no piglit failure is because the piglit test _also_ skips to check any pixels as the test iterates over the depth. I left this to be fixed later on as I thought it is separate from the case where we really check pixels and find faulty square. However, just out of curiosity I modified the piglit test to skip the failing attempt to upload when one of the random dimensions (tw, th, td) is zero. With this change in the test I'm not able to reproduce the failure anymore. I'll check next if this just a coincidence or if it is somehow connected to the failure in pixel values. Right, it still fails even when the faulty dimensions are rejected. The first version I had for this just reduced the probability of faulty pixels appearing. The test runs a loop of ten iterations during each it randomly selects the dimensions. In the first version I simply skipped the iteration and therefore the test tried fewer rectangles. I replaced the skip by a loop that keeps on re-selecting dimensions until they are valid. Now the test really runs the ten iterations and after 211 rounds of the test itself I got faulty pixels on IVB. (This is based on the original form of the test). The simplified version I have doesn't suffer from invalid dimensions as one of the simplifications is hard-coded dimensions known to produce the faulty pixels with high probability. Therefore I was skeptical of the connection between the invalid dimensions (failing upload) and the pixels errors. Above confirms this. (In reply to Topi Pohjolainen from comment #39) > In workarounds I found the following: > > "For Gen6 and Gen7, any time a pipeline flush is used it needs to be > followed by eight (8) separate storeDW commands. This is done as a > workaround to a known issue where the pipeline flush does not push all > cycles to the "GO" point. The storeDWs will not accomplish this process > but will create enough stall time to ensure all cycles are at GO prior > to execution of the next buffer or part of the buffer. This is required > for producer/consumer model." > > I don't exactly know to what "storeDW" refers here and I added in > brw_emit_mi_flush() after the flush: > > for (unsigned i = 0; i < 8; ++i) > brw_emit_pipe_control_write(brw, PIPE_CONTROL_WRITE_IMMEDIATE, > brw->workaround_bo, 0, 0, 0); > > (which writes in fact 8 times qwords). > > This at least doesn't make any difference either on IVB or SKL. Also just tried using MI_STORE_DATA_IMM instead of pipe control command - doesn't help on IVB. I was about to test if Ken's recent patch for the dolphin bug fixed this issue and noticed that I couldn't reproduce the issue on master. I did a reverse bisect and found that the issue is fixed/masked now by: commit 8a0c85b25853decb4a110b6d36d79c4f095d437b Author: Chad Versace <chad.versace@intel.com> Date: Thu Oct 8 12:06:24 2015 -0700 i965/gen9: Enable rep clears on gen9 I tried Ken's tree also, with and without his patch and realized that I can't reproduce the bug with neither. I removed the unnecessary glClear() call in the test that introduces second texture cache flush with fast clears. Bisecting with this gives me: commit 9ea2a86809577cac5006a2bc4fad29fed9cb3ccc Author: Marek Olšák <marek.olsak@amd.com> Date: Mon Oct 5 03:02:42 2015 +0200 mesa: call ProgramStringNotify for fixed-function vertex programs Drivers weren't notified about this at all. This allows disabling on-demand compilation in drivers. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com> With that 200 test rounds doesn't reveal any failure but without it fails almost immediately on SKL. (In reply to Topi Pohjolainen from comment #44) > I tried Ken's tree also, with and without his patch and realized that I > can't reproduce the bug with neither. I removed the unnecessary glClear() > call in the test that introduces second texture cache flush with fast > clears. Bisecting with this gives me: > > commit 9ea2a86809577cac5006a2bc4fad29fed9cb3ccc > Author: Marek Olšák <marek.olsak@amd.com> > Date: Mon Oct 5 03:02:42 2015 +0200 > > mesa: call ProgramStringNotify for fixed-function vertex programs > > Drivers weren't notified about this at all. > This allows disabling on-demand compilation in drivers. > > Reviewed-by: Dave Airlie <airlied@redhat.com> > Reviewed-by: Brian Paul <brianp@vmware.com> > Tested-by: Brian Paul <brianp@vmware.com> > > > With that 200 test rounds doesn't reveal any failure but without it fails > almost immediately on SKL. And to clarify that this commit is introduced before Chad's repclear patch. Interesting. You're definitely seeing much higher frequency than I am. While bisecting using 100 rounds on SKL, I see between 1-3 failures with recent mesa and a new SKL part. I can try to up the number of iterations and bisect again if it's helpful. For posterity, here was my bisect command line (note it's a reverse bisect, so failure is good) i=0; while [[ $i -lt 100 ]] ; do ../piglit/bin/texsubimage pbo -auto -fbo > /dev/null 2>&1 || echo "git bisect good"; ((i++)) ; done I don't really know if bisecting tells us much. I just checked with IVB, "mesa: call ProgramStringNotify..."-commit does not help the simplified version of the piglit test. And the original seems to run just fine with and without that patch. I'll bisect IVB also. So I'm more and more convinced that these changes just simply change the runtime dynamics hiding the bug instead of really fixing it. (In reply to Topi Pohjolainen from comment #48) > I don't really know if bisecting tells us much. I just checked with IVB, > "mesa: call ProgramStringNotify..."-commit does not help the simplified > version of the piglit test. And the original seems to run just fine with and > without that patch. I'll bisect IVB also. This wasn't true after all, the probability just drops a lot, I required 160 rounds to hit the failure on IVB with the original test. I wonder if I finally have something. I started slicing the update blit of the source texture (recall that this is the round overwriting a selected region, and although the error is always in the part that corresponds to pixels outside the region, still without this blit the error doesn't manifest itself). I have simplified the test on IVB so much that the only meaningful meta operation is this update-blit allowing me to freely hack things out in meta and in the i965-state setup conditionally using _mesa_meta_in_progress(). I found out that the nested meta stacking isn't a problem as without the final kick for drawing the error can't be reproduced (or at least I don't see it any 200 test rounds. Note that normally I get in the 2nd or 3rd round latest). I continued to i965-state tracking and got all the way down to texture surface setup. This is the temporary texture created to represent the pixel buffer object used as the source for the pixel values in the sub-region to be updated. Now the buffer is offset by y * pitch + x * bpp, in my hardcoded case 4 * 512 + 46 * 4 = 2232. This is used in gen7_emit_texture_surface_state() when the relocation entry (by drm_intel_bo_emit_reloc()) is set for the surface. I tried first taking away the offset together and realised that this makes a difference - I can't reproduce the error anymore. (Note that sampled values for the updated region are of course off but the simplified test only concerns itself on the region outside where original error takes place). I narrowed this some more and realized that 2048 + 64 + 64 + 32 + 16 = 2224 is still fine but 2225 trips over and the bug reproduces with high probability. When the surface is setup we also set vertical alignment to four (alternatives are four or eight). This pbo backed surface is linear but the documentation (bspec) seems to make no difference between tiled and linear - both should be large enough for the sampler engine to treat with aligned dimensions, Here the surface is set for dimensions 59 * 81 starting in offset 46,4. The underlying buffer has space for 128 * 64, with bpp = 4, and hence with row pitch 4 * 128 = 512, of total size 32768. The offset region without alignment goes to 59 * 512 + 2232 = 32440 which is still inside. But if the alignment of four is taken into account this trips over 32952 - 32768 = 184. Although note that going over with just 176 bytes doesn't seem to cause the error. I can't readily explain why this causes the seeming cache line failure but perhaps this somehow confuses the GPU due to the alignment restriction not being honored. (In reply to Topi Pohjolainen from comment #50) > I wonder if I finally have something. I started slicing the update blit of > the source texture (recall that this is the round overwriting a selected > region, and although the error is always in the part that corresponds to > pixels outside the region, still without this blit the error doesn't > manifest itself). > > I have simplified the test on IVB so much that the only meaningful meta > operation is this update-blit allowing me to freely hack things out in meta > and in the i965-state setup conditionally using _mesa_meta_in_progress(). I > found out that the nested meta stacking isn't a problem as without the final > kick for drawing the error can't be reproduced (or at least I don't see it > any 200 test rounds. Note that normally I get in the 2nd or 3rd round > latest). > I continued to i965-state tracking and got all the way down to texture > surface setup. This is the temporary texture created to represent the pixel > buffer object used as the source for the pixel values in the sub-region to > be updated. > > Now the buffer is offset by y * pitch + x * bpp, in my hardcoded case 4 * > 512 + 46 * 4 = 2232. This is used in gen7_emit_texture_surface_state() when > the relocation entry (by drm_intel_bo_emit_reloc()) is set for the surface. > I tried first taking away the offset together and realised that this makes a > difference - I can't reproduce the error anymore. (Note that sampled values > for the updated region are of course off but the simplified test only > concerns itself on the region outside where original error takes place). I > narrowed this some more and realized that 2048 + 64 + 64 + 32 + 16 = 2224 is > still fine but 2225 trips over and the bug reproduces with high probability. > > When the surface is setup we also set vertical alignment to four > (alternatives are four or eight). This pbo backed surface is linear but the > documentation (bspec) seems to make no difference between tiled and linear - > both should be large enough for the sampler engine to treat with aligned > dimensions, > And for the reference the section in bspec: Gen Graphics » BSpec » Memory Views » Common Surface Formats » Surface Padding Requirements » Sampling Engine Surface Even though I'm dealing with non-compressed textures here in this bug, I think the text found in the section discussing compressed textures hints that hardware has limitations w.r.t. linear orientation. It says: "This must be ensured regardless of whether the surface is stored tiled or linear. This is due to the potential rotation of cache line orientation from memory to cache." I hacked intel_buffer_objects.c::alloc_buffer_object() to pass extra 4096 bytes to the size given to drm_intel_bo_alloc(). This seems to prevent the failure also (at least I get 200 test rounds without pixel failures). I used a hack since this interface (glBufferData()) has no notion of width or height and therefore preventing cleaner solution. I would like to note that without knowing the algorithm for the cache handling, these findings are not yet conclusive - it is still possible that they are once again hiding the real explanation. (In reply to Topi Pohjolainen from comment #52) > I hacked intel_buffer_objects.c::alloc_buffer_object() to pass extra 4096 > bytes to the size given to drm_intel_bo_alloc(). This seems to prevent the > failure also (at least I get 200 test rounds without pixel failures). > > I used a hack since this interface (glBufferData()) has no notion of width > or height and therefore preventing cleaner solution. > > > I would like to note that without knowing the algorithm for the cache > handling, these findings are not yet conclusive - it is still possible that > they are once again hiding the real explanation. This also makes a difference on SKL. I tried with the original piglit test - 200 rounds without failure. I had a chat in the office with Harri. Among other things he was curious if this bug just happens with relatively small buffers (here 128 * 64 * 4 bytes). I quadrupled the size to 256 * 128. The bug manifests itself but the probability drops a lot - I needed 156 rounds with the simplified case (normally I get failures in the 2nd or 3rd round). (In reply to Topi Pohjolainen from comment #54) > I had a chat in the office with Harri. Among other things he was curious if > this bug just happens with relatively small buffers (here 128 * 64 * 4 > bytes). I quadrupled the size to 256 * 128. The bug manifests itself but the > probability drops a lot - I needed 156 rounds with the simplified case > (normally I get failures in the 2nd or 3rd round). But with 512 * 256 even 2000 rounds wasn't enough anymore to reproduce. (In reply to Topi Pohjolainen from comment #54) > I had a chat in the office with Harri. Among other things he was curious if > this bug just happens with relatively small buffers (here 128 * 64 * 4 > bytes). I quadrupled the size to 256 * 128. The bug manifests itself but the > probability drops a lot - I needed 156 rounds with the simplified case > (normally I get failures in the 2nd or 3rd round). And I just realized that I simply increased the texture and the buffer size without actually increasing the size of the sub-region being blit. This effectively rules out the theory about the buffer over-run due to alignment violation. Now there was plenty extra pages available in the buffer (height == 128). So I'm back in square one it seems. Decreasing the texture and buffer size into one fourth of the original, into 64x32, seems to give me 100% probability of reproducing on IVB with the simplified test case. I'll try with the original and SKL. (In reply to Topi Pohjolainen from comment #57) > Decreasing the texture and buffer size into one fourth of the original, into > 64x32, seems to give me 100% probability of reproducing on IVB with the > simplified test case. I'll try with the original and SKL. More on IVB first. Using 32 x 16 x 4 = 2048 looks to be too small to reproduce. On the other hand 32 x 32 x 4 = 4096 looks to give 100% probability. The latter has the buffer end at the page boundary suggesting that buffer overrun due to alignment plays some role afterall (perhaps just something more complex than discussed before). (In reply to Topi Pohjolainen from comment #58) > (In reply to Topi Pohjolainen from comment #57) > > Decreasing the texture and buffer size into one fourth of the original, into > > 64x32, seems to give me 100% probability of reproducing on IVB with the > > simplified test case. I'll try with the original and SKL. > > More on IVB first. Using 32 x 16 x 4 = 2048 looks to be too small to > reproduce. On the other hand 32 x 32 x 4 = 4096 looks to give 100% > probability. The latter has the buffer end at the page boundary suggesting > that buffer overrun due to alignment plays some role afterall (perhaps just > something more complex than discussed before). Moreover 32 x 32 seems to require only two internal iterations in the test (original has 10). (In reply to Topi Pohjolainen from comment #58) > (In reply to Topi Pohjolainen from comment #57) > > Decreasing the texture and buffer size into one fourth of the original, into > > 64x32, seems to give me 100% probability of reproducing on IVB with the > > simplified test case. I'll try with the original and SKL. > > More on IVB first. Using 32 x 16 x 4 = 2048 looks to be too small to > reproduce. On the other hand 32 x 32 x 4 = 4096 looks to give 100% > probability. The latter has the buffer end at the page boundary suggesting > that buffer overrun due to alignment plays some role afterall (perhaps just > something more complex than discussed before). Just for grins, how about 32 x 16 x 8? (In reply to Ben Widawsky from comment #60) > (In reply to Topi Pohjolainen from comment #58) > > (In reply to Topi Pohjolainen from comment #57) > > > Decreasing the texture and buffer size into one fourth of the original, into > > > 64x32, seems to give me 100% probability of reproducing on IVB with the > > > simplified test case. I'll try with the original and SKL. > > > > More on IVB first. Using 32 x 16 x 4 = 2048 looks to be too small to > > reproduce. On the other hand 32 x 32 x 4 = 4096 looks to give 100% > > probability. The latter has the buffer end at the page boundary suggesting > > that buffer overrun due to alignment plays some role afterall (perhaps just > > something more complex than discussed before). > > Just for grins, how about 32 x 16 x 8? If you give me a GL format that requires eight bytes I can see if the test can be modified. (I don't know the GL api well enough to know if such thing exists). I got curious if the failing case is for some reason getting pixel values for the previous framebuffer. Note that even the simplest case requires two internal rounds (the test wants to try with different sub-regions - in my case these are now identical as the region is hardcoded). I modified the test to use different pixel values for the first and for the second round. The second round fails and the failing pixel values are such that would be valid only in the second round (in position 0,8 the values are such that would be correct for position 0,0). The pixel values are such that wouldn't be valid for any position in the first round. (In reply to Topi Pohjolainen from comment #62) > I got curious if the failing case is for some reason getting pixel values > for the previous framebuffer. Note that even the simplest case requires two > internal rounds (the test wants to try with different sub-regions - in my > case these are now identical as the region is hardcoded). > I modified the test to use different pixel values for the first and for the > second round. The second round fails and the failing pixel values are such > that would be valid only in the second round (in position 0,8 the values are > such that would be correct for position 0,0). The pixel values are such that > wouldn't be valid for any position in the first round. Interesting enough, this setup also decreases the probability of hitting the bug a lot. I thought I better note down something I noticed in the last failing 32x32 case. Linear pbo buffer was in GTT offset 0x77a8000 while the final surface to be sampled in turn started right after in the next page at 0x77a9000. I need to double check now the latest 32x32 setup. I wanted to try out of curiosity how IVB behaves with fast-clears enabled. I now get the same 4x4 block pixel failure in the second iteration with latest mesa-master. (In reply to Topi Pohjolainen from comment #65) > I need to double check now the latest 32x32 setup. I wanted to try out of > curiosity how IVB behaves with fast-clears enabled. I now get the same 4x4 > block pixel failure in the second iteration with latest mesa-master. The case looks sound. Furthermore, the same setup passes 2000 test rounds with the latest mesa-master on SKL just fine. I've been mostly looking SKL for now as I noticed that the pixel error is in fact very different there. While it happens with RGBA format on IVB and is always the size of 4x4 pixel block, on SKL it happens with LUMINANCE8 and INTENSITY8 and the errors span on over several 4x4 blocks. I'm trying to narrow it down and simplify the piglit test. Only on SKL the bug seems to be very sensitive disappears a lot easier than on IVB. I had earlier verified that in a failing run the source texture had correct pixel values in it (by dumping on the disk). I hadn't been able to verify that the same applies for the source texture after update round (with 200 test rounds I just couldn't get it to fail). I had left my laptop running it for 10000 rounds and I actually got a failing round with a dump (7101 test rounds where needed). Only problem is that as there are two internal iterations and it is always the second failing, I actually have total garbage for both on the disk. I can't readily explain this - it maybe that I'm missing a flush before the dump. I used "src/mesa/main/debug.c::write_ppm()". (In reply to Topi Pohjolainen from comment #68) > I had earlier verified that in a failing run the source texture had correct > pixel values in it (by dumping on the disk). I hadn't been able to verify > that the same applies for the source texture after update round (with 200 > test rounds I just couldn't get it to fail). I had left my laptop running it > for 10000 rounds and I actually got a failing round with a dump (7101 test > rounds where needed). Only problem is that as there are two internal > iterations and it is always the second failing, I actually have total > garbage for both on the disk. I can't readily explain this - it maybe that > I'm missing a flush before the dump. I used > "src/mesa/main/debug.c::write_ppm()". And the actual pixels gotten using intel_get_tex_sub_image(). Even though the error is very random on SKL and I have a hard time narrowing it down (i.e., simplify the piglit test itself), I see a pattern supporting the "buffer overrun due to alignment violation"-theory. Every time I see a failure, the corresponding texture update round reads the pixel-buffer-object until the last line. I modified the test to use a region that don't require access to the last four rows of the pixel buffer object. This seems to make the bug to disappear. I updated my IVB laptop (kernel 3.19.8 -> 4.1.4), and I thought I give the simplified test a try. I still get failing pixel values but now only for 2x4 block. Therefore one cannot explain the error anymore by a cache line holding data for another part for texture - half of the pixel values are correct while only the other half is incorrect. The final blit from the texture to the framebuffer is executed using fixed-function fragment shader. I hacked Mesa to simulate the color values in the test for R-component - the test uses values depending on the Y-component, specifically R = 4 * y. So I hacked Mesa to overwrite the R-component to 4 * y instead of using the value read from the texture: pln(8) g127<1>F g5.4<0,1,0>F g2<8,8,1>F pln(8) g8<1>F g6<0,1,0>F g2<8,8,1>F pln(8) g9<1>F g6.4<0,1,0>F g2<8,8,1>F pln(8) g4<1>F g7.4<0,1,0>F g2<8,8,1>F mul(8) g124<1>F g9<8,8,1>F 4F math inv(8) g5<1>F g4<8,8,1>F null mul(8) g2<1>F g8<8,8,1>F g5<8,8,1>F mul(8) g3<1>F g9<8,8,1>F g5<8,8,1>F send(8) g2<1>UW g2<8,8,1>F sampler sample SIMD8 Surface = 1 Sampler = 0 mlen 2 rlen 4 { align1 1Q }; mov(8) g125<1>F g3<8,8,1>F mov(8) g126<1>F g4<8,8,1>F sendc(8) null g124<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT }; With this I can get the same intermittent pixel failures I'm seeing without: 32:0:0:255 0:0:0:255 x=0 y=8 z=0 32:2:0:255 0:2:0:255 x=1 y=8 z=0 32:4:0:255 0:4:0:255 x=2 y=8 z=0 32:6:0:255 0:6:0:255 x=3 y=8 z=0 36:0:0:255 4:0:0:255 x=0 y=9 z=0 36:2:0:255 4:2:0:255 x=1 y=9 z=0 36:4:0:255 4:4:0:255 x=2 y=9 z=0 36:6:0:255 4:6:0:255 x=3 y=9 z=0 40:0:0:255 8:0:0:255 x=0 y=10 z=0 40:2:0:255 8:2:0:255 x=1 y=10 z=0 40:4:0:255 8:4:0:255 x=2 y=10 z=0 40:6:0:255 8:6:0:255 x=3 y=10 z=0 44:0:0:255 12:0:0:255 x=0 y=11 z=0 44:2:0:255 12:2:0:255 x=1 y=11 z=0 44:4:0:255 12:4:0:255 x=2 y=11 z=0 44:6:0:255 12:6:0:255 x=3 y=11 z=0 First four values are the expected, second four are the values read back. This tells us that the problem isn't necessarily sampling - it looks that the y-coordinate is being wrong during the tex-2-fb blit. Just a sent a series against piglit. I found a combination that makes the error 100% reproducible on SKL hardware: texsubimage pbo manual GL_TEXTURE_2D GL_RGB8 6 10 0 94 53 0 -auto -fbo I'm still no smarter why it fails though. I am on paternity leave until April 5. Expect a very very delayed reply. (In reply to Topi Pohjolainen from comment #72) > The final blit from the texture to the framebuffer is executed using > fixed-function fragment shader. I hacked Mesa to simulate the color values > in the test for R-component - the test uses values depending on the > Y-component, specifically R = 4 * y. > So I hacked Mesa to overwrite the R-component to 4 * y instead of using the > value read from the texture: > > pln(8) g127<1>F g5.4<0,1,0>F g2<8,8,1>F > pln(8) g8<1>F g6<0,1,0>F g2<8,8,1>F > pln(8) g9<1>F g6.4<0,1,0>F g2<8,8,1>F > pln(8) g4<1>F g7.4<0,1,0>F g2<8,8,1>F > mul(8) g124<1>F g9<8,8,1>F 4F > math inv(8) g5<1>F g4<8,8,1>F null > mul(8) g2<1>F g8<8,8,1>F g5<8,8,1>F > mul(8) g3<1>F g9<8,8,1>F g5<8,8,1>F > send(8) g2<1>UW g2<8,8,1>F > sampler sample SIMD8 Surface = 1 Sampler = 0 mlen 2 rlen 4 { align1 1Q }; > mov(8) g125<1>F g3<8,8,1>F > mov(8) g126<1>F g4<8,8,1>F > sendc(8) null g124<8,8,1>F > render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT }; > > > With this I can get the same intermittent pixel failures I'm seeing without: > > 32:0:0:255 0:0:0:255 x=0 y=8 z=0 > 32:2:0:255 0:2:0:255 x=1 y=8 z=0 > 32:4:0:255 0:4:0:255 x=2 y=8 z=0 > 32:6:0:255 0:6:0:255 x=3 y=8 z=0 > 36:0:0:255 4:0:0:255 x=0 y=9 z=0 > 36:2:0:255 4:2:0:255 x=1 y=9 z=0 > 36:4:0:255 4:4:0:255 x=2 y=9 z=0 > 36:6:0:255 4:6:0:255 x=3 y=9 z=0 > 40:0:0:255 8:0:0:255 x=0 y=10 z=0 > 40:2:0:255 8:2:0:255 x=1 y=10 z=0 > 40:4:0:255 8:4:0:255 x=2 y=10 z=0 > 40:6:0:255 8:6:0:255 x=3 y=10 z=0 > 44:0:0:255 12:0:0:255 x=0 y=11 z=0 > 44:2:0:255 12:2:0:255 x=1 y=11 z=0 > 44:4:0:255 12:4:0:255 x=2 y=11 z=0 > 44:6:0:255 12:6:0:255 x=3 y=11 z=0 > > First four values are the expected, second four are the values read back. > This tells us that the problem isn't necessarily sampling - it looks that > the y-coordinate is being wrong during the tex-2-fb blit. This comment on coordinates being false looks to be a wrong conclusion. Here I hacked the blit program to mimick the expected values. Now on SKL I chose to output running x,y coordinate values directly as color values. In the failing two pixels wide column (on SKL in position 6,10 7,27) I'm seeing the original values and it rather looks that the updating blit fails to write this column altogether. It is not certain if it is the tex-update-blit itself that is failing. Running the same test without -fbo option works. And that option doesn't affect the blit in question in anyway. Also adding a glClear() (against the main fbo) just after the tex-blit makes the error go a way just as in IVB case. Hi Thank you for your email. I am on OoO training until Friday 4th of of Feb 2016. I am monitoring mails during training but If you have an urgent issue, which must be addressed during this time please SMS me +358503819465. Br, Jani (In reply to Topi Pohjolainen from comment #76) > It is not certain if it is the tex-update-blit itself that is failing. > Running the same test without -fbo option works. And that option doesn't > affect the blit in question in anyway. > > Also adding a glClear() (against the main fbo) just after the tex-blit makes > the > error go a way just as in IVB case. Dumping the contents of the blit just after it unfortunately makes the error go away as well (intel_miptree_map()/unmap() pair seems to help). (In reply to Topi Pohjolainen from comment #78) > (In reply to Topi Pohjolainen from comment #76) > > It is not certain if it is the tex-update-blit itself that is failing. > > Running the same test without -fbo option works. And that option doesn't > > affect the blit in question in anyway. > > > > Also adding a glClear() (against the main fbo) just after the tex-blit makes > > the > > error go a way just as in IVB case. > > Dumping the contents of the blit just after it unfortunately makes the error > go away as well (intel_miptree_map()/unmap() pair seems to help). But dumping the contents after cache flush (just after brw_emit_mi_flush()) in brw_render_cache_set_check_flush() gives me the original value and not the one written by the blit. This cache flush is preceding the final blit from the texture to fbo. It is the way of dumping that makes a difference. First and third dump below are taken by mapping directly using brw_bo_map_gtt(). The second is taken using intel_miptree_map() which does full batch buffer flush (_intel_batchbuffer_flush()) before actually mapping the buffer: 6:10 r=40 g=12 b=0 a=255 6:10 r=12 g=0 b=40 a=255 6:10 r=12 g=0 b=40 a=255 In the normal run the batch buffer flush doesn't come between the writing blit and the reading blit. This appears to be fixed by Curro's fix to our render-to-texture PIPE_CONTROL flushes: commit 72473658c51d5e074ce219c1e6385a4cce29f467 Author: Kenneth Graunke <kenneth@whitecape.org> Date: Fri Mar 25 15:33:35 2016 -0700 i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.