Summary: | [BDW BSW SKL CTS] ES31-CTS.texture_gather.* GPU_HANG | ||
---|---|---|---|
Product: | Mesa | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | Drivers/DRI/i965 | Assignee: | Jordan Justen <jljusten> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | major | ||
Priority: | high | CC: | ben, idr, jljusten, kenneth, marta.lofstedt |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 92778, 93312 | ||
Attachments: | dmesg with GPU_HANG; implement WaSendDummyVFEafterPipelineSelect; i965-upload_compute_sampler_state.patch |
This sounds a little bit similar to bug #92623. In that bug, Ken was able to make the hang go away by:

> FWIW, enabling the "Always re-emit all state" block in brw_state_upload.c:708 seems to fix this problem. always_flush_batch=true and always_flush_cache=true (way more flushing) have no impact. This makes me suspect missing dirty flags...

Marta: Can you try this to see if it helps?
Ken: Is there any other information that might be helpful?

I don't think this has anything to do with that. That bug is about PMA and early depth testing. None of the workarounds work here. This error state is completely different. CS (command streamer) and TSG are busy, and the offending batch appears to be a compute shader invocation. Guessing it's some kind of compute bug. CC'ing Jordan.

Ben suggested taking a look at WaSendDummyVFEafterPipelineSelect.

"TSG unit writes null entries into context which can hang restore CS and TSG. WA: Send dummy VFE immediately after GPGPU pipeline select."

Created attachment 119232
implement WaSendDummyVFEafterPipelineSelect

I was waiting for a build of something so I thought I'd give the patch a shot. This is just compile tested.

Unfortunately the "implement WaSendDummyVFEafterPipelineSelect" patch does not help.

Some combinations of the test do not cause a GPU_HANG, for example:
  ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather-int*
is OK,

but:
  ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather-uint*
  ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather-float*
cause a HANG.

(In reply to Marta Löfstedt from comment #5)
> Unfortunately the "implement WaSendDummyVFEafterPipelineSelect" patch does
> not help.
>
> Some combinations of the test are not causing GPU_HANG, for example:
> ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather-int*
> is OK,
>
> but:
> ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather-uint*
> ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather-float*
> cause HANG.

Can you please post the 4 INSTDONE values from the error state, both with and without the patch? Note that you must manually clear the error state by writing to the sysfs file.

The values are the same both with and without the patch:
  INSTDONE_0: 0xffddffff
  INSTDONE_1: 0xffffffff
  INSTDONE_2: 0xffffffff
  INSTDONE_3: 0xfffeffff

I don't know how to run deqp... could you dump the shader? INTEL_DEBUG=cs (I am guessing)

It's not deqp, it is the Khronos CTS; it's available in teamforge.

When running the tests individually, these tests fail on both BSW and SKL:
  ES31-CTS.texture_gather.plain-gather-depth-2d
  ES31-CTS.texture_gather.plain-gather-depth-2darray
  ES31-CTS.texture_gather.plain-gather-depth-cube
  ES31-CTS.texture_gather.offset-gather-depth-2d
  ES31-CTS.texture_gather.offset-gather-depth-2darray

These are the only tests that set the internal format to GL_DEPTH_COMPONENT32F for glTexImage2D|3D.

I don't think this is related to the GPU HANG, since running:
  ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather-uint*
does not run any "depth" cases.

However, running only the "depth" cases:
  ./glcts --deqp-case=ES31-CTS.texture_gather.plain-gather*depth*
will cause a GPU HANG, as do a lot of the other combinations of texture gather tests discussed earlier.

*** Bug 93725 has been marked as a duplicate of this bug. ***

Created attachment 121077
i965-upload_compute_sampler_state.patch

This fixes the problem for me.

Your patch is amazing, Curro!

With your patch on top of Mesa: git@ad20be1f30ef5 I can no longer reproduce the HANG on BDW and SKL. And all texture gather tests pass.

Also, it fixes BUG 93312.

Please send it up to mesa-dev!

*** Bug 93312 has been marked as a duplicate of this bug. ***

*** Bug 93325 has been marked as a duplicate of this bug. ***

*** Bug 93407 has been marked as a duplicate of this bug. ***

Cool, thanks for testing it out :)

commit f8ac314cc2353f439e6a917db4e3aeaf47e2093e
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Sat Jan 16 15:11:03 2016 -0800

    i965: Implement compute sampler state atom.

    Fixes a number of GLES31 CTS failures and hangs on various hardware:

     ES31-CTS.texture_gather.plain-gather-depth-2d
     ES31-CTS.texture_gather.plain-gather-depth-2darray
     ES31-CTS.texture_gather.plain-gather-depth-cube
     ES31-CTS.texture_gather.offset-gather-depth-2d
     ES31-CTS.texture_gather.offset-gather-depth-2darray
     ES31-CTS.layout_binding.sampler2D_layout_binding_texture_ComputeShader
     ES31-CTS.layout_binding.sampler2DArray_layout_binding_texture_ComputeShader
     ES31-CTS.explicit_uniform_location.uniform-loc-types-samplers
     ES31-CTS.compute_shader.resources-texture

    Some of them were actually passing by luck on some generations even
    though we weren't uploading sampler state tables explicitly for the
    compute stage, most likely because they relied on the cached sampler
    state left from previous rendering to be close enough.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92589
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93312
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93325
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93407
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93725
    Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
    Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
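For readers unfamiliar with the "state atom" idea named in the commit, the following is a toy model only, not Mesa code. It sketches why a compute dispatch could see sampler state left behind by earlier rendering when no atom uploads it for the compute stage. Every name here (`Atom`, `upload_state`, the dirty bits, the atom lists) is hypothetical and chosen for illustration; the real driver's structures and flags differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical dirty bits; the real driver defines its own flags.
NEW_TEXTURE = 1 << 0
NEW_COMPUTE_PROGRAM = 1 << 1

HwState = Dict[str, str]


@dataclass
class Atom:
    """One piece of hardware state plus the dirty bits that invalidate it."""
    name: str
    dirty_mask: int
    emit: Callable[[HwState], None]


def upload_state(atoms: List[Atom], dirty: int, hw: HwState) -> List[str]:
    """Re-emit every atom whose mask intersects the pending dirty bits."""
    emitted = []
    for atom in atoms:
        if atom.dirty_mask & dirty:
            atom.emit(hw)
            emitted.append(atom.name)
    return emitted


def emit_cs_binding_table(hw: HwState) -> None:
    hw["binding_table"] = "compute"


def emit_cs_samplers(hw: HwState) -> None:
    hw["sampler_state"] = "compute"


# Illustrative atom lists: before the fix nothing uploads compute samplers.
ATOMS_BEFORE = [
    Atom("cs_binding_table", NEW_TEXTURE | NEW_COMPUTE_PROGRAM,
         emit_cs_binding_table),
]
ATOMS_AFTER = ATOMS_BEFORE + [
    Atom("cs_samplers", NEW_TEXTURE | NEW_COMPUTE_PROGRAM, emit_cs_samplers),
]


def dispatch(atoms: List[Atom], dirty: int) -> HwState:
    """Model a compute dispatch that follows some earlier rendering."""
    hw = {"binding_table": "render (stale)", "sampler_state": "render (stale)"}
    upload_state(atoms, dirty, hw)
    return hw


if __name__ == "__main__":
    print("before fix:", dispatch(ATOMS_BEFORE, NEW_TEXTURE))
    print("after fix: ", dispatch(ATOMS_AFTER, NEW_TEXTURE))
```

In this sketch the "before" dispatch keeps the stale render sampler state, which mirrors the commit's remark that some tests passed by luck when the leftover state happened to be close enough.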
Created attachment 119061
dmesg with GPU_HANG

Software versions:
4.3.0-rc3+
OpenGL version string: 3.0 Mesa 11.1.0-devel (git-13a5805)

GPU hardware:
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 5300 (Broadwell GT2)
00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:161e] (rev 06)

CPU hardware:
x86_64
Genuine Intel(R) CPU 0000 @ 0.60GHz

CTS version:
git@67ae88f31295

command:
./glcts --deqp-case=ES31-CTS.texture_gather*

Environment:
Mesa built with: --enable-debug
export MESA_GLES_VERSION_OVERRIDE=3.1
export MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader
------------------------------------------------------

When running all the texture_gather tests of the OpenGL ES 3.1 CTS, execution stops after 3 or 4 tests with:
"intel_do_flush_locked failed: Input/output error"

dmesg shows:
[drm] GPU HANG: ecode 8:0:0x85ddfffb, in glcts [26007], reason: Ring hung, action: reset

The issue is not reproducible when the tests are run alone.
The issue is not reproducible on HSW.
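As a reading aid for the register values in this report, here is a small sketch that lists which INSTDONE bits are clear and relates the ecode hash from dmesg to INSTDONE_0. It rests on two assumptions that are not stated in this bug: that a set INSTDONE bit means the corresponding unit reports "done", so clear bits mark units still busy, and that the i915 kernel driver of this era derived the ecode hash as IPEHR XOR INSTDONE. The helper names are made up for illustration, and no bit-to-unit mapping is attempted because it is hardware specific.

```python
from typing import Dict, List

# Values copied from this report.
INSTDONE: Dict[str, int] = {
    "INSTDONE_0": 0xFFDDFFFF,
    "INSTDONE_1": 0xFFFFFFFF,
    "INSTDONE_2": 0xFFFFFFFF,
    "INSTDONE_3": 0xFFFEFFFF,
}
ECODE_HASH = 0x85DDFFFB  # from "GPU HANG: ecode 8:0:0x85ddfffb"


def clear_bits(value: int, width: int = 32) -> List[int]:
    """Return the positions of bits that are 0 (assumed: unit not done)."""
    return [bit for bit in range(width) if not (value >> bit) & 1]


def ipehr_from_ecode(ecode_hash: int, instdone0: int) -> int:
    """Undo the assumed hash ecode = IPEHR ^ INSTDONE to recover IPEHR."""
    return (ecode_hash ^ instdone0) & 0xFFFFFFFF


if __name__ == "__main__":
    for name, value in INSTDONE.items():
        print(f"{name}: 0x{value:08x} clear bits: {clear_bits(value)}")
    ipehr = ipehr_from_ecode(ECODE_HASH, INSTDONE["INSTDONE_0"])
    print(f"IPEHR (assumed): 0x{ipehr:08x}")
```

Under those assumptions only three bits are clear across the four registers, and the recovered value 0x7a000004 has the shape of a PIPE_CONTROL command header. The INSTDONE values in the thread may come from a different run than the dmesg line, so treat the XOR result as a hint rather than a finding.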