Summary: | [regression, bisected][BDW, GPU hang] stuck on render ring, always reproducible | ||
---|---|---|---|
Product: | Mesa | Reporter: | regwz <regwz> |
Component: | Drivers/DRI/i965 | Assignee: | Jason Ekstrand <jason> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | diego.viola, intel-gfx-bugs, zhouwei400 |
Version: | 12.0 | Keywords: | bisected, regression |
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | BDW | i915 features: | GPU hang |
Bug Depends on: | |||
Bug Blocks: | 98335 | ||
Attachments:
gpu crash dump
dmesg output (drm.debug=0x1e)
glxinfo
lspci -vvvnn
Chromium stderr output
git bisect log
git-revert of the offending commit (mesa 12.0.3)
aub trace of crash: sklgt2
git bisect log
Description
regwz
2016-09-12 15:37:06 UTC
Created attachment 126467 [details]
dmesg output (drm.debug=0x1e)
Created attachment 126468 [details]
glxinfo
Created attachment 126469 [details]
lspci -vvvnn
Created attachment 126470 [details]
Chromium stderr output
Assigning to Mesa product. From this error dump, the hang is happening in the render ring batch with active head at 0xf7299f04, with 0x7b000005 (3DPRIMITIVE) as IPEHR.

Batch extract (around 0xf7299f04):

0xf7299ed4: 0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xf7299ed8: 0x00000004: MI_NOOP
0xf7299edc: 0x00000000: MI_NOOP
0xf7299ee0: 0x780c0000: 3D UNKNOWN: 3d_965 opcode = 0x780c
0xf7299ee4: 0x00000000: MI_NOOP
Bad length 7 in (null), expected 6-6
0xf7299ee8: 0x7b000005: 3DPRIMITIVE: fail sequential
0xf7299eec: 0x00000104: vertex count
0xf7299ef0: 0x00019470: start vertex
0xf7299ef4: 0x00000000: instance count
0xf7299ef8: 0x00000001: start instance
0xf7299efc: 0x00000000: index bias
0xf7299f00: 0x00000000: MI_NOOP
0xf7299f04: 0x78150009: 3D UNKNOWN: 3d_965 opcode = 0x7815
0xf7299f08: 0x00000004: MI_NOOP
0xf7299f0c: 0x00000000: MI_NOOP

I can reproduce this as well.
Arch Linux (x86-64)
Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz

Confirmed on mesa 12.0.3 on SKL GT2 (8086:1912) with kernel v4.7. Error state available on request (ping me on IRC, don't cc me on this bug).

Linux myhost 4.7.6-1-ARCH #1 SMP PREEMPT Fri Sep 30 19:28:42 CEST 2016 x86_64 GNU/Linux
mesa 12.0.3-3

Has this always been broken? Have you tried bisecting?

I haven't checked that, and it turns out I should have.
It can't be reproduced with mesa-11.2.2-1 and it's broken again in mesa-12.0.0-1.

(In reply to regwz from comment #10)
> I haven't checked that, and it turns out I should have.
> It can't be reproduced with mesa-11.2.2-1 and it's broken again in
> mesa-12.0.0-1.

This looks more like a kernel issue than a Mesa issue.

I ran a bisect with the following result:

091b6156dd8553979336c15acdaf140e5419c483 is the first bad commit
commit 091b6156dd8553979336c15acdaf140e5419c483
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Tue Dec 8 17:34:38 2015 -0800

i965/fs: Push small uniform arrays

Unfortunately, this also means that we need to use a slightly different algorithm for assign_constant_locations. The old algorithm worked based on the assumption that each read of a uniform value read exactly one float. If it encountered a MOV_INDIRECT, it would immediately bail and push the whole thing. Since we can now read ranges using MOV_INDIRECT, we need to be able to push a series of floats without breaking them up. To do this, we use an algorithm similar to the on in split_virtual_grfs.

I also verified that the bug can no longer be reproduced after reverting the commit from mesa 12.0.3.

Created attachment 127343 [details]
git bisect log
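Editorial note on the batch extract earlier in this thread: the IPEHR value 0x7b000005 can be split into its command header fields by hand. The sketch below assumes the usual layout of a 3D pipeline command header (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, DWord length in 7:0, with the length field biased by 2). The function name `decode_3d_header` is made up for this illustration and is not part of any decoder tool.

```python
def decode_3d_header(dword):
    """Split a 3D pipeline command header DWord into its bit fields.

    Assumed field layout (not taken from this bug report):
    31:29 command type, 28:27 subtype, 26:24 opcode,
    23:16 sub-opcode, 7:0 DWord length (total length minus 2).
    """
    return {
        "command_type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
        "dword_length": dword & 0xFF,
    }


# IPEHR value reported in the error dump.
fields = decode_3d_header(0x7B000005)

# The length field is stored as (total DWords - 2).
total_dwords = fields["dword_length"] + 2

print(fields)
print("total DWords:", total_dwords)
```

Under these assumptions the header decodes to type 3, subtype 3, opcode 3, sub-opcode 0, and a total length of 7 DWords, which matches the "length 7" the decoder printed next to the 3DPRIMITIVE entry.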
Created attachment 127344 [details] [review]
git-revert of the offending commit (mesa 12.0.3)

Jason, the aub dump is hosted internally at:
http://otc-mesa-ci.jf.intel.com/userContent/Bug_97779.aub

Created attachment 127369 [details]
aub trace of crash: sklgt2
(In reply to regwz from comment #12)
> I ran a bisect with the following result:
>
> 091b6156dd8553979336c15acdaf140e5419c483 is the first bad commit
> commit 091b6156dd8553979336c15acdaf140e5419c483
> Author: Jason Ekstrand <jason.ekstrand@intel.com>
> Date: Tue Dec 8 17:34:38 2015 -0800
>
> i965/fs: Push small uniform arrays
>
> Unfortunately, this also means that we need to use a slightly different
> algorithm for assign_constant_locations. The old algorithm worked based on
> the assumption that each read of a uniform value read exactly one float.
> If it encountered a MOV_INDIRECT, it would immediately bail and push the
> whole thing. Since we can now read ranges using MOV_INDIRECT, we need to
> be able to push a series of floats without breaking them up. To do this,
> we use an algorithm similar to the on in split_virtual_grfs.

This bisect is bad. You were bisecting through the Vulkan merge. Back when the Vulkan driver was still in development the i965 driver in the vulkan branch was very unstable. In order to get a proper bisect, you need to do so while ignoring the vulkan branch. The easiest way to do this is probably to test right before the vulkan branch merged and right after. The vulkan branch merging shouldn't have caused any problems. If those tests are good, bisect between the merge and 12.0. If they're bad, bisect between some older known-good commit and the vulkan merge.

> I also verified that the bug can no longer be reproduced after reverting the
> commit from mesa 12.0.3.

I doubt that given that I can't get that commit to revert cleanly. A similar commit does exist in the main tree and happened shortly prior to merging the vulkan branch. The commit you point to got lost in the merge. In any case, please re-bisect.

(In reply to Jason Ekstrand from comment #17)
> This bisect is bad. You were bisecting through the Vulkan merge. Back when
> the Vulkan driver was still in development the i965 driver in the vulkan
> branch was very unstable. In order to get a proper bisect, you need to do
> so while ignoring the vulkan branch.

Sorry about that and thank you for the explanation. I redid the bisect, here are the results:

963513bb24bdd542f1af3733fab53ad450d3221b is the first bad commit
commit 963513bb24bdd542f1af3733fab53ad450d3221b
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Tue Dec 8 17:34:38 2015 -0800

i965/fs: Push small uniform arrays

Unfortunately, this also means that we need to use a slightly different algorithm for assign_constant_locations. The old algorithm worked based on the assumption that each read of a uniform value read exactly one float. If it encountered a MOV_INDIRECT, it would immediately bail and push the whole thing. Since we can now read ranges using MOV_INDIRECT, we need to be able to push a series of floats without breaking them up. To do this, we use an algorithm similar to the on in split_virtual_grfs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>

> I doubt that given that I can't get that commit to revert cleanly. A
> similar commit does exist in the main tree and happened shortly prior to
> merging the vulkan branch. The commit you point to got lost in the merge.
> In any case, please re-bisect.

Yes, there were merge conflicts, but I resolved them manually (see attachment https://bugs.freedesktop.org/attachment.cgi?id=127344).

Created attachment 127447 [details]
git bisect log
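Editorial note: the re-bisect advice given in this thread (test right before and right after the vulkan merge, then pick the range) boils down to a small decision table. The sketch below is only an illustration of that advice. The function name `plan_bisect` and the string labels are hypothetical, and the "good before, bad after" branch is not covered by the thread and is filled in here as an assumption.

```python
def plan_bisect(before_merge_good, after_merge_good):
    """Return a (known_good, known_bad) pair of endpoints to bisect between.

    Inputs are the results of testing the commit right before the vulkan
    merge and the merge commit itself. Labels are placeholders, not hashes.
    """
    if before_merge_good and after_merge_good:
        # Both sides of the merge work: the regression came in later.
        return ("vulkan merge", "12.0")
    if not before_merge_good:
        # Already broken before the merge: look further back on mainline.
        return ("older known-good commit", "commit before vulkan merge")
    # Assumption (not discussed in the thread): good before, bad after
    # would point at the merge itself.
    return ("commit before vulkan merge", "vulkan merge")


for before, after in [(True, True), (False, False), (True, False)]:
    good, bad = plan_bisect(before, after)
    print(f"before={before} after={after}: bisect good={good!r} bad={bad!r}")
```

Either resulting range stays on one side of the merge, so the bisect never has to step into the unstable vulkan development branch.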
This bug should be fixed by the following commit:

commit 2a4a86862c949055c71637429f6d5f2e725d07d8
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Fri Oct 28 14:48:53 2016 -0700

i965/fs/generator: Don't use the address immediate for MOV_INDIRECT

The address immediate field is only 9 bits and, since the value is in bytes, the highest GRF we can point to with it is g15. This makes it pretty close to useless for MOV_INDIRECT. There were already piles of restrictions preventing us from using it prior to Broadwell, so let's get rid of the gen8+ code path entirely.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97779
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

I've tagged it for stable so it should be in 13.0 and it may even get into a 12.0 stable release.

I can confirm this is fixed, thanks.
Arch Linux (x86-64)
mesa 13.0.0-1

*** Bug 98412 has been marked as a duplicate of this bug. ***
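Editorial note: the g15 limit quoted in the fix's commit message follows from simple arithmetic. The sketch below assumes the 9-bit address immediate is treated as an unsigned byte offset and that each GRF is 32 bytes wide. Both constants and the helper `fits_in_address_immediate` are assumptions made for this illustration, not code from Mesa.

```python
ADDR_IMM_BITS = 9      # width of the address immediate, per the commit message
GRF_SIZE_BYTES = 32    # assumption: each GRF register is 256 bits wide

# Largest byte offset a 9-bit unsigned field can hold.
max_offset = (1 << ADDR_IMM_BITS) - 1

# Highest register whose start address is still encodable.
highest_grf = max_offset // GRF_SIZE_BYTES


def fits_in_address_immediate(grf_number, sub_offset_bytes=0):
    """True if the byte address of grf_number (+ sub-offset) is encodable."""
    return grf_number * GRF_SIZE_BYTES + sub_offset_bytes <= max_offset


print("max byte offset:", max_offset)
print("highest reachable GRF: g%d" % highest_grf)
print("g16 encodable:", fits_in_address_immediate(16))
```

With these assumptions the field tops out at byte 511, which lies inside g15, so any indirect access whose base register is g16 or higher cannot be expressed through the immediate.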