Summary: | [regression] GPU hangs and misrendering in Vulkan programs on GEN8+ SoCs | ||
---|---|---|---|
Product: | Mesa | Reporter: | Eero Tamminen <eero.t.tamminen> |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | major | ||
Priority: | high | CC: | chris |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=96743, https://bugs.freedesktop.org/show_bug.cgi?id=101571 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | 101571 | ||
Bug Blocks: |
Description
Eero Tamminen
2017-05-04 15:35:37 UTC
IndirectDraw demo is also failing on BSW/BXT/GLK.

Occasionally compute (Raytracing & N-Body ones that are actually compute bound) demos will also fail on BXT:

    Fatal : VkResult is "ERROR_DEVICE_LOST" in /home/testrunner/work/SachaWillemsVulkan/base/VulkanTextOverlay.hpp at line 288

(There are too many things GPU hanging on BXT to know whether these are also GPU hangs, but at least on other platforms GPU hangs cause above error message.)

This bisects to the following commit:

    commit 35e626bd0e59e7ce9fd97ccef66b2468c09206a4
    Author: Jason Ekstrand <jason.ekstrand@intel.com>
    Date: Thu Apr 13 16:30:19 2017 -0700

        anv: Set EXEC_OBJECT_ASYNC when available

        Reviewed-by: Chad Versace <chadversary@chromium.org>

I have no idea why it's broken yet. This seems like it should be safe but maybe the kernel doesn't work quite the way I think it does.

Ok, I managed to track down what I think the problem is. When a BO has the EXEC_OBJECT_ASYNC flag set, the kernel is not properly flushing write-combine mappings. I'm not sure if this is a kernel bug or if that behavior is intended.

Write-combined mappings? They are always flushed. Write-back mappings on byt/bsw/bxt are left to you since you opt out of the kernel synchronisation.

(In reply to Eero Tamminen from comment #0)
> Last time Sacha Willems' Multithreading Vulkan demo worked fine was with:
> -----------------------------
> b295a52836 at 2017-04-27 16:52:25 UTC
> clover: Fix build since clang r301442
> -----------------------------
>
> Day later, at:
> -----------------------------
> 85ca563b58 at 2017-04-28 15:54:45 UTC
> anv: Drop 'x11' prefix from non-X11 WSI funcs
> -----------------------------
>
> And after that, it seems to be GPU hanging on all GEN8+ SOC platforms: BSW,
> BXT, GLK...

Btw. the still working demos got much faster during that time frame, and that could have exposed the issue. E.g. compute raytracing perf doubled.
(I don't see what in the Mesa commits between these commits could explain such a perf improvement and I don't yet have bisecting setup for Vulkan.)

(In reply to Jason Ekstrand from comment #3)
> Ok, I managed to track down what I think the problem is. When a BO has the
> EXEC_OBJECT_ASYNC flag set, the kernel is not properly flushing
> write-combine mappings. I'm not sure if this is a kernel bug or if that
> behavior is intended.

(In reply to Chris Wilson from comment #4)
> Write-combined mappings? They are always flushed. Write-back mappings on
> byt/bsw/bxt are left to you since you opt out of the kernel synchronisation.

Hangs are still happening. And rendering in all Vulkan demos has been broken on all SoCs since this too.

Is there a plan to address the issue (in some other way than reverting the breaking commit)?

Jason mentioned that there's a kernel bug related to this: https://bugs.freedesktop.org/show_bug.cgi?id=101571

(My testing is done with a kernel from around the same time as Mesa.)

Didn't see hangs anymore last night, and Chris' patch in bug 101571 fixes the rendering issues.

This was fixed with bug 101571. Vulkan tests now work fine on BXT & BSW. Vulkan compute tests still have issues on BYT, but that's not a kernel issue.
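
For readers unfamiliar with the point made above about write-back mappings being "left to you" once kernel synchronisation is opted out of, here is a minimal sketch of what flushing a CPU-written range by hand can look like on x86. It is not the Mesa or kernel code and is not the fix that landed in bug 101571. The helper name `flush_range`, the `CACHELINE_SIZE` constant, and the 64-byte line size are assumptions made for illustration. On builds without SSE2 the flush instructions are compiled out and the helper only counts cache lines.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence */
#endif

/* Assumption: 64-byte cache lines. */
#define CACHELINE_SIZE 64u

/*
 * Hypothetical helper: flush every cache line that overlaps
 * [start, start + size) and return how many lines were visited.
 */
static size_t flush_range(const void *start, size_t size)
{
    if (size == 0)
        return 0;

    /* Round the start address down to a cache-line boundary. */
    uintptr_t line = (uintptr_t)start & ~(uintptr_t)(CACHELINE_SIZE - 1);
    const uintptr_t end = (uintptr_t)start + size;
    size_t flushed = 0;

#if defined(__SSE2__)
    _mm_mfence();  /* order earlier CPU stores before the flushes */
#endif
    for (; line < end; line += CACHELINE_SIZE) {
#if defined(__SSE2__)
        _mm_clflush((const void *)line);
#endif
        flushed++;
    }
#if defined(__SSE2__)
    _mm_mfence();  /* make sure the flushes complete before continuing */
#endif
    return flushed;
}
```

Under these assumptions, a driver that writes to a write-back mapping would call something like `flush_range(map, size)` after its CPU writes and before submitting work that reads the buffer on the GPU.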