Bug 108820 - [SKL] GPU hangs in benchmarks using compute shaders with drm-tip v4.20-rc kernels
Summary: [SKL] GPU hangs in benchmarks using compute shaders with drm-tip v4.20-rc kernels
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other (OS: All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2018-11-21 11:34 UTC by Eero Tamminen
Modified: 2018-11-30 16:47 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Description Eero Tamminen 2018-11-21 11:34:02 UTC
Setup:
* SKL GT2 / GT3e
* Ubuntu 18.04
* *drm-tip* v4.19 kernel
* Mesa & X git head

Test-case:
* Run a test-case using compute shaders

Expected output:
* No GPU hangs (as with earlier Mesa commits)

Actual output:
* Recoverable GPU hangs in test-cases that use compute shaders:
  - GfxBench Aztec Ruins, CarChase and Manhattan 3.1
  - Sacha Willems' Vulkan compute demos
  - SynMark CSDof / CSCloth
* Vulkan compute demos fail to run (the other tests complete successfully despite the hangs)

This seems to be SKL-specific; it's not visible on other HW.

This regression happened between the following Mesa commits:
* dca35c598d: 2018-11-19 15:57:41: intel/fs,vec4: Fix a compiler warning
* a999798daa: 2018-11-20 17:09:22: meson: Add tests to suites
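With a known-good commit (dca35c598d) and a known-bad one (a999798daa), the remaining range can be narrowed by binary search, which is what `git bisect` automates. A minimal Python sketch of that search, assuming the commit list is in history order and that once the hang appears it persists in later commits; `first_bad_commit`, `flaky_is_bad` and the `run_once` test hook are hypothetical names. Because these hangs are intermittent, the sketch treats a commit as bad if any of several runs hangs.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    property is monotonic (every commit after the first bad one is bad).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


def flaky_is_bad(run_once: Callable[[str], bool], runs: int = 5) -> Callable[[str], bool]:
    """Wrap a hypothetical single-run test (True = GPU hang seen) so that a
    commit counts as bad if any of `runs` attempts hangs."""
    def check(commit: str) -> bool:
        return any(run_once(commit) for _ in range(runs))
    return check
```

This tests roughly log2(n) commits instead of all n. With an intermittent hang, a single clean run says little about a commit, so the repeat count trades bisection time against the risk of marking a bad commit as good.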

It also seems to be specific to the *drm-tip* v4.19.0 kernel, as I don't see it with the latest drm-tip v4.20.0-rc3 kernel. So it's also possible that this is a bug in i915 that just gets triggered by the Mesa change and was fixed later.


Sacha Willems' Vulkan Raytracing demo outputs the following on its first run:
---------------------------------
SPIR-V WARNING:
    In file src/compiler/spirv/vtn_variables.c:1897
    Source and destination types of SpvOpStore do not have the same ID (but are compatible): 225 vs 212
    14920 bytes into the SPIR-V binary
SPIR-V WARNING:
    In file src/compiler/spirv/vtn_variables.c:1897
    Source and destination types of SpvOpStore do not have the same ID (but are compatible): 225 vs 212
    10300 bytes into the SPIR-V binary
SPIR-V WARNING:
    In file src/compiler/spirv/vtn_variables.c:1897
    Source and destination types of SpvOpStore do not have the same ID (but are compatible): 269 vs 256
    10944 bytes into the SPIR-V binary
SPIR-V WARNING:
    In file src/compiler/spirv/vtn_variables.c:1897
    Source and destination types of SpvOpStore do not have the same ID (but are compatible): 225 vs 212
    11920 bytes into the SPIR-V binary
INTEL-MESA: error: src/intel/vulkan/anv_device.c:2091: GPU hung on one of our command buffers (VK_ERROR_DEVICE_LOST)
vulkan_raytracing: base/vulkanexamplebase.cpp:651: void VulkanExampleBase::submitFrame(): Assertion `res == VK_SUCCESS' failed.
-----------------------------

(Other runs show just the error and assert.)
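The first run and later runs differ only in whether the SPIR-V warnings appear before the device-lost error, so captured output can be triaged mechanically. A minimal sketch, assuming the demo's output is saved as text; `summarize_run` and `RunSummary` are hypothetical names, and the patterns are copied from the messages quoted above. Warnings and the device-lost error are recorded separately so they can be tracked independently.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Patterns copied from the messages quoted in this report.
WARNING_RE = re.compile(
    r"Source and destination types of SpvOpStore do not have the same ID "
    r"\(but are compatible\): (\d+) vs (\d+)"
)
OFFSET_RE = re.compile(r"(\d+) bytes into the SPIR-V binary")
DEVICE_LOST = "VK_ERROR_DEVICE_LOST"


@dataclass
class RunSummary:
    store_mismatches: List[Tuple[int, int]] = field(default_factory=list)  # (source ID, dest ID)
    offsets: List[int] = field(default_factory=list)  # byte offsets into the SPIR-V binary
    device_lost: bool = False  # True if the anv device-lost error was printed


def summarize_run(log: str) -> RunSummary:
    """Collect SpvOpStore type-ID warnings and the device-lost error from one run."""
    summary = RunSummary()
    for line in log.splitlines():
        if m := WARNING_RE.search(line):
            summary.store_mismatches.append((int(m.group(1)), int(m.group(2))))
        elif m := OFFSET_RE.search(line):
            summary.offsets.append(int(m.group(1)))
        elif DEVICE_LOST in line:
            summary.device_lost = True
    return summary
```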
Comment 1 Mark Janes 2018-11-21 18:56:22 UTC
Since this bug is limited to a drm-tip kernel, it seems likely that the problem is in the kernel, not in mesa.  Can you reproduce it on any released kernel?
Comment 2 Eero Tamminen 2018-11-28 14:41:51 UTC
(In reply to Eero Tamminen from comment #0)
> It also seems to be specific to *drm-tip* v4.19.0 kernel as I don't see it
> with latest drm-tip v4.20.0-rc3 kernel.  So it's also possible that it's a
> bug in i915, that just gets triggered by Mesa change, and which got fixed
> later.

I've now seen hangs also with drm-tip v4.20.0-rc3 kernel.


However, these GPU hangs no longer happen with this or later Mesa commits
(regardless of whether running v4.19 or v4.20-rc4 drm-tip kernels):
3c96a1e3a97ba 2018-11-26 08:29:39: radv: Fix opaque metadata descriptor last layer

-> FIXED?

(I'm lacking data for several previous days, so I can't give an exact time when those hangs stopped.)

The Raytracing demo SPIR-V warnings still happen, even though I updated Sacha Willems' demos to the latest Git version.
Comment 3 Eero Tamminen 2018-11-30 16:47:18 UTC
Sorry, all the hangs have happened with drm-tip v4.20-rc versions, not v4.19.

Last night there were again recoverable hangs on SKL, with drm-tip v4.20-rc4:
* GfxBench v5-GOLD2 Aztec Ruins GL & Vulkan ("normal") versions
* Unigine Heaven v4.0
* SynMark v7 CSCloth

Heaven doesn't use compute shaders, so maybe the issue isn't compute-related after all.

