|Summary:||GPU hang when running heavy compute workload|
|Product:||Mesa||Reporter:||Wu Zhiwen <zhiwen.wu>|
|Component:||Drivers/Vulkan/intel||Assignee:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
|Status:||RESOLVED NOTOURBUG||QA Contact:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
|i915 platform:||i915 features:|
Description Wu Zhiwen 2018-12-05 07:24:33 UTC
I wrote a compute shader that implements a convolution algorithm and ran it on an Intel Apollo Lake GPU through the Vulkan API. When the convolution is a heavy workload, a GPU hang occurs.

==== Test environment:
Ubuntu 16.04
Mesa 18.3
Vulkan SDK: 126.96.36.199
CPU: Intel Celeron J3455
GPU: HD Graphics 500 (Apollo Lake, 12 EU)

==== Steps to reproduce:
git clone https://github.com/wzw-intel/vulkan_minimal_compute.git
cd vulkan_minimal_compute
mkdir build
cd build
cmake ..
make
cd ../
./build/vulkan_minimal_compute

==== What the test program does:
The program runs a convolution shader 10 times serially. Each run is synchronized with a dedicated VkFence object. A GPU hang may occur at any iteration, in which case the program prints the log "INTEL-MESA: error: vulkan/anv_device.c:2091: GPU hang on one of our command buffers (VK_ERROR_DEVICE_LOST)". Not every run of the program triggers the GPU hang. If it does not hang, run it again.

==== Other findings:
- Setting the "LIGHT_WORKLOAD=1" environment variable (which reduces the total GFLOPS by 50%) makes the GPU hang disappear. It seems that the GPU hang only occurs for heavy workloads.
- No GPU hang on a high-end Intel GPU. I tested this program on an i7-6770HQ (GPU: Iris Pro Graphics 580, GT4e, 72 EU) and saw no GPU hang. But on the Intel Celeron J3455 (GPU: HD Graphics 500) and on an Intel SoC with an HD Graphics 530 GPU, the GPU hang occurs.
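Since the hang is intermittent, re-running the program by hand gets tedious. Below is a minimal sketch of a retry helper, not part of the linked repository: it runs a command repeatedly and reports the first run whose output contains the VK_ERROR_DEVICE_LOST text from the log above. The function name run_until_hang and the max_runs default are made up for illustration, and it assumes the error text shows up on stdout or stderr.

```python
import subprocess

# Marker taken from the log line quoted in the description.
HANG_MARKER = "VK_ERROR_DEVICE_LOST"


def run_until_hang(cmd, max_runs=20, cwd=None):
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run whose combined
    stdout/stderr contains HANG_MARKER, or None if no run does.
    (Hypothetical helper; the name and defaults are assumptions.)
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so the marker is seen either way
            text=True,
        )
        if HANG_MARKER in result.stdout:
            return attempt
    return None
```

Called from the repository root as run_until_hang(["./build/vulkan_minimal_compute"]), it would return the run number at which the hang first showed up, or None if all runs completed cleanly.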
Comment 2 Lionel Landwerlin 2018-12-05 12:38:19 UTC
My guess is that your shader is taking too long to complete, so the i915 driver declares that your workload has hung the GPU even though it is still in progress. You can recompile your kernel with an adjusted value for DRM_I915_HANGCHECK_PERIOD, or disable the hangcheck by passing the i915.enable_hangcheck=0 parameter on the kernel command line. If that solves your problem, I'll reassign the issue to i915.
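To confirm which setting is actually in effect after rebooting, something like the sketch below may help. It assumes the usual Linux convention that module parameters are exposed under /sys/module/<module>/parameters/ and that the kernel command line is readable from /proc/cmdline; whether enable_hangcheck is exposed there depends on the kernel version, so the helper returns None when the file is missing. Both function names are made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the parameter; may not exist on every kernel.
DEFAULT_PARAM_PATH = "/sys/module/i915/parameters/enable_hangcheck"


def read_hangcheck(param_path=DEFAULT_PARAM_PATH):
    """Return the raw parameter value as a stripped string,
    or None if the file is missing or unreadable."""
    try:
        return Path(param_path).read_text().strip()
    except OSError:
        return None


def cmdline_disables_hangcheck(cmdline):
    """Return True if the given kernel command line string contains
    the exact option i915.enable_hangcheck=0."""
    return "i915.enable_hangcheck=0" in cmdline.split()
```

For example, cmdline_disables_hangcheck(Path("/proc/cmdline").read_text()) checks the running kernel's command line, and read_hangcheck() shows the value the driver reports, if it is exposed.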
Comment 3 Wu Zhiwen 2018-12-06 02:37:21 UTC
Problem solved. I tried the "i915.enable_hangcheck=0" kernel option, and the GPU hang no longer occurs. Thank you.
Comment 4 Jason Ekstrand 2018-12-06 03:42:12 UTC
In that case, I'm closing this bug. The compute shader is just taking too long to run and triggering the kernel watchdog timer.