Created attachment 138701
When running HEVC FEI encode media workloads, we see random system stalls.
We observe the issue specifically on this hardware:
* Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0c 10/06/2017
* video card device ID 0x191d
We do not see the bug on other SKL systems in our possession.
The issue is reproducible on CentOS 7 (7.4) with 4.14 and drm-tip (commit 29940f138482ff38047287ad288cea1fcf1f73b4) kernels.
The issue is a regression. We bisected it to the following patch:
From bedc054eb8a2751aa22797c4a21a34029b3e5475 Mon Sep 17 00:00:00 2001
From: Imre Deak <firstname.lastname@example.org>
Date: Wed, 18 Nov 2015 19:53:50 +0200
Subject: [PATCH 08/10] drm/i915/skl: re-enable power well support
Now that the known DMC/DC issues are fixed, let's try again and
re-enable the power well support.
Please note that in order to boot on drm-tip we have to use additional parameters:
intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1
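When reproducing, it can help to confirm that the running kernel was actually booted with these parameters. The following is only a minimal sketch: the helper names `missing_params` and `read_cmdline` are hypothetical, and it assumes a Linux system that exposes the boot command line at /proc/cmdline.

```python
# Hypothetical helper, not part of any tool mentioned in this report.
# The parameter list is copied from the workaround line above.
REQUIRED_PARAMS = (
    "intel_pstate=disable",
    "i915.enable_rc6=0",
    "intel_idle.max_cstate=1",
)


def missing_params(cmdline, required=REQUIRED_PARAMS):
    """Return the required parameters that do not appear in cmdline."""
    # Kernel command-line parameters are separated by whitespace.
    present = set(cmdline.split())
    return [p for p in required if p not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling `missing_params(read_cmdline())` on the test machine should return an empty list if all three parameters are in effect.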
The Supermicro systems boot, but i915 produces some warnings; see https://bugs.freedesktop.org/show_bug.cgi?id=105949 for details.
1) Build this stack: https://software.intel.com/en-us/articles/build-and-debug-open-source-media-stack
The system stalls within 1-10 minutes.
Thanks for the detailed bug report. I am assigning Imre to comment on the issue.
Imre, have you checked out this bug report?
Dmitry, is this something you can reproduce on multiple Supermicro SKL boards? Can you reproduce this issue on any other platform?
(In reply to Martin Peres from comment #2)
> Imre, have you checked out this bug report?
Yes, this problem happens on the same system as the problem in bug#105949. My guess is that this one has the same root cause, namely an unreliable CDCLK. As such, we should first fix the problem in the other bug and re-check this one afterwards.
Update for myself:
Functional DMC issues are related to display.