Bug 105955

Summary: System stall on Supermicro SKL boards on HEVC FEI workloads
Product: DRI
Reporter: Dmitry Ermilov <dmitry.ermilov>
Component: DRM/Intel
Assignee: Imre Deak <imre.deak>
Status: NEW
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: dmitry.ermilov, dmitry.v.rogozhkin, imre.deak, intel-gfx-bugs
Version: unspecified
Hardware: Other
OS: Linux (All)
Whiteboard: ReadyForDev
i915 platform: SKL
i915 features: display/Other, firmware/dmc
Bug Depends on: 105949
Bug Blocks:
Attachment: reproducer (flags: none)

Description Dmitry Ermilov 2018-04-09 12:24:45 UTC
Created attachment 138701

When running HEVC FEI encode media workloads, we see random system stalls.

We observe the issue only on this specific hardware:
* Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0c 10/06/2017 
* video card device ID 0x191d
We do not see the bug on other SKL systems in our possession.

The issue is reproducible on CentOS 7 (7.4) with 4.14 and drm-tip (commit 29940f138482ff38047287ad288cea1fcf1f73b4) kernels.

The issue is a regression. We bisected it to the following patch:
From bedc054eb8a2751aa22797c4a21a34029b3e5475 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak@intel.com>
Date: Wed, 18 Nov 2015 19:53:50 +0200
Subject: [PATCH 08/10] drm/i915/skl: re-enable power well support
Now that the known DMC/DC issues are fixed, let's try again and
re-enable the power well support. 

Please note that in order to boot drm-tip we have to add the following kernel parameters:
intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1
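
For anyone reproducing this, a minimal sketch of how one might confirm that the three parameters above actually reached the kernel. The helper name missing_params is hypothetical and not part of any tool; it only compares a command line string against the parameters listed in this report.

```shell
# Hypothetical helper (an illustration, not part of the reproducer):
# print each workaround parameter from this report that is absent
# from the given kernel command line string.
missing_params() {
    cmdline="$1"
    for p in intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1; do
        # Pad with spaces so only whole, space-separated tokens match.
        case " $cmdline " in
            *" $p "*) ;;                 # parameter present, nothing to report
            *) printf '%s\n' "$p" ;;     # parameter missing
        esac
    done
}

# Usage on a running system (reads the command line the kernel booted with):
# missing_params "$(cat /proc/cmdline)"
```

No output means all three parameters are present in the booted command line.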

The Supermicro systems boot, but i915 produces some warnings, which are described in more detail in https://bugs.freedesktop.org/show_bug.cgi?id=105949.

1) Build this stack: https://software.intel.com/en-us/articles/build-and-debug-open-source-media-stack
2) Run:
The system stalls within 1-10 minutes.
Comment 1 Martin Peres 2018-04-10 10:18:11 UTC
Thanks for the detailed bug report. I am assigning Imre to comment on the issue.
Comment 2 Martin Peres 2018-07-18 10:39:00 UTC
Imre, have you checked out this bug report?
Comment 3 Martin Peres 2018-07-18 10:47:33 UTC
Dmitry, is this something you can reproduce on multiple Supermicro SKL boards? Can you reproduce this issue on any other platform?
Comment 4 Imre Deak 2018-07-18 12:03:03 UTC
(In reply to Martin Peres from comment #2)
> Imre, have you checked out this bug report?

Yes, this problem happens on the same system as the problem in bug #105949. My guess is that this one has the same root cause, namely an unreliable CDCLK. As such, we should first fix the problem in the other bug and re-check this one afterwards.
Comment 5 Lakshmi 2019-02-07 08:11:56 UTC
Update for myself: 
Functional DMC issues are related to display.
Comment 6 Lakshmi 2019-06-04 09:46:02 UTC
Any update on this bug, Imre?
