Bug 105955 - System stall on Supermicro SKL boards on HEVC FEI workloads
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: high major
Assignee: Imre Deak
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Depends on: 105949
Reported: 2018-04-09 12:24 UTC by Dmitry Ermilov
Modified: 2019-11-29 17:44 UTC
i915 platform: SKL
i915 features: display/Other, firmware/dmc

reproducer (536 bytes, text/x-sh)
2018-04-09 12:24 UTC, Dmitry Ermilov

Description Dmitry Ermilov 2018-04-09 12:24:45 UTC
Created attachment 138701

When running HEVC FEI encode media workloads, we see random system stalls.

We observe the issue specifically on this hardware:
* Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0c 10/06/2017 
* video card device ID 0x191d
We do not see the bug on other SKL systems in our possession.
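
To check whether another machine carries the same GPU, something like the following could be used. It is only a sketch: the helper name `find_gpu` is made up for illustration, the vendor ID 0x8086 (Intel) is an assumption added here since the report only gives the device ID, and the sysfs path is the usual Linux location for PCI devices.

```python
import os


def find_gpu(device_id="0x191d", vendor_id="0x8086",
             sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses whose vendor/device IDs match (hypothetical helper).

    device_id comes from the report; vendor_id (Intel) is an assumption.
    """
    matches = []
    if not os.path.isdir(sysfs_root):
        return matches
    for name in sorted(os.listdir(sysfs_root)):
        try:
            # Each PCI device directory exposes its IDs as small text files.
            with open(os.path.join(sysfs_root, name, "vendor")) as f:
                vendor = f.read().strip().lower()
            with open(os.path.join(sysfs_root, name, "device")) as f:
                device = f.read().strip().lower()
        except OSError:
            continue
        if vendor == vendor_id and device == device_id:
            matches.append(name)
    return matches


if __name__ == "__main__":
    found = find_gpu()
    print("matching devices:", found if found else "none")
```

An empty result only means that no PCI device with these IDs is visible; it says nothing about whether the board itself is affected.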

The issue is reproducible on CentOS 7 (7.4) with 4.14 and drm-tip (commit 29940f138482ff38047287ad288cea1fcf1f73b4) kernels.

The issue is a regression. We bisected it to the following patch:
From bedc054eb8a2751aa22797c4a21a34029b3e5475 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak@intel.com>
Date: Wed, 18 Nov 2015 19:53:50 +0200
Subject: [PATCH 08/10] drm/i915/skl: re-enable power well support
Now that the known DMC/DC issues are fixed, let's try again and
re-enable the power well support. 

Please note that in order to boot on drm-tip we have to use additional parameters:
intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1
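
Before starting a run it may be worth confirming that the kernel was actually booted with these parameters. A minimal sketch, assuming a Linux system that exposes the boot command line in /proc/cmdline; the helper name `missing_params` is hypothetical and not part of any tool mentioned in this report:

```python
# Parameters quoted in the report as needed to boot drm-tip on this board.
REQUIRED = (
    "intel_pstate=disable",
    "i915.enable_rc6=0",
    "intel_idle.max_cstate=1",
)


def missing_params(cmdline, required=REQUIRED):
    """Return the required parameters absent from a kernel command line."""
    present = set(cmdline.split())
    return [p for p in required if p not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable: treat as an empty command line.
        cmdline = ""
    missing = missing_params(cmdline)
    if missing:
        print("missing kernel parameters:", " ".join(missing))
    else:
        print("all required kernel parameters are present")
```

The check is an exact token match, so a parameter given with a different value (for example intel_idle.max_cstate=0) is reported as missing.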

The Supermicro systems boot but i915 produces some warnings. https://bugs.freedesktop.org/show_bug.cgi?id=105949 describes it better. 

1) Build this stack: https://software.intel.com/en-us/articles/build-and-debug-open-source-media-stack
2) Run:
The system stalls within 1-10 minutes.
Comment 1 Martin Peres 2018-04-10 10:18:11 UTC
Thanks for the detailed bug report. I am assigning Imre to comment on the issue.
Comment 2 Martin Peres 2018-07-18 10:39:00 UTC
Imre, have you checked out this bug report?
Comment 3 Martin Peres 2018-07-18 10:47:33 UTC
Dmitry, is this something you can reproduce on multiple Supermicro SKL boards? Can you reproduce this issue on any other platform?
Comment 4 Imre Deak 2018-07-18 12:03:03 UTC
(In reply to Martin Peres from comment #2)
> Imre, have you checked out this bug report?

Yes, this problem happens on the same system as the problem in bug #105949. My guess is that this one has the same root cause, namely an unreliable CDCLK. As such, we should first fix the problem in the other bug and re-check this one afterwards.
Comment 5 Lakshmi 2019-02-07 08:11:56 UTC
Update for myself: 
Functional DMC issues are related to display.
Comment 6 Lakshmi 2019-06-04 09:46:02 UTC
Any update on this bug Imre?
Comment 7 Lakshmi 2019-08-27 09:13:48 UTC
Imre, any updates here?
Comment 8 Jani Saarinen 2019-11-14 09:20:17 UTC
Dmitry, very sorry, but no effort has been put into this. Is this still an issue?
Comment 9 Dmitry Ermilov 2019-11-14 10:50:45 UTC
Jani, this issue was found in our internal validation of MediaServerStudio 2018 R2. The issue still applies to that release.
But we're not going to release new versions of MSS anymore (since MediaSDK moved to GitHub), only hotfixes. I haven't heard of any external complaints about this issue, so I now assume the exposure is low.
Comment 10 Martin Peres 2019-11-29 17:44:25 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/99.
