Bug 105955 - System stall on Supermicro SKL boards on HEVC FEI workloads
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: high major
Assignee: Imre Deak
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on: 105949
Blocks:
Reported: 2018-04-09 12:24 UTC by Dmitry Ermilov
Modified: 2019-08-27 09:13 UTC

See Also:
i915 platform: SKL
i915 features: display/Other, firmware/dmc


Attachments
reproducer (536 bytes, text/x-sh)
2018-04-09 12:24 UTC, Dmitry Ermilov

Description Dmitry Ermilov 2018-04-09 12:24:45 UTC
Created attachment 138701
reproducer

When running HEVC FEI encode media workloads, we see the system stall at random.

We observe the issue only on this specific hardware:
* Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0c 10/06/2017
* video card device ID 0x191d
We do not see the bug on other SKL systems in our possession.

The issue is reproducible on CentOS 7 (7.4) with 4.14 and drm-tip (commit 29940f138482ff38047287ad288cea1fcf1f73b4) kernels.

The issue is a regression. We bisected it to the following patch:
https://patchwork.freedesktop.org/patch/65235/
From bedc054eb8a2751aa22797c4a21a34029b3e5475 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak@intel.com>
Date: Wed, 18 Nov 2015 19:53:50 +0200
Subject: [PATCH 08/10] drm/i915/skl: re-enable power well support

Now that the known DMC/DC issues are fixed, let's try again and
re-enable the power well support.

Please note that in order to boot drm-tip on this system we have to pass these additional kernel parameters:
intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1
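
Before running the reproducer, it may help to confirm that the running kernel actually booted with the three parameters above. The following is only a minimal sketch: `has_param` is a hypothetical helper name introduced here for illustration, and the script assumes the kernel command line is readable from `/proc/cmdline`.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): succeed if parameter $1
# appears as a whole word in the kernel command line given as $2.
# When $2 is omitted, the running kernel's /proc/cmdline is used.
has_param() {
    cmdline=${2:-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which of the workaround parameters from this report are active.
for p in intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1; do
    if has_param "$p"; then
        echo "ok: $p"
    else
        echo "missing: $p"
    fi
done
```

On CentOS 7 the parameters can typically be added persistently with `grubby --update-kernel=ALL --args="intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1"` followed by a reboot. That is an assumption about the boot setup, not something the original report states.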

The Supermicro systems boot, but i915 produces some warnings; see https://bugs.freedesktop.org/show_bug.cgi?id=105949 for details.

Reproducer:
1) Build this stack: https://software.intel.com/en-us/articles/build-and-debug-open-source-media-stack
2) Run:
./repr.sh
The system stalls within 1-10 minutes.
Comment 1 Martin Peres 2018-04-10 10:18:11 UTC
Thanks for the detailed bug report. I am assigning Imre to comment on the issue.
Comment 2 Martin Peres 2018-07-18 10:39:00 UTC
Imre, have you checked out this bug report?
Comment 3 Martin Peres 2018-07-18 10:47:33 UTC
Dmitry, is this something you can reproduce on multiple Supermicro SKL boards? Can you reproduce this issue on any other platform?
Comment 4 Imre Deak 2018-07-18 12:03:03 UTC
(In reply to Martin Peres from comment #2)
> Imre, have you checked out this bug report?

Yes, this problem happens on the same system as the problem in
bug #105949. My guess is that this one has the same root cause, namely an unreliable CDCLK. As such, we should first fix the problem in the other bug and re-check this one afterwards.
Comment 5 Lakshmi 2019-02-07 08:11:56 UTC
Update for myself: 
Functional DMC issues are related to display.
Comment 6 Lakshmi 2019-06-04 09:46:02 UTC
Any update on this bug, Imre?
Comment 7 Lakshmi 2019-08-27 09:13:48 UTC
Imre, any updates here?

