Bug 105955 - System stall on Supermicro SKL boards on HEVC FEI workloads
Summary: System stall on Supermicro SKL boards on HEVC FEI workloads
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: high major
Assignee: Imre Deak
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on: 105949
Blocks:
 
Reported: 2018-04-09 12:24 UTC by Dmitry Ermilov
Modified: 2019-06-04 09:46 UTC (History)
CC List: 4 users

See Also:
i915 platform: SKL
i915 features: display/Other, firmware/dmc


Attachments
reproducer (536 bytes, text/x-sh)
2018-04-09 12:24 UTC, Dmitry Ermilov

Description Dmitry Ermilov 2018-04-09 12:24:45 UTC
Created attachment 138701
reproducer

When running HEVC FEI encode media workloads, we see random system stalls.

We observe the issue specifically on this hardware:
* Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0c 10/06/2017 
* video card device ID 0x191d
We do not see the bug on other SKL systems in our possession.
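
To check whether another machine carries the same GPU, the PCI device ID can be read from sysfs. The following is a minimal sketch, assuming the usual Linux layout in which each DRM card exposes its ID at `class/drm/cardN/device/device`; the function name `find_cards_by_id` is hypothetical and not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Print every DRM card directory under a sysfs root whose PCI device ID
# matches the wanted value (for example 0x191d from this report).
# $1: sysfs root (normally /sys), $2: device ID as sysfs prints it.
find_cards_by_id() {
    root=$1
    want=$2
    for f in "$root"/class/drm/card*/device/device; do
        # Skip unmatched globs and connector entries, where this path
        # resolves to a directory rather than the ID attribute file.
        [ -f "$f" ] || continue
        read -r id < "$f"
        if [ "$id" = "$want" ]; then
            printf '%s\n' "${f%/device/device}"
        fi
    done
    return 0
}

# Look for the device ID named in this report on the running system.
find_cards_by_id /sys 0x191d
```

An empty result only means that no card with this ID was found; it says nothing about whether the board or BIOS matches the affected configuration.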

The issue is reproducible on CentOS 7 (7.4) with 4.14 and drm-tip (commit 29940f138482ff38047287ad288cea1fcf1f73b4) kernels.

The issue is a regression. We bisected it to the following patch:
https://patchwork.freedesktop.org/patch/65235/
From bedc054eb8a2751aa22797c4a21a34029b3e5475 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak@intel.com>
Date: Wed, 18 Nov 2015 19:53:50 +0200
Subject: [PATCH 08/10] drm/i915/skl: re-enable power well support
Now that the known DMC/DC issues are fixed, let's try again and
re-enable the power well support. 

Please note that in order to boot on drm-tip we have to use additional parameters:
intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1
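
When reproducing, it can help to confirm that the running kernel was actually booted with these parameters. The sketch below compares a kernel command line against the three parameters quoted above; the function name `missing_params` and the `REQUIRED` variable are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# Print each required kernel parameter that is absent from a command line.
# $1: kernel command line string; remaining arguments: required parameters.
missing_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;               # present as a whole word
            *) printf '%s\n' "$param" ;;   # absent, report it
        esac
    done
}

# Parameters quoted in this report for booting drm-tip.
REQUIRED="intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1"

# /proc/cmdline exists on Linux only; skip the check elsewhere.
if [ -r /proc/cmdline ]; then
    # REQUIRED is left unquoted on purpose so it splits into three arguments.
    missing_params "$(cat /proc/cmdline)" $REQUIRED
fi
```

No output means all three parameters are present. On CentOS 7 they are usually added persistently with something like `grubby --update-kernel=ALL --args="..."` followed by a reboot, though the exact procedure depends on how the drm-tip kernel was installed.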

The Supermicro systems boot, but i915 produces some warnings; see https://bugs.freedesktop.org/show_bug.cgi?id=105949 for details.

Reproducer:
1) Build this stack: https://software.intel.com/en-us/articles/build-and-debug-open-source-media-stack
2) Run:
./repr.sh
The system stalls within 1-10 minutes.
Comment 1 Martin Peres 2018-04-10 10:18:11 UTC
Thanks for the detailed bug report. I am assigning Imre to comment on the issue.
Comment 2 Martin Peres 2018-07-18 10:39:00 UTC
Imre, have you checked out this bug report?
Comment 3 Martin Peres 2018-07-18 10:47:33 UTC
Dmitry, is this something you can reproduce on multiple Supermicro SKL boards? Can you reproduce this issue on any other platform?
Comment 4 Imre Deak 2018-07-18 12:03:03 UTC
(In reply to Martin Peres from comment #2)
> Imre, have you checked out this bug report?

Yes, this problem happens on the same system as the problem in
bug #105949. My guess is that this one has the same root cause, namely an unreliable CDCLK. As such, we should first fix the problem in the other bug and re-check this one afterwards.
Comment 5 Lakshmi 2019-02-07 08:11:56 UTC
Update for myself: 
Functional DMC issues are related to display.
Comment 6 Lakshmi 2019-06-04 09:46:02 UTC
Any update on this bug, Imre?

