Bug 97978

Summary: [SKL / KBL] guc submission is causing IGT tests timeout (igt/gem_ringfill)
Product: DRI
Reporter: Elio <elio.martinez.monroy>
Component: DRM/Intel
Assignee: Elio <elio.martinez.monroy>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: high
CC: anusha.srivatsa, carlos.santa, intel-gfx-bugs, jeff.mcgee, michal.winiarski, michel.thierry, rodrigo.vivi
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: KBL, SKL
i915 features: firmware/guc
Attachments:
Description                                                  Flags
Dmesg with load and submission value 2                       none
Dmesg with guc loaded and submission=0                       none
Dmesg without guc                                            none
command line to execute basic IGT                            none
SKL_IGT-basic-resulsts_with-and-without-guc-submission.xls   none
attachment-4054-0.html                                       none
timeout log                                                  none
guc log                                                      none
dmesg                                                        none

Description Elio 2016-09-29 17:51:00 UTC
Created attachment 126881 [details]
Dmesg with load and submission value 2

I'm facing an issue running basic IGT tests with GuC 6.1 enabled. My configuration is below.
GuC 6.1: https://01.org/linuxgraphics/downloads/skylake-guc-6.1
Kernel: 4.8.0-rc7_3c9e639/
Commit git_commit=3c9e639197bb52280334830e611082d5b6bfaceb
Intel Gpu Tools : intel-gpu-tools-1.16-36-gd16318a
Hardware: Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
Behavior:

In a suite of 244 tests, once I get the first timeout, the tests that follow show the same behavior: each takes more than 10 minutes to execute, which triggers a timeout signal and fails the test. The behavior is completely different if I disable GuC by deleting the values from grub (i915.enable_guc_loading=2 i915.enable_guc_submission=2).

Steps to reproduce:

Enable guc using grub parameters after installing firmware (i915.enable_guc_loading=2 i915.enable_guc_submission=2).

Execute IGT Basic
<command line attached>
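
Whether a booted kernel actually received these parameters can be checked from its command line. The following is a minimal sketch, not part of the original report: the helper names `i915_params` and `matches_repro_config` are hypothetical, and it assumes the options were passed on the kernel command line (readable from /proc/cmdline) rather than through modprobe configuration.

```python
def i915_params(cmdline):
    """Return the i915.* options found in a kernel command line as a dict."""
    params = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            key, value = token[len("i915."):].split("=", 1)
            params[key] = value
    return params


def matches_repro_config(cmdline):
    """True if both GuC options are set to 2, as in the steps above."""
    params = i915_params(cmdline)
    return (params.get("enable_guc_loading") == "2"
            and params.get("enable_guc_submission") == "2")


# Hypothetical usage on the machine under test:
#     with open("/proc/cmdline") as f:
#         print(matches_repro_config(f.read()))
```

The check compares only against the values used in this report. It does not interpret what other values of the two options mean.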


Example of time out:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4238 root      20   0   57984   2276   1888 R 100.0  0.0   7:31.08 kms_addfb_basic
    1 root      20   0  119880   5996   3944 S   0.0  0.0   0:02.84 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0   0:01.11 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh



Values so far with Guc 6.1 enabled:

Launch IGT basic tests with GuC loaded		pass 220 skip 14 timeout 7 incomplete 1 fail 1 dmesg-warn 1


Values without GUC

Launch IGT basic tests with GuC not loaded		pass 229 skip 13 dmesg-warn 1 fail 1


Attaching dmesg during igt execution with timeouts.

This doesn't happen with i915.enable_guc_submission=0 in the grub parameters, even though the firmware is still loaded with i915.enable_guc_loading=2.
Comment 1 Elio 2016-09-29 17:52:07 UTC
Created attachment 126882 [details]
Dmesg with guc loaded and submission=0
Comment 2 Elio 2016-09-29 17:52:41 UTC
Created attachment 126883 [details]
Dmesg without guc
Comment 3 Elio 2016-09-29 17:53:15 UTC
Created attachment 126884 [details]
command line to execute basic IGT
Comment 4 cprigent 2016-09-30 09:33:05 UTC
Created attachment 126891 [details]
SKL_IGT-basic-resulsts_with-and-without-guc-submission.xls

I'm also reproducing it while executing IGT Basic tests on SKL:

With kernel boot command lines:  i915.enable_guc_loading=2 i915.enable_guc_submission=2
Timeout: 96
Pass: 118
Dmesg-warn: 8
Skip: 22
Duration: 58317.4993

With kernel boot command lines:  i915.enable_guc_loading=2 i915.enable_guc_submission=0
Timeout: 0
Pass: 172
Dmesg-warn: 16
Skip: 56
Duration: 556.0889204

Platform SKL Gigabyte
CPU: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz (family 6, model 94, stepping 3)
GPU: Intel® HD Graphics 530 - Intel Corporation Sky Lake Integrated Graphics (rev 06)
Motherboard version: H170N-WIFI-CF
Memory: 2x 4GB Kingston 9905622-055.A00G

Software
Bios: F3
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.8.0-rc8 aab15c2 from http://cgit.freedesktop.org/drm-intel/
   commit aab15c274da587bcab19376d2caa9d6626440335
   Author: Jani Nikula <jani.nikula@intel.com>
   Date:   Mon Sep 26 15:11:53 2016 +0300
   drm-intel-nightly: 2016y-09m-26d-12h-11m-33s UTC integration manifest
libdrm-2.4.70-14 0659558 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-12.0.0 8b06176 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.901-14 ba199cb from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-708 8f33f80 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-38 3b7e499 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.2-101 302cf63 from git://git.freedesktop.org/git/vaapi/intel-driver
IGT: intel-gpu-tools-1.16-30 32b2021 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git
DMC 1.26 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/skldmcver126.tar_1.bz2
GUC 6.1 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/sklgucver61.tar.bz2
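
The two durations above give a rough consistency check on the "more than 10 minutes per test" observation in the description. This is a small sketch of the arithmetic, assuming the Duration values are in seconds (the comment does not state the unit) and that the extra time is spent entirely in the tests that timed out. The function names are hypothetical.

```python
# Figures copied from the two runs above (unit assumed to be seconds).
DURATION_SUBMISSION_2 = 58317.4993   # i915.enable_guc_submission=2
DURATION_SUBMISSION_0 = 556.0889204  # i915.enable_guc_submission=0
TIMEOUTS = 96                        # timeouts seen with submission=2


def slowdown(slow, fast):
    """How many times longer the slow run took than the fast one."""
    return slow / fast


def extra_seconds_per_timeout(slow, fast, timeouts):
    """Extra wall time attributed to each timed-out test."""
    return (slow - fast) / timeouts


print(round(slowdown(DURATION_SUBMISSION_2, DURATION_SUBMISSION_0)))  # 105
print(round(extra_seconds_per_timeout(
    DURATION_SUBMISSION_2, DURATION_SUBMISSION_0, TIMEOUTS)))         # 602
```

Roughly 600 extra seconds per timeout is consistent with a per-test limit of about 10 minutes. That reading depends on the assumptions stated above, and the pass and skip counts of the two runs also differ.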
Comment 5 cprigent 2016-09-30 09:36:17 UTC
Updated to highest blocker. It is causing our DUTs to be very slow when executing full IGT.
Comment 6 cprigent 2016-10-03 07:14:30 UTC
Yann found this is due to test gem_ringfill. It is not reproduced when blacklisting it.
Comment 7 Jari Tahvanainen 2016-10-05 08:40:16 UTC
Reduced from Highest to High since failure is not visible with default config/settings.
Comment 8 Michel Thierry 2016-10-05 10:05:24 UTC
If you have time, can you run this experiment?

1. grub parameters:
i915.enable_guc_loading=2 i915.enable_guc_submission=2 i915.guc_log_level=2 drm.debug=0xe

2. run these tests (stop as soon as one of them times out):
gem_ringfill --run-subtest basic-default
gem_ringfill --run-subtest basic-default-interruptible
gem_ringfill --run-subtest basic-default-hang
gem_ringfill --run-subtest basic-default-forked

3. get these logs:
dmesg
cat /sys/kernel/debug/dri/0/i915_guc_log_dump > ~/guc_log_dump.log
cat /sys/kernel/debug/dri/0/i915_guc_info > ~/timeout_info.log
cat /sys/kernel/debug/dri/0/i915_gem_seqno >> ~/timeout_info.log
cat /sys/kernel/debug/dri/0/i915_gem_request >> ~/timeout_info.log

And attach those 3 files (dmesg, guc_log & timeout_info). 

Thanks
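
The experiment above could be scripted roughly as follows. This is a sketch under assumptions, not something taken from the thread: the 600-second limit, the helper names, the output directory handling, and `gem_ringfill` being on PATH are all hypothetical choices, and reading debugfs normally requires root.

```python
import subprocess
from pathlib import Path

# Subtests and debugfs files exactly as listed in the steps above.
SUBTESTS = [
    "basic-default",
    "basic-default-interruptible",
    "basic-default-hang",
    "basic-default-forked",
]
DEBUGFS = Path("/sys/kernel/debug/dri/0")
INFO_FILES = ["i915_guc_info", "i915_gem_seqno", "i915_gem_request"]


def run_subtest(name, binary="gem_ringfill", limit_s=600):
    """Run one subtest; return False if it was killed after limit_s seconds."""
    try:
        subprocess.run([binary, "--run-subtest", name], timeout=limit_s)
    except subprocess.TimeoutExpired:
        return False
    return True


def first_timeout(subtests=SUBTESTS, runner=run_subtest):
    """Run subtests in order and stop at the first one that times out."""
    for name in subtests:
        if not runner(name):
            return name
    return None


def collect_logs(out_dir, debugfs=DEBUGFS):
    """Copy the GuC log dump and concatenate the three info files."""
    out_dir = Path(out_dir)
    guc_log = out_dir / "guc_log_dump.log"
    guc_log.write_bytes((debugfs / "i915_guc_log_dump").read_bytes())
    info_log = out_dir / "timeout_info.log"
    with info_log.open("wb") as out:
        for name in INFO_FILES:
            out.write((debugfs / name).read_bytes())
    return guc_log, info_log
```

Calling `first_timeout()` and then `collect_logs(Path.home())` mirrors steps 2 and 3. The dmesg output still has to be saved separately.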
Comment 9 Elio 2016-10-05 19:32:24 UTC
Apparently this problem is also present on KBL with the following configuration:


Platform: KABY LAKE-U
Processor : Genuine Intel(R) CPU 0000 @ 1.80GHz (cpu family: 6, model: 142, stepping: 9)
MCP : KBL-U J0 2+3e
QDF : QL9J
PCH: PCH-LP C1
CRB : KABY LAKE U DDR3L RVP7
Rework: O-16

DMC 1.01 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/kbldmcver101.tar.bz2
GuC 9.14 from http://rdvivi-hillsboro.jf.intel.com/firmware/kbl_guc_ver9_14.tar.bz2 


Kernel:
 

commit b34b27fe61d8fe953e8bd28695c9407082e4667e
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Tue Oct 4 11:12:04 2016 +0100

    drm-intel-nightly: 2016y-10m-04d-10h-10m-17s UTC integration manifest


Kernel version : 4.8.0-rc8-cedbdec
Architecture : source amd64 all


Component         : drm
	url       : http://cgit.freedesktop.org/mesa/drm
	tag       : libdrm-2.4.68
	commit    : fc09c5ab84240e9b6bd0bed01685ef004f56c4fa
	
Component         : mesa
	url       : http://cgit.freedesktop.org/mesa/mesa
	tag       : mesa-12.0.1
	commit    : 04277f058d00238937e664cf546c43b16cea7b2b
	
Component         : xf86-video-intel
	url       : http://cgit.freedesktop.org/xorg/driver/xf86-video-intel
	tag       : 2.99.917-701-g205146b
	commit    : 205146b0fdc8db016e5cfeeae5a6b25df3470ebc
	

Component         : libva
	url       : http://cgit.freedesktop.org/libva
	tag       : libva-1.7.2.pre1
	commit    : 9927bd2fbbb1d923fd6a8932a3cdbb5c9185ee22
	

Component         : intel-driver
	url       : http://cgit.freedesktop.org/vaapi/intel-driver
	tag       : 1.7.2.pre1
	commit    : ce444fb412966ca6afbb1331b7cae8ab621c1108
	
Component         : cairo
	url       : http://cgit.freedesktop.org/cairo
	tag       : 1.15.2
	commit    : db8a7f1697c49ae4942d2aa49eed52dd73dd9c7a
	

Component         : xserver
	url       : http://cgit.freedesktop.org/xorg/xserver
	tag       : xorg-server-1.18.3
	commit    : 9454cd51da9b38b974cff7c8b7125901f6403848
	

Component         : macros
	url       : https://cgit.freedesktop.org/xorg/util/macros
	tag       : util-macros-1.19.0-2-gd7acec2
	commit    : d7acec2
	

Component         : intel-gpu-tools
	url       : https://cgit.freedesktop.org/xorg/app/intel-gpu-tools
	tag       : intel-gpu-tools-1.16
	commit    : a28e9e38a9efc6daf5a08d60d29adcd3e328fe6f

I'll attach logs with previous instructions as soon as possible
Comment 10 Michel Thierry 2016-10-05 19:32:31 UTC
Created attachment 127034 [details]
attachment-4054-0.html

On international travel from Thursday 6/Oct to Tuesday 11/Oct. Expect delays.
Comment 11 Elio 2016-10-05 19:53:50 UTC
Of the 4 subtests listed above, the one that fails is gem_ringfill --run-subtest basic-default-interruptible

Sharing logs.

Please let me know if there is something else needed
Comment 12 Elio 2016-10-05 19:54:27 UTC
Created attachment 127035 [details]
timeout log
Comment 13 Elio 2016-10-05 19:54:47 UTC
Created attachment 127036 [details]
guc log
Comment 14 Elio 2016-10-05 19:55:07 UTC
Created attachment 127037 [details]
dmesg
Comment 15 yann 2016-10-06 13:21:22 UTC
Elio, please try Michal's patch: https://patchwork.freedesktop.org/series/13388/
Comment 16 knr 2016-10-06 13:24:42 UTC
https://patchwork.freedesktop.org/series/13388/
Fixes the problem on SKL, but the issue does not depend on HW/GuC version
Comment 17 knr 2016-10-06 14:12:47 UTC
Patch from Chris also fixes the issue:
https://patchwork.freedesktop.org/patch/114110/

You should probably ignore my version.
Comment 18 Elio 2016-10-06 19:09:18 UTC
https://patchwork.freedesktop.org/patch/114110/

Seems to solve the problem. 

IGT basic test runs without time outs.

Configuration:

GuC 6.1: https://01.org/linuxgraphics/downloads/skylake-guc-6.1
Kernel: 4.8.0-rc7_3c9e639/
Commit git_commit=3c9e639197bb52280334830e611082d5b6bfaceb
Intel Gpu Tools : intel-gpu-tools-1.16-36-gd16318a
Hardware: Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz

[244/244] skip: 13, pass: 230, dmesg-warn: 1
Thank you for running Piglit!
Results have been written to /home/gfx/intel-gpu-tools/scripts/results_with_guc_enabled_2_2
HTML summary has been written to results_with_guc_enabled_2_2//html/index.html

real    10m38.201s
user    0m45.000s
sys     4m24.148s

Grub parameters added:

GRUB_CMDLINE_LINUX_DEFAULT="i915.enable_guc_loading=2 i915.enable_guc_submission=2 drm.debug=0xe"

This issue will be closed as soon as the patch is merged.
Comment 19 Elio 2016-10-06 20:14:05 UTC
Confirmed. Same patch works on KBL as well
Comment 20 Chris Wilson 2016-10-07 07:27:45 UTC
commit 5ba899082cbffb779ccb39420fe1718850daf857
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 7 07:53:27 2016 +0100

    drm/i915/guc: Unwind GuC workqueue reservation if request construction fails
    
    We reserve space in the GuC workqueue for submitting the request in the
    future. However, if we fail to construct the request, we need to give
    that reserved space back to the system.
    
    Fixes: dadd481bfe55 ("drm/i915/guc: Prepare for nonblocking execbuf submission")
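
The failure mode described in this commit message can be illustrated with a toy model. The sketch below is not the i915 code: the `WorkQueue` class, its slot counting, and the `build_request` and `simulate` helpers are hypothetical stand-ins, chosen only to show how a reservation that is never given back shrinks the usable queue.

```python
class WorkQueue:
    """Toy fixed-size queue with reserve-then-submit semantics."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0   # slots promised to requests not yet submitted
        self.in_flight = 0  # slots holding submitted, unfinished requests

    def free(self):
        return self.capacity - self.reserved - self.in_flight

    def reserve(self):
        if self.free() < 1:
            return False
        self.reserved += 1
        return True

    def unreserve(self):
        self.reserved -= 1

    def submit(self):
        self.reserved -= 1
        self.in_flight += 1

    def complete(self):
        self.in_flight -= 1


def build_request(wq, construct, unwind):
    """Reserve a slot, construct the request, then submit it."""
    if not wq.reserve():
        return "no-space"
    try:
        construct()
    except RuntimeError:
        if unwind:
            wq.unreserve()  # give the reserved slot back on failure
        return "failed"
    wq.submit()
    wq.complete()  # pretend the request finishes immediately
    return "ok"


def failing_construct():
    raise RuntimeError("request construction failed")


def simulate(unwind, failures=4, capacity=4):
    """Fail `failures` constructions, then try one that succeeds."""
    wq = WorkQueue(capacity)
    for _ in range(failures):
        build_request(wq, failing_construct, unwind)
    return build_request(wq, lambda: None, unwind), wq.free()


print(simulate(unwind=False))  # ('no-space', 0)
print(simulate(unwind=True))   # ('ok', 4)
```

Without the unwind step, four failed constructions leave every slot reserved, so the next request finds no space. With it, the queue stays fully available. What the real driver does when it finds no space is outside this model.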
Comment 21 cprigent 2016-10-25 09:41:57 UTC
I launched IGT Basic on HSW, IVB, BDW with kernel boot command lines:  i915.enable_guc_loading=2 i915.enable_guc_submission=2
There is no timeout.
I see in kernel log: 
[drm:intel_device_info_dump [i915]] i915 device info: has_guc: no
[drm:intel_guc_setup [i915]] GuC fw status: path (null), fetch NONE, load NONE

Kernel: 4.9.0-rc2 194359e from http://cgit.freedesktop.org/drm-intel/
  commit 194359e4a31ff988c7a290093820c5ef28d3752b
  Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
  Date:   Mon Oct 24 17:44:02 2016 -0200
  drm-intel-nightly: 2016y-10m-24d-19h-42m-14s UTC integration manifest
libdrm-2.4.71 9e24d0c from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-12.0.0 8b06176 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.901-80 5dcb066 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-720 388fd4a from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-38 3b7e499 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.2-140 852cea1 from git://git.freedesktop.org/git/vaapi/intel-driver
IGT: intel-gpu-tools-1.16-96 93437cb from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

So closed.
Comment 22 cprigent 2016-10-25 11:33:33 UTC
Sorry, I commented on the wrong bug, so I have re-opened this one.
Comment 23 cprigent 2016-10-25 12:47:10 UTC
Not reproduced on KBL

Platform: KABY LAKE-U
Processor : Genuine Intel(R) CPU 0000 @ 1.80GHz (cpu family: 6, model: 142, stepping: 9)
MCP : KBL-U J0 2+3e
QDF : QL9J
PCH: PCH-LP C1
CRB : KABY LAKE U DDR3L RVP7
Rework: O-16

Software
BIOS: 45.1 3KBLSE2R1.R00.X045.P01.1606291634 from https://ubit-artifactory-ba.intel.com/artifactory/owr-repos/Submissions/ifwi/KBL_ORANGE_IFWI_2016_WW27_3_03_SR'17/
ME FW: 11.6.0.1065
EC FW: 1.24
KSC: 1.24
Linux distribution: Ubuntu 16.04 64 bits
DMC 1.01 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/kbldmcver101.tar.bz2
GuC 9.14 from http://rdvivi-hillsboro.jf.intel.com/firmware/kbl_guc_ver9_14.tar.bz2 
Kernel: 4.9.0-rc2 194359e from http://cgit.freedesktop.org/drm-intel/
  commit 194359e4a31ff988c7a290093820c5ef28d3752b
  Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
  Date:   Mon Oct 24 17:44:02 2016 -0200
  drm-intel-nightly: 2016y-10m-24d-19h-42m-14s UTC integration manifest
libdrm-2.4.71 9e24d0c from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-12.0.0 8b06176 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.901-80 5dcb066 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-720 388fd4a from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-38 3b7e499 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.2-140 852cea1 from git://git.freedesktop.org/git/vaapi/intel-driver
IGT: intel-gpu-tools-1.16-96 93437cb from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

Let's check on APL before closing it
Comment 24 cprigent 2016-10-25 13:22:59 UTC
Also not reproduced on SKL

Platform SKL: NUC6i3SYB
CPU: Intel(R) Core(TM) i3-6100U CPU @ 2.30GHZ (family 6, model 78, stepping 3)
Motherboard version: H81132-502
GPU: Intel® HD Graphics 520 - Intel Corporation Sky Lake Integrated Graphics (rev 07)
Memory: one 8GB card Kingston KVR21S15D8/8
SSD: Samsung 850 EVO M.2 120 GB

Software
Bios: SYSKLi35.86A.0045.2016.0527.1055 from https://downloadcenter.intel.com/downloads/eula/26097/BIOS-Update-SYSKLi35-86A-?httpDown=https%3A%2F%2Fdownloadmirror.intel.com%2F26097%2Feng%2FSY0045.bio
Linux distribution: Ubuntu 16.04 64 bits
DMC 1.26 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/skldmcver126.tar_1.bz2
GUC 6.1 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/sklgucver61.tar.bz2
Kernel: 4.9.0-rc2 194359e from http://cgit.freedesktop.org/drm-intel/
  commit 194359e4a31ff988c7a290093820c5ef28d3752b
  Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
  Date:   Mon Oct 24 17:44:02 2016 -0200
  drm-intel-nightly: 2016y-10m-24d-19h-42m-14s UTC integration manifest
libdrm-2.4.71 9e24d0c from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-12.0.0 8b06176 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.901-80 5dcb066 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-720 388fd4a from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-38 3b7e499 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.2-140 852cea1 from git://git.freedesktop.org/git/vaapi/intel-driver
IGT: intel-gpu-tools-1.16-96 93437cb from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

External screens: Ctl IP2152 (HDMI) and LG 23MB35PYI (DP)

So closed as fixed.
Comment 25 Luis Botello 2016-11-24 21:18:47 UTC
Issue is not seen on KBL with the following config:

 Software information
============================================
Kernel version                  : 4.9.0-rc4drm-intel-nightly-2d36d79
Linux distribution              : Ubuntu 16.04.1 LTS
Architecture                    : 64-bit
xf86-video-intel version        : 2.99.917
Xorg-Xserver version            : 1.18.3
DRM version                     : 2.4.68
Cairo version                   : 1.15.2
Intel GPU Tools version         : Tag [intel-gpu-tools-1.16] / Commit [a28e9e3]
Kernel driver in use            : i915
Bios revision                   : 52.1
KSC revision                    : 1.24

 Hardware information
============================================
Motherboard model               : KabylakeClientplatform
Motherboard type                : KabylakeUDDR3LRVP7 Laptop
CPU information                 : Genuine Intel(R) CPU 0000 @ 1.80GHz
GPU Card                        : Intel Corporation Device 5926 (rev 03) (prog-if 00 [VGA controller])

 Firmwares information
============================================
DMC fw loaded                   : yes
DMC version                     : 1.1
GUC fw loaded                   : SUCCESS
GUC version found               : 9.14
HUC fw loaded                   : SUCCESS
HUC version found               : 2.00.18
