Bug 97626 - [KBL] GPU Hang when launching Synmark2 with GuC 9.14 loaded
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Jeff McGee
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-07 12:02 UTC by cprigent
Modified: 2016-12-14 18:29 UTC
CC List: 7 users

See Also:
i915 platform: KBL
i915 features: firmware/guc, GPU hang


Attachments
log-synmark2 (15.13 KB, text/plain)
2016-09-07 12:02 UTC, cprigent
kern.log (349.81 KB, text/plain)
2016-09-07 12:12 UTC, cprigent
GPU crash dump (758.75 KB, text/plain)
2016-09-08 08:27 UTC, cprigent
All sysmark logs and dmesg for each scenario (80.39 KB, application/x-gzip)
2016-10-05 18:10 UTC, Elio
attachment-17355-0.html (2.66 KB, text/html)
2016-10-21 15:12 UTC, Jani Saarinen

Description cprigent 2016-09-07 12:02:41 UTC
Created attachment 126273
log-synmark2

Platform: KABY LAKE-U
Processor : Genuine Intel(R) CPU 0000 @ 1.80GHz (cpu family: 6, model: 142, stepping: 9)
MCP : KBL-U J0 2+3e
QDF : QL9J
PCH: PCH-LP C1
CRB : KABY LAKE U DDR3L RVP7
Rework: O-16

Software
BIOS: 45.1 3KBLSE2R1.R00.X045.P01.1606291634 from https://ubit-artifactory-ba.intel.com/artifactory/owr-repos/Submissions/ifwi/KBL_ORANGE_IFWI_2016_WW27_3_03_SR'17/
ME FW: 11.6.0.1065
EC FW: 1.24
KSC: 1.24
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.8.0-rc4 9baa666 from http://cgit.freedesktop.org/drm-intel/
  commit 9baa666b3e48f71b46c5f63541f57d2a95a1b1c0
  Author: Chris Wilson <chris@chris-wilson.co.uk>
  Date:   Sat Sep 3 13:12:38 2016 +0100
  drm-intel-nightly: 2016y-09m-03d-12h-12m-15s UTC integration manifest
libdrm-2.4.70-6 4462303 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-12  from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.0-546 deae9c7 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-701 205146b from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.0-47 2ebf897 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.0-95 1817bee from git://git.freedesktop.org/git/vaapi/intel-driver
DMC 1.01 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/kbldmcver101.tar.bz2
GuC 9.14 from http://rdvivi-hillsboro.jf.intel.com/firmware/kbl_guc_ver9_14.tar.bz2 
Synmark2 from http://benchsrv.fi.intel.com/archive/benchmarks/

Precondition:
--------------
Boot with kernel boot command lines: i915.enable_guc_loading=2 i915.enable_guc_submission=2
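
Before running the benchmark it may be worth confirming that the running kernel actually received these options. The following is a minimal sketch, assuming the options are read from a command line string such as the contents of /proc/cmdline; the helper name cmdline_param is hypothetical and not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Print the value of a key=value token found in a kernel command line string.
# Prints nothing and returns 1 if the key is absent.
# cmdline_param is a hypothetical helper name used only for illustration.
cmdline_param() {
    cmdline=$1
    key=$2
    # Rely on word splitting to walk the space-separated tokens.
    for tok in $cmdline; do
        case $tok in
            "$key"=*)
                # Strip everything up to and including the first "=".
                printf '%s\n' "${tok#*=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

For example, cmdline_param "$(cat /proc/cmdline)" i915.enable_guc_loading should print 2 on a machine booted with the precondition above, and print nothing if the option did not reach the kernel.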

Steps:
------
1. Download Synmark2:
mkdir benchmark
cd benchmark
wget http://benchsrv.fi.intel.com/archive/benchmarks/SynMark2.tar.gz
tar -xvf SynMark2.tar.gz
2. From Terminal 1:
startx
3. From Terminal 2:
cd SynMark2
sudo -E ./synmark-test.sh

Actual result:
--------------
3. A green square is displayed for a couple of seconds (no rotation), then the tool exits and returns errors:
intel_do_flush_locked failed: Input/output error
libGL: OpenDriver: trying /opt/X11R7/lib/dri/tls/i965_dri.so
libGL: OpenDriver: trying /opt/X11R7/lib/dri/i965_dri.so


Expected result:
----------------
3. Benchmark is launched without problem when GuC is loaded

Info:
-----
Benchmark is launched without problem after removing i915.enable_guc_loading=2 i915.enable_guc_submission=2 from Grub
Comment 1 Chris Wilson 2016-09-07 12:09:05 UTC
GPU hang, let's see the error state and start pondering what more we need to see in it for debugging GuC.
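
A sketch of how the error state could be captured right after a hang, assuming the i915 driver exposes it at /sys/class/drm/card0/error (the card index may differ) and prints a "no error state collected" placeholder when nothing was recorded. The helper names and the default output file name are hypothetical.

```shell
#!/bin/sh
# Succeed if the given file holds a recorded error state, i.e. it is readable,
# non-empty, and does not start with the "no error state collected" placeholder.
# The content is read directly because sysfs files may report a size of zero.
has_error_state() {
    first=$(head -n 1 "$1" 2>/dev/null) || return 1
    [ -n "$first" ] || return 1
    case $first in
        *"no error state collected"*) return 1 ;;
    esac
    return 0
}

# Copy the error state to a regular file so it survives a reboot.
# Both default paths are assumptions for illustration.
save_error_state() {
    src=${1:-/sys/class/drm/card0/error}
    dst=${2:-gpu-crash-dump.txt}
    if has_error_state "$src"; then
        cat "$src" > "$dst"
    else
        return 1
    fi
}
```

Run as root after the benchmark fails, for example save_error_state /sys/class/drm/card0/error gpu-crash-dump.txt, and attach the resulting file.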
Comment 2 cprigent 2016-09-07 12:12:08 UTC
Created attachment 126275
kern.log
Comment 3 cprigent 2016-09-08 08:27:48 UTC
Created attachment 126293
GPU crash dump
Comment 4 cprigent 2016-09-08 09:29:29 UTC
From the error dump, the hang happens in the render ring batch with the active head at 0xff2057e4 and 0x7b000005 (3DPRIMITIVE) as IPEHR.

Batch extract (around 0xff2057e4):

0xff2057c0:      0x780c0000: 3D UNKNOWN: 3d_965 opcode = 0x780c
0xff2057c4:      0x00000000: MI_NOOP
Bad length 7 in (null), expected 6-6
0xff2057c8:      0x7b000005: 3DPRIMITIVE: fail sequential
0xff2057cc:      0x0000000f:    vertex count
0xff2057d0:      0x00000003:    start vertex
0xff2057d4:      0x00000000:    instance count
0xff2057d8:      0x00000001:    start instance
0xff2057dc:      0x00000000:    index bias
0xff2057e0:      0x00000000: MI_NOOP
0xff2057e4:      0x05000000: MI_BATCH_BUFFER_END
0xff2057e8:      0x00000000:    
0xff2057ec:      0x00000000:    
0xff2057f0:      0x00000000:    

Should we assign to Mesa team? Or does it need investigation from GuC side?
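
As a sanity check on the extract, the command length can be recomputed from the header. This is a sketch under the assumption that the low byte of a 3D command header encodes the total length in dwords minus two; the function names are hypothetical. Under that assumption 0x7b000005 describes a 7-dword command, which matches the "Bad length 7" the decoder complains about, so the warning may reflect the decoder's expectations rather than a malformed batch.

```shell
#!/bin/sh
# Total command length in dwords, assuming the low byte of the header
# holds "length minus two" (an assumption about the encoding).
cmd_total_dwords() {
    echo $(( ($1 & 0xff) + 2 ))
}

# Address of the first dword after a command that starts at address $1
# and has header $2 (4 bytes per dword).
next_cmd_addr() {
    printf '0x%08x\n' $(( $1 + 4 * (($2 & 0xff) + 2) ))
}
```

If the 3DPRIMITIVE header is the dword right after the MI_NOOP at 0xff2057c4, then next_cmd_addr 0xff2057c8 0x7b000005 gives the address of the command that follows it, which can be compared with the active head reported in the error dump.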
Comment 5 Eero Tamminen 2016-09-29 11:39:40 UTC
Which SynMark tests fail, most of them or just some specific ones?

Do other test-suites than SynMark fail with GuC?

If not, which SynMark2 version are you using?  v6.0?  v7.0?
Comment 6 anusha 2016-09-29 23:14:25 UTC
synmark-test.sh results in a GPU hang. synmark2testOgl.sh works fine. Version is 6.0
Comment 7 yann 2016-09-30 07:44:43 UTC
(In reply to anusha from comment #6)
> synmark-test.sh results in a GPU hang. synmark2testOgl.sh works fine.
> Version is 6.0

Please use v7.0 and confirm whether or not you still see the GPU hang. If it occurs again, attach the GPU crash dump.
Comment 8 Eero Tamminen 2016-09-30 07:57:20 UTC
(In reply to anusha from comment #6)
> synmark-test.sh results in a GPU hang. synmark2testOgl.sh works fine.

If you're using the same scripts as we are, the only difference between those two should be the order in which they run the 40+ SynMark tests (synmark2testOgl.sh also tries a couple of OpenCL tests, which will naturally fail).

In which SynMark test does the GPU hang happen?  Is it always the same one?

Do you get hangs with any other test-suite (GpuTest, GfxBench, GLBenchmark, Unigine demos...)?


Is there a difference if you run tests in fullscreen or in windowed mode (i.e. with Unity composition)?
Comment 9 Elio 2016-10-05 18:06:04 UTC
Last configuration tested:

Platform: KABY LAKE-U
Processor : Genuine Intel(R) CPU 0000 @ 1.80GHz (cpu family: 6, model: 142, stepping: 9)
MCP : KBL-U J0 2+3e
QDF : QL9J
PCH: PCH-LP C1
CRB : KABY LAKE U DDR3L RVP7
Rework: O-16

DMC 1.01 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/kbldmcver101.tar.bz2
GuC 9.14 from http://rdvivi-hillsboro.jf.intel.com/firmware/kbl_guc_ver9_14.tar.bz2 
Synmark2 from http://benchsrv.fi.intel.com/archive/benchmarks/ (in this case I tested with version 7)

Kernel:
 

commit b34b27fe61d8fe953e8bd28695c9407082e4667e
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Tue Oct 4 11:12:04 2016 +0100

    drm-intel-nightly: 2016y-10m-04d-10h-10m-17s UTC integration manifest


Kernel version : 4.8.0-rc8-cedbdec
Architecture : source amd64 all


Component         : drm
	url       : http://cgit.freedesktop.org/mesa/drm
	tag       : libdrm-2.4.68
	commit    : fc09c5ab84240e9b6bd0bed01685ef004f56c4fa
	
Component         : mesa
	url       : http://cgit.freedesktop.org/mesa/mesa
	tag       : mesa-12.0.1
	commit    : 04277f058d00238937e664cf546c43b16cea7b2b
	
Component         : xf86-video-intel
	url       : http://cgit.freedesktop.org/xorg/driver/xf86-video-intel
	tag       : 2.99.917-701-g205146b
	commit    : 205146b0fdc8db016e5cfeeae5a6b25df3470ebc
	

Component         : libva
	url       : http://cgit.freedesktop.org/libva
	tag       : libva-1.7.2.pre1
	commit    : 9927bd2fbbb1d923fd6a8932a3cdbb5c9185ee22
	

Component         : intel-driver
	url       : http://cgit.freedesktop.org/vaapi/intel-driver
	tag       : 1.7.2.pre1
	commit    : ce444fb412966ca6afbb1331b7cae8ab621c1108
	
Component         : cairo
	url       : http://cgit.freedesktop.org/cairo
	tag       : 1.15.2
	commit    : db8a7f1697c49ae4942d2aa49eed52dd73dd9c7a
	

Component         : xserver
	url       : http://cgit.freedesktop.org/xorg/xserver
	tag       : xorg-server-1.18.3
	commit    : 9454cd51da9b38b974cff7c8b7125901f6403848
	

Component         : macros
	url       : https://cgit.freedesktop.org/xorg/util/macros
	tag       : util-macros-1.19.0-2-gd7acec2
	commit    : d7acec2
	

Component         : intel-gpu-tools
	url       : https://cgit.freedesktop.org/xorg/app/intel-gpu-tools
	tag       : intel-gpu-tools-1.16
	commit    : a28e9e38a9efc6daf5a08d60d29adcd3e328fe6f
	

As requested, test.conf was modified with FALSE

Tested 3 different scenarios

1.- GuC parameters in grub = i915.enable_guc_loading=2 i915.enable_guc_submission=2: GPU hang on test OglBatch0

2.- GuC parameters in grub = i915.enable_guc_loading=0 i915.enable_guc_submission=0: GPU hang on test OglDrvShComp
Comment 10 Elio 2016-10-05 18:10:02 UTC
3.- No guc parameters in grub. All test finished without problems.

Attaching 3 different Synmark Logs and Dmesg
Comment 11 Elio 2016-10-05 18:10:58 UTC
Created attachment 127033
All sysmark logs and dmesg for each scenario
Comment 12 Eero Tamminen 2016-10-06 07:42:56 UTC
(In reply to Elio from comment #11)
> Created attachment 127033 [details]
> All sysmark logs and dmesg for each scenario

So, do all (or at least most) of the SynMark tests fail with GuC?  Please run them one by one.

And if most of those fail, test also other benchmarks.  It would be really peculiar if most SynMark tests fail, but no other benchmarks fail.
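
Running the tests one by one could be scripted along these lines. This is only a sketch: the runner command is a placeholder, since the command line for launching a single SynMark2 test is not shown in this report, and a GPU hang does not necessarily produce a non-zero exit status, so the kernel log and error state should still be checked after each test.

```shell
#!/bin/sh
# Run "<runner> <test>" once per test name and print the names of the tests
# whose runner exited with a non-zero status.
# The runner is a placeholder for whatever command launches a single test.
run_one_by_one() {
    runner=$1
    shift
    for t in "$@"; do
        if ! $runner "$t" >/dev/null 2>&1; then
            printf '%s\n' "$t"
        fi
    done
}
```

A hypothetical invocation would be run_one_by_one ./run-single-test.sh OglBatch0 OglDrvShComp, where run-single-test.sh stands for a local wrapper script that is not part of this report.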
Comment 13 cprigent 2016-10-06 16:21:59 UTC
(In reply to Eero Tamminen from comment #12)
> (In reply to Elio from comment #11)
> > Created attachment 127033 [details]
> > All sysmark logs and dmesg for each scenario
> 
> So, do all (or at least most) of the SynMark tests fail with GuC?  Please
> run them one by one.
Elio confirmed it happens with OglBatch0 (comment 9).

> And if most of those fail, test also other benchmarks.  It would be really
> peculiar if most SynMark tests fail, but no other benchmarks fail.
OK, we can give it a try.

Eero, do you also reproduce it? I shared with you the GuC installation procedure.
Comment 14 Eero Tamminen 2016-10-07 08:46:11 UTC
(In reply to cprigent from comment #13)
> Eero, do you also reproduce it? I shared with you the GuC installation
> procedure.

While I can comment on what needs to be tested in order to narrow down this bug (e.g. to what component is responsible), I'm not planning on testing it myself (don't have the time right now).
Comment 15 cprigent 2016-10-14 13:38:18 UTC
Good news, I don't reproduce it with a fresh setup.

Platform: KABY LAKE-U
Processor : Genuine Intel(R) CPU 0000 @ 1.80GHz (cpu family: 6, model: 142, stepping: 9)
MCP : KBL-U J0 2+3e
QDF : QL9J
PCH: PCH-LP C1
CRB : KABY LAKE U DDR3L RVP7

Software
BIOS: 45.1 3KBLSE2R1.R00.X045.P01.1606291634 from https://ubit-artifactory-ba.intel.com/artifactory/owr-repos/Submissions/ifwi/KBL_ORANGE_IFWI_2016_WW27_3_03_SR'17/
ME FW: 11.6.0.1065
EC FW: 1.24
KSC: 1.24
Linux distribution: Ubuntu 16.04 64 bits
DMC 1.01 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/kbldmcver101.tar.bz2
GuC 9.14 from http://rdvivi-hillsboro.jf.intel.com/firmware/kbl_guc_ver9_14.tar.bz2 
Kernel: 4.8.0 f35ed31 from http://cgit.freedesktop.org/drm-intel/
  commit f35ed31aea66b3230c366fcba5f3456ae2cb956e
  Author: Jani Nikula <jani.nikula@intel.com>
  Date:   Mon Oct 10 14:29:09 2016 +0300
  drm-intel-nightly: 2016y-10m-10d-11h-28m-51s UTC integration manifest
libdrm-2.4.71 a44c9c3 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-12.0.0 8b06176 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.901-76 97a8353 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-712 696f58f from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-38 3b7e499 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.2-133 dd73514 from git://git.freedesktop.org/git/vaapi/intel-driver
SynMark2 from http://benchsrv.fi.intel.com/archive/benchmarks/SynMark2-7.0.tar.gz

External screen: DELL U2311Hb (DP)
Comment 16 cprigent 2016-10-14 13:38:28 UTC
So closed
Comment 17 Eero Tamminen 2016-10-14 14:12:43 UTC
(In reply to cprigent from comment #15)
> Good news, I don't reproduce it with a fresh setup.

What changed with the "fresh setup" compared to the other setup?
Comment 18 cprigent 2016-10-14 15:11:36 UTC
Bad news. I just reproduced it.
I didn't save the logs associated with comment 15, but I'm quite sure the GuC was not loaded.
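
To avoid this kind of doubt, the GuC load status can be recorded with each run. A minimal sketch, assuming a kernel of this era exposes a status dump in debugfs (i915_guc_load_status under /sys/kernel/debug/dri/0/ is the assumed location) and that a successful load appears as a "load: SUCCESS" line; both the path and the line format are assumptions, and guc_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Read a GuC status dump on stdin and succeed if it reports a successful load.
# The "load: SUCCESS" line format is an assumption about the dump layout.
guc_loaded() {
    grep -Eq '^[[:space:]]*load:[[:space:]]*SUCCESS'
}
```

For example, as root: cat /sys/kernel/debug/dri/0/i915_guc_load_status | guc_loaded && echo "GuC loaded". Saving the raw dump next to the benchmark log keeps the evidence even if the check itself is wrong about the format.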
Comment 19 cprigent 2016-10-14 15:59:48 UTC
(In reply to Eero Tamminen from comment #12)
> And if most of those fail, test also other benchmarks.  It would be really
> peculiar if most SynMark tests fail, but no other benchmarks fail.

GuC Loaded:
Heaven: Not reproduced (launched twice)
Valley: DUT froze (needed to unplug the charger, logs lost after reboot), then not reproduced on the 2nd try
gfxbench3_desktop: DUT also froze (motherboard indicates a crash, needed to unplug the charger, logs lost after reboot)

GuC not Loaded:
Synmark2: no problem
Heaven: no problem
Valley: no problem
gfxbench3_desktop: no problem
Comment 20 Elio 2016-10-21 15:11:09 UTC
Re-tested with the latest nightly kernel: same behavior. Should we exclude or blacklist those crashing tests?
commit 21787266bd182df4c0d2067cf1b5c2379f61c24d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Oct 19 16:40:09 2016 +0100

    drm-intel-nightly: 2016y-10m-19d-15h-39m-22s UTC integration manifest


Kernel version : 4.9.0-rc1-fb0d8ec
Architecture : source amd64 all
Comment 21 Jani Saarinen 2016-10-21 15:12:06 UTC
Created attachment 127451
attachment-17355-0.html

Hi
Thank you for your email.
I am OoO returning 24th of Oct 2016.
If any urgent contact Kimmo Nikkanen.

Br,
Jani
Comment 22 Elio 2016-12-14 17:42:36 UTC
The latest GFX configuration, with the latest firmware patches, is working without problems. GuC 9.14 and HuC 2.00.18:

 (Graphic Stack) Intel® Graphics for Linux* | 01.org

============================================
 Software information
============================================
Kernel version                  : 4.9.0-rc8latestfirmware+
Linux distribution              : Ubuntu 16.04.1 LTS
Architecture                    : 64-bit
Gfx stack code                  : 2504012391
Mesa version                    : 13.0.2 (git-c9e993b)
xf86-video-intel version        :
Xorg-Xserver version            : 1.19.0
DRM version                     : 2.4.74
VAAPI version                   : Intel i965 driver for Intel(R) Kabylake - 1.7.3
Cairo version                   : 1.15.2
Intel GPU Tools version         : Tag [intel-gpu-tools-1.17] / Commit [e631bb5]
Kernel driver in use            : i915
Hardware acceleration           : Enabled
Bios revision                   : 52.10
Bios release date               : 10/05/2016
KSC revision                    : 1.24

============================================
 Firmwares information
============================================
DMC fw loaded                   : yes
DMC version                     : 1.1
GUC fw loaded                   : SUCCESS
GUC version wanted              : 9.14
GUC version found               : 9.14

============================================
 Kernel parameters
============================================
 drm.debug=0xe i915.enable_guc_loading=2 i915.enable_guc_submission=0 quiet splash


Works for me without problems including performance behavior.

