Bug 103231 - [KBL] GPU Hang observed while running CtsDeqpTestCases on KabyLake
Summary: [KBL] GPU Hang observed while running CtsDeqpTestCases on KabyLake
Status: CLOSED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) other
Importance: highest blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-12 05:17 UTC by samiuddi
Modified: 2017-10-26 16:58 UTC

See Also:
i915 platform: KBL
i915 features: GPU hang


Attachments
GPU crash dump (/sys/class/drm/card0/error) (19.54 KB, text/plain)
2017-10-12 05:17 UTC, samiuddi
dmesg (515.00 KB, text/plain)
2017-10-12 05:18 UTC, samiuddi

Description samiuddi 2017-10-12 05:17:40 UTC
Created attachment 134804 [details]
GPU crash dump (/sys/class/drm/card0/error)

A GPU hang is observed while running CtsDeqpTestCases on the KabyLake (KBL) platform in Android IA.

The hang occurs in the test case
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23

When this test case is run individually it passes, but it fails when the complete dEQP suite is run.

The GPU crash dump and dmesg are attached.
Comment 1 samiuddi 2017-10-12 05:18:39 UTC
Created attachment 134805 [details]
dmesg
Comment 2 Jani Saarinen 2017-10-12 06:50:48 UTC
Kernel: Linux version 4.14.0-rc3-00002-g3b3aeb1c

[   15.179300] i915 0000:00:02.0: Direct firmware load for i915/kbl_guc_ver9_14.bin failed with error -2
[   15.179300] i915 0000:00:02.0: Falling back to user helper
[   15.179569] [drm] Failed to fetch valid uC firmware from i915/kbl_guc_ver9_14.bin (error -11)
[   15.181012] [drm:intel_uc_init_hw] *ERROR* GuC init failed
[   15.181012] [drm] Falling back from GuC submission to execlist mode
[   15.181012] [drm] GuC firmware loading disabled
[   15.196436] read descriptors
[   15.196438] read strings
[   15.261719] type=1400 audit(1507720816.960:13): avc: denied { wake_alarm } for pid=3127 comm="healthd" capability=35 scontext=u:r:healthd:s0 tcontext=u:r:healthd:s0 tclass=capability2 permissive=1
[   15.264518] vdc: 200 3164 Command succeeded
[   15.378108] [drm] RC6 on
[   15.380994] Unexpected send: action=0x6
[   15.385324] ------------[ cut here ]------------
[   15.390505] WARNING: CPU: 1 PID: 3131 at /home/samiuddi/KB/kernel/icl/drivers/gpu/drm/i915/intel_uc.c:499 intel_guc_send_nop+0x19/0x30
[   15.404018] Modules linked in: rfkill_gpio
[   15.408600] CPU: 1 PID: 3131 Comm: surfaceflinger Tainted: G     U          4.14.0-rc3-00002-g3b3aeb1c #1
[   15.419291] Hardware name: Intel Corporation Kabylake Client platform/Kabylake R DDR4 RVP, BIOS KBLSE2R1.R00.X081.P01.1703230239 03/23/2017
[   15.433285] task: ffff9c38f25a2080 task.stack: ffffbc45827a0000
[   15.439900] RIP: 0010:intel_guc_send_nop+0x19/0x30
[   15.445252] RSP: 0018:ffffbc45827a39a8 EFLAGS: 00010246
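
For readers who want to pull the same lines out of the full 515 KB dmesg attachment, here is a minimal Python sketch. It assumes the usual "[ seconds.micros ] message" dmesg layout seen in the excerpt above. The function name find_guc_lines and the MARKERS list are hypothetical helpers chosen for illustration, and the marker strings are simply copied from this excerpt, so they are not an exhaustive list of i915 firmware messages.

```python
import re

# Matches the "[   15.179300] " timestamp prefix that dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Substrings taken from the excerpt in this comment; illustrative, not exhaustive.
MARKERS = ("firmware load", "uC firmware", "GuC", "WARNING:")


def find_guc_lines(dmesg_text):
    """Return (timestamp, message) pairs for lines that contain any marker."""
    hits = []
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a dmesg timestamp
        stamp, message = match.groups()
        if any(marker in message for marker in MARKERS):
            hits.append((float(stamp), message))
    return hits


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
[   15.179300] i915 0000:00:02.0: Direct firmware load for i915/kbl_guc_ver9_14.bin failed with error -2
[   15.181012] [drm:intel_uc_init_hw] *ERROR* GuC init failed
[   15.196436] read descriptors
[   15.378108] [drm] RC6 on
"""

if __name__ == "__main__":
    for stamp, message in find_guc_lines(SAMPLE):
        print(f"{stamp:10.6f}  {message}")
```

To use it on the real attachment, read the saved dmesg file into a string and pass it to find_guc_lines in place of SAMPLE.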
Comment 3 Jani Saarinen 2017-10-12 07:15:53 UTC
Some discussion has taken place offline. Please also test with GuC disabled.
Comment 4 Chris Wilson 2017-10-12 07:50:28 UTC
More importantly, please use the upstream kernel before filing a bug against it.
Comment 5 Pallavi G 2017-10-12 07:52:23 UTC
We are using the 4.14-rc2 kernel.
Comment 6 Pallavi G 2017-10-12 08:02:18 UTC
drm-tip is on 4.14-rc4; we will update to that kernel.
If the issue still exists, we will reopen this bug. Please let me know if that is OK.
Comment 7 Chris Wilson 2017-10-12 08:36:20 UTC
(In reply to Pallavi G from comment #5)
> We are using the 4.14-rc2 kernel.

The error state shows that you have patches above that, patches that I know affect execution...
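
A related hint is visible in the kernel version string quoted in comment 2 (4.14.0-rc3-00002-g3b3aeb1c). The sketch below splits such a string into its parts, assuming the suffix follows the `git describe` convention of "-<commits since tag>-g<abbreviated commit id>" that kernel builds commonly append when the tree is not at a tagged commit. The function name describe_release is hypothetical, and the count only says how many commits sit on top of the tag, not what they contain or whether they are upstream.

```python
import re

# base release, then an optional "-<count>-g<sha>" suffix in git-describe style.
RELEASE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+(?:-rc\d+)?)"
    r"(?:-(?P<ahead>\d+)-g(?P<sha>[0-9a-f]+))?"
)


def describe_release(release):
    """Split a kernel release string into base tag, commit count, and commit id."""
    match = RELEASE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    ahead = match.group("ahead")
    return {
        "base": match.group("base"),
        "commits_ahead": int(ahead) if ahead else 0,
        "commit": match.group("sha"),  # None when no suffix is present
    }


if __name__ == "__main__":
    # Version string taken from the dmesg excerpt in comment 2.
    print(describe_release("4.14.0-rc3-00002-g3b3aeb1c"))
```

A non-zero commits_ahead value is a quick sign that the running kernel is not exactly the tagged release candidate, which is worth stating in the report before comparing against upstream.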

