Bug 110338 - [CI][SHARDS] igt@gem_exec_schedule@wide-render - incomplete - System hang
Summary: [CI][SHARDS] igt@gem_exec_schedule@wide-render - incomplete - System hang
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Mika Kuoppala
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Duplicates: 110464
Depends on:
Reported: 2019-04-05 14:36 UTC by Lakshmi
Modified: 2019-04-17 14:43 UTC
CC List: 2 users

See Also:
i915 platform: ICL
i915 features: GEM/Other


Description Lakshmi 2019-04-05 14:36:11 UTC

<6> [1621.526807] [IGT] gem_exec_schedule: executing
<5> [1621.534836] Setting dangerous option reset - tainting kernel
<5> [1621.540859] Setting dangerous option reset - tainting kernel
<5> [1621.541193] Setting dangerous option reset - tainting kernel
<6> [1621.545370] [IGT] gem_exec_schedule: starting subtest wide-render
<7> [1621.546691] [drm:vgem_gem_dumb_create [vgem]] Created object of size 1
<7> [1621.599263] [drm:vgem_gem_dumb_create [vgem]] Created object of size 1
<7> [1624.441222] [drm:edp_panel_vdd_off_sync [i915]] Turning eDP port A VDD off
<7> [1624.441375] [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
<7> [1624.441400] [drm:intel_power_well_disable [i915]] disabling DC off
<7> [1624.441429] [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [1624.441452] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
Comment 1 CI Bug Log 2019-04-05 14:37:01 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_exec_schedule@wide-render - incomplete - No proper logs, System hang
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5871/shard-iclb6/igt@gem_exec_schedule@wide-render.html
Comment 2 Chris Wilson 2019-04-05 14:44:42 UTC
<0>[ 1627.616043]   <idle>-0       4.Ns1 1627655401us : execlists_submission_tasklet: rcs0 awake?=1, active=5
<0>[ 1627.616072]   <idle>-0       4dNs2 1627655401us : process_csb: rcs0 cs-irq head=0, tail=1
<0>[ 1627.616099]   <idle>-0       4dNs2 1627655401us : process_csb: rcs0 csb[1]: status=0x10000014:0x00008040, active=0x5
<0>[ 1627.616128]   <idle>-0       4dNs2 1627655402us : process_csb: rcs0 out[0]: ctx=32832.1, fence ce4:10 (current 10), prio=5
<0>[ 1627.616157]   <idle>-0       4dNs2 1627655402us : process_csb: rcs0 completed ctx=32832
<0>[ 1627.616187]   <idle>-0       4dNs2 1627655403us : __i915_request_submit: rcs0 fence 8e6:12 -> current 10
<0>[ 1627.616216]   <idle>-0       4dNs2 1627655404us : __execlists_submission_tasklet: rcs0 in[1]:  ctx=128.1, fence 8e6:12 (current 10), prio=5
<0>[ 1627.616246]   <idle>-0       4dNs2 1627655405us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=96.2, fence 8e5:12 (current 10), prio=5
<0>[ 1627.616277]   <idle>-0       4.Ns1 1627655418us : execlists_submission_tasklet: rcs0 awake?=1, active=1
<0>[ 1627.616305]   <idle>-0       4dNs2 1627655419us : process_csb: rcs0 cs-irq head=1, tail=2
<0>[ 1627.616332]   <idle>-0       4dNs2 1627655419us : process_csb: rcs0 csb[2]: status=0x10000018:0x00008020, active=0x1
<0>[ 1627.616361]   <idle>-0       4dNs2 1627655420us : process_csb: rcs0 out[0]: ctx=96.2, fence 8e5:12 (current 12), prio=5
<0>[ 1627.616389]   <idle>-0       4dNs2 1627655753us : process_csb: process_csb:1098 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
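
The assertion on the last line compares the second dword of a CSB entry with the context ID of the port being completed. A rough way to check that against the trace is sketched below in Python. It is a triage aid, not i915 code: `check_csb_trace` is a hypothetical helper, and it assumes that the second hex value in `status=A:B` is `buf[2 * head + 1]` and that the number before the dot in `out[0]: ctx=N.M` is `port->context_id`.

```python
import re

# Patterns for the two trace line types of interest (taken from the trace above).
CSB_RE = re.compile(r"csb\[(\d+)\]: status=0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)")
OUT_RE = re.compile(r"out\[0\]: ctx=(\d+)\.(\d+)")


def check_csb_trace(lines):
    """Pair each csb[] event with the next out[0] line and compare the IDs.

    Returns a list of (csb index, CSB second dword, port ctx, matches?).
    Assumption: the second status dword is the context ID the hardware reports.
    """
    results = []
    pending = None  # (index, second dword) of the most recent csb[] event
    for line in lines:
        m = CSB_RE.search(line)
        if m:
            pending = (int(m.group(1)), int(m.group(3), 16))
            continue
        m = OUT_RE.search(line)
        if m and pending is not None:
            index, csb_ctx = pending
            port_ctx = int(m.group(1))
            results.append((index, csb_ctx, port_ctx, csb_ctx == port_ctx))
            pending = None
    return results


# The four relevant lines from the trace, with timestamps trimmed.
TRACE = [
    "process_csb: rcs0 csb[1]: status=0x10000014:0x00008040, active=0x5",
    "process_csb: rcs0 out[0]: ctx=32832.1, fence ce4:10 (current 10), prio=5",
    "process_csb: rcs0 csb[2]: status=0x10000018:0x00008020, active=0x1",
    "process_csb: rcs0 out[0]: ctx=96.2, fence 8e5:12 (current 12), prio=5",
]

for index, csb_ctx, port_ctx, ok in check_csb_trace(TRACE):
    verdict = "match" if ok else "MISMATCH"
    print(f"csb[{index}]: 0x{csb_ctx:08x} vs ctx={port_ctx}: {verdict}")
```

Under those assumptions, csb[1] reports 0x8040 (32832 decimal), which agrees with ctx=32832, while csb[2] reports 0x8020 against ctx=96, which would be consistent with the GEM_BUG_ON firing at that point.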
Comment 3 Francesco Balestrieri 2019-04-08 09:58:30 UTC
According to Chris this may be a resurgence of Bug 108315. Mika, before I reopen that one can you take a look and see if you agree?

Also, these patches may help with this issue, so they should be looked at.

Comment 4 Jani Saarinen 2019-04-11 06:16:31 UTC
There have also been BIOS updates for the shards that should improve stability.
Comment 5 Lakshmi 2019-04-11 07:53:24 UTC
The impact of this bug is that a panic can occur under high load. The probability of this panic is very low, as the failure has been seen only once.
Setting the priority to Medium.
Comment 6 Chris Wilson 2019-04-11 08:28:02 UTC
Applied Mika's patches to increase CSB size that should put us on an equal footing with HW validation paths...

commit 632c7ad6f4509767fcbe76ac034747b4a30a13d9 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Apr 5 21:46:57 2019 +0100

    drm/i915/icl: Switch to using 12 deep CSB status FIFO
    Now when we can support variable csb fifo sizes, disable legacy mode.
    By disabling legacy we hope to get better hw testing coverage by
    assuming everyone else have switched over.
    v2: rebase
    References: https://bugs.freedesktop.org/show_bug.cgi?id=110338
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: Kelvin Gardiner <kelvin.gardiner@intel.com>
    Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190405204657.12887-2-chris@chris-wilson.co.uk

* fingers crossed.
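
The head/tail values in the Comment 2 trace describe a ring of CSB slots, and the commit above changes how deep that ring is on ICL. A minimal Python model of the walk is sketched here for illustration only; it is not the i915 implementation. `csb_slots` is a hypothetical helper, the advance-then-read order is inferred from the trace (head=0, tail=1 leads to csb[1] being processed), and the legacy depth of 6 entries is an assumption. Only the 12-entry depth comes from the commit title.

```python
LEGACY_CSB_ENTRIES = 6   # assumed legacy depth, not stated in this report
ICL_CSB_ENTRIES = 12     # "12 deep CSB status FIFO" from the commit title


def csb_slots(head, tail, num_entries):
    """Return the CSB slots visited when draining from head to tail.

    The read pointer is advanced (with wrap-around) before each slot is
    read, and draining stops once head has caught up with tail.
    """
    slots = []
    while head != tail:
        head = (head + 1) % num_entries
        slots.append(head)
    return slots


# The two drains seen in the trace: one event each.
print(csb_slots(0, 1, ICL_CSB_ENTRIES))
print(csb_slots(1, 2, ICL_CSB_ENTRIES))

# Wrap-around with the assumed 6-entry ring and with the 12-entry ring.
print(csb_slots(4, 1, LEGACY_CSB_ENTRIES))
print(csb_slots(10, 1, ICL_CSB_ENTRIES))
```

A deeper ring lets the hardware queue more events before the write pointer can lap the read pointer, which is one reason the FIFO depth matters when the tasklet is delayed under load.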
Comment 7 Chris Wilson 2019-04-17 14:43:09 UTC
*** Bug 110464 has been marked as a duplicate of this bug. ***
