Bug 109567 - [CI][BAT] igt@i915_selftest@live_execlists - incomplete - possible circular locking dependency detected
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-06 15:08 UTC by Martin Peres
Modified: 2019-05-06 10:13 UTC

See Also:
i915 platform: ICL
i915 features: display/Other


Description Martin Peres 2019-02-06 15:08:19 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5542/fi-icl-u3/igt@i915_selftest@live_execlists.html

<0>[  678.302656] ---------------------------------
<0>[  678.302659] Kernel Offset: disabled
<4>[  678.302662] CPU: 2 PID: 32 Comm: khungtaskd Tainted: G     U  W         5.0.0-rc5-CI-CI_DRM_5542+ #1
<4>[  678.302664] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.2402.AD3.1810170014 10/17/2018
<4>[  678.302665] Call Trace:
<4>[  678.302670]  dump_stack+0x67/0x9b
<4>[  678.302677]  panic+0x12b/0x29d
<4>[  678.302683]  watchdog+0x36c/0x610
<4>[  678.302687]  ? hungtask_pm_notify+0x40/0x40
<4>[  678.302690]  kthread+0x119/0x130
<4>[  678.302692]  ? kthread_park+0x80/0x80
<4>[  678.302696]  ret_from_fork+0x3a/0x50
<4>[  678.302701] 
<4>[  678.302702] ======================================================
<4>[  678.302702] WARNING: possible circular locking dependency detected
<4>[  678.302703] 5.0.0-rc5-CI-CI_DRM_5542+ #1 Tainted: G     U           
<4>[  678.302703] ------------------------------------------------------
<4>[  678.302704] rs:main Q:Reg/439 is trying to acquire lock:
<4>[  678.302704] 0000000076c65446 ((console_sem).lock){-.-.}, at: down_trylock+0xa/0x30
<4>[  678.302706] 
<4>[  678.302706] but task is already holding lock:
<4>[  678.302706] 000000001735c509 (&rq->lock){-.-.}, at: try_to_wake_up+0x1e0/0x5f0
<4>[  678.302708] 
<4>[  678.302708] which lock already depends on the new lock.
<4>[  678.302709] 
<4>[  678.302709] 
<4>[  678.302710] the existing dependency chain (in reverse order) is:
<4>[  678.302710] 
<4>[  678.302710] -> #2 (&rq->lock){-.-.}:
<4>[  678.302712]        task_fork_fair+0x36/0x160
<4>[  678.302712]        sched_fork+0x118/0x220
<4>[  678.302713]        copy_process.part.6+0x7b3/0x2220
<4>[  678.302713]        _do_fork+0xe2/0x6b0
<4>[  678.302713]        kernel_thread+0x20/0x30
<4>[  678.302714]        rest_init+0x1d/0x250
<4>[  678.302714]        start_kernel+0x499/0x4b9
<4>[  678.302715]        secondary_startup_64+0xa4/0xb0
<4>[  678.302715] 
<4>[  678.302715] -> #1 (&p->pi_lock){-.-.}:
<4>[  678.302717]        try_to_wake_up+0x37/0x5f0
<4>[  678.302717]        up+0x3b/0x50
<4>[  678.302717]        __up_console_sem+0x2e/0x50
<4>[  678.302718]        console_unlock+0x311/0x600
<4>[  678.302718]        vprintk_emit+0xfe/0x320
<4>[  678.302719]        printk+0x4d/0x69
<4>[  678.302719]        drm_dbg+0x7f/0x90
<4>[  678.302719]        drm_helper_probe_single_connector_modes+0x35a/0x6e0
<4>[  678.302720]        drm_setup_crtcs+0x156/0xc90
<4>[  678.302720]        __drm_fb_helper_initial_config_and_unlock+0x3e/0x570
<4>[  678.302721]        gen11_dsi_pre_enable+0x112f/0x1350 [i915]
<4>[  678.302721]        async_run_entry_fn+0x34/0x160
<4>[  678.302721]        process_one_work+0x245/0x610
<4>[  678.302722]        worker_thread+0x37/0x380
<4>[  678.302722]        kthread+0x119/0x130
<4>[  678.302722]        ret_from_fork+0x3a/0x50
<4>[  678.302723] 
<4>[  678.302723] -> #0 ((console_sem).lock){-.-.}:
<4>[  678.302724]        _raw_spin_lock_irqsave+0x33/0x50
<4>[  678.302725]        down_trylock+0xa/0x30
<4>[  678.302725]        __down_trylock_console_sem+0x20/0x80
<4>[  678.302726]        console_trylock+0xe/0x60
<4>[  678.302726]        vprintk_emit+0xf1/0x320
<4>[  678.302726]        printk+0x4d/0x69
<4>[  678.302727]        __warn_printk+0x46/0x90
<4>[  678.302727]        native_smp_send_reschedule+0x2f/0x40
<4>[  678.302727]        check_preempt_curr+0x81/0xa0
<4>[  678.302728]        ttwu_do_wakeup+0x14/0x220
<4>[  678.302728]        try_to_wake_up+0x216/0x5f0
<4>[  678.302729]        wake_up_q+0x4f/0x70
<4>[  678.302729]        futex_wake+0x152/0x170
<4>[  678.302729]        do_futex+0x45e/0xb10
<4>[  678.302730]        __se_sys_futex+0x12e/0x180
<4>[  678.302730]        do_syscall_64+0x55/0x190
<4>[  678.302730]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  678.302731] 
<4>[  678.302731] other info that might help us debug this:
<4>[  678.302731] 
<4>[  678.302732] Chain exists of:
<4>[  678.302732]   (console_sem).lock --> &p->pi_lock --> &rq->lock
<4>[  678.302734] 
<4>[  678.302735]  Possible unsafe locking scenario:
<4>[  678.302735] 
<4>[  678.302735]        CPU0                    CPU1
<4>[  678.302736]        ----                    ----
<4>[  678.302736]   lock(&rq->lock);
<4>[  678.302737]                                lock(&p->pi_lock);
<4>[  678.302738]                                lock(&rq->lock);
<4>[  678.302739]   lock((console_sem).lock);
<4>[  678.302740] 
<4>[  678.302740]  *** DEADLOCK ***
<4>[  678.302740]
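
For readers unfamiliar with this kind of report: lockdep records an ordering edge "A -> B" whenever lock B is taken while lock A is held, and warns when a new edge would close a cycle. The sketch below is illustrative Python, not kernel code. `find_cycle`, `existing` and `new_edge` are hypothetical names, and the edges are transcribed by hand from the #0 to #2 entries in the trace above.

```python
def find_cycle(edges):
    """Return one cycle in the lock-order graph as a list of lock names, or None."""
    graph = {}
    for held, wanted in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {name: UNSEEN for name in graph}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ON_PATH:
                # Reached a lock that is already on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNSEEN:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for name in graph:
        if state[name] == UNSEEN:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


# Existing chain from the report (#1 and #2): held lock -> lock taken next.
existing = [
    ("(console_sem).lock", "&p->pi_lock"),  # #1: console_unlock -> up -> try_to_wake_up
    ("&p->pi_lock", "&rq->lock"),           # #2: taken under pi_lock in the scheduler
]
# New edge from #0: printk (console_trylock) called while holding rq->lock.
new_edge = ("&rq->lock", "(console_sem).lock")

print(find_cycle(existing))               # no cycle yet
print(find_cycle(existing + [new_edge]))  # the new edge closes the loop
```

The existing chain alone is acyclic. It is the printk issued from `native_smp_send_reschedule` while `&rq->lock` is held that adds the closing edge.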
Comment 2 Martin Peres 2019-04-23 11:56:31 UTC
Raising to highest, since this is seen in BAT and circular locking dependency issues have a high user impact!
Comment 3 Martin Peres 2019-04-23 11:58:24 UTC
Changing the feature to display/Other since this is not only seen on DSI systems (also seen on shard-icl).
Comment 4 Jani Saarinen 2019-04-24 19:45:13 UTC
This has lately been seen only on ICL-Y, and on shards 2 months back. Why highest?
Comment 5 James Ausmus 2019-04-24 22:36:20 UTC
The ICL-U failures seen in the filter are not the circular locking issue that exists on the ICL-Y MIPI-DSI code paths.

The most recent ICL-U failures are almost 2 months old, and are GPU hangs, rather than circular locking issues. Recommend we split a separate bug for the ICL-U GPU hangs, and keep this one only for ICL-Y circular locking, then update the CI filters appropriately.
Comment 6 Jani Saarinen 2019-05-06 10:11:18 UTC
One BIOS setting was disabled to remove the IRQ storm seen on ICL-U, and a change has now also been made on ICL-Y, as no newer BIOS is available.
Comment 7 Martin Peres 2019-05-06 10:12:44 UTC
(In reply to James Ausmus from comment #5)
> The ICL-U failures seen in the filter are not the circular locking issue
> that exists on the ICL-Y MIPI-DSI code paths.
> 
> The most recent ICL-U failures are almost 2 months old, and are GPU hangs,
> rather than circular locking issues. Recommend we split a separate bug for
> the ICL-U GPU hangs, and keep this one only for ICL-Y circular locking, then
> update the CI filters appropriately.

Indeed, this issue was visible roughly once every 10 runs, and has then not reproduced for 239 runs (other failures are unrelated).
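
As a rough sanity check on those numbers, here is a minimal sketch. It assumes runs are independent and that the earlier failure rate was exactly 1 in 10; neither is established in this bug, and `prob_all_clean` is a hypothetical helper name.

```python
def prob_all_clean(failure_rate, runs):
    """Probability that `runs` independent runs all pass at the given failure rate."""
    return (1.0 - failure_rate) ** runs


# Assumed numbers: ~1 failure per 10 runs before, 239 clean runs since.
p = prob_all_clean(1 / 10, 239)
print(f"chance of 239 clean runs if nothing had changed: {p:.1e}")
```

Under those assumptions the probability is on the order of 1e-11, so a streak this long is very unlikely to be luck.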

I moved this test's incomplete to https://bugs.freedesktop.org/show_bug.cgi?id=107713, and we can close this bug.

Thanks!
Comment 8 CI Bug Log 2019-05-06 10:13:12 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
Comment 9 Jani Saarinen 2019-05-06 10:13:36 UTC
So no BIOS update has been delivered to us, but we verified that we implemented the recommended workaround for the IRQ storm: disable i2c 4 & 5 in the BIOS settings.

