https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5188/shard-skl1/igt@perf@gen8-unprivileged-single-ctx-counters.html

Starting subtest: gen8-unprivileged-single-ctx-counters
Received signal SIGQUIT.
Stack trace:
 #0 [fatal_sig_handler+0xd6]
 #1 [killpg+0x40]
 #2 [waitpid+0x12]
 #3 [__waitpid+0x46]
 #4 [igt_wait_helper+0x11]
 #5 [gen8_test_single_ctx_render_target_writes_a_counter+0x168]
 #6 [__real_main4067+0x160e]
 #7 [main+0x27]
 #8 [__libc_start_main+0xe7]
 #9 [_start+0x2a]
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SKL: igt@perf@gen8-unprivileged-single-ctx-counters - timeout - Received signal SIGQUIT.
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5188/shard-skl1/igt@perf@gen8-unprivileged-single-ctx-counters.html
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@perf@gen8-unprivileged-single-ctx-counters - timeout - Received signal SIGQUIT. -}
{+ SKL: igt@perf@gen8-unprivileged-single-ctx-counters| igt@gem_softpin@softpin| igt@gem_pipe_control_store_loop@reused-buffer - timeout - Received signal SIGQUIT. +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6977/shard-skl7/igt@gem_pipe_control_store_loop@reused-buffer.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6978/shard-skl6/igt@gem_softpin@softpin.html
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@perf@gen8-unprivileged-single-ctx-counters| igt@gem_softpin@softpin| igt@gem_pipe_control_store_loop@reused-buffer - timeout - Received signal SIGQUIT. -}
{+ SKL: random tests - timeout - Received signal SIGQUIT. +}

No new failures caught with the new filter.
This issue has been observed 1-2 times per week since Sep 17, 2019 on different SKL shard machines. The most recent log suggests a deadlock between some of the threads.
Showing all locks held in the system:

<4> [555.972811] 1 lock held by khungtaskd/34:
<4> [555.972830]  #0: ffffffff82247a60 (rcu_read_lock){....}, at: debug_show_all_locks+0xe/0x1a0
<4> [555.972900] 3 locks held by kworker/3:2/160:
<4> [555.972919]  #0: ffff88817a08f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [555.972972]  #1: ffffc90000f17e78 ((work_completion)(&(&engine->heartbeat.work)->work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [555.973024]  #2: ffff888172908e70 (&kernel#2){+.+.}, at: heartbeat+0x3f7/0x5f0 [i915]
<4> [555.973427] 4 locks held by kworker/3:3/392:
<4> [555.973445]  #0: ffff88817a08f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [555.973497]  #1: ffffc9000034be78 ((work_completion)(&(&gt->requests.retire_work)->work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [555.973547]  #2: ffff88817993ac70 (&timeline->mutex){+.+.}, at: intel_gt_retire_requests_timeout+0xe2/0x540 [i915]
<4> [555.973857]  #3: ffff88816e17d928 (&ce->pin_mutex/1){+.+.}, at: intel_context_unpin+0x54/0x100 [i915]
<4> [555.974242] 4 locks held by in:imklog/403:
<4> [555.974262] 4 locks held by rs:main Q:Reg/404:
<4> [555.974288] 4 locks held by dmesg/963:
<4> [555.974308] 3 locks held by kworker/1:1H/1084:
<4> [555.974324]  #0: ffff888179630f28 ((wq_completion)kblockd){+.+.}, at: process_one_work+0x1de/0x620
<4> [555.974368]  #1: ffffc9000070be78 ((work_completion)(&(&hctx->run_work)->work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [555.974415]  #2: ffff888176da6e58 (hctx->srcu){....}, at: hctx_lock+0x5b/0xe0
<4> [555.974480] 5 locks held by perf/1180:
<4> [555.974499]  #0: ffff88815810ba80 (&perf->lock){+.+.}, at: i915_perf_open_ioctl+0x242/0x6f0 [i915]
<4> [555.974873]  #1: ffff888170b93478 (&ctx->engines_mutex){+.+.}, at: lrc_configure_all_contexts+0x26b/0x880 [i915]
<4> [555.975330]  #2: ffff88816e17d928 (&ce->pin_mutex){+.+.}, at: lrc_configure_all_contexts+0x2d7/0x880 [i915]
<4> [555.975693]  #3: ffff888172908e70 (&kernel#2){+.+.}, at: i915_request_create+0x16/0x1b0 [i915]
<4> [555.976136]  #4: ffff88817993ac70 (&timeline->mutex/1){+.+.}, at: intel_context_prepare_remote_request+0x3f/0x180 [i915]
<4> [555.976485]

Looks like lock inversion between the loop to update all contexts and timeline retiring.
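For reference, the inversion visible in the dump is the classic AB-BA pattern: the retire worker holds &timeline->mutex and then waits on &ce->pin_mutex, while the perf open path holds &ce->pin_mutex and waits on &timeline->mutex. A minimal userspace sketch of that pattern (toy pthread locks standing in for the i915 mutexes, not the actual driver code) hangs the same way:

```c
/*
 * Toy reproduction of the AB-BA ordering seen in the lock dump.
 * pin_mutex / timeline_mutex are hypothetical stand-ins for
 * &ce->pin_mutex and &timeline->mutex; build with: gcc -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t pin_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t timeline_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Mimics the retire worker: timeline->mutex first, then ce->pin_mutex. */
static void *retire_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&timeline_mutex);
	sleep(1);                       /* widen the race window */
	pthread_mutex_lock(&pin_mutex); /* blocks while the perf thread holds it */
	puts("retire: acquired both locks");
	pthread_mutex_unlock(&pin_mutex);
	pthread_mutex_unlock(&timeline_mutex);
	return NULL;
}

/* Mimics the perf open path: ce->pin_mutex first, then timeline->mutex. */
static void *perf_configure(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pin_mutex);
	sleep(1);
	pthread_mutex_lock(&timeline_mutex); /* blocks while the retire thread holds it */
	puts("perf: acquired both locks");
	pthread_mutex_unlock(&timeline_mutex);
	pthread_mutex_unlock(&pin_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, retire_worker, NULL);
	pthread_create(&b, NULL, perf_configure, NULL);
	pthread_join(a, NULL); /* never returns: each thread waits on the other's lock */
	pthread_join(b, NULL);
	return 0;
}
```

Once each thread has taken its first lock, neither can make progress, which is exactly the hang khungtaskd reports above.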
*** Bug 112364 has been marked as a duplicate of this bug. ***
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BLB GLK: Random tests - timeout - Showing all locks held in the system
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_5357/shard-glk5/igt@perf@oa-exponents.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5303/shard-glk5/igt@perf@oa-exponents.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_408/fi-blb-e6850/igt@gem_linear_blits@normal.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5300/shard-glk2/igt@perf@gen8-unprivileged-single-ctx-counters.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15390/shard-glk2/igt@perf@gen8-unprivileged-single-ctx-counters.html
A CI Bug Log filter associated to this bug has been updated:

{- BLB GLK: Random tests - timeout - Showing all locks held in the system -}
{+ BLB APL SKL KBL GLK ICL: Random tests - timeout - Received signal SIGQUIT, Showing all locks held in the system +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7417/shard-iclb5/igt@perf@oa-exponents.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7418/shard-iclb1/igt@perf@oa-exponents.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7419/shard-apl7/igt@perf@oa-exponents.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7419/shard-kbl7/igt@perf@oa-exponents.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7419/shard-skl8/igt@perf@oa-exponents.html
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* APL KBL: igt@runner@aborted - fail - Previous test: perf (oa-exponents)
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_5366/shard-apl6/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15413/shard-kbl2/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15322/shard-kbl1/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15422/shard-kbl2/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15423/shard-kbl6/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15426/shard-kbl6/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15428/shard-kbl1/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_5371/shard-kbl6/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7419/shard-apl7/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7419/shard-kbl7/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7420/shard-kbl2/igt@runner@aborted.html
commit 58b4c1a07ada7fb91ea757bdb9bd47df02207357 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 26 06:55:21 2019 +0000

    drm/i915: Reduce nested prepare_remote_context() to a trylock

    On context retiring, we may invoke the kernel_context to unpin this
    context. Elsewhere, we may use the kernel_context to modify this
    context. This currently leads to an AB-BA lock inversion, so we need
    to back-off from the contended lock, and repeat.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111732
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Fixes: a9877da2d629 ("drm/i915/oa: Reconfigure contexts on the fly")
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191126065521.2331017-1-chris@chris-wilson.co.uk
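The commit message describes the strategy rather than the code: instead of blocking on the inner lock in the inverted order, the contended path only trylocks it and, on failure, drops what it holds and repeats. A rough sketch of that back-off idea, again on hypothetical pthread stand-ins rather than the real prepare_remote_context() path:

```c
/*
 * Sketch of the back-off strategy from the commit message, using the
 * same hypothetical pthread stand-ins as the earlier example (this is
 * not the actual i915 implementation).
 */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_mutex_t pin_mutex = PTHREAD_MUTEX_INITIALIZER;      /* stand-in for &ce->pin_mutex */
static pthread_mutex_t timeline_mutex = PTHREAD_MUTEX_INITIALIZER; /* stand-in for &timeline->mutex */

static void configure_with_backoff(void)
{
	for (;;) {
		pthread_mutex_lock(&pin_mutex);

		/* Only trylock the inner mutex; never wait in the inverted order. */
		if (pthread_mutex_trylock(&timeline_mutex) == 0) {
			puts("configure: acquired both locks without risking deadlock");
			pthread_mutex_unlock(&timeline_mutex);
			pthread_mutex_unlock(&pin_mutex);
			return;
		}

		/* Contended: drop everything, let the other path finish, retry. */
		pthread_mutex_unlock(&pin_mutex);
		sched_yield();
	}
}

int main(void)
{
	configure_with_backoff();
	return 0;
}
```

With this shape, a concurrent path that takes the locks in the opposite order can always complete, because the contended side never waits while holding its outer lock.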
A CI Bug Log filter associated to this bug has been updated:

{- BLB APL SKL KBL GLK ICL: Random tests - timeout - Received signal SIGQUIT, Showing all locks held in the system -}
{+ BLB APL SKL KBL GLK ICL: Random tests - timeout - Received signal SIGQUIT, Showing all locks held in the system +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7423/shard-skl10/igt@perf@enable-disable.html