Bug 110550 - [CI][SHARDS] igt@i915_selftest@mock_requests - incomplete - ODEBUG: free active (active state 0) object type: work_struct hint: __i915_gem_free_work+0x0/0x90 [i915]
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-29 14:30 UTC by Lakshmi
Modified: 2019-07-04 12:48 UTC
CC List: 1 user

See Also:
i915 platform: SKL
i915 features: GEM/Other


Description Lakshmi 2019-04-29 14:30:27 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6004/shard-skl6/igt@i915_selftest@mock_requests.html

<4> [135.075742]  ? i915_gem_free_object+0x110/0x110 [i915]
<4> [135.076399]  ? i915_gem_context_free+0xc1/0x240 [i915]
<4> [135.076878]  ? i915_gem_context_free+0xc1/0x240 [i915]
<4> [135.077208]  i915_gem_context_free+0xc1/0x240 [i915]
<3> [135.235091] ODEBUG: free active (active state 0) object type: work_struct hint: __i915_gem_free_work+0x0/0x90 [i915]
<4> [135.235654] CPU: 0 PID: 1037 Comm: i915_selftest Tainted: G     U  W         5.1.0-rc6-CI-CI_DRM_6004+ #1
<4> [135.237482]  i915_request_mock_selftests+0x2a/0x70 [i915]
<4> [135.238561]  i915_mock_selftests+0x27/0x50 [i915]
<4> [135.238989]  i915_init+0x12/0x73 [i915]
<4> [135.240929] i915_selftest/1037 is trying to acquire lock:
<4> [135.241228]        i915_request_mock_selftests+0x2a/0x70 [i915]
<4> [135.241236]        i915_mock_selftests+0x27/0x50 [i915]
<4> [135.241240]        i915_init+0x12/0x73 [i915]
<4> [135.241358] 1 lock held by i915_selftest/1037:
<4> [135.241386] CPU: 0 PID: 1037 Comm: i915_selftest Tainted: G     U  W         5.1.0-rc6-CI-CI_DRM_6004+ #1
<4> [135.241455]  ? __i915_gem_free_objects+0x720/0x720 [i915]
<4> [135.241462]  ? __i915_gem_free_objects+0x720/0x720 [i915]
<4> [135.241491]  i915_request_mock_selftests+0x2a/0x70 [i915]
<4> [135.241503]  i915_mock_selftests+0x27/0x50 [i915]
<4> [135.241506]  i915_init+0x12/0x73 [i915]
Comment 2 Chris Wilson 2019-04-29 15:37:34 UTC
First time without a prior bug. The essence of the bug is that we have an object freed via RCU at the same time as we are trying to flush the free workqueue, which the workqueue code objects to for no clear reason. In this case, maybe mock is a little too quick with its drain? Normally we only drain on module unload.
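To make the race concrete, here is a toy userspace model in C. Every name in it (`toy_wq`, `toy_queue_work()`, `toy_drain_racing()` and so on) is hypothetical; it is not the kernel workqueue or RCU API. It only mimics the behaviour described in this bug: the workqueue code refuses work queued while a drain is in progress, so an RCU callback that fires mid-drain and tries to enqueue the free worker gets rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the race; not the kernel workqueue/RCU API. */
struct toy_wq {
    int pending;     /* work items queued and not yet run */
    bool draining;   /* true while a drain is in progress */
    int rejected;    /* enqueues refused because of draining */
};

struct toy_rcu {
    int callbacks;   /* RCU-deferred frees waiting for a grace period */
};

static void toy_queue_work(struct toy_wq *wq)
{
    if (wq->draining) {
        wq->rejected++;  /* the real workqueue code complains here */
        return;
    }
    wq->pending++;
}

/* A grace period elapses: each deferred free queues the free worker. */
static void toy_grace_period(struct toy_rcu *rcu, struct toy_wq *wq)
{
    while (rcu->callbacks > 0) {
        rcu->callbacks--;
        toy_queue_work(wq);
    }
}

/* Drain the queue while a grace period elapses concurrently. */
static void toy_drain_racing(struct toy_wq *wq, struct toy_rcu *rcu)
{
    assert(!wq->draining);        /* drains must not nest */
    wq->draining = true;
    wq->pending = 0;              /* everything already queued runs */
    toy_grace_period(rcu, wq);    /* RCU callback fires mid-drain */
    wq->draining = false;
}
```

If the grace period is allowed to complete before the drain starts (call `toy_grace_period()` first), the drain sees nothing late and `rejected` stays at 0; that ordering is what the fixes below try to enforce.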
Comment 3 Chris Wilson 2019-05-01 19:44:00 UTC
commit dc76e5764a46ffb2e7f502a86b3288b5edcce191 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 1 14:57:51 2019 +0100

    drm/i915: Complete both freed-object passes before draining the workqueue
    
    The workqueue code complains viciously if we try to queue more work onto
    the queue while attempting to drain it. As we asynchronously free
    objects and defer their enqueuing with RCU, it is quite tricky to
    quiesce the system before attempting to drain the workqueue. Yet drain
    we must to ensure that the worker is idle before unloading the module.
    
    Give the freed-object drain 3 whole passes, with multiple rcu_barrier()
    calls, to allow for deferred freeing across several levels. Each level is
    protected by RCU and needs a grace period before its parent can be freed,
    ultimately resulting in a GEM object being freed after another RCU period.
    
    A consequence is that it will make module unload even slower.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110550
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190501135753.8711-1-chris@chris-wilson.co.uk
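The multi-pass idea in this commit can be illustrated with another toy model (hypothetical names throughout, not the i915 or kernel API). Freeing one level only schedules an RCU-deferred free of the next level, so each cleanup pass of "wait for a grace period, then run the free worker" peels off one level. With a three-level chain, fewer than three passes leave a callback pending that fires during the final drain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of nested RCU-deferred frees; not kernel code. */
struct toy {
    int rcu_pending;   /* callbacks waiting for a grace period */
    int work_pending;  /* free-worker runs queued on the workqueue */
    int levels_left;   /* nested levels still to free after the current one */
    bool draining;     /* true while toy_drain() runs */
    int rejected;      /* enqueues refused because of draining */
};

static void toy_queue_work(struct toy *t)
{
    if (t->draining) {
        t->rejected++;  /* the real workqueue code complains here */
        return;
    }
    t->work_pending++;
}

/* rcu_barrier() stand-in: every pending callback runs and queues the worker. */
static void toy_rcu_barrier(struct toy *t)
{
    while (t->rcu_pending > 0) {
        t->rcu_pending--;
        toy_queue_work(t);
    }
}

/* Run the free worker; freeing a level defers its child by another grace period. */
static void toy_run_free_worker(struct toy *t)
{
    while (t->work_pending > 0) {
        t->work_pending--;
        if (t->levels_left > 0) {
            t->levels_left--;
            t->rcu_pending++;
        }
    }
}

/* Drain; models a grace period that happens to elapse mid-drain. */
static void toy_drain(struct toy *t)
{
    assert(!t->draining);
    t->draining = true;
    toy_run_free_worker(t);
    toy_rcu_barrier(t);
    t->draining = false;
}

/* Cleanup: several barrier + free passes, then the final drain. */
static void toy_cleanup(struct toy *t, int passes)
{
    for (int i = 0; i < passes; i++) {
        toy_rcu_barrier(t);
        toy_run_free_worker(t);
    }
    toy_drain(t);
}
```

A `struct toy` starting with `.rcu_pending = 1, .levels_left = 2` (a three-level chain) drains cleanly after `toy_cleanup(&t, 3)`, while `toy_cleanup(&t, 1)` leaves one enqueue rejected during the drain and one level still unfreed. The toy's depth is an assumption chosen to match the three passes; the commit does not state how deep the real chain is.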
Comment 4 Lakshmi 2019-05-23 14:50:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6092/shard-skl6/igt@i915_selftest@mock_objects.html

<4> [2092.272416]  ? i915_gem_free_object+0x110/0x110 [i915]
<3> [2092.307531] ODEBUG: free active (active state 0) object type: work_struct hint: __i915_gem_free_work+0x0/0x90 [i915]
<4> [2092.308094] CPU: 3 PID: 4341 Comm: i915_selftest Tainted: G     U  W         5.1.0-CI-CI_DRM_6092+ #1
<4> [2092.309901]  i915_gem_object_mock_selftests+0x34/0x40 [i915]
<4> [2092.311000]  i915_mock_selftests+0x27/0x50 [i915]
<4> [2092.311461]  i915_init+0x12/0x73 [i915]
<4> [2092.313570] i915_selftest/4341 is trying to acquire lock:
<4> [2092.313770]        drm_dbg+0x7f/0x90
<4> [2092.313903]        i915_gem_object_mock_selftests+0x34/0x40 [i915]
<4> [2092.313912]        i915_mock_selftests+0x27/0x50 [i915]
<4> [2092.313916]        i915_init+0x12/0x73 [i915]
<4> [2092.314053] 1 lock held by i915_selftest/4341:
<4> [2092.314084] CPU: 3 PID: 4341 Comm: i915_selftest Tainted: G     U  W         5.1.0-CI-CI_DRM_6092+ #1
<4> [2092.314159]  ? __i915_gem_free_objects+0x720/0x720 [i915]
<4> [2092.314167]  ? __i915_gem_free_objects+0x720/0x720 [i915]
<4> [2092.314198]  i915_gem_object_mock_selftests+0x34/0x40 [i915]
<4> [2092.314210]  i915_mock_selftests+0x27/0x50 [i915]
<4> [2092.314214]  i915_init+0x12/0x73 [i915]
Comment 5 CI Bug Log 2019-05-23 14:51:39 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@i915_selftest@mock_requests - incomplete - ODEBUG: free active (active state 0) object type: work_struct hint: __i915_gem_free_work+0x0/0x90 [i915] -}
{+ SKL: igt@i915_selftest@mock_requests|objects - incomplete - ODEBUG: free active (active state 0) object type: work_struct hint: __i915_gem_free_work+0x0/0x90 [i915] +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6092/shard-skl6/igt@i915_selftest@mock_objects.html
Comment 6 Chris Wilson 2019-07-04 12:48:55 UTC
Another stab,

commit 4fda44bf16b79a0b78fe36c6b9859e9ce2d09f43 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 3 18:19:13 2019 +0100

    drm/i915: Flush the workqueue before draining
    
    Trying to drain a workqueue while we may still be adding to it from
    background tasks is, according to kernel/workqueue.c, verboten. So, add
    a flush_workqueue() at the start of our cleanup procedure.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=110550
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190703171913.16585-4-chris@chris-wilson.co.uk
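The ordering change can be sketched the same way (a hypothetical toy, not the real i915 cleanup code). Here a background item already sitting on the workqueue drops the last reference to an object whose free is then deferred by RCU. Flushing first lets that item run, so its RCU callback is caught by the barrier instead of firing while the workqueue is being drained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not the i915 cleanup code or kernel API. */
struct toy {
    int bg_pending;    /* background items already queued on the workqueue */
    int rcu_pending;   /* RCU-deferred frees waiting for a grace period */
    int free_pending;  /* free-worker runs queued on the workqueue */
    int freed;         /* objects actually freed */
    bool draining;     /* true while toy_drain_workqueue() runs */
    int rejected;      /* enqueues refused because of draining */
};

static void toy_queue_free(struct toy *t)
{
    if (t->draining) {
        t->rejected++;  /* the real workqueue code complains here */
        return;
    }
    t->free_pending++;
}

static void toy_rcu_barrier(struct toy *t)
{
    while (t->rcu_pending > 0) {
        t->rcu_pending--;
        toy_queue_free(t);
    }
}

/* Only the free worker: stands in for draining freed objects. */
static void toy_drain_freed_objects(struct toy *t)
{
    while (t->free_pending > 0) {
        t->free_pending--;
        t->freed++;
    }
}

/* Run everything queued. A background item drops the last reference to an
 * object, whose free is then deferred by RCU. */
static void toy_flush_workqueue(struct toy *t)
{
    while (t->bg_pending > 0) {
        t->bg_pending--;
        t->rcu_pending++;
    }
    toy_drain_freed_objects(t);
}

/* Drain; models a grace period that happens to elapse mid-drain. */
static void toy_drain_workqueue(struct toy *t)
{
    assert(!t->draining);
    t->draining = true;
    toy_flush_workqueue(t);
    toy_rcu_barrier(t);
    t->draining = false;
}

static void toy_cleanup(struct toy *t, bool flush_first)
{
    if (flush_first)
        toy_flush_workqueue(t);  /* flush at the start of cleanup */
    toy_rcu_barrier(t);
    toy_drain_freed_objects(t);
    toy_drain_workqueue(t);
}
```

With one background item queued, `toy_cleanup(&t, false)` ends with one rejected enqueue and nothing freed, whereas `toy_cleanup(&t, true)` frees the object and rejects nothing. The toy flushes once, at the start of cleanup, following the commit message's wording.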

