Bug 111381

Summary: [CI][DRMTIP] igt@sw_sync@sync_multi_producer_single_consumer - incomplete - BUG: kernel NULL pointer dereference, address: 0000000000000000
Product: DRI
Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: CFL
i915 features:

Description Lakshmi 2019-08-12 14:04:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_341/fi-cfl-8109u/igt@sw_sync@sync_multi_producer_single_consumer.html
<6> [303.027820] Console: switching to colour dummy device 80x25
<6> [303.027989] [IGT] sw_sync: executing
<6> [303.041820] [IGT] sw_sync: starting subtest sync_multi_producer_single_consumer
<1> [303.379711] BUG: kernel NULL pointer dereference, address: 0000000000000000
<1> [303.379721] #PF: supervisor read access in kernel mode
<1> [303.379728] #PF: error_code(0x0000) - not-present page
<6> [303.379734] PGD 0 P4D 0 
<4> [303.379739] Oops: 0000 [#1] PREEMPT SMP PTI
<4> [303.379746] CPU: 3 PID: 1287 Comm: sw_sync Tainted: G     U            5.3.0-rc3-gc590f3dd9c36-drmtip_341+ #1
<4> [303.379756] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0056.2018.1128.1717 11/28/2018
<4> [303.379769] RIP: 0010:dma_fence_signal_locked+0x3e/0x1d0
<4> [303.379776] Code: c0 75 7c 4d 85 ed 0f 84 a4 00 00 00 f0 49 0f ba 6d 38 00 41 bf ea ff ff ff 0f 83 b3 00 00 00 49 8b 5d 10 4d 8d 75 10 4c 39 f3 <4c> 8b 23 48 89 dd 74 3f 48 89 ef e8 72 49 e3 ff 84 c0 74 0e 48 8b
<4> [303.379791] RSP: 0018:ffffb14bc096bd78 EFLAGS: 00010017
<4> [303.379796] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
<4> [303.379801] RDX: ffff95de6c485040 RSI: 00000000ffffffff RDI: 0000000000000046
<4> [303.379808] RBP: ffff95de644fd0c8 R08: 0000000000000000 R09: 0000000000000001
<4> [303.379816] R10: 000000003345050d R11: 0000000000000000 R12: ffff95de644fd078
<4> [303.379823] R13: ffff95de64a68008 R14: ffff95de64a68018 R15: 0000000000000000
<4> [303.379831] FS:  00007f663d899700(0000) GS:ffff95de75b80000(0000) knlGS:0000000000000000
<4> [303.379837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [303.379842] CR2: 0000000000000000 CR3: 00000002a3400001 CR4: 00000000003606e0
<4> [303.379848] Call Trace:
<4> [303.379854]  sync_timeline_signal+0xa3/0x1a0
<4> [303.379859]  sw_sync_ioctl+0x1a8/0x330
<4> [303.379866]  do_vfs_ioctl+0xa0/0x6f0
<4> [303.379872]  ? __fget+0x10f/0x200
<4> [303.379877]  ksys_ioctl+0x35/0x60
<4> [303.379882]  __x64_sys_ioctl+0x11/0x20
<4> [303.379888]  do_syscall_64+0x55/0x1c0
<4> [303.379894]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [303.379901] RIP: 0033:0x7f6649f5a5d7
<4> [303.379907] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4> [303.379925] RSP: 002b:00007f663d898c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [303.379934] RAX: ffffffffffffffda RBX: 0000000000000440 RCX: 00007f6649f5a5d7
<4> [303.379941] RDX: 00007f663d898c5c RSI: 0000000040045701 RDI: 0000000000000007
<4> [303.379949] RBP: 00007f663d898c5c R08: 0000000000000000 R09: 00007f663d899700
<4> [303.379957] R10: 0000000000000056 R11: 0000000000000246 R12: 0000000040045701
<4> [303.379964] R13: 0000000000000007 R14: 0000000000001000 R15: 00007fff0fc1cd34
<4> [303.379975] Modules linked in: mei_hdcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel x86_pkg_temp_thermal snd_intel_nhlt coretemp e1000e crct10dif_pclmul snd_hda_codec crc32_pclmul btusb btrtl btbcm ghash_clmulni_intel snd_hwdep btintel snd_hda_core bluetooth snd_pcm ptp pps_core ecdh_generic ecc mei_me mei prime_numbers
<0> [303.380016] Dumping ftrace buffer:
Comment 1 CI Bug Log 2019-08-12 14:06:46 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* CFL: igt@sw_sync@sync_multi_producer_single_consumer - incomplete - BUG: kernel NULL pointer dereference, address: 0000000000000000
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_341/fi-cfl-8109u/igt@sw_sync@sync_multi_producer_single_consumer.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_340/fi-cfl-8109u/igt@sw_sync@sync_multi_producer_single_consumer.html
Comment 2 Chris Wilson 2019-08-12 14:27:32 UTC
dma_fence_signal_locked+0x3e

int dma_fence_signal_locked(struct dma_fence *fence)
{
        struct dma_fence_cb *cur, *tmp;
        int ret = 0;

        lockdep_assert_held(fence->lock);

        if (WARN_ON(!fence))
                return -EINVAL;

        if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
                ret = -EINVAL;

                /*
                 * we might have raced with the unlocked dma_fence_signal,
                 * still run through all callbacks
                 */
        } else {
                fence->timestamp = ktime_get();
                set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
                trace_dma_fence_signaled(fence);
        }

        list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
                list_del_init(&cur->node);
                cur->func(fence, cur);
        }
        return ret;
}

It doesn't make much sense to hit a NULL later in the function, except at, say, the fence->cb_list. However, my kernel puts 0x3e at the test_and_set_bit().

A sensible guess would be that an element was freed from the cb_list.
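
For what it's worth, the Code: bytes in the oops seem to fit the cb_list theory. Read by hand, `49 8b 5d 10` loads [r13+0x10] into rbx, `4d 8d 75 10` puts r13+0x10 into r14 (the dump shows R14 = R13 + 0x10), and the faulting `<4c> 8b 23` reads through rbx, which is 0. That is the shape of a "safe" list walk fetching the first node's successor before the loop condition is evaluated, with a head whose next pointer is NULL. The user-space sketch below only models that setup step. `struct list_head` here is a local stand-in, and `safe_walk_would_fault()` and `demo_clobbered_head()` are hypothetical helpers, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/*
 * Mirrors the setup step of a "safe" list walk: the first node and its
 * successor are both loaded before the loop condition compares the
 * first node against the head. Returns 1 when that second load would
 * read through a NULL pointer, 0 otherwise. (Hypothetical helper.)
 */
static int safe_walk_would_fault(const struct list_head *head)
{
    const struct list_head *cur = head->next;   /* first load: head->next */

    return cur == NULL;                         /* tmp = cur->next would fault */
}

/* Returns 0 if a healthy head and a clobbered head behave as described. */
static int demo_clobbered_head(void)
{
    struct list_head ok = { &ok, &ok };          /* initialised, empty list */
    struct list_head clobbered = { NULL, NULL }; /* e.g. memory reused after free */

    if (safe_walk_would_fault(&ok))
        return 1;
    if (!safe_walk_would_fault(&clobbered))
        return 2;
    return 0;
}
```

An initialised but empty head points at itself, so the walk ends cleanly. A NULL next pointer suggests the memory holding the head was overwritten rather than the list simply being empty.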
Comment 3 Chris Wilson 2019-08-12 15:47:02 UTC
https://patchwork.freedesktop.org/series/65092/
Comment 4 Chris Wilson 2019-08-13 06:59:37 UTC
commit d3c6dd1fb30d3853c2012549affe75c930f4a2f9 (HEAD -> drm-misc-next, drm-misc/for-linux-next, drm-misc/drm-misc-next)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 12 16:42:47 2019 +0100

    dma-buf/sw_sync: Synchronize signal vs syncpt free
    
    During release of the syncpt, we remove it from the list of syncpt and
    the tree, but only if it is not already been removed. However, during
    signaling, we first remove the syncpt from the list. So, if we
    concurrently free and signal the syncpt, the free may decide that it is
    not part of the tree and immediately free itself -- meanwhile the
    signaler goes on to use the now freed datastructure.
    
    In particular, we get struck by commit 0e2f733addbf ("dma-buf: make
    dma_fence structure a bit smaller v2") as the cb_list is immediately
    clobbered by the kfree_rcu.
    
    v2: Avoid calling into timeline_fence_release() from under the spinlock
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111381
    Fixes: d3862e44daa7 ("dma-buf/sw-sync: Fix locking around sync_timeline lists")
    References: 0e2f733addbf ("dma-buf: make dma_fence structure a bit smaller v2")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Sumit Semwal <sumit.semwal@linaro.org>
    Cc: Sean Paul <seanpaul@chromium.org>
    Cc: Gustavo Padovan <gustavo@padovan.org>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: <stable@vger.kernel.org> # v4.14+
    Acked-by: Christian König <christian.koenig@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190812154247.20508-1-chris@chris-wilson.co.uk
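
The ordering problem described in the commit message can be sketched in user space. This is not the patch itself, only a minimal model under stated assumptions: `struct timeline`, `struct point`, `point_put()`, `point_get_unless_zero()` and `timeline_signal()` are hypothetical names, and a tiny atomic_flag spinlock stands in for the kernel spinlock. The idea shown is that the signaller pins each point before unlinking it and drops that pin only after releasing the lock, while the release path unlinks under the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct point;
struct timeline {
    atomic_flag lock;        /* toy spinlock standing in for the real one */
    struct point *head;      /* points still waiting to be signalled */
};
struct point {
    atomic_int refs;
    struct timeline *tl;
    struct point *next;
    bool linked;             /* true while on tl->head, guarded by tl->lock */
    unsigned int seqno;
};

static atomic_int freed_points;   /* lets the demo check how many frees ran */

static void tl_lock(struct timeline *tl)
{
    while (atomic_flag_test_and_set_explicit(&tl->lock, memory_order_acquire))
        ;                    /* spin */
}

static void tl_unlock(struct timeline *tl)
{
    atomic_flag_clear_explicit(&tl->lock, memory_order_release);
}

/* Caller must hold tl->lock. */
static void unlink_locked(struct point *pt)
{
    struct point **pp = &pt->tl->head;

    while (*pp && *pp != pt)
        pp = &(*pp)->next;
    if (*pp)
        *pp = pt->next;
    pt->linked = false;
}

/* Release path: on the last reference, unlink under the lock, then free. */
static void point_put(struct point *pt)
{
    if (atomic_fetch_sub(&pt->refs, 1) != 1)
        return;
    tl_lock(pt->tl);
    if (pt->linked)
        unlink_locked(pt);
    tl_unlock(pt->tl);
    free(pt);
    atomic_fetch_add(&freed_points, 1);
}

/* Take a reference only if the point is not already being released. */
static bool point_get_unless_zero(struct point *pt)
{
    int old = atomic_load(&pt->refs);

    while (old > 0)
        if (atomic_compare_exchange_weak(&pt->refs, &old, old + 1))
            return true;
    return false;
}

/* Signal path: pin before unlinking, drop the pin after unlocking. */
static void timeline_signal(struct timeline *tl, unsigned int upto)
{
    struct point *done = NULL, *pt, *next;

    tl_lock(tl);
    for (pt = tl->head; pt; pt = next) {
        next = pt->next;
        if (pt->seqno > upto || !point_get_unless_zero(pt))
            continue;        /* not due yet, or the release path owns it */
        unlink_locked(pt);
        pt->next = done;     /* park on a private list */
        done = pt;
    }
    tl_unlock(tl);

    while (done) {           /* the final put may free, so do it unlocked */
        pt = done;
        done = pt->next;
        point_put(pt);
    }
}

/* Returns 0 when the point survives signalling and is freed exactly once. */
static int demo_signal_then_release(void)
{
    struct timeline tl = { ATOMIC_FLAG_INIT, NULL };
    struct point *pt = calloc(1, sizeof(*pt));
    int before = atomic_load(&freed_points);

    if (!pt)
        return -1;
    atomic_init(&pt->refs, 1);
    pt->tl = &tl;
    pt->seqno = 1;
    pt->linked = true;
    tl.head = pt;

    timeline_signal(&tl, 1); /* unlinks; the owner's reference keeps pt alive */
    if (tl.head != NULL || atomic_load(&freed_points) != before)
        return 1;
    point_put(pt);           /* last reference: freed here, and only here */
    return atomic_load(&freed_points) == before + 1 ? 0 : 2;
}
```

Without the pin, a release racing with the signaller could find the point already unlinked, skip the lock-protected removal, and free it while the signaller still holds a pointer to it.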
Comment 5 CI Bug Log 2019-08-23 11:25:45 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* CFL: igt@gem_ctx_switch@legacy-render - incomplete
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-cml-u/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-cml-u2/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-icl-dsi/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-icl-guc/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-icl-u2/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-icl-u3/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-icl-u4/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14044/fi-skl-gvtdvm/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4806/fi-bsw-n3050/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4810/fi-bsw-n3050/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4816/fi-bsw-n3050/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4818/fi-bsw-n3050/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4828/fi-bsw-kefka/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4828/fi-bsw-n3050/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4820/fi-bsw-n3050/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4820/fi-icl-guc/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4830/fi-bsw-kefka/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4830/fi-bsw-n3050/igt@gem_ctx_switch@legacy-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6755/fi-cfl-8109u/igt@gem_ctx_switch@legacy-render.html
Comment 6 Chris Wilson 2019-08-23 12:05:52 UTC
(In reply to CI Bug Log from comment #5)
> The CI Bug Log issue associated to this bug has been updated.
> 
> ### New filters associated
> 
> * CFL: igt@gem_ctx_switch@legacy-render - incomplete
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4795/fi-cml-u/igt@gem_ctx_switch@legacy-render.html

I don't see the connection.
Comment 7 CI Bug Log 2019-10-29 17:02:21 UTC
A CI Bug Log filter associated to this bug has been updated:

{- CFL: igt@gem_ctx_switch@legacy-render - incomplete -}
{+ CFL: igt@gem_ctx_switch@legacy-render - incomplete +}


  No new failures caught with the new filter
Comment 8 Lakshmi 2019-10-29 17:09:34 UTC
(In reply to CI Bug Log from comment #5)

These failures are unrelated to this bug. The original issue is otherwise fixed according to the 10x rule. Closing and archiving the issue.
Comment 9 CI Bug Log 2019-10-29 17:09:54 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
