Bug 108691

Summary: [CI][SHARDS] igt@gem_exec_reuse@baggage - dmesg-warn - ODEBUG: active_state not available
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: Chris Wilson <chris>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BXT i915 features:

Description Martin Peres 2018-11-08 09:00:34 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5097/shard-apl5/igt@gem_exec_reuse@baggage.html

<4> [809.271737] ------------[ cut here ]------------
<3> [809.272102] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
<4> [809.272195] WARNING: CPU: 3 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x67/0x90
<4> [809.272256] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 lpc_ich r8169 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei pinctrl_broxton pinctrl_intel prime_numbers
<4> [809.272463] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G     U            4.20.0-rc1-CI-CI_DRM_5097+ #1
<4> [809.272529] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0050.2018.0521.1533 05/21/2018
<4> [809.272589] RIP: 0010:debug_print_object+0x67/0x90
<4> [809.272627] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 eb 9d 22 02 8b 53 10 4c 89 e6 48 c7 c7 10 6f 0d 82 48 8b 14 d5 c0 d8 e6 81 e8 c9 7c be ff <0f> 0b 5b 83 05 93 ab e9 00 01 5d 41 5c c3 83 05 88 ab e9 00 01 c3
<4> [809.272758] RSP: 0018:ffff880277b83e60 EFLAGS: 00010082
<4> [809.272799] RAX: 0000000000000000 RBX: ffff880277b83e90 RCX: 0000000000000000
<4> [809.272852] RDX: 0000000000000007 RSI: ffffffff820c6458 RDI: 00000000ffffffff
<4> [809.272904] RBP: ffffffff822478a0 R08: 0000000000000000 R09: 0000000000000000
<4> [809.272957] R10: ffff880277b83df8 R11: ffffffff822459f8 R12: ffffffff820d70fe
<4> [809.273010] R13: ffffffff836e10c8 R14: 0000000000000206 R15: ffffffff8110d8fe
<4> [809.273064] FS:  0000000000000000(0000) GS:ffff880277b80000(0000) knlGS:0000000000000000
<4> [809.273123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [809.273167] CR2: 00007fa787ee8fd8 CR3: 00000002709aa000 CR4: 00000000003406e0
<4> [809.273220] Call Trace:
<4> [809.273242]  <IRQ>
<4> [809.273264]  debug_object_active_state+0x12c/0x160
<4> [809.273363]  ? i915_gem_free_object+0x110/0x110 [i915]
<4> [809.273405]  ? rcu_process_callbacks+0x3c0/0x850
<4> [809.273443]  rcu_process_callbacks+0x3c0/0x850
<4> [809.273483]  __do_softirq+0xd8/0x4b9
<4> [809.273517]  irq_exit+0xa9/0xc0
<4> [809.273545]  smp_apic_timer_interrupt+0x9c/0x250
<4> [809.273583]  apic_timer_interrupt+0xf/0x20
<4> [809.273616]  </IRQ>
<4> [809.273638] RIP: 0010:default_idle+0x17/0x170
<4> [809.273673] Code: ea 48 89 df 31 f6 5b 5d e9 96 8c c5 ff 90 90 90 90 90 90 41 55 41 54 55 53 e8 e5 af b4 ff 0f 1f 44 00 00 e8 bb 1e 81 ff fb f4 <e8> d4 af b4 ff 89 c5 0f 1f 44 00 00 5b 5d 41 5c 41 5d c3 65 8b 05
<4> [809.273803] RSP: 0018:ffffc90000097ec0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
<4> [809.273860] RAX: ffff880276b22740 RBX: 0000000000000003 RCX: 0000000000000000
<4> [809.273913] RDX: 0000000000000046 RSI: ffffffff8212883a RDI: ffffffff820d6d27
<4> [809.273966] RBP: ffffffff823364f0 R08: 0000000000000001 R09: 0000000000000000
<4> [809.274019] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4> [809.274072] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4> [809.274133]  do_idle+0x1b7/0x260
<4> [809.274164]  cpu_startup_entry+0x14/0x20
<4> [809.274198]  start_secondary+0x19d/0x1f0
<4> [809.274232]  secondary_startup_64+0xa4/0xb0
<4> [809.274272] irq event stamp: 6571677
<4> [809.274304] hardirqs last  enabled at (6571676): [<ffffffff81953d2c>] _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [809.274376] hardirqs last disabled at (6571677): [<ffffffff81953b9d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [809.274444] softirqs last  enabled at (6571124): [<ffffffff8108de6b>] irq_enter+0x6b/0x80
<4> [809.274505] softirqs last disabled at (6571125): [<ffffffff8108df29>] irq_exit+0xa9/0xc0
<4> [809.274566] WARNING: CPU: 3 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x67/0x90
<4> [809.274626] ---[ end trace 3bb17ca0421342a8 ]---
<4> [809.274674] 
<4> [809.274675] ======================================================
<4> [809.274676] WARNING: possible circular locking dependency detected
<4> [809.274677] 4.20.0-rc1-CI-CI_DRM_5097+ #1 Tainted: G     U           
<4> [809.274677] ------------------------------------------------------
<4> [809.274678] swapper/3/0 is trying to acquire lock:
<4> [809.274679] 0000000020a8dad2 ((console_sem).lock){-.-.}, at: down_trylock+0xa/0x30
<4> [809.274681] 
<4> [809.274682] but task is already holding lock:
<4> [809.274683] 00000000e8bedec7 (&obj_hash[i].lock){-.-.}, at: debug_object_active_state+0x97/0x160
<4> [809.274685] 
<4> [809.274686] which lock already depends on the new lock.
<4> [809.274687] 
<4> [809.274687] 
<4> [809.274688] the existing dependency chain (in reverse order) is:
<4> [809.274688] 
<4> [809.274689] -> #3 (&obj_hash[i].lock){-.-.}:
<4> [809.274691]        __debug_object_init+0x74/0x560
<4> [809.274692]        hrtimer_init+0x1b/0x100
<4> [809.274693]        init_dl_task_timer+0x17/0x30
<4> [809.274693]        __sched_fork.isra.16+0xa4/0xf0
<4> [809.274694]        init_idle+0x53/0x270
<4> [809.274694]        sched_init+0x421/0x471
<4> [809.274695]        start_kernel+0x272/0x4c9
<4> [809.274696]        secondary_startup_64+0xa4/0xb0
<4> [809.274696] 
<4> [809.274697] -> #2 (&rq->lock){-.-.}:
<4> [809.274699]        task_fork_fair+0x36/0x160
<4> [809.274700]        sched_fork+0x118/0x220
<4> [809.274700]        copy_process.part.6+0x7d0/0x2250
<4> [809.274701]        _do_fork+0xe2/0x6b0
<4> [809.274702]        kernel_thread+0x20/0x30
<4> [809.274702]        rest_init+0x1d/0x250
<4> [809.274703]        start_kernel+0x4a9/0x4c9
<4> [809.274703]        secondary_startup_64+0xa4/0xb0
<4> [809.274704] 
<4> [809.274704] -> #1 (&p->pi_lock){-.-.}:
<4> [809.274707]        try_to_wake_up+0x38/0x610
<4> [809.274707]        up+0x3b/0x50
<4> [809.274708]        __up_console_sem+0x2e/0x50
<4> [809.274709]        console_unlock+0x30b/0x5f0
<4> [809.274709]        con_font_op+0x31c/0x480
<4> [809.274710]        vt_ioctl+0x8a7/0x11d0
<4> [809.274711]        tty_ioctl+0xee/0x940
<4> [809.274711]        do_vfs_ioctl+0xa0/0x6e0
<4> [809.274712]        ksys_ioctl+0x35/0x60
<4> [809.274712]        __x64_sys_ioctl+0x11/0x20
<4> [809.274713]        do_syscall_64+0x55/0x190
<4> [809.274714]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [809.274714] 
<4> [809.274715] -> #0 ((console_sem).lock){-.-.}:
<4> [809.274717]        _raw_spin_lock_irqsave+0x33/0x50
<4> [809.274718]        down_trylock+0xa/0x30
<4> [809.274718]        __down_trylock_console_sem+0x20/0x80
<4> [809.274719]        console_trylock+0xe/0x60
<4> [809.274720]        vprintk_emit+0xf1/0x320
<4> [809.274720]        printk+0x4d/0x69
<4> [809.274721]        __warn_printk+0x46/0x90
<4> [809.274722]        debug_print_object+0x67/0x90
<4> [809.274722]        debug_object_active_state+0x12c/0x160
<4> [809.274723]        rcu_process_callbacks+0x3c0/0x850
<4> [809.274724]        __do_softirq+0xd8/0x4b9
<4> [809.274724]        irq_exit+0xa9/0xc0
<4> [809.274725]        smp_apic_timer_interrupt+0x9c/0x250
<4> [809.274726]        apic_timer_interrupt+0xf/0x20
<4> [809.274726]        default_idle+0x17/0x170
<4> [809.274727]        do_idle+0x1b7/0x260
<4> [809.274727]        cpu_startup_entry+0x14/0x20
<4> [809.274728]        start_secondary+0x19d/0x1f0
<4> [809.274729]        secondary_startup_64+0xa4/0xb0
<4> [809.274729] 
<4> [809.274730] other info that might help us debug this:
<4> [809.274730] 
<4> [809.274731] Chain exists of:
<4> [809.274731]   (console_sem).lock --> &rq->lock --> &obj_hash[i].lock
<4> [809.274735] 
<4> [809.274735]  Possible unsafe locking scenario:
<4> [809.274736] 
<4> [809.274736]        CPU0                    CPU1
<4> [809.274737]        ----                    ----
<4> [809.274738]   lock(&obj_hash[i].lock);
<4> [809.274739]                                lock(&rq->lock);
<4> [809.274741]                                lock(&obj_hash[i].lock);
<4> [809.274742]   lock((console_sem).lock);
<4> [809.274744] 
<4> [809.274744]  *** DEADLOCK ***
<4> [809.274745] 
<4> [809.274746] 1 lock held by swapper/3/0:
<4> [809.274746]  #0: 00000000e8bedec7 (&obj_hash[i].lock){-.-.}, at: debug_object_active_state+0x97/0x160
<4> [809.274749] 
<4> [809.274750] stack backtrace:
<4> [809.274751] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G     U            4.20.0-rc1-CI-CI_DRM_5097+ #1
<4> [809.274752] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0050.2018.0521.1533 05/21/2018
<4> [809.274752] Call Trace:
<4> [809.274753]  <IRQ>
<4> [809.274753]  dump_stack+0x67/0x9b
<4> [809.274754]  print_circular_bug.isra.16+0x1c8/0x2b0
<4> [809.274755]  __lock_acquire+0x183a/0x1b00
<4> [809.274755]  ? lock_acquire+0xa6/0x1c0
<4> [809.274756]  lock_acquire+0xa6/0x1c0
<4> [809.274757]  ? down_trylock+0xa/0x30
<4> [809.274757]  ? vprintk_emit+0xf1/0x320
<4> [809.274758]  _raw_spin_lock_irqsave+0x33/0x50
<4> [809.274758]  ? down_trylock+0xa/0x30
<4> [809.274759]  down_trylock+0xa/0x30
<4> [809.274760]  __down_trylock_console_sem+0x20/0x80
<4> [809.274760]  console_trylock+0xe/0x60
<4> [809.274761]  vprintk_emit+0xf1/0x320
<4> [809.274761]  ? rcu_process_callbacks+0x3ae/0x850
<4> [809.274762]  printk+0x4d/0x69
<4> [809.274763]  __warn_printk+0x46/0x90
<4> [809.274763]  ? lock_acquire+0xa6/0x1c0
<4> [809.274764]  debug_print_object+0x67/0x90
<4> [809.274765]  debug_object_active_state+0x12c/0x160
<4> [809.274765]  ? i915_gem_free_object+0x110/0x110 [i915]
<4> [809.274766]  ? rcu_process_callbacks+0x3c0/0x850
<4> [809.274767]  rcu_process_callbacks+0x3c0/0x850
<4> [809.274767]  __do_softirq+0xd8/0x4b9
<4> [809.274768]  irq_exit+0xa9/0xc0
<4> [809.274768]  smp_apic_timer_interrupt+0x9c/0x250
<4> [809.274769]  apic_timer_interrupt+0xf/0x20
<4> [809.274770]  </IRQ>
<4> [809.274770] RIP: 0010:default_idle+0x17/0x170
<4> [809.274772] Code: ea 48 89 df 31 f6 5b 5d e9 96 8c c5 ff 90 90 90 90 90 90 41 55 41 54 55 53 e8 e5 af b4 ff 0f 1f 44 00 00 e8 bb 1e 81 ff fb f4 <e8> d4 af b4 ff 89 c5 0f 1f 44 00 00 5b 5d 41 5c 41 5d c3 65 8b 05
<4> [809.274772] RSP: 0018:ffffc90000097ec0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
<4> [809.274774] RAX: ffff880276b22740 RBX: 0000000000000003 RCX: 0000000000000000
<4> [809.274775] RDX: 0000000000000046 RSI: ffffffff8212883a RDI: ffffffff820d6d27
<4> [809.274775] RBP: ffffffff823364f0 R08: 0000000000000001 R09: 0000000000000000
<4> [809.274776] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4> [809.274777] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4> [809.274778]  do_idle+0x1b7/0x260
<4> [809.274778]  cpu_startup_entry+0x14/0x20
<4> [809.274779]  start_secondary+0x19d/0x1f0
<4> [809.274779]  secondary_startup_64+0xa4/0xb0
Comment 1 Martin Peres 2018-11-08 09:02:56 UTC
This may be a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=108690.
Comment 2 Chris Wilson 2018-11-09 09:30:05 UTC
*** Bug 108673 has been marked as a duplicate of this bug. ***
Comment 3 Chris Wilson 2018-11-09 10:48:02 UTC
commit 8811d616dfaa8c6e1905a20ce0543ec401275997 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 9 09:03:11 2018 +0000

    drm/i915: Initialise the obj->rcu head
    
    Make the rcu_head known to the system, in particular for debugobjects.
    And having declared it for debugobjects, we need to tidy up afterwards.
    
    v2: mark the obj->rcu as being destroyed when we reuse its location for
    the freed list.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108691
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181109090311.15321-1-chris@chris-wilson.co.uk
Comment 4 Martin Peres 2018-11-13 15:40:27 UTC
(In reply to Chris Wilson from comment #3)
> commit 8811d616dfaa8c6e1905a20ce0543ec401275997 (HEAD ->
> drm-intel-next-queued, drm-intel/for-linux-next,
> drm-intel/drm-intel-next-queued)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Nov 9 09:03:11 2018 +0000
> 
>     drm/i915: Initialise the obj->rcu head
>     
>     Make the rcu_head known to the system, in particular for debugobjects.
>     And having declared it for debugobjects, we need to tidy up afterwards.
>     
>     v2: mark the obj->rcu as being destroyed when we reuse its location for
>     the freed list.
>     
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108691
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Link:
> https://patchwork.freedesktop.org/patch/msgid/20181109090311.15321-1-
> chris@chris-wilson.co.uk

This indeed fixed it, thanks!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.