Bug 112222 - [CI][BAT]igt@i915_selftest@live_hugepages - dmesg-warn - i915_request leak
Summary: [CI][BAT]igt@i915_selftest@live_hugepages - dmesg-warn - i915_request leak
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: not set not set
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-07 06:32 UTC by Lakshmi
Modified: 2019-11-11 15:25 UTC (History)
1 user (show)

See Also:
i915 platform: ICL
i915 features: GEM/Other


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Lakshmi 2019-11-07 06:32:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7273/fi-icl-u3/igt@i915_selftest@live_hugepages.html
<4> [538.454227] ============================================
<4> [538.454229] WARNING: possible recursive locking detected
<4> [538.454232] 5.4.0-rc6-CI-CI_DRM_7273+ #1 Tainted: G     U           
<4> [538.454234] --------------------------------------------
<4> [538.454236] i915_selftest/5387 is trying to acquire lock:
<4> [538.454238] ffff88849e003e58 (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc.constprop.90+0x122/0x7a0
<4> [538.454248] 
but task is already holding lock:
<4> [538.454250] ffff888498269258 (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x74/0x3d0
<4> [538.454253] 
other info that might help us debug this:
<4> [538.454255]  Possible unsafe locking scenario:

<4> [538.454257]        CPU0
<4> [538.454258]        ----
<4> [538.454259]   lock(&(&n->list_lock)->rlock);
<4> [538.454261]   lock(&(&n->list_lock)->rlock);
<4> [538.454263] 
 *** DEADLOCK ***

<4> [538.454265]  May be due to missing lock nesting notation

<4> [538.454267] 3 locks held by i915_selftest/5387:
<4> [538.454269]  #0: ffffffff8223b670 (cpu_hotplug_lock.rw_sem){++++}, at: kmem_cache_destroy+0x5b/0x290
<4> [538.454274]  #1: ffffffff8225e7c0 (slab_mutex){+.+.}, at: kmem_cache_destroy+0x69/0x290
<4> [538.454278]  #2: ffff888498269258 (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x74/0x3d0
<4> [538.454281] 
stack backtrace:
<4> [538.454285] CPU: 0 PID: 5387 Comm: i915_selftest Tainted: G     U            5.4.0-rc6-CI-CI_DRM_7273+ #1
<4> [538.454287] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [538.454289] Call Trace:
<4> [538.454296]  dump_stack+0x67/0x9b
<4> [538.454301]  __lock_acquire+0x10b3/0x15d0
<4> [538.454307]  lock_acquire+0xa7/0x1c0
<4> [538.454311]  ? ___slab_alloc.constprop.90+0x122/0x7a0
<4> [538.454314]  ? __kmem_cache_shutdown+0x155/0x3d0
<4> [538.454318]  _raw_spin_lock+0x2a/0x40
<4> [538.454322]  ? ___slab_alloc.constprop.90+0x122/0x7a0
<4> [538.454326]  ___slab_alloc.constprop.90+0x122/0x7a0
<4> [538.454329]  ? __kmem_cache_shutdown+0x155/0x3d0
<4> [538.454333]  ? cpumask_next_and+0x19/0x20
<4> [538.454336]  ? smp_call_function_many+0x60/0x250
<4> [538.454339]  ? __kmem_cache_shutdown+0x155/0x3d0
<4> [538.454341]  ? __kmem_cache_shutdown+0x155/0x3d0
<4> [538.454345]  ? __slab_alloc.isra.84.constprop.89+0x3d/0x70
<4> [538.454348]  __slab_alloc.isra.84.constprop.89+0x3d/0x70
<4> [538.454352]  __kmalloc+0x2b1/0x320
<4> [538.454355]  __kmem_cache_shutdown+0x155/0x3d0
<4> [538.454358]  ? kmem_cache_destroy+0x69/0x290
<4> [538.454362]  shutdown_cache+0x10/0x1b0
<4> [538.454365]  kmem_cache_destroy+0x27a/0x290
<4> [538.454466]  i915_globals_exit+0x30/0xf47 [i915]
<4> [538.454472]  __x64_sys_delete_module+0x115/0x1d0
<4> [538.454476]  ? trace_hardirqs_off_thunk+0x1a/0x20
<4> [538.454479]  ? do_syscall_64+0xd/0x210
<4> [538.454481]  do_syscall_64+0x4f/0x210
<4> [538.454485]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [538.454488] RIP: 0033:0x7f88121171b7
<4> [538.454492] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
<4> [538.454495] RSP: 002b:00007ffc7e67a408 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
<4> [538.454498] RAX: ffffffffffffffda RBX: 00000000ffffffe7 RCX: 00007f88121171b7
<4> [538.454500] RDX: 000055973e076010 RSI: 0000000000000800 RDI: 000055973e0ab828
<4> [538.454502] RBP: 000055973e0ab7c0 R08: 00007f8812a5b300 R09: 0000000000000000
<4> [538.454504] R10: 00007f8812193cc0 R11: 0000000000000206 R12: 00007ffc7e67a450
<4> [538.454505] R13: 00007f8812a5b280 R14: 000055973e0ad000 R15: 0000000000000000
<3> [538.454519] =============================================================================
<3> [538.454523] BUG i915_request (Tainted: G     U           ): Objects remaining in i915_request on __kmem_cache_shutdown()
<3> [538.454526] -----------------------------------------------------------------------------

<3> [538.454530] INFO: Slab 0x00000000a190acd8 objects=28 used=1 fp=0x00000000b5ba6258 flags=0x8000000000010200
<4> [538.454534] CPU: 0 PID: 5387 Comm: i915_selftest Tainted: G    BU            5.4.0-rc6-CI-CI_DRM_7273+ #1
<4> [538.454537] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [538.454540] Call Trace:
<4> [538.454545]  dump_stack+0x67/0x9b
<4> [538.454551]  slab_err+0xa8/0xd0
<4> [538.454559]  ? __slab_alloc.isra.84.constprop.89+0x62/0x70
<4> [538.454564]  ? __kmalloc+0x248/0x320
<4> [538.454569]  __kmem_cache_shutdown+0x173/0x3d0
<4> [538.454574]  ? kmem_cache_destroy+0x69/0x290
<4> [538.454580]  shutdown_cache+0x10/0x1b0
<4> [538.454585]  kmem_cache_destroy+0x27a/0x290
<4> [538.454677]  i915_globals_exit+0x30/0xf47 [i915]
<4> [538.454683]  __x64_sys_delete_module+0x115/0x1d0
<4> [538.454688]  ? trace_hardirqs_off_thunk+0x1a/0x20
<4> [538.454692]  ? do_syscall_64+0xd/0x210
<4> [538.454696]  do_syscall_64+0x4f/0x210
<4> [538.454700]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [538.454704] RIP: 0033:0x7f88121171b7
<4> [538.454707] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
<4> [538.454711] RSP: 002b:00007ffc7e67a408 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
<4> [538.454714] RAX: ffffffffffffffda RBX: 00000000ffffffe7 RCX: 00007f88121171b7
<4> [538.454717] RDX: 000055973e076010 RSI: 0000000000000800 RDI: 000055973e0ab828
<4> [538.454720] RBP: 000055973e0ab7c0 R08: 00007f8812a5b300 R09: 0000000000000000
<4> [538.454723] R10: 00007f8812193cc0 R11: 0000000000000206 R12: 00007ffc7e67a450
<4> [538.454725] R13: 00007f8812a5b280 R14: 000055973e0ad000 R15: 0000000000000000
<3> [538.454738] INFO: Object 0x0000000062b0583e @offset=24256
<3> [538.454834] INFO: Allocated in __i915_request_create+0x5a/0x4a0 [i915] age=6253 cpu=6 pid=2919
<3> [538.454840] 	__slab_alloc.isra.84.constprop.89+0x3d/0x70
<3> [538.454844] 	kmem_cache_alloc+0x24a/0x2a0
<3> [538.454937] 	__i915_request_create+0x5a/0x4a0 [i915]
<3> [538.455005] 	heartbeat+0x2f4/0x600 [i915]
<3> [538.455011] 	process_one_work+0x26a/0x620
<3> [538.455014] 	worker_thread+0x37/0x380
<3> [538.455019] 	kthread+0x119/0x130
<3> [538.455022] 	ret_from_fork+0x24/0x50
<3> [538.455123] INFO: Freed in i915_request_retire+0x490/0x8c0 [i915] age=7277 cpu=5 pid=5377
<3> [538.455128] 	kmem_cache_free+0x31f/0x390
<3> [538.455165] 	i915_request_retire+0x490/0x8c0 [i915]
<3> [538.455165] 	intel_gt_retire_requests_timeout+0x173/0x540 [i915]
<3> [538.455165] 	retire_work_handler+0x12/0x40 [i915]
<3> [538.455165] 	process_one_work+0x26a/0x620
<3> [538.455165] 	worker_thread+0x37/0x380
<3> [538.455165] 	kthread+0x119/0x130
<3> [538.455165] 	ret_from_fork+0x24/0x50
<3> [538.455491] kmem_cache_destroy i915_request: Slab cache still has objects
<4> [538.455495] CPU: 0 PID: 5387 Comm: i915_selftest Tainted: G    BU            5.4.0-rc6-CI-CI_DRM_7273+ #1
<4> [538.455499] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [538.455501] Call Trace:
<4> [538.455507]  dump_stack+0x67/0x9b
<4> [538.455512]  kmem_cache_destroy+0x223/0x290
<4> [538.455639]  i915_globals_exit+0x30/0xf47 [i915]
<4> [538.455654]  __x64_sys_delete_module+0x115/0x1d0
<4> [538.455663]  ? trace_hardirqs_off_thunk+0x1a/0x20
<4> [538.455671]  ? do_syscall_64+0xd/0x210
<4> [538.455679]  do_syscall_64+0x4f/0x210
<4> [538.455687]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [538.455693] RIP: 0033:0x7f88121171b7
<4> [538.455700] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
<4> [538.455707] RSP: 002b:00007ffc7e67a408 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
<4> [538.455717] RAX: ffffffffffffffda RBX: 00000000ffffffe7 RCX: 00007f88121171b7
<4> [538.455723] RDX: 000055973e076010 RSI: 0000000000000800 RDI: 000055973e0ab828
<4> [538.455730] RBP: 000055973e0ab7c0 R08: 00007f8812a5b300 R09: 0000000000000000
<4> [538.455734] R10: 00007f8812193cc0 R11: 0000000000000206 R12: 00007ffc7e67a450
<4> [538.455737] R13: 00007f8812a5b280 R14: 000055973e0ad000 R15: 0000000000000000
<6> [538.603799] [IGT] i915_selftest: exiting, ret=0
Comment 2 Chris Wilson 2019-11-07 07:15:16 UTC
The heartbeat didn't quite cleanup after itself.
Comment 3 Chris Wilson 2019-11-07 13:11:45 UTC
commit 3466a3def2c01e0ee3e2035f2dc988816cfab460
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 6 22:34:10 2019 +0000

    drm/i915/gt: Cleanup heartbeat systole first
    
    Before we grab the engine wakeref, tidy up the previous heartbeat
    request. If we then abort because the engine powerwell is off, we ensure
    the request is freed as we know we will not have freed it when
    cancelling the work (as the work is running!).
    
    Fixes: 841e86728615 ("drm/i915/gt: Only drop heartbeat.systole if the sole owner")
    References: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191106223410.30334-1-chris@chris-wilson.co.uk

hopefully that should close the leak.
Comment 4 CI Bug Log 2019-11-11 15:25:07 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@i915_selftest@live_hugepages - dmesg-warn - WARNING: possible recursive locking detected, acquire lock:slab_alloc.constprop, holding lock: kmem_cache_shutdown -}
{+ ICL: igt@i915_selftest@live_hugepages - dmesg-warn - WARNING: possible recursive locking detected, acquire lock:slab_alloc.constprop, holding lock: kmem_cache_shutdown +}


  No new failures caught with the new filter


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.