Bug 106003 - [CI] igt@gem_eio@in-flight-internal - dmesg-warn - WARNING: bad unlock balance detected!
Summary: [CI] igt@gem_eio@in-flight-internal - dmesg-warn - WARNING: bad unlock balanc...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: XOrg git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-12 12:12 UTC by Martin Peres
Modified: 2018-04-20 13:01 UTC (History)
1 user (show)

See Also:
i915 platform: I915G, I965G, PNV
i915 features: GEM/Other


Attachments

Description Martin Peres 2018-04-12 12:12:35 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-gdg-551/igt@gem_eio@in-flight-internal.html

[ 241.790215] =====================================
[ 241.790220] WARNING: bad unlock balance detected!
[ 241.790227] 4.16.0-rc6-g94f5d9189e61-drmtip_7+ #1 Tainted: G U 
[ 241.790233] -------------------------------------
[ 241.790239] kworker/0:1/17 is trying to release lock (crtc_ww_class_acquire) at:
[ 241.790256] [<ffffffffb15c7360>] drm_modeset_acquire_fini+0x0/0x60
[ 241.790261] but there are no more locks to release!
[ 241.790266] 
other info that might help us debug this:
[ 241.790274] 2 locks held by kworker/0:1/17:
[ 241.790278] #0: ((wq_completion)"events_long"){+.+.}, at: [<000000002f5bee42>] process_one_work+0x19e/0x680
[ 241.790294] #1: ((work_completion)(&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<000000002f5bee42>] process_one_work+0x19e/0x680
[ 241.790309] 
stack backtrace:
[ 241.790316] CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G U 4.16.0-rc6-g94f5d9189e61-drmtip_7+ #1
[ 241.790325] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005
[ 241.790455] Workqueue: events_long i915_hangcheck_elapsed [i915]
[ 241.790464] Call Trace:
[ 241.790474] dump_stack+0x67/0x95
[ 241.790482] ? drm_modeset_backoff+0x1b0/0x1b0
[ 241.790490] print_unlock_imbalance_bug+0xd2/0xe0
[ 241.790498] ? drm_modeset_backoff+0x1b0/0x1b0
[ 241.790504] lock_release+0x23e/0x300
[ 241.790512] drm_modeset_acquire_fini+0x16/0x60
[ 241.790589] intel_finish_reset+0x72/0x160 [i915]
[ 241.790647] i915_reset_device+0x1e9/0x240 [i915]
[ 241.790705] ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
[ 241.790713] ? work_on_cpu_safe+0x50/0x50
[ 241.790771] i915_handle_error+0x1f2/0x470 [i915]
[ 241.790782] ? scnprintf+0x3a/0x70
[ 241.790855] i915_hangcheck_elapsed+0x313/0x4a0 [i915]
[ 241.790865] process_one_work+0x224/0x680
[ 241.790873] worker_thread+0x35/0x380
[ 241.790880] ? process_one_work+0x680/0x680
[ 241.790887] kthread+0x119/0x130
[ 241.790893] ? _kthread_create_on_node+0x30/0x30
[ 241.790901] ret_from_fork+0x3a/0x50
[ 241.815931] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
[ 241.911720] Setting dangerous option reset - tainting kernel
[ 241.928002] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff

This is from ~2 weeks ago, so it may have been fixed already!
Comment 2 Chris Wilson 2018-04-12 12:37:38 UTC
commit 40da1d310e8a155a09cd64b5b3fd3548a9dcfda0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 5 13:37:14 2018 +0100

    drm/i915: Only call finish_reset after a prepare_reset
    
    If we skip the intel_prepare_reset(), we should also skip the
    intel_display_reset(). If we we use a flag set by intel_prepare_reset()
    then we do not have to second guess based on external user controlled
    state whether or not the prepare was called before deciding to finish
    it after the reset. igt/gem_eio is one such example that may tweak
    i915.reset faster than the code is expecting, leading to
    
    [  190.233528] =====================================
    [  190.233534] WARNING: bad unlock balance detected!
    [  190.233540] 4.16.0-rc7-g335ef9849310-drmtip_10+ #1 Tainted: G     U
    [  190.233547] -------------------------------------
    [  190.233553] gem_eio/1348 is trying to release lock (crtc_ww_class_acquire) at:
    [  190.233569] [<ffffffff895c7810>] drm_modeset_acquire_fini+0x0/0x60
    [  190.233575] but there are no more locks to release!
    [  190.233580]
                   other info that might help us debug this:
    [  190.233588] 3 locks held by gem_eio/1348:
    [  190.233592]  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000ab90c784>] __fdget_pos+0x3a/0x50
    [  190.233607]  #1:  (sb_writers#11){.+.+}, at: [<00000000e1529265>] vfs_write+0x188/0x1a0
    [  190.233622]  #2:  (&attr->mutex){+.+.}, at: [<0000000011f40afe>] simple_attr_write+0x36/0xd0
    [  190.233635]
                   stack backtrace:
    [  190.233644] CPU: 0 PID: 1348 Comm: gem_eio Tainted: G     U           4.16.0-rc7-g335ef9849310-drmtip_10+ #1
    [  190.233655] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
    [  190.233664] Call Trace:
    [  190.233674]  dump_stack+0x67/0x95
    [  190.233682]  ? drm_modeset_backoff+0x1b0/0x1b0
    [  190.233690]  print_unlock_imbalance_bug+0xd2/0xe0
    [  190.233698]  ? drm_modeset_backoff+0x1b0/0x1b0
    [  190.233704]  lock_release+0x23e/0x300
    [  190.233712]  drm_modeset_acquire_fini+0x16/0x60
    [  190.233835]  intel_finish_reset+0x72/0x160 [i915]
    [  190.233894]  i915_reset_device+0x1e9/0x240 [i915]
    [  190.233953]  ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
    [  190.233962]  ? work_on_cpu_safe+0x50/0x50
    [  190.234020]  i915_handle_error+0x1f2/0x470 [i915]
    [  190.234031]  ? __might_fault+0x39/0x90
    [  190.234037]  ? __might_fault+0x39/0x90
    [  190.234099]  i915_wedged_set+0x7f/0xc0 [i915]
    [  190.234107]  simple_attr_write+0xb0/0xd0
    [  190.234117]  full_proxy_write+0x51/0x80
    [  190.234125]  __vfs_write+0x21/0x140
    [  190.234133]  ? rcu_read_lock_sched_held+0x6f/0x80
    [  190.234140]  ? rcu_sync_lockdep_assert+0x29/0x50
    [  190.234147]  ? __sb_start_write+0x152/0x1f0
    [  190.234152]  ? __sb_start_write+0x168/0x1f0
    [  190.234159]  vfs_write+0xbd/0x1a0
    [  190.234166]  SyS_write+0x40/0xa0
    [  190.234173]  ? do_syscall_64+0x19/0x1b0
    [  190.234180]  do_syscall_64+0x6b/0x1b0
    [  190.234188]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
    [  190.234196] RIP: 0033:0x7f84c1b392b7
    [  190.234201] RSP: 002b:00007f84b6755b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
    [  190.234211] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f84c1b392b7
    [  190.234218] RDX: 0000000000000002 RSI: 000055ec20abc8d6 RDI: 0000000000000046
    [  190.234225] RBP: 000055ec20abc8d6 R08: 0000000000000000 R09: 0000000000000000
    [  190.234231] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
    [  190.234238] R13: 0000000000000000 R14: 00007f84b0000b20 R15: 000055ec20ce4eb8
    
    Testcase: igt/gem_eio
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180405123714.3638-1-chris@chris-wilson.co.uk
Comment 3 Jani Saarinen 2018-04-20 10:51:42 UTC
Was this fixed, can you see it Martin?
Comment 4 Martin Peres 2018-04-20 13:01:23 UTC
(In reply to Jani Saarinen from comment #3)
> Was this fixed, can you see it Martin?

Yes, we have not seen it in 10 runs when it used to be reproducible in all runs, closing.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.