108133 – [CI][DRMTIP] igt@gem_eio@wait-wedge-1us - dmesg-warn - GEM_BUG_ON(!intel_engine_is_idle(engine))

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2018-10-02 15:48 UTC by Martin Peres
Modified:	2018-10-30 16:55 UTC
CC List:	1 user

See Also:
i915 platform:	BSW/CHT
i915 features:	GEM/Other

Description Martin Peres 2018-10-02 15:48:42 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_119/fi-bsw-n3050/igt@gem_eio@wait-wedge-1us.html

<5>[  130.400495] Setting dangerous option reset - tainting kernel
<7>[  130.403639] [drm:i915_reset_device [i915]] resetting chip
<5>[  130.406083] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<3>[  130.415829] reset_all_global_seqno:232 GEM_BUG_ON(!intel_engine_is_idle(engine))
<4>[  130.416213] ------------[ cut here ]------------
<2>[  130.416220] kernel BUG at drivers/gpu/drm/i915/i915_request.c:232!
<4>[  130.416288] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4>[  130.416308] CPU: 1 PID: 2175 Comm: gem_eio Tainted: G     U            4.19.0-rc4-gd3bc4f8ea48e-drmtip_119+ #1
<4>[  130.416336] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4>[  130.416464] RIP: 0010:reset_all_global_seqno.part.5+0x1c5/0x260 [i915]
<4>[  130.416484] Code: 8a 85 c1 cf 48 8b 35 62 dd 1b 00 49 c7 c0 f6 cb 5e c0 b9 e8 00 00 00 48 c7 c2 e0 32 5d c0 48 c7 c7 f0 57 4e c0 e8 fb 13 c8 cf <0f> 0b 48 c7 c1 90 6c 60 c0 ba e9 00 00 00 48 c7 c6 e0 32 5d c0 48
<4>[  130.416531] RSP: 0018:ffffa5e380273d70 EFLAGS: 00010282
<4>[  130.416550] RAX: 000000000000000f RBX: ffff97f56d368008 RCX: 0000000000000000
<4>[  130.416570] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff97f57b1d3a78
<4>[  130.416591] RBP: ffff97f566b477c8 R08: 0000000000011d90 R09: ffff97f57b243000
<4>[  130.416611] R10: 0000000000000000 R11: ffff97f57b1d3a78 R12: 0000000000000000
<4>[  130.416631] R13: ffff97f566b40000 R14: ffff97f566b477d8 R15: ffffffffc04e56ac
<4>[  130.416652] FS:  00007f9631ac4980(0000) GS:ffff97f57bb00000(0000) knlGS:0000000000000000
<4>[  130.416675] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  130.416692] CR2: 0000556ca01cad98 CR3: 00000001770f8000 CR4: 00000000001006e0
<4>[  130.416713] Call Trace:
<4>[  130.416800]  i915_drop_caches_set+0x14d/0x260 [i915]
<4>[  130.416826]  simple_attr_write+0xb0/0xd0
<4>[  130.416846]  full_proxy_write+0x51/0x80
<4>[  130.416864]  __vfs_write+0x31/0x180
<4>[  130.416881]  ? rcu_lockdep_current_cpu_online+0x8f/0xd0
<4>[  130.416899]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  130.416916]  ? rcu_sync_lockdep_assert+0x29/0x50
<4>[  130.416932]  ? __sb_start_write+0x152/0x1f0
<4>[  130.416947]  ? __sb_start_write+0x168/0x1f0
<4>[  130.416964]  vfs_write+0xbd/0x1b0
<4>[  130.416979]  ksys_write+0x50/0xc0
<4>[  130.416997]  do_syscall_64+0x55/0x190
<4>[  130.417014]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  130.417032] RIP: 0033:0x7f96310472b7
<4>[  130.417046] Code: 44 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 5b fd ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 94 fd ff ff 48
<4>[  130.417093] RSP: 002b:00007ffe0b2a8140 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
<4>[  130.417118] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f96310472b7
<4>[  130.417138] RDX: 0000000000000005 RSI: 00007ffe0b2a81f0 RDI: 0000000000000008
<4>[  130.417159] RBP: 00007ffe0b2a81f0 R08: 0000000000000000 R09: 0000000000000000
<4>[  130.417179] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000005
<4>[  130.417200] R13: 0000000000000003 R14: 00007f9631035628 R15: 00007f9631031d80

Comment 1 Chris Wilson 2018-10-02 15:59:35 UTC

It looks pretty idle, so why did it freak out? Could do with dumping the engine at that point before the BUG_ON.

Comment 2 Chris Wilson 2018-10-05 11:58:38 UTC

I am not convinced, but

commit 88a83f3c2d7a87ce7c9c4171dec8e2fb48070288
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Oct 4 09:21:19 2018 +0100

    drm/i915: Only reset seqno if actually idle
    
    Before we can reset the seqno, we have to be sure the engines are idle.
    In debugfs/i915_drop_caches_set, we do wait_for_idle but allow ourselves
    to be interrupted. We should only proceed to reset the seqno then if we
    were not interrupted, and so also avoid overwriting the error status.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=108133
    Fixes: 6b048706f407 ("drm/i915: Forcibly flush unwanted requests in drop-caches")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181004082119.24970-1-chris@chris-wilson.co.uk

may have an effect. Otherwise this bug is too weird for me.
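
The control flow described in the commit message can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: `struct fake_gpu` and every `fake_*` function are hypothetical stand-ins invented for this sketch, not the i915 structures or functions, and the idle check returns `-EBUSY` where the driver would hit the `GEM_BUG_ON`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real i915 structures. */
struct fake_gpu {
    bool idle;            /* true once the engines have drained */
    bool signal_pending;  /* models a signal arriving during the wait */
    unsigned int seqno;
};

/* Models an interruptible wait: 0 once idle, -EINTR if interrupted. */
static int fake_wait_for_idle(struct fake_gpu *gpu)
{
    if (gpu->signal_pending)
        return -EINTR;
    gpu->idle = true;
    return 0;
}

/* Models the seqno reset, which is only valid on idle engines.
 * Returning -EBUSY here stands in for the assertion in the driver. */
static int fake_reset_seqno(struct fake_gpu *gpu)
{
    if (!gpu->idle)
        return -EBUSY;
    gpu->seqno = 0;
    return 0;
}

/* The pattern from the commit message: reset the seqno only if the
 * wait completed, and hand the wait's own error back unchanged. */
static int fake_drop_caches(struct fake_gpu *gpu)
{
    int ret = fake_wait_for_idle(gpu);

    if (ret == 0)
        ret = fake_reset_seqno(gpu);
    return ret;
}

/* Usage: one uninterrupted call and one interrupted call. */
static int fake_demo(void)
{
    struct fake_gpu quiet = { .seqno = 42 };
    struct fake_gpu interrupted = { .signal_pending = true, .seqno = 42 };

    assert(fake_drop_caches(&quiet) == 0 && quiet.seqno == 0);
    assert(fake_drop_caches(&interrupted) == -EINTR && interrupted.seqno == 42);
    return 0;
}
```

In the interrupted case the reset is skipped entirely, so the caller sees `-EINTR` from the wait rather than a later status overwriting it.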

Comment 3 Lakshmi 2018-10-23 13:01:51 UTC

Update: This issue occurred only once, with drmtip_119 (4 weeks / 377 runs ago). It has not been seen since.

Comment 4 Martin Peres 2018-10-30 16:55:55 UTC

(In reply to Chris Wilson from comment #2)
> I am not convinced, but
> 
> commit 88a83f3c2d7a87ce7c9c4171dec8e2fb48070288
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Oct 4 09:21:19 2018 +0100
> 
>     drm/i915: Only reset seqno if actually idle
>     
>     Before we can reset the seqno, we have to be sure the engines are idle.
>     In debugfs/i915_drop_caches_set, we do wait_for_idle but allow ourselves
>     to be interrupted. We should only proceed to reset the seqno then if we
>     were not interrupted, and so also avoid overwriting the error status.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=108133
>     Fixes: 6b048706f407 ("drm/i915: Forcibly flush unwanted requests in drop-caches")
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>     Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20181004082119.24970-1-chris@chris-wilson.co.uk
> 
> may have an effect. Otherwise this bug is too weird for me.

It's been a month, and it is a drmtip run. Let's close this bug.
