110824 – [CI][SHARDS] igt@runner@aborted - fail - GEM_BUG_ON(intel_context_is_pinned(ce))

Summary: [CI][SHARDS] igt@runner@aborted - fail - GEM_BUG_ON(intel_context_is_pinned(ce))

Status:	RESOLVED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	high normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2019-06-03 11:45 UTC by Martin Peres
Modified:	2019-07-02 11:29 UTC
CC List:	1 user

See Also:
i915 platform:	SNB
i915 features:	GEM/Other

Description Martin Peres 2019-06-03 11:45:55 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6170/fi-snb-2600/igt@runner@aborted.html

<7>[  216.022345] [drm:i915_gem_init_ggtt [i915]] clearing unused GTT space: [1000, 80000000]
<7>[  216.028119] [drm:i915_gem_contexts_init [i915]] logical context support initialized
<7>[  216.028860] [drm:intel_init_gt_powersave [i915]] Overclocking supported, max: 1350MHz, overclock: 1350MHz
<7>[  216.029849] [drm:i915_gem_init [i915]] Wrong MCH_SSKPD value: 0x16040307 This can cause underruns.
<3>[  216.232923] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
<0>[  216.232986] Dumping ftrace buffer:
<0>[  216.233031] ---------------------------------
[...]
<0>[  216.282755] ---------------------------------
<7>[  216.282911] __i915_gem_set_wedged rcs0
<7>[  216.282939] __i915_gem_set_wedged 	Awake? 1
<7>[  216.282943] __i915_gem_set_wedged 	Hangcheck: 254 ms ago
<7>[  216.282947] __i915_gem_set_wedged 	Reset count: 0 (global 0)
<7>[  216.282954] __i915_gem_set_wedged 	Requests:
<7>[  216.282967] __i915_gem_set_wedged 		first   6f:2!  @ 253ms: [i915]
<7>[  216.282972] __i915_gem_set_wedged 		last    6f:2!  @ 253ms: [i915]
<7>[  216.282986] __i915_gem_set_wedged 	CCID: 0x7fff810d
<7>[  216.282989] __i915_gem_set_wedged 	RING_START: 0x00001000
<7>[  216.282991] __i915_gem_set_wedged 	RING_HEAD:  0x000002e8
<7>[  216.283004] __i915_gem_set_wedged 	RING_TAIL:  0x000002e8
<7>[  216.283008] __i915_gem_set_wedged 	RING_CTL:   0x0001f001
<7>[  216.283011] __i915_gem_set_wedged 	RING_MODE:  0x00004240 [idle]
<7>[  216.283014] __i915_gem_set_wedged 	RING_IMR: ffffffff
<7>[  216.283017] __i915_gem_set_wedged 	ACTHD:  0x00000000_000002e8
<7>[  216.283019] __i915_gem_set_wedged 	BBADDR: 0x00000000_7fff71f0
<7>[  216.283022] __i915_gem_set_wedged 	DMA_FADDR: 0x00000000_000012e8
<7>[  216.283025] __i915_gem_set_wedged 	IPEIR: 0x00000000
<7>[  216.283027] __i915_gem_set_wedged 	IPEHR: 0x01000000
<7>[  216.283040] __i915_gem_set_wedged 		E  6f:2!  @ 253ms: [i915]
<7>[  216.283051] __i915_gem_set_wedged HWSP:
<7>[  216.283055] __i915_gem_set_wedged [0000] 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283057] __i915_gem_set_wedged [0020] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283059] __i915_gem_set_wedged *
<7>[  216.283062] __i915_gem_set_wedged [0100] 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283064] __i915_gem_set_wedged [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283066] __i915_gem_set_wedged *
<7>[  216.283070] __i915_gem_set_wedged Idle? yes
<7>[  216.283072] __i915_gem_set_wedged bcs0
<7>[  216.283074] __i915_gem_set_wedged 	Awake? 1
<7>[  216.283076] __i915_gem_set_wedged 	Hangcheck: 254 ms ago
<7>[  216.283078] __i915_gem_set_wedged 	Reset count: 0 (global 0)
<7>[  216.283080] __i915_gem_set_wedged 	Requests:
<7>[  216.283082] __i915_gem_set_wedged 		first   71:2!  @ 252ms: [i915]
<7>[  216.283085] __i915_gem_set_wedged 		last    71:2!  @ 252ms: [i915]
<7>[  216.283088] __i915_gem_set_wedged 	RING_START: 0x00021000
<7>[  216.283091] __i915_gem_set_wedged 	RING_HEAD:  0x000000f0
<7>[  216.283093] __i915_gem_set_wedged 	RING_TAIL:  0x000000f0
<7>[  216.283096] __i915_gem_set_wedged 	RING_CTL:   0x0001f001
<7>[  216.283099] __i915_gem_set_wedged 	RING_MODE:  0x00000200 [idle]
<7>[  216.283101] __i915_gem_set_wedged 	RING_IMR: ffffffff
<7>[  216.283104] __i915_gem_set_wedged 	ACTHD:  0x00000000_000000f0
<7>[  216.283107] __i915_gem_set_wedged 	BBADDR: 0x00000000_00000000
<7>[  216.283110] __i915_gem_set_wedged 	DMA_FADDR: 0x00000000_000210f0
<7>[  216.283112] __i915_gem_set_wedged 	IPEIR: 0x00000000
<7>[  216.283115] __i915_gem_set_wedged 	IPEHR: 0x01000000
<7>[  216.283118] __i915_gem_set_wedged 		E  71:2!  @ 252ms: [i915]
<7>[  216.283120] __i915_gem_set_wedged HWSP:
<7>[  216.283124] __i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283126] __i915_gem_set_wedged *
<7>[  216.283129] __i915_gem_set_wedged [0100] 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283132] __i915_gem_set_wedged [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283134] __i915_gem_set_wedged *
<7>[  216.283138] __i915_gem_set_wedged Idle? yes
<7>[  216.283141] __i915_gem_set_wedged vcs0
<7>[  216.283143] __i915_gem_set_wedged 	Awake? 1
<7>[  216.283145] __i915_gem_set_wedged 	Hangcheck: 253 ms ago
<7>[  216.283147] __i915_gem_set_wedged 	Reset count: 0 (global 0)
<7>[  216.283149] __i915_gem_set_wedged 	Requests:
<7>[  216.283152] __i915_gem_set_wedged 		first   73:2!  @ 252ms: [i915]
<7>[  216.283155] __i915_gem_set_wedged 		last    73:2!  @ 252ms: [i915]
<7>[  216.283158] __i915_gem_set_wedged 	RING_START: 0x00041000
<7>[  216.283161] __i915_gem_set_wedged 	RING_HEAD:  0x000001e0
<7>[  216.283163] __i915_gem_set_wedged 	RING_TAIL:  0x000000f0
<7>[  216.283166] __i915_gem_set_wedged 	RING_CTL:   0x0001f001
<7>[  216.283169] __i915_gem_set_wedged 	RING_MODE:  0x00000000
<7>[  216.283172] __i915_gem_set_wedged 	RING_IMR: ffffffff
<7>[  216.283174] __i915_gem_set_wedged 	ACTHD:  0x00000000_a02906e4
<7>[  216.283176] __i915_gem_set_wedged 	BBADDR: 0x00000000_a0291bdb
<7>[  216.283179] __i915_gem_set_wedged 	DMA_FADDR: 0x00000000_a0293200
<7>[  216.283181] __i915_gem_set_wedged 	IPEIR: 0x00000008
<7>[  216.283183] __i915_gem_set_wedged 	IPEHR: 0x00000000
<7>[  216.283186] __i915_gem_set_wedged 		E  73:2!  @ 252ms: [i915]
<7>[  216.283188] __i915_gem_set_wedged HWSP:
<7>[  216.283190] __i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283192] __i915_gem_set_wedged *
<7>[  216.283194] __i915_gem_set_wedged [0100] 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283197] __i915_gem_set_wedged [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  216.283198] __i915_gem_set_wedged *
<7>[  216.283203] __i915_gem_set_wedged Idle? no
<3>[  216.283907] __intel_engines_record_defaults:1365 GEM_BUG_ON(intel_context_is_pinned(ce))
<4>[  216.283966] ------------[ cut here ]------------
<2>[  216.283969] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:1365!
<4>[  216.284002] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4>[  216.284013] CPU: 4 PID: 3337 Comm: i915_module_loa Tainted: G     U            5.2.0-rc2-CI-CI_DRM_6170+ #1
<4>[  216.284026] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4>[  216.284103] RIP: 0010:i915_gem_init+0xa53/0xac0 [i915]
<4>[  216.284113] Code: fc c7 bf e0 48 8b 35 24 11 1d 00 49 c7 c0 a9 89 66 a0 b9 55 05 00 00 48 c7 c2 40 47 61 a0 48 c7 c7 de c3 52 a0 e8 5d 8d c6 e0 <0f> 0b 48 c7 c1 c8 e2 63 a0 ba 14 01 00 00 48 c7 c6 10 48 61 a0 48
<4>[  216.284139] RSP: 0018:ffffc90000353a28 EFLAGS: 00010282
<4>[  216.284149] RAX: 0000000000000010 RBX: ffff88821cc00000 RCX: 0000000000000000
<4>[  216.284161] RDX: 0000000000000000 RSI: 0000000000000058 RDI: 0000000000000000
<4>[  216.284171] RBP: ffff88821cc00068 R08: ffffffffa06689a9 R09: 0000000000000000
<4>[  216.284182] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88821cc00d38
<4>[  216.284193] R13: ffff8882261b4fb8 R14: ffff88820c696c40 R15: 0000000000000000
<4>[  216.284204] FS:  00007ff2fa7b6e40(0000) GS:ffff888227a00000(0000) knlGS:0000000000000000
<4>[  216.284217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  216.284228] CR2: 00007fe7ce5bb4c0 CR3: 000000020ac7c004 CR4: 00000000000606e0
<4>[  216.284239] Call Trace:
<4>[  216.284299]  i915_driver_load+0xdb8/0x18a0 [i915]
<4>[  216.284312]  ? lock_acquire+0xa6/0x1c0
<4>[  216.284322]  ? __pm_runtime_resume+0x4f/0x80
<4>[  216.284342]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[  216.284351]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[  216.284359]  ? lockdep_hardirqs_on+0xe3/0x1b0
<4>[  216.284413]  i915_pci_probe+0x29/0xa0 [i915]
<4>[  216.284423]  pci_device_probe+0x9e/0x120
<4>[  216.284433]  really_probe+0xea/0x3c0
<4>[  216.284441]  driver_probe_device+0x10b/0x120
<4>[  216.284450]  device_driver_attach+0x4a/0x50
<4>[  216.284458]  __driver_attach+0x97/0x130
<4>[  216.284466]  ? device_driver_attach+0x50/0x50
<4>[  216.284474]  bus_for_each_dev+0x74/0xc0
<4>[  216.284483]  bus_add_driver+0x13f/0x210
<4>[  216.284490]  ? 0xffffffffa0139000
<4>[  216.284498]  driver_register+0x56/0xe0
<4>[  216.284505]  ? 0xffffffffa0139000
<4>[  216.284513]  do_one_initcall+0x58/0x300
<4>[  216.284521]  ? do_init_module+0x1d/0x1f6
<4>[  216.284530]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  216.284540]  ? kmem_cache_alloc_trace+0x261/0x290
<4>[  216.284550]  do_init_module+0x56/0x1f6
<4>[  216.284558]  load_module+0x24d1/0x2990
<4>[  216.284573]  ? __se_sys_finit_module+0xd3/0xf0
<4>[  216.284581]  __se_sys_finit_module+0xd3/0xf0
<4>[  216.284593]  do_syscall_64+0x55/0x1c0
<4>[  216.284601]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  216.284610] RIP: 0033:0x7ff2f9e5d839
<4>[  216.284617] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4>[  216.284641] RSP: 002b:00007fffdfd49228 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[  216.284653] RAX: ffffffffffffffda RBX: 000055bbebf38fb0 RCX: 00007ff2f9e5d839
<4>[  216.284664] RDX: 0000000000000000 RSI: 000055bbebf34920 RDI: 0000000000000005
<4>[  216.284674] RBP: 000055bbebf34920 R08: 0000000000000000 R09: 0000000000000000
<4>[  216.284683] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000000000
<4>[  216.284693] R13: 000055bbebf384a0 R14: 0000000000000020 R15: 0000000000000016
<4>[  216.284707] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic asix usbnet mii mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel broadcom bcm_phy_lib snd_hda_codec tg3 snd_hwdep snd_hda_core ptp pps_core snd_pcm mei_me mei lpc_ich prime_numbers [last unloaded: i915]
<0>[  216.284757] Dumping ftrace buffer:
<0>[  216.284765]    (ftrace buffer empty)
<4>[  216.284782] ---[ end trace d9c9c85df37bb3b0 ]---
<4>[  216.284865] RIP: 0010:i915_gem_init+0xa53/0xac0 [i915]
<4>[  216.284877] Code: fc c7 bf e0 48 8b 35 24 11 1d 00 49 c7 c0 a9 89 66 a0 b9 55 05 00 00 48 c7 c2 40 47 61 a0 48 c7 c7 de c3 52 a0 e8 5d 8d c6 e0 <0f> 0b 48 c7 c1 c8 e2 63 a0 ba 14 01 00 00 48 c7 c6 10 48 61 a0 48
<4>[  216.284901] RSP: 0018:ffffc90000353a28 EFLAGS: 00010282
<4>[  216.284911] RAX: 0000000000000010 RBX: ffff88821cc00000 RCX: 0000000000000000
<4>[  216.284922] RDX: 0000000000000000 RSI: 0000000000000058 RDI: 0000000000000000
<4>[  216.284932] RBP: ffff88821cc00068 R08: ffffffffa06689a9 R09: 0000000000000000
<4>[  216.284943] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88821cc00d38
<4>[  216.284954] R13: ffff8882261b4fb8 R14: ffff88820c696c40 R15: 0000000000000000
<4>[  216.284965] FS:  00007ff2fa7b6e40(0000) GS:ffff888227a00000(0000) knlGS:0000000000000000
<4>[  216.284988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  216.284997] CR2: 00007fe7ce5bb4c0 CR3: 000000020ac7c004 CR4: 00000000000606e0
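
The per-engine dump in the log above can be summarised mechanically. The following is a minimal sketch, not part of the original report: it assumes the line layout seen here (`<level>[timestamp] __i915_gem_set_wedged <payload>`) and uses "RING_HEAD differs from RING_TAIL" as a rough heuristic for an engine that has not drained its ring. The helper names `ring_state` and `stalled` are hypothetical, and the driver's own `Idle?` verdict is based on more than these two registers.

```python
import re

# Assumed line layout, e.g.:
#   <7>[  216.283161] __i915_gem_set_wedged 	RING_HEAD:  0x000001e0
LINE = re.compile(r"^<\d>\[\s*[\d.]+\]\s+__i915_gem_set_wedged\s+(.*)$")
ENGINE = re.compile(r"^[a-z]+cs\d+$")  # rcs0, bcs0, vcs0, ...
REG = re.compile(r"^(RING_HEAD|RING_TAIL):\s+0x([0-9a-fA-F]+)")


def ring_state(dmesg: str) -> dict:
    """Return {engine: {"RING_HEAD": int, "RING_TAIL": int}} from a dmesg dump."""
    engines, current = {}, None
    for raw in dmesg.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        payload = m.group(1).strip()
        if ENGINE.match(payload):
            # A bare engine name starts a new per-engine section.
            current = engines.setdefault(payload, {})
        elif current is not None:
            reg = REG.match(payload)
            if reg:
                current[reg.group(1)] = int(reg.group(2), 16)
    return engines


def stalled(engines: dict) -> list:
    """Engines whose ring head does not match the tail (heuristic only)."""
    return [name for name, regs in engines.items()
            if regs.get("RING_HEAD") != regs.get("RING_TAIL")]
```

Applied to this log, rcs0 (0x2e8/0x2e8) and bcs0 (0xf0/0xf0) have matching head and tail, while vcs0 reports RING_HEAD 0x1e0 against RING_TAIL 0xf0, which agrees with the `Idle? no` line printed for vcs0.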

Aborting.
Previous test: i915_module_load (reload-with-fault-injection)
Next test: i915_pm_rpm (module-reload)

Kernel badly tainted (0xc0) (check dmesg for details):
	(0x80) TAINT_DIE: Kernel has died - BUG/OOPS.
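
The taint mask reported above can be decoded bit by bit. The sketch below is an illustration rather than anything taken from the CI tooling: the table covers only the lowest eight taint bits as documented for the kernel, and `decode_taint` is a hypothetical helper name.

```python
# Lowest eight kernel taint bits: bit position -> (letter, meaning).
# Higher bits exist but are omitted to keep the sketch short.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force-loaded"),
    2: ("S", "kernel running on an out-of-spec system"),
    3: ("R", "module was force-unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died (OOPS or BUG)"),
}


def decode_taint(mask: int) -> list:
    """Return (bit value, letter, meaning) for every known bit set in mask."""
    return [(1 << bit, letter, meaning)
            for bit, (letter, meaning) in sorted(TAINT_FLAGS.items())
            if mask & (1 << bit)]


for value, letter, meaning in decode_taint(0xC0):
    print(f"{value:#04x} {letter}: {meaning}")
```

For 0xc0 this yields two entries: 0x40 (`U`) and 0x80 (`D`). The abort message lists only the 0x80 bit; the 0x40 bit corresponds to the `U` already visible in the `Tainted: G     U` line of the oops.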

Comment 1 CI Bug Log 2019-06-03 11:49:01 UTC

The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* SNB: igt@runner@aborted - fail - Previous test: i915_module_load (reload-with-fault-injection)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6170/fi-snb-2600/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4320/fi-snb-2600/igt@runner@aborted.html

Comment 2 Chris Wilson 2019-06-03 14:19:21 UTC

https://patchwork.freedesktop.org/series/61425/

Comment 3 Chris Wilson 2019-06-06 14:28:29 UTC

commit ac543d7145bf3ad13f67a087196e6879e6993aac (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 31 12:32:45 2019 +0100

    drm/i915: Report an earlier wedged event when suspending the engines
    
    On i915_gem_load_power_context() we do care whether or not we succeed in
    completing the switch back to the kernel context (via idling the
    engines). Currently, we detect if an error occurs while we wait, but we
    do not report one if it occurred beforehand (and the status of the
    switch is undefined). Check the current terminally wedged status on
    entering the wait, and report it after flushing the requests, as if it
    had occurred during our own wait.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110824
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190531113245.30042-1-chris@chris-wilson.co.uk
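
The behaviour change described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the driver code: `ToyGpu`, `wait_for_idle_old`, and `wait_for_idle_new` are hypothetical names, and the model captures only the idea of sampling the wedged status on entry and reporting it after the wait.

```python
from dataclasses import dataclass


@dataclass
class ToyGpu:
    """Hypothetical stand-in for device state; not a driver structure."""
    wedged: bool = False        # already terminally wedged before the wait?
    idles_in_time: bool = True  # does the wait for idle succeed?


def wait_for_idle_old(gpu: ToyGpu) -> bool:
    """Reports only errors that occur during the wait itself."""
    if not gpu.idles_in_time:
        gpu.wedged = True
        return False
    return True


def wait_for_idle_new(gpu: ToyGpu) -> bool:
    """Also reports a wedge that happened before the wait began."""
    ok = not gpu.wedged        # sample the wedged status on entry
    if not gpu.idles_in_time:  # error during our own wait
        gpu.wedged = True
        ok = False
    # In the real driver the outstanding requests would be flushed here;
    # the earlier wedge is then reported as if it had occurred in this wait.
    return ok
```

With a device that was wedged beforehand but still idles in time, the old variant returns success while the new one returns failure, so a caller such as the power-context load step can stop instead of continuing with an undefined context state.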

Comment 4 CI Bug Log 2019-07-02 11:28:41 UTC

The CI Bug Log issue associated with this bug has been archived.

New failures matching the above filters will no longer be associated with this bug.

Comment 5 Martin Peres 2019-07-02 11:29:15 UTC

(In reply to Chris Wilson from comment #3)
> commit ac543d7145bf3ad13f67a087196e6879e6993aac (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri May 31 12:32:45 2019 +0100
> 
>     drm/i915: Report an earlier wedged event when suspending the engines
>     
>     On i915_gem_load_power_context() we do care whether or not we succeed in
>     completing the switch back to the kernel context (via idling the
>     engines). Currently, we detect if an error occurs while we wait, but we
>     do not report one if it occurred beforehand (and the status of the
>     switch is undefined). Check the current terminally wedged status on
>     entering the wait, and report it after flushing the requests, as if it
>     had occurred during our own wait.
>     
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110824
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20190531113245.30042-1-chris@chris-wilson.co.uk

Only happened once, a month ago. Closing the issue, thanks!
