Bug 108343

Summary: [CI][DRMTIP] igt@kms_busy@extended-pageflip-hang-newfb-render-f - incomplete - GEM_BUG_ON(!intel_engine_is_idle(engine))
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BDW, ICL i915 features:

Description Martin Peres 2018-10-12 14:32:05 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_125/fi-icl-u2/igt@kms_busy@extended-pageflip-hang-newfb-render-f.html

<3> [64.034928] reset_all_global_seqno:147 GEM_BUG_ON(!intel_engine_is_idle(engine))
<4> [64.035044] ------------[ cut here ]------------
<2> [64.035046] kernel BUG at drivers/gpu/drm/i915/i915_request.c:147!
<4> [64.035054] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [64.035058] CPU: 3 PID: 949 Comm: kms_busy Tainted: G     U  W         4.19.0-rc7-g6c3870cc0454-drmtip_125+ #1
<4> [64.035061] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.2352.A01.1808281852 08/28/2018
<4> [64.035096] RIP: 0010:reset_all_global_seqno.part.5+0x1c5/0x260 [i915]
<4> [64.035099] Code: 7a 99 d3 c8 48 8b 35 7a ff 1b 00 49 c7 c0 cf d2 4c c0 b9 93 00 00 00 48 c7 c2 e0 38 4b c0 48 c7 c7 a0 44 3c c0 e8 9b 28 da c8 <0f> 0b 48 c7 c1 c0 73 4e c0 ba 94 00 00 00 48 c7 c6 e0 38 4b c0 48
<4> [64.035101] RSP: 0018:ffff9e15c09fbd78 EFLAGS: 00010286
<4> [64.035104] RAX: 000000000000000f RBX: ffff8a96d1aca158 RCX: 0000000000000000
<4> [64.035106] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8a96eea40ff8
<4> [64.035108] RBP: ffff8a96d57477c8 R08: 000000000000df37 R09: ffff8a96eebf8000
<4> [64.035110] R10: 0000000000000000 R11: ffff8a96eea40ff8 R12: 0000000000000000
<4> [64.035112] R13: ffff8a96d5740000 R14: ffff8a96d57477e8 R15: ffffffffc03c435c
<4> [64.035114] FS:  00007fc5b105b980(0000) GS:ffff8a96f0780000(0000) knlGS:0000000000000000
<4> [64.035117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [64.035119] CR2: 000055e88b412cb8 CR3: 00000004a86ba006 CR4: 0000000000760ee0
<4> [64.035121] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [64.035122] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [64.035124] PKRU: 55555554
<4> [64.035126] Call Trace:
<4> [64.035154]  i915_next_seqno_set+0x33/0x60 [i915]
<4> [64.035160]  simple_attr_write+0xb0/0xd0
<4> [64.035165]  full_proxy_write+0x51/0x80
<4> [64.035169]  __vfs_write+0x31/0x180
<4> [64.035172]  ? rcu_lockdep_current_cpu_online+0x8f/0xd0
<4> [64.035175]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [64.035178]  ? rcu_sync_lockdep_assert+0x29/0x50
<4> [64.035180]  ? __sb_start_write+0x152/0x1f0
<4> [64.035183]  ? __sb_start_write+0x168/0x1f0
<4> [64.035186]  vfs_write+0xbd/0x1b0
<4> [64.035189]  ksys_write+0x50/0xc0
<4> [64.035193]  do_syscall_64+0x55/0x190
<4> [64.035197]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [64.035200] RIP: 0033:0x7fc5b07db281
<4> [64.035202] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [64.035204] RSP: 002b:00007ffeab63d1d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [64.035207] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc5b07db281
<4> [64.035209] RDX: 0000000000000001 RSI: 00007fc5b0c5769a RDI: 0000000000000009
<4> [64.035211] RBP: 00007ffeab63d200 R08: 0000000000000000 R09: 0000000000000022
<4> [64.035213] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e88b1da930
<4> [64.035215] R13: 00007ffeab63df20 R14: 0000000000000000 R15: 0000000000000000
<4> [64.035219] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep btusb btrtl snd_hda_core btbcm btintel cdc_ether e1000e usbnet snd_pcm mii bluetooth ecdh_generic prime_numbers
<0> [64.035250] Dumping ftrace buffer:
<0> [64.035252] ---------------------------------
Comment 1 Chris Wilson 2018-10-15 11:11:00 UTC
*** Bug 108367 has been marked as a duplicate of this bug. ***
Comment 2 Chris Wilson 2018-10-15 11:12:10 UTC
Notably, in both reports it is an idle vecs engine that explodes. Very, very suspicious HW behaviour.
Comment 3 Chris Wilson 2018-10-15 19:35:23 UTC
Possibly fixed by this?

commit 9d3eb2c33f03432a25a6a3ab3177f839f25cbaf5 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Oct 15 12:58:56 2018 +0100

    drm/i915: Hold rpm wakeref for debugfs/i915_drop_caches_set
    
    Since we peek into HW state and poke around, it behoves us to acquire a
    runtime pm wakeref beforehand.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=108343
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108364
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181015115856.18590-1-chris@chris-wilson.co.uk
Comment 4 Lakshmi 2018-10-23 13:13:43 UTC
Update: This issue was last seen in drmtip_125 (1 week, 4 days / 117 runs ago) and has occurred only once. We need to wait 2 more weeks before closing this bug.
Comment 5 Martin Peres 2018-11-12 09:40:49 UTC
Still seen in BAT...

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5117/fi-icl-u2/igt@gem_exec_reloc@basic-write-read.html

<3> [115.788834] reset_all_global_seqno:149 GEM_BUG_ON(!intel_engine_is_idle(engine))
<4> [115.788983] ------------[ cut here ]------------
<2> [115.788986] kernel BUG at drivers/gpu/drm/i915/i915_request.c:149!
<4> [115.789012] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [115.789021] CPU: 2 PID: 2181 Comm: gem_exec_reloc Tainted: G     U  W         4.20.0-rc1-CI-CI_DRM_5117+ #1
<4> [115.789030] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.2402.AD3.1810170014 10/17/2018
<4> [115.789083] RIP: 0010:reset_all_global_seqno.part.5+0x1d3/0x220 [i915]
<4> [115.789091] Code: ec a6 e2 e0 48 8b 35 64 45 1c 00 49 c7 c0 e4 d1 3d a0 b9 95 00 00 00 48 c7 c2 80 35 3c a0 48 c7 c7 9e 1a 2d a0 e8 3d 2d e9 e0 <0f> 0b 48 c7 c1 20 75 3f a0 ba 96 00 00 00 48 c7 c6 80 35 3c a0 48
<4> [115.789119] RSP: 0018:ffffc90002733d70 EFLAGS: 00010282
<4> [115.789129] RAX: 000000000000000f RBX: ffff880494f62158 RCX: 0000000000000000
<4> [115.789140] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8804ae2574e8
<4> [115.789153] RBP: ffff8804946877c0 R08: 000000000008a268 R09: ffff8804ae396000
<4> [115.789162] R10: 0000000000000000 R11: ffff8804ae2574e8 R12: 0000000000000000
<4> [115.789172] R13: ffff880494680000 R14: ffff8804946877e0 R15: ffffffffa02d193e
<4> [115.789183] FS:  00007f7408c10980(0000) GS:ffff8804aff00000(0000) knlGS:0000000000000000
<4> [115.789195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [115.789204] CR2: 00007fffa6305e58 CR3: 0000000494d50005 CR4: 0000000000760ee0
<4> [115.789212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [115.789219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [115.789226] PKRU: 55555554
<4> [115.789230] Call Trace:
<4> [115.789274]  i915_drop_caches_set+0x261/0x270 [i915]
<4> [115.789286]  simple_attr_write+0xb0/0xd0
<4> [115.789297]  full_proxy_write+0x52/0x90
<4> [115.789307]  __vfs_write+0x31/0x180
<4> [115.789315]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [115.789321]  ? rcu_sync_lockdep_assert+0x29/0x50
<4> [115.789328]  ? __sb_start_write+0x152/0x1f0
<4> [115.789334]  ? __sb_start_write+0x163/0x1f0
<4> [115.789340]  vfs_write+0xbd/0x1b0
<4> [115.789346]  ksys_write+0x50/0xc0
<4> [115.789353]  do_syscall_64+0x55/0x190
<4> [115.789360]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [115.789367] RIP: 0033:0x7f74085a0281
<4> [115.789372] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [115.789389] RSP: 002b:00007fffa6309178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [115.789397] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f74085a0281
<4> [115.789404] RDX: 0000000000000005 RSI: 00007fffa6309200 RDI: 0000000000000007
<4> [115.789411] RBP: 00007fffa63091a0 R08: 0000000000000000 R09: 0000000000000000
<4> [115.789418] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7408589718
<4> [115.789425] R13: 0000000000000003 R14: 00007f740858e628 R15: 00007f740858ad80
<4> [115.789435] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul btusb ghash_clmulni_intel snd_hda_intel btrtl snd_hda_codec btbcm btintel snd_hwdep snd_hda_core snd_pcm bluetooth e1000e cdc_ether usbnet mii ecdh_generic prime_numbers
Comment 6 Martin Peres 2018-11-13 14:43:28 UTC
Also seen on BDW: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_139/fi-bdw-gvtdvm/igt@gem_exec_whisper@normal.html

<3> [55.538979] reset_all_global_seqno:149 GEM_BUG_ON(!intel_engine_is_idle(engine))
<4> [55.539092] ------------[ cut here ]------------
<2> [55.539095] kernel BUG at drivers/gpu/drm/i915/i915_request.c:149!
<4> [55.539115] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [55.539122] CPU: 0 PID: 986 Comm: gem_exec_whispe Tainted: G     U            4.20.0-rc1-gb3838255012c-drmtip_139+ #1
<4> [55.539132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [55.539184] RIP: 0010:reset_all_global_seqno.part.5+0x1d3/0x220 [i915]
<4> [55.539192] Code: ec e1 c8 ea 48 8b 35 4c 46 1c 00 49 c7 c0 e4 e1 57 c0 b9 95 00 00 00 48 c7 c2 80 45 56 c0 48 c7 c7 de 29 47 c0 e8 4d 68 cf ea <0f> 0b 48 c7 c1 18 85 59 c0 ba 96 00 00 00 48 c7 c6 80 45 56 c0 48
<4> [55.539208] RSP: 0018:ffff9acb80a1fa60 EFLAGS: 00010286
<4> [55.539214] RAX: 000000000000000f RBX: ffff97146f460008 RCX: 0000000000000000
<4> [55.539221] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff97147d427a78
<4> [55.539229] RBP: ffff97146f4077a0 R08: 00000000004d6045 R09: ffff97147d42c000
<4> [55.539236] R10: 0000000000000000 R11: ffff97147d427a78 R12: 0000000000000000
<4> [55.539243] R13: ffff97146f400000 R14: ffff97146f4077d8 R15: ffffffffc047287e
<4> [55.539251] FS:  00007f9894a49980(0000) GS:ffff97147da00000(0000) knlGS:0000000000000000
<4> [55.539259] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [55.539265] CR2: 000055a482f11018 CR3: 0000000079a38001 CR4: 00000000003606f0
<4> [55.539274] Call Trace:
<4> [55.539316]  i915_request_alloc+0x4a6/0x7e0 [i915]
<4> [55.539355]  i915_gem_do_execbuffer+0x71a/0x1580 [i915]
<4> [55.539364]  ? deactivate_slab.isra.26+0x74b/0x7a0
<4> [55.539374]  ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [55.539381]  ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [55.539416]  ? i915_gem_execbuffer2_ioctl+0xc4/0x3f0 [i915]
<4> [55.539426]  ? lock_acquire+0xa6/0x1c0
<4> [55.539433]  ? __might_fault+0x38/0x90
<4> [55.539468]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [55.539502]  i915_gem_execbuffer2_ioctl+0x21b/0x3f0 [i915]
<4> [55.539538]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [55.539547]  drm_ioctl_kernel+0x81/0xf0
<4> [55.539554]  drm_ioctl+0x2de/0x390
<4> [55.539586]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [55.539595]  ? _raw_spin_unlock_irq+0x24/0x50
<4> [55.539602]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4> [55.539609]  do_vfs_ioctl+0xa0/0x6e0
<4> [55.539616]  ? __schedule+0x36c/0xb50
<4> [55.539622]  ksys_ioctl+0x35/0x60
<4> [55.539628]  __x64_sys_ioctl+0x11/0x20
<4> [55.539634]  do_syscall_64+0x55/0x190
<4> [55.539640]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [55.539646] RIP: 0033:0x7f98940ef5d7
<4> [55.539651] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4> [55.539668] RSP: 002b:00007ffefad6ea98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [55.539676] RAX: ffffffffffffffda RBX: 00007ffefad85080 RCX: 00007f98940ef5d7
<4> [55.539684] RDX: 00007ffefad6ec90 RSI: 0000000040406469 RDI: 0000000000000005
<4> [55.539691] RBP: 00007ffefad6ec90 R08: 00007f98943c4230 R09: 00007f98943c4240
<4> [55.539698] R10: 00000000ffffffe2 R11: 0000000000000246 R12: 0000000040406469
<4> [55.539705] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
<4> [55.539715] Modules linked in: i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000 prime_numbers i2c_piix4
Comment 7 Martin Peres 2018-11-13 14:46:43 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_139/fi-icl-u2/igt@gem_eio@in-flight-internal-1us.html

<4> [361.723249] ------------[ cut here ]------------
<2> [361.723252] kernel BUG at drivers/gpu/drm/i915/i915_request.c:149!
<4> [361.723275] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [361.723283] CPU: 1 PID: 1120 Comm: gem_eio Tainted: G     U  W         4.20.0-rc1-gb3838255012c-drmtip_139+ #1
<4> [361.723297] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.2402.AD3.1810170014 10/17/2018
<4> [361.723350] RIP: 0010:reset_all_global_seqno.part.5+0x1d3/0x220 [i915]
<4> [361.723358] Code: ec c1 e4 ca 48 8b 35 4c 46 1c 00 49 c7 c0 e4 01 3c c0 b9 95 00 00 00 48 c7 c2 80 65 3a c0 48 c7 c7 de 49 2b c0 e8 4d 48 eb ca <0f> 0b 48 c7 c1 18 a5 3d c0 ba 96 00 00 00 48 c7 c6 80 65 3a c0 48
<4> [361.723375] RSP: 0018:ffffb897802abd78 EFLAGS: 00010286
<4> [361.723382] RAX: 000000000000000f RBX: ffff948154e82158 RCX: 0000000000000000
<4> [361.723389] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff94816e257a38
<4> [361.723397] RBP: ffff948154e977b8 R08: 000000000001436c R09: ffff94816e37c000
<4> [361.723404] R10: 0000000000000000 R11: ffff94816e257a38 R12: 0000000000000000
<4> [361.723411] R13: ffff948154e90000 R14: ffff948154e977d8 R15: ffffffffc02b487e
<4> [361.723419] FS:  00007f19f6c1c980(0000) GS:ffff94816fe80000(0000) knlGS:0000000000000000
<4> [361.723428] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [361.723435] CR2: 00007ffcb09a0ff8 CR3: 00000004a7974003 CR4: 0000000000760ee0
<4> [361.723442] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [361.723450] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [361.723457] PKRU: 55555554
<4> [361.723461] Call Trace:
<4> [361.723492]  i915_next_seqno_set+0x33/0x60 [i915]
<4> [361.723502]  simple_attr_write+0xb0/0xd0
<4> [361.723510]  full_proxy_write+0x52/0x90
<4> [361.723517]  __vfs_write+0x31/0x180
<4> [361.723524]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [361.723530]  ? rcu_sync_lockdep_assert+0x29/0x50
<4> [361.723537]  ? __sb_start_write+0x152/0x1f0
<4> [361.723543]  ? __sb_start_write+0x163/0x1f0
<4> [361.723550]  vfs_write+0xbd/0x1b0
<4> [361.723556]  ksys_write+0x50/0xc0
<4> [361.723563]  do_syscall_64+0x55/0x190
<4> [361.723570]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [361.723577] RIP: 0033:0x7f19f6193281
<4> [361.723582] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [361.723600] RSP: 002b:00007ffcb09a29b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [361.723608] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f19f6193281
<4> [361.723616] RDX: 0000000000000001 RSI: 00007f19f6817d7a RDI: 0000000000000009
<4> [361.723623] RBP: 00007ffcb09a29e0 R08: 0000000000000000 R09: 0000000000000022
<4> [361.723631] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d692f45c50
<4> [361.723638] R13: 00007ffcb09a32e0 R14: 0000000000000000 R15: 0000000000000000
<4> [361.723648] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec btusb btrtl btbcm btintel snd_hwdep e1000e snd_hda_core bluetooth snd_pcm cdc_ether usbnet mii ecdh_generic prime_numbers
Comment 8 Chris Wilson 2018-12-04 20:46:10 UTC
This bug was unfortunately hijacked by a separate issue.
Comment 9 Francesco Balestrieri 2018-12-11 13:28:23 UTC
The last ten occurrences happened roughly daily; the issue has now not been seen for 1 week and 4 days. I'm going to close it; shout if you disagree.
