Bug 107512 - [CI][BAT] igt@drv_selftest@live_requests - incomplete - GEM_BUG_ON(!i915_request_completed(rq))
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-07 12:09 UTC by Martin Peres
Modified: 2019-01-16 07:36 UTC

See Also:
i915 platform: CFL
i915 features: GEM/Other


Description Martin Peres 2018-08-07 12:09:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4583/fi-cfl-8109u/igt@drv_selftest@live_requests.html

<3>[  492.089735] process_csb:1030 GEM_BUG_ON(!i915_request_completed(rq))
<4>[  492.089820] ------------[ cut here ]------------
<2>[  492.089822] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:1030!
<4>[  492.089837] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4>[  492.089843] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            4.18.0-rc6-CI-CI_DRM_4583+ #1
<4>[  492.089850] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0037.2018.0614.2204 06/14/2018
<4>[  492.089911] RIP: 0010:process_csb+0x24b/0x780 [i915]
<4>[  492.089916] Code: 64 df b1 e0 48 8b 35 04 ca 19 00 49 c7 c0 9e 2e 6c a0 b9 06 04 00 00 48 c7 c2 40 91 6a a0 48 c7 c7 46 c6 5d a0 e8 25 70 b8 e0 <0f> 0b 49 8b 8e a8 00 00 00 48 8b 91 c0 02 00 00 48 89 4d d0 8b 92 
<4>[  492.089962] RSP: 0018:ffff8802bdd03e28 EFLAGS: 00010082
<4>[  492.089967] RAX: 000000000000000b RBX: ffff88021311a158 RCX: 0000000000000000
<4>[  492.089972] RDX: 0000000000000000 RSI: 0000000000000044 RDI: 0000000000000000
<4>[  492.089978] RBP: ffff8802bdd03e90 R08: ffffffffa06c2e9e R09: 0000000000000001
<4>[  492.089983] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88029a7fe04c
<4>[  492.089989] R13: 0000000000000001 R14: ffff88029a5a0040 R15: ffff88029a7fe040
<4>[  492.089994] FS:  0000000000000000(0000) GS:ffff8802bdd00000(0000) knlGS:0000000000000000
<4>[  492.090001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  492.090006] CR2: 000056390a298d80 CR3: 0000000005210005 CR4: 00000000003606e0
<4>[  492.090011] Call Trace:
<4>[  492.090014]  <IRQ>
<4>[  492.090057]  __execlists_submission_tasklet+0x2a/0xbf0 [i915]
<4>[  492.090098]  execlists_submission_tasklet+0x46/0x60 [i915]
<4>[  492.090106]  tasklet_action_common.isra.5+0x47/0xb0
<4>[  492.090112]  __do_softirq+0xd9/0x505
<4>[  492.090117]  ? _raw_spin_unlock+0x29/0x40
<4>[  492.090122]  irq_exit+0xa5/0xc0
<4>[  492.090126]  do_IRQ+0x9a/0x120
<4>[  492.090131]  common_interrupt+0xf/0xf
<4>[  492.090134]  </IRQ>
<4>[  492.090139] RIP: 0010:cpuidle_enter_state+0xac/0x360
<4>[  492.090143] Code: 44 00 00 31 ff e8 a4 68 94 ff 45 84 f6 74 12 9c 58 f6 c4 02 0f 85 31 02 00 00 31 ff e8 3d 07 9b ff e8 a8 e5 96 ff fb 4c 29 fb <48> ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 ea b8 ff 
<4>[  492.090189] RSP: 0018:ffffc900000efe90 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffde
<4>[  492.090195] RAX: ffff8802b41c8040 RBX: 000000000001b9b2 RCX: 0000000000000000
<4>[  492.090201] RDX: 0000000000000046 RSI: ffffffff8212855b RDI: ffffffff820d77f7
<4>[  492.090206] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
<4>[  492.090212] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82296978
<4>[  492.090217] R13: ffffe8ffffd04d10 R14: 0000000000000000 R15: 0000007292d3450d
<4>[  492.090226]  do_idle+0x1f3/0x250
<4>[  492.090231]  cpu_startup_entry+0x6a/0x70
<4>[  492.090237]  start_secondary+0x19d/0x1f0
<4>[  492.090242]  secondary_startup_64+0xa5/0xb0
<4>[  492.090248] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb btrtl btbcm snd_hda_codec btintel snd_hwdep x86_pkg_temp_thermal coretemp snd_hda_core crct10dif_pclmul bluetooth crc32_pclmul ghash_clmulni_intel snd_pcm e1000e ecdh_generic mei_me mei prime_numbers [last unloaded: i915]
<0>[  492.090284] Dumping ftrace buffer:
<0>[  492.090289]    (ftrace buffer empty)
<4>[  492.090293] ---[ end trace fa15ddb23cfd4b5e ]---
Comment 1 Chris Wilson 2018-08-07 12:39:39 UTC
Hmm, my first thought is whether the earlier failure could have booby-trapped the subsequent driver init. That is quite scary in its own right, but it is the only lead here: because of that earlier hang we also lost the trace buffer.

If there's a better debug example, that may help jog my memory as to what it was ;)
Comment 2 James Ausmus 2018-08-21 00:27:38 UTC
Test is currently showing green for all past 60 runs, closing
Comment 3 Martin Peres 2018-09-07 15:48:46 UTC
(In reply to James Ausmus from comment #2)
> Test is currently showing green for all past 60 runs, closing

Please do not close issues like this so quickly. We have no idea how reproducible the issue is, and even if the reproduction rate were only 1%, that would still be too high :s

I will re-open this bug since Chris is baffled as to how this may have occurred. We may get some input from other developers.
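
To put numbers on the point about 60 green runs, here is a minimal sketch. It assumes every CI run is independent and fails with a fixed probability; the 1% rate is the hypothetical figure from the comment above, not a measured value, and the function names are illustrative only.

```python
import math


def prob_all_green(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass,
    given a per-run failure probability of `failure_rate`."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of consecutive green runs after which a bug with
    the given per-run failure rate would have shown up at least once
    with probability >= `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical 1% reproduction rate, 60 observed green runs.
    p = prob_all_green(0.01, 60)
    print(f"P(60 green runs | 1% failure rate) = {p:.3f}")  # 0.547
    print(f"Green runs needed for 95% confidence: {runs_needed(0.01, 0.95)}")  # 299
```

Under these assumptions, a bug that reproduces 1% of the time would still produce 60 green runs in a row more than half the time, so a green streak of that length says little about whether the issue is gone.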
Comment 4 Francesco Balestrieri 2019-01-09 14:28:45 UTC
5 months later, still no occurrence. Closing.

