102035 – [BAT][CI] Incomplete : kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880!

Summary: [BAT][CI] Incomplete : kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880!

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	Other All

Importance:	highest critical
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Whiteboard:	ReadyForDev

Duplicates (4):	102393 102705 103178 103190

Reported:	2017-08-04 10:16 UTC by Martin Peres
Modified:	2018-02-07 12:31 UTC
CC List:	3 users

i915 platform:	BXT, CNL, GLK
i915 features:	GEM/execlists

Description Martin Peres 2017-08-04 10:16:05 UTC

On CI_DRM_2915, the machine fi-bxt-j4205 produced the following kernel BUG when running igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b:

<4>[  412.528644] ------------[ cut here ]------------
<2>[  412.528664] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:604!
<4>[  412.528680] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[  412.528691] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mei_me mii mei lpc_ich prime_numbers i2c_hid pinctrl_broxton pinctrl_intel
<4>[  412.529242] CPU: 2 PID: 4271 Comm: kms_pipe_crc_ba Tainted: G     U  W       4.13.0-rc3-CI-CI_DRM_2915+ #1
<4>[  412.529262] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.10 09/29/2016
<4>[  412.529283] task: ffff88027663ce40 task.stack: ffffc90000b5c000
<4>[  412.529371] RIP: 0010:intel_lrc_irq_handler+0x25c/0x500 [i915]
<4>[  412.529383] RSP: 0018:ffff88027fd03ea0 EFLAGS: 00010202
<4>[  412.529412] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffff880263ccb1c0
<4>[  412.529426] RDX: 0000000000000006 RSI: ffffc900008223a0 RDI: 00000000ffffffff
<4>[  412.529440] RBP: ffff88027fd03f00 R08: ffff88026f6e0000 R09: 0000000000000000
<4>[  412.529454] R10: ffff88026cae8058 R11: 0000000000000000 R12: ffff88026cae8008
<4>[  412.529468] R13: 0000000000008002 R14: 0000000000000003 R15: ffffc90000822370
<4>[  412.529483] FS:  00007f2e25787a40(0000) GS:ffff88027fd00000(0000) knlGS:0000000000000000
<4>[  412.529501] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  412.529513] CR2: 000000401903a088 CR3: 00000002187f0000 CR4: 00000000003406e0
<4>[  412.529526] Call Trace:
<4>[  412.529534]  <IRQ>
<4>[  412.529547]  tasklet_hi_action+0x93/0x120
<4>[  412.529558]  __do_softirq+0xbb/0x4b0
<4>[  412.529570]  irq_exit+0xa9/0xc0
<4>[  412.529581]  do_IRQ+0x6c/0x130
<4>[  412.529592]  common_interrupt+0x90/0x90
<4>[  412.529602] RIP: 0010:_raw_spin_unlock_irqrestore+0x54/0x60
<4>[  412.529614] RSP: 0018:ffffc90000b5fa48 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff6d
<4>[  412.529634] RAX: ffffffffffffffff RBX: 0000000000000206 RCX: 0000000000000000
<4>[  412.529648] RDX: ffffffff8146d625 RSI: 0000000000000001 RDI: ffffffff8188f292
<4>[  412.529662] RBP: ffffc90000b5fa58 R08: 0000000000000001 R09: 0000000000000000
<4>[  412.529676] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82da4028
<4>[  412.529690] R13: ffffffffa025ae60 R14: ffff880273e1a008 R15: ffff880263cc9348
<4>[  412.529704]  </IRQ>
<4>[  412.529714]  ? __debug_object_init+0x215/0x420
<4>[  412.529725]  ? _raw_spin_unlock_irqrestore+0x52/0x60
<4>[  412.529738]  __debug_object_init+0x215/0x420
<4>[  412.529782]  ? reset_all_global_seqno.part.7+0x108/0x108 [i915]
<4>[  412.529796]  debug_object_init+0x16/0x20
<4>[  412.529834]  __i915_sw_fence_init+0x2e/0x60 [i915]
<4>[  412.529879]  i915_gem_request_alloc+0x191/0x400 [i915]
<4>[  412.529922]  i915_gem_do_execbuffer+0x6ba/0x12b0 [i915]
<4>[  412.529941]  ? lock_acquire+0xb0/0x200
<4>[  412.529953]  ? __might_fault+0x39/0x90
<4>[  412.529995]  i915_gem_execbuffer2+0x9e/0x1a0 [i915]
<4>[  412.530038]  ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
<4>[  412.530052]  drm_ioctl_kernel+0x64/0xb0
<4>[  412.530063]  drm_ioctl+0x2f4/0x3d0
<4>[  412.530107]  ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
<4>[  412.530125]  ? mntput_no_expire+0x7f/0x3d0
<4>[  412.530136]  ? mntput+0x1f/0x30
<4>[  412.530147]  do_vfs_ioctl+0x8f/0x660
<4>[  412.530158]  ? task_work_run+0x8f/0xb0
<4>[  412.530169]  SyS_ioctl+0x3c/0x70
<4>[  412.530180]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  412.530191] RIP: 0033:0x7f2e2397cf07
<4>[  412.530200] RSP: 002b:00007ffe0be94258 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[  412.530218] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f2e2397cf07
<4>[  412.530232] RDX: 00007ffe0be942f0 RSI: 0000000040406469 RDI: 0000000000000004
<4>[  412.530245] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
<4>[  412.530259] R10: 000000000000001f R11: 0000000000000246 R12: 0000000000000001
<4>[  412.530273] R13: 00007f2e23c45c40 R14: 0000000000000000 R15: 0000000000000000
<4>[  412.530289] Code: ff 41 83 e5 08 0f 85 d8 fe ff ff 0f 0b 0f 0b 0f 0b 48 89 cf 4c 89 45 c0 4c 89 55 c8 e8 be 4c 47 e1 4c 8b 45 c0 4c 8b 55 c8 eb 92 <0f> 0b 0f 0b 49 8d 84 24 40 03 00 00 48 83 e2 fc 48 89 45 a8 74 
<1>[  412.530434] RIP: intel_lrc_irq_handler+0x25c/0x500 [i915] RSP: ffff88027fd03ea0
<4>[  412.530481] ---[ end trace 8d25a06d708c0561 ]---
<0>[  412.740935] Kernel panic - not syncing: Fatal exception in interrupt
<0>[  412.740975] Kernel Offset: disabled

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2915/fi-bxt-j4205/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b.html

Comment 1 Chris Wilson 2017-08-04 10:50:22 UTC

The HW replied with one more context-switch than we submitted. It is most disturbing. gem_concurrent_blit can typically hit this within a few minutes.

Comment 2 Chris Wilson 2017-08-04 15:39:00 UTC

Hitting the same assertion on bxt/gem_concurrent_blit; it always leaves a trail of breadcrumbs like:

[   61.964007] bcs0(3): 0+ rq.seqno=2275 [ctx=91], count=1
[   61.964050] bcs0(1): [3/3] status=1
[   61.964096] bcs0(1): 0+ rq.seqno=2276 [ctx=91], count=2
[   61.964180] bcs0(1): [4/4] status=8002
[   61.964189] bcs0(1): - rq.seqno=2276 [ctx=91], status=8002, count=2
[   61.977615] bcs0(1): [5/5] status=18
[   61.977640] bcs0(1): - rq.seqno=2276 [ctx=91], status=18, count=1
[   61.995341] bcs0(2): 0+ rq.seqno=2277 [ctx=9b], count=1
[   61.995427] bcs0(2): 0+ rq.seqno=2278 [ctx=9b], count=2
[   61.995623] bcs0(1): [0/1] status=1
[   61.995786] bcs0(1): [1/1] status=8002
[   61.995802] bcs0(1): - rq.seqno=2278 [ctx=9b], status=8002, count=2
[   62.008366] bcs0(1): [2/2] status=18
[   62.008391] bcs0(1): - rq.seqno=2278 [ctx=9b], status=18, count=1
[   62.020052] bcs0(0): 0+ rq.seqno=2279 [ctx=8c], count=1
[   62.020142] bcs0(0): 0+ rq.seqno=2280 [ctx=8c], count=2
[   62.020207] bcs0(1): [3/4] status=12
[   62.020219] bcs0(1): - rq.seqno=2280 [ctx=8c], status=12, count=2

^ This is the bogus event. 0x12 == COMPLETE | PREEMPTED, but it is following the ACTIVE->IDLE notification, where we always expect the IDLE->ACTIVE (status=1) afterwards (see examples above). Following the bogus 0x12 wakeup, the context-switch notification never "catches up", i.e. we never see the final context-switch of ELEMENT_SWITCH or IDLE. At that point only a reset seems to cure it.
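For anyone decoding the status values in these breadcrumbs by hand, a small helper can make the flag combinations explicit. The sketch below is a standalone illustration rather than driver code: the bit positions mirror the GEN8_CTX_STATUS_* definitions in the i915 driver, while the CTX_STATUS_* names and csb_status_str() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit positions mirror GEN8_CTX_STATUS_* in i915; the names are local. */
#define CTX_STATUS_IDLE_ACTIVE    (1u << 0)
#define CTX_STATUS_PREEMPTED      (1u << 1)
#define CTX_STATUS_ELEMENT_SWITCH (1u << 2)
#define CTX_STATUS_ACTIVE_IDLE    (1u << 3)
#define CTX_STATUS_COMPLETE       (1u << 4)
#define CTX_STATUS_LITE_RESTORE   (1u << 15)

/*
 * Hypothetical helper: render the known flags set in a CSB status word
 * as "A | B" into buf, highest bit first. Unknown bits are ignored.
 * buf must be non-NULL and len at least 1.
 */
static const char *csb_status_str(uint32_t status, char *buf, size_t len)
{
	static const struct {
		uint32_t bit;
		const char *name;
	} flags[] = {
		{ CTX_STATUS_LITE_RESTORE,   "LITE_RESTORE" },
		{ CTX_STATUS_COMPLETE,       "COMPLETE" },
		{ CTX_STATUS_ACTIVE_IDLE,    "ACTIVE_IDLE" },
		{ CTX_STATUS_ELEMENT_SWITCH, "ELEMENT_SWITCH" },
		{ CTX_STATUS_PREEMPTED,      "PREEMPTED" },
		{ CTX_STATUS_IDLE_ACTIVE,    "IDLE_ACTIVE" },
	};
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(status & flags[i].bit))
			continue;
		/* Separate flag names once something has been written. */
		if (buf[0] != '\0')
			strncat(buf, " | ", len - strlen(buf) - 1);
		strncat(buf, flags[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

With a 64-byte buffer, csb_status_str(0x12, ...) gives "COMPLETE | PREEMPTED", 0x18 gives "COMPLETE | ACTIVE_IDLE" and 0x8002 gives "LITE_RESTORE | PREEMPTED", which matches the reading of the trail given here.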

[   62.020228] bcs0(1): [4/4] status=8002
[   62.020235] bcs0(1): - rq.seqno=2280 [ctx=8c], status=8002, count=1
[   62.024699] bcs0(3): 1+ rq.seqno=2281 [ctx=96], count=1
[   62.024724] bcs0(3): 0+ rq.seqno=2280 [ctx=8c], count=2
[   62.024753] bcs0(1): [5/5] status=8002
[   62.024774] bcs0(1): - rq.seqno=2280 [ctx=8c], status=8002, count=2


Now, answers on a postcard as to what we are meant to do to prevent heading into this cul-de-sac.
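The ordering rule described above (an ACTIVE->IDLE notification should be followed by IDLE->ACTIVE) can be checked mechanically against a captured trail. The following is a minimal sketch under that single assumption; csb_first_unexpected() and the CTX_STATUS_* names are hypothetical, and the array simply transcribes the status values from the breadcrumbs quoted in this comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions mirror GEN8_CTX_STATUS_* in i915; the names are local. */
#define CTX_STATUS_IDLE_ACTIVE (1u << 0)
#define CTX_STATUS_ACTIVE_IDLE (1u << 3)

/*
 * Hypothetical checker: return the index of the first event that follows
 * an ACTIVE_IDLE event without carrying IDLE_ACTIVE, or -1 if there is none.
 */
static long csb_first_unexpected(const uint32_t *status, size_t n)
{
	size_t i;

	assert(status != NULL || n == 0);
	for (i = 1; i < n; i++) {
		if ((status[i - 1] & CTX_STATUS_ACTIVE_IDLE) &&
		    !(status[i] & CTX_STATUS_IDLE_ACTIVE))
			return (long)i;
	}
	return -1;
}

/* Status values transcribed from the bcs0 breadcrumb trail, in order. */
static const uint32_t bcs0_trail[] = {
	0x1, 0x8002, 0x18,	/* ctx=91: idle->active, lite-restore, complete */
	0x1, 0x8002, 0x18,	/* ctx=9b: same pattern */
	0x12,			/* ctx=8c: the bogus COMPLETE | PREEMPTED */
	0x8002, 0x8002,
};
```

Run over bcs0_trail, the checker reports index 6, the status=12 entry; the first six entries pass. It only encodes the one expectation stated in this comment and is not a full model of the CSB protocol.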

Comment 3 Jani Saarinen 2017-09-01 08:14:51 UTC

Based on CI data, the failure rate is 1 in 105 runs (about 1%). Dropping priority.

Comment 4 Chris Wilson 2017-09-13 13:19:23 UTC

*** Bug 102705 has been marked as a duplicate of this bug. ***

Comment 5 Marta Löfstedt 2017-09-20 11:17:53 UTC

Here we go again:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3112/fi-bxt-j4205/igt@kms_busy@basic-flip-c.html

Comment 6 Marta Löfstedt 2017-10-11 07:27:13 UTC

*** Bug 103190 has been marked as a duplicate of this bug. ***

Comment 7 Marta Löfstedt 2017-10-11 07:52:37 UTC

*** Bug 103178 has been marked as a duplicate of this bug. ***

Comment 8 Chris Wilson 2017-10-11 09:41:00 UTC

Glk? I thought glk followed the cnl pattern (random context id reported) rather than this pattern of impossible lite-restore following complete | idle.

Comment 9 Chris Wilson 2017-10-11 14:36:12 UTC

(In reply to Chris Wilson from comment #8)
> Glk? I thought glk followed the cnl pattern (random context id reported)
> rather than this pattern of impossible lite-restore following complete |
> idle.

Nope, see https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5993/fi-glk-1/dmesg-1507731622_Panic_2.log; glk definitely shares the bxt failure.

Comment 10 Marta Löfstedt 2017-10-12 06:08:26 UTC

Also,

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3209/shard-apl5/igt@kms_busy@extended-pageflip-hang-oldfb-render-B.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3209/shard-apl5/dmesg-1507719682_Oops_1.log
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3209/shard-apl5/dmesg-1507719682_Panic_2.log

Comment 11 Marta Löfstedt 2017-10-13 13:01:03 UTC

Also on:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3228/shard-apl7/igt@kms_properties@connector-properties-legacy.html

Note that when we get multiple pstore logs on one of the shared machines, you have to compare them with dmesg to figure out which log belongs to which shard. In this case, this is the relevant pstore backtrace:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3228/shard-apl7/dmesg-1507893658_Panic_2.log

Comment 12 Marta Löfstedt 2017-10-16 07:13:12 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3237/shard-apl2/igt@kms_cursor_legacy@cursorA-vs-flipA-atomic.html

Comment 13 Marta Löfstedt 2017-10-16 07:14:09 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3238/shard-apl2/igt@kms_draw_crc@draw-method-xrgb8888-render-xtiled.html

Comment 14 Marta Löfstedt 2017-10-17 06:47:46 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3242/shard-apl2/igt@kms_vblank@query-forked.html

Comment 15 Marta Löfstedt 2017-10-17 06:48:30 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3242/shard-apl4/igt@kms_universal_plane@cursor-fb-leak-pipe-B.html

Comment 16 Marta Löfstedt 2017-10-17 11:12:13 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3248/shard-apl3/igt@kms_flip@flip-vs-modeset-vs-hang.html

Comment 17 Marta Löfstedt 2017-10-18 07:49:12 UTC

reproduced:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3251/shard-apl1/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-ytiled.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3252/shard-apl7/igt@kms_draw_crc@draw-method-xrgb8888-mmap-cpu-untiled.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3253/shard-apl4/igt@kms_cursor_legacy@basic-flip-before-cursor-legacy.html

Comment 18 Marta Löfstedt 2017-10-18 09:07:50 UTC

Reproduced on GLK-shards

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3252/shard-glkb2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-move.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3252/shard-glkb2/dmesg-1508246195_Oops_1.log

Comment 19 Marta Löfstedt 2017-10-18 10:28:02 UTC

Also,

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3254/shard-glkb1/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-shrfb-draw-render.html

and:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3254/shard-glkb1/igt@kms_setmode@basic.html

Note that both pstore logs from shard-glkb1 show the same issue.

Comment 20 Marta Löfstedt 2017-10-18 12:20:23 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3256/shard-glkb3/igt@kms_frontbuffer_tracking@fbcpsr-2p-shrfb-fliptrack.html

Comment 21 Marta Löfstedt 2017-10-18 13:06:10 UTC

GLK-shards again:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3257/shard-glkb3/igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-shrfb-draw-mmap-gtt.html

Comment 22 Marta Löfstedt 2017-10-20 06:07:01 UTC

Also, CI_DRM_3266 fi-skl-6260u:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3266/fi-skl-6260u/igt@kms_cursor_legacy@basic-flip-after-cursor-varying-size.html

<4>[  284.776232] ------------[ cut here ]------------
<2>[  284.776235] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:782!
<4>[  284.776260] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[  284.776268] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp snd_hda_intel snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ghash_clmulni_intel snd_pcm e1000e ptp pps_core mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel i2c_hid
<4>[  284.776320] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U          4.14.0-rc5-CI-CI_DRM_3266+ #1
<4>[  284.776329] Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0057.2017.0119.1758 01/19/2017
<4>[  284.776338] task: ffff8802652f50c0 task.stack: ffffc900000ac000
<4>[  284.776369] RIP: 0010:intel_lrc_irq_handler+0x304/0x8a0 [i915]
<4>[  284.776375] RSP: 0018:ffff88026ed03ec0 EFLAGS: 00010246
<4>[  284.776382] RAX: ffff8802525545f0 RBX: ffff880252554590 RCX: 0000000000000000
<4>[  284.776389] RDX: 0000000000000000 RSI: ffffffff81d0d974 RDI: ffff8802525542a8
<4>[  284.776396] RBP: ffff88026ed03f18 R08: 0000000000000000 R09: 0000000000000001
<4>[  284.776403] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880259990000
<4>[  284.776409] R13: 0000000000000000 R14: ffffffff81d1849f R15: 0000000000000000
<4>[  284.776417] FS:  0000000000000000(0000) GS:ffff88026ed00000(0000) knlGS:0000000000000000
<4>[  284.776425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  284.776431] CR2: 00007f6cef3eb000 CR3: 0000000003e0f001 CR4: 00000000003606e0
<4>[  284.776438] Call Trace:
<4>[  284.776442]  <IRQ>
<4>[  284.776447]  ? tasklet_hi_action+0x71/0x120
<4>[  284.776454]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  284.776461]  tasklet_hi_action+0x98/0x120
<4>[  284.776467]  __do_softirq+0xc0/0x4ae
<4>[  284.776473]  irq_exit+0xae/0xc0
<4>[  284.776478]  smp_apic_timer_interrupt+0x9e/0x2e0
<4>[  284.776484]  apic_timer_interrupt+0x9a/0xa0
<4>[  284.776489]  </IRQ>
<4>[  284.776493] RIP: 0010:cpuidle_enter_state+0x136/0x370
<4>[  284.776498] RSP: 0018:ffffc900000afe80 EFLAGS: 00000216 ORIG_RAX: ffffffffffffff10
<4>[  284.776507] RAX: ffff8802652f50c0 RBX: 0000000000020f9c RCX: 0000000000000001
<4>[  284.776514] RDX: 0000000000000000 RSI: ffffffff81d0d974 RDI: ffffffff81cc18d6
<4>[  284.776521] RBP: ffffc900000afeb8 R08: 000000000000071d R09: 0000000000000018
<4>[  284.776528] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
<4>[  284.776536] R13: 0000000000000002 R14: ffff88026ed25430 R15: 000000424df9c186
<4>[  284.776546]  cpuidle_enter+0x17/0x20
<4>[  284.776551]  call_cpuidle+0x23/0x40
<4>[  284.776557]  do_idle+0x192/0x1e0
<4>[  284.776563]  cpu_startup_entry+0x1d/0x20
<4>[  284.776568]  start_secondary+0x11c/0x140
<4>[  284.776574]  secondary_startup_64+0xa5/0xa5
<4>[  284.776581] Code: 4d c8 05 a0 03 00 00 48 03 81 a8 0b 00 00 44 8b 28 44 89 eb 41 c1 ed 08 41 83 e5 07 83 e3 07 45 89 af 84 03 00 00 e9 af fd ff ff <0f> 0b 0f 0b 41 80 bf 68 03 00 00 00 4c 8b 65 c8 74 1e 41 8b b7 
<1>[  284.776664] RIP: intel_lrc_irq_handler+0x304/0x8a0 [i915] RSP: ffff88026ed03ec0
<4>[  284.776673] ---[ end trace 75789ff91a01554d ]---

Comment 23 Marta Löfstedt 2017-10-20 06:09:15 UTC

CI_DRM_3267 fi-skl-6700hq

<14>[  384.567935] [IGT] kms_flip: exiting, ret=0
<4>[  384.570958] ------------[ cut here ]------------
<2>[  384.570961] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:782!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3267/fi-skl-6700hq/igt@kms_flip@basic-flip-vs-wf_vblank.html

Comment 24 Marta Löfstedt 2017-10-20 06:27:09 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3268/shard-glkb4/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-indfb-draw-pwrite.html

Comment 25 Marta Löfstedt 2017-10-20 06:28:04 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3268/shard-glkb5/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-indfb-msflip-blt.html

Comment 26 Marta Löfstedt 2017-10-20 06:29:34 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3266/shard-glkb4/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt.html

Comment 27 Marta Löfstedt 2017-10-20 10:51:38 UTC

BAT-machine fi-skl-6700hq hit this while running: igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:

kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:782!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3269/fi-skl-6700hq/igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence.html

Comment 28 Chris Wilson 2017-10-20 15:17:46 UTC

(In reply to Marta Löfstedt from comment #27)
> BAT-machine fi-skl-6700hq hit this while running:
> igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
> 
> kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:782!
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3269/fi-skl-6700hq/
> igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence.html

That's not connected to this bug. This bug is very specifically about the hw reporting COMPLETED | PREEMPTED, and not about the interrupt being received after we have finished parsing the CSB after parking the engines.

Comment 29 Marta Löfstedt 2017-10-23 07:27:11 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3271/fi-cnl-y/igt@gem_exec_flush@basic-batch-kernel-default-uc.html

Comment 30 Marta Löfstedt 2017-10-23 07:52:11 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3273/shard-glkb4/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html

Comment 31 Marta Löfstedt 2017-10-23 07:53:26 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3271/shard-glkb1/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-pgflip-blt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3271/shard-glkb1/dmesg-1508538544_Oops_1.log

Comment 32 Marta Löfstedt 2017-10-23 07:55:05 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3273/shard-glkb3/dmesg-1508602166_Oops_1.log
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3273/shard-glkb3/igt@kms_flip@blocking-wf_vblank.html

Comment 33 Marta Löfstedt 2017-10-23 08:07:52 UTC

The "kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:782!" ones are now handled in bug 103410.

Comment 34 Marta Löfstedt 2017-10-23 10:25:39 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3274/shard-glkb4/igt@kms_flip@flip-vs-modeset-interruptible.html

<2>[  322.306569] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:879!

Comment 35 Marta Löfstedt 2017-10-23 10:26:48 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3274/shard-glkb3/igt@kms_busy@extended-modeset-hang-oldfb-render-C.html

<2>[ 3379.405906] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:879!

Comment 36 Marta Löfstedt 2017-10-23 13:03:57 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3275/shard-glkb2/igt@kms_atomic@plane_cursor_legacy.html

<2>[   41.938315] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:879!

Comment 37 Marta Löfstedt 2017-10-23 13:05:06 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3275/shard-apl2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu.html

<2>[ 1325.066224] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:879!

Comment 38 Marta Löfstedt 2017-10-23 13:05:46 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3275/shard-glkb2/igt@gem_exec_create@forked.html

<2>[   41.938315] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:879!

Comment 39 Chris Wilson 2017-10-23 14:56:14 UTC

Talking point:

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6b10a01dd371..f5be85a01f83 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -905,6 +905,8 @@ static void intel_lrc_irq_handler(unsigned long data)
                        /* After the final element, the hw should be idle */
                        GEM_BUG_ON(port_count(port) == 0 &&
                                   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
+                       if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
+                               mdelay(2);
                }
 
                if (head != execlists->csb_head) {

This prevents the COMPLETED | PREEMPT event from occurring after the COMPLETED | ACTIVE_IDLE. (Or it just increases the MTBF enough to mask the issue.)

Comment 40 Jani Saarinen 2017-10-24 14:00:20 UTC

Raised priority

Comment 41 Chris Wilson 2017-10-24 15:39:48 UTC

More complete would be

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index eeb3622803a8..1df66aeca3ba 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -599,7 +599,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
                         * the driver is unable to keep up the supply of new
                         * work).
                         */
-                       if (port_count(&port[1]))
+                       if (port_count(&port[0]))
                                goto unlock;
 
                        /* WaIdleLiteRestore:bdw,skl

Needs a w/a assigned.
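Read literally, the diff only changes which port's in-flight count makes execlists_dequeue() bail out via "goto unlock". The toy model below shows the difference between the two gates; it assumes port_count() reports outstanding submissions for a port, and struct toy_port, skip_dequeue_before() and skip_dequeue_after() are hypothetical names for this sketch, not driver symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an execlists port: only the in-flight count is modelled. */
struct toy_port {
	unsigned int count;
};

/* Gate before the diff: skip the dequeue only if the second port is in flight. */
static bool skip_dequeue_before(const struct toy_port port[2])
{
	assert(port != NULL);
	return port[1].count != 0;
}

/* Gate after the diff: skip the dequeue whenever the first port is in flight. */
static bool skip_dequeue_after(const struct toy_port port[2])
{
	assert(port != NULL);
	return port[0].count != 0;
}
```

With port[0] busy and port[1] empty, the old gate lets the dequeue proceed (and so allows a resubmission on top of the running context), whereas the new gate holds everything back until port[0] has drained. On this reading the change trades lite-restores for safety, which may be why it is described as needing a w/a.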

Comment 42 Chris Wilson 2017-10-24 15:43:49 UTC

Reminds me of

commit 70962fbe5c75e785d250c04db4d01c18b7316c13
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 23 13:05:56 2017 +0000

    drm/i915: Remove disable_lite_restore_wa
    
    This w/a (WaEnableForceRestoreInCtxtDescForVCS) was only used for
    preproduction hw, which is no longer in use.  Remove the workaround to
    simplify the code.

but that is only mentioned for VCS, whereas we see the PREEMPT | COMPLETED on, at least, bcs.

Comment 43 Marta Löfstedt 2017-10-25 06:44:55 UTC

<2>[  366.551741] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:888!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3279/shard-glkb5/igt@kms_flip_event_leak.html

Comment 44 Marta Löfstedt 2017-10-26 05:47:33 UTC

CI_DRM_3283 fi-bxt-j4205 igt@kms_busy@basic-flip-c

<2>[  364.185707] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:888!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3283/fi-bxt-j4205/igt@kms_busy@basic-flip-c.html

Comment 45 Marta Löfstedt 2017-10-26 06:10:56 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3285/shard-glkb1/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-move.html

<2>[ 1515.047901] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:888!

Comment 46 Marta Löfstedt 2017-10-26 07:19:49 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3283/shard-apl6/igt@kms_color@ctm-0-5-pipe1.html

<2>[   77.455586] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:888!

Comment 47 Marta Löfstedt 2017-10-27 07:48:21 UTC

CI_DRM_3288 shard-glkb1 igt@kms_flip@flip-vs-panning-vs-hang-interruptible

<2>[  261.125313] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3288/shard-glkb1/dmesg-1509061688_Panic_2.log
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3288/shard-glkb1/igt@kms_flip@flip-vs-panning-vs-hang-interruptible.html

Note new line again!
GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);

Comment 48 Marta Löfstedt 2017-10-27 07:52:08 UTC

CI_DRM_3288 shard-glkb1 igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-pwrite

<2>[  904.259553] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3288/shard-glkb1/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-pwrite.html

Comment 49 Marta Löfstedt 2017-10-30 07:38:50 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3292/shard-glkb2/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-shrfb-draw-mmap-cpu.html

Comment 50 Marta Löfstedt 2017-10-30 07:38:59 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3292/shard-glkb3/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-cpu.html

Comment 51 Marta Löfstedt 2017-10-30 12:35:04 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/shard-glkb1/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions.html

<2>[ 4502.680683] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 52 Marta Löfstedt 2017-10-31 12:29:07 UTC


<2>[ 2776.149999] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3300/shard-glkb4/igt@kms_flip@flip-vs-panning-interruptible.html

Comment 53 Marta Löfstedt 2017-10-31 12:32:12 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3300/shard-glkb2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-onoff.html

<2>[ 1043.960206] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 54 Marta Löfstedt 2017-11-01 07:22:41 UTC

CI_DRM_3304 fi-glk-1 igt@pm_backlight@basic-brightness

<2>[  523.218649] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3304/fi-glk-1/igt@pm_backlight@basic-brightness.html

Comment 55 Marta Löfstedt 2017-11-01 07:24:52 UTC

CI_DRM_3302 shard-apl8 igt@kms_rotation_crc@bad-tiling

<2>[  703.522499] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3302/shard-apl8/igt@kms_rotation_crc@bad-tiling.html

Comment 56 Marta Löfstedt 2017-11-01 07:26:26 UTC

CI_DRM_3301 shard-glkb4 igt@kms_busy@extended-modeset-hang-newfb-render-C

<2>[ 3461.438418] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3301/shard-glkb4/igt@kms_busy@extended-modeset-hang-newfb-render-C.html

Comment 57 Marta Löfstedt 2017-11-02 06:58:47 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-glkb3/igt@kms_flip@wf_vblank-interruptible.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-glkb3/dmesg-1509580929_Panic_2.log
kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 58 Marta Löfstedt 2017-11-03 12:34:08 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3311/shard-apl4/igt@kms_busy@extended-modeset-hang-newfb-render-B.html

<2>[   75.821455] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 59 Marta Löfstedt 2017-11-06 07:00:04 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3313/shard-glkb2/igt@gem_exec_schedule@smoketest-blt.html

<2>[ 1099.391030] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 60 Marta Löfstedt 2017-11-07 07:19:17 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3317/shard-glkb3/igt@kms_flip@flip-vs-rmfb-interruptible.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3317/shard-glkb3/dmesg-1509998109_Panic_2.log

kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 61 Marta Löfstedt 2017-11-07 13:44:33 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3318/shard-glkb2/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-shrfb-draw-mmap-gtt.html

<2>[ 1839.277970] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 62 Marta Löfstedt 2017-11-07 13:45:27 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3318/shard-glkb1/igt@kms_cursor_legacy@cursora-vs-flipa-atomic-transitions.html

<2>[   90.044653] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 63 Marta Löfstedt 2017-11-08 07:04:02 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3319/shard-glkb4/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-msflip-blt.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3319/shard-glkb4/dmesg-1510073961_Panic_2.log
<2>[ 3148.546706] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 64 Marta Löfstedt 2017-11-08 07:05:33 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3319/shard-glkb2/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-cur-indfb-draw-pwrite.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3319/shard-glkb2/dmesg-1510073893_Oops_1.log
<2>[  704.851043] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:891!

Comment 65 Marta Löfstedt 2017-11-14 11:00:49 UTC

Also,
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_479/shard-apl4/igt@kms_flip_event_leak.html

Comment 66 Jani Saarinen 2017-11-17 06:59:21 UTC

Reference from Chris: https://patchwork.freedesktop.org/series/33965/

Comment 67 Jani Saarinen 2017-11-20 10:03:47 UTC

another reference: https://patchwork.freedesktop.org/series/34081/

Comment 68 Chris Wilson 2017-11-20 17:04:02 UTC

commit ba74cb10c775c839f6e1d0fabd1e772eabd9c43f
Author: Michel Thierry <michel.thierry@intel.com>
Date:   Mon Nov 20 12:34:58 2017 +0000

    drm/i915/execlists: Delay writing to ELSP until HW has processed the previous write
    
    The hardware needs some time to process the information received in the
    ExecList Submission Port, and expects us to not write anything more until
    it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or
    PREEMPTED CSB event.
    
    If we do not follow this, the driver could write new data into the ELSP
    before HW had finishing fetching the previous one, putting us in
    'undefined behaviour' space.
    
    This seems to be the problem causing the spurious PREEMPTED & COMPLETE
    events after a COMPLETE like the one below:
    
    [] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3.
    [] vcs0:  Execlist CSB[0]: 0x00000018 _ 0x00000007
    [] vcs0:  Execlist CSB[1]: 0x00000001 _ 0x00000000
    [] vcs0:  Execlist CSB[2]: 0x00000018 _ 0x00000007  <<< COMPLETE
    [] vcs0:  Execlist CSB[3]: 0x00000012 _ 0x00000007  <<< PREEMPTED & COMPLETE
    [] vcs0:  Execlist CSB[4]: 0x00008002 _ 0x00000006
    [] vcs0:  Execlist CSB[5]: 0x00000014 _ 0x00000006
    
    The ELSP writes that lead to this CSB sequence show that the HW hadn't
    started executing the previous execlist (the one with only ctx 0x6) by the
    time the new one was submitted; this is a bit more clear in the data
    show in the EXECLIST_STATUS register at the time of the ELSP write.
    
    [] vcs0: ELSP[0] = 0x0_0        [execlist1] - status_reg = 0x0_302
    [] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302
    
    [] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308
    [] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308
    
    Note that having to wait for this ack does not disable lite-restores,
    although it may reduce their numbers.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035
    Signed-off-by: Michel Thierry <michel.thierry@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
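The rule stated in the commit message (no further ELSP write until the HW has acknowledged the previous one with an IDLE_ACTIVE or PREEMPTED CSB event) amounts to a one-bit gate. The following is a minimal standalone sketch of that idea, not the code from the patch: struct elsp_gate and both functions are hypothetical, and only the two status bits named in the commit message are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions mirror GEN8_CTX_STATUS_* in i915; the names are local. */
#define CTX_STATUS_IDLE_ACTIVE (1u << 0)
#define CTX_STATUS_PREEMPTED   (1u << 1)

/* Hypothetical bookkeeping: is the last ELSP write still unacknowledged? */
struct elsp_gate {
	bool ack_pending;
};

/*
 * Returns true if the caller may write to ELSP now, and records that an
 * acknowledgement is outstanding. Returns false while one is pending.
 */
static bool elsp_gate_try_write(struct elsp_gate *gate)
{
	assert(gate != NULL);
	if (gate->ack_pending)
		return false;
	gate->ack_pending = true;
	return true;
}

/* Feed in each CSB status word; IDLE_ACTIVE or PREEMPTED counts as the ack. */
static void elsp_gate_on_csb(struct elsp_gate *gate, uint32_t status)
{
	assert(gate != NULL);
	if (status & (CTX_STATUS_IDLE_ACTIVE | CTX_STATUS_PREEMPTED))
		gate->ack_pending = false;
}
```

In this model a second elsp_gate_try_write() is refused until a status such as 0x1 or 0x8002 has been seen; a plain completion such as 0x18 does not reopen the gate. Lite-restores remain possible because a PREEMPTED event also counts as an acknowledgement, in line with the commit's note that waiting for the ack does not disable them.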

Comment 69 Marta Löfstedt 2017-11-21 06:53:55 UTC

The patch was integrated into CI_DRM_3364. Let's close and archive this issue!

Comment 70 Chris Wilson 2018-02-07 12:31:33 UTC

*** Bug 102393 has been marked as a duplicate of this bug. ***
